2 Programs
| (require a86/ast) | package: a86 |
An a86 program is a list of instructions. To be interpretable with asm-interp, the program must be well-formed, which means:
Programs have at least one label which is declared Global; the first such label is used as the entry point.
All label definitions are unique.
All used labels are declared.
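As a rough illustration of the three conditions above, the following sketch checks them over a simplified stand-in representation in which each instruction is a plain list such as '(Label foo). The representation and the names well-formed?, defined-labels, global-labels, and used-labels are assumptions made for this example; it is not how a86 itself performs the check.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical model: an instruction is a list whose head names the
;; constructor, e.g. '(Global foo), '(Label foo), '(Jmp foo), '(Ret).

;; Labels defined by (Label l) instructions, in program order.
(define (defined-labels is)
  (for/list ([i (in-list is)] #:when (eq? (car i) 'Label)) (cadr i)))

;; Labels declared by (Global l) instructions.
(define (global-labels is)
  (for/list ([i (in-list is)] #:when (eq? (car i) 'Global)) (cadr i)))

;; Labels used as targets (this model only looks at Jmp and Call).
(define (used-labels is)
  (for/list ([i (in-list is)] #:when (memq (car i) '(Jmp Call))) (cadr i)))

(define (well-formed? is)
  (define defs (defined-labels is))
  (and
   ;; at least one defined label is declared Global
   (for/or ([l (in-list defs)]) (and (memq l (global-labels is)) #t))
   ;; all label definitions are unique
   (not (check-duplicates defs))
   ;; all used labels are defined somewhere in the program
   (for/and ([l (in-list (used-labels is))]) (and (memq l defs) #t))))

(well-formed? '((Global foo) (Label foo) (Jmp foo)))  ; satisfies all three
(well-formed? '((Global foo) (Label foo) (Jmp bar)))  ; bar is never defined
```

The model treats only Jmp and Call as label uses; a fuller check would have to inspect every instruction argument that can hold a label.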
procedure
(seq x ...) → (listof instruction?)
x : (or/c instruction? (listof instruction?))
Examples
> (seq) '()
> (seq (Label 'foo)) (list (Label 'foo))
> (seq (list (Label 'foo))) (list (Label 'foo))
> (seq (list (Label 'foo) (Mov 'rax 0)) (Mov 'rdx 'rax) (list (Call 'bar) (Ret)))
(list
(Label 'foo)
(Mov 'rax 0)
(Mov 'rdx 'rax)
(Call 'bar)
(Ret))
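The examples above suggest that seq splices list arguments into a single flat list. A minimal sketch of that behavior, using a hypothetical my-seq over arbitrary non-list values rather than real instructions, might look like this; the actual seq additionally requires its elements to be instructions, per its contract.

```racket
#lang racket/base

;; Hypothetical stand-in for seq: list arguments are spliced in,
;; any other argument is kept as a single element.
(define (my-seq . xs)
  (apply append
         (for/list ([x (in-list xs)])
           (if (list? x) x (list x)))))

(my-seq)                      ; no arguments: the empty list
(my-seq 'a (list 'b 'c) 'd)   ; one level of lists is spliced in
```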
procedure
(prog x ...) → (listof instruction?)
x : (or/c instruction? (listof instruction?))
This function performs early error checking over whole programs, which can help avoid confusing assembler errors. Unlike seq, it should be called only at the outermost level of a function that produces a86 code, never nested.
Examples
> (prog (Global 'foo) (Label 'foo)) (list (Global 'foo) (Label 'foo))
> (prog (Label 'foo)) prog: initial label undeclared as global: ($ 'foo)
> (prog (list (Label 'foo))) prog: initial label undeclared as global: ($ 'foo)
> (prog (Mov 'rax 32)) prog: no initial label found
> (prog (Label 'foo) (Label 'foo)) prog: duplicate label declaration found: 'foo
> (prog (Jmp 'foo)) prog: undeclared labels found: '(foo)
> (prog (Global 'foo) (Label 'foo) (Jmp 'foo)) (list (Global 'foo) (Label 'foo) (Jmp 'foo))
2.1 Pseudo-Instructions
Pseudo-instructions are elements of Programs that make declarations and directives to the assembler, but don’t correspond to actual executable Instructions.
Examples
> (Label 'fred) (Label 'fred)
> (Label "fred") Label: expects valid label name; given "fred"
> (Label 'fred-wilma) Label: label names must conform to restrictions
Examples
> (asm-display
   (prog (Global 'foo)
         (%%% "Start of foo")
         (Label 'foo)
         ; Racket comments won't appear
         (%% "Inputs one argument in rdi")
         (Mov 'rax 'rdi)
         (Add 'rax 'rax)
         (% "double it")
         (Sub 'rax 1)
         (% "subtract one")
         (%% "we're done!")
         (Ret)))
.intel_syntax noprefix
.text
.global foo
### Start of foo
foo:
## Inputs one argument in rdi
mov rax, rdi
add rax, rax # double it
sub rax, 1 # subtract one
## we're done!
ret
2.2 Instructions
Instructions are represented as structures and can take as arguments Immediates, Registers, Labels, Memory Expressions, or Assembly Expressions.
For example, (Mov 'rax 42) is a "move" instruction that when executed will move the immediate value 42 into the rax register.
See Instruction Set for a complete listing of the instruction set and instruction constructor signatures.
2.3 Immediates
Immediates are represented as exact integers of a certain bit width, which will depend upon the particular instruction and other arguments.
For example, Mov can take a 64-bit immediate source argument if the destination register is a 64-bit register. If the destination is a 32-bit register, the immediate must fit in 32 bits, and so on. Cmp can take at most a 32-bit immediate argument. Instruction constructors check the size constraints of immediate arguments and signal an error when a value is out of range.
Examples
> (Mov rax (sub1 (expt 2 64))) (Mov 'rax 18446744073709551615)
> (Mov eax (sub1 (expt 2 64))) Mov: literal must not exceed 32-bits; given
18446744073709551615 (64 bits)
> (Cmp rax (sub1 (expt 2 64))) Cmp: literal must not exceed 32-bits signed; given
18446744073709551615 (65 bits signed); go through a register
instead
Note that x86 doesn’t have a notion of signed or unsigned integers. Some instructions compute either signed or unsigned operations, but the values in registers are simply bits. For this reason, a 64-bit immediate can be any exact integer between (- (expt 2 63)) and (sub1 (expt 2 64)), inclusive, but keep in mind that, for example, (- (expt 2 63)) and (expt 2 63) are represented by the same bits. Also note that asm-interp interprets the result of an assembly program as a signed integer. If you want to interpret the result as an unsigned integer, you will need to add code to do so.
Examples
> (asm-interp (Mov rax -1) (Ret)) -1
> (asm-interp (Mov rax (sub1 (expt 2 64))) (Ret)) -1
> (asm-interp (Mov rax (- (expt 2 63))) (Ret)) -9223372036854775808
> (asm-interp (Mov rax (expt 2 63)) (Ret)) -9223372036854775808
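Since asm-interp reports results as signed integers, converting between the two readings of the same 64 bits is plain arithmetic on the Racket side. The helpers below are a sketch with hypothetical names (unsigned->signed, signed->unsigned); they are not part of a86.

```racket
#lang racket/base

;; Reinterpret an unsigned bit pattern as a two's-complement signed integer.
(define (unsigned->signed n [bits 64])
  (if (>= n (expt 2 (sub1 bits)))
      (- n (expt 2 bits))
      n))

;; Reinterpret a signed integer as the unsigned value with the same bits.
(define (signed->unsigned n [bits 64])
  (modulo n (expt 2 bits)))

(unsigned->signed (sub1 (expt 2 64)))  ; same bits as -1
(signed->unsigned -1)                  ; same bits as (sub1 (expt 2 64))
```

Applying signed->unsigned to a result returned by asm-interp is one way to recover the unsigned reading.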
procedure
(64-bit-integer? x) → boolean?
x : any/c
(32-bit-integer? x) → boolean?
x : any/c
(16-bit-integer? x) → boolean?
x : any/c
(8-bit-integer? x) → boolean?
x : any/c
Examples
> (64-bit-integer? 0) #t
> (64-bit-integer? (sub1 (expt 2 64))) #t
> (64-bit-integer? (expt 2 64)) #f
> (64-bit-integer? (- (expt 2 63))) #t
> (64-bit-integer? (sub1 (- (expt 2 63)))) #f
> (32-bit-integer? 0) #t
> (32-bit-integer? (sub1 (expt 2 32))) #t
> (32-bit-integer? (expt 2 32)) #f
> (32-bit-integer? (- (expt 2 32))) #f
> (32-bit-integer? (sub1 (- (expt 2 32)))) #f
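The examples are consistent with a single range rule: an n-bit immediate is any exact integer from -(2^(n-1)) up to 2^n - 1. The sketch below encodes that inferred rule in a hypothetical n-bit-integer? helper; it is an assumption drawn from the examples, not the library's definition.

```racket
#lang racket/base

;; Assumed rule, inferred from the examples: an n-bit immediate may be
;; any exact integer from -(2^(n-1)) up to 2^n - 1, inclusive.
(define (n-bit-integer? bits x)
  (and (exact-integer? x)
       (<= (- (expt 2 (sub1 bits))) x (sub1 (expt 2 bits)))))

(n-bit-integer? 64 (sub1 (expt 2 64)))  ; largest unsigned 64-bit pattern
(n-bit-integer? 32 (- (expt 2 32)))     ; below the smallest signed 32-bit value
```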
2.4 Registers
| (require a86/registers) | package: a86 |
Registers are represented as symbols, but this module also provides bindings corresponding to each register name, e.g. rax is bound to 'rax.
There are 16 64-bit registers.
value
rax : register? rbx : register? rcx : register? rdx : register? rbp : register? rsp : register? rsi : register? rdi : register? r8 : register? r9 : register? r10 : register? r11 : register? r12 : register? r13 : register? r14 : register? r15 : register?
The registers rbx, rsp, rbp, and r12 through r15 are “callee-saved” registers, meaning they are preserved across function calls (and must be saved and restored by any callee code).
Each register plays the same role as in x86, so for example rsp holds the current location of the stack.
There are 16 aliases for the lower 32 bits of the above registers. These are not separate registers, but instead provide access to the least significant 32 bits of the 64-bit register.
value
eax : register? ebx : register? ecx : register? edx : register? ebp : register? esp : register? esi : register? edi : register? r8d : register? r9d : register? r10d : register? r11d : register? r12d : register? r13d : register? r14d : register? r15d : register?
There are 16 aliases for the lower 16 bits of the above registers (and thus the lower 16 bits of the 64-bit registers). These are not separate registers, but instead provide access to the least significant 16 bits of the 64-bit register.
value
ax : register? bx : register? cx : register? dx : register? bp : register? sp : register? si : register? di : register? r8w : register? r9w : register? r10w : register? r11w : register? r12w : register? r13w : register? r14w : register? r15w : register?
There are 16 aliases for the lower 8 bits of the above registers (and thus the lower 8 bits of the 64-bit registers). These are not separate registers, but instead provide access to the least significant 8 bits of the 64-bit register.
value
al : register? bl : register? cl : register? dl : register? bpl : register? spl : register? sil : register? dil : register? r8b : register? r9b : register? r10b : register? r11b : register? r12b : register? r13b : register? r14b : register? r15b : register?
Finally, there are 4 aliases for the next higher 8 bits of the above registers (and thus the 9th through 16th bits of some of the 64-bit registers). Only rax, rbx, rcx, and rdx have such aliases.
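To make the aliasing concrete, the sketch below computes which bits of a 64-bit value each alias of rax would expose. It is plain integer arithmetic with hypothetical helper names (low-bits, high-byte) and a made-up register value; it does not use a86.

```racket
#lang racket/base

;; The n least significant bits of v.
(define (low-bits v n)
  (bitwise-and v (sub1 (expt 2 n))))

;; The 9th through 16th bits of v (bits 8 to 15, counting from 0).
(define (high-byte v)
  (bitwise-and (arithmetic-shift v -8) #xFF))

;; Hypothetical contents of rax, chosen so each byte is recognizable.
(define rax-value #x1122334455667788)

(low-bits rax-value 32)  ; what eax exposes: #x55667788
(low-bits rax-value 16)  ; what ax exposes:  #x7788
(low-bits rax-value 8)   ; what al exposes:  #x88
(high-byte rax-value)    ; what ah exposes:  #x77
```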
procedure
(register-size x) → (or/c 8 16 32 64)
x : register?
Examples
> (register-size rax) 64
> (register-size eax) 32
> (register-size ax) 16
> (register-size al) 8
> (register-size ah) 8
procedure
(reg-64-bit r) → register?
r : register?
(reg-32-bit r) → register?
r : register?
(reg-16-bit r) → register?
r : register?
(reg-8-bit-low r) → register?
r : register?
(reg-8-bit-high r) → register?
r : register?
Examples
> (reg-8-bit-low rax) 'al
> (reg-8-bit-high rax) 'ah
> (reg-64-bit eax) 'rax
> (reg-32-bit eax) 'eax
Examples
> (reg-8-bit-high ebx) 'bh
> (reg-8-bit-high r8) no conversion available
2.5 Labels
Labels are represented as symbols (or $ structures) that must conform to the naming restriction imposed by the assembler, so not all symbols are valid label names.
procedure
(asm-label? x) → boolean?
x : any/c
Labels must also follow the assembler's restrictions on label names. The NASM assembler’s documentation specifies: "Valid characters in labels are letters, numbers, _, $, #, @, ~, ., and ?. The only characters which may be used as the first character of an identifier are letters, . (with special meaning), _ and ?."
Examples
> (asm-label? 'foo) '("foo")
> (asm-label? "foo") #f
> (asm-label? 'rax) '("rax")
> (asm-label? 'foo-bar) #f
> (asm-label? 'foo.bar) '("foo.bar")
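The quoted NASM rule translates fairly directly into a regular expression. The following sketch, with the hypothetical names label-rx and valid-label-name?, assumes that "letters" means ASCII letters and returns a plain boolean, unlike asm-label?, which returns a match list on success as the examples above show.

```racket
#lang racket/base

;; First character: a letter, '.', '_' or '?'.
;; Remaining characters: letters, digits, and _ $ # @ ~ . ?
;; Assumption: "letters" is taken to mean ASCII letters only.
(define label-rx #px"^[a-zA-Z._?][a-zA-Z0-9_$#@~.?]*$")

(define (valid-label-name? x)
  (and (symbol? x)
       (regexp-match? label-rx (symbol->string x))))

(valid-label-name? 'foo.bar)  ; '.' is allowed
(valid-label-name? 'foo-bar)  ; '-' is not in the allowed set
```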
Examples
> (Label ($ 'rax)) (Label ($ 'rax))
(or (asm-symbol? x) ($? x))
2.6 Memory Expressions
Memory expressions are represented with Mem structures. A memory expression signals that a quantity should be interpreted as a location in memory, rather than as the bits themselves. For example, consider the rsp register, which holds a pointer to the stack in memory, i.e., its value is a number that can be interpreted as an address. The instruction (Mov rax rsp) will copy that address from rsp into rax, while the instruction (Mov rax (Mem rsp)) will copy the 64 bits of data stored in memory at the address held by rsp into rax. Similarly, (Mov rsp rax) will copy the value held in rax into rsp (which will overwrite our stack pointer).
struct
base : (or/c #f register? integer?)
index : (or/c #f label? (and/c register? (not/c 'rsp)))
scale : (or/c #f 1 2 4 8)
offset : (or/c #f integer?)
The base and offset arguments accept integers, but the permitted range depends on the addressing mode. In 64-bit mode, the integer can be no wider than 32 bits (signed).
When the scale is omitted, it is not printed out in x86. However, omitting it is effectively equivalent to specifying a scale of 1.
When two registers are given as the base and index arguments, they must be of the same width. For example, rax and r8 are compatible, but rbx and eax are not.
We don’t use the scale for anything currently, but it is supported for future extensions to the course.
Because of how complicated it can be to correctly specify the arguments to a Mem, this structure is built by a specialized smart constructor. The acceptable forms are documented below:
A relative address specification. In these cases, the printer automatically prefixes the address computation with the rip base register according to the x86 specification. The index can be given as either a symbol (e.g., 'foo) or a $-wrapped label. In the former case, the symbol will be wrapped in a $ during construction to ensure proper architecture-dependent formatting when printing.
Prints in x86 as [rip + index + offset].
An absolute address specification with an offset as the base. We don’t use this form currently, but it is supported for future extensions to the course. Prints in x86 as [offset].
procedure
base : register?
index : (or/c #f (and/c register? (not/c 'rsp))) = #f
scale : (or/c #f 1 2 4 8) = #f
offset : (or/c #f integer?) = #f
An absolute address specification with a register as the base. The index, scale, and offset arguments are optional. Prints in x86 as [base + (index * scale) + offset].
procedure
index : (and/c register? (not/c 'rsp))
scale : (or/c 1 2 4 8)
offset : (or/c #f integer?) = #f
An absolute address specification with a non-rsp register as the index. This form requires the scale to be given, and it accepts an optional offset argument. Prints in x86 as [(index * scale) + offset].
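The printed forms above all describe the same address arithmetic: base + (index * scale) + offset, with absent parts contributing nothing and an omitted scale acting as 1. The sketch below spells that out over plain integers. The name effective-address and its keyword arguments are hypothetical, and base and index here stand for the values held in the registers, not the registers themselves.

```racket
#lang racket/base

;; base, index, offset: integers or #f; scale: 1, 2, 4, 8 or #f.
;; Absent parts count as 0, and an omitted scale counts as 1.
(define (effective-address #:base [base #f]
                           #:index [index #f]
                           #:scale [scale #f]
                           #:offset [offset #f])
  (+ (or base 0)
     (* (or index 0) (or scale 1))
     (or offset 0)))

(effective-address #:base 1000 #:offset 8)                       ; 1008
(effective-address #:base 1000 #:index 3 #:scale 8 #:offset 16)  ; 1040
```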
2.7 Assembly Expressions
Assembly expressions are represented by s-expressions conforming to the following grammar:
‹expr›  ::= ‹register›
         |  ‹immediate›
         |  ‹label›
         |  '$
         |  '$$
         |  (list ‹unop› ‹expr›)
         |  (list ‹binop› ‹expr› ‹expr›)
         |  (list '? ‹expr› ‹expr› ‹expr›)
‹unop›  ::= '+
         |  '-
         |  '~
         |  '!
         |  'SEG
‹binop› ::= '<<<
         |  '<<
         |  '<
         |  '<=
         |  '<=>
         |  '>=
         |  '>
         |  '>>
         |  '>>>
         |  '=
         |  '==
         |  '!=
         |  '||
         |  '\|
         |  '&
         |  '&&
         |  '^^
         |  '^
         |  '+
         |  '-
         |  '*
         |  '/
         |  '//
         |  '%
         |  '%%
For the meaning of assembly instructions, refer to the NASM docs.
Examples
> (exp? 0) #t
> (exp? '(+ rax 8)) #t
> (exp? '(? lab1 0 1)) #t
syntax
(@ e)
This form is useful for referencing bound variables or Racket functions within an assembly expression. If the Racket identifier you want to reference conflicts with an assembly expression keyword, e.g. +, you can use begin to escape into Racket expression mode, e.g. (@ (+ 1 (begin (+ 2 3)))) is '(+ 1 5).
If any unquoted expression evaluates to something that is not an assembly expression, an error is signalled.
Examples
> (@ (+ 1 2)) '(+ 1 2)
> (@ (+ x 1)) '(+ x 1)
> (let ((x 100)) (@ (+ x 1))) '(+ 100 1)
> (let ((+ 100)) (@ (+ + +))) '(+ 100 100)
> (@ (+ + +)) not an assembly expression #<procedure:+>
2.8 Instruction Set
This section describes the instruction set of a86.
procedure
(instruction? x) → boolean?
x : any/c
procedure
(symbol->label s) → label?
s : symbol?
Examples
> (let ([l (symbol->label 'my-great-label)]) (seq (Label l) (Jmp l)))
(list
(Label 'label_my_great_label_a1d1fe873a8070d)
(Jmp 'label_my_great_label_a1d1fe873a8070d))
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Call 'f) (Add 'rax 1) (Ret) (Label 'f) (Mov 'rax 41) (Ret)) 42
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Mov 'rax 42) (Ret)) 42
Either dst or src may be offsets, but not both.
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Mov 'rbx 42) (Mov 'rax 'rbx) (Ret)) 42
> (Mov (Mem 'rax 0) (Mem 'rbx 0)) Mov: cannot use two memory locations; given (Mem 'rax 0),
(Mem 'rbx 0)
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Mov 'rax 32) (Add 'rax 10) (Ret)) 42
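Sign extension, as mentioned above for 32-bit immediates, copies bit 31 of the immediate into bits 32 through 63 of the 64-bit operand. A small sketch over plain integers, using the hypothetical helper sign-extend-32->64:

```racket
#lang racket/base

;; Sign-extend the low 32 bits of imm to a 64-bit pattern,
;; returned as a nonnegative integer.
(define (sign-extend-32->64 imm)
  (define low (bitwise-and imm #xFFFFFFFF))
  (if (bitwise-bit-set? low 31)
      (bitwise-ior low #xFFFFFFFF00000000)  ; fill the upper half with ones
      low))                                 ; upper half stays zero

(sign-extend-32->64 10)          ; small positive values are unchanged
(sign-extend-32->64 #x80000000)  ; bit 31 set: upper 32 bits become ones
```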
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Mov 'rax 32) (Sub 'rax 10) (Ret)) 22
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Mov 'rax 32) (Add 'rax 10) (Ret)) 42
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 2) (Jg 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 42) (Jmp 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 42
> (asm-interp (Mov 'rax 42) (Pop 'rbx) (Jmp 'rbx)) 42
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 2) (Jz 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 2) (Jnz 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 2) (Jl 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 42) (Jle 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 2) (Jg 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 42) (Jg 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
Examples
> (asm-interp (Mov 'rax (sub1 (expt 2 63))) (Add 'rax 1) (Jo 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) -9223372036854775808
Examples
> (asm-interp (Mov 'rax (sub1 (expt 2 63))) (Add 'rax 1) (Jno 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
Examples
> (asm-interp (Mov 'rax -1) (Add 'rax 1) (Jc 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
Examples
> (asm-interp (Mov 'rax -1) (Add 'rax 1) (Jnc 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
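The overflow and carry examples above depend on two different notions of a 64-bit addition "not fitting": overflow concerns the signed reading of the operands, carry the unsigned reading. The following is a simplified model in plain Racket with hypothetical names (add64, as-signed); it is not how asm-interp computes flags, which are set by the hardware.

```racket
#lang racket/base

(define mask64 (sub1 (expt 2 64)))

;; Read a 64-bit pattern as a signed (two's-complement) integer.
(define (as-signed v)
  (if (bitwise-bit-set? v 63) (- v (expt 2 64)) v))

;; Model of a 64-bit add: returns the result bits, the carry flag
;; (unsigned sum does not fit), and the overflow flag (signed sum
;; does not fit).
(define (add64 a b)
  (define ua (bitwise-and a mask64))
  (define ub (bitwise-and b mask64))
  (define sum (+ ua ub))
  (define result (bitwise-and sum mask64))
  (define carry? (> sum mask64))
  (define overflow?
    (not (= (+ (as-signed ua) (as-signed ub)) (as-signed result))))
  (values result carry? overflow?))

(add64 (sub1 (expt 2 63)) 1)  ; signed overflow, no carry
(add64 -1 1)                  ; carry, no signed overflow
```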
Examples
> (asm-interp (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovz 'rax 'r9) (Ret)) 1
> (asm-interp (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovz 'rax 'r9) (Ret)) 2
Note that the semantics for conditional moves is not what many people expect. The src is always read, regardless of the condition’s evaluation. This means that if your source is illegal (such as an offset beyond the bounds of memory allocated to the current process), a segmentation fault will arise even if the condition “should have” prevented the error.
Examples
> (asm-interp (Mov 'r9 0)
              (Cmp 'r9 1)
              (Mov 'rax 0)
              ; doesn't move, but does read memory address 0
              (Cmovz 'rax (Mem 'r9))
              (Ret))
invalid memory reference. Some debugging context lost
Examples
> (asm-interp (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovnz 'rax 'r9) (Ret)) 0
> (asm-interp (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovnz 'rax 'r9) (Ret)) 1
Examples
> (asm-interp (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovl 'rax 'r9) (Ret)) 0
> (asm-interp (Mov 'rax -1) (Cmp 'rax 0) (Mov 'r9 1) (Cmovl 'rax 'r9) (Ret)) 1
Examples
> (asm-interp (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovle 'rax 'r9) (Ret)) 1
> (asm-interp (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovle 'rax 'r9) (Ret)) 2
Examples
> (asm-interp (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovg 'rax 'r9) (Ret)) 0
> (asm-interp (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovg 'rax 'r9) (Ret)) 1
Examples
> (asm-interp (Mov 'rax -1) (Cmp 'rax 0) (Mov 'r9 1) (Cmovge 'rax 'r9) (Ret)) -1
> (asm-interp (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovge 'rax 'r9) (Ret)) 1
Examples
> (asm-interp (Mov 'rax (- (expt 2 63) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovo 'rax 'r9) (Ret)) 1
> (asm-interp (Mov 'rax (- (expt 2 63) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovo 'rax 'r9) (Ret)) 9223372036854775807
Examples
> (asm-interp (Mov 'rax (- (expt 2 63) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovno 'rax 'r9) (Ret)) -9223372036854775808
> (asm-interp (Mov 'rax (- (expt 2 63) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovno 'rax 'r9) (Ret)) 1
Examples
> (asm-interp (Mov 'rax (- (expt 2 64) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovc 'rax 'r9) (Ret)) 1
> (asm-interp (Mov 'rax (- (expt 2 64) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovc 'rax 'r9) (Ret)) -1
Examples
> (asm-interp (Mov 'rax (- (expt 2 64) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovnc 'rax 'r9) (Ret)) 0
> (asm-interp (Mov 'rax (- (expt 2 64) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovnc 'rax 'r9) (Ret)) 1
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Mov 'rax 11) ; #b1011 = 11
              (And 'rax 14) ; #b1110 = 14
              (Ret))
10
; #b1010 = 10
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Mov 'rax 11) ; #b1011 = 11
              (Or 'rax 14)  ; #b1110 = 14
              (Ret))
15
; #b1111 = 15
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Mov 'rax 11) ; #b1011 = 11
              (Xor 'rax 14) ; #b1110 = 14
              (Ret))
5
; #b0101 = 5
struct
dst : register? i : (integer-in 0 63)
Examples
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 4) ; #b100 = 4 = 2^2
                    (Sal 'rax 6)
                    (Ret)))
256
; #b100000000 = 256
struct
dst : register? i : (integer-in 0 63)
Examples
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 256) ; #b100000000 = 256
                    (Sar 'rax 6)
                    (Ret)))
4
; #b100 = 4
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 269) ; #b100001101 = 269
                    (Sar 'rax 6)
                    (Ret)))
4
; #b100 = 4
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 9223372036854775808) ; 1 in MSB
                    (Sar 'rax 6)
                    (Ret)))
-144115188075855872
; #b1111111000000000000000000000000000000000000000000000000000000000
struct
dst : register? i : (integer-in 0 63)
struct
dst : register? i : (integer-in 0 63)
Examples
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 256) ; #b100000000 = 256
                    (Shr 'rax 6)
                    (Ret)))
4
; #b100 = 4
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 269) ; #b100001101 = 269
                    (Shr 'rax 6)
                    (Ret)))
4
; #b100 = 4
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 9223372036854775808) ; 1 in MSB
                    (Shr 'rax 6)
                    (Ret)))
144115188075855872
; #b0000001000000000000000000000000000000000000000000000000000000000
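The Sar and Shr examples differ only when the most significant bit is set: an arithmetic shift copies the sign bit into the vacated positions, while a logical shift fills them with zeros. A sketch of both over 64-bit patterns in plain Racket, with hypothetical names (sar64, shr64, as-signed):

```racket
#lang racket/base

(define mask64 (sub1 (expt 2 64)))

;; Read a 64-bit pattern as a signed (two's-complement) integer.
(define (as-signed v)
  (if (bitwise-bit-set? v 63) (- v (expt 2 64)) v))

;; Logical right shift: vacated high bits are filled with zeros.
(define (shr64 v i)
  (arithmetic-shift (bitwise-and v mask64) (- i)))

;; Arithmetic right shift: vacated high bits copy the sign bit.
;; The result is returned as a 64-bit pattern (nonnegative integer).
(define (sar64 v i)
  (bitwise-and (arithmetic-shift (as-signed (bitwise-and v mask64)) (- i))
               mask64))

(shr64 (expt 2 63) 6)              ; 144115188075855872
(as-signed (sar64 (expt 2 63) 6))  ; -144115188075855872
```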
struct
a1 : (or/c 32-bit-integer? register?)
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Mov 'rax 42) (Push 'rax) (Mov 'rax 0) (Pop 'rax) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 42) (Push 'rax) (Mov 'rax 0) (Pop 'rax) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 0) (Not 'rax) (Ret)) -1
Examples
> (asm-interp (Lea 'rbx 'done) (Mov 'rax 42) (Jmp 'rbx) (Mov 'rax 0) (Label 'done) (Ret)) 42