2 Programs
(require a86/ast) package: a86
An a86 program is a list of instructions. To be interpretable with asm-interp, the program must be well-formed, which means:
Programs have at least one label which is declared Global; the first such label is used as the entry point.
All label definitions are unique.
All used labels are declared.
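The three conditions above can be sketched as a small checker. This is only an illustration under stated assumptions: instructions are modeled as plain lists such as '(Label foo) rather than the real a86 structures, only Jmp and Call count as label uses, and the name well-formed? is hypothetical, not part of a86.

```racket
#lang racket
;; Simplified stand-ins for a86 instructions: plain lists such as
;; '(Global entry), '(Label entry), '(Jmp entry). These are NOT the
;; real a86 structures; they only illustrate the three rules.

;; Labels defined by (Label l) items, in program order.
(define (defined-labels is)
  (for/list ([i is] #:when (eq? (first i) 'Label)) (second i)))

;; Labels declared by (Global l) items.
(define (global-labels is)
  (for/list ([i is] #:when (eq? (first i) 'Global)) (second i)))

;; Labels used as jump or call targets (assumed subset of all uses;
;; a register target such as (Jmp rbx) is not handled here).
(define (used-labels is)
  (for/list ([i is] #:when (memq (first i) '(Jmp Call))) (second i)))

(define (well-formed? is)
  (define defs (defined-labels is))
  (and ;; at least one defined label is declared Global
       (ormap (λ (l) (memq l (global-labels is))) defs)
       ;; all label definitions are unique
       (not (check-duplicates defs))
       ;; all used labels are defined
       (andmap (λ (l) (memq l defs)) (used-labels is))
       #t))

(well-formed? '((Global foo) (Label foo) (Jmp foo))) ; #t
(well-formed? '((Label foo)))                        ; #f
(well-formed? '((Global foo) (Label foo) (Jmp bar))) ; #f
```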
procedure
(seq x ...) → (listof instruction?)
x : (or/c instruction? (listof instruction?))
Examples
> (seq) '()
> (seq (Label 'foo)) (list (Label 'foo))
> (seq (list (Label 'foo))) (list (Label 'foo))
> (seq (list (Label 'foo) (Mov 'rax 0)) (Mov 'rdx 'rax) (list (Call 'bar) (Ret)))
(list
(Label 'foo)
(Mov 'rax 0)
(Mov 'rdx 'rax)
(Call 'bar)
(Ret))
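The flattening behavior shown in these examples can be modeled in a few lines. This is a sketch, not the a86 implementation: my-seq is a hypothetical name, and instructions are stood in for by symbols so the block runs without the a86 package.

```racket
#lang racket
;; Hypothetical model of seq: each argument is either a single
;; instruction or a list of instructions, and the result is one flat
;; list. Symbols stand in for instructions here.
(define (my-seq . xs)
  (append-map (λ (x) (if (list? x) x (list x))) xs))

(my-seq)                       ; '()
(my-seq 'a)                    ; '(a)
(my-seq (list 'a))             ; '(a)
(my-seq (list 'a 'b) 'c '(d))  ; '(a b c d)
```

Because nested lists are spliced in place, a helper that returns a list of instructions can be passed to the real seq directly, which is what makes it convenient to nest.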
procedure
(prog x ...) → (listof instruction?)
x : (or/c instruction? (listof instruction?))
This function is useful for early error checking over whole programs and can help avoid confusing NASM errors. Unlike seq, it should be called only at the outermost level of a function that produces a86 code, not nested inside other calls.
Examples
> (prog (Global 'foo) (Label 'foo)) (list (Global 'foo) (Label 'foo))
> (prog (Label 'foo)) prog: initial label undeclared as global: ($ 'foo)
> (prog (list (Label 'foo))) prog: initial label undeclared as global: ($ 'foo)
> (prog (Mov 'rax 32)) prog: no initial label found
> (prog (Label 'foo) (Label 'foo)) prog: duplicate label declaration found: 'foo
> (prog (Jmp 'foo)) prog: undeclared labels found: '(foo)
> (prog (Global 'foo) (Label 'foo) (Jmp 'foo)) (list (Global 'foo) (Label 'foo) (Jmp 'foo))
2.1 Pseudo-Instructions
Pseudo-instructions are elements of Programs that make declarations and directives to the assembler, but don’t correspond to actual executable Instructions.
Examples
> (Label 'fred) (Label 'fred)
> (Label "fred") Label: expects valid label name; given "fred"
> (Label 'fred-wilma) Label: label names must conform to nasm restrictions
Examples
> (asm-display (prog (Global 'foo)
                     (%%% "Start of foo")
                     (Label 'foo)
                     ; Racket comments won't appear
                     (%% "Inputs one argument in rdi")
                     (Mov 'rax 'rdi)
                     (Add 'rax 'rax) (% "double it")
                     (Sub 'rax 1)    (% "subtract one")
                     (%% "we're done!")
                     (Ret)))
default rel
section .text
global $foo
;;; Start of foo
$foo:
;; Inputs one argument in rdi
mov rax, rdi
add rax, rax ; double it
sub rax, 1 ; subtract one
;; we're done!
ret
2.2 Instructions
Instructions are represented as structures and can take as arguments Immediates, Registers, Labels, Memory Expressions, or Assembly Expressions.
For example, (Mov 'rax 42) is a "move" instruction that when executed will move the immediate value 42 into the rax register.
See Instruction Set for a complete listing of the instruction set and instruction constructor signatures.
2.3 Immediates
Immediates are represented as exact integers of a certain bit width, which will depend upon the particular instruction and other arguments.
For example, Mov can take a 64-bit immediate source argument if the destination register is a 64-bit register. If the destination is a 32-bit register, the immediate must fit in 32 bits, and so on. Cmp can take at most a 32-bit immediate argument. Instruction constructors check the size constraints of immediate arguments and signal an error when an argument is out of range.
Examples
> (Mov rax (sub1 (expt 2 64))) (Mov 'rax 18446744073709551615)
> (Mov eax (sub1 (expt 2 64))) Mov: literal must not exceed 32-bits; given
18446744073709551615 (64 bits)
> (Cmp rax (sub1 (expt 2 64))) Cmp: literal must not exceed 32-bits; given
18446744073709551615 (64 bits); go through a register
instead
Note that x86 doesn’t have a notion of signed or unsigned integers. Some instructions compute either signed or unsigned operations, but the values in registers are simply bits. For this reason, a 64-bit immediate can be any exact integer between (- (expt 2 63)) and (sub1 (expt 2 64)), inclusive, but keep in mind that, for example, (- (expt 2 63)) and (expt 2 63) are represented by the same bits. Also note that asm-interp interprets the result of an assembly program as a signed integer. If you want to interpret the result as an unsigned integer, you will need to add code to do so.
Examples
> (asm-interp (Mov rax -1) (Ret)) -1
> (asm-interp (Mov rax (sub1 (expt 2 64))) (Ret)) -1
> (asm-interp (Mov rax (- (expt 2 63))) (Ret)) -9223372036854775808
> (asm-interp (Mov rax (expt 2 63)) (Ret)) -9223372036854775808
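Reinterpreting a signed 64-bit result as unsigned is plain arithmetic on the Racket side. The helper names below are hypothetical and not part of a86; this is a minimal sketch of the conversion in both directions.

```racket
#lang racket
;; Reinterpret a signed 64-bit integer as unsigned: negative values
;; wrap around by adding 2^64.
(define (signed->unsigned n)
  (if (negative? n) (+ n (expt 2 64)) n))

;; The opposite direction: values with the top bit set become negative.
(define (unsigned->signed n)
  (if (>= n (expt 2 63)) (- n (expt 2 64)) n))

(signed->unsigned -1)                 ; 18446744073709551615
(signed->unsigned (- (expt 2 63)))    ; 9223372036854775808
(unsigned->signed (sub1 (expt 2 64))) ; -1
```

Applying signed->unsigned to the value returned by asm-interp would give the unsigned reading of the same bits.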
procedure
(64-bit-integer? x) → boolean?
x : any/c
(32-bit-integer? x) → boolean?
x : any/c
(16-bit-integer? x) → boolean?
x : any/c
(8-bit-integer? x) → boolean?
x : any/c
Examples
> (64-bit-integer? 0) #t
> (64-bit-integer? (sub1 (expt 2 64))) #t
> (64-bit-integer? (expt 2 64)) #f
> (64-bit-integer? (- (expt 2 63))) #t
> (64-bit-integer? (sub1 (- (expt 2 63)))) #f
> (32-bit-integer? 0) #t
> (32-bit-integer? (sub1 (expt 2 32))) #t
> (32-bit-integer? (expt 2 32)) #f
> (32-bit-integer? (- (expt 2 32))) #f
> (32-bit-integer? (sub1 (- (expt 2 32)))) #f
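The results above are consistent with a range check that accepts anything from the smallest signed value to the largest unsigned value of the given width. The following is a sketch inferred from those examples, with a hypothetical name n-bit-integer?; the actual a86 predicates may be implemented differently.

```racket
#lang racket
;; An exact integer fits in n bits if it lies between the most negative
;; signed n-bit value and the largest unsigned n-bit value.
(define (n-bit-integer? n x)
  (and (exact-integer? x)
       (<= (- (expt 2 (sub1 n))) x (sub1 (expt 2 n)))))

(n-bit-integer? 64 (sub1 (expt 2 64))) ; #t
(n-bit-integer? 64 (expt 2 64))        ; #f
(n-bit-integer? 32 (- (expt 2 32)))    ; #f
(n-bit-integer? 8 255)                 ; #t
```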
2.4 Registers
(require a86/registers) package: a86
Registers are represented as symbols, but this module also provides bindings corresponding to each register name, e.g. rax is bound to 'rax.
There are 16 64-bit registers.
value
rax : register?
rbx : register?
rcx : register?
rdx : register?
rbp : register?
rsp : register?
rsi : register?
rdi : register?
r8 : register?
r9 : register?
r10 : register?
r11 : register?
r12 : register?
r13 : register?
r14 : register?
r15 : register?
The registers rbx, rsp, rbp, and r12 through r15 are “callee-saved” registers, meaning they are preserved across function calls (and must be saved and restored by any callee code).
Each register plays the same role as in x86, so for example rsp holds the current location of the stack.
There are 16 aliases for the lower 32 bits of the above registers. These are not separate registers, but instead provide access to the least significant 32 bits of the 64-bit register.
value
eax : register?
ebx : register?
ecx : register?
edx : register?
ebp : register?
esp : register?
esi : register?
edi : register?
r8d : register?
r9d : register?
r10d : register?
r11d : register?
r12d : register?
r13d : register?
r14d : register?
r15d : register?
There are 16 aliases for the lower 16 bits of the above registers (and thus the lower 16 bits of the 64-bit registers). These are not separate registers, but instead provide access to the least significant 16 bits of the 64-bit register.
value
ax : register?
bx : register?
cx : register?
dx : register?
bp : register?
sp : register?
si : register?
di : register?
r8w : register?
r9w : register?
r10w : register?
r11w : register?
r12w : register?
r13w : register?
r14w : register?
r15w : register?
There are 16 aliases for the lower 8 bits of the above registers (and thus the lower 8 bits of the 64-bit registers). These are not separate registers, but instead provide access to the least significant 8 bits of the 64-bit register.
value
al : register?
bl : register?
cl : register?
dl : register?
bpl : register?
spl : register?
sil : register?
dil : register?
r8b : register?
r9b : register?
r10b : register?
r11b : register?
r12b : register?
r13b : register?
r14b : register?
r15b : register?
Finally, there are 4 aliases for the next higher 8 bits of the above registers (and thus the 9th through 16th least significant bits of some of the 64-bit registers). Only rax, rbx, rcx, and rdx have such aliases (ah, bh, ch, and dh).
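To make the aliasing concrete, the following sketch computes which bits of a 64-bit value each kind of alias would expose. It is an arithmetic model only; low-bits and high-byte are hypothetical helpers, and the sample value is arbitrary.

```racket
#lang racket
;; An arbitrary 64-bit value standing in for the contents of rax.
(define rax-val #x1122334455667788)

;; Keep only the n least significant bits of v.
(define (low-bits v n)
  (bitwise-and v (sub1 (arithmetic-shift 1 n))))

;; Bits 8 through 15 (counting from 0), as seen through an alias like ah.
(define (high-byte v)
  (low-bits (arithmetic-shift v -8) 8))

(low-bits rax-val 32) ; #x55667788, the view through eax
(low-bits rax-val 16) ; #x7788,     the view through ax
(low-bits rax-val 8)  ; #x88,       the view through al
(high-byte rax-val)   ; #x77,       the view through ah
```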
procedure
(register-size x) → (or/c 8 16 32 64)
x : register?
Examples
> (register-size rax) 64
> (register-size eax) 32
> (register-size ax) 16
> (register-size al) 8
> (register-size ah) 8
procedure
(reg-64-bit r) → register?
r : register?
(reg-32-bit r) → register?
r : register?
(reg-16-bit r) → register?
r : register?
(reg-8-bit-low r) → register?
r : register?
(reg-8-bit-high r) → register?
r : register?
Examples
> (reg-8-bit-low rax) 'al
> (reg-8-bit-high rax) 'ah
> (reg-64-bit eax) 'rax
> (reg-32-bit eax) 'eax
Examples
> (reg-8-bit-high ebx) 'bh
> (reg-8-bit-high r8) no conversion available
2.5 Labels
Labels are represented as symbols (or $ structures) that must conform to the naming restriction imposed by NASM, so not all symbols are valid label names.
Labels must also follow the NASM restrictions on label names: "Valid characters in labels are letters, numbers, _, $, #, @, ~, ., and ?. The only characters which may be used as the first character of an identifier are letters, . (with special meaning), _ and ?."
Examples
> (label? 'foo) #t
> (label? "foo") #f
> (label? 'rax) #f
> (label? 'foo-bar) #f
> (label? 'foo.bar) #t
Examples
> (Label ($ 'rax)) (Label ($ 'rax))
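The character rules quoted from NASM can be approximated with a regular expression. This is a sketch under assumptions: nasm-label-name? is a hypothetical helper, and it checks only the character restrictions, so unlike label? it does not reject register names such as 'rax.

```racket
#lang racket
;; First character: a letter, '.', '_' or '?'.
;; Remaining characters: letters, digits, '_', '$', '#', '@', '~', '.', '?'.
(define (nasm-label-name? s)
  (and (symbol? s)
       (regexp-match? #px"^[A-Za-z._?][A-Za-z0-9_$#@~.?]*$"
                      (symbol->string s))))

(nasm-label-name? 'foo)     ; #t
(nasm-label-name? "foo")    ; #f, not a symbol
(nasm-label-name? 'foo-bar) ; #f, '-' is not allowed
(nasm-label-name? 'foo.bar) ; #t
```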
2.6 Memory Expressions
Memory expressions are represented with Offset structures. A memory expression signals that a quantity should be interpreted as a location in memory, rather than as the bits themselves. For example, the rsp register holds a pointer to the stack memory; (Mov rax rsp) will move the pointer held in rsp into rax, while (Mov rax (Offset rsp)) will read 64 bits of memory at the location pointed at by the pointer in rsp into rax. On the other hand, (Mov rsp rax) will move the value in rax into the rsp register (overwriting the stack pointer), while (Mov (Offset rsp) rax) will write the value in rax into the memory pointed at by the rsp register.
Memory expressions can take as arguments either registers or Assembly Expressions, which are commonly used to indicate offsets from a given memory location, e.g. (Mov rax (Offset (@ (+ rsp 8)))) reads the 64 bits of memory at the address rsp + 8, i.e. 8 bytes past wherever rsp points.
Examples
> (Offset 'rax) (Offset 'rax)
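The difference between moving a pointer and moving what it points at can be seen in a toy model. Everything here is an assumption for illustration: registers and memory are hash tables, the addresses and contents are made up, and real a86 programs run as machine code rather than through anything like this.

```racket
#lang racket
;; Toy machine state (illustration only).
(define regs (make-hash (list (cons 'rsp 1000) (cons 'rax 0))))
(define mem  (make-hash (list (cons 1000 42) (cons 1008 7))))

;; Models (Mov rax rsp): copy the pointer itself.
(hash-set! regs 'rax (hash-ref regs 'rsp))
(define after-reg-move (hash-ref regs 'rax))      ; 1000

;; Models (Mov rax (Offset rsp)): read memory at the address in rsp.
(hash-set! regs 'rax (hash-ref mem (hash-ref regs 'rsp)))
(define after-load (hash-ref regs 'rax))          ; 42

;; Models (Mov rax (Offset (@ (+ rsp 8)))): read 8 bytes past rsp.
(hash-set! regs 'rax (hash-ref mem (+ (hash-ref regs 'rsp) 8)))
(define after-offset-load (hash-ref regs 'rax))   ; 7

;; Models (Mov (Offset rsp) rax): write rax to the address in rsp.
(hash-set! mem (hash-ref regs 'rsp) (hash-ref regs 'rax))
(define after-store (hash-ref mem 1000))          ; 7
```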
2.7 Assembly Expressions
Assembly expressions are represented by s-expressions conforming to the following grammar:
‹expr›  ::= ‹register›
         |  ‹immediate›
         |  ‹label›
         |  '$
         |  '$$
         |  (list ‹unop› ‹expr›)
         |  (list ‹binop› ‹expr› ‹expr›)
         |  (list '? ‹expr› ‹expr› ‹expr›)
‹unop›  ::= '+
         |  '-
         |  '~
         |  '!
         |  'SEG
‹binop› ::= '<<<
         |  '<<
         |  '<
         |  '<=
         |  '<=>
         |  '>=
         |  '>
         |  '>>
         |  '>>>
         |  '=
         |  '==
         |  '!=
         |  '||
         |  '\|
         |  '&
         |  '&&
         |  '^^
         |  '^
         |  '+
         |  '-
         |  '*
         |  '/
         |  '//
         |  '%
         |  '%%
For the meaning of the operators in assembly expressions, refer to the NASM docs.
Examples
> (exp? 0) #t
> (exp? '(+ rax 8)) #t
> (exp? '(? lab1 0 1)) #t
syntax
(@ e)
This form is useful for referencing bound variables or Racket functions within an assembly expression. If the Racket identifier you want to reference conflicts with an assembly expression keyword, e.g. +, you can use begin to escape into Racket expression mode, e.g. (@ (+ 1 (begin (+ 2 3)))) is '(+ 1 5).
If any unquoted expression evaluates to something that is not an assembly expression, an error is signalled.
Examples
> (@ (+ 1 2)) '(+ 1 2)
> (@ (+ x 1)) '(+ x 1)
> (let ((x 100)) (@ (+ x 1))) '(+ 100 1)
> (let ((+ 100)) (@ (+ + +))) '(+ 100 100)
> (@ (+ + +)) not an assembly expression #<procedure:+>
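For comparison, the same s-expressions can be built with Racket's ordinary quasiquote, where each escape into Racket has to be marked explicitly with unquote. This is only an analogy to show what @ produces; it says nothing about how @ is implemented, and offset-expr is a hypothetical helper.

```racket
#lang racket
;; Build an assembly-expression-shaped s-expression by hand.
(define (offset-expr base n)
  `(+ ,base ,n))

(offset-expr 'rsp 8)      ; '(+ rsp 8)
(let ([x 100]) `(+ ,x 1)) ; '(+ 100 1), like (@ (+ x 1)) with x bound
`(+ 1 ,(+ 2 3))           ; '(+ 1 5),   like (@ (+ 1 (begin (+ 2 3))))
```

The convenience of @ over quasiquote, as the examples above show, is that bound identifiers are substituted without any explicit unquote.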
2.8 Instruction Set
This section describes the instruction set of a86.
procedure
(instruction? x) → boolean?
x : any/c
procedure
(symbol->label s) → label?
s : symbol?
Examples
> (let ([l (symbol->label 'my-great-label)]) (seq (Label l) (Jmp l)))
(list
(Label 'label_my_great_label_a1d1fe873a8070d)
(Jmp 'label_my_great_label_a1d1fe873a8070d))
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Call 'f) (Add 'rax 1) (Ret) (Label 'f) (Mov 'rax 41) (Ret)) 42
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Mov 'rax 42) (Ret)) 42
Either dst or src may be offsets, but not both.
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Mov 'rbx 42) (Mov 'rax 'rbx) (Ret)) 42
> (Mov (Offset 'rax 0) (Offset 'rbx 0)) Mov: cannot use two memory locations; given (Offset '(+ rax
0)), (Offset '(+ rbx 0))
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Mov 'rax 32) (Add 'rax 10) (Ret)) 42
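Sign extension of a 32-bit immediate can be modeled arithmetically: the top bit of the 32-bit pattern is copied into the upper 32 bits. The helpers below are hypothetical and written in plain Racket; they are a sketch of the idea, not code from a86.

```racket
#lang racket
;; Interpret the low 32 bits of n as a signed 32-bit integer.
(define (sign-extend-32 n)
  (define low (bitwise-and n #xFFFFFFFF))
  (if (>= low (expt 2 31)) (- low (expt 2 32)) low))

;; View an integer as its 64-bit two's complement bit pattern.
(define (as-64-bit-pattern n)
  (bitwise-and n (sub1 (expt 2 64))))

(as-64-bit-pattern (sign-extend-32 10))         ; 10
(as-64-bit-pattern (sign-extend-32 #xFFFFFFFF)) ; #xFFFFFFFFFFFFFFFF
(as-64-bit-pattern (sign-extend-32 #x80000000)) ; #xFFFFFFFF80000000
```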
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Mov 'rax 32) (Sub 'rax 10) (Ret)) 22
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Global 'entry) (Label 'entry) (Mov 'rax 32) (Add 'rax 10) (Ret)) 42
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 2) (Jg 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 42) (Jmp 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 42
> (asm-interp (Mov 'rax 42) (Pop 'rbx) (Jmp 'rbx)) 42
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 2) (Jz 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 2) (Jnz 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 2) (Jl 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 42) (Jle 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 2) (Jg 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 42) (Cmp 'rax 42) (Jg 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
Examples
> (asm-interp (Mov 'rax (sub1 (expt 2 63))) (Add 'rax 1) (Jo 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) -9223372036854775808
Examples
> (asm-interp (Mov 'rax (sub1 (expt 2 63))) (Add 'rax 1) (Jno 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
Examples
> (asm-interp (Mov 'rax -1) (Add 'rax 1) (Jc 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
Examples
> (asm-interp (Mov 'rax -1) (Add 'rax 1) (Jnc 'l1) (Mov 'rax 0) (Label 'l1) (Ret)) 0
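The overflow and carry examples above depend on two different flags that a 64-bit addition sets. The following plain-Racket model, with a hypothetical function add64, sketches how each could be computed; it is an illustration of the arithmetic, not of how a86 or the processor is implemented.

```racket
#lang racket
;; Model a 64-bit Add. Returns three values: the 64-bit result pattern,
;; the carry flag (unsigned overflow), and the overflow flag (signed
;; overflow).
(define (add64 a b)
  (define mask (sub1 (expt 2 64)))
  (define ua (bitwise-and a mask))   ; operands as unsigned bit patterns
  (define ub (bitwise-and b mask))
  (define sum (+ ua ub))
  (define result (bitwise-and sum mask))
  (define (sign v) (bitwise-bit-set? v 63))
  (values result
          ;; carry: the unsigned sum did not fit in 64 bits
          (> sum mask)
          ;; overflow: operands share a sign that the result lacks
          (and (eq? (sign ua) (sign ub))
               (not (eq? (sign ua) (sign result))))))

(add64 (sub1 (expt 2 63)) 1) ; 9223372036854775808 #f #t
(add64 -1 1)                 ; 0 #t #f
```

The first call corresponds to the Jo and Jno examples (overflow set, carry clear), the second to the Jc and Jnc examples (carry set, overflow clear).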
Examples
> (asm-interp (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovz 'rax 'r9) (Ret)) 1
> (asm-interp (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovz 'rax 'r9) (Ret)) 2
Note that the semantics for conditional moves is not what many people expect. The src is always read, regardless of the condition’s evaluation. This means that if your source is illegal (such as an offset beyond the bounds of memory allocated to the current process), a segmentation fault will arise even if the condition “should have” prevented the error.
Examples
> (asm-interp (Mov 'r9 0)
              (Cmp 'r9 1)
              (Mov 'rax 0)
              ; doesn't move, but does read memory address 0
              (Cmovz 'rax (Offset 'r9))
              (Ret))
invalid memory reference. Some debugging context lost
Examples
> (asm-interp (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovnz 'rax 'r9) (Ret)) 0
> (asm-interp (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovnz 'rax 'r9) (Ret)) 1
Examples
> (asm-interp (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovl 'rax 'r9) (Ret)) 0
> (asm-interp (Mov 'rax -1) (Cmp 'rax 0) (Mov 'r9 1) (Cmovl 'rax 'r9) (Ret)) 1
Examples
> (asm-interp (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovle 'rax 'r9) (Ret)) 1
> (asm-interp (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovle 'rax 'r9) (Ret)) 2
Examples
> (asm-interp (Mov 'rax 0) (Cmp 'rax 0) (Mov 'r9 1) (Cmovg 'rax 'r9) (Ret)) 0
> (asm-interp (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovg 'rax 'r9) (Ret)) 1
Examples
> (asm-interp (Mov 'rax -1) (Cmp 'rax 0) (Mov 'r9 1) (Cmovge 'rax 'r9) (Ret)) -1
> (asm-interp (Mov 'rax 2) (Cmp 'rax 0) (Mov 'r9 1) (Cmovge 'rax 'r9) (Ret)) 1
Examples
> (asm-interp (Mov 'rax (- (expt 2 63) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovo 'rax 'r9) (Ret)) 1
> (asm-interp (Mov 'rax (- (expt 2 63) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovo 'rax 'r9) (Ret)) 9223372036854775807
Examples
> (asm-interp (Mov 'rax (- (expt 2 63) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovno 'rax 'r9) (Ret)) -9223372036854775808
> (asm-interp (Mov 'rax (- (expt 2 63) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovno 'rax 'r9) (Ret)) 1
Examples
> (asm-interp (Mov 'rax (- (expt 2 64) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovc 'rax 'r9) (Ret)) 1
> (asm-interp (Mov 'rax (- (expt 2 64) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovc 'rax 'r9) (Ret)) -1
Examples
> (asm-interp (Mov 'rax (- (expt 2 64) 1)) (Add 'rax 1) (Mov 'r9 1) (Cmovnc 'rax 'r9) (Ret)) 0
> (asm-interp (Mov 'rax (- (expt 2 64) 2)) (Add 'rax 1) (Mov 'r9 1) (Cmovnc 'rax 'r9) (Ret)) 1
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Mov 'rax 11) ; #b1011 = 11
              (And 'rax 14) ; #b1110 = 14
              (Ret))        ; #b1010 = 10
10
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Mov 'rax 11) ; #b1011 = 11
              (Or 'rax 14)  ; #b1110 = 14
              (Ret))        ; #b1111 = 15
15
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Mov 'rax 11) ; #b1011 = 11
              (Xor 'rax 14) ; #b1110 = 14
              (Ret))        ; #b0101 = 5
5
struct
dst : register? i : (integer-in 0 63)
Examples
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 4) ; #b100 = 4 = 2^2
                    (Sal 'rax 6)
                    (Ret)))      ; #b100000000 = 256
256
struct
dst : register? i : (integer-in 0 63)
Examples
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 256) ; #b100000000 = 256
                    (Sar 'rax 6)
                    (Ret)))        ; #b100 = 4
4
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 269) ; #b100001101 = 269
                    (Sar 'rax 6)
                    (Ret)))        ; #b100 = 4
4
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 9223372036854775808) ; 1 in MSB
                    (Sar 'rax 6)
                    (Ret))) ; #b1111111000000000000000000000000000000000000000000000000000000000
-144115188075855872
struct
dst : register? i : (integer-in 0 63)
struct
dst : register? i : (integer-in 0 63)
Examples
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 256) ; #b100000000 = 256
                    (Shr 'rax 6)
                    (Ret)))        ; #b100 = 4
4
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 269) ; #b100001101 = 269
                    (Shr 'rax 6)
                    (Ret)))        ; #b100 = 4
4
> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 9223372036854775808) ; 1 in MSB
                    (Shr 'rax 6)
                    (Ret))) ; #b0000001000000000000000000000000000000000000000000000000000000000
144115188075855872
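The Sar and Shr examples differ only when the most significant bit is set: Sar copies the sign bit into the vacated positions, while Shr fills them with zeros. A plain-Racket model of that difference, using hypothetical helpers sar and shr, might look like this sketch.

```racket
#lang racket
(define mask (sub1 (expt 2 64)))

;; Read a 64-bit pattern as a signed integer, as asm-interp reports results.
(define (to-signed u)
  (if (>= u (expt 2 63)) (- u (expt 2 64)) u))

;; Logical right shift: treat the value as unsigned, fill with zeros.
(define (shr v i)
  (arithmetic-shift (bitwise-and v mask) (- i)))

;; Arithmetic right shift: treat the value as signed, so the sign bit
;; is replicated; the result is returned as a 64-bit pattern.
(define (sar v i)
  (bitwise-and (arithmetic-shift (to-signed (bitwise-and v mask)) (- i))
               mask))

(to-signed (sar 269 6))         ; 4
(to-signed (shr 269 6))         ; 4
(to-signed (sar (expt 2 63) 6)) ; -144115188075855872
(to-signed (shr (expt 2 63) 6)) ; 144115188075855872
```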
struct
a1 : (or/c 32-bit-integer? register?)
In the case of a 32-bit immediate, it is sign-extended to 64-bits.
Examples
> (asm-interp (Mov 'rax 42) (Push 'rax) (Mov 'rax 0) (Pop 'rax) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 42) (Push 'rax) (Mov 'rax 0) (Pop 'rax) (Ret)) 42
Examples
> (asm-interp (Mov 'rax 0) (Not 'rax) (Ret)) -1
Examples
> (asm-interp (Lea 'rbx 'done) (Mov 'rax 42) (Jmp 'rbx) (Mov 'rax 0) (Label 'done) (Ret)) 42