4 a86 Reference

The a86 library provides functions for composing, printing, and running x86-64 assembly programs in Racket.

    4.1 Overview

    4.2 Programs

      4.2.1 Pseudo-Instructions

      4.2.2 Instructions

      4.2.3 Immediates

      4.2.4 Registers

      4.2.5 Labels

      4.2.6 Memory Expressions

      4.2.7 Assembly Expressions

      4.2.8 Instruction Set

    4.3 Execution Model

      4.3.1 Flags

      4.3.2 Stack

      4.3.3 Memory

    4.4 Printing

    4.5 Interpreting

      4.5.1 Running assembly programs

      4.5.2 Resolving external labels

4.1 Overview

 (require a86) package: a86

This library provides functions for composing, printing, and running x86-64 assembly programs in Racket:

Examples

; Natural -> Asm
; Produce representation of assembly program to compute n! recursively
> (define (fact-program n)
    (list (Global 'run)
          (Label 'run)
          (Mov 'rax n)
          (Label 'fact)
          (Cmp 'rax 0)
          (Je 'done)
          (Push 'rax)
          (Sub 'rax 1)
          (Call 'fact)
          (Pop 'r9)
          (Mul 'r9)
          (Ret)
          (Label 'done)
          (Mov 'rax 1)
          (Ret)))
; compute 5!
> (asm-interp (fact-program 5))

120

; render 5! program in asm syntax
> (asm-display (fact-program 5))

        .intel_syntax noprefix

        .text

        .global "run"

"run":

        mov rax, 5

"fact":

        cmp rax, 0

        je "done"

        push rax

        sub rax, 1

        call "fact"

        pop r9

        mul r9

        ret

"done":

        mov rax, 1

        ret

Programs consist of a list of Instructions and Pseudo-Instructions. Instructions can take as arguments Labels, Immediates, Registers, and Memory Expressions.

Executing instructions can read and modify Registers and Memory, including the Stack.

The a86 module provides all of the bindings from a86/ast, a86/registers, a86/printer, and a86/interp, described below.

4.2 Programs

 (require a86/ast) package: a86

An a86 program is a list of instructions. To be interpretable with asm-interp, the program must be well-formed, which means:

  • Programs have at least one label which is declared Global; the first such label is used as the entry point.

  • All label definitions are unique.

  • All used labels are declared.

procedure

(seq x ...) → (listof instruction?)

  x : (or/c instruction? (listof instruction?))
A convenience function for splicing together instructions and lists of instructions.

Examples

> (seq)

'()

> (seq (Label 'foo))

(list (Label 'foo))

> (seq (list (Label 'foo)))

(list (Label 'foo))

> (seq (list (Label 'foo)
             (Mov 'rax 0))
       (Mov 'rdx 'rax)
       (list (Call 'bar)
             (Ret)))

(list

 (Label 'foo)

 (Mov 'rax 0)

 (Mov 'rdx 'rax)

 (Call 'bar)

 (Ret))
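
The splicing behaviour shown above can be modelled in plain Racket. This is only a sketch of the idea, not the library's implementation: splice is a hypothetical name, and quoted symbols stand in for instruction structures.

```racket
#lang racket/base

;; splice : any/c ... -> list?
;; Hypothetical stand-in for seq: list arguments are spliced in,
;; every other argument becomes a single element.
(define (splice . xs)
  (apply append
         (map (lambda (x) (if (list? x) x (list x)))
              xs)))

(splice)                       ; no arguments: the empty list
(splice 'a (list 'b 'c) 'd)    ; mixes single items and lists
```

Because lists are flattened only one level deep, a helper that itself returns a list of instructions can be passed directly as an argument.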

procedure

(prog x ...) → (listof instruction?)

  x : (or/c instruction? (listof instruction?))
Like seq, but also checks that the instructions form a program.

This function is useful for early error checking over whole programs and can help avoid confusing assembler errors. Unlike seq, it should be called only at the outermost level of a function that produces a86 code, and should not be nested.

Examples

> (prog (Global 'foo) (Label 'foo))

(list (Global 'foo) (Label 'foo))

> (prog (Label 'foo))

prog: initial label undeclared as global: ($ 'foo)

> (prog (list (Label 'foo)))

prog: initial label undeclared as global: ($ 'foo)

> (prog (Mov 'rax 32))

prog: no initial label found

> (prog (Label 'foo)
        (Label 'foo))

prog: duplicate label declaration found: 'foo

> (prog (Jmp 'foo))

prog: undeclared labels found: '(foo)

> (prog (Global 'foo)
        (Label 'foo)
        (Jmp 'foo))

(list (Global 'foo) (Label 'foo) (Jmp 'foo))

4.2.1 Pseudo-Instructions

Pseudo-instructions are elements of Programs that make declarations and directives to the assembler, but don’t correspond to actual executable Instructions.

struct

(struct Text ())

Declares the start of a text section, which includes instructions to be executed.

struct

(struct Data ())

Declares the start of a data section, which includes data and constants.

struct

(struct Label (x))

  x : label?
Defines the given label, which is used as a symbolic name for this location in the program. Each defined label in a program must be unique. Label names must follow the restrictions on valid label names (see label? for details).

Examples

> (Label 'fred)

(Label 'fred)

> (Label "fred")

Label: expects valid label name; given "fred"

> (Label 'fred-wilma)

Label: label names must conform to restrictions

struct

(struct Extern (x))

  x : label?
Declares an external label. External labels may be used, but not defined within the program. In order to run a program, all external labels must be resolved using either current-externs or current-objects.

struct

(struct Global (x))

  x : label?
Declares a label as global, i.e. linkable with other object files.

struct

(struct % (s))

  s : string?

struct

(struct %% (s))

  s : string?

struct

(struct %%% (s))

  s : string?
Creates a comment in the assembly code. The % constructor adds a comment toward the right side of the current line; %% creates a comment on its own line 1 tab over; %%% creates a comment on its own line aligned to the left.

Examples

> (asm-display
    (prog (Global 'foo)
          (%%% "Start of foo")
          (Label 'foo)
          ; Racket comments won't appear
          (%% "Inputs one argument in rdi")
          (Mov 'rax 'rdi)
          (Add 'rax 'rax)    (% "double it")
          (Sub 'rax 1)       (% "subtract one")
          (%% "we're done!")
          (Ret)))

        .intel_syntax noprefix

        .text

        .global "foo"

### Start of foo

"foo":

        ## Inputs one argument in rdi

        mov rax, rdi

        add rax, rax            # double it

        sub rax, 1              # subtract one

        ## we're done!

        ret

4.2.2 Instructions

Instructions are represented as structures and can take as arguments Immediates, Registers, Labels, Memory Expressions, or Assembly Expressions.

For example, (Mov 'rax 42) is a "move" instruction that when executed will move the immediate value 42 into the rax register.

See Instruction Set for a complete listing of the instruction set and instruction constructor signatures.

4.2.3 Immediates

Immediates are represented as exact integers of a certain bit width, which will depend upon the particular instruction and other arguments.

For example, Mov can take a 64-bit immediate source argument if the destination register is a 64-bit register. If the destination is a 32-bit register, the immediate must fit in 32 bits, and so on. Cmp can take at most a 32-bit immediate argument. Instruction constructors check the size constraints of immediate arguments and signal an error when an immediate is out of range.

Examples

> (Mov rax (sub1 (expt 2 64)))

(Mov 'rax 18446744073709551615)

> (Mov eax (sub1 (expt 2 64)))

Mov: literal must not exceed 32-bits; given

18446744073709551615 (64 bits)

> (Cmp rax (sub1 (expt 2 64)))

Cmp: literal must not exceed 32-bits signed; given

18446744073709551615 (65 bits signed); go through a register

instead

Note that x86 doesn’t have a notion of signed or unsigned integers. Some instructions compute either signed or unsigned operations, but the values in registers are simply bits. For this reason, a 64-bit immediate can be any exact integer between (- (expt 2 63)) and (sub1 (expt 2 64)), inclusive, but keep in mind that, for example, (- (expt 2 63)) and (expt 2 63) are represented by the same bits. Also note that asm-interp interprets the result of an assembly program as a signed integer. If you want to interpret the result as an unsigned integer, you will need to add code to do so.

Here is an example where you can see different immediate arguments resulting in the same result from asm-interp, and that the result is signed:

Examples

> (asm-interp (Mov rax -1)
              (Ret))

-1

> (asm-interp (Mov rax (sub1 (expt 2 64)))
              (Ret))

-1

> (asm-interp (Mov rax (- (expt 2 63)))
              (Ret))

-9223372036854775808

> (asm-interp (Mov rax (expt 2 63))
              (Ret))

-9223372036854775808
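
Since asm-interp reports results as signed integers, converting to the unsigned reading has to happen on the Racket side. A minimal sketch in plain Racket follows. The helper names are hypothetical and not part of a86.

```racket
#lang racket/base

;; Hypothetical helpers (not part of a86) for reinterpreting 64-bit values.

;; Signed result -> the unsigned integer with the same 64 bits.
(define (signed->unsigned-64 n)
  (modulo n (expt 2 64)))

;; Unsigned value -> the signed integer with the same 64 bits.
(define (unsigned->signed-64 n)
  (if (>= n (expt 2 63))
      (- n (expt 2 64))
      n))

(signed->unsigned-64 -1)            ; all 64 bits set
(unsigned->signed-64 (expt 2 63))   ; only the most significant bit set
```

Applying signed->unsigned-64 to a result from asm-interp would give the unsigned interpretation of whatever bits were left in rax.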

procedure

(64-bit-integer? x) → boolean?

  x : any/c
(32-bit-integer? x) → boolean?
  x : any/c
(16-bit-integer? x) → boolean?
  x : any/c
(8-bit-integer? x) → boolean?
  x : any/c
Predicates for determining if a value is an integer that fits in some number of bits.

Examples

> (64-bit-integer? 0)

#t

> (64-bit-integer? (sub1 (expt 2 64)))

#t

> (64-bit-integer? (expt 2 64))

#f

> (64-bit-integer? (- (expt 2 63)))

#t

> (64-bit-integer? (sub1 (- (expt 2 63))))

#f

> (32-bit-integer? 0)

#t

> (32-bit-integer? (sub1 (expt 2 32)))

#t

> (32-bit-integer? (expt 2 32))

#f

> (32-bit-integer? (- (expt 2 32)))

#f

> (32-bit-integer? (sub1 (- (expt 2 32))))

#f
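
The accepted range in the examples above runs from the smallest signed n-bit value to the largest unsigned n-bit value. A sketch of that check in plain Racket, where fits-in-bits? is a hypothetical name and the range is inferred from the examples rather than taken from the library source:

```racket
#lang racket/base

;; fits-in-bits? : any/c exact-positive-integer? -> boolean?
;; Hypothetical model of the n-bit-integer? predicates: accept any exact
;; integer from the smallest signed n-bit value up to the largest
;; unsigned n-bit value.
(define (fits-in-bits? x n)
  (and (exact-integer? x)
       (<= (- (expt 2 (sub1 n))) x (sub1 (expt 2 n)))))

(fits-in-bits? (sub1 (expt 2 64)) 64)   ; largest unsigned 64-bit value
(fits-in-bits? (- (expt 2 32)) 32)      ; below the smallest signed 32-bit value
```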

4.2.4 Registers

 (require a86/registers) package: a86

Registers are represented as symbols, but this module also provides bindings corresponding to each register name, e.g. rax is bound to 'rax.

There are 16 64-bit registers.

Names for corresponding 64-bit registers.

Examples

> rax

'rax

> rbx

'rbx

> rsp

'rsp

The registers rbx, rsp, rbp, and r12 through r15 are “callee-saved” registers, meaning they are preserved across function calls (and must be saved and restored by any callee code).

Each register plays the same role as in x86, so for example rsp holds the address of the current top of the stack.

There are 16 aliases for the lower 32 bits of the above registers. These are not separate registers, but instead provide access to the least significant 32 bits of the 64-bit register.

Names for corresponding 32-bit alias registers.

Examples

> eax

'eax

> ebx

'ebx

> esp

'esp

There are 16 aliases for the lower 16 bits of the above registers (and thus the lower 16 bits of the 64-bit registers). These are not separate registers, but instead provide access to the least significant 16 bits of the 64-bit register.

Names for corresponding 16-bit alias registers.

Examples

> ax

'ax

> bx

'bx

> sp

'sp

There are 16 aliases for the lower 8 bits of the above registers (and thus the lower 8 bits of the 64-bit registers). These are not separate registers, but instead provide access to the least significant 8 bits of the 64-bit register.

Names for corresponding 8-bit alias registers.

Examples

> al

'al

> bl

'bl

> spl

'spl

Finally, there are 4 aliases for the next higher 8 bits of the above registers (and thus the 9th through 16th least significant bits of some of the 64-bit registers). Only rax, rbx, rcx, and rdx have such aliases.

Names for the corresponding 8-bit alias registers.

Examples

> ah

'ah

> bh

'bh
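
To make the relationship between a 64-bit register and its aliases concrete, the following plain-Racket sketch extracts the bits each kind of alias would expose when read. The helper names and the sample value are hypothetical and do not use a86.

```racket
#lang racket/base

;; Hypothetical helpers modelling which bits of a 64-bit value each
;; kind of alias exposes when read.
(define (low-bits v n)            ; e.g. eax (n = 32), ax (n = 16), al (n = 8)
  (bitwise-and v (sub1 (expt 2 n))))

(define (high-byte v)             ; e.g. ah: the byte just above the lowest one
  (bitwise-and (arithmetic-shift v -8) #xFF))

(define rax-value #x1122334455667788)

(low-bits rax-value 32)   ; what eax would read: #x55667788
(low-bits rax-value 16)   ; ax: #x7788
(low-bits rax-value 8)    ; al: #x88
(high-byte rax-value)     ; ah: #x77
```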

procedure

(register? x) → boolean?

  x : any/c
A predicate for registers.

Examples

> (register? 'rax)

#t

> (register? 'al)

#t

> (register? 'bmx)

#f

procedure

(register-size x) → (or/c 8 16 32 64)

  x : register?
Returns the size of a given register.

procedure

(reg-64-bit r) → register?

  r : register?
(reg-32-bit r) → register?
  r : register?
(reg-16-bit r) → register?
  r : register?
(reg-8-bit-low r) → register?
  r : register?
(reg-8-bit-high r) → register?
  r : register?
Functions for computing an alias of a given register. These functions can be given any register (including alias registers) and compute the corresponding register name of the appropriate bit width.

Examples

> (reg-8-bit-low rax)

'al

> (reg-8-bit-high rax)

'ah

> (reg-64-bit eax)

'rax

> (reg-32-bit eax)

'eax

In the case of reg-8-bit-high, an error is signalled if the given register has no corresponding alias.

Examples

> (reg-8-bit-high ebx)

'bh

> (reg-8-bit-high r8)

no conversion available

4.2.5 Labels

Labels are represented as symbols (or $ structures) that must conform to the naming restriction imposed by the assembler, so not all symbols are valid label names.

procedure

(asm-label? x) → boolean?

  x : any/c
A predicate for label names, i.e., symbols that are not register names.

Labels must also follow the assembler’s restrictions on label names. The NASM documentation specifies: "Valid characters in labels are letters, numbers, _, $, #, @, ~, ., and ?. The only characters which may be used as the first character of an identifier are letters, . (with special meaning), _ and ?." As the examples below show, a successful check produces a true (non-#f) value rather than #t itself.

Examples

> (asm-label? 'foo)

'("foo")

> (asm-label? "foo")

#f

> (asm-label? 'rax)

'("rax")

> (asm-label? 'foo-bar)

#f

> (asm-label? 'foo.bar)

'("foo.bar")

struct

(struct $ (l))

  l : symbol?
Structure for representing labels. Useful when you need to refer to a label that has a name conflicting with a register name or other reserved keyword, or just for syntactic distinction.

Examples

> (Label ($ 'rax))

(Label ($ 'rax))

procedure

(label? x) → boolean?

  x : any/c
A predicate for labels, equivalent to:

(or (asm-symbol? x)
    ($? x))
4.2.6 Memory Expressions

Memory expressions are represented with Mem structures. A memory expression signals that a quantity should be interpreted as a location in memory, rather than as the bits themselves. For example, consider the rsp register, which holds a pointer to the stack in memory, i.e., its value is a number that can be interpreted as an address. The instruction (Mov rax rsp) will copy that address from rsp into rax, while the instruction (Mov rax (Mem rsp)) will copy the 64 bits of data stored in memory at the address held by rsp into rax. Similarly, (Mov rsp rax) will copy the value held in rax into rsp (which will overwrite our stack pointer — almost always a bad thing), while (Mov (Mem rsp) rax) will write the value held in rax into the memory pointed to by the rsp pointer without actually changing the value held in rsp.

struct

(struct Mem (base index scale offset))

  base : (or/c #f register? integer?)
  index : (or/c #f label? (and/c register? (not/c 'rsp)))
  scale : (or/c #f 1 2 4 8)
  offset : (or/c #f integer?)
Structure for representing memory expressions, as used by instructions like Add, Mov, Lea, and so on.

When base or offset is given as an integer, the allowed range of that integer depends on the addressing mode. In 64-bit mode, the integer can be no wider than 32 bits (signed).

When the scale is omitted, it is not printed out in x86. However, omitting it is effectively equivalent to specifying a scale of 1.

When two registers are given as the base and index arguments, they must be of the same width. For example, rax and r8 are compatible, but rbx and eax are not.

We don’t use the scale for anything currently, but it is supported for future extensions to the course.

Because of how complicated it can be to correctly specify the arguments to a Mem, this structure is built by a specialized smart constructor. The acceptable forms are documented below:

procedure

(Mem index [offset]) → Mem?

  index : label?
  offset : (or/c #f integer?) = #f
A relative address specification. In these cases, the printer automatically prefixes the address computation with the rip base register according to the x86 specification.

The index can be given as either a symbol (e.g., foo) or a $-wrapped label. In the former case, the symbol will be wrapped in a $ during construction to ensure proper architecture-dependent formatting when printing.

Prints in x86 as [rip + index + offset].

procedure

(Mem offset) → Mem?

  offset : integer?
An absolute address specification with an offset as the base. We don’t use this form currently, but it is supported for future extensions to the course.

Prints in x86 as [offset].

procedure

(Mem base [index #:scale scale offset]) → Mem?

  base : register?
  index : (or/c #f (and/c register? (not/c 'rsp))) = #f
  scale : (or/c #f 1 2 4 8) = #f
  offset : (or/c #f integer?) = #f
An absolute address specification with a register as the base. The index, scale, and offset arguments are optional.

Prints in x86 as [base + (index * scale) + offset].

procedure

(Mem index #:scale scale [offset]) → Mem?

  index : (and/c register? (not/c 'rsp))
  scale : (or/c 1 2 4 8)
  offset : (or/c #f integer?) = #f
An absolute address specification with a non-rsp register as the index. This form requires the scale to be given, and it accepts an optional offset argument.

Prints in x86 as [(index * scale) + offset].
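
The printed forms above all describe the same address arithmetic. As a rough illustration in plain Racket, the sketch below computes the address a memory expression denotes from the values held in its base and index registers. effective-address is a hypothetical helper, not an a86 function, and it takes register contents (integers) rather than register names.

```racket
#lang racket/base

;; effective-address : hypothetical model of [base + (index * scale) + offset].
;; Each argument is an integer or #f. Components given as #f contribute
;; nothing, and an omitted scale behaves like a scale of 1.
(define (effective-address base index scale offset)
  (+ (or base 0)
     (* (or index 0) (or scale 1))
     (or offset 0)))

(effective-address #x1000 #f #f 8)    ; [base + offset]
(effective-address #x1000 3 8 16)     ; [base + (index * scale) + offset]
(effective-address #f 3 8 #f)         ; [(index * scale)]
```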

4.2.7 Assembly Expressions

Assembly expressions are represented by s-expressions conforming to the following grammar:

 

  expr  ::= register
          | immediate
          | label
          | '$
          | '$$
          | (list unop expr)
          | (list binop expr expr)
          | (list '? expr expr expr)

  unop  ::= '+ | '- | '~ | '! | 'SEG

  binop ::= '<<< | '<< | '< | '<= | '<=> | '>= | '> | '>> | '>>>
          | '= | '== | '!= | '|| | '\| | '& | '&& | '^^ | '^
          | '+ | '- | '* | '/ | '// | '% | '%%

For the meaning of the operators in assembly expressions, refer to the NASM docs.

procedure

(exp? x) → boolean?

  x : any/c
A predicate for assembly expressions.

Examples

> (exp? 0)

#t

> (exp? '(+ rax 8))

#t

> (exp? '(? lab1 0 1))

#t

syntax

(@ e)

A convenience form for constructing assembly expressions. This form implicitly quotes e and implicitly unquotes when encountering bound identifiers or forms that are not part of the assembly expression grammar. So for example (@ (+ 1 2)) is just '(+ 1 2), but (@ (+ 1 (add1 2))) is '(+ 1 3). Note that identifiers that are not bound are assumed to refer to labels, so they are quoted, but identifiers that are bound are replaced with their value.

This form is useful for referencing bound variables or Racket functions within an assembly expression. If the Racket identifier you want to reference conflicts with an assembly expression keyword, e.g. +, you can use begin to escape into Racket expression mode, e.g. (@ (+ 1 (begin (+ 2 3)))) is '(+ 1 5).

If any unquoted expression evaluates to something that is not an assembly expression, an error is signalled.

Examples

> (@ (+ 1 2))

'(+ 1 2)

> (@ (+ x 1))

'(+ x 1)

> (let ((x 100)) (@ (+ x 1)))

'(+ 100 1)

> (let ((+ 100)) (@ (+ + +)))

'(+ 100 100)

> (@ (+ + +))

not an assembly expression #<procedure:+>
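
For comparison, the same s-expressions can be built with ordinary Racket quasiquote, where every unquote is written explicitly. The sketch below uses only plain Racket and is meant to show roughly what @ saves you from writing. It does not reproduce the checks that @ performs.

```racket
#lang racket/base

;; Plain-Racket equivalents of a few @ forms, written with quasiquote.
;; With quasiquote the unquotes are explicit; @ inserts them for you.
(define x 100)

`(+ 1 ,(add1 2))    ; like (@ (+ 1 (add1 2)))
`(+ ,x 1)           ; like (@ (+ x 1)) when x is bound to 100
'(+ y 1)            ; like (@ (+ y 1)) when y is unbound, so y names a label
```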

4.2.8 Instruction Set

This section describes the instruction set of a86.

procedure

(instruction? x) → boolean?

  x : any/c
A predicate for instructions.

procedure

(symbol->label s) → label?

  s : symbol?
Returns a modified form of a symbol that follows assembler label conventions.

Examples

> (let ([l (symbol->label 'my-great-label)])
    (seq (Label l)
         (Jmp l)))

(list

 (Label 'label_my_great_label_a1d1fe873a8070d)

 (Jmp 'label_my_great_label_a1d1fe873a8070d))

struct

(struct Call (x))

  x : (or/c label? register?)
A call instruction.

Examples

> (asm-interp
    (Global 'entry)
    (Label 'entry)
    (Call 'f)
    (Add 'rax 1)
    (Ret)
    (Label 'f)
    (Mov 'rax 41)
    (Ret))

42

struct

(struct Ret ())

A return instruction.

Examples

> (asm-interp
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 42)
    (Ret))

42

struct

(struct Mov (dst src))

  dst : (or/c register? Mem?)
  src : (or/c register? Mem? 64-bit-integer?)
A move instruction. Moves src to dst.

Either dst or src may be a memory expression, but not both.

Examples

> (asm-interp
    (Global 'entry)
    (Label 'entry)
    (Mov 'rbx 42)
    (Mov 'rax 'rbx)
    (Ret))

42

> (Mov (Mem 'rax 0) (Mem 'rbx 0))

Mov: cannot use two memory locations; given (Mem 'rax 0),

(Mem 'rbx 0)

struct

(struct Add (dst src))

  dst : register?
  src : (or/c register? Mem? 32-bit-integer?)
An addition instruction. Adds src to dst and writes the result to dst. Updates the conditional flags.

In the case of a 32-bit immediate, it is sign-extended to 64-bits.

Examples

> (asm-interp
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 32)
    (Add 'rax 10)
    (Ret))

42

struct

(struct Sub (dst src))

  dst : register?
  src : (or/c register? Mem? 32-bit-integer?)
A subtraction instruction. Subtracts src from dst and writes the result to dst. Updates the conditional flags.

In the case of a 32-bit immediate, it is sign-extended to 64-bits.

Examples

> (asm-interp
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 32)
    (Sub 'rax 10)
    (Ret))

22

struct

(struct Mul (src))

  src : (or/c register? Mem? 32-bit-integer?)
A multiplication instruction. Multiplies 'rax by src and writes the result to 'rdx (the high 64 bits) and 'rax (the low 64 bits). Updates flags.

In the case of a 32-bit immediate, it is sign-extended to 64-bits.

Examples

> (asm-interp
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 6)
    (Mov 'r9 7)
    (Mul 'r9)
    (Ret))

42
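
Assuming Mul behaves like the x86 mul instruction, the full product is 128 bits wide and is split across two registers. The plain-Racket sketch below models that split. mul64 is a hypothetical helper and not part of a86.

```racket
#lang racket/base

;; mul64 : hypothetical model of unsigned 64-bit multiplication.
;; Returns two values: the low 64 bits of the product (what rax would
;; hold) and the high 64 bits (what rdx would hold).
(define (mul64 a b)
  (define mask (sub1 (expt 2 64)))
  (define product (* (bitwise-and a mask) (bitwise-and b mask)))
  (values (bitwise-and product mask)
          (arithmetic-shift product -64)))

(mul64 6 7)              ; small product: fits entirely in the low half
(mul64 (expt 2 63) 4)    ; product is 2^65: low half 0, high half 2
```

Only the low half is visible in the result of asm-interp, since that result is taken from rax.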

struct

(struct Cmp (a1 a2))

  a1 : (or/c register? Mem?)
  a2 : (or/c register? Mem? 32-bit-integer?)
Compare a1 to a2 by subtracting a2 from a1 and updating the comparison flags. Does not store the result of subtraction.

In the case of a 32-bit immediate, it is sign-extended to 64-bits.

Examples

> (asm-interp (Mov 'rax 42)
              (Cmp 'rax 2)
              (Jg 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

42

struct

(struct Jmp (x))

  x : (or/c label? register?)
Jump to label x.

Examples

> (asm-interp (Mov 'rax 42)
              (Jmp 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

42

> (asm-interp (Mov 'rax 42)
              (Pop 'rbx)
              (Jmp 'rbx))

42

struct

(struct Jz (x))

  x : (or/c label? register?)
Jump to label x if the zero flag is set.

Examples

> (asm-interp (Mov 'rax 42)
              (Cmp 'rax 2)
              (Jz 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

0

struct

(struct Jnz (x))

  x : (or/c label? register?)
Jump to label x if the zero flag is not set.

Examples

> (asm-interp (Mov 'rax 42)
              (Cmp 'rax 2)
              (Jnz 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

42

struct

(struct Je (x))

  x : (or/c label? register?)
An alias for Jz.

struct

(struct Jne (x))

  x : (or/c label? register?)
An alias for Jnz.

struct

(struct Jl (x))

  x : (or/c label? register?)
Jump to label x if the conditional flags are set to “less than” (see Flags).

Examples

> (asm-interp (Mov 'rax 42)
              (Cmp 'rax 2)
              (Jl 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

0

struct

(struct Jle (x))

  x : (or/c label? register?)
Jump to label x if the conditional flags are set to “less than or equal” (see Flags).

Examples

> (asm-interp (Mov 'rax 42)
              (Cmp 'rax 42)
              (Jle 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

42

struct

(struct Jg (x))

  x : (or/c label? register?)
Jump to label x if the conditional flags are set to “greater than” (see Flags).

Examples

> (asm-interp (Mov 'rax 42)
              (Cmp 'rax 2)
              (Jg 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

42

struct

(struct Jge (x))

  x : (or/c label? register?)
Jump to label x if the conditional flags are set to “greater than or equal” (see Flags).

Examples

> (asm-interp (Mov 'rax 42)
              (Cmp 'rax 42)
              (Jge 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

42

struct

(struct Jo (x))

  x : (or/c label? register?)
Jump to x if the overflow flag is set.

Examples

> (asm-interp (Mov 'rax (sub1 (expt 2 63)))
              (Add 'rax 1)
              (Jo 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

-9223372036854775808

struct

(struct Jno (x))

  x : (or/c label? register?)
Jump to x if the overflow flag is not set.

Examples

> (asm-interp (Mov 'rax (sub1 (expt 2 63)))
              (Add 'rax 1)
              (Jno 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

0

struct

(struct Jc (x))

  x : (or/c label? register?)
Jump to x if the carry flag is set.

Examples

> (asm-interp (Mov 'rax -1)
              (Add 'rax 1)
              (Jc 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

0

struct

(struct Jnc (x))

  x : (or/c label? register?)
Jump to x if the carry flag is not set.

Examples

> (asm-interp (Mov 'rax -1)
              (Add 'rax 1)
              (Jnc 'l1)
              (Mov 'rax 0)
              (Label 'l1)
              (Ret))

0
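
The Jo, Jno, Jc, and Jnc examples above depend on how a 64-bit addition sets the overflow and carry flags. The following plain-Racket sketch models that, assuming a86 follows standard x86 behaviour for Add. add64 is a hypothetical helper and not part of the library.

```racket
#lang racket/base

;; add64 : hypothetical model of a 64-bit addition and two of its flags.
;; Returns three values: the 64-bit result (as an unsigned bit pattern),
;; whether the carry flag would be set, and whether the overflow flag
;; would be set.
(define (add64 a b)
  (define mask (sub1 (expt 2 64)))
  (define ua (bitwise-and a mask))      ; operands as raw 64-bit patterns
  (define ub (bitwise-and b mask))
  (define sum (+ ua ub))
  (define result (bitwise-and sum mask))
  (define (sign-bit u) (bitwise-bit-set? u 63))
  (values result
          ;; carry: the unsigned sum does not fit in 64 bits
          (> sum mask)
          ;; overflow: operands share a sign that the result does not have
          (and (eq? (sign-bit ua) (sign-bit ub))
               (not (eq? (sign-bit result) (sign-bit ua))))))

(add64 (sub1 (expt 2 63)) 1)   ; largest signed value + 1: overflow, no carry
(add64 -1 1)                   ; all ones + 1: carry, no overflow
```

The two calls correspond to the operands used in the Jo/Jno and Jc/Jnc examples, respectively.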

struct

(struct Cmovz (dst src))

  dst : register?
  src : (or/c register? Mem?)
Read from src, move to dst if the zero flag is set.

Examples

> (asm-interp (Mov 'rax 0)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovz 'rax 'r9)
              (Ret))

1

> (asm-interp (Mov 'rax 2)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovz 'rax 'r9)
              (Ret))

2

Note that the semantics for conditional moves is not what many people expect. The src is always read, regardless of the condition’s evaluation. This means that if your source is illegal (such as an offset beyond the bounds of memory allocated to the current process), a segmentation fault will arise even if the condition “should have” prevented the error.

Examples

> (asm-interp (Mov 'r9 0)
              (Cmp 'r9 1)
              (Mov 'rax 0)
              ; doesn't move, but does read memory address 0
              (Cmovz 'rax (Mem 'r9))
              (Ret))

invalid memory reference.  Some debugging context lost

struct

(struct Cmove (dst src))

  dst : register?
  src : (or/c register? Mem?)
An alias for Cmovz. See notes on Cmovz.

struct

(struct Cmovnz (dst src))

  dst : register?
  src : (or/c register? Mem?)
Move from src to dst if the zero flag is not set. See notes on Cmovz.

Examples

> (asm-interp (Mov 'rax 0)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovnz 'rax 'r9)
              (Ret))

0

> (asm-interp (Mov 'rax 2)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovnz 'rax 'r9)
              (Ret))

1

struct

(struct Cmovne (dst src))

  dst : register?
  src : (or/c register? Mem?)
An alias for Cmovnz. See notes on Cmovz.

struct

(struct Cmovl (dst src))

  dst : register?
  src : (or/c register? Mem?)
Move from src to dst if the conditional flags are set to “less than” (see Flags). See also the notes on Cmovz.

Examples

> (asm-interp (Mov 'rax 0)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovl 'rax 'r9)
              (Ret))

0

> (asm-interp (Mov 'rax -1)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovl 'rax 'r9)
              (Ret))

1

struct

(struct Cmovle (dst src))

  dst : register?
  src : (or/c register? Mem?)
Move from src to dst if the conditional flags are set to “less than or equal” (see Flags). See also the notes on Cmovz.

Examples

> (asm-interp (Mov 'rax 0)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovle 'rax 'r9)
              (Ret))

1

> (asm-interp (Mov 'rax 2)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovle 'rax 'r9)
              (Ret))

2

struct

(struct Cmovg (dst src))

  dst : register?
  src : (or/c register? Mem?)
Move from src to dst if the conditional flags are set to “greater than” (see Flags). See also the notes on Cmovz.

Examples

> (asm-interp (Mov 'rax 0)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovg 'rax 'r9)
              (Ret))

0

> (asm-interp (Mov 'rax 2)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovg 'rax 'r9)
              (Ret))

1

struct

(struct Cmovge (dst src))

  dst : register?
  src : (or/c register? Mem?)
Move from src to dst if the conditional flags are set to “greater than or equal” (see Flags). See also the notes on Cmovz.

Examples

> (asm-interp (Mov 'rax -1)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovge 'rax 'r9)
              (Ret))

-1

> (asm-interp (Mov 'rax 2)
              (Cmp 'rax 0)
              (Mov 'r9 1)
              (Cmovge 'rax 'r9)
              (Ret))

1

struct

(struct Cmovo (dst src))

  dst : register?
  src : (or/c register? Mem?)
Move from src to dst if the overflow flag is set. See notes on Cmovz.

Examples

> (asm-interp (Mov 'rax (- (expt 2 63) 1))
              (Add 'rax 1)
              (Mov 'r9 1)
              (Cmovo 'rax 'r9)
              (Ret))

1

> (asm-interp (Mov 'rax (- (expt 2 63) 2))
              (Add 'rax 1)
              (Mov 'r9 1)
              (Cmovo 'rax 'r9)
              (Ret))

9223372036854775807

struct

(struct Cmovno (dst src))

  dst : register?
  src : (or/c register? Mem?)
Move from src to dst if the overflow flag is not set. See notes on Cmovz.

Examples

> (asm-interp (Mov 'rax (- (expt 2 63) 1))
              (Add 'rax 1)
              (Mov 'r9 1)
              (Cmovno 'rax 'r9)
              (Ret))

-9223372036854775808

> (asm-interp (Mov 'rax (- (expt 2 63) 2))
              (Add 'rax 1)
              (Mov 'r9 1)
              (Cmovno 'rax 'r9)
              (Ret))

1

struct

(struct Cmovc (dst src))

  dst : register?
  src : (or/c register? Mem?)
Move from src to dst if the carry flag is set. See notes on Cmovz.

Examples

> (asm-interp (Mov 'rax (- (expt 2 64) 1))
              (Add 'rax 1)
              (Mov 'r9 1)
              (Cmovc 'rax 'r9)
              (Ret))

1

> (asm-interp (Mov 'rax (- (expt 2 64) 2))
              (Add 'rax 1)
              (Mov 'r9 1)
              (Cmovc 'rax 'r9)
              (Ret))

-1

struct

(struct Cmovnc (dst src))

  dst : register?
  src : (or/c register? Mem?)
Move from src to dst if the carry flag is not set. See notes on Cmovz.

Examples

> (asm-interp (Mov 'rax (- (expt 2 64) 1))
              (Add 'rax 1)
              (Mov 'r9 1)
              (Cmovnc 'rax 'r9)
              (Ret))

0

> (asm-interp (Mov 'rax (- (expt 2 64) 2))
              (Add 'rax 1)
              (Mov 'r9 1)
              (Cmovnc 'rax 'r9)
              (Ret))

1

struct

(struct And (dst src))

  dst : (or/c register? Mem?)
  src : (or/c register? Mem? 32-bit-integer?)
Compute logical “and” of dst and src and put result in dst. Updates the conditional flags.

In the case of a 32-bit immediate, it is sign-extended to 64-bits.

Examples

> (asm-interp (Mov 'rax 11) ; #b1011 = 11
              (And 'rax 14) ; #b1110 = 14
              (Ret))

10

; #b1010 = 10

struct

(struct Or (dst src))

  dst : (or/c register? Mem?)
  src : (or/c register? Mem? 32-bit-integer?)
Compute logical “or” of dst and src and put result in dst. Updates the conditional flags.

In the case of a 32-bit immediate, it is sign-extended to 64 bits.

Examples

> (asm-interp (Mov 'rax 11) ; #b1011 = 11
              (Or 'rax 14)  ; #b1110 = 14
              (Ret))

15

; #b1111 = 15

struct

(struct Xor (dst src))

  dst : (or/c register? Mem?)
  src : (or/c register? Mem? 32-bit-integer?)
Compute logical “exclusive or” of dst and src and put result in dst. Updates the conditional flags.

In the case of a 32-bit immediate, it is sign-extended to 64 bits.

Examples

> (asm-interp (Mov 'rax 11) ; #b1011 = 11
              (Xor 'rax 14) ; #b1110 = 14
              (Ret))

5

; #b0101 = 5

struct

(struct Sal (dst i))

  dst : register?
  i : (integer-in 0 63)
Shift dst to the left i bits and put result in dst. The most-significant (leftmost) bits are discarded. Updates the conditional flags.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 4) ; #b100 = 4 = 2^2
    (Sal 'rax 6)
    (Ret)))

256

; #b100000000 = 256

struct

(struct Sar (dst i))

  dst : register?
  i : (integer-in 0 63)
Shift dst to the right i bits and put result in dst. For each shift count, the least-significant (rightmost) bit is shifted into the carry flag. The new most-significant (leftmost) bits are filled with the sign bit of the original dst value. Updates the conditional flags.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 256) ; #b100000000 = 256
    (Sar 'rax 6)
    (Ret)))

4

; #b100 = 4
> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 269) ; #b100001101 = 269
    (Sar 'rax 6)
    (Ret)))

4

; #b100 = 4
> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 9223372036854775808) ; 1 in MSB
    (Sar 'rax 6)
    (Ret)))

-144115188075855872

; #b1111111000000000000000000000000000000000000000000000000000000000

struct

(struct Shl (dst i))

  dst : register?
  i : (integer-in 0 63)
Alias for Sal.

struct

(struct Shr (dst i))

  dst : register?
  i : (integer-in 0 63)
Shift dst to the right i bits and put result in dst. For each shift count, the least-significant (rightmost) bit is shifted into the carry flag, and the most-significant bit is cleared. Updates the conditional flags.

Examples

> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 256) ; #b100000000 = 256
    (Shr 'rax 6)
    (Ret)))

4

; #b100 = 4
> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 269) ; #b100001101 = 269
    (Shr 'rax 6)
    (Ret)))

4

; #b100 = 4
> (asm-interp
   (prog
    (Global 'entry)
    (Label 'entry)
    (Mov 'rax 9223372036854775808) ; 1 in MSB
    (Shr 'rax 6)
    (Ret)))

144115188075855872

; #b0000001000000000000000000000000000000000000000000000000000000000
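
The difference between Sar and Shr can be modeled in plain Racket with arithmetic-shift. This is a sketch of the semantics described above, not the way a86 executes these instructions, and it ignores the flags; model-sar, model-shr, as-u64, and as-s64 are names invented for the illustration.

```racket
#lang racket/base

(define two^64 (expt 2 64))

;; Unsigned 64-bit pattern of n.
(define (as-u64 n) (modulo n two^64))

;; Signed reading of a 64-bit pattern.
(define (as-s64 n)
  (define u (as-u64 n))
  (if (>= u (expt 2 63)) (- u two^64) u))

;; Model of Sar: shift the signed reading, so the vacated high bits are
;; copies of the sign bit.
(define (model-sar n i) (arithmetic-shift (as-s64 n) (- i)))

;; Model of Shr: shift the unsigned reading, so the vacated high bits are 0.
(define (model-shr n i) (as-s64 (arithmetic-shift (as-u64 n) (- i))))

(model-sar (expt 2 63) 6)  ; -144115188075855872
(model-shr (expt 2 63) 6)  ; 144115188075855872
(model-sar 269 6)          ; 4
(model-shr 269 6)          ; 4
```

The two shifts agree whenever the most-significant bit of the input is 0, which is why the 256 and 269 examples give the same answer for both instructions.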

struct

(struct Push (a1))

  a1 : (or/c 32-bit-integer? register?)
Decrements the stack pointer and then stores the source operand on the top of the stack.

In the case of a 32-bit immediate, it is sign-extended to 64 bits.

Examples

> (asm-interp (Mov 'rax 42)
              (Push 'rax)
              (Mov 'rax 0)
              (Pop 'rax)
              (Ret))

42

struct

(struct Pop (a1))

  a1 : register?
Loads the value from the top of the stack to the destination operand and then increments the stack pointer.

Examples

> (asm-interp (Mov 'rax 42)
              (Push 'rax)
              (Mov 'rax 0)
              (Pop 'rax)
              (Ret))

42

struct

(struct Not (a1))

  a1 : register?
Perform bitwise not operation (each 1 is set to 0, and each 0 is set to 1) on the destination operand.

Examples

> (asm-interp (Mov 'rax 0)
              (Not 'rax)
              (Ret))

-1

struct

(struct Lea (dst x))

  dst : (or/c register? Mem?)
  x : label?
Loads the address of the given label into dst.

Examples

> (asm-interp (Lea 'rbx 'done)
              (Mov 'rax 42)
              (Jmp 'rbx)
              (Mov 'rax 0)
              (Label 'done)
              (Ret))

42

struct

(struct Db (d))

  d : integer?
Pseudo-instruction for declaring 8 bits of initialized static memory.

struct

(struct Dw (d))

  d : integer?
Pseudo-instruction for declaring 16 bits of initialized static memory.

struct

(struct Dd (d))

  d : integer?
Pseudo-instruction for declaring 32 bits of initialized static memory.

struct

(struct Dq (d))

  d : integer?
Pseudo-instruction for declaring 64 bits of initialized static memory.

4.3 Execution Model🔗

The execution model of a86 programs is the same as that of x86, but this section gives a brief overview of the most important aspects.

Execution proceeds instruction by instruction. Instructions may read or modify the state of Registers, Flags, and Memory and, except for jumping instructions, execution proceeds to the next instruction in memory.

4.3.1 Flags🔗

The processor makes use of flags to handle comparisons. For our purposes, there are four flags to be aware of: zero (ZF), sign (SF), carry (CF), and overflow (OF).

These flags are set by each of the arithmetic operations, which are appropriately annotated in the Instruction Set. Each of these operations is binary (meaning they take two arguments), and the flags are set according to properties of the result of the arithmetic operation. Many of these properties look at the most-significant bit (MSB) of the inputs and output.

  • ZF is set when the result is 0.

  • SF is set when the MSB of the result is set.

  • CF is set when a bit was set beyond the MSB.

  • OF is set when one of two conditions is met:

    1. The MSB of each input is set and the MSB of the result is not set.

    2. The MSB of each input is not set and the MSB of the result is set.

Note that CF is only useful for unsigned arithmetic, while OF is only useful for signed arithmetic. In opposite cases, they provide no interesting information.
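
The flag rules above can be checked with a small model of 64-bit addition written in plain Racket. This is a sketch for the Add case only (subtraction needs different carry and overflow rules), it is not a86's implementation, and add-flags and msb? are names made up for the illustration.

```racket
#lang racket/base

(define two^64 (expt 2 64))

;; Is the most-significant bit (bit 63) of a 64-bit pattern set?
(define (msb? n) (bitwise-bit-set? n 63))

;; a and b are unsigned 64-bit patterns; returns an association list
;; describing the flags after adding them.
(define (add-flags a b)
  (define full (+ a b))                 ; exact sum, may need a 65th bit
  (define result (modulo full two^64))  ; the 64 bits that land in the register
  (list (cons 'ZF (zero? result))
        (cons 'SF (msb? result))
        (cons 'CF (>= full two^64))     ; a bit was carried beyond the MSB
        (cons 'OF (and (eq? (msb? a) (msb? b))                 ; inputs agree
                       (not (eq? (msb? a) (msb? result)))))))  ; result differs

;; Largest signed value plus 1: signed overflow, no unsigned carry.
(add-flags (- (expt 2 63) 1) 1)  ; '((ZF . #f) (SF . #t) (CF . #f) (OF . #t))

;; Largest unsigned value plus 1: unsigned carry, no signed overflow.
(add-flags (- (expt 2 64) 1) 1)  ; '((ZF . #t) (SF . #f) (CF . #t) (OF . #f))
```

The two calls correspond to the inputs used in the Cmovno and Cmovc examples of the Instruction Set: the first sets OF but not CF, the second sets CF but not OF.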

These flags, along with many others, are stored in a special FLAGS register that cannot be accessed by normal means. Each flag is represented by a single bit in the register, and they all have specific bits assigned by the x86 specification. For example, CF is bit 0, ZF is bit 6, SF is bit 7, and OF is bit 11, as indexed from the least-significant bit position (but you don’t need to know these numbers).

The various conditions that can be tested for correspond to combinations of the flags. For example, the Jc instruction will jump if CF is set; otherwise execution will fall through to the next instruction. Most of the condition suffixes are straightforward to deduce from their spelling, but some are not. The suffixes (e.g., the c in Jc) and their meanings are given below. For brevity’s sake the flags’ names are abbreviated by omitting the F suffix and prefixing them with either + or - to indicate set and unset states, respectively, as needed. Some of the meanings require use of the bitwise operators | (OR), & (AND), ^ (XOR), and =? (equality).

Suffix  Flag              Suffix  Flag

z       +Z                nz      -Z
e       +Z                ne      -Z
s       +S                ns      -S
c       +C                nc      -C
o       +O                no      -O
l       (S ^ O)           g       (-Z & (S =? O))
le      (+Z | (S ^ O))    ge      (S =? O)

The e suffix (“equal?”) is just a synonym for the z suffix (“zero?”). This is because it is common to use the Cmp instruction to perform comparisons, but Cmp is actually identical to Sub with the exception that the result is not stored anywhere (i.e., it is only used for setting flags according to subtraction). If two values are subtracted and the resulting difference is zero (ZF is set), then the values are equal.
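
The suffix meanings can be written out as a predicate over the four flags. The following plain-Racket sketch is only an illustration of the table's meanings; condition-holds? is a hypothetical helper, not something a86 provides, and each flag is passed as a boolean.

```racket
#lang racket/base

;; Hypothetical helper: does the condition named by suffix hold, given the
;; state of ZF, SF, CF, and OF as booleans?
(define (condition-holds? suffix zf sf cf of)
  (case suffix
    [(z e)   zf]
    [(nz ne) (not zf)]
    [(s)     sf]
    [(ns)    (not sf)]
    [(c)     cf]
    [(nc)    (not cf)]
    [(o)     of]
    [(no)    (not of)]
    [(l)     (not (eq? sf of))]           ; S ^ O
    [(ge)    (eq? sf of)]                 ; S =? O
    [(le)    (or zf (not (eq? sf of)))]   ; +Z | (S ^ O)
    [(g)     (and (not zf) (eq? sf of))]  ; -Z & (S =? O)
    [else    (error 'condition-holds? "unknown suffix: ~a" suffix)]))

;; Comparing 1 with 2 computes 1 - 2 = -1: ZF clear, SF set, CF set, OF clear.
(condition-holds? 'l  #f #t #t #f)  ; #t
(condition-holds? 'ge #f #t #t #f)  ; #f
(condition-holds? 'ne #f #t #t #f)  ; #t
```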

4.3.2 Stack🔗

The a86 execution model includes access to memory that can be used as a stack data structure. There are operations that manipulate the stack, such as Push, Pop, Call, and Ret, and the stack pointer register 'rsp is dedicated to the stack. Stack memory is allocated in “low” address space and grows downward, so pushing an element on to the stack decrements 'rsp.

The stack is useful as a way to save away values that may be needed later. For example, let’s say you have two (assembly-level) functions and you want to produce the sum of their results. By convention, functions return their result in 'rax, so doing something like this won’t work:

(seq (Call 'f)
     (Call 'g)
     (Add 'rax ...))

The problem is the return value of 'f gets clobbered by 'g. You might be tempted to fix the problem by moving the result to another register:

(seq (Call 'f)
     (Mov 'rbx 'rax)
     (Call 'g)
     (Add 'rax 'rbx))

This works only so long as 'g doesn’t clobber 'rbx. In general, it might not be possible to avoid that situation. So the solution is to use the stack to save the return value of 'f while the call to 'g proceeds:

(seq (Call 'f)
     (Push 'rax)
     (Call 'g)
     (Pop 'rbx)
     (Add 'rax 'rbx))

This code pushes the value in 'rax on to the stack and then pops it off and into 'rbx after 'g returns. Everything works out so long as 'g maintains a stack-discipline, i.e. the stack should be in the same state when 'g returns as when it was called.

We can make a complete example to confirm that this works as expected. First let’s set up a little function for letting us try out examples:

Examples

> (define (eg asm)
    (asm-interp
     (prog
      (Global 'entry)
      (Label 'entry)
      asm  ; the example code we want to try out
      (Ret)
  
      (Label 'f)      ; calling 'f returns 36
      (Mov 'rax 36)
      (Ret)
  
      (Label 'g)      ; calling 'g returns 6, but
      (Mov 'rbx 4)    ; it clobbers 'rbx just for the lulz
      (Mov 'rax 6)
      (Ret))))

Now let’s try it, using the stack to confirm it does the right thing:

Examples

> (eg (seq (Call 'f)
           (Push 'rax)
           (Call 'g)
           (Pop 'rbx)
           (Add 'rax 'rbx)))

42

Compare that with the earlier version that used a register to save the result of 'f:

Examples

> (eg (seq (Call 'f)
           (Mov 'rbx 'rax)
           (Call 'g)
           (Add 'rax 'rbx)))

10

The Push and Pop instructions offer a useful illusion, but of course, there’s not really any data structure abstraction here; there’s just raw memory and registers. But so long as code abides by conventions, the illusion turns out to be the true state of affairs.

What’s really going on under the hood is that Push decrements the 'rsp register and writes the value to the memory location pointed to by 'rsp, while Pop reads the value at that location and then increments 'rsp.

The following code is mostly equivalent to what we wrote above (and we will discuss the difference at the end of this section):

Examples

> (eg (seq (Call 'f)
           (Sub 'rsp 8)                ; "allocate" a word on the stack
           (Mov (Mem 'rsp 0) 'rax)     ; write 'rax to top frame
           (Call 'g)
           (Mov 'rbx (Mem 'rsp 0))     ; load top frame into 'rbx
           (Add 'rsp 8)                ; "deallocate" word on the stack
           (Add 'rax 'rbx)))

42

As you can see from this code, it would be easy to violate the usual invariants of a stack data structure to, for example, access elements beyond the top of the stack. The value of Push and Pop is they make clear that you are using things in a stack-like way and they keep you from screwing up the accesses, offsets, and adjustments to 'rsp.
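
The bookkeeping that Push and Pop hide can also be seen in a toy model written in plain Racket. This is an illustration only, not how a86 or the CPU is implemented: memory is a small vector of 8-byte words, and memory, rsp, push!, and pop! are names made up for the sketch.

```racket
#lang racket/base

;; Toy memory: 16 words of 8 bytes each, indexed by (address / 8).
(define memory (make-vector 16 0))

;; The stack pointer starts just past the end of memory: the stack is empty.
(define rsp (* 8 16))

(define (push! v)
  (set! rsp (- rsp 8))                      ; like (Sub 'rsp 8), minus the flag effects
  (vector-set! memory (quotient rsp 8) v))  ; like (Mov (Mem 'rsp 0) v)

(define (pop!)
  (define v (vector-ref memory (quotient rsp 8)))  ; like (Mov dst (Mem 'rsp 0))
  (set! rsp (+ rsp 8))                             ; like (Add 'rsp 8)
  v)

(push! 36)
(push! 6)
(list (pop!) (pop!) rsp)  ; '(6 36 128)
```

Values come back in the reverse of the order they were pushed, and the stack pointer ends where it started, which is exactly the stack discipline that callers rely on.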

Just as Push and Pop are useful illusions, so too are Call and Ret. They give the impression that there is a notion of a procedure and procedure call mechanism in assembly, but actually there’s no such thing.

Think for a moment about what it means to “call” 'f in the examples above. When executing (Call 'f), control jumps to the instruction following (Label 'f). When we then get to (Ret), somehow the CPU knows to jump back to the instruction following the (Call 'f) that we started with.

What’s really going on is that (Call 'f) is pushing the address of the subsequent instruction on to the stack and then jumping to the label 'f. This works in concert with Ret, which pops the return address off the stack and jumps to it.

Just as we could write equivalent code without Push and Pop, we can write the same code without Call and Ret.

We do need one new trick, which is the Lea instruction, which loads an effective address. You can think of it like Mov except that it loads the address of something rather than what is pointed to by an address. For our purposes, it is useful for loading the address of a label:

(Lea 'rax 'f)

This instruction puts the address of label 'f into 'rax. You can think of this as loading a function pointer into 'rax. With this new instruction, we can illuminate what is really going on with Call and Ret:

Examples

> (eg (seq (Lea 'rax 'fret)  ; load address of 'fret label into 'rax
           (Push 'rax)       ; push the return pointer on to stack
           (Jmp 'f)          ; jump to 'f
           (Label 'fret)     ; < return point for "call" to 'f
           (Push 'rax)       ; save result (like before)
           (Lea 'rax 'gret)  ; load address of 'gret label into 'rax
           (Push 'rax)       ; push the return pointer on to stack
           (Jmp 'g)          ; jump to 'g
           (Label 'gret)     ; < return point for "call" to 'g
           (Pop 'rbx)        ; pop saved result from calling 'f
           (Add 'rax 'rbx)))

42

The above shows how to encode Call as Lea, Push, and Jmp. The encoding of Ret is just:

(seq (Pop 'rbx)
     (Jmp 'rbx))

The Push and Pop operations are essentially equivalent to manually adjusting the stack pointer and moving to or from the target register. The one difference is that these special stack-manipulation operations do not set any flags, whereas Add and Sub do. So while you can often choose to manually implement stack manipulation, you’ll need to use these instructions specifically if you want to preserve the condition flags after adjusting the stack.

4.3.3 Memory🔗

The stack is really just a pointer to some location in memory, but it is possible to allocate, read, and modify memory elsewhere too. The stack memory is allocated by the operating system and the location of this memory is initially placed in the rsp register.

It is possible to statically allocate memory within the program itself using the Data section and pseudo-instructions such as Dq, etc.

For example, this program statically allocates a quad-word (64 bits), initialized to 0. The program then modifies the memory by writing 42 and returns the value obtained by dereferencing that memory, i.e. 42:

Examples

> (asm-interp (Mov r8 42)
              (Mov (Mem 'm) r8)
              (Mov rax (Mem 'm))
              (Ret)
              (Data)
              (Label 'm)
              (Dq 0))

42

It is also possible to dynamically allocate memory. This can be done by a wrapper, written e.g. in C, that allocates memory and passes in a pointer to that memory as an argument to the assembly code. It’s also possible to call standard C library functions like malloc to allocate memory within an assembly program.

This program is analogous to the one above, but instead of statically allocating a quad-word of memory, it makes a call to malloc with an argument of 8 in order to allocate 8 bytes of memory. A pointer to the newly allocated memory is returned in rax, which is then written to with the value 42, before being dereferenced and returned:

Examples

> (asm-interp (Mov rdi 8)
              (Extern 'malloc)
              (Call 'malloc)
              (Mov r8 42)
              (Mov (Mem rax) r8)
              (Mov rax (Mem rax))
              (Ret))

42

4.4 Printing🔗

 (require a86/printer) package: a86

procedure

(asm-display is)  void?

  is : (listof instruction?)
Prints an a86 program to the current output port in Intel syntax.

Examples

> (asm-display (prog (Global 'entry)
                     (Label 'entry)
                     (Mov 'rax 42)
                     (Ret)))

        .intel_syntax noprefix
        .text
        .global "entry"
"entry":
        mov rax, 42
        ret

procedure

(asm-string is)  string?

  is : (listof instruction?)
Converts an a86 program to a string in Intel syntax.

Examples

> (asm-string (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 42)
                    (Ret)))

"        .intel_syntax noprefix\n        .text\n        .global \"entry\"\n\"entry\":\n        mov rax, 42\n        ret\n"

4.5 Interpreting🔗

 (require a86/interp) package: a86

4.5.1 Running assembly programs🔗

It is possible to run a86 Programs from within Racket using asm-interp.

If you have code written in a86 that you would like to execute directly, you should instead use the printing facilities to save the program to a file and then use an external assembler (e.g. clang) and linker to produce either object files or executables. It’s possible to use The Racket Foreign Interface to interact with those files from within Racket.

The simplest form of interpreting an a86 program is to use asm-interp.

procedure

(asm-interp is ...)  integer?

  is : (or/c instruction? (listof instruction?))
Assemble, link, and execute an a86 program.

Examples

> (asm-interp (prog (Global 'entry)
                    (Label 'entry)
                    (Mov 'rax 42)
                    (Ret)))

42

Programs do not have to start with a label named 'entry. The interpreter will jump to whatever the first label in the program is (which must be declared Global):

Examples

> (asm-interp (prog (Global 'f)
                    (Label 'f)
                    (Mov 'rax 42)
                    (Ret)))

42

As a convenience, asm-interp accepts any number of arguments that are either instructions or lists of instructions and it will splice them together to form a program, like seq:

Examples

> (asm-interp (Global 'f)
              (Label 'f)
              (Mov 'rax 42)
              (Ret))

42

As another convenience, if the first defined label of the instructions given to asm-interp is not declared Global or there is no first defined label, asm-interp will generate a globally defined label at the beginning of the instructions and start executing there:

Examples

> (asm-interp (Mov 'rax 42)
              (Ret))

42

With the exception of these conveniences, the argument of asm-interp should form a complete, well-formed a86 program in the sense of prog.

While this library tries to make assembly syntax errors impossible, it is possible—quite easy, in fact—to write well-formed, but erroneous assembly programs. For example, this program tries to jump to null, which causes a segmentation fault:

Examples

> (asm-interp (Mov rax 0)
              (Jmp rax))

invalid memory reference.  Some debugging context lost

4.5.2 Resolving external labels🔗

It is often the case that we want our assembly programs to interact with the outside world or to use functionality implemented in other programming languages. For that reason, it is possible to link object files in when running an a86 program.

The mechanism for controlling which objects should be linked in is a parameter called current-objects, which contains a list of paths to object files which are linked to the assembly code when it is interpreted.

parameter

(current-objects)  (listof path-string?)

(current-objects objs)  void?
  objs : (listof path-string?)
 = '()
Parameter that controls object files that will be linked in to assembly code when running asm-interp.

For example, suppose there’s a GCD function in C:

gcd.c

int gcd(int n1, int n2) {
    return (n2 == 0) ? n1 : gcd(n2, n1 % n2);
}

First, compile the program to an object file:

shell

> gcc -fPIC -c gcd.c -o gcd.o

The option -fPIC is important; it causes the C compiler to emit “position independent code,” which is what enables Racket to dynamically load and run the code.

Once the object file exists, using the current-objects parameter, we can run code that uses things defined in the C code:

Examples

> (parameterize ((current-objects (list "gcd.o")))
    (asm-interp (Extern 'gcd)
                (Mov 'rdi 11571)
                (Mov 'rsi 1767)
                (Sub 'rsp 8)
                (Call 'gcd)
                (Add 'rsp 8)
                (Ret)))

57

Note that if you forget to set current-objects, you will get a linking error saying a symbol is undefined:

Examples

> (asm-interp (Extern 'gcd)
              (Mov 'rdi 11571)
              (Mov 'rsi 1767)
              (Sub 'rsp 8)
              (Call 'gcd)
              (Add 'rsp 8)
              (Ret))

jit-call: Symbols not found: [ gcd ]

lookup of label 'label_init6775_38bf' failed: Failed to

materialize symbols: { (a86_prog_72, { label_init6775_38bf

}) }

Sometimes that other programming language we want our assembly programs to interact with is Racket. In this case, we can actually resolve external symbols in the assembly code to Racket values.

For example, suppose there’s a GCD function in Racket:

Examples

> (define (gcd n1 n2)
    (if (zero? n2)
        n1
        (gcd n2 (modulo n1 n2))))

We can define a host-gcd function, which essentially attaches a C-style type declaration (using bindings from the FFI) to this function and the external symbol 'gcd:

struct

(struct extern (name value type))

  name : symbol?
  value : any/c
  type : ctype?
Structure for representing a Racket-hosted external.

Examples

> (require ffi/unsafe)
> (define host-gcd
    (extern 'gcd gcd (_fun _int64 _int64 -> _int64)))

Then we can inform the interpreter to resolve the 'gcd external label to host-gcd by using the current-externs parameter.

parameter

(current-externs)  (listof extern?)

(current-externs externs)  void?
  externs : (listof extern?)
 = '()
Parameter that controls Racket-hosted externs that will be linked in to assembly code when running asm-interp.

Examples

> (parameterize ([current-externs (list host-gcd)])
    (asm-interp (Extern 'gcd)
                (Mov 'rdi 11571)
                (Mov 'rsi 1767)
                (Sub 'rsp 8)
                (Call 'gcd)
                (Add 'rsp 8)
                (Ret)))

57

procedure

(asm-interp/io is in)  (cons integer? string?)

  is : (listof instruction?)
  in : string?
Like asm-interp, but uses in for input and produces the result along with any output as a string.