3 a86: a Little Assembly Language
You need to let the little things that would ordinarily bore you suddenly thrill you.
3.1 Overview
x86 is an instruction set architecture (ISA), which is a fancy way of saying a programming language whose interpreter is implemented in hardware. Really, x86 is a family of instruction set architectures, and the first member of the family was 8086, a 16-bit ISA for Intel’s 8086 chipset.
x86 is old. It’s older than the professors teaching this class. But it lives on today in Intel and AMD based computers in the form x86-64, the 64-bit descendant of the 8086 language.
Because it’s old and because the design of each generation of x86 has had significant backwards compatability requirements and because modern processors are sophisticated pieces of machinery, the x86-64 language is, well, complicated. For example, Intel’s x86 software developer’s manual is 5,066 pages long. AMD’s manual is 3,242 pages.
x86-64 is going to be used as the target language of our compiler. We’ll build up a translation from a very high-level language, based on Racket, down to the very low-level langauge of x86-64.
However, that doesn’t mean we need to digest 8K+ pages of dense technical manuals. We will only use very small fraction of the x86-64 instruction set.
To make our lives easier, we will do what programming language designers often do, we will abstract the behemoth of x86-64 to a small, core language (which we call a86). Our compilers will target a86 and compiling from a86 to x86 as the last step in the compiler pipeline will be dead simple.
This chapter describes the a86 language at a high-level. See a86 Reference for a complete reference manual.
3.2 Giving x86 a try
Before describing a86, let’s take a brief look at x86.
There are a few different human-readable formats for writing x86 assembly code, but we’ll be using the one supported by the Netwide Assembler (NASM).
Here’s an example x86 program, written using nasm syntax. The program has one global label called entry, which will be the main entry point for the program. This program computes the 36th triangular number, which will reside in register rax when the code returns.
The conventions for label names differ between Mac and Linux systems. On MacOS, you need to prefix all label names with an underscore, while on Linux you do not. So on a Mac, you would use the names _entry, _tri, and _done, while on Linux you would use entry, tri, and done.
Alternatively, you can impose the underscore naming convention by passing –gprefix _ to nasm; this way you avoid having to write the underscores within the file.
default relsection .text global $entry $entry: mov rbx, 36 ; the "input" ;;; tri: a recursive function for computing nth ;;; triangular number, where n is given in rbx. $tri: cmp rbx, 0 ; if rbx = 0, done je $done push rbx ; save rbx sub rbx, 1 call $tri ; compute tri(rbx-1) in rax pop rbx ; restore rbx add rax, rbx ; result is rbx+tri(rbx-1) ret $done: ; jump here for base case mov rax, 0 ; return 0 ret
The nth triangular number is the sum of the integers from 0 to n, so the 36th triangular number is 0 + 1 + 2 + 3 + ... + 34 + 35 + 36.
This code is not intended to be a model of efficient computation, rather it demonstrates some important x86 instructions and shows how to compute in a recursive style, even at a low-level.
Without getting too bogged down in the details, here how the code works. Instructions execute one after another. There are a number of registers which can be used to hold values. This code makes use of the rax and rbx register (and some other registers are implicitly used and altered by the call, push, pop and ret instructions). The lines like entry:, tri:, and done: are not instructions, but labels – they are names for locations in the source code and can be used as the targets of jumps and calls.
Suppose we start executing at entry.
mov rbx, 36 sets the rbx register to 36.
cmp rbx 0 compares the value in register rbx to zero. Executing this instruction sets flags in the CPU, which affect subsequent “conditional” instructions. In this program, the next instruction is a conditional jump.
je done either jumps to the instruction following label done or proceeds to the next instruction, based on the state of the comparison flags. The je instruction jumps if the comparison was equal, so control jumps to done if rbx was 0 in this program. If not, the next instruction is executed.
push rbx uses memory as a stack to save the value of rbx. Under the hood this is modifying a register that holds a pointer to the stack memory location (register rsp).
sub rbx, 1 decrements rbx by 1.
call tri performs something like a function call; it uses memory as a stack to save the current location in the code (which is where control should return to after the function has completed). After saving this return pointer, it jumps to the label tri. There aren’t really functions, but this uses the stack to mimic the call-and-return mechanism of functions.
pop rbx uses the stack memory to pop off the top element and move it into rbx, adjusting the stack pointer appropriately. This has the effect of restoring rbx to the value saved earlier by the push, i.e. before the decrement and any changes done in the call to tri.
add rax, rbx updates rax to hold rax plus rbx.
ret does a “return,” i.e. it pops an address from the stack and jumps to it. In this case, the jump either returns from to a previous call to tri or to original caller of entry.
mov rax, 0 this instruction is only reached from the earlier conditional jump. It sets rax to 0. This program computes its result in rax so this is saying that when rbx (the “input”) is 0, then (the “output”) is 0.
ret does a “return,” either to a prior call to tri or the caller of entry.
Despite the lower-level mechanisms, this code computes in a way similar to a non-tail recursive definition of the tri function written in Racket:
(define (tri n) (if (= n 0) 0 (+ n (tri (sub1 n))))) (tri 36)
As an exercise to check your understanding, try writing a tail-recursive version of tri and the corresponding x86 code, which should not need to push anything to the stack or use the call instruction.
We can compile the tri.s assembly program to an object file with nasm:
The format argument should be macho64 on macOS and elf64 on Linux. The –gprefix _ argument should be given on macOS in order to follow the system’s naming convention requirements.
shell
> nasm -f elf64 -o tri.o tri.s
To run the object file, we will need to link with a small C program that can call the entry label of our assembly code and then print the result:
#include <stdio.h> #include <inttypes.h> int64_t entry(); int main(int argc, char** argv) { int64_t result = entry(); "%" PRId64 "\n", result); printf(return 0; }
Notice that from the C program’s perspective, the assembly code defines what looks like a C function called entry that returns an int64_t result.
How does this work? When the C program calls entry it places a return pointer on the stack and jumps to entry. The fact that we decided to put the result in register rax was not arbitrary – that’s the register that by convention is used for a return value. When the assembly code executes it’s final ret instruction, it jumps back to C with the 36th triangular number in rax, which the C side knows is the return value. This convention is part of a larger set of conventions known as the Application Binary Interface. For a reference, see the Texts section of the notes.
We can compile the main.c C program to an object file with gcc:
shell
> gcc -c main.c -o main.o
Now we can make an executable by linking the two together:
shell
> gcc main.o tri.o -o tri /usr/bin/ld: warning: tri.o: missing .note.GNU-stack section implies executable stack /usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
Finally, we can run the executable:
shell
> ./tri 666
There, of course, is a lot more to x86-64 than what’s been shown here. If you want to dig deeper, check the references in Texts. But now let’s turn to a86.
3.3 a86: Representing x86 Code as Data
Here we will employ one of the great ideas in computer science: we will represent programs as data. Rather than toil away at the level of x86, writing programs directly in nasm syntax, compiling, and running them, we will instead design a data type definition for representing x86 programs and compute programs.
Our representation will not be complete – this is going to help us simplify x86 down to something manageable. We won’t cover every instruction and we won’t cover every variation of the instructions we do cover.
An a86 program is a list of a86 instructions. Each instruction is represented as a structure, described in the following section.
Before working through these examples, you’ll need to install the a86 module, part of the langs package for this course. See The langs package for details on installing.
Here’s the triangular number example:
Examples
; import the a86 library > (require a86) ; a86 code that computes the 36th triangular number
> (define tri-36 (list (Global 'entry) (Label 'entry) (Mov 'rbx 36) (% "the \"input\"") (%%% "tri: a recursive function for computing nth") (%%% "triangular number, where n is given in rbx.") (Label 'tri) (Cmp 'rbx 0) (% "if rbx = 0, done") (Je 'done) (Push 'rbx) (% "save rbx") (Sub 'rbx 1) (Call 'tri) (% "compute tri(rbx-1) in rax") (Pop 'rbx) (% "restore rbx") (Add 'rax 'rbx) (% "result is rbx+tri(rbx-1)") (Ret) (Label 'done) (% "jump here for base case") (Mov 'rax 0) (% "return 0") (Ret)))
And who doesn’t love more parentheses?
So for example, let’s say you have two a86 programs and you want to glue them together into one: well that just append. Suppose you want to compute which registers are used in a given a86 program? Suppose you want to replace uses of 'rax with 'rdi? It just a matter of writing the right Racket function to do it.
Here’s another immediate benefit. Instead of writing a single x86 program, let’s write an infinite set of a86 programs, one for computing each nth triangular number. Easy-peasy:
Examples
; Natural -> a86 ; Computes a86 code that computes the nth triangular number
> (define (tri n) (list (Global 'entry) (Label 'entry) (Mov 'rbx n) (% "the \"input\"") (%%% "tri: a recursive function for computing nth") (%%% "triangular number, where n is given in rbx.") (Label 'tri) (Cmp 'rbx 0) (% "if rbx = 0, done") (Je 'done) (Push 'rbx) (% "save rbx") (Sub 'rbx 1) (Call 'tri) (% "compute tri(rbx-1) in rax") (Pop 'rbx) (% "restore rbx") (Add 'rax 'rbx) (% "result is rbx+tri(rbx-1)") (Ret) (Label 'done) (Mov 'rax 0) (Ret))) ; recreate original program > (define tri-36 (tri 36))
It’s also easy to go from our data representation to its interpretation as an x86 program.
There is a function provided for printing an a86 program as an x86 program using nasm notation, called asm-display. Calling this function prints to the current output port, but it’s also possible to write the output to a file or convert it to a string.
Examples
> (asm-display (tri 36))
default rel
section .text
global $entry
$entry:
mov rbx, 36 ; the "input"
;;; tri: a recursive function for computing nth
;;; triangular number, where n is given in rbx.
$tri:
cmp rbx, 0 ; if rbx = 0, done
je $done
push rbx ; save rbx
sub rbx, 1
call $tri ; compute tri(rbx-1) in rax
pop rbx ; restore rbx
add rax, rbx ; result is rbx+tri(rbx-1)
ret
$done:
mov rax, 0
ret
Notice how this generates exactly what you saw in tri.s.
From here, we can assemble, link, and execute.
We can also, since we have a general purpose programming language at our disposal in the meta-language, write a program to do all that for us, which what the implementors of the a86 library have done:
Examples
> (asm-interp (tri 36)) 666
The asm-interp function consumes an a86 program as input and produces the integer result the program computes, i.e. it is an Interpreter for a86. Behind the scenes it does this by converting to nasm, assemblying, compiling a thin C wrapper, executing the program, and reading the result. This will be a rather handy tool both in interactively exploring the a86 language (you can write assembly in a REPL), but also an important tool when it comes time to test the compilers we write.
There is more to a86, which you can find documented in the a86 Reference, although we try to introduce features of a86 as we encounter them.