10 Evildoer: change the world a couple nibbles at a time
Warning: Side effects may include itching, burning, oozing, weeping. Not intended for heart patients and those with nervous disorders.
10.1 Reading and writing bytes
So far, the languages we’ve consider have had the following property: the result of evaluation is determined entirely as a function of the expression being evaluated. Which is to say, the meaning of a program is determined entirely by the text of the program. This is a nice property in that it makes reasoning about programs pretty clear cut: if you know the program, you can know exactly what it computes. However, many real-world programs (like the very compiler we are writing!) do not have this property. Instead they interact with the outside world and compute results based on the state of the world.
For example, consider the compile-stdin.rkt program, which reads the contents of stdin and compiles it. The meaning of this program depends on the state of input port. Similarly, it prints out assembly code to the standard output port. So not only does this program depend on the outside world, it changes it too.
Let’s design a language that has a simple mechanism for interacting with the outside world. It will be able to read and write a byte of information at a time (i.e. an integer between 0 and 256) from the standard input port and output port, respectively.
We’ll call it Evildoer.
To the syntax of expressions, we add the following operations:
write-byte : Byte -> Void: writes given byte to stdout, produces nothing.
read-byte : -> Byte or EOF: reads a byte from stdin, if there is one, EOF otherwise.
peek-byte : -> Byte or EOF: peeks a byte from stdin, if there is one, EOF otherwise.
These operations will behave like their Racket counterparts.
To complement these operations, we add two new values:
void: a value used to indicate something has been done for effect only and has no useful result.
eof: a value used to indicate the end of an input port has been reached.
The void value arises as the result of an expression that is evaluated for effect only, such as write-byte. The eof value arises as the result of reading the end of a port. So that these values can be accessed more directly, we’ll also add:
eof : EOF bound to the end-of-file value, and
void : -> Void a function that produces the void value.
In order to recognize the end-of-file value, we add the following predicate (just as in Racket):
eof-object? : Any -> Boolean: determines if argument is the eof value.
Finally, we add a simple sequencing construct to first evaluate an expression for effect and then evaluate another expression for its result.
(begin e0 e1): evaluates e0, then e1.
Abstract syntax and parsing is done as you would expect. Since we now have primitive operations that take 0 arguments, we split the Prim constructor into Prim0 and Prim1.
#lang racket (provide Lit Prim0 Prim1 If Eof Begin) ;; type Expr = (Lit Datum) ;; | (Eof) ;; | (Prim0 Op0) ;; | (Prim1 Op1 Expr) ;; | (If Expr Expr Expr) ;; | (Begin Expr Expr) ;; type Datum = Integer ;; | Boolean ;; | Character ;; type Op0 = 'read-byte | 'peek-byte | 'void ;; type Op1 = 'add1 | 'sub1 ;; | 'zero? ;; | 'char? | 'integer->char | 'char->integer ;; | 'write-byte | 'eof-object? (struct Eof () #:prefab) (struct Lit (d) #:prefab) (struct Prim0 (p) #:prefab) (struct Prim1 (p e) #:prefab) (struct If (e1 e2 e3) #:prefab) (struct Begin (e1 e2) #:prefab)
The s-expression parser is defined as follows:
#lang racket (provide parse) (require "ast.rkt") ;; S-Expr -> Expr (define (parse s) (match s ['eof (Eof)] [(? datum?) (Lit s)] [(list-rest (? symbol? k) sr) (match k [(? op0? o) (match sr ['() (Prim0 o)] [_ (error "op0: bad syntax" s)])] [(? op1? o) (match sr [(list s1) (Prim1 o (parse s1))] [_ (error "op1: bad syntax" s)])] ['begin (match sr [(list s1 s2) (Begin (parse s1) (parse s2))] [_ (error "begin: bad syntax" s)])] ['if (match sr [(list s1 s2 s3) (If (parse s1) (parse s2) (parse s3))] [_ (error "if: bad syntax" s)])] [_ (error "parse error" s)])] [_ (error "parse error" s)])) ;; Any -> Boolean (define (datum? x) (or (exact-integer? x) (boolean? x) (char? x))) ;; Any -> Boolean (define (op0? x) (memq x '(read-byte peek-byte void))) (define (op1? x) (memq x '(add1 sub1 zero? char? integer->char char->integer write-byte eof-object?)))
Examples
> (parse 'eof) '#s(Eof)
> (parse '(void)) '#s(Prim0 void)
> (parse '(read-byte)) '#s(Prim0 read-byte)
> (parse '(peek-byte)) '#s(Prim0 peek-byte)
> (parse '(write-byte 97)) '#s(Prim1 write-byte #s(Lit 97))
> (parse '(eof-object? eof)) '#s(Prim1 eof-object? #s(Eof))
> (parse '(begin (write-byte 97) (write-byte 98)))
'#s(Begin
#s(Prim1 write-byte #s(Lit 97))
#s(Prim1 write-byte #s(Lit 98)))
10.2 Reading and writing bytes in Racket
Racket has a Byte data type that is not disjoint from other datatypes; it’s simply an integer in (0,256]. The operations read-byte and write-byte read and write, respectively, a byte to or from stdin or stdout (by default).
Let’s look at an example of write-byte:
Examples
> (write-byte 97) a
> (write-byte 109) m
A byte, when written corresponds to an ASCII character, which is why you see a for 97 and m for 109.
A subtle, but crucial point here that these expressions are printing, i.e. writing bytes to stdout. But they don’t produce any value. Or more precisely, they print and then produce a value that indicates "no useful value has been produced." In OCaml, this value is called "unit"; in Racket, it’s called "void." When the REPL is given an expression that produces void as its result, it doesn’t show anything. Here’s any example that uses the Racket void function, which simply returns the void value:
Examples
> (void) > (+ 2 3) 5
> (void) > (void) > "fred" "fred"
It’s important to note that void is just a value like any other value; it’s not literally "nothing." It’s just handled specially by the REPL in this case by not showing anything. Were we to put the void value within another value, the REPL shows it:
Examples
> (define xs (list (void) (void) 3 (void))) > xs '(#<void> #<void> 3 #<void>)
> (length xs) 4
> (first xs) > (void? (first xs)) #t
So what write-byte is doing is printing something and producing void. If we were to sequence write-byte using begin, would print something and produce a non-void value:
Examples
> (begin (write-byte 97) #t) a
#t
Notice how the REPL in the notes is helpfully using color to distinguish the printed output from the program and the result of the program.
Now’s let’s look at read-byte. It takes no arguments and reads a byte from stdin. If there’s no more bytes on stdin, it produces eof. Its cousin peek-byte also gets a byte from stdin, but it leaves the stream intact so that the same byte would be read the next time around.
Now, making examples of read-byte is a bit more tricky. While write-byte interacts with the outside world by printing something to stdout, what it prints is determined by the program text. On the other hand, what read-byte reads depends on what’s in the stdin stream. If you launch Racket and type (read-byte), Racket will then block, waiting for you to type something, so that a byte can be read. It will produce whatever the first byte of what you type is.
So how can I make examples?
One option is to use the operating system shell to “pipe” output from one program as input to the Racket process, which will then read that data as it’s input. Together with printf program, we can write the data we want the Racket program to see on stdin:
shell
> printf 'hello' | racket -e '(read-byte)' 104
shell
> printf 'hello' | racket -e '(list (read-byte) (read-byte))' '(104 101)
If we pipe the empty string, the program will produce the eof value:
shell
> printf '' | racket -e '(read-byte)' #<eof>
Another possibility is to use a similar mechanism but from within Racket. The with-input-from-string uses a given string as the data available on stdin. Then (read-byte) will read from this data:
Examples
> (with-input-from-string "hello" (λ () (read-byte))) 104
> (with-input-from-string "hello" (λ () (list (read-byte) (read-byte)))) '(104 101)
> (with-input-from-string "" (λ () (read-byte))) #<eof>
This uses with-input-from-string which takes a string and a zero-argument function. It then installs the contents of the string in stdin as though you had typed this data, then invokes the function, thereby running a computation with a predetermined input stream.
There’s a matching with-output-to-string function that takes a zero-argument function and runs it in a way that collects the output and produces it as a string. This let’s you capture what was printed and turn it in to a value that is produced:
Examples
> (with-output-to-string (λ () (write-byte 104))) "h"
These facilities will be useful for making examples, but also for writing test cases, since we can set up the state of the outside world and capture changes as values, which we can then use to assert the expected behavior:
Examples
> (check-equal? (with-input-from-string "hello" (λ () (with-output-to-string (λ () (write-byte (read-byte)))))) "h")
10.3 Meaning of Evildoer programs
Here’s an interpreter for Evildoer:
evildoer/interpreter/interp.rkt
#lang racket (provide interp) (require "../syntax/ast.rkt") (require "interp-prim.rkt") ;; type Value = ;; | Integer ;; | Boolean ;; | Character ;; | Eof ;; | Void ;; Expr -> Value (define (interp e) (match e [(Lit d) d] [(Eof) eof] [(Prim0 p) (interp-prim0 p)] [(Prim1 p e) (interp-prim1 p (interp e))] [(If e1 e2 e3) (if (interp e1) (interp e2) (interp e3))] [(Begin e1 e2) (begin (interp e1) (interp e2))]))
The interpretation of primitives relies on the underlying implementations read-byte, write-byte, etc. from Racket (just like it does for all the other operations):
evildoer/interpreter/interp-prim.rkt
#lang racket (provide interp-prim0 interp-prim1) ;; Op0 -> Value (define (interp-prim0 op) (match op ['read-byte (read-byte)] ['peek-byte (peek-byte)] ['void (void)])) ;; Op1 Value -> Value (define (interp-prim1 op v) (match op ['add1 (add1 v)] ['sub1 (sub1 v)] ['zero? (zero? v)] ['char? (char? v)] ['integer->char (integer->char v)] ['char->integer (char->integer v)] ['write-byte (write-byte v)] ['eof-object? (eof-object? v)]))
Interpreting a program that reads and writes will itself read and write:
Examples
> (interp (parse '(write-byte 104))) h
> (with-input-from-string "hello" (λ () (interp (parse '(write-byte (read-byte)))))) h
Using with-input-from-string and with-output-to-string, we can also build a useful utility for interpreting programs with strings representing stdin and stdout:
evildoer/interpreter/interp-io.rkt
#lang racket (provide interp/io) (require "interp.rkt") ;; String Expr -> (Cons Value String) ;; Interpret e with given string as input, ;; return value and collected output as string (define (interp/io e in) (parameterize ((current-output-port (open-output-string)) (current-input-port (open-input-string in))) (cons (interp e) (get-output-string (current-output-port)))))
Examples
> (interp/io (parse '(write-byte 104)) "") '(#<void> . "h")
> (interp/io (parse '(write-byte (read-byte))) "hello") '(#<void> . "h")
This is useful to write tests about programs that have side-effects because we can turn what was an effectful computation into a pure one:
Examples
> (check-equal? (interp/io (parse '(write-byte (read-byte))) "hello") (cons (void) "h"))
10.4 Encoding values in Evildoer
With new values, namely the void and eof values, comes the need to add new bit encodings. So we add new encodings for eof and void, for which we simply pick two unused bit patterns: #b1011 and #b1111, respectively.
#lang racket (provide (all-defined-out)) (define int-shift 1) (define mask-int #b1) (define char-shift 2) (define type-int #b0) (define type-char #b01) (define mask-char #b11) ;; Value -> Integer (define (value->bits v) (cond [(eq? v #t) #b011] [(eq? v #f) #b111] [(eq? v eof) #b1011] [(eq? v (void)) #b1111] [(integer? v) (arithmetic-shift v int-shift)] [(char? v) (bitwise-ior type-char (arithmetic-shift (char->integer v) char-shift))])) (define (int-bits? v) (= type-int (bitwise-and v mask-int))) (define (char-bits? v) (= type-char (bitwise-and v mask-char)))
10.5 Detour: Calling external functions
Some aspects of the Evildoer compiler will be straightforward, e.g., adding eof, (void), eof-object?, etc. There’s conceptually nothing new going on there. But what about read-byte, write-byte and peek-byte? These will require a new set of tricks to implement.
We have a couple of options for how to approach these primitives:
generate assembly code for issuing operating system calls to do I/O operations, or
assume some code in the stand-alone runtime system or the host system will provide the functionality to I/O and have the compiler emit calls to those external functions.
The first option will require looking up details for system calls on the particular operating system in use, generating code to make those calls, and adding logic to check for errors. For the second option, we can leave that up to the runtime or host system and produce code that calls this code. This is the simpler approach and the one we adopt.
In order to call code potentially written in a different language and compiled by a different compiler, we have to conform to the System V ABI calling convention, which is essentially an agreement on how calls should work (how are arguments passed, how are return values returned, etc.) that enables calls to work at the level of assembly code.
Up to this point, we’ve seen how C code can call code written in assembly as though it were a C function. To go the other direction, we need to explore how to make calls to functions written in C (or whatever else) from assembly. Let’s look at that now.
If you haven’t already, be sure to read up on how calls work in a86: a Little Assembly Language.
Let’s start of by assuming our runtime system or host is going to provide a gcd function for computing the greatest common divisor of two integers given as arguments to the function.
Now, how can we call gcd from aseembly code? Just as there is a convention that a return value is communicated through rax, there are conventions governing the communication of arguments. The conventions are known as an Application Binary Interface or ABI. The set of conventions we’re following is called the System V ABI, and it used by Unix variants like Mac OS, Linux, and BSD systems. (Windows follows a different ABI.)
The convention for arguments is that the first six integer or pointer parameters are passed in the registers rdi, rsi, rdx, rcx, r8, r9. Additional arguments and large arguments such as structs are passed on the stack.
So we will pass the two arguments of gcd in registers rdi and rsi, respectively, then we use the Call instruction to call gcd. Suppose we want to compute gcd(36,60):
Examples
> (define example (prog (Global 'entry) (Label 'entry) (Mov 'rdi 36) ; first arg (Mov 'rsi 60) ; second arg (Sub 'rsp 8) ; align stack to 16-bytes (Extern 'gcd) (Call 'gcd) ; call extern gcd (Sal 'rax int-shift) ; convert result to a value (Add 'rsp 8) ; restore stack (Ret)))
A few things to notice in the above code:
The label 'gcd is declared to be external with Extern, this means the label is used but not defined in this program. We placed this declaration immediately before the Call instruction, but it can appear anywhere in the program.
The stack pointer register rsp is decremented by 8 before the Call instruction and then incremented by 8 after the call returns. This is to ensure the stack is aligned to 16-bytes for call; a requirement of the System V ABI.
For consistency with our run-time system, we return the result encoded as an integer value, which is accomplished by shifting the result in rax to the left by 1.
We could attempt to run this program with asm-interp, but it will complain about gcd being an undefined label. That makes sense since we haven’t actually said what we want 'gcd to mean.
Examples
> (bits->value (asm-interp example)) jit-call: Symbols not found: [ gcd ]
lookup of label 'entry' failed: Failed to materialize
symbols: { (a86_prog_170, { entry }) }
The idea is that either our standalone runtime could provide the 'gcd function, perhaps written in C, or the host language Racket could provide the functionality. Let’s look at both approaches.
10.5.1 Calling host functions from asm-interp
From within Racket it is possible to provide function definitions for external labels when running code with asm-interp. The way this works is we parameterize the assembly interpeter’s current set of external labels and provide:
the name of the label we want to link,
a Racket function that will be called when the label is called,
a function signatures for the Racket function that describes how the arguments and return values are encoded and decode as Racket values.
Examples
> (require ffi/unsafe) ; for _fun, _int64, etc.
> (parameterize ([current-externs (list (extern 'gcd gcd (_fun _int64 _int64 -> _int64)))]) (bits->value (asm-interp example))) 12
Here we are declaring that the Racket gcd function can be called using the 'gcd label following the System V ABI calling convention. The bit arguments are automatically converted to Racket integers when called, and the Racket integer result is convert to bits when gcd returns.
10.5.2 Calling functions in the stand-alone runtime system
Host provided functions are useful for testing and exploration, but for a stand-alone executable, we want the runtime system to provide that functionality. The compiler doesn’t change, but instead we will link the object code produced by the compiler with a definition of gcd.
First, let’s write that function in C.
#include <inttypes.h> int gcd(int64_t n1, int64_t n2) { return (n2 == 0) ? n1 : gcd(n2, n1 % n2); }
We can compile this into an object file:
shell
> clang -c gcd.c
Now let’s save our example program as an assembly file:
Examples
> (with-output-to-file "example.s" (λ () (asm-display example)) #:exists 'truncate)
Now we can assemble it into an object file, link the objects together to make an executable, and then run it:
shell
> clang -c example.s > clang -c gcd.c > make -C runtime make[1]: Entering directory '/home/runner/work/cmsc430.github.io/cmsc430.github.io/langs/evildoer/runtime' clang -fPIC -g -c -o main.o main.c clang -fPIC -g -c -o print.o print.c clang -fPIC -g -c -o values.o values.c clang -fPIC -g -c -o io.o io.c ld -r main.o print.o values.o io.o -o runtime.o make[1]: Leaving directory '/home/runner/work/cmsc430.github.io/cmsc430.github.io/langs/evildoer/runtime' > clang example.o gcd.o runtime/runtime.o -o example /usr/bin/ld: warning: example.o: missing .note.GNU-stack section implies executable stack /usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker > ./example 12
So now we’ve seen the essence of how to call functions from assembly code, which opens up an implementation strategy for implementing features: write C code as part of the run-time system and call it from the compiled code.
10.6 Hosted I/O primitives
We’ve now seen how functionality can be made available to compiled programs, either as Racket hosted functions with asm-interp or as functions linked into the stand-alone run-time system.
Our approach to adding read-byte, write-byte and peek-byte is going to be to assume we have functions read_byte, write_byte, and peek_byte available, either from the host or runtime, and have the compiler generate calls to these functions according to the System V ABI calling convention. Both read_byte and peek_byte will take no arguments and return either a byte value or the eof value, while write_byte takes a byte value and returns the eof value.
Let us first use the hosted approach, then develop the compiler, then we can return to issue of the stand-alone runtime system.
To host these primitives, we will make the Racket read-byte, write-byte, and peek-byte callable from assembly code. The only problem is that these function produce and consume Racket values, not Evildoer values encoded as bits. We can solve this issue by wrapping the functions to decode and encode on the way in and out. For example:
Examples
> (define (prim-read-byte) (value->bits (read-byte)))
> (define (prim-peek-byte) (value->bits (peek-byte)))
> (define (prim-write-byte bs) (value->bits (write-byte (bits->value bs))))
Now it’s important to observe that these functions always return bit encodings of values, which is exactly what our caller code will expect.
Examples
> (with-input-from-string "hello" (lambda () (prim-read-byte))) 208
> (with-input-from-string "" (lambda () (prim-read-byte))) 11
And if we want to call prim-write-byte, we need to provide a bit encoding of a byte.
Examples
> (with-output-to-string (lambda () (prim-write-byte (value->bits 97)))) "a"
But now we are in a good position to enrich asm-interp with this functionality. Let’s make this part of the language implementation. We’ll define a asm-interp/host function that includes all of the functionality we expect for the Evildoer language.
#lang racket (require a86/interp) (require ffi/unsafe) (require "decode.rkt") (require "../runtime/types.rkt") (provide (all-defined-out)) (define (prim-read-byte) (value->bits (read-byte))) (define (prim-peek-byte) (value->bits (peek-byte))) (define (prim-write-byte bs) (value->bits (write-byte (bits->value bs)))) (define (asm-interp/host asm) (parameterize ([current-externs (list (extern 'read_byte prim-read-byte (_fun -> _int64)) (extern 'peek_byte prim-peek-byte (_fun -> _int64)) (extern 'write_byte prim-write-byte (_fun _int64 -> _int64)))]) (asm-interp asm)))
Now let’s write an assembly program that calls some of these functions.
Examples
> (define read-write-byte (prog (Global 'entry) (Extern 'read_byte) (Extern 'write_byte) (Label 'entry) (Sub rsp 8) (Call 'read_byte) (Mov rdi rax) (Call 'write_byte) (Add rsp 8) (Ret)))
This program reads a byte with read_byte and then immediately writes that byte with write_byte and returns, so the return value should be void since that’s what will be in the rax register at the end.
Examples
> (with-input-from-string "hello" (lambda () (asm-interp/host read-write-byte))) h
15
Notice that this prints "h" and return 15.
10.7 A Compiler for Evildoer
Implementing eof, void, eof-object? and begin are all straightfoward and don’t really involve anything new.
For peek-byte, read-byte, and write-byte, we generate code that calls the appropriate external function. In the case of write-byte, we arrange for the byte that we’d like to write to be in rdi before the call.
Finally, since the emitted code is potentially issuing calls to external functions, we make sure to align the stack to 16-bytes. Rather than do this at each call site, we take advantage of the fact that no other stack changes occur and adjust the stack just once at the entry and exit of the code.
The top-level compiler:
#lang racket (provide compile compile-e) (require "../syntax/ast.rkt") (require "compile-ops.rkt") (require "../runtime/types.rkt") (require a86/ast a86/registers) ;; Expr -> Asm (define (compile e) (prog (Global 'entry) (Label 'entry) (Sub rsp 8) (compile-e e) (Add rsp 8) (Ret))) ;; Expr -> Asm (define (compile-e e) (match e [(Lit d) (compile-datum d)] [(Eof) (seq (Mov rax (value->bits eof)))] [(Prim0 p) (compile-prim0 p)] [(Prim1 p e) (compile-prim1 p e)] [(If e1 e2 e3) (compile-if e1 e2 e3)] [(Begin e1 e2) (compile-begin e1 e2)])) ;; Datum -> Asm (define (compile-datum d) (seq (Mov rax (value->bits d)))) ;; Op0 -> Asm (define (compile-prim0 p) (compile-op0 p)) ;; Op1 Expr -> Asm (define (compile-prim1 p e) (seq (compile-e e) (compile-op1 p))) ;; Expr Expr Expr -> Asm (define (compile-if e1 e2 e3) (let ((l1 (gensym 'if)) (l2 (gensym 'if))) (seq (compile-e e1) (Cmp rax (value->bits #f)) (Je l1) (compile-e e2) (Jmp l2) (Label l1) (compile-e e3) (Label l2)))) ;; Expr Expr -> Asm (define (compile-begin e1 e2) (seq (compile-e e1) (compile-e e2)))
The primitive operation compiler:
evildoer/compiler/compile-ops.rkt
#lang racket (provide compile-op0 compile-op1) (require "../syntax/ast.rkt") (require "../runtime/types.rkt") (require a86/ast a86/registers) ;; Op0 -> Asm (define (compile-op0 p) (match p ['void (seq (Mov rax (value->bits (void))))] ['read-byte (seq (Extern 'read_byte) (Call 'read_byte))] ['peek-byte (seq (Extern 'peek_byte) (Call 'peek_byte))])) ;; Op1 -> Asm (define (compile-op1 p) (match p ['add1 (Add rax (value->bits 1))] ['sub1 (Sub rax (value->bits 1))] ['zero? (seq (Cmp rax 0) if-equal)] ['char? (seq (And rax mask-char) (Cmp rax type-char) if-equal)] ['char->integer (seq (Sar rax char-shift) (Sal rax int-shift))] ['integer->char (seq (Sar rax int-shift) (Sal rax char-shift) (Xor rax type-char))] ['eof-object? (seq (Cmp rax (value->bits eof)) if-equal)] ['write-byte (seq (Extern 'write_byte) (Mov rdi rax) (Call 'write_byte))])) ;; Asm ;; set rax to #t or #f if comparison flag is equal (define if-equal (seq (Mov rax (value->bits #f)) (Mov r9 (value->bits #t)) (Cmove rax r9)))
Notice how expressions like (read-byte) and (write-byte) compile to calls into the run-time/host system:
Examples
> (compile-op0 'read-byte) (list (Extern 'read_byte) (Call 'read_byte))
> (compile-op1 'write-byte)
(list
(Extern 'write_byte)
(Mov 'rdi 'rax)
(Call 'write_byte))
So we can write some nice high-level examples:
Examples
> (asm-interp/host (compile (parse '(write-byte 97)))) a
15
10.8 Testing and correctness
Let’s define an exec function that should be equivalent to interp. We can also define an exec/io function equivalent to interp/io that’s useful for testing programs that do I/O.
#lang racket (provide exec exec/io) (require "../compiler/compile.rkt") (require "decode.rkt") (require "host.rkt") ;; Expr -> Value (define (exec e) (bits->value (asm-interp/host (compile e)))) ;; Asm String -> (cons Value String) (define (exec/io e in) (parameterize ((current-output-port (open-output-string)) (current-input-port (open-input-string in))) (cons (exec e) (get-output-string (current-output-port)))))
And we can confirm that it works as expect:
Examples
> (exec (parse '(write-byte 97))) a
> (exec/io (parse '(write-byte 97)) "") '(#<void> . "a")
> (exec/io (parse '(write-byte (read-byte))) "a") '(#<void> . "a")
> (exec/io (parse '(eof-object? (read-byte))) "") '(#t . "")
We can now state the correctness property we want of the compiler:
Compiler Correctness: For all e ∈ Expr, i, o ∈ String, and v ∈ Value, if (interp/io e i) equals (cons v o), then (exec/io e i) equals (cons v o).
Testing compiler correctness is updated as follows, notice it now takes an additional parameter representing the state of the input stream:
#lang racket (provide check-compiler) (require rackunit) (require "interpreter/interp-io.rkt") (require "executor/exec.rkt") (require "compiler/compile.rkt") ;; Expr String -> Void (define (check-compiler e i) (let ((r (with-handlers ([exn:fail? identity]) (interp/io e i)))) (unless (exn? r) (check-equal? r (exec/io e i)))))
(check-compiler (parse '(void)) "") (check-compiler (parse '(read-byte)) "a") (check-compiler (parse '(write-byte 97)) "")
The random-expr function generates random expressions and random-well-defined-expr generates random expressions that are guaranteed to be well-defined, as usual. Additionally, the random-input function produces a random string that can be used as the input.
(require "syntax/random.rkt") (random-expr) (random-well-defined-expr) (random-input)
Together, these can be used to randomly test the correctness of the compiler:
(for ((i 100)) (check-compiler (random-expr) (random-input))) (for ((i 100)) (check-compiler (random-well-defined-expr) (random-input)))
10.9 Stand-alone runtime system for Evildoer
With new values comes the need to add new bit encodings. So we add new encodings for eof and void:
#ifndef TYPES_H #define TYPES_H /* Bit layout of values Values are either: - Integers: end in #b0 - Characters: end in #b01 - True: #b11 - False: #b111 - Eof: #b1011 - Void: #b1111 */ #define int_shift 1 #define int_type_mask ((1 << int_shift) - 1) #define int_type_tag (0 << (int_shift - 1)) #define nonint_type_tag (1 << (int_shift - 1)) #define char_shift (int_shift + 1) #define char_type_mask ((1 << char_shift) - 1) #define char_type_tag ((0 << (char_shift - 1)) | nonint_type_tag) #define nonchar_type_tag ((1 << (char_shift - 1)) | nonint_type_tag) #define val_true ((0 << char_shift) | nonchar_type_tag) #define val_false ((1 << char_shift) | nonchar_type_tag) #define val_eof ((2 << char_shift) | nonchar_type_tag) #define val_void ((3 << char_shift) | nonchar_type_tag) #endif
The interface for the run-time system is extended to include file pointers for the input and output ports:
#ifndef RUNTIME_H #define RUNTIME_H #include "values.h" val_t entry(); #endif /* RUNTIME_H */
The main entry point for the run-time sets up the input and output pointers to point to stdin and stdout and is updated to handle the proper printing of a void result:
#include <stdio.h> #include "values.h" #include "print.h" #include "runtime.h" int main(int argc, char** argv) { val_t result = entry(); print_result(result); if (val_typeof(result) != T_VOID) putchar('\n'); return 0; }
But the real novelty of the Evildoer run-time is that there will be new functions that implement read-byte, peek-byte, and write-byte; these will be C functions called read_byte, peek_byte and write_byte:
#include <stdio.h> #include <inttypes.h> #include "types.h" #include "values.h" #include "runtime.h" val_t read_byte(void) { char c = getc(stdin); return (c == EOF) ? val_wrap_eof() : val_wrap_byte(c); } val_t peek_byte(void) { char c = getc(stdin); ungetc(c, stdin); return (c == EOF) ? val_wrap_eof() : val_wrap_byte(c); } val_t write_byte(val_t c) { putc((char) val_unwrap_int(c), stdout); return val_wrap_void(); }
This functionality is implemented in terms of standard C library functions getc, ungetc, putc and the run-time system’s functions for encoding and decoding values such as val_unwrap_int, val_wrap_void, etc.