On this page:
11.1 Errors
11.2 Meaning of Extort programs
11.3 Checking and signalling errors at run-time
11.4 Run-time for Extort
11.5 Correctness, revisited
8.18

11 Extort: when errors exist🔗

image Source code.

The greatest mistake is to imagine that we never err.

    11.1 Errors

    11.2 Meaning of Extort programs

    11.3 Checking and signalling errors at run-time

    11.4 Run-time for Extort

    11.5 Correctness, revisited

11.1 Errors🔗

We have added multiple, disjoint types, but mostly swept issues of errors under the rug by considering type mismatches as meaningless. Now let’s redesign the semantics to specify the error behavior of such programs.

We’ll call it Extort.

Nothing changes in the syntax of Extort from the previous language, although we will need to talk about two kinds of results from evaluating programs: values and errors. We will say that evaluation produces an answer, which is either a value or error:

image

11.2 Meaning of Extort programs🔗

Languages adopt several approaches to type mismatches:

  • Prohibit such programs statically with a type system (e.g. OCaml, Java)

  • Coerce values to different types (e.g. JavaScript)

  • Signal a run-time error (e.g. Racket)

  • Leave the behavior unspecified (e.g. Scheme, C)

We’ve previously seen the last approach. Now let’s do what Racket does and signal an error.

The meaning of Extort programs that have type errors will now be defined as 'err. (You shouldn’t make too much out of how we choose to represent the “this program caused an error” answer; all that really matters at this point is that it is disjoint from values. Since the langauge doesn’t yet include symbols, using a symbol is a fine choice.):

In order to define the semantics, we first introduce the type of results that may be given by the interpretation function:

Examples

; type Answer = Value | 'err

Type mismatches can arise as the result of primitive operations being applied to arguments for which the primitive is undefined, so we revise interp-prim1 to check all necessary preconditions before carrying out an operation, and producing an error in case those conditions are not met:

extort/interp-prim.rkt

  #lang racket
  (provide interp-prim0 interp-prim1)
   
  ;; Op0 -> Value
  (define (interp-prim0 op)
    (match op
      ['read-byte (read-byte)]
      ['peek-byte (peek-byte)]
      ['void      (void)]))
   
  ;; Op1 Value -> Answer
  (define (interp-prim1 op v)
    (match (list op v)
      [(list 'add1 (? integer?))            (add1 v)]
      [(list 'sub1 (? integer?))            (sub1 v)]
      [(list 'zero? (? integer?))           (zero? v)]
      [(list 'char? v)                      (char? v)]
      [(list 'integer->char (? codepoint?)) (integer->char v)]
      [(list 'char->integer (? char?))      (char->integer v)]
      [(list 'write-byte    (? byte?))      (write-byte v)]
      [(list 'eof-object? v)                (eof-object? v)]
      [_ 'err]))
   
  ;; Any -> Boolean
  (define (codepoint? v)
    (and (integer? v)
         (or (<= 0 v 55295)
             (<= 57344 v 1114111))))
   
   

Within the interpreter, we update the type signature to reflect the fact that interpreting an expression produces an answer, no longer just an expression. We must also take care to observe that evaluating a subexpression may produce an error and as such it should prevent further evaluation. To do this, the interpreter is written to check for an error result of any subexpression it evaluates before proceeding to evaluate another subexpression:

extort/interp.rkt

  #lang racket
  (provide interp)
  (require "ast.rkt")
  (require "interp-prim.rkt")
   
  ;; type Value =
  ;; | Integer
  ;; | Boolean
  ;; | Character
  ;; | Eof
  ;; | Void
   
  ;; type Answer = Value | 'err
  ;; Expr -> Answer
  (define (interp e)
    (match e
      [(Lit d) d]
      [(Eof)   eof]
      [(Prim0 p)
       (interp-prim0 p)]
      [(Prim1 p e)
       (match (interp e)
         ['err 'err]
         [v (interp-prim1 p v)])]
      [(If e1 e2 e3)
       (match (interp e1)
         ['err 'err]
         [v (if v
                (interp e2)
                (interp e3))])]
      [(Begin e1 e2)
       (match (interp e1)
         ['err 'err]
         [v (interp e2)])]))
   
   

We can confirm the interpreter computes the right result for the examples given earlier:

Examples

> (interp (parse '(add1 #f)))

make[1]: Entering directory '/home/runner/work/cmsc430.github.io/cmsc430.github.io/langs/extort'

make[1]: Leaving directory '/home/runner/work/cmsc430.github.io/cmsc430.github.io/langs/extort'

'err

> (interp (parse '(zero? #t)))

'err

> (interp (parse '(if (zero? #f) 1 2)))

'err

This interpreter implicitly relies on the state of the input and output port, but we can define a pure interpreter like before, which we take as the specification of our language:

extort/interp-io.rkt

  #lang racket
  (provide interp/io)
  (require "interp.rkt")
  ;; String Expr -> (Cons Answer String)
  ;; Interpret e with given string as input,
  ;; return answer and collected output as string
  (define (interp/io e input)
    (define result (box #f))
    (define output
      (with-input-from-string input
        (λ ()
          (with-output-to-string
            (λ ()
              (set-box! result (interp e)))))))
    (cons (unbox result) output))
   
   

An important property of this semantics is that it provides a meaning for all possible expressions; in other words, interp/io is a total function, and our language has no undefined behaviors.

Total Semantics: For all e Expr, there exists a Answer, o String, such that (interp/io e i) equals (cons a o).

The statement of correctness, which is revised to use the answer type in place of values, reads:

Compiler Correctness: For all e Expr, i, o String, and a Answer, if (interp/io e i) equals (cons a o), then (exec/io e i) equals (cons a o).

By virtue of the semantics being total, we have a complete specification for the compiler. There are no longer programs for which it is free to do arbitrary things; it is always obligated to produce the same answer as the interpreter.

11.3 Checking and signalling errors at run-time🔗

Suppose we want to compile (add1 #f), what needs to happen? Just as in the interpreter, we need to check the integerness of the argument’s value before doing the addition operation.

This checking needs to be emitted as part of the compilation of the add1 primitive. It no longer suffices to simply do an Add instruction on whatever is in the rax register; the compiled code should first check that the value in rax encodes an integer, and if it doesn’t, it should somehow stop the computation and signal that an error has occurred.

The checking part is fairly easy. Our encoding of values, first discussed in Dupe: a duplicity of types, devotes some number of bits within a value to indicate the type. Checking whether something is an integer involves inspecting just those parts of the value.

Suppose we have an arbitrary value in rax. If it’s an integer, the least significant bit has to be 0. We could mask out just that bit by Anding the value with the bit 1 and compare the result to type tag for integers (i.e. 0). Let’s write a little function to play around with this idea:

Examples

; Produces 0 if v is an integer value
> (define (is-int? v) ; Value -> 0 | 1
    (asm-interp
      (prog (Global 'entry)
            (Label 'entry)
            (Mov 'rax (value->bits v))
            (And 'rax mask-int)
            (Ret))))
> (is-int? 0)

0

> (is-int? 1)

0

> (is-int? 2)

0

> (is-int? #t)

1

> (is-int? #f)

1

Unfortunately, the use of And here destroys the value in rax in order to determine if it is an integer. That’s fine for this predicate, but if we wanted to compute something with the integer, we’d be toast as soon as checked it’s type.

Suppose we wanted to write a function that did add1 in case the argument is an integer value, otherwise it produces false. For that, we would need to use a temporary register for the type tag check:

Examples

; Produces (add1 v) if v is an integer value, #f otherwise
> (define (plus1 v) ; Value -> Integer | Boolean
    (bits->value
      (asm-interp
        (prog (Global 'entry)
              (Label 'entry)
              (Mov 'rax (value->bits v))
              (Mov 'r9 'rax)
              (And 'r9 mask-int)
              (Cmp 'r9 type-int)
              (Jne 'err)
              (Add 'rax (value->bits 1))
              (Ret)
              (Label 'err)
              (Mov 'rax (value->bits #f))
              (Ret)))))
> (plus1 0)

1

> (plus1 1)

2

> (plus1 2)

3

> (plus1 #t)

#f

> (plus1 #f)

#f

This is pretty close to how we can implement primitives like add1. The only missing piece is that instead of returning a specific value, like #f, we want to stop computation and signal that an error has occurred. To accomplish this we add a function called raise_error to our run-time system that, when called, prints err and exits with a non-zero number to signal an error. The asm-interp intercepts these calls are returns the 'err symbol to match what the interpreter does:

Examples

> (current-objs '("runtime.o"))
> (asm-interp
    (prog (Global 'entry)
          (Label 'entry)
          (Extern 'raise_error)
          (Call 'raise_error)))

'err

Now we can make a function that either does the addition or signals an error:

Examples

; Produces (add1 v) if v is an integer, 'err otherwise
> (define (plus1 v) ; Value -> Integer | 'err
    (match
      (asm-interp
        (prog (Global 'entry)
              (Label 'entry)
              (Mov 'rax (value->bits v))
              (Mov 'r9 'rax)
              (And 'r9 mask-int)
              (Cmp 'r9 type-int)
              (Jne 'err)
              (Add 'rax (value->bits 1))
              (Ret)
              (Label 'err)
              (Extern 'raise_error)
              (Call 'raise_error)))
      ['err 'err]
      [b (bits->value b)]))
> (plus1 0)

1

> (plus1 1)

2

> (plus1 2)

3

> (plus1 #t)

'err

> (plus1 #f)

'err

This can form the basis of how primitives with error checking can be implemented. Looking at interp-prim1, we can see that in addition to checking for whether an argument is an integer, we will also need checks for characters, bytes, and unicode codepoints; the latter two are refinements of the integer check: they require there arguments to be integers in a certain range.

To support these checks, we develop a small library of type assertion functions that, given a register name, produce code to check the type of value held in the register, jumping to err whenever the value is not of the asserted type:

  • assert-integer : Register -> Asm produces code to check that the value in the given register is an integer,

  • assert-char : Register -> Asm produces code to check that the value in the given register is a character,

  • assert-byte : Register -> Asm produces code to check that the value in the given register is an integer and in the range [0,256).

  • assert-codepoint : Register -> Asm produces code to check that the value in the given register is an integer and either in the range [0,55295] or [57344, 1114111].

extort/assert.rkt

  #lang racket
  (provide assert-integer assert-char assert-byte assert-codepoint)
  (require a86/ast)
  (require "types.rkt")
   
  (define r9 'r9)
   
  ;; Register -> Asm
  (define (assert-integer r)
    (seq (Mov r9 r)
         (And r9 mask-int)
         (Cmp r9 type-int)
         (Jne 'err)))
   
  ;; Register -> Asm
  (define (assert-char r)
    (seq (Mov r9 r)
         (And r9 mask-char)
         (Cmp r9 type-char)
         (Jne 'err)))
   
  ;; Register -> Asm
  (define (assert-codepoint r)
    (let ((ok (gensym)))
      (seq (assert-integer r)
           (Cmp r (value->bits 0))
           (Jl 'err)
           (Cmp r (value->bits 1114111))
           (Jg 'err)
           (Cmp r (value->bits 55295))
           (Jl ok)
           (Cmp r (value->bits 57344))
           (Jg ok)
           (Jmp 'err)
           (Label ok))))
   
  ;; Register -> Asm
  (define (assert-byte r)
    (seq (assert-integer r)
         (Cmp r (value->bits 0))
         (Jl 'err)
         (Cmp r (value->bits 255))
         (Jg 'err)))
   
   

The compiler for primitive operations is updated to include appropriate type assertions:

extort/compile-ops.rkt

  #lang racket
  (provide compile-op0 compile-op1)
  (require "ast.rkt")
  (require "types.rkt")
  (require "assert.rkt")
  (require a86/ast)
   
  (define rax 'rax)
  (define rdi 'rdi) ; arg
  (define r9  'r9)  ; scratch
   
  ;; Op0 -> Asm
  (define (compile-op0 p)
    (match p
      ['void      (seq (Mov rax (value->bits (void))))]
      ['read-byte (seq (Call 'read_byte))]
      ['peek-byte (seq (Call 'peek_byte))]))
   
  ;; Op1 -> Asm
  (define (compile-op1 p)
    (match p
      ['add1
       (seq (assert-integer rax)
            (Add rax (value->bits 1)))]
      ['sub1
       (seq (assert-integer rax)
            (Sub rax (value->bits 1)))]
      ['zero?
       (seq (assert-integer rax)
            (Cmp rax 0)
            if-equal)]
      ['char?
       (seq (And rax mask-char)
            (Cmp rax type-char)
            if-equal)]
      ['char->integer
       (seq (assert-char rax)
            (Sar rax char-shift)
            (Sal rax int-shift))]
      ['integer->char
       (seq (assert-codepoint rax)
            (Sar rax int-shift)
            (Sal rax char-shift)
            (Xor rax type-char))]
      ['eof-object?
       (seq (Cmp rax (value->bits eof))
            if-equal)]
      ['write-byte
       (seq (assert-byte rax)
            (Mov rdi rax)
            (Call 'write_byte))]))
   
  ;; Asm
  ;; set rax to #t or #f if comparison flag is equal
  (define if-equal
    (seq (Mov rax (value->bits #f))
         (Mov r9  (value->bits #t))
         (Cmove rax r9)))
   
   

Examples

> (compile-op1 'add1)

(list

 (Mov 'r9 'rax)

 (And 'r9 1)

 (Cmp 'r9 0)

 (Jne 'err)

 (Add 'rax 2))

> (compile-op1 'char->integer)

(list

 (Mov 'r9 'rax)

 (And 'r9 3)

 (Cmp 'r9 1)

 (Jne 'err)

 (Sar 'rax 2)

 (Sal 'rax 1))

> (compile-op1 'write-byte)

(list

 (Mov 'r9 'rax)

 (And 'r9 1)

 (Cmp 'r9 0)

 (Jne 'err)

 (Cmp 'rax 0)

 (Jl 'err)

 (Cmp 'rax 510)

 (Jg 'err)

 (Mov 'rdi 'rax)

 (Call 'write_byte))

The top-level compiler largely stays the same, but it now declares an external label 'raise_error that will be defined by the run-time system and defines a label called 'err that calls raise_error:

extort/compile.rkt

  #lang racket
  (provide compile
           compile-e)
   
  (require "ast.rkt")
  (require "compile-ops.rkt")
  (require "types.rkt")
  (require a86/ast)
   
  (define rax 'rax)
  (define rsp 'rsp) ; stack
   
  ;; Expr -> Asm
  (define (compile e)
    (prog (Global 'entry)
          (Extern 'peek_byte)
          (Extern 'read_byte)
          (Extern 'write_byte)
          (Extern 'raise_error)
          (Label 'entry)
          (Sub rsp 8)
          (compile-e e)
          (Add rsp 8)
          (Ret)
          ;; Error handler
          (Label 'err)
          (Call 'raise_error)))
   
  ;; Expr -> Asm
  (define (compile-e e)
    (match e
      [(Lit d) (compile-value d)]
      [(Eof) (compile-value eof)]
      [(Prim0 p) (compile-prim0 p)]
      [(Prim1 p e) (compile-prim1 p e)]
      [(If e1 e2 e3)
       (compile-if e1 e2 e3)]
      [(Begin e1 e2)
       (compile-begin e1 e2)]))
   
  ;; Value -> Asm
  (define (compile-value v)
    (seq (Mov rax (value->bits v))))
   
  ;; Op0 -> Asm
  (define (compile-prim0 p)
    (compile-op0 p))
   
  ;; Op1 Expr -> Asm
  (define (compile-prim1 p e)
    (seq (compile-e e)
         (compile-op1 p)))
   
  ;; Expr Expr Expr -> Asm
  (define (compile-if e1 e2 e3)
    (let ((l1 (gensym 'if))
          (l2 (gensym 'if)))
      (seq (compile-e e1)
           (Cmp rax (value->bits #f))
           (Je l1)
           (compile-e e2)
           (Jmp l2)
           (Label l1)
           (compile-e e3)
           (Label l2))))
   
  ;; Expr Expr -> Asm
  (define (compile-begin e1 e2)
    (seq (compile-e e1)
         (compile-e e2)))
   
   

Examples

> (compile-e (parse '(add1 #f)))

(list

 (Mov 'rax 7)

 (Mov 'r9 'rax)

 (And 'r9 1)

 (Cmp 'r9 0)

 (Jne 'err)

 (Add 'rax 2))

> (compile-e (parse '(char->integer #\a)))

(list

 (Mov 'rax 389)

 (Mov 'r9 'rax)

 (And 'r9 3)

 (Cmp 'r9 1)

 (Jne 'err)

 (Sar 'rax 2)

 (Sal 'rax 1))

> (compile-e (parse '(write-byte 97)))

(list

 (Mov 'rax 194)

 (Mov 'r9 'rax)

 (And 'r9 1)

 (Cmp 'r9 0)

 (Jne 'err)

 (Cmp 'rax 0)

 (Jl 'err)

 (Cmp 'rax 510)

 (Jg 'err)

 (Mov 'rdi 'rax)

 (Call 'write_byte))

11.4 Run-time for Extort🔗

We extend the run-time system with a C function called raise_error that prints "err" and exits with a non-zero status to indicate something has gone wrong.

The runtime system is written with a level of indirection between raise_error and the code that prints and exits in error_exit; this is done so that the testing framework can intercede and replace the error function, but it can be ignored.

main.c

#include <stdio.h>
#include <stdlib.h>
#include "values.h"
#include "print.h"
#include "runtime.h"

FILE* in;
FILE* out;
void (*error_handler)();

void error_exit()
{
  printf("err\n");
  exit(1);
}

void raise_error()
{
  return error_handler();
}

int main(int argc, char** argv)
{
  in = stdin;
  out = stdout;
  error_handler = &error_exit;
  
  val_t result;

  result = entry();
  print_result(result);
  if (val_typeof(result) != T_VOID)
    putchar('\n');

  return 0;
}

Linking in the run-time allows us to define the exec and exec/io functions:

extort/exec.rkt

  #lang racket
  (require a86/interp)
  (require "compile.rkt")
  (require "types.rkt")
  (require "build-runtime.rkt")
  (provide exec)
  ;; Expr -> Answer
  (define (exec e)
    (parameterize ((current-objs (list (path->string runtime-path))))
      (match (asm-interp (compile e))
        ['err 'err]
        [b (bits->value b)])))
   
   

extort/exec-io.rkt

  #lang racket
  (require a86/interp)
  (require "compile.rkt")
  (require "types.rkt")
  (require "build-runtime.rkt")
  (provide exec/io)
   
  ;; Expr String -> (cons Answer String)
  (define (exec/io e in)
    (parameterize ((current-objs (list (path->string runtime-path))))
      (match (asm-interp/io (compile e) in)
        [(cons 'err o) (cons 'err o)]
        [(cons b o) (cons (bits->value b) o)])))
   
   

We can run examples:

Examples

> (exec (parse '(add1 8)))

9

> (exec (parse '(add1 #f)))

'err

11.5 Correctness, revisited🔗

This allows to re-formulate the check for compiler correctness check in its earlier, simpler form that just blindly runs the interpreter and compiler and checks that the results are the same, thanks to the totality of the semantics:

extort/correct.rkt

  #lang racket
  (provide check-compiler)
  (require rackunit)
  (require "interp-io.rkt")
  (require "exec-io.rkt")
  ;; Expr String -> Void
  (define (check-compiler e i)
    (check-equal? (interp/io e i)
                  (exec/io e i)))
   
   

Examples

> (check-compiler (parse '(add1 8)) "")
> (check-compiler (parse '(add1 #f)) "")

And again, we can randomly test the compiler by generating programs and inputs:

Examples

> (require "random.rkt")
> (for ((i 100))
    (check-compiler (random-expr) (random-input)))