7 Dupe: a duplicity of types

8.14

contents ← prev up next →

7 Dupe: a duplicity of types🔗

Source code.

There are 10 types of values...

7.1 Integers and Booleans

7.2 Meaning of Dupe programs

7.3 Ex uno plures: Out of One, Many

7.4 An Example of Dupe compilation

7.5 A Compiler for Dupe

7.6 Correctness and testing

7.1 Integers and Booleans🔗

Up until now we have worked to maintain a single type of values: integers. This led to some awkwardness in the design of conditionals. Let us lift this restriction by considering a multiplicity of types. To start, we will consider two: integers and booleans.

We’ll call it Dupe.

We will use the following syntax, which relaxes the syntax of Con to make '(zero? e) its own expression form and conditionals can now have arbitrary expressions in test position: (if e0 e1 e2) instead of just (if (zero? e0) e1 e2). We also add syntax for boolean literals.

Together this leads to the following grammar for concrete Dupe.

And abstract Dupe:

One thing to take note of is the new nonterminal v which ranges over values, which are integers and booleans.

Abstract syntax is modelled with the following datatype definition:

dupe/ast.rkt
  #lang racket
  (provide Lit Prim1 If)
  ;; type Expr = (Lit Datum)
  ;;           | (Prim1 Op1 Expr)
  ;;           | (If Expr Expr Expr)
  ;; type Datum = Integer
  ;;            | Boolean
  ;; type Op1 = 'add1 | 'sub1
  ;;          | 'zero?
  (struct Lit (d) #:prefab)
  (struct Prim1 (p e) #:prefab)
  (struct If (e1 e2 e3) #:prefab)

The s-expression parser is defined as follows:

dupe/parse.rkt
  #lang racket
  (provide parse)
  (require "ast.rkt")

  ;; S-Expr -> Expr
  (define (parse s)
    (match s
      [(? datum?) (Lit s)]
      [(list-rest (? symbol? k) sr)
       (match k
         [(? op1? o)
          (match sr
            [(list s1)
             (Prim1 o (parse s1))]
            [_ (error "op1: bad syntax" s)])]
         ['if
          (match sr
            [(list s1 s2 s3)
             (If (parse s1) (parse s2) (parse s3))]
            [_ (error "if: bad syntax" s)])]
         [_ (error "parse error" s)])]
      [_ (error "parse error" s)]))

  ;; Any -> Boolean
  (define (datum? x)
    (or (exact-integer? x)
        (boolean? x)))

  (define (op1? x)
    (memq x '(add1 sub1 zero?)))

7.2 Meaning of Dupe programs🔗

To consider the meaning of Dupe programs, we must revisit the meaning of conditions. Previously we branched on whether a subexpression evaluated to 0. We will now consider whether the subexpression evaluates to #f.

Let’s consider some examples:

(if #f 1 2) means 2 since the test expression means #f.
(if #t 1 2) means 1 since the test expression means #t, which is not #f.

Just as in Racket (and Scheme, generally), we will treat any non-#f value as “true.” So:

(if 0 1 2) means 1 since the test expression means 0, which is not #f.
(if 7 1 2) means 1 since the test expression means 7, which is not #f.

The new (zero? e) expression form evaluates to a boolean: #t when e evaluates to 0 and #f to an integer other than 0.

The #t and #f literals mean themselves, just like integer literals.

Once a multiplicity of types are introduced, we are forced to come to grips with type mismatches. For example, what should (add1 #f) mean?

Languages adopt several approaches:

Prohibit such programs statically with a type system (e.g. OCaml, Java)
Coerce values to different types (e.g. JavaScript)
Signal a run-time error (e.g. Racket)
Leave the behavior unspecified (e.g. Scheme, C)

We are going to start by taking the last approach. Later we can reconsider the design, but for now this is a simple approach.

The semantics is still a binary relation between expressions and their meaning, however the type of the relation is changed to reflect the values of Dupe, which may either be integers or booleans:

This relies on two helper judgments:

Notice that here we are following the semantics of Racket, which says that anything that is not #f counts as true in a conditional.

This semantics has also been refactored slightly so that there is a single rule for the Prim1 case. The new rule essentially defers the work to a new metafunction, 𝑫-𝒑𝒓𝒊𝒎:

Returning to the issue of type mismatches, what does the semantics say about (Prim1 'add1 (Lit #f))?

What it says is: nothing. These programs are simply not in the semantic relation for this language. There’s only one rule for giving meaning to an (Prim1 'add1 e0) expression and it’s premise is that e means some integer i0. But ((Lit #f), i) ∉ 𝑫 for any i. So there’s no value v such that ((Prim1 'add1 (Lit #f)), v) ∈ 𝑫. This expression is undefined according to the semantics.

The interpreter follows the rules of the semantics closely and is straightforward:

dupe/interp.rkt
  #lang racket
  (provide interp)
  (require "ast.rkt")
  (require "interp-prim.rkt")

  ;; type Value =
  ;; | Integer
  ;; | Boolean
  ;; Expr -> Value
  (define (interp e)
    (match e
      [(Lit d) d]
      [(Prim1 p e)
       (interp-prim1 p (interp e))]
      [(If e1 e2 e3)
       (if (interp e1)
           (interp e2)
           (interp e3))]))

And the interpretation of primitives closely matches 𝑫-𝒑𝒓𝒊𝒎:

dupe/interp-prim.rkt
  #lang racket
  (provide interp-prim1)

  ;; Op1 Value -> Value
  (define (interp-prim1 op v)
    (match op
      ['add1 (add1 v)]
      ['sub1 (sub1 v)]
      ['zero? (zero? v)]))

We can confirm the interpreter computes the right result for the examples given earlier:

Examples
> (interp (Lit #t))
#t
> (interp (Lit #f))
#f
> (interp (If (Lit #f) (Lit 1) (Lit 2)))
2
> (interp (If (Lit #t) (Lit 1) (Lit 2)))
1
> (interp (If (Lit 0) (Lit 1) (Lit 2)))
1
> (interp (If (Lit 7) (Lit 1) (Lit 2)))
1
> (interp (If (Prim1 'zero? (Lit 7)) (Lit 1) (Lit 2)))
2

Correctness follows the same pattern as before, although it is worth keeping in mind the “hypothetical” form of the statement: if the expression has some meaning, then the interpreter must produce it. In cases where the semantics of the expression is undefined, the interpreter can do whatever it pleases; there is no specification.

Interpreter Correctness: For all Dupe expressions e and values v, if (e,v) in , then (interp e) equals v.

Consider what happens with interp on undefined programs such as (add1 #f): the interpretation of this expression is just the application of the Racket add1 function to #f, which results in the interp program crashing and Racket signalling an error:

Examples
> (interp (Prim1 'add1 (Lit #f)))
add1: contract violation
expected: number?
given: #f

This isn’t a concern for correctness, because the interpreter is free to crash (or do anything else) on undefined programs; it’s not in disagreement with the semantics, because there is no semantics.

From a pragmatic point of view, this is a concern because it complicates testing. If the interpreter is correct and every expression has a meaning (as in all of our previous languages), it follows that the interpreter can’t crash on any Expr input. That makes testing easy: think of an expression, run the interpreter to compute its meaning. But now the interpreter may break if the expression is undefined. We can only safely run the interpreter on expressions that have a meaning.

We’ll return to this point in the design of later langauges.

7.3 Ex uno plures: Out of One, Many🔗

Before getting in to the compiler, let’s first address the issue of representation of values in the setting of x86.

So far we’ve had a single type of value: integers. Luckily, x86 has a convenient datatype for representing integers: it has 64-bit signed integers. (There is of course the problem that this representation is inadequate; there are many (Con) integers we cannot represent as a 64-bit integer. But we put off worrying about this until later.)

The problem now is how to represent integers and booleans, which should be disjoint sets of values. Representing these things in the interpreter as Racket values was easy: we used booleans and integers. Representing this things in x86 will be more complicated. x86 doesn’t have a notion of “boolean” per se and it doesn’t have a notion of “disjoint” datatypes. There is only one data type and that is: bits.

We chose 64-bit integers to represent Con integers, because that’s the kind of value that can be stored in a register or used as an argument to a instruction. It’s the kind of thing that can be returned to the C run-time. It’s (more or less) the only kind of thing we have to work with. So had we started with booleans instead of integers, we still would have represented values as a sequence of bits because that’s all there is. Now that we have booleans and integers, we will have to represent both as bits (64-bit integers). The bits will have to encode both the value and the type of value.

As discussed in the lecture video, there are many possible ways of representing multiple types, this is just how we’ve decided to do it.

Here is the idea of how this could be done: We have two kinds of data: integers and booleans, so we could use one bit to indicate whether a value is a boolean or an integer. The remaining 63 bits can be used to represent the value itself, either true, false, or some integer.

Let’s use the least significant bit to indicate the type and let’s use #b0 for integer and #b1 for boolean. These are arbitrary choices (more or less).

The number 1 would be represented as #b10, which is the bits for the number (i.e. #b1), followed by the integer tag (#b0). Note that the representation of a Dupe number is no longer the number itself: the Dupe value 1 is represented by the number 2 (#b10). The Dupe value #t is represented by the number 1 (#b01); the Dupe value #f is represented by the number 3 (#b11).

One nice thing about our choice of encoding: 0 is represented as 0 (#b00).

If you wanted to determine if a 64-bit integer represented an integer or a boolean, you simply need to inquire about the value of the least significant bit. At a high-level, this just corresponds to asking if the number is even or odd. Odd numbers end in the bit (#b1), so they reprepresent booleans. Even numbers represent integers. Here are some functions to check our understanding of the encoding:

dupe/types.rkt
  #lang racket
  (provide (all-defined-out))
  (define int-shift    1)
  (define mask-int   #b1)
  (define type-int   #b0)

  (define (bits->value b)
    (cond [(= b (value->bits #t))  #t]
          [(= b (value->bits #f)) #f]
          [(int-bits? b)
           (arithmetic-shift b (- int-shift))]
          [else (error "invalid bits")]))

  (define (value->bits v)
    (cond [(eq? v #t) #b01]
          [(eq? v #f) #b11]
          [(integer? v) (arithmetic-shift v int-shift)]))

  (define (int-bits? v)
    (= type-int (bitwise-and v mask-int)))

Examples
> (bits->value 0)
0
> (bits->value 1)
#t
> (bits->value 2)
1
> (bits->value 3)
#f
> (bits->value 4)
2
> (bits->value 5)
invalid bits
> (bits->value 6)
3
> (bits->value 7)
invalid bits

Notice that not all bits represent a value; name any odd number that’s neither 1 (#f) or 3 (#t).

We can also write the inverse:

Examples
> (value->bits #t)
1
> (value->bits #f)
3
> (value->bits 0)
0
> (value->bits 1)
2
> (value->bits 2)
4
> (value->bits 3)
6
> (bits->value (value->bits #t))
#t
> (bits->value (value->bits #f))
#f
> (bits->value (value->bits 0))
0
> (bits->value (value->bits 1))
1
> (bits->value (value->bits 2))
2
> (bits->value (value->bits 3))
3

The interpreter operates at the level of Values. The compiler will have to work at the level of Bits. Of course, we could, as an intermediate step, define an interpreter that works on bits, which may help us think about how to implement the compiler.

We want a function

; interp-bits : Expr -> Bits

such that

; ∀ e : Expr . (interp-bits e) = (value->bits (interp e))

Let’s design interp-bits by derivation. This is a common functional programming technique whereby we start from a specification program and through a series of algebraic transformations, arrive an correct implementation.

First, let’s state the specification of interp-bits as a function that “cheats,” it uses interp to carry out evaluation and then finally converts to bits.

Examples
; Expr -> Bits
> (define (interp-bits e)
(value->bits (interp e)))

It’s clearly correct with respect to the spec for interp-bits that we started with because the code is just the specification itself.

We can even do some property-based random testing (which obviously succeeds):

Examples
(define (interp-bits-correct e)
  (with-handlers ([exn:fail? (λ (x) 'ok)])
    (interp e)
    (check-equal? (interp-bits e)
                  (value->bits (interp e))
                  (format "~a" e))))
(define es
  (for/list ([i 100])
    (random-expr)))
(for-each interp-bits-correct es)

The one wrinkle is we really only need the spec to hold when e is defined according to the semantics. Since we know interp crashes (by raising an exception) whenever e is undefined, we use an exception handler to avoid testing when e is undefined.

Now let us inline the defintion of interp, i.e. let’s replace the use of interp with it’s definition:

Examples
; Expr -> Bits
(define (interp-bits e)
  (value->bits
   (match e
     [(Lit l) l]
     [(Prim1 p e)
      (interp-prim1 p (interp e))]
     [(If e1 e2 e3)
      (if (interp e1)
          (interp e2)
          (interp e3))])))

It’s still correct:

Examples
> (for-each interp-bits-correct es)

Now let’s “push” the value->bits inward by using the following equation:

(f (match m [p e] ...)) = (match m [p (f e)] ...)

So we get:

Examples
; Expr -> Bits
(define (interp-bits e)
  (match e
    [(Lit l) (value->bits l)]
    [(Prim1 p e)
     (value->bits
      (interp-prim1 p (interp e)))]
    [(If e1 e2 e3)
     (value->bits
      (if (interp e1)
          (interp e2)
          (interp e3)))]))

Still correct:

Examples
> (for-each interp-bits-correct es)

Now consider the first case, where we are calling value->bits on i, which we know is an integer whenever this clause of the match is taken. From looking at the definition of value->bits, we know (value->bits i) is (arithmetic-shift i int-shift) when i is an integer. So we can replace the RHS of the first case with (arithmetic-shift i int-shift).

We can do similar reasoning on the second case with (value->bits b) where b is a boolean. From the definition of value->bits, we can replace the RHS with (match b [#t (value->bits #t)] [#f (value->bits #f)]), which can be written more succinctly as (if b (value->bits #t) (value->bits #f)).

In the third case, let’s suppose there is an analog of interp-prim1 called interp-prim1-bits that operates on, and produces, bits. As a start, we can assume a definition that is just the specification of this function (we’ll derive a better version later):

Examples
; Op Bits -> Bits
(define (interp-prim1-bits p b)
(value->bits (interp-prim1 p (bits->value b))))

Now we can replace the RHS of the third case with (interp-prim1-bits p (interp-bits e)).

Finally, in the fourth case, we can use the following equality:

(f (if e0 e1 e2)) = (if e0 (f e1) (f e2))

to arrive at:

(if (interp e0)
(value->bits (interp e1))
(value->bits (interp e2)))

Of course, (value->bits (interp e1)) is just what interp-bits computes, so this is equivalent to:

(if (interp e0)
(interp-bits e1)
(interp-bits e2))

Now observe that (interp e0) produces #f if and only if (interp-bits e0) produces (value->bits #f). We can therefore eliminate the use of interp by replacing this conditional with:

(if (eq? (value->bits #f) (interp-bits e0))
(interp-bits e2)
(interp-bits e1))

(Notice the swapping of the then- and else-branch of the conditional.)

We’ve now arrived at the following interp-free definition of interp-bits:

Examples
; Expr -> Bits
(define (interp-bits e)
  (match e
    [(Lit l) (value->bits l)]
    [(Prim1 p e)
     (interp-prim1-bits p (interp-bits e))]
    [(If e1 e2 e3)
     (if (eq? (value->bits #f) (interp-bits e1))
         (interp-bits e3)
         (interp-bits e2))]))

And it is still correct:

Examples
> (for-each interp-bits-correct es)

We’re almost done. Now let’s derive a version of interp-prim1-bits starting from the specification we gave above. To start, replace the use of interp-prim1 with its definition:

Examples
; Op Bits -> Bits
(define (interp-prim1-bits op b)
  (value->bits
   (match op
     ['add1  (add1 (bits->value b))]
     ['sub1  (sub1 (bits->value b))]
     ['zero? (zero? (bits->value b))])))

Now push value->bits inward:

Examples
; Op Bits -> Bits
(define (interp-prim1-bits op b)
  (match op
    ['add1  (value->bits (add1 (bits->value b)))]
    ['sub1  (value->bits (sub1 (bits->value b)))]
    ['zero? (value->bits (zero? (bits->value b)))]))

Now notice the following:

(value->bits (add1 (bits->value b))) ≡ (+ b (value->bits 1)) ≡ (+ b (arithmetic-shift 1 int-shift))
(value->bits (sub1 (bits->value b))) ≡ (- b (value->bits 1)) ≡ (- b (arithmetic-shift 1 int-shift))
(value->bits (zero? (bits->value b))) ≡ (value->bits (zero? b)) ≡ (if (zero? b) (value->bits #t) (value->bits #f))

So we can define interp-prim1-bits as:

Examples
; Op Bits -> Bits
(define (interp-prim1-bits op b)
  (match op
    ['add1  (+ b (arithmetic-shift 1 int-shift))]
    ['sub1  (- b (arithmetic-shift 1 int-shift))]
    ['zero? (if (zero? b) (value->bits #t) (value->bits #f))]))

Looking back: starting from the spec, we’ve arrived at a definition of interp-bits that is completely self-contained: it doesn’t use interp at all. It also only uses the Bits representation and does no conversions to or from Values.

We can recover the original interpreter by wrapping the bit interpreter in a final conversion:

Examples
; Expr -> Value
> (define (interp.v2 e)
(bits->value (interp-bits e)))
> (interp.v2 (Lit #t))
#t
> (interp.v2 (Lit #f))
#f
> (interp.v2 (parse '(if #f 1 2)))
2
> (interp.v2 (parse '(if #t 1 2)))
1
> (interp.v2 (parse '(if 0 1 2)))
1
> (interp.v2 (parse '(if 7 1 2)))
1
> (interp.v2 (parse '(if (zero? 7) 1 2)))
2
> (interp.v2 (parse '(add1 #f)))
invalid bits
> (interp.v2 (parse '(add1 #t)))
#f

Notice the last two examples. What’s going on?

The interp.v2 function is also a correct interpreter for Dupe, and importantly, it sheds light on how to implement the compiler since it uses the same representation of values.

7.4 An Example of Dupe compilation🔗

The most significant change from Con to Dupe for the compiler is the change in representation, but having sorted those issues out at the level of interp-bits, it should be pretty easy to write the compiler.

Let’s consider some simple examples:

42: this should compile just like integer literals before, but needs to use the new representation, i.e. the compiler should produce (Mov 'rax 84), which is (* 42 2).
#f: this should produce (Mov 'rax 3).
#t: this should produce (Mov 'rax 1).
(add1 e): this should produce the instructions for e followed by an instruction to add 2, which is just how interp-bits interprets an add1.
(sub1 e): should work like (add1 e) but subtracting 2.
(zero? e): this should produce the instructions for e followed by instructions that compare 'rax to 0 and set 'rax to #t (i.e. #b01) if true and #f (i.e. #b11) otherwise.
This is a bit different from what we saw with Con, which combined conditional execution with testing for equality to 0. Here there is no need to jump anywhere based on whether e produces 0 or not. Instead we want to move either the encoding of #t or #f into 'rax depending on what e produces. To accomplish that, we can use a new kind of instruction, the conditional move instruction: Cmov.
(if e0 e1 e2): this should work much like before, compiling each subexpression, generating some labels and the appropriate comparison and conditional jump. The only difference is we now want to compare the result of executing e0 with #f (i.e. #b11) and jumping to the code for e2 when they are equal.

Examples
> (compile-e (Lit 42))
(list (Mov 'rax 84))
> (compile-e (Lit #t))
(list (Mov 'rax 1))
> (compile-e (Lit #f))
(list (Mov 'rax 3))
> (compile-e (parse '(zero? 0)))
(list
(Mov 'rax 0)
(Cmp 'rax 0)
(Mov 'rax 3)
(Mov 'r9 1)
(Cmove 'rax 'r9))
> (compile-e (parse '(if #t 1 2)))
(list
(Mov 'rax 1)
(Cmp 'rax 3)
(Je 'if6687)
(Mov 'rax 2)
(Jmp 'if6688)
(Label 'if6687)
(Mov 'rax 4)
(Label 'if6688))
> (compile-e (parse '(if #f 1 2)))
(list
(Mov 'rax 3)
(Cmp 'rax 3)
(Je 'if6689)
(Mov 'rax 2)
(Jmp 'if6690)
(Label 'if6689)
(Mov 'rax 4)
(Label 'if6690))

7.5 A Compiler for Dupe🔗

Based on the examples, we can write the compiler:

dupe/compile.rkt
  #lang racket
  (provide compile
           compile-e)

  (require "ast.rkt")
  (require "compile-ops.rkt")
  (require "types.rkt")
  (require a86/ast)

  (define rax 'rax)

  ;; Expr -> Asm
  (define (compile e)
    (prog (Global 'entry)
          (Label 'entry)
          (compile-e e)
          (Ret)))

  ;; Expr -> Asm
  (define (compile-e e)
    (match e
      [(Lit d) (compile-value d)]
      [(Prim1 p e) (compile-prim1 p e)]
      [(If e1 e2 e3)
       (compile-if e1 e2 e3)]))

  ;; Value -> Asm
  (define (compile-value v)
    (seq (Mov rax (value->bits v))))

  ;; Op1 Expr -> Asm
  (define (compile-prim1 p e)
    (seq (compile-e e)
         (compile-op1 p)))

  ;; Expr Expr Expr -> Asm
  (define (compile-if e1 e2 e3)
    (let ((l1 (gensym 'if))
          (l2 (gensym 'if)))
      (seq (compile-e e1)
           (Cmp rax (value->bits #f))
           (Je l1)
           (compile-e e2)
           (Jmp l2)
           (Label l1)
           (compile-e e3)
           (Label l2))))

The compilation of primitives, including the new zero? primitive, can be accomplished with:

dupe/compile-ops.rkt
  #lang racket
  (provide compile-op1)
  (require "ast.rkt")
  (require "types.rkt")
  (require a86/ast)

  (define rax 'rax)
  (define r9  'r9)

  ;; Op1 -> Asm
  (define (compile-op1 p)
    (match p
      ['add1 (Add rax (value->bits 1))]
      ['sub1 (Sub rax (value->bits 1))]
      ['zero?
       (seq (Cmp rax 0)
            (Mov rax (value->bits #f))
            (Mov r9  (value->bits #t))
            (Cmove rax r9))]))

We can try out the compiler with the help of asm-interp, but you’ll notice the results are a bit surprising:

Examples
> (asm-interp (compile (Lit #t)))
1
> (asm-interp (compile (Lit #f)))
3
> (asm-interp (compile (parse '(zero? 0))))
1
> (asm-interp (compile (parse '(zero? -7))))
3
> (asm-interp (compile (parse '(if #t 1 2))))
2
> (asm-interp (compile (parse '(if #f 1 2))))
4
> (asm-interp (compile (parse '(if (zero? 0) (if (zero? 0) 8 9) 2))))
16
> (asm-interp (compile (parse '(if (zero? (if (zero? 2) 1 0)) 4 5))))
8

The reason for this is asm-interp doesn’t do any interpretation of the bits it gets back; it is simply producing the integer that lives in 'rax when the assembly code finishes. This suggests adding a call to bits->value can be added to interpret the bits as values:

Examples
> (define (interp-compile e)
(bits->value (asm-interp (compile e))))
> (interp-compile (Lit #t))
#t
> (interp-compile (Lit #f))
#f
> (interp-compile (parse '(zero? 0)))
#t
> (interp-compile (parse '(zero? -7)))
#f
> (interp-compile (parse '(if #t 1 2)))
1
> (interp-compile (parse '(if #f 1 2)))
2
> (interp-compile (parse '(if (zero? 0) (if (zero? 0) 8 9) 2)))
8
> (interp-compile (parse '(if (zero? (if (zero? 2) 1 0)) 4 5)))
4

The one last peice of the puzzle is updating the run-time system to incorporate the new representation. The run-time system is essentially playing the role of bits->value: it determines what is being represented and prints it appropriately.

For the run-time system, we define the bit representations in a header file corresponding to the definitions given in types.rkt:

types.h

#ifndef TYPES_H
#define TYPES_H

/*
  Bit layout of values

  Values are either:
  - Integers:   end in  #b0
  - True:              #b01
  - False:             #b11
*/
#define int_shift        1
#define int_type_mask    ((1 << int_shift) - 1)
#define int_type_tag     (0 << (int_shift - 1))
#define nonint_type_tag  (1 << (int_shift - 1))
#define val_true  ((0 << int_shift) | nonint_type_tag)
#define val_false ((1 << int_shift) | nonint_type_tag)

#endif

It uses an idiom of “masking” in order to examine on particular bits of a value. So for example if we want to know if the returned value is an integer, we do a bitwise-and of the result and 1. This produces a single bit: 0 for integer and 1 for boolean. In the case of an integer, to recover the number being represented, we need to divide by 2, which can be done efficiently with a right-shift of 1 bit. Likewise with a boolean, if we shift right by 1 bit there are two possible results: 3 for false and 1 for true.

We use the following interface for values in the runtime system:

values.h

#ifndef VALUES_H
#define VALUES_H

#include <stdint.h>

/* any abstract value */
typedef int64_t val_t;

typedef enum type_t {
  T_INVALID = -1,
  /* immediates */
  T_INT,
  T_BOOL,
} type_t;

/* return the type of x */
type_t val_typeof(val_t x);

/**
 * Wrap/unwrap values
 *
 * The behavior of unwrap functions are undefined on type mismatch.
 */
int64_t val_unwrap_int(val_t x);
val_t val_wrap_int(int64_t i);

int val_unwrap_bool(val_t x);
val_t val_wrap_bool(int b);

#endif

values.c

#include "types.h"
#include "values.h"

type_t val_typeof(val_t x)
{
  if ((int_type_mask & x) == int_type_tag)
    return T_INT;
  
  switch (x) {
  case val_true:
  case val_false:
    return T_BOOL;
  }
  
  return T_INVALID;
}

int64_t val_unwrap_int(val_t x)
{
  return x >> int_shift;
}
val_t val_wrap_int(int64_t i)
{
  return (i << int_shift) | int_type_tag;
}
int val_unwrap_bool(val_t x)
{
  return x == val_true;
}
val_t val_wrap_bool(int b)
{
  return b ? val_true : val_false;
}

The main function remains largely the same although now we use val_t in place of int64_t:

main.c

#include <stdio.h>
#include "values.h"
#include "print.h"

val_t entry();

int main(int argc, char** argv)
{
  val_t result;

  result = entry();
  print_result(result);
  putchar('\n');
  return 0;
}

And finally, print_result is updated to do a case analysis on the type of the result and print accordingly:

print.c

#include <stdio.h>
#include <inttypes.h>
#include "values.h"

void print_result(val_t x)
{
  switch (val_typeof(x)) {
  case T_INT:
    printf("%" PRId64, val_unwrap_int(x));
    break;
  case T_BOOL:    
    printf(val_unwrap_bool(x) ? "#t" : "#f");
    break;
  case T_INVALID:
    printf("internal error");
  }
}

7.6 Correctness and testing🔗

We can randomly generate Dupe programs. The problem is many randomly generated programs will have type errors in them:

Examples
> (require "random.rkt")
> (random-expr)
'#s(Prim1 zero? #s(Prim1 sub1 #s(Lit #t)))
> (random-expr)
'#s(If
    #s(Prim1
       sub1
       #s(Prim1 zero? #s(Prim1 add1 #s(Lit #t))))
    #s(Prim1
       add1
       #s(If
          #s(Prim1 sub1 #s(Lit -2))
          #s(Lit #t)
          #s(Lit #f)))
    #s(Lit #f))
> (random-expr)
'#s(Lit #t)
> (random-expr)
'#s(Lit -17)
> (random-expr)
'#s(Lit 2)
> (random-expr)
'#s(Lit -10)

When interpreting programs with type errors, we get Racket errors, i.e. the Racket functions used in the implementation of the interpreter will signal an error:

Examples
> (interp (parse '(add1 #f)))
add1: contract violation
  expected: number?
  given: #f
> (interp (parse '(if (zero? #t) 7 8)))
zero?: contract violation
  expected: number?
  given: #t

On the other hand, the compiler may produce bits that are illegal or, even worse, simply do something by misinterpreting the meaning of the bits:

Examples
> (interp-compile (parse '(add1 #f)))
invalid bits
> (interp-compile (parse '(if (zero? #t) 7 8)))
8

This complicates testing the correctness of the compiler. Consider our usual appraoch:

Examples
> (define (check-correctness e)
(check-equal? (interp-compile e)
(interp e)))
> (check-correctness (parse '(add1 7)))

This isn’t a counter-example to correctness because '(add1 #f) is not meaningful according to the semantics. Consequently the interpreter and compiler are free to do anything on this input.

Since we know Racket will signal an error when the interpreter tries to interpret a meaningless expression, we can write an alternate check-correctness function that first runs the interpreter with an exception handler installed. Should an error occur, the test is ignored, otherwise the value produced is compared to that of the compiler:

Examples
> (define (check-correctness e)
    (with-handlers ([exn:fail? void])
      (let ((v (interp e)))
        (check-equal? v (interp-compile e)))))
> (check-correctness (parse '(add1 7)))
> (check-correctness (parse '(add1 #f)))

Using this approach, we check the equivalence of the results only when the interpreter runs without causing an error.

contents ← prev up next →

	Syllabus
	Texts
	Schedule
	Notes
	Assignments
	Midterms
	Project
	Software

1	What is a Compiler?
2	From OCaml to Racket
3	a86: a Little Assembly Language
4	Abscond: a language of numbers
5	Blackmail: incrementing and decrementing
6	Con: branching with conditionals
7	Dupe: a duplicity of types
8	Dodger: addressing a lack of character
9	Evildoer: change the world a couple nibbles at a time
10	Extort: when errors exist
11	Fraud: local binding, variables, and binary operations
12	Hustle: heaps and lists
13	Hoax: vectors and strings
14	Iniquity: function definitions and calls
15	Jig: jumping to tail calls
16	Knock: pattern matching
17	Loot: lambda the ultimate
18	Mug: symbols and interned string literals
19	Mountebank: quote and compound static data
20	Neerdowell: structures
21	Outlaw: self-hosting

7.1	Integers and Booleans
7.2	Meaning of Dupe programs
7.3	Ex uno plures: Out of One, Many
7.4	An Example of Dupe compilation
7.5	A Compiler for Dupe
7.6	Correctness and testing