On this page:
7.1 Conditional execution
7.2 Meaning of Con programs
7.3 An Example of Con compilation
7.4 A Compiler for Con
7.5 Correctness and random testing
8.18

7 Con: branching with conditionals🔗

image Source code.

When you come to a fork in the road, take it.

    7.1 Conditional execution

    7.2 Meaning of Con programs

    7.3 An Example of Con compilation

    7.4 A Compiler for Con

    7.5 Correctness and random testing

7.1 Conditional execution🔗

Let’s now consider adding a notion of conditionals to our target language.

We’ll call it Con.

We will use the following concrete syntax: (if (zero? e0) e1 e2).

This leads to the following grammar for concrete Con:

image

And abstract grammar:

con/syntax/ast.rkt

  #lang racket
  (provide Lit Prim1
           IfZero)
   
  ;; type Expr = (Lit Integer)
  ;;           | (Prim1 Op1 Expr)
  ;;           | (IfZero Expr Expr Expr)
  ;; type Op1 = 'add1 | 'sub1
  (struct Lit (i) #:prefab)
  (struct Prim1 (p e) #:prefab)
  (struct IfZero (e1 e2 e3) #:prefab)
   
   

The parser is similar to what we’ve seen before:

con/syntax/parse.rkt

  #lang racket
  (provide parse)
  (require "ast.rkt")
   
  ;; S-Expr -> Expr
  (define (parse s)
    (match s
      [(? exact-integer?) (Lit s)]
      [(list-rest (? symbol? k) sr)
       (match k
         [(? op1? o)
          (match sr
            [(list s1)
             (Prim1 o (parse s1))]
            [_ (error "op1: bad syntax" s)])]
         ['if
          (match sr
            [(list (list 'zero? s1) s2 s3)
             (IfZero (parse s1) (parse s2) (parse s3))]
            [_ (error "if: bad syntax" s)])]
         [_ (error "parse error" s)])]
      [_ (error "parse error" s)]))
   
  (define (op1? x)
    (memq x '(add1 sub1)))
   
   

Examples

> (parse '(if (zero? 42) 1 2))

'#s(IfZero #s(Lit 42) #s(Lit 1) #s(Lit 2))

> (parse '(if (zero? (sub1 1)) (add1 2) (sub1 7)))

'#s(IfZero

    #s(Prim1 sub1 #s(Lit 1))

    #s(Prim1 add1 #s(Lit 2))

    #s(Prim1 sub1 #s(Lit 7)))

> (parse '(if (zero? 0) (if (zero? 1) 2 3) 4))

'#s(IfZero

    #s(Lit 0)

    #s(IfZero #s(Lit 1) #s(Lit 2) #s(Lit 3))

    #s(Lit 4))

7.2 Meaning of Con programs🔗

The meaning of Con programs depends on the form of the expression and the new form is an if-expression.

  • the meaning of a if expression (IfZero e0 e1 e2) is the meaning of e1 if the meaning of e0 is 0 and is the meaning of e2 otherwise.

Let’s consider some examples (using concrete notation):

The semantics is defined by extending the interpreter to add a case for if-expressions, which recursively evaluates the test expression and branches based on its value.

con/interpreter/interp.rkt

  #lang racket
  (provide interp)
  (require "../syntax/ast.rkt")
  (require "interp-prim.rkt")
   
  ;; Expr -> Integer
  (define (interp e)
    (match e
      [(Lit i) i]
      [(Prim1 p e)
       (interp-prim1 p (interp e))]
      [(IfZero e1 e2 e3)
       (if (zero? (interp e1))
           (interp e2)
           (interp e3))]))
   
   

We can confirm the interpreter computes the right result for the examples given earlier (using parse to state the examples with concrete notation):

Examples

> (interp (parse '(if (zero? 0) (add1 2) 4)))

3

> (interp (parse '(if (zero? 1) (add1 2) 4)))

4

> (interp (parse '(if (zero? (if (zero? (sub1 1)) 1 0)) (add1 2) 4)))

4

> (interp (parse '(if (zero? (add1 0)) (add1 2) (if (zero? (sub1 1)) 1 0))))

1

7.3 An Example of Con compilation🔗

Suppose we want to compile (if (zero? 8) 2 3)...

We already know how to compile the 8, 2, and 3 part.

What needs to happen?

  • Execute the code for 8 leaving the result in rax,

  • check whether rax holds zero,

  • if it does, execute the code for 2,

  • if it doesn’t, execute the code for 3.

We can determine whether 8 evaluates to 0 using a comparison instruction: (Cmp rax 0). To do the conditional execution, we will need to jump to different parts of the code to either execute the code for 2 or 3. There are several ways we could accomplish this, but we take the following approach: immediately after the comparison, do a conditional jump to the code for the else branch when not zero, otherwise fall through to the code for the then branch. At the end of the then branch, unconditionally jump to the end of the code.

To accomplish this, we will need two labels: one for the else branch code and one for the end of the code.

In total, the code for this example would look like:

(list (Mov rax 8)
      (Cmp rax 0)
      (Jne 'else)
      (Mov rax 2)
      (Jmp 'end)
      (Label 'else)
      (Mov rax 3)
      (Label 'end))

Since (if (zero? 8) 2 3) evaluates to 3, we expect that this code will leave 3 in the rax after executing. Step through the code in your mind and try to confirm this. After that, you can run the code using asm-interp to confirm mechanically.

Examples

> (asm-interp
    (prog (Global 'entry)
          (Label 'entry)
          (Mov rax 8)
          (Cmp rax 0)
          (Jne 'else)
          (Mov rax 2)
          (Jmp 'end)
          (Label 'else)
          (Mov rax 3)
          (Label 'end)
          (Ret)))

3

This does indeed seem to do the right thing. If changed the example to (if (zero? 0) 2 3) we would expect to get 2. We can revise the code by changing what we initially put in rax and confirm:

Examples

> (asm-interp
    (prog (Global 'entry)
          (Label 'entry)
          (Mov rax 0) ; changed from 8 to 0
          (Cmp rax 0)
          (Jne 'else)
          (Mov rax 2)
          (Jmp 'end)
          (Label 'else)
          (Mov rax 3)
          (Label 'end)
          (Ret)))

2

From these two concrete examples, it would seem we’re well on our way to having a compiler. The way to compile (if (zero? e0) e1 e2) is:

(seq (compile-e e0)
     (Cmp rax 0)
     (Jne 'else)
     (compile-e e1)
     (Jmp 'end)
     (Label 'else)
     (compile-e e2)
     (Label 'end))

But there is a lurking problem here. See if you can spot it.

Although our sketch works for the simple examples we used, a more compilicated example can demonstrate this compiler is incorrect. In fact, we can write examples that will cause the compiler to error instead of producing code, which definitely violates our specification of correctness.

The issue is that the compiler needs to be correct for all possible subexpressions e0, e1, and e2. While it works for things like 8, 2, and 3, it will not work for certain kinds of subexpressions, namely subexpressions that themselves include conditional expressions. To see write out what you expect (if (zero? 8) (if (zero? 0) 2 3) 3) should compile to. What do you notice?

7.4 A Compiler for Con🔗

The problem with our sketch for compiling conditional expressions is the choice of label names. We used 'else and 'end as labels that get defined in the output of compiling a conditional. This works fine so long as there’s only one conditional expression, but the moment there is more than one, we have label confusion: there will be multiple ambiguous definitions of 'else and 'end. The assembly will reject such programs and prog will error. This is a real bug because clearly nested conditionals are valid Con expressions and our interpreter handles them just fine, so our compiler is obligated to produce correct code. In this case, the compiler doesn’t even produce code, much less correct code.

The fix is to generate new label names ever time we compile a conditional. These fresh names will avoid clashes. In Racket, the gensym function can be used to generate symbols that have not appeared before, which is exactly the property we need here. What symbol will Racket choose? We can try things out in the REPL to see.

Examples

> (gensym)

'g6998

> (gensym)

'g6999

> (gensym)

'g7000

You can see that Racket is just giving us arbitrary symbol names, but they have the important property that are unique.

Although not technically necessary it might be nice to make the symbols more readable by telling gensym a prefix to use:

Examples

> (gensym 'ifz)

'ifz7001

> (gensym 'ifz)

'ifz7002

This way we can leave hints in the assembly code about where the labels came from.

Now for our compiler, we can do the following:

(let ((l1 (gensym 'ifz))
      (l2 (gensym) 'ifz))
  (seq (compile-e e0)
       (Cmp rax 0)
       (Jne l1)
       (compile-e e1)
       (Jmp l2)
       (Label l1)
       (compile-e e2)
       (Label l2)))

This will generate a new symbol and bind it to the variable l1 and similiarly for l2.

We’re now ready to write the complete compiler for Con.

con/compiler/compile.rkt

  #lang racket
  (provide compile
           compile-e)
   
  (require "../syntax/ast.rkt")
  (require "compile-ops.rkt")
  (require a86/ast a86/registers)
   
  ;; Expr -> Asm
  (define (compile e)
    (prog (Global 'entry)
          (Label 'entry)
          (compile-e e)
          (Ret)))
   
  ;; Expr -> Asm
  (define (compile-e e)
    (match e
      [(Lit i) (seq (Mov rax i))]
      [(Prim1 p e) (compile-prim1 p e)]
      [(IfZero e1 e2 e3) (compile-ifzero e1 e2 e3)]))
   
  ;; Op1 Expr -> Asm
  (define (compile-prim1 p e)
    (seq (compile-e e)
         (compile-op1 p)))
   
  ;; Expr Expr Expr -> Asm
  (define (compile-ifzero e1 e2 e3)
    (let ((l1 (gensym 'ifz))
          (l2 (gensym 'ifz)))
      (seq (compile-e e1)
           (Cmp rax 0)
           (Jne l1)
           (compile-e e2)
           (Jmp l2)
           (Label l1)
           (compile-e e3)
           (Label l2))))
   
   

Let’s take a look at a few examples:

Examples

> (define (show s)
    (compile-e (parse s)))
> (show '(if (zero? 8) 2 3))

(list

 (Mov 'rax 8)

 (Cmp 'rax 0)

 (Jne 'ifz7005)

 (Mov 'rax 2)

 (Jmp 'ifz7006)

 (Label 'ifz7005)

 (Mov 'rax 3)

 (Label 'ifz7006))

> (show '(if (zero? 0) 1 2))

(list

 (Mov 'rax 0)

 (Cmp 'rax 0)

 (Jne 'ifz7007)

 (Mov 'rax 1)

 (Jmp 'ifz7008)

 (Label 'ifz7007)

 (Mov 'rax 2)

 (Label 'ifz7008))

> (show '(if (zero? 0) (if (zero? 0) 8 9) 2))

(list

 (Mov 'rax 0)

 (Cmp 'rax 0)

 (Jne 'ifz7009)

 (Mov 'rax 0)

 (Cmp 'rax 0)

 (Jne 'ifz7011)

 (Mov 'rax 8)

 (Jmp 'ifz7012)

 (Label 'ifz7011)

 (Mov 'rax 9)

 (Label 'ifz7012)

 (Jmp 'ifz7010)

 (Label 'ifz7009)

 (Mov 'rax 2)

 (Label 'ifz7010))

> (show '(if (zero? (if (zero? 2) 1 0)) 4 5))

(list

 (Mov 'rax 2)

 (Cmp 'rax 0)

 (Jne 'ifz7015)

 (Mov 'rax 1)

 (Jmp 'ifz7016)

 (Label 'ifz7015)

 (Mov 'rax 0)

 (Label 'ifz7016)

 (Cmp 'rax 0)

 (Jne 'ifz7013)

 (Mov 'rax 4)

 (Jmp 'ifz7014)

 (Label 'ifz7013)

 (Mov 'rax 5)

 (Label 'ifz7014))

And confirm they are running as expected:

Examples

> (asm-interp (compile (parse '(if (zero? 8) 2 3))))

3

> (asm-interp (compile (parse '(if (zero? 0) 1 2))))

1

> (asm-interp (compile (parse '(if (zero? 0) (if (zero? 0) 8 9) 2))))

8

> (asm-interp (compile (parse '(if (zero? (if (zero? 2) 1 0)) 4 5))))

4

7.5 Correctness and random testing🔗

The statement of correctness follows the same outline as before:

Compiler Correctness: For all e Expr, (interp e) equals (asm-interp (compile e)).

Again, we formulate correctness as a property that can be tested:

con/correct.rkt

  #lang racket
  (provide check-compiler)
  (require rackunit)
  (require "interpreter/interp.rkt")
  (require "compiler/compile.rkt")
  (require a86/interp)
   
  ;; Expr -> Void
  (define (check-compiler e)
    (check-equal? (interp e)
                  (asm-interp (compile e))))
   
   

Generating random Con programs is essentially the same as Blackmail programs, and are provided in a random.rkt module.

Examples

> (require "syntax/random.rkt")
> (random-expr)

'#s(Prim1

    sub1

    #s(IfZero

       #s(Lit -9)

       #s(IfZero

          #s(Prim1 add1 #s(Lit 1))

          #s(IfZero

             #s(Lit -2)

             #s(Lit -1)

             #s(Lit -3))

          #s(Lit 2))

       #s(Lit -3)))

> (random-expr)

'#s(IfZero

    #s(Prim1

       sub1

       #s(Prim1 sub1 #s(Prim1 sub1 #s(Lit 2))))

    #s(IfZero

       #s(Prim1 sub1 #s(Lit 0))

       #s(IfZero

          #s(Prim1 add1 #s(Lit 3))

          #s(IfZero

             #s(Lit 2)

             #s(Lit -1)

             #s(Lit 4))

          #s(IfZero

             #s(Lit 2)

             #s(Lit -3)

             #s(Lit -1)))

       #s(Lit 2))

    #s(IfZero

       #s(Prim1 add1 #s(Prim1 add1 #s(Lit 3)))

       #s(Lit -8)

       #s(Prim1 sub1 #s(Lit -2))))

> (random-expr)

'#s(Prim1

    sub1

    #s(IfZero

       #s(Prim1 sub1 #s(Prim1 sub1 #s(Lit 2)))

       #s(Lit 1)

       #s(Prim1 add1 #s(Prim1 sub1 #s(Lit 1)))))

> (random-expr)

'#s(Prim1

    add1

    #s(Prim1

       add1

       #s(Prim1

          add1

          #s(IfZero

             #s(Lit -1)

             #s(Lit -2)

             #s(Lit 1)))))

> (for ([i (in-range 10)])
    (check-compiler (random-expr)))

This compiler has continues to have the issues identified in A Broken Compiler, but appears correct in its implementation of conditional expressions.