week10

Original Calculator
Adding Booleans and Pairs
Adding Function Values
Adding Recursive Functions
Adding Mutable Cells
Adding Mutable Pairs
Adding Implicit Mutation
Adding Aliasing and Passing by Reference

In class on November 22, we just reviewed mutation and aliasing, especially in relation to the Groenendijk, Stokhof and Veltman paper. Below, we bring in more material. This takes the form of making gradual improvements to the calculator we developed in week7. Part of what we do below is a review of the mutation techniques developed in Week9; but we also do other things we haven't discussed in class, like defining new complex functions in our calculator.

Original Calculator

In a real programming application, one would usually start with a string that needs to be parsed and interpreted, such as:

let x = 1 in let y = x + 2 in x * y

The parsing phase converts this to an "abstract syntax tree" (AST), which in this case might be:

Let ('x', Constant 1,
     Let ('y', Addition (Variable 'x', Constant 2),
          Multiplication (Variable 'x', Variable 'y')))

Then the interpreter (or "evaluator") would convert that AST into an "expressed value": in this case, to the integer 3. We're not concerning ourselves with the parsing phase here, so we're just thinking about how to interpret expressions that are already in AST form.

The language we had in week 7 looked like this:

type term =
  Constant of int
| Multiplication of (term * term)
| Addition of (term * term)
| Variable of char
| Let of (char * term * term)
;;

and the evaluation function looked like this:

let rec eval (t : term) (e: (char * int) list) = match t with
  Constant x -> x
| Multiplication (t1, t2) -> (eval t1 e) * (eval t2 e)
| Addition (t1, t2) -> (eval t1 e) + (eval t2 e)
| Variable c ->
    (* lookup the value of c in the current environment
       This will fail if c isn't assigned anything by e *)
    List.assoc c e
| Let (c, t1, t2) ->
    (* evaluate t2 in a new environment where c has been associated
       with the result of evaluating t1 in the current environment *)
    eval t2 ((c, eval t1 e) :: e)
;;

Adding Booleans and Pairs

Let's tweak this a bit.

First, let's abstract away from the assumption that our terms always evaluate to ints. Let's suppose they evaluate to a more general type, which might have an int payload, or might have, for example, a bool payload.

type expressed_value = Int of int | Bool of bool;;

We'll add one boolean predicate, Iszero, and an If... construction.

Similarly, we might allow for some terms to express pairs of other terms:

type expressed_value = Int of int | Bool of bool | Pair of expressed_value * expressed_value;;

We'd then want to add the ability to construct pairs, and extract their components.

We won't try here to catch any type errors, such as attempts to add a bool to an int, or attempts to check whether a bool iszero. Neither will we try here to monadize anything: these will be implementations of a calculator with all the plumbing exposed. What we will do is add more and more features to the calculator.

We'll switch over to using variable g for assignment functions, which is a convention many of you seem familiar with. As we mentioned a few times in week9, for some purposes it's easier to implement environment or assignment functions as functions from chars to ints (or whatever variables are bound to), rather than as lists of pairs. However, we'll stick with this implementation for now. We will however abstract out the type that the variables are bound to. For now, we'll suppose that they're bound to the same types that terms can express.

type bound_value = expressed_value;;
type assignment = (char * bound_value) list;;

Here's where we should be now. We'll work with the language:

type term =
  Intconstant of int
| Multiplication of (term * term)
| Addition of (term * term)
| Variable of char
| Let of (char * term * term)
| Iszero of term
| If of (term * term * term)
| Makepair of (term * term)
| First of term
;;

Here is our evaluation function. We expand some of the clauses and rename a few variables for clarity. Our implementation should make it clear how to add additional constants or native predicates, such as a Second predicate for extracting the second element of a pair.

let rec eval (t : term) (g : assignment) = match t with
  Intconstant x -> Int x
| Multiplication (t1, t2) ->
    (* we don't handle cases where the subterms don't evaluate to Ints *)
    let Int i1 = eval t1 g
    in let Int i2 = eval t2 g
    (* Multiplication (t1, t2) should evaluate to an Int *)
    in Int (i1 * i2)
| Addition (t1, t2) ->
    let Int i1 = eval t1 g
    in let Int i2 = eval t2 g
    in Int (i1 + i2)
| Variable (var) ->
    (* we don't handle cases where g doesn't bind var to any value *)
    List.assoc var g
| Let (var_to_bind, t2, t3) ->
    (* evaluate t3 under a new assignment where var_to_bind has been bound to
       the result of evaluating t2 under the current assignment *)
    let value2 = eval t2 g
    in let g' = (var_to_bind, value2) :: g
    in eval t3 g'
| Iszero (t1) ->
    (* we don't handle cases where t1 doesn't evaluate to an Int *)
    let Int i1 = eval t1 g
    (* Iszero t1 should evaluate to a Bool *)
    in Bool (i1 = 0)
| If (t1, t2, t3) ->
    (* we don't handle cases where t1 doesn't evaluate to a boolean *)
    let Bool b1 = eval t1 g
    in if b1 then eval t2 g
    else eval t3 g
| Makepair (t1, t2) ->
    let value1 = eval t1 g
    in let value2 = eval t2 g
    in Pair (value1, value2)
| First (t1) ->
    (* we don't handle cases where t1 doesn't evaluate to a Pair *)
    let Pair (value1, value2) = eval t1 g
    in value1
;;

The complete code is available here.

Adding Function Values

Now we want to add function values to our language, so that we can interpret (the abstract syntax trees of) expressions like this:

let x = 1 in let f = lambda y -> y + x in apply f 2

What changes do we need to handle this?

We can begin with our language:

type term =
  Intconstant of int
| Multiplication of (term * term)
| Addition of (term * term)
| Variable of char
| Let of (char * term * term)
| Iszero of term
| If of (term * term * term)
| Makepair of (term * term)
| First of term
| Lambda of (char * term)
| Apply of (term * term)
;;

Next, we need to expand our stock of expressed_values to include function values as well. How should we think of these? We've several times mentioned the issue of how to handle free variables in a function's body, like the x in lambda y -> y + x. We'll follow the usual functional programming standard for these (known as "lexical scoping"), which keeps track of what value x has in the function declaration's lexical environment. That shouldn't get shadowed by any different value x may have when the function value is later applied. So:

let x = 1 in let f = lambda y -> y + x in let x = 2 in apply f 2

should evaluate to 3 not to 4. To handle this, the function values we construct need to keep track of the present values of all free variables in the function's body. The combination of the function's body and the values of its free variables is called a "function closure." We'll implement these closures in a straightforward though inefficient way: we'll just stash away a copy of the assignment in effect when the function value is being constructed. Our function values also need to keep track of which of their variables are to be bound to the arguments they get applied to. All together, then, we need three pieces of information: which variables are to be bound to arguments, what the function's body is, and something that keeps track of the right values for the free variables in the function body. We'll pack this all together into an additional variant for our expressed_value type:

type expressed_value = Int of int | Bool of bool | Pair of expressed_value * expressed_value | Closure of char * term * assignment;;

We'd like to define bound_values and assignments just as before:

type bound_value = expressed_value;;
type assignment = (char * bound_value) list;;

However, note that we have a recursive relation between these types: expressed_value is defined partly in terms of assignment, which is defined partly in terms of bound_value, which is equivalent to expressed_value. In OCaml one has to define such types using the following form:

type bound_value = expressed_value
and assignment = (char * bound_value) list
and expressed_value = Int of int | Bool of bool | Pair of expressed_value * expressed_value | Closure of char * term * assignment;;

Now our evaluation function needs two further clauses to interpret the two new expression forms Lambda (...) and Apply (...):

let rec eval (t : term) (g : assignment) = match t with
...
| Lambda (arg_var, t2) -> Closure (arg_var, t2, g)
| Apply (t1, t2) ->
    (* we don't handle cases where t1 doesn't evaluate to a function value *)
    let Closure (arg_var, body, savedg) = eval t1 g
    in let value2 = eval t2 g
    (* evaluate body under savedg, except with arg_var bound to value2 *)
    in let savedg' = (arg_var, value2) :: savedg
    in eval body savedg'
;;

The complete code is available here.

Adding Recursive Functions

There are different ways to include recursion in our calculator. First, let's imagine our language expanded like this:

let x = 1 in letrec f = lambda y -> if iszero y then x else y * apply f (y - 1) in apply f 3

where the AST would be:

Let ('x', Intconstant 1,
    Letrec ('f',
        Lambda ('y',
            If (Iszero (Variable 'y'),
                Variable 'x',
                Multiplication (Variable 'y',
                    Apply (Variable 'f',
                        Addition (Variable 'y', Intconstant (-1)))))),
        Apply (Variable 'f', Intconstant 3)))

Here is the expanded definition for our language type:

type term =
  Intconstant of int
| Multiplication of (term * term)
| Addition of (term * term)
| Variable of char
| Let of (char * term * term)
| Iszero of term
| If of (term * term * term)
| Makepair of (term * term)
| First of term
| Lambda of (char * term)
| Apply of (term * term)
| Letrec of (char * term * term)
;;

Now consider what we'll need to do when evaluating a term like Letrec ('f', Lambda (...), t2). The subterm Lambda (...) will evaluate to something of the form Closure ('y', body, savedg), where Variable 'f' may occur free in body. What we'll want to do is to ensure that when body is applied, it's applied using not the assignment savedg but a modified assignment savedg' which binds 'f' to this very function value. That is, we want to bind 'f' not to:

Closure ('y', body, savedg)

but instead to:

let orig_closure = Closure ('y', body, savedg)
in let savedg' = ('f', orig_closure) :: savedg
in let new_closure = Closure ('y', body, savedg')
in new_closure

Except, this isn't quite right. It's almost what we want, but not exactly. Can you see the flaw?

The flaw is this: inside new_closure, what is 'f' bound to? It's bound by savedg' to orig_closure, which in turn leaves 'f' free (or bound to whatever existing value it had according to savedg). This isn't what we want. It'll break if we need to make applications of Variable 'f' which recurse more than once.

What we really want is for 'f' to be bound to new_closure, something like this:

let rec new_closure = Closure ('y', body, ('f', new_closure) :: savedg)
in new_closure

And as a matter of fact, OCaml does permit us to recursively define cyclical lists in this way. So a minimal change to our evaluation function would suffice:

let rec eval (t : term) (g : assignment) = match t with
...
| Letrec (var_to_bind, t2, t3) ->
    (* we don't handle cases where t2 doesn't evaluate to a function value *)
    let Closure (arg_var, body, savedg) = eval t2 g
    in let rec new_closure = Closure (arg_var, body, (var_to_bind, new_closure) :: savedg)
    in let g' = (var_to_bind, new_closure) :: g
    in eval t3 g'
;;

However, this is a somewhat exotic ability in a programming language, so it would be good to work out how to interpret Letrec (...) forms without relying on it.

If we implemented assignments as functions rather than as lists of pairs, the corresponding move would be less exotic. In that case, our Let (...) and Letrec (...) clauses would look something like this:

| Let (var_to_bind, t2, t3) ->
    let value2 = eval t2 g
    in let g' = fun var -> if var = var_to_bind then value2 else g var
    in eval t3 g'
...
| Letrec (var_to_bind, t2, t3) ->
    let Closure (arg_var, body, savedg) = eval t2 g
    in let rec savedg' = fun var -> if var = var_to_bind then Closure (arg_var, body, savedg') else savedg var
    in let g' = fun var -> if var = var_to_bind then Closure (arg_var, body, savedg') else g var
    in eval t3 g'
;;

and this is just a run-of-the-mill use of recursive functions. However, for this exercise we'll continue using lists of pairs, and work out how to interpret Letrec (...) forms using them.

The way we'll do this is that, when we bind a variable to a value, we'll keep track of whether the term was bound via Let or Letrec. We'll rely on that to interpret pairs of terms like these differently:

Let ('f',
    Intconstant 1,
    Let ('f', Lambda ('y', Variable 'f')),
    ...)

Let ('f',
    Intconstant 1,
    Letrec ('f', Lambda ('y', Variable 'f')),
    ...)

In the first case, an application of Variable 'f' to any argument should evaluate to Int 1; in the second case, it should evaluate to the same function closure that Variable 'f' evaluates to. We'll keep track of which way a variable was bound by expanding our bound_value type:

type bound_value = Nonrecursive of expressed_value |
    Recursive_Closure of char * char * term * assignment
and assignment = (char * bound_value) list
and expressed_value = Int of int | Bool of bool | Pair of expressed_value * expressed_value | Closure of char * term * assignment;;

Since we're not permitting ourselves OCaml's ability to recursively define cyclical lists, we're not going to be able to update the saved assignment in a closure when that closure is recursively bound to a variable. Instead, we'll just make a note that variable 'f' is supposed to be the recursively bound one---by binding it not to Nonrecursive (Closure (arg_var, body, savedg)) but rather to Recursive_Closure ('f', arg_var, body, savedg). We'll do the work to make the saved assignment recursive in the right way later, when we evaluate Variable 'f'. The result will look like this:

let rec eval (t : term) (g : assignment) = match t with
...
| Variable (var) -> (
    (* we don't handle cases where g doesn't bind var to any value *)
    match List.assoc var g with
      | Nonrecursive value -> value
      | Recursive_Closure (self_var, arg_var, body, savedg) as rec_closure ->
          (* we update savedg to bind self_var to rec_closure here *)
          let savedg' = (self_var, rec_closure) :: savedg
          in Closure (arg_var, body, savedg')
    )
| Let (var_to_bind, t2, t3) ->
    (* evaluate t3 under a new assignment where var_to_bind has been bound to
       the result of evaluating t2 under the current assignment *)
    let value2 = eval t2 g
    (* we have to wrap value2 in Nonrecursive *)
    in let g' = (var_to_bind, Nonrecursive value2) :: g
    in eval t3 g'
...
| Lambda (arg_var, t2) -> Closure (arg_var, t2, g)
| Apply (t1, t2) ->
    (* we don't handle cases where t1 doesn't evaluate to a function value *)
    let Closure (arg_var, body, savedg) = eval t1 g
    in let value2 = eval t2 g
    (* evaluate body under savedg, except with arg_var bound to Nonrecursive value2 *)
    in let savedg' = (arg_var, Nonrecursive value2) :: savedg
    in eval body savedg'
| Letrec (var_to_bind, t2, t3) ->
    (* we don't handle cases where t2 doesn't evaluate to a function value *)
    let Closure (arg_var, body, savedg) = eval t2 g
    (* evaluate t3 under a new assignment where var_to_bind has been recursively bound to that function value *) 
    in let g' = (var_to_bind, Recursive_Closure (var_to_bind, arg_var, body, savedg)) :: g
    in eval t3 g'
;;

The complete code is available here.

Adding Mutable Cells

Next, we'll add mutable cells (explicit-style mutation) to our calculator, as we did in week9.

We'll add a few more syntactic forms to the language:

type term =
  Intconstant of int
| Multiplication of (term * term)
| Addition of (term * term)
| Variable of char
| Let of (char * term * term)
| Iszero of term
| If of (term * term * term)
| Makepair of (term * term)
| First of term
| Lambda of (char * term)
| Apply of (term * term)
| Letrec of (char * term * term)
| Newref of term
| Deref of term
| Setref of (term * term)
;;

And we now have to allow for Mutcells as an additional kind of expressed_value. These are implemented as wrappers around an index into a store:

type index = int;;

type bound_value = Nonrecursive of expressed_value |
    Recursive_Closure of char * char * term * assignment
and assignment = (char * bound_value) list
and expressed_value = Int of int | Bool of bool | Pair of expressed_value * expressed_value | Closure of char * term * assignment | Mutcell of index;;

type store = expressed_value list;;

Our evaluation function will now expect a store argument as well as an assignment, and will return an expressed_value * store pair:

let rec eval (t : term) (g : assignment) (s : store) = match t with
  Intconstant x -> (Int x, s)
  ...
| Variable (var) -> ((
    (* we don't handle cases where g doesn't bind var to any value *)
    match List.assoc var g with
      | Nonrecursive value -> value
      | Recursive_Closure (self_var, arg_var, body, savedg) as rec_closure ->
          (* we update savedg to bind self_var to rec_closure here *)
          let savedg' = (self_var, rec_closure) :: savedg
          in Closure (arg_var, body, savedg')
    ), s)
  ...
| Lambda (arg_var, t2) -> (Closure (arg_var, t2, g), s)
  ...

also, we'll need to be sure to thread the store argument through the evaluation of any subterms, as here:

...
| Multiplication (t1, t2) ->
    (* we don't handle cases where the subterms don't evaluate to Ints *)
    let (Int i1, s') = eval t1 g s
    in let (Int i2, s'') = eval t2 g s'
    (* Multiplication (t1, t2) should evaluate to an Int *)
    in (Int (i1 * i2), s'')
| Addition (t1, t2) ->
    let (Int i1, s') = eval t1 g s
    in let (Int i2, s'') = eval t2 g s'
    in (Int (i1 + i2), s'')
...
| Let (var_to_bind, t2, t3) ->
    (* evaluate t3 under a new assignment where var_to_bind has been bound to
       the result of evaluating t2 under the current assignment *)
    let (value2, s') = eval t2 g s
    (* we have to wrap value2 in Nonrecursive *)
    in let g' = (var_to_bind, Nonrecursive value2) :: g
    in eval t3 g' s'
| Iszero (t1) ->
    (* we don't handle cases where t1 doesn't evaluate to an Int *)
    let (Int i1, s') = eval t1 g s
    (* Iszero t1 should evaluate to a Bool *)
    in (Bool (i1 = 0), s')
...
| Makepair (t1, t2) ->
    let (value1, s') = eval t1 g s
    in let (value2, s'') = eval t2 g s'
    in (Pair (value1, value2), s'')
| First (t1) ->
    (* we don't handle cases where t1 doesn't evaluate to a Pair *)
    let (Pair (value1, value2), s') = eval t1 g s
    in (value1, s')
...
| Apply (t1, t2) ->
    (* we don't handle cases where t1 doesn't evaluate to a function value *)
    let (Closure (arg_var, body, savedg), s') = eval t1 g s
    in let (value2, s'') = eval t2 g s'
    (* evaluate body under savedg, except with arg_var bound to Nonrecursive value2 *)
    in let savedg' = (arg_var, Nonrecursive value2) :: savedg
    in eval body savedg' s''
| Letrec (var_to_bind, t2, t3) ->
    (* we don't handle cases where t2 doesn't evaluate to a function value *)
    let (Closure (arg_var, body, savedg), s') = eval t2 g s
    (* evaluate t3 under a new assignment where var_to_bind has been recursively bound to that function value *) 
    in let g' = (var_to_bind, Recursive_Closure (var_to_bind, arg_var, body, savedg)) :: g
    in eval t3 g' s'
...

The clause for If (...) is notable:

...
| If (t1, t2, t3) ->
    (* we don't handle cases where t1 doesn't evaluate to a boolean *)
    let (Bool b1, s') = eval t1 g s
    (* note we thread s' through only one of the then/else clauses *)
    in if b1 then eval t2 g s'
    else eval t3 g s'
...

Now we need to formulate the clauses for evaluating the new forms Newref (...), Deref (...), and Setref (...).

...
| Newref (t1) ->
    let (value1, s') = eval t1 g s
    (* note that s' may be different from s, if t1 itself contained any mutation operations *)
    (* now we want to retrieve the next free index in s' *)
    in let new_index = List.length s'
    (* now we want to insert value1 there; the following is an easy but inefficient way to do it *)
    in let s'' = List.append s' [value1]
    (* now we return a pair of a wrapped new_index, and the new store *)
    in (Mutcell new_index, s'')
| Deref (t1) ->
    (* we don't handle cases where t1 doesn't evaluate to a Mutcell *)
    let (Mutcell index1, s') = eval t1 g s
    (* note that s' may be different from s, if t1 itself contained any mutation operations *)
    in (List.nth s' index1, s')
| Setref (t1, t2) ->
    (* we don't handle cases where t1 doesn't evaluate to a Mutcell *)
    let (Mutcell index1, s') = eval t1 g s
    (* note that s' may be different from s, if t1 itself contained any mutation operations *)
    in let (value2, s'') = eval t2 g s'
    (* now we create a list which is just like s'' except it has value2 in index1 *)
    in let rec replace_nth lst m =
        match lst with
        | [] -> failwith "list too short"
        | x::xs when m = 0 -> value2 :: xs
        | x::xs -> x :: replace_nth xs (m - 1)
    in let s''' = replace_nth s'' index1
    (* we'll arbitrarily return Int 42 as the expressed_value of a Setref operation *)
    in (Int 42, s''')
;;

The complete code is available here.

Adding Mutable Pairs

Suppose we wanted to work with pairs where we could mutate either component of the pair. (If you've got a copy of The Seasoned Schemer, which we recommended for the seminar, see the discussion of mutable lists at pp. 143-153.)

Well, we've already given ourselves pairs, and mutable cells, so we could just work here with pairs of mutable cells. But it might sometimes be more wieldy to work with a structure that fused these two structures together, to give us a mutable pair. With the mutable pair, we wouldn't ask for the first element, and then apply Deref to it to get the value it then temporarily contains. Instead, asking for the first element would constitute asking for the value the mutable pair then temporarily contains in its first position.

This means a mutable pair is an interesting hybrid between explicit-style and implicit-style mutation. Looked at one way, it's just a generalization of an explicit mutable cell: it's just that where the mutable cells we implemented before were boxes with only one position, now we have boxes with two positions. Looked at another way, though, mutable pairs are similar to implicit-style mutation: for we don't have separate ways of referring to the first position of the mutable pair, and its dereferenced value. Peeking at the first position just will be peeking at its current dereferenced value.

To keep our codebase smaller, we'll implement mutable pairs instead of, not in addition to, the mutable cells from the previous section. Also, we'll leave out the immutable pairs we've been working with up to this point; in this implementation, all pairs will be mutable.

This implementation will largely parallel the previous one. Here are the differences. First, we remove the Newref, Deref, and Setref forms from the language. Our existing form Makepair will serve to create mutable pairs, and so will take over a role analogous to Newref. Our existing form First will take over a role analogous to Deref. We'll introduce one new form Setfirst that will take over a role analogous to Setref:

type term =
  Intconstant of int
| Multiplication of (term * term)
| Addition of (term * term)
| Variable of char
| Let of (char * term * term)
| Iszero of term
| If of (term * term * term)
| Makepair of (term * term)
| First of term
| Lambda of (char * term)
| Apply of (term * term)
| Letrec of (char * term * term)
| Setfirst of (term * term)
;;

Our expressed_value type changes in two ways: first, we eliminate the Mutcell variant added in the previous implementation. Instead, we now have our Pair variant wrap indexes into the store:

type index = int;;

type bound_value = Nonrecursive of expressed_value |
    Recursive_Closure of char * char * term * assignment
and assignment = (char * bound_value) list
and expressed_value = Int of int | Bool of bool | Pair of index * index | Closure of char * term * assignment;;

type store = expressed_value list;;

Finally, here are the changed or added clauses to the evaluation function:

let rec eval (t : term) (g : assignment) (s : store) = match t with
...
| Makepair (t1, t2) ->
    let (value1, s') = eval t1 g s
    in let (value2, s'') = eval t2 g s'
    (* now we want to retrieve the next free index in s'' *)
    in let new_index = List.length s''
    (* now we want to insert value1 and value2 there; the following is an easy but inefficient way to do it *)
    in let s''' = List.append s'' [value1; value2]
    in (Pair (new_index, new_index + 1), s''')
| First (t1) ->
    (* we don't handle cases where t1 doesn't evaluate to a Pair *)
    let (Pair (index1, index2), s') = eval t1 g s
    (* note that s' may be different from s, if t1 itself contained any mutation operations *)
    in (List.nth s' index1, s')
...
| Setfirst (t1, t2) ->
    (* we don't handle cases where t1 doesn't evaluate to a Pair *)
    let (Pair (index1, index2), s') = eval t1 g s
    (* note that s' may be different from s, if t1 itself contained any mutation operations *)
    in let (value2, s'') = eval t2 g s'
    (* now we create a list which is just like s'' except it has value2 in index1 *)
    in let rec replace_nth lst m =
        match lst with
        | [] -> failwith "list too short"
        | x::xs when m = 0 -> value2 :: xs
        | x::xs -> x :: replace_nth xs (m - 1)
    in let s''' = replace_nth s'' index1
    in (Int 42, s''')
;;

Compare these to the clauses for Newref, Deref, and Setref in the previous implementation.

The complete code is available here.

Adding Implicit Mutation

Next we implement implicit-style mutation, as we did in week9. Here we don't have any explicit reference cells or mutable pairs; we'll return pairs back to their original immutable form. Instead, all variables will have mutable bindings. New reference cells will be implicitly introduced by the Let form. They'll also be implicitly introduced by the Apply form---we didn't have function values on the table during the week9 discussion, so this didn't come up then. The reason we introduce new reference cells when Applying a function value to arguments is that we don't want mutation of those arguments inside the body of the function to propagate out and affect the reference cell that may have supplied the argument. When we call functions in this implementation, we just want to supply them with values, not with the reference cells we may be drawing those values from. Below, after we discuss aliases, we'll consider another strategy, where function bodies are given the ability to mutate the reference cells implicitly associated with the arguments they're supplied.

Our language for the present implementation will be the language for the calculator with recursive functions, with one added syntactic form, Change (...):

type term =
  Intconstant of int
| Multiplication of (term * term)
| Addition of (term * term)
| Variable of char
| Let of (char * term * term)
| Iszero of term
| If of (term * term * term)
| Makepair of (term * term)
| First of term
| Lambda of (char * term)
| Apply of (term * term)
| Letrec of (char * term * term)
| Change of (char * term * term)
;;

In the present implementation, we separate the roles of the bound_value and expressed_value types. As we discussed in week9, our assignment will bind all variables to indexes in the store, and the latter will contain the expressed_values that the variables evaluate to. A consequence of this is that our definitions of the bound_value and expressed_value types no longer need to be mutually recursive:

type index = int;;

type bound_value = index;;
type assignment = (char * bound_value) list;;
type expressed_value = Int of int | Bool of bool | Pair of expressed_value * expressed_value | Closure of char * term * assignment;;

type store = expressed_value list;;

Our evaluation function still interacts with a store argument in much the same way it did with explicit-style mutation. The clause for Variable (...) works differently, because all expressed_values now need to be retrieved from the store:

let rec eval (t : term) (g : assignment) (s : store) = match t with
...
| Variable (var) ->
    (* we don't handle cases where g doesn't bind var to any value *)
    let index = List.assoc var g
    (* get value stored at location index in s *)
    in let value = List.nth s index
    in (value, s)
...

So this clause takes over the roles that were separately played by Variable and Deref in the calculator with mutable cells. The role played by Newref is absorbed into Let, Letrec, and Apply:

...
| Let (var_to_bind, t2, t3) ->
    let (value2, s') = eval t2 g s
    (* note that s' may be different from s, if t2 itself contained any mutation operations *)
    (* get next free index in s' *)
    in let new_index = List.length s'
    (* now we want to insert value2 there; the following is an easy but inefficient way to do it *)
    in let s'' = List.append s' [value2]
    (* bind var_to_bind to location new_index in the store *)
    in let g' = ((var_to_bind, new_index) :: g)
    in eval t3 g' s''
...
| Apply (t1, t2) ->
    (* we don't handle cases where t1 doesn't evaluate to a function value *)
    let (Closure (arg_var, body, savedg), s') = eval t1 g s
    in let (value2, s'') = eval t2 g s'
    (* evaluate body under savedg, except with arg_var bound to a new location containing value2 *)
    in let new_index = List.length s''
    in let s''' = List.append s'' [value2]
    in let savedg' = (arg_var, new_index) :: savedg
    in eval body savedg' s'''
...

Letrec requires some reworking from what we had before. Earlier, we resorted to a Recursive_Closure variant on bound_values because it gave us a non-exotic way to update the savedg component of a Closure to refer to a new_closure that contained that very updated savedg. Now that we we've got a mutation-supporting infrastructure in place, we can do this directly, without needing the unwieldy Recursive_Closure wrapper:

...
| Letrec (var_to_bind, t2, t3) ->
    (* we don't handle cases where t2 doesn't evaluate to a function value *)
    let (Closure (arg_var, body, savedg), s') = eval t2 g s
    in let new_index = List.length s'
    in let savedg' = (var_to_bind, new_index) :: savedg
    in let new_closure = Closure (arg_var, body, savedg')
    in let s'' = List.append s' [new_closure]
    in let g' = (var_to_bind, new_index) :: g
    in eval t3 g' s''
...

Finally, here is the clause for Change (...), which takes over the role earlier played by Setref:

...
| Change (var, t2, t3) ->
    (* we don't handle cases where g doesn't bind var to any value *)
    let index = List.assoc var g
    in let (value2, s') = eval t2 g s
    (* note that s' may be different from s, if t2 itself contained any mutation operations *)
    (* now we create a list which is just like s' except it has value2 at index *)
    in let rec replace_nth lst m =
        match lst with
        | [] -> failwith "list too short"
        | x::xs when m = 0 -> value2 :: xs
        | x::xs -> x :: replace_nth xs (m - 1)
    in let s'' = replace_nth s' index
    (* evaluate t3 using original assignment function and new store *)
    in eval t3 g s''
;;

Note that because the savedg component of a Closure keeps track of which indexes in the store---rather than which values---free variables were bound to, the values at those indexes can later be changed, and later applications of the Closure will use the changed values.

The complete code is available here.

Adding Aliasing and Passing by Reference

Next we'll add aliasing as described at the end of week9. We'll also add the ability to pass (implicit) reference cells as arguments to a function, which lets changes made within the function body be effective in the outside environment. When we discussed this in week9, we proposed a different syntactic form for the function values that get called in this way. Instead of:

let f = lambda (y) -> ...
...
in f x

one would write:

let f = lambda (alias y) -> ...
...
in f x

Real programming languages that have this ability, such as C++, do something analagous. Here the function is declared so that all of its applications are expected to alias the supplied argument. You can always work around that in a particular case, though, like this:

let f = lambda (alias y) -> ...
...
in let y = x ; creates new (implicit) reference cell with x's value
in f y

In our present framework, it will be easier to do things differently. We will introduce a new syntactic form at the location where a function value is applied, rather than in the function's declaration. We say:

Let ('f',
    Lambda ('y', ...),
    ...
    Apply(Variable 'f', Variable 'x')...)

for the familiar, passing-by-value behavior, and will instead say:

Let ('f',
    Lambda ('y', ...),
    ...
    Applyalias(Variable 'f', 'x')...)

for the proposed new, passing-by-reference behavior. (Besides being easier to implement here, this strategy also has the advantage of more closely aligning with the formal system Jim discusses in his "Hyper-evaluativity" paper.) Note that the second parameter to the Applyalias form is just 'x', not Variable 'x'. This is because (1) only variables are acceptable there, not arbitrary expressions, and (2) we don't need at that point to compute the variable's present value.

Here is our expanded language:

type term =
  Intconstant of int
| Multiplication of (term * term)
| Addition of (term * term)
| Variable of char
| Let of (char * term * term)
| Iszero of term
| If of (term * term * term)
| Makepair of (term * term)
| First of term
| Lambda of (char * term)
| Apply of (term * term)
| Letrec of (char * term * term)
| Change of (char * term * term)
| Alias of (char * char * term)
| Applyalias of (term * char)
;;

The definitions of index, bound_value, assignment, expressed_value, and store can remain as they were in the implementation of implicit-style mutation. Here are the changes to our evaluation function:

let rec eval (t : term) (g : assignment) (s : store) = match t with
...
| Alias (var_to_bind, orig_var, t3) ->
    (* we don't handle cases where g doesn't bind orig_var to any value *)
    let index = List.assoc orig_var g
    (* bind var_to_bind to the same index in the store *)
    in let g' = ((var_to_bind, index) :: g)
    in eval t3 g' s
| Applyalias (t1, var) ->
    (* we don't handle cases where t1 doesn't evaluate to a function value *)
    let (Closure (arg_var, body, savedg), s') = eval t1 g s
    (* we don't handle cases where g doesn't bind var to any value *)
    in let index = List.assoc var g
    (* evaluate body under savedg, except with arg_var bound to existing index *)
    in let savedg' = (arg_var, index) :: savedg
    in eval body savedg' s'
;;

The complete code is available here.