# Implementing trees

In Assignment3 we proposed a very ad-hoc-ish implementation of trees. It had the virtue of constructing trees entirely out of lists, which meant that there was no need to define any special tree-construction functions.

Think about how you'd implement them in a more principled way. You could use any of the version 1 -- version 5 implementation of lists as a model.

To keep things simple, we'll stick to binary trees. A node will either be a leaf of the tree, or it will have exactly two children.

There are two kinds of trees to think about. In one sort of tree, it's only the tree's leaves that are labeled:

    .
/ \
.   3
/ \
1   2


The inner, non-leaf nodes of the tree may have associated values. But if so, what values they are will be determinable from the structure of the tree and the values of the node's left and right children. So the inner nodes don't need their own independent labels.

In another sort of tree, the tree's inner nodes are also labeled:

    4
/ \
2   5
/ \
1   3


When you want to efficiently arrange an ordered collection, so that it's easy to do a binary search through it, this is the way you usually structure your data.

These latter sorts of trees can helpfully be thought of as ones where only the inner nodes are labeled. Leaves can be thought of as special, dead-end branches with no label:

       .4.
/   \
2     5
/ \   / \
1   3  x x
/ \ / \
x  x x  x


In our earlier discussion of lists, we said they could be thought of as data structures of the form:

Empty_list | Non_empty_list (its_head, its_tail)


And that could in turn be implemented in v2 form as:

the_list (\head tail. non_empty_handler) empty_handler


Similarly, the leaf-labeled tree:

    .
/ \
.   3
/ \
1   2


can be thought of as a data structure of the form:

Leaf (its_label) | Non_leaf (its_left_subtree, its_right_subtree)


and that could be implemented in v2 form as:

the_tree (\left right. non_leaf_handler) (\label. leaf_handler)


And the node-labeled tree:

       .4.
/   \
2     5
/ \   / \
1   3  x x
/ \ / \
x  x x  x


can be thought of as a data structure of the form:

Leaf | Non_leaf (its_left_subtree, its_label, its_right_subtree)


and that could be implemented in v2 form as:

the_tree (\left label right. non_leaf_handler) leaf_result


What would correspond to "folding" a function f and base value z over a tree? Well, if it's an empty tree:

x


we should presumably get back z. And if it's a simple, non-empty tree:

  1
/ \
x   x


we should expect something like f z 1 z, or f <result of folding f and z over left subtree> label_of_this_node <result of folding f and z over right subtree>. (It's not important what order we say f has to take its arguments in.)

A v3-style implementation of node-labeled trees, then, might be:

let empty_tree = \f z. z  in
let make_tree = \left label right. \f z. f (left f z) label (right f z)  in
...


Think about how you might implement other tree operations, such as getting the label of the root (topmost node) of a tree; extracting the left subtree of a node; and so on.

Think about different ways you might implement leaf-labeled trees.

If you had one tree and wanted to make a larger tree out of it, adding in a new element, how would you do that?

When using trees to represent linguistic structures, one doesn't have latitude about how to build a larger tree. The linguistic structure you're trying to represent will determine where the new element should be placed, and where the previous tree should be placed.

However, when using trees as a computational tool, one usually does have latitude about how to structure a larger tree---in the same way that we had the freedom to implement our sets with lists whose members were just appended in the order we built the set up, or instead with lists whose members were ordered numerically.

When building a new tree, one strategy for where to put the new element and where to put the existing tree would be to always lean towards a certain side. For instance, to add the element 2 to the tree:

  1
/ \
x   x


we might construct the following tree:

  1
/ \
x   2
/ \
x   x


or perhaps we'd do it like this instead:

  2
/ \
x   1
/ \
x   x


However, if we always leaned to the right side in this way, then the tree would get deeper and deeper on that side, but never on the left:

  1
/ \
x   2
/ \
x   3
/ \
x   4
/ \
x   5
/ \
x   x


and that wouldn't be so useful if you were using the tree as an arrangement to enable binary searches over the elements it holds. For that, you'd prefer the tree to be relatively "balanced", like this:

       .4.
/   \
2     5
/ \   / \
1   3  x x
/ \ / \
x  x x  x


Do you have any ideas about how you might efficiently keep the new trees you're building pretty "balanced" in this way?

This is a large topic in computer science. There's no need for you to learn the various strategies that they've developed for doing this. But thinking in broad brush-strokes about what strategies might be promising will help strengthen your understanding of trees, and useful ways to implement them in a purely functional setting like the lambda calculus.