From Coding to Calculating
Like many aspiring software developers and computer scientists, I wanted to improve my art by going through Structure and Interpretation of Computer Programs. I fully expected to emerge from it all with Sherlock Holmes-esque abilities in programming. Yeah, that was about 7 years ago. Still, every so often, I’ll do some of the exercises and search online to see how my answers compared to others’.
Because I didn’t spend a ton of time on these problems, I usually only implemented a naive solution and could only marvel at others’ much cooler, more clever, more efficient solutions. But I could never be sure if they were equivalent. Or could I?
Here’s the example that inspired me to write this post, from SICP:
Exercise 2.28. Write a procedure
fringe
that takes as argument a tree (represented as a list) and returns a list whose elements are all the leaves of the tree arranged in left-to-right order. For example,
(define x (list (list 1 2) (list 3 4)))
(fringe x)
;; equals (1 2 3 4)
(fringe (list x x))
;; equals (1 2 3 4 1 2 3 4)
If you’re not a Scheme user, this function takes something that looks like ((1 2) (3 4))
and gives something that looks like (1 2 3 4)
. It “gets the numbers out of the parentheses”. The nifty thing about this function is that it will do this for arbitrarily-nested trees, not just one level deep.
This function also shows up in Paul Graham’s On Lisp, where it’s called flatten
. In Let Over Lambda, Doug Hoyte points out that since Lisp code is represented by Lisp trees, flatten
performs code-walking.
How do you solve a problem like this? One way is to reason about the kinds of values your tree/list can be, and what your function should do with the tree/list when it assumes those values. In Scheme, a list can represent a binary tree. If tr
is a tree, then (car tr)
gives you the left branch and (cdr tr)
gives you the right branch. So thinking recursively, let’s say someone’s already flatten
ed the left and right branch. What now? Well, just append the flattened-left branch to the flattened-right branch and you’re done.
I regard trees as being empty, being a leaf, or being a branch with two leaves. We covered the latter case just above, so let’s handle the other two:
-
Empty, represented by the empty list
()
. What does a flattened empty list look like? That’s right: just the empty list, so return that. -
A leaf. We want the function to return a list though, so return a list containing this value (i.e., wrap it in parentheses).
That’s all three cases! Here’s a naive solution in Common Lisp:
(defun flatten (tree)
(cond ((null tree) nil)
((atom tree) (cons tree nil))
(t (append (flatten (car tree))
(flatten (cdr tree))))))
Lisp code, so beautiful! Here, null
tests for the empty list and atom
tests to see if the tree is a leaf.
This is all well and good, and it satisfies the test cases from SICP. But if you search for other answers to this question, or even look at the definition in On Lisp, there’s one that looks a bit different:
(defun flatten (tree)
(labels ((rec (x acc)
(cond ((null x) acc)
((atom x) (cons x acc))
(t (rec (car x) (rec (cdr x) acc))))))
(rec tree nil)))
Here, labels
introduces a local helper function in Common Lisp called rec
that takes an accumulator. Except for this, it appears to be of the same “form” as the naive solution. It’s not immediately obvious to me how this second solution is equivalent to the first one. Even though I understood the logic of the second solution, I don’t think I would have come up with it on my own, even though I’m familiar with the accumulator technique.
I wanted to not have to be clever. Even more, I wanted to be able to prove the equivalence of any other version of this function. What I wanted was to derive these equivalences in the same way I proved theorems in geometry or performed manipulations in algebra.
Enter Haskell.
I had been reading about Haskell here and there, and I appreciated that you could reason about programs, even calculate them. This is the method described in the book Algebra of Programming, which uses the Gofer language, an ancestor of Haskell. So I’ll recast this in Haskell because its syntax is ultra-clean. First, let’s roll out our own tree data type:
data Tree a = Empty | Leaf a | Branch (Tree a) (Tree a)
A tree can be either empty, a leaf holding a value of type a
, or a branch of two trees, just like we said above. The flatten
function translates into Haskell easily:
flatten :: Tree a -> [a]
flatten Empty = []
flatten (Leaf a) = [a]
flatten (Branch l r) = flatten l ++ flatten r
Here, [a]
means a list with one value in it, and ++
means “append”. The declaration at the top tells us this function takes a Tree
holding values of type a
and returns a list. Here’s an example of the function in action:
flatten (Branch (Branch (Leaf 2) (Branch (Branch (Leaf 11) (Empty)) (Leaf 7))) (Leaf 4))
[2,11,7,4]
To make this look more like the second solution, let’s use the accumulator technique, the benefits of which I’ll go over in another post. So we define a helper function, flat_helper
, that takes an accumulator as its second value:
flat_helper :: Tree a -> [a] -> [a]
flat_helper tree list = flatten tree ++ list
We flatten tree
, and then push the result onto list
. I chose this form to “accumulate” the result into list
. We can achieve our final result as follows:
flat_helper tree [] = flatten tree ++ [] = flatten tree
What does flat_helper
do to the tree? The great thing about this is that we can just mechanically derive what flat_helper
does from its definition:
flat_helper Empty lst =
{ definition of flat_helper }
flatten Empty ++ lst =
{ definition of flatten }
[] ++ lst =
lst
flat_helper (Leaf a) lst =
{ definition of flat_helper }
flatten (Leaf a) ++ lst =
{ definition of flatten }
[a] ++ lst
flat_helper (Branch l r) lst =
{ definition of flat_helper }
flatten (Branch l r) ++ lst =
{ definition of flatten }
(flatten l ++ flatten r) ++ lst =
{ associativity of append }
flatten l ++ (flatten r ++ lst) =
{ definition of flat_helper }
flatten l ++ flat_helper r lst =
{ definition of flat_helper }
flat_helper l (flat_helper r lst)
Et voila! This is the same code from the second solution written in Haskell. I derived one solution from the other in a way that’s almost mathematical. It turns out that it’s more efficient than the naive solution, and we don’t even need to use ++
. I wish I could learn from programs like this all the time.
I based the above derivation from a chapter in Introduction to Functional Programming in Haskell, which has several more derivations like the above (I used their bracket comment notation). They called it synthesizing the program, which is a good way to think about it.
I started out using computers to do work I would find tedious or unfeasible to do on my own–iterating over millions of variables, or printing a report every hour. Now I can use them to help me think about computers. A lot of fun for one problem in SICP, I’d say!