Catamorphisms in 15 Minutes!

Let’s say you have a list of numbers. What can you do with it? Find its length, map a function over it, add them up, etc. Wouldn’t it be great if there were one function to rule them all, a function that generalizes what you can do with a list? In many languages, there is a higher-order function called fold that encapsulates a pattern of structural recursion over recursive structures: to specify a function on a list, you specify two exhaustive cases: what to do to an empty list, and what to do with any other list. A generalization of this generalization is the notion of a catamorphism, which is what I’m writing about today.

Since catamorphisms are so general, this is really going to be a whirlwind tour through category theory, winding up talking about catas near the end. This is all non-rigorous and hand-wavey, but I hope you can at least hear these concepts later and they make a little bit of sense.

There are many resources about fold (see Graham Hutton’s tutorial) and catamorphisms (see, for example, Edward Kmett’s knol). I wanted to slightly expand on some elementary details from these works. So let’s jump in!

The Shortest Category Theory Tutorial in the World

A category consists of:

a collection of objects, such as $A, B, C$ , and
arrows or morphisms, such as $f: A \rightarrow B$ , between objects. You can read this as “ $f$ is an arrow from $A$ to $B$ ”.

In addition to this structure, objects and arrows must satisfy the following properties:

If $f: A \rightarrow B$ and $g: B \rightarrow C$ , then there must be an arrow from $A$ to $C$ . This arrow is the composition of $f$ and $g$ and is written $g \circ f$ .
For each object $A$ , there is an identity arrow from $A$ to itself: $id: A \rightarrow A$ .

$\begin{xy} \xymatrix { A \ar@(ur,ul)[] \ar[r]^f \ar[dr]_{g \circ f} & B \ar@(ur,ul)[] \ar[d]^g \\ & C \ar@(dr,dl)[] } \end{xy}$

Both of these requirements are shown in the diagram. Each object has an (unlabeled) identity arrow. Notice how I can get from $A$ to $C$ by either going from $A$ to $B$ and then from $B$ to $C$ , or going “directly” there with the composed arrow. This “doesn’t matter which path you take” property of the diagram is called commutation, and so this diagram commutes. Diagrams in category theory are paper tools: they summarize whole paragraphs of proofs in one, maybe complicated figure.

Composition must be associative, which means $(h \circ g) \circ f = h \circ (g \circ f)$ . So we can ignore the parentheses! The identity arrows take an arrow back to itself: $id_A \circ f = f$ and $f = f \circ id_B$ if $f: A \rightarrow B$.

There. Now you know what category theory is. It’s useful because lots of mathematical structures are categories, like sets, groups, algebras, and so on.

Pro tip: whenever I think of a category, I pretend that I’m really talking about sets to get some intuiton. In the category of sets, sets themselves are the objects and regular old functions / mappings are the arrows. Beware though: in general, arrows don’t always behave like functions!

There are a ton of great books, both free and otherwise, and even videos about category theory, so I’ll direct you to them for a deeper look since I’ve only covered about 0% of the subject. As Bird and de Moor say in Algebra of Programming, “One does not so much learn category theory as absorb it over a period of time.” So go easy on yourself. Let’s move on to ever-higher levels of abstraction and talk about…

Functors

The name “functor” has always sounded like a villain from a sci-fi novel, which probably explains why I find them so interesting.

I already mentioned what a category was. What would happen if I could make some kind of mapping between categories? Whoa. I know, amazing, right? This mapping is called a functor and it transforms a diagram into another diagram.

The neat thing is, even if the objects in the mapped category are completely different from the original category, the structure of the functor’d category is the same as the original’s. I’ve just described a homomorphism, which is a mapping that preserves structure (it means “same form” in Greek). Since a functor acts on a whole category, you have to describe its action on both objects and arrows. Let $\textbf{F}: \textbf{A} \rightarrow \textbf{B}$ be a functor from category $\textbf{A}$ to category $\textbf{B}$ . Then:

$g: A \rightarrow B$ gets mapped to $\textbf{F} g: \textbf{F} A \rightarrow \textbf{F} B$
$A,B,...$ get mapped to $\textbf{F} A, \textbf{F} B, ...$

Everyone uses the same symbol to refer to both the arrow mapping and the object mapping. The functor laws analogous to the requirements of a category are:

$\textbf{F}(id_A) = id_{\textbf{F}A}$ for all objects $A$
$\textbf{F}(f \circ g) = \textbf{F}f \circ \textbf{F}g$ for all arrows $f$ and $g$.

Or, in diagrams (adapted from the XyJax page):

$\begin{xy} \xymatrix { A \ar[r]^f \ar[dr]_{g \circ f} & B \ar[d]^g & \ar@{~>}[r]^F & & FA \ar[r]^{Ff} \ar[dr]_*[l]{\scriptstyle F(g\circ f) = Fg\circ Ff} & FB \ar[d]^{Fg} \\ & C && && FC } \end{xy}$

Gorgeous. Let’s say what a functor is one more time: a homomorphism between categories.

If you program in Haskell, you’ve probably already heard of its Functor typeclass, which has the following definition:

class Functor f where
	fmap :: (a -> b) -> f a -> f b

This means that when you create a functor in Haskell, you need to define what the fmap function does, and from its signature, I see that it takes functions of type $A \rightarrow B$ to $\textbf{F}A \rightarrow \textbf{F}B$, similar to what we discussed above. See the discussion in Learn You A Haskell for more functor fun.

From these dizzying heights of thinking, let’s think about special objects or functors that will be useful in the goal of generally processing data types like lists as mentioned earlier.

Initial Objects

In some categories, you can imagine that there are some objects that are in some sense “unspecified” or “empty”. These objects are special, like {}, the empty set (the set with the least amount of useful information), or the empty list. These objects share the trait of being unique: there’s only one empty set, only one number such that when you add it to any other number, you get that number back (zero), and so on. In category theory, we can think of this trait more generally, where it’s called initiality. The term “initial” makes sense to me since I think of the category of sets “starting off” with the empty set, for example.

An object is initial if, for any object in the category, there is only one mapping from it to that object. I’ll call the initial object in a category $\textbf{0}$. So in other words, for an object $A$, there is only one arrow of the form $\textbf{0} \rightarrow A$.

Can there be more than one initial object? Let’s pretend the answer is yes and see what that would imply. Let $\textbf{0}$ and $\textbf{0’}$ be initial objects. Then there’s an arrow $f: \textbf{0} \rightarrow \textbf{0'}$ and $g: \textbf{0'} \rightarrow \textbf{0}$ :

$\begin{xy} \xymatrix{ 0 \ar@/^/[r]^f & 0' \ar@/^/[l]^g \\ } \end{xy}$

From the diagram, it’s clear that starting from $f$ and coming back through $g$ is a function from $\textbf{0}$ to itself: $g \circ f: \textbf{0} \rightarrow \textbf{0}$. But here’s the trick: since $\textbf{0}$ is initial, there’s only one mapping from it to any object–including $\textbf{0}$!

So we have that $g \circ f = id_\textbf{0}$ (this is the identity arrow for $\textbf{0}$). I could have started with $\textbf{0’}$ instead, and we would have gotten $f \circ g = id_\textbf{0'}$ . So $\textbf{0}$ and $\textbf{0’}$ are basically the same object; the technical term is that they are isomorphic, which means “equal form”. Isomorphisms are sort of like the category-theoretic version of equality.

How does all this help us with our goal of understanding structural recursion and recursive structures? Well, the uniqueness of initial objects (up to isomorphism) is similar to a uniqueness proof: if we come up with some function or procedure on a list, we want to know that it’s the solution (or at least how many there are, even if it’s zero). Functors give us the notion taking a data type to another data type. Let’s combine initial objects and functors and get: F-algebras and catamorphisms!

Mighty Catamorphisms

I very much hope the title of this section made you think of a certain 90’s kid’s show. You’re welcome.

Here’s an example of a recursive structure: non-negative integers! Yes, ever since you learned to count, you’ve been thinking recursively. Good for you! So a non-negative integer is one of two things: zero, or 1 plus another non-negative integer. Or in Haskell, you could write:

data NonNeg = Zero | OnePlus NonNeg

So what does “2” look like here? Did you guess OnePlus(OnePlus Zero)? Great! We can reason this through since 2 is either 0 (nope) or 1 plus another number. Well what’s 1? Either 0 (nope again) or 1 plus another number 0. Well what’s 0? Either 0–yep! And now we’re done.

Now, I wouldn’t want to represent the number 839921 like this, but it shows that we can define a useful structure recursively–and sometimes, this may be the best way to do so.

Say, now that I think about it, NonNeg is an example of a category! What we’re interested in is some kind of transformation from this category to itself, and by “transformation” I mean “functor”. A functor that maps a category into itself is called an endofunctor (endo- meaning “inside” in Greek).

But we want a little more than that. In the example involving “2” above, we wanted to transform it into either 0 or 1 plus another number. Or if you’re doing addition, we want a mapping that takes pairs of numbers into another number. So we need arrows with signatures like: $\textbf{F}A \rightarrow A$ , where $A$ here could mean a type, like natural numbers with addition, or NonNeg.

Congrats! You’ve just learned what an F-algebra is: an arrow with type $\textbf{F}A \rightarrow A$ .

By now, you know what I’m going to wonder: do F-algebras themselves form a category, where the objects are F-algebras? Well, we need to determine what the arrows are (and don’t forget that an F-algebra is itself an arrow…).

Let $\textbf{C}$ be a category, and let $\textbf{F}: \textbf{C} \rightarrow \textbf{C}$ be an endofunctor. If $A$ and $B$ are two objects in $\textbf{C}$ , then we have two F-algebras, $f: \textbf{F}A \rightarrow A$ and $g: \textbf{F}B \rightarrow B$ . Let’s define a mapping between $A$ and $B$: $z: A \rightarrow B$. So far, we can summarize all these crazy arrows in a diagram:

$\begin{xy} \xymatrix { \textbf{F}B \ar[r]^g & B \\ \textbf{F}A \ar[u]^{\textbf{F}z} \ar[r]^f & A \ar[u]_z } \end{xy}$

I added the action of $\textbf{F}$ on $z$ already. Now, if this diagram is to commute, both paths from $\textbf{F}A$ to $B$ have to be equal; in other words:

$g \circ \textbf{F}z = z \circ f$

If $z$ satisfies this equation, it’s called an F-homomorphism since it preserves the structure of the F-algebra.

Now, since the identity arrow is a homomorphism, and composing two homomorphisms gives another homomorphism, that means that F-algebras themselves form a category. Here’s where it gets interesting. For a general class of functor, this category of F-algebras has an initial object–an initial algebra. I’ll choose $\alpha$ for the initial algebra: $\alpha: \textbf{F}A \rightarrow A$ .

So now we have the same diagram as above, except I’ll label with the initial algebra:

$\begin{xy} \xymatrix { \textbf{F}B \ar[r]^g & B \\ \textbf{F}A \ar[u]^{\textbf{F}g^\star} \ar[r]^\alpha & A \ar[u]_{g^\star} } \end{xy}$

And there it is! The catamorphism, which I’ve labeled as $g^\star$, is the unique homomorphism from the initial algebra to any other F-algebra. In the bananas, lenses, and barbed wire paper, it’s defined with banana brackets, which I don’t know how to generate. Oh well! Its definition is:

$g^\star \circ \alpha = g \circ \textbf{F}g^\star$

That’s…very abstract. How is this useful? The utility here comes from thinking of the initial algebra as a data type. Let’s pick one: lists in Haskell. A list of type $A$ is either the empty list or a value of type $A$ attached (cons‘ed, if you will) to the back of a list. In other words:

data List a = Nil | Cons a (List a)

This will be our $\alpha$ . More explicitly, we can write the endofunctor as $\alpha: 1 + A \times X \rightarrow X$ . Here, $1$ represents the empty list, and $A \times X$ represents the Cartesian product of a type $A$ connected to a type $X$, or the non-empty list, and $+$ is a disjoint sum, sort of like the | in Haskell. Equations like $\textbf{F}A \rightarrow A$ are awesome because they imply that F-algebras encode a notion of self-similarity, like a fractal. In fact, there is a correspondence between these data types and least fixed points; for more on this, check out the great article at FP Complete.

Fun fact: The term catamorphism was coined by Lambert Meertens. The “cata” in “catamorphism”, according to Grant Malcolm, supposedly means “according to” or “following”, so a catamorphism “follows” the structure of the data type. On the other hand, it could also mean “downward”, like “catabolic” or “catastrophic”, and a catamorphism does “plunge into” a data structure. Since it could mean either, but the former is the primary meaning, let’s go with that!

So now we can define a mapping from the initial algebra $\alpha$ and, say, the natural numbers. We can define the length of a list, a function from lists to integers. Because a catamorphism is a homomorphism, it will preserve the structure of the list type; our function needs to figure out what to map Nil to (in length’s case, 0), and what to map Cons to (in this case, addition). Because a catamorphism is a mapping from an initial algebra, we thus know that it’s the unique solution.

Thus, if we have the following two equations to solve:

z Nil = c
z (Cons a as) = f a (z as)

where we transform Nil to c and Cons to f (both binary functions), then we immediately know that z is a catamophism, and the unique solution to those equations is given by the catamorphism diagram above. In fact, functional programmers are already familiar with this function: it’s just good ol’ foldr in Haskell:

z = foldr f c

Fusion Techniques!

So we’ve discussed catamorphisms on lists, and the definition generalizes to any category where we can define an initial algebra: the natural numbers, binary trees, multiway trees; the sky is the limit. The catamorphism will look different with other data types, but its definition is the same. This is why category theory is useful–do a lot of abstract work once and watch it pay off again and again.

Probably the most useful thing about catamorphisms, though, is fusion, when you can combine a mapping with a catamorphism and get another catamorphism. This could, for example, make a computation more efficient, or make intermediate transformations in a compiler more efficient.

You know what? By now, you’re a pro at reading diagrams, so let’s look at the one that represents fusion:

$\begin{xy} \xymatrix { \textbf{F}C \ar[r]^f & C \\ \textbf{F}B \ar[u]^{\textbf{F}h} \ar[r]^g & B \ar[u]_{h} \\ \textbf{F}A \ar[u]^{\textbf{F}g^\star} \ar[r]^\alpha & A \ar[u]_{g^\star} \ar@/_2pc/[uu]_{f^\star} } \end{xy}$

The bottom square commutes and is the same one as before. The upper square defines another F-homomorphism. If both squares commute, the whole rectangle commutes. The fusion law is immediately read off from the diagram:

$f^\star = h \circ g^\star$

This fuses two separate operations into one–it folds two folds into one!

Stack Unwinding

From physics to Big Data, functional programming, and more.