Algebraic Data Types: For the math(s) inclined

Date: August 11, 2020 Last modified: August 15, 2020

This was initially published as a section of one of my substack posts.

You might be wondering why the term algebraic data types has the word algebra in it. Don’t sweat it if you hated algebra in high school, I will not call on you to solve simultaneous equations or prove by induction on the board in front of the class.

The basic idea is that through a little algebra we can determine the cardinality of an algebraic data type which represents how many possible values that could inhabit it.

Consider the following sum type:

data Color
  -- | RGB uses 8 bits in three color channels with values of 0–255
  = RGB { red   :: Word8, green :: Word8, blue  :: Word8 }

Here we have created a type Color and a data constructor RBG which represents a color as 8 bits in three color channels with values of 0–255 (red, green, blue). This is basically a record with three fields and each field can hold 256 different values which gives us a cardinality of 256^3 = 256*256*256 = 16777216. This means we can represent 16,777,216 (>16 million) different colors with this representation. Record types are sometimes called “product types” for this reason.

Datatypes defined in terms of mutually exclusive data constructors are called “sum types” for a similar reason because you can “sum up” the cardinalities of all constructors to find the cardinality of the datatype.

Consider our Color datatype from above where now our application needs to denote colors using the subtractive CMY representation because sometimes we want to be able to use our library or application for dye colors (rather than for light-based usages like photography, television, digital screens, etc.).

data Color
  -- | RGB uses 8 bits in three color channels with values of 0–255
  = RGB { red :: Word8, green :: Word8, blue :: Word8 }
  -- | CMY uses 8 bits in three other color channels w/ subtractive model
  | CMY { cyan :: Word8, magenta :: Word8, yellow :: Word8 }

Now for every value of the type Color, we can choose between RGB and CMY constructions. Since RGB denotes an additive model which is fundamentally different from the CMY subtractive model the data constructor acts as a way to differentiate between the representations, therefore the number of colors with different representations we can construct with this new datatype definition is the sum of the cardinality of both constructions (RBG, CMY) which happens to be 16777216 + 16777216 = 33,554,432.

This post was just a teaser to show how the algebra of algebraic data types start off, but it continues even further than what we have covered with recursive algebraic data types. More details can be found in these blog posts or talks:

That zippers are derivations of the original type is wild and fun for the math inclined (see the links above)! :)

I hope you enjoy learning more about the algebra of ADTs as much as I did when I first learned about it.

Useful links for the domain we explore with datatypes include: