The Lambda Cube in Haskell: what construct do I need for this polymorphism?
Note: I just rewrote the syntax highlighter for my site's generator, Zola, to use tree-sitter, and there are probably some deficiencies. Let me know if there's anything going terribly wrong.
I was reading a beta copy of Production Haskell by Matt Parsons again, and a section stood out to me in the type level programming chapter (something I am trying to get better at): "Value Associations", discussing the varieties of functions in the language. It brought up the idea of the Lambda Cube. What the heck is a Lambda Cube and why haven't I heard of one before?
Let's look it up on Wikipe—uhh maybe not
Wikipedia tells me to pursue other fields of study by its use of absolutely baffling jargon and a bunch of judgements. Apparently this article was so unusually baffling that someone put a note on it:
This article may need to be rewritten to comply with Wikipedia's quality standards, as article uses pervasively inconsistent, confusing and misleading terminology for basic concepts fundamental to the understanding of the article's subject. You can help. The talk page may contain suggestions.
Incidentally, I was recommended Practical Foundations for Programming Languages as a textbook covering the use of judgements/sequents in type theory. It's pretty dense and the authors have a bad habit of defining symbols in the middle of paragraphs, but it is doable and useful for figuring out the notation in papers.
Here's my understanding of the Lambda Cube (my email is on About if you have any corrections to submit):
The Lambda Cube is an idea from type theory about the possible combinations
of ways a language can extend the simply typed lambda calculus which has value -> value
functions.
Beginning at the simply typed lambda calculus on one corner of the cube, there are three directions it can be extended, leading to eight combinations of extensions, one for each point of the cube:
value -> type
(values determining types), representing dependent types.type -> value
(types determining values), representing type classes and generic functions.type -> type
(types determining types), representing type level functions.
Looking at language features through the lens of the cube has made it easier to
figure out which of the dizzying array of Haskell language features I might
want to achieve a specific goal in writing incomprehensible code that my
coworkers will have to suffer through.
In this post I'll go through somewhat-practical uses of features in the vicinity of each of the three axes.
value -> type
Value to type functions represent dependent types.
Haskell does not have dependent types, although Haskell programmers seem strangely willing to engage in so-called "hasochism" to pretend it does. Dependent types would allow the type of the result to depend on a value input to the function, constraining the caller to exactly the set of relevant output types. Generally, dependent types are the realm of languages that lean more toward theorem prover on the spectrum between general purpose languages and theorem provers, such as Agda, Idris, Coq, and so on, although the configuration language Dhall also has dependent types.
That said, there are several features that together can be combined to produce
things in the general vicinity of value -> type
to allow constraining the
result type of functions depending on input values, falling short of dependent
types.
GADTs
Generalised algebraic data types (GADTs) are among the features that are useful for problems where you have to make types depend on values. I think of GADTs as a bridge allowing proofs made at compile time to be attached to values and recalled later.
The following is a (very uninspired) example of the use of GADTs on
Peano numbers, a representation of the natural numbers as zero or the
successor of a number. It demonstrates rejecting programs that try to call
predecessor
to get the number prior to zero, which does not exist:
{-# LANGUAGE GADTs #-}
module Peano where
-- uninhabited types
data Zero
data Succ n
data Peano a where
S :: Peano a -> Peano (Succ a)
Z :: Peano Zero
-- This will not type check with a zero; also, the compiler is clever enough to
-- know that you don't need to match the Z case here!
predecessor :: Peano (Succ a) -> Peano a
predecessor (S n) = n
{-
Peano.hs:16:23: error:
• Couldn't match type ‘Zero’ with ‘Succ a’
Expected: Peano (Succ a)
Actual: Peano Zero
|
16 | kablammo = predecessor Z
| ^
-}
-- kablammo = predecessor Z
There is a package on hackage for Peano numbers in a GADT, but a Peano number GADT is not necessarily as useful as one might initially think. One annoying downside of using a GADT is that it makes it harder to write functions potentially returning values using any of the constructors, since different constructors have different types.
Consider the following function:
fromInt :: Int -> Peano a
fromInt 0 = Z
fromInt n = S $ fromInt (n - 1)
The function does not type check, since it has different types in the two equations:
• Couldn't match type ‘a’ with ‘Zero’
Expected: Peano a
Actual: Peano Zero
‘a’ is a rigid type variable bound by
the type signature for:
fromInt :: forall a. Int -> Peano a
|
17 | fromInt 0 = Z
| ^
If we wanted this to compile, we would have to hide the type variable somehow. Hmmm.
The tool for "I want to hide this type variable" is an existential type. Existential types are so called because they mean "there exists this type. See? Here's a value of that type".
Such a type for this would be data SomePeano = forall a. SomePeano (Peano a)
which will hide the a
, giving both arms the same type. That existential
can equivalently be written in GADT notation, which may be somewhat easier to
understand since its constructor's signature looks like a normal function
signature: data SomePeano where SomePeano :: forall a. Peano a -> SomePeano
DataKinds
Another useful feature in making values kind of into types is DataKinds, although DataKinds does not allow the functionality you might expect on the surface.
A "kind" can be seen as the "type" of a type, in the same relationship as a
value has to its type. Most data types you probably write are of kind Type
(equivalently spelled *
), representing a fully applied type such
as Maybe Bool
, ()
, Int
, and so on. Partially applied types have arrows in
their kind, just like partially applied functions have arrows in their type.
For instance, Maybe
has the kind Type -> Type
.
DataKinds turns types into kinds, and data constructors into types. If you
have a type Nat
, which has two constructors Zero
and Succ Nat
, then the
kind of Nat
is Type
, the kind of Zero
is Nat
, and the kind of Succ
is
Nat -> Nat
.
One odd new piece of syntax here is the use of prefix apostrophe symbols. This
is to denote that you're naming the constructor as a type, rather than the
datatype. For instance, if you have data Foo = Foo Int
, 'Foo
would refer to
the constructor, having a kind Int -> Foo
.
Another neat thing that DataKinds do is turning natural numbers and strings
into types which can be manipulated with the GHC.TypeLits
and GHC.TypeNats
libraries.
Let's see an example that someone might use in industry: decoding untyped data into the correct variant when the tag is known.
I've written approximately this code in order to implement a settings system
with data stored as Map Tag Value
, where Value
is some untyped value that
can be parsed in different ways depending on the tag. I know what the tag is
supposed to be at compile time, so I can use that information with a GADT to
prove that only one variant is possible to be returned from the get-setting
function. A typeclass is used to write a function turning the type-level tag
into its respective value-level representation.
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}
-- explicitly enable the warning that would say the pattern match below is
-- nonexhaustive, to show it does not appear
{-# OPTIONS_GHC -Wincomplete-uni-patterns #-}
module DataKinds where
-- Perhaps this could be a database enum, with the tag of the object, along
-- with an untyped value field
data SettingTag = TOne | TTwo | TThree
-- GADT to staple a type to the return value of the 'decode' function below
data Setting (tag :: SettingTag) where
SettingOne :: Int -> Setting 'TOne
SettingTwo :: Bool -> Setting 'TTwo
SettingThree :: String -> Setting 'TThree
-- Typeclasses are functions from type to value (in this case, the value is a
-- value-level function)
class Decode (tag :: SettingTag) where
decode :: String -> Setting tag
instance Decode 'TOne where
decode = SettingOne . read
instance Decode 'TTwo where
decode = SettingTwo . read
instance Decode 'TThree where
decode = SettingThree
test :: Int
test = let
SettingOne a = decode @'TOne "blah"
in a
I had a misconception while learning Haskell and baffled by all these fancy underdocumented features, each having « un soupçon de Type »:
DataKinds do not let you pass around sum type variants' bodies. The following program does not compile.
{-# LANGUAGE DataKinds #-}
module WhatIThoughtDataKindsDid where
data Ty
= A {field1 :: Int}
| B {field2 :: Int}
someFunctionTakingB :: 'B -> Int
someFunctionTakingB = field2
{-
WhatIThoughtDataKindsDid.hs:8:24: error:
• Expecting one more argument to ‘'B’
Expected a type, but ‘'B’ has kind ‘Int -> Ty’
• In the type signature: someFunctionTakingB :: 'B -> Int
|
8 | someFunctionTakingB :: 'B -> Int
| ^^
-}
You can sort of do this! It's what the first DataKinds example above is doing, but it looks a little different than I initially imagined it would work. The DataKinds are used as a witness that a GADT is a particular variant, rather than being directly returned.
Hasochism
Is it really worth it? Just because you can does not mean you should. The singletons library allows for simulating dependent types with very bad ergonomics and many crimes. Thus, the question arises: should you?
type -> value
Turning types into values is something that we do every day. It represents the usual mechanisms for ad-hoc and parametric polymorphism: type classes and generic functions. If there is any conclusion to draw from this post, it's that a typeclass is a type to value function.
After starting to use this mental model, I know to reach for them immediately when looking to dispatch something based on type.
Functions take two kinds of arguments: type and value arguments. In the case of functions on type classes, the type argument selects which value it is by selecting the instance.
This is most clearly illustrated with the use of TypeApplications, where you
can explicitly apply a type to a function (which, it stands to reason, is thus
a type to value function). Let's use Bounded
as an example, with
the minBound
function:
-- minBound is a function from a type that's an instance of Bounded to a value
-- of that type
λ> :t minBound
minBound :: Bounded a => a
λ> :t minBound @Bool
minBound @Bool :: Bool
λ> minBound @Bool
False
Realizing this made me write more type classes deliberately as functions from type to value.
In Haskell and many languages, types are erased at runtime. All the fancy things you do with type families and so on go away. This is important to remember as it means that Haskell programs typically need programmer specified structures to bridge between types and values. What stays around is dictionaries, which are similar to vtables in Rust: tables of functions for some class. Values are accompanied by references to the relevant dictionaries.
type -> type
Functions from type to type are the bread and butter of type level programming. They allow you to transform types in nearly arbitrary ways, allowing for better DSLs and creating very generic functions.
The main facilities Haskell has for type to type functions are:
MultiParamTypeClasses plus FunctionalDependencies
This method of writing type level functions is kind of Odd, since one of the "parameters" is actually the result type, and you have to put the middle parts of the function on the left hand side as constraints.
It goes like so:
class TypeOr (a :: Bool) (b :: Bool) (result :: Bool) | a b -> result
instance TypeOr 'False 'False 'False
instance TypeOr 'True a 'True
instance TypeOr a 'True 'True
-- >>> x = Proxy :: TypeOr 'True 'False r => Proxy r
-- >>> :t x
-- x :: Proxy 'True
Clearly, I did not suffer enough, so I wrote a simple program to concatenate two type level lists, breaking my brain in the process:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MonoLocalBinds #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE UndecidableInstances #-}
module MPTCFundeps where
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
-- * Concatenating a type-level list
data TCell (t :: Type) = TNil | TCons t (TCell t)
-- | meow~
class Cat (a :: TCell Type) (b :: TCell Type) (result :: TCell Type) | a b -> result
instance Cat 'TNil b b
instance Cat a 'TNil a
-- This is kind of Weird. You have to write the recursive case by
-- destructuring the arguments on the right-hand side, then provide evidence on
-- the left hand side, then construct the result on the right. Needless to say,
-- it's mind-bending in a bad way.
--
-- Fundeps-based type level programming is unnecessarily hard!
instance (Cat a b r) => Cat ('TCons v a) b ('TCons v r)
data T1
data T2
data T3
data T4
type OneTwo = TCons T1 (TCons T2 TNil)
type ThreeFour = TCons T3 (TCons T4 TNil)
{-
>>> v = Proxy :: Cat OneTwo ThreeFour r => Proxy r
>>> :t v
v :: Proxy ('TCons T1 ('TCons T2 ('TCons T3 ('TCons T4 'TNil))))
-}
There is one benefit to this, which is that the compiler can infer types backwards through these. That is not really a good reason to do this to yourself though: you can just use TypeFamilyDependencies with a closed type family instead.
OK but that's awful: using associated TypeFamilies instead
That's not a question, but nevertheless you can use associated type families to achieve the same objective and the code makes way more sense:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
module AssociatedTypes where
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
-- * Concatenating a type-level list
data TCell (t :: Type) = TNil | TCons t (TCell t)
-- | meow~
class Cat (a :: TCell Type) (b :: TCell Type) where
-- Result type. Since cats result in kittens, of course.
type Kitten a b :: TCell Type
instance Cat 'TNil b where
type Kitten 'TNil b = b
instance Cat ('TCons v a) b where
type Kitten ('TCons v a) b = 'TCons v (Kitten a b)
data T1
data T2
data T3
data T4
type OneTwo = TCons T1 (TCons T2 TNil)
type ThreeFour = TCons T3 (TCons T4 TNil)
{-
-- It's also way easier to test:
>>> :kind! Kitten OneTwo ThreeFour
Kitten OneTwo ThreeFour :: TCell (*)
= 'TCons T1 ('TCons T2 ('TCons T3 ('TCons T4 'TNil)))
-}
Getting rid of the class: closed TypeFamilies
What if you could not write the class at all? Turns out, the associated type was unnecessary (although it is useful to use an associated type if you also need type class features).
Using a closed type family is definitely the nicest option ergonomically: it just looks like pattern matches:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
module TypeFamilies where
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
-- * Concatenating a type-level list
data TCell (t :: Type) = TNil | TCons t (TCell t)
-- | meow~
--
-- Wow, it's two lines and they look vaguely like value level code
type family Cat (a :: TCell Type) (b :: TCell Type) :: TCell Type where
Cat 'TNil b = b
Cat ('TCons v a) b = 'TCons v (Cat a b)
data T1
data T2
data T3
data T4
type OneTwo = TCons T1 (TCons T2 TNil)
type ThreeFour = TCons T3 (TCons T4 TNil)
{-
>>> :kind! Cat OneTwo ThreeFour
Cat OneTwo ThreeFour :: TCell (*)
= 'TCons T1 ('TCons T2 ('TCons T3 ('TCons T4 'TNil)))
-}
There's also the ugly duckling, the open type family, but it has the same limitations as an associated type family.
Conclusion
I hope that this ten thousand foot overview is useful in giving a better mental model of which structure to reach for and when.
Thanks to Hazel Weakly for reviewing the draft of this article.