(Non-)Functional Ramblings - What’s in the Name?

Posted on September 20, 2024

As when doing anything with a dependent language, the biggest part of my Master’s thesis was about finding the correct way to phrase things

As a quick summary, my thesis was to give a automated proof of correctness of forward-mode AD in Coq.

. I learned that there are quite a few ways to phrase a lambda calculus within a functional language. I’ll go over a few of them, each time writing a catamorphic interpreter

The recursion-scheme interpreter approach will be almost identical to Michael Peyton Jones’ in this blog post.

And the obligatory list of language extensions I’ll be using.

{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}

In On the Ground Floor

The first one I’ll go over, are first-order representations, i.e. if you simply deeply embed everything. These representations require you to build much of the semantics of the lambda calculus yourself. Even within this, you can already vary quite a bit by using for example de bruijn indices instead of string variables, to avoid the problem of shadowing.

data Expr
  = Var Text
  | Lambda Text Expr
  | App Expr Expr
  | Constant Double
  | Add Expr Expr
  | Mul Expr Expr

I can show some example expressions

I used some shortcuts to make it a bit easier to read.

. I’ll also be making use of the base functor version of this datatype, which replaces all recursive usages of Expr with a type variable, essentially making it possible to disconnect a node from the expression abstract syntax tree.

increment :: Expr
increment = fun "x" (var "x" |+| con 1)

example :: Expr
example = increment `app` (con 3 |*| con 3)

To continue with the interpreter, I need to define the result of our lambda expressions. Many will terminate into a singular Double, but many more will halt due to incorrect information or incorrect typed expressions.

data Value m where
  Value :: Double -> Value m
  Suspend :: Text -> m (Value m) -> Value m
  Halt :: Env m -> m Expr -> Value m

type Env m = Map.Map Text (Value m)

This defines all terminal states our interpreter can land in.

Value: If it finish with a singular Double, then we’ve have evaluated the entire expression.
Suspend: This encodes function values. The first parameter consists of the binder we’ll use in the abstraction. The second parameter encodes the body of the abstraction, but requires a bit more explanation. As we’re building the lambda interpreter completely ourselves, the body of the abstraction will result in a value, but only within this interpreter context m. This context will hold all binders and their values and ensure they’re propogates around.
Halt: Lastly we have a constructor we can use to communicate incorrect expressions, such as attempting to reference a free variable or applying a function to a number.

With that in place, we can tontinue to concretely define the evaluation context. We know that we need to keep track of the currently bound variables and which values they are bound to. So the evaluation context consists of an environment where we can access and keep track of these variables.

-- Note that this is definition is recursive, as the variables themselves are each also bound within a certain evaluation environment.
newtype EvalM a = EvalM {unEvalM :: Reader (Env EvalM) a}
  deriving newtype
    ( Functor
    , Applicative
    , Monad
    , MonadReader (Env EvalM)
    )

Defining the interpreter for this lambda calculus is pretty straightforward from here on out. Let’s continue on section by section.

interpret :: Expr -> Value EvalM
interpret expr = runReader (unEvalM (cata go expr)) mempty
 where
  ...

This fun bit is the recursion-schemes part of the interpreter. We fold into a monadic action containing the evaluation context, which we subsequently run with an empty environment.

  go :: MonadReader (Env m) m => ExprF (m (Value m)) -> m (Value m)
  go e = case e of
    ...

Next is the helper function signature which takes care of each layer of our recursive data type. Each layer, denoted by the ExprF recursive datatype we generated, represeents a “node” in our original AST in isolation. Projecting and embedding this ExprF within any large recursive Expr is what makes the recursion-schemes package so flexible.

    VarF v -> do
      env <- ask
      case Map.lookup v env of
        Nothing -> pure (Halt env (pure (Var v)))
        Just a -> pure a
    LambdaF v b -> do
      pure $ Suspend v b
    AppF e1 e2 -> do
      e2' <- e2
      e1' <- e1
      case e1' of
        Suspend v c -> local (Map.insert v e2') c
        _ -> pure e1'

Variables, abstractions and function application all follow from how we set up our environment. I used a Map to keep track of which variable names correspond to which values, so variable references reduce to a simple lookup. Function abstractions suspend the current computation, and monadic action containing the environment representing the body. And for function application, we have to ensure we insert the applied value to the bound name of the suspension, into the environment.

The last few constructors are ones I add for this specific lambda calculus, filling it in can be left as an exercise to the reader. The only thing is that it uses a strict semantics, but it is otherwise non-noteworthy.

From the Ground Up

By keeping a flat first order representation of our language, I had to build much of the operational machinery alongside it in the interpreter. But we can avoid that if we “loan” some functionality from the underlying programming language we’re writing it in. This is where it becomes quite difficult to keep the recursion scheme approach, so let’s ditch that for a second. Instead, we’ll make it so our language is typed, i.e. it will be impossible to represent incorrectly typed terms.

We first define our type system, which consists of real values, and function types. Using GADT syntax we can then indicate what the type is of each (sub-)expression while we define it.

data Ty where
  R :: Ty
  Arrow :: Ty -> Ty -> Ty

data Expr :: Ty -> Type where
  Lambda :: (Expr s -> Expr t) -> Expr (Arrow s t)
  App :: Expr (Arrow s t) -> Expr s -> Expr t
  Constant :: Double -> Expr R
  Add :: Expr R -> Expr R -> Expr R
  Mul :: Expr R -> Expr R -> Expr R

Notice that I omitted variable bindings and instead use Haskell’s function type to embed our lambda expressions. This ensures that the binder in the lambda expression will always be of the correct type. This trick of embedding variable binders in the host language is generally called HOAS (higher order abstract syntax)

Handling variable binders can get quite convoluted, not because it is difficult, but because it can be quite tedious. These alternative representations, because much of it relies on the host language, lighten implementation load. This is especially true when proofing. Here’s a nice resource.

One downside of this specific version of HOAS, is that it becomes impossible to introspect the lambda abstraction’s structure, becaues it now exists as a function. This unfortunately means that we can’t use recursion schemes. As a benefit, however, the interpreter is trivial.

data Value t where
  Value :: Double -> Value R
  Suspend :: (Expr s -> Expr t) -> Value (Arrow s t)

interpret :: Expr t -> Value t
interpret e = case e of
  Lambda e1 -> Suspend e1
  App e1 e2 -> do
    case interpret e1 of
      Suspend b -> interpret (b e2)
  ...

Note that in the interpret call in the case of function application, I don’t even have to handle the Value case! GHC is able to infer that it is impossible (if this expression correctly went through type checking, no unsafe shenanigans) for the result of that interpret call to result in anything other than Suspend.

I can probably write another blogpost about why I think that the future can only really point to dependent types. Being able to utilize static value-level information to optimize and eliminate entire branches of logic, is just too attractive. Programs that are not only correct, but also optimally represented.

I’ll leave it here for now. Turning data into functions or functions into data can have interesting consequences depending on the usecase.