Harder-Coded: Simple Newtypes are for Scrubs

Posted on October 20, 2024

There’s quite a few fun and nifty things you can do in Haskell to avoid developer error. One notable one is the usage of newtype, which some other languages have adopted. It avoids parameter blindness and ensures values are only used in specific locations.

The trick I’d like to show right now, is to use DataKinds (or TypeData) to achieve something similar using type-level tag values. A simple application of a technique fleshed out quite well in this paper, which has a sick name (Ghosts of Departed Proofs), by Matt Noonan.

The Example

The easiest example I can think of, is none other than this blog! In the Hakyll setup of this blog, I add metadata to some posts. Two markers I use for example, are published, to indicate whether the post should be visible, and includetoc, to include a generated table of contents, partly using the general flow described in this blogpost by Riccardo Odone.
---
# Example of what this metadata looks like at the top of a
# markdown post. Hakyll parses this section as yaml.
title: Harder-Coded
ispublished: true
includetoc: true
---

Both of the metadata fields are booleans
When passing booleans, however, it becomes quite ambiguous what the user has to pass (e.g. Python and Haskell both capitalize the first letter, while go and rust don’t, yaml even accepts yes and no).

Getting from strings to booleans, is even worse if the language decides to do automatic conversions instead of forcing the user to properly parse. Looking at you javascript. 👀

But even if you did parse appropriately, parse what exactly? Do you accept all of 1, true, True, TRUE, yes? How about sure, ok, affirmative or accept? Language specifications are important. 😌
. In my case, I want to arbitrarily accept true-ish values, case-insensitively matching yes or true. But simply returning booleans ‘as is’, loses the metadata key associated with the value.
-- The `Bool` has no connection to the fact that it was pulled from the
-- "published" key.
checkPublished' :: String -> Metadata -> Bool
checkPublished' key meta =
  case lookupString "published" meta of
    Nothing -> False
    Just t -> map toLower t `elem` ["yes", "true"]

So… Newtype?

The simplest alternative is to create a newtype for specifically the publish boolean and to explicitly keep that separate.
newtype IsPublished = IsPublished {isPublished :: Bool}

checkPublished'' :: String -> Metadata -> IsPublished
checkPublished'' key meta =
  case fmap Text.pack (lookupString "published" meta) of
    Nothing -> IsPublished False
    Just t -> IsPublished $ Text.toLower t `elem` ["yes", "true"]

filterPublished' :: [(IsPublished, Post)] -> [Post]
filterPublished' = mapMaybe $
  \(flag, post) ->
    if isPublished flag
      then Just post
      else Nothing

But this now becomes problematic if I now need similar logic for my includetoc metadata flag, do I copypaste everything? Do I parameterize everything with a (Bool -> a) parameter that encompases the newtype I’ll add for each flag? That sounds esoteric and horrible!

No, Phantoms!

A solution I like here, is akin to using TypeData with phantom tags that serve as proof witnesses of the checking function. That’s a lot of abstract type jargon, that serves no one, so let’s break it down.

TypeData is the extension that allows you to define type level values. The textbook case is type level naturals via type data Nat = Zero | Succ Nat. For our purposes, we’ll use it to create small tags we can pass around.
type data Flags
  = IsPublished
  | IncludeTableOfContents

Phantom types are the fun idea of only carrying around knowledge via variables on the type level, with no equavalent on the value level. It’s a technique that’s used quite commonly in Haskell. In my case, I want to attach my tags to booleans, but only on the type level.
newtype Flag a = Flag {getFlag :: Bool}

I didn’t change the underlying type, it’s still a Bool. I only added a type variable. One nice observation is that my Flag IsPublished is isomorphic to to the IsPublished newtype I defined a bit ago.

Now I can use this flag to define generic functions over my flag types, and define “smart” constructors that introduce these type level constructors as witnesses of the fact that the flag was created correctly
Without avoiding exporting the Flag constructor, this unfortunately doesn’t preclude someone manually creating a proof witness incorrectly. The Flag constructor essentially plays the role of QED in our functions, so incorrect usage means you introduce incorrect values, subverting why we’re doing this in the first place.
illegal :: Flag Published
illegal = Flag True
.
checkFlag :: String -> Metadata -> Flag a
checkFlag key meta =
  case lookupString key meta of
    Nothing -> Flag False
    Just t -> Flag $ map toLower t `elem` ["yes", "true"]

filterFlag :: [(Flag a, Post)] -> [Post]
filterFlag = mapMaybe $
  \(flag, post) ->
    if getFlag flag
      then Just post
      else Nothing

A Bit Contrived

So I’ve discussed how to easily handle boolean flags in post metadata, but the same approach can be extended to other types of data as well. Say I wanted to support guest authors and put their names under titles in posts, but only when a guest author is actually relevant (otherwise it’s me!).
---
ispublished: false
includetoc: true
author: Billy Bob Barker (fictional)
---

If I follow similar line of thinking as the previous section. I’d build a wrapper akin to newtype Attribute a = Attribute (Maybe String), but then have almost, but not quite, the same logic as I do for Flag. This code duplication is again, kind of an eyesore. So let’s create something a bit more generic.

In identifiable pattern here, is that the datatype encodes a value whose construction is checked against something (that something could also be a mental hueristic, when hardcoding a value written by hand). I can write that as a datatype that can be specialized as both the Attribute and Flag types.
-- Generalizes both `Flag` and `Attribute`, and any other value type with an
-- additional phantom type variable.
newtype MetaValue a b = MetaValue {getMetaValue :: a}

-- These are isomorphic to my original phantom newtype wrappers.
type Flag a = MetaValue Bool a
type Attribute a = MetaValue (Maybe String) a
type data Flags
  = IsPublished
  | IncludeTableOfContents
  -- A tag I'll use for the guest-authorship.
  | GuestAuthor

-- I can write smart helper that specialize the type variables in MetaValue
flag :: Bool -> Flag a
flag = MetaValue
getFlag :: Flag a -> Bool
getFlag = getMetaValue

attribute :: String -> Attribute a
attribute = MetaValue . Just
getAttribute :: Attribute a -> Maybe String
getAttribute = getMetaValue

Constructing this MetaValue datatype can be created generically, with the idea that its usages implement the assertion of whatever the specialized phantom will be filled with. I’ll show what this means by reimplementing the truthy-parsing for flags, and the optional non-me authorship.
checkMetadata
  :: a
  -> (String -> a)
  -> String
  -> Metadata
  -> MetaValue a b
checkMetadata defaultValue f key meta =
  case lookupString key meta of
    Nothing -> MetaValue defaultValue
    Just t -> MetaValue (f t)

checkFlag :: String -> Metadata -> Flag a
checkFlag = checkMetadata False parsesTrue
 where
  parsesTrue t = map Char.toLower t `elem` ["yes", "true"]

checkAttribute :: (String -> Maybe String) -> String -> Metadata -> Attribute a
checkAttribute = checkMetadata Nothing

The specialized helpers ensure that usage sites can still limit what kind of scoped values are passed through, but even if it didn’t, the phantom type variables imply another barrier of entry before a function parameter is accepted.

One observation is that each type variable can encode one degree of functionality. I use the non-phantom variable, which I use to specialize to Flag and Attribute, to encode what kind of value is contained. While the phantom type variable encodes the key in the blogpost’s yaml that’s used to pull in the metadata value.

I’ll finish this off with a showcase of value constructor functions and how easily they pull string data from the Metadata of a post into the typed world, applying any necessary conditions. All in all, a bit of a contrived exercise just to be able to use the same functions across multiple newtypes, but fun nonetheless.
isPublished :: Metadata -> Flag IsPublished
isPublished = checkFlag "published"

hasToC :: Metadata -> Flag IncludeTableOfContents
hasToC = checkFlag "includetoc"

getAuthor :: Metadata -> Attribute GuestAuthor
getAuthor = checkAttribute isNotMe "author"
 where
  isNotMe t = if t == "Curtis Chin Jen Sem" then Nothing else Just t