I could not remember the Markdown syntax for links, so I made a pretty bad Lisp.

When I decided to build a new website and publish some articles about what I am currently working on, I knew I wanted it to be as DIY as possible. Writing the articles themselves directly in HTML felt like too much of a burden, though; and although I have written my share of Markdown, I still need to look up how to do simple things all the time, and I would still need a way to render the articles anyway. So I just sat down and started jamming on what I would like my authoring format to be like. I (very quickly, as will perhaps become apparent soon) came up with something that seemed like it would be fun to use and hack on, picked a name from my list of project codenames, and here we go. While this project was started on a whim, it is turning out to be a great source of self-inspiration, motivation to try out new things, and, more importantly, fun.

Dodo 🦤 is a lightweight markup language similar to XML. It has elements, which are named, and have attributes and children, which can be elements or text. A Dodo document is a tree of elements, with a root element and some metadata. Dodo has comments. Dodo has no meaning; not in the sense that it has no reason to exist, but in that elements and attributes mean whatever we want them to mean. If Dodo has a philosophy, it could be “write first and ask questions later.” And one of these questions is: what should we do with a Dodo document once it is written? Before answering, let us have a look at the source for this article:

{ article: "I could not remember the Markdown syntax for links, so I
    made a pretty bad Lisp." lang: en date: 2024-02-16
    logo: ../images/logo-recursive.svg

    { abstract Dodo 🦤 is a lightweight markup language that is used
    to author this website. It has no fixed semantics, except for a
    { em transformation language } that can produce other output
    formats. It is actually based on a simple Lisp evaluator that
    uses the same syntax as the markup language itself. }

    # Introduction

    { p When I decided to build { link: /index.html a new website }
    and publish some articles about what I am currently working on
    (...) }

}

How this very document begins

Elements are enclosed in a pair of braces { ... }. The element name comes first and is followed by zero or more attributes (which are name: value pairs), then text or other elements that make up the content of the parent element. There are no tags, no double quotes when not necessary (i.e., when the attribute value has no whitespace), very few special characters, and those can be escaped with a \; hence the “lightweight” adjective. Element names can even double as attribute names, as in the article and link elements; if all articles need a title and all links a URL, it makes sense for those to be the default attributes for their respective elements. The goal of Dodo is to be simple to write, and to remain somewhat legible as is.
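
To make the description above concrete, here is a minimal sketch of a parser for this syntax, written in Python rather than in whatever the real Dodo parser uses. Everything in it (the Element class, parse_element, and the rule that attributes are read greedily right after the name) is an assumption made for illustration; comments, \ escapes and anonymous elements are left out.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # each item is a str or an Element

WORD = re.compile(r"[^\s{}]+")               # run of characters without spaces or braces
VALUE = re.compile(r'"([^"]*)"|([^\s{}]+)')  # double-quoted string, or a bare word

def skip_space(src, i):
    while i < len(src) and src[i].isspace():
        i += 1
    return i

def parse_element(src, i=0):
    """Parse the element whose '{' is at src[i]; return (Element, index after its '}')."""
    if src[i] != "{":
        raise ValueError("expected '{'")
    i = skip_space(src, i + 1)
    word = WORD.match(src, i)
    if word is None:
        raise ValueError("expected an element name")
    elem = Element(word.group().rstrip(":"))
    if not word.group().endswith(":"):
        i = word.end()  # plain name; otherwise it is re-read below as the default attribute
    # Attributes: greedily read `name: value` pairs (an assumption about where they stop).
    while True:
        word = WORD.match(src, skip_space(src, i))
        if word is None or not word.group().endswith(":"):
            break
        value = VALUE.match(src, skip_space(src, word.end()))
        if value is None:
            raise ValueError("attribute without a value")
        quoted, bare = value.groups()
        elem.attrs[word.group()[:-1]] = bare if quoted is None else quoted
        i = value.end()
    # Content: text runs and nested elements, up to the matching '}'.
    text = ""
    while i < len(src) and src[i] != "}":
        if src[i] == "{":
            child, i = parse_element(src, i)
            elem.children += [text, child]
            text = ""
        else:
            text += src[i]
            i += 1
    if i == len(src):
        raise ValueError("missing '}'")
    elem.children.append(text)
    # Whitespace is trimmed at both ends of the element, but kept inside it.
    elem.children[0] = elem.children[0].lstrip()
    elem.children[-1] = elem.children[-1].rstrip()
    elem.children = [c for c in elem.children if c != ""]
    return elem, i + 1

para, _ = parse_element("{ p When I decided to build { link: /index.html a new website } and publish }")
print(para.children[1].attrs)  # prints: {'link': '/index.html'}
```

Note how link: is read once as the element name and once as the attribute name, which is one possible way to get the "default attribute" behaviour.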

This article is written in Dodo of course, using a set of elements made up as they become needed: article is the root of the article, link is used for hyperlinks, p for paragraphs, eg is a shorthand for e.g., &c. But in order to reach readers, this Dodo document needs to be transformed into some more traditional output like good old HTML. This idea also comes straight from the world of XML, where authors can create their custom XML language and use XSL Transformations (XSLT) to output content in other formats. And just like XSLT itself is XML, transforming Dodo documents is also done with Dodo documents.

{ transform

    # Do not show abstract

    { match { element abstract } }

    # Generic rules, use the same element and attributes as HTML.

    { match { element }
        <{ name-of }{ apply { attributes-of } }>
            { apply { content-of } }
        </{ name-of }>
    }
    { match { attribute } { space }{ name-of }="{ value-of }" }
    { match { text } { escape-html { value-of } } }

}

Transforming Dodo to HTML (excerpt)

The figure above shows a part of the transform from article to HTML. A transform consists of rules that attempt to match parts of the document, and produce some output when they do. The match element has two parts: its first child describes what it is trying to match (the abstract element in the first rule, or any element, attribute or text in the following rules), then the rest of the content is the output, which is generally a mixture of text and some other elements to fetch and match more content from the source document. Or it can output nothing, as in the first rule for abstract (the abstract is shown in the table of contents, but not in the article itself).

In the case of the generic element rule, which matches any element (unless it was matched by a preceding rule), the output is an HTML tag, starting with <, followed by the name of the matched element; then by whatever is produced by applying the transforms to the attributes of the matched element, then a closing >. The rules are then applied recursively to the content of the matched element, and a closing tag is added, just like the opening tag was created. Attributes are also simply copied to HTML, and text is escaped to produce proper HTML output. As an example, { p Hello, family & friends } gets transformed into <p>Hello, family &amp;amp; friends</p> by this rule.
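
The same four rules can be sketched outside of Dodo. The Python below is only an illustration, not the actual transform engine: it assumes a parsed element is a (name, attributes, children) tuple, which is a representation made up for this example, and to_html is a hypothetical name.

```python
from html import escape

def to_html(node):
    """Apply the rules of the excerpt to a node and return HTML text."""
    if isinstance(node, str):
        # { match { text } ... }: text is escaped.
        return escape(node, quote=False)
    name, attrs, children = node
    if name == "abstract":
        # { match { element abstract } }: matched first, outputs nothing.
        return ""
    # { match { attribute } ... }: a space, then name="value", copied as is.
    attributes = "".join(f' {key}="{value}"' for key, value in attrs.items())
    # { match { element } ... }: opening tag, transformed content, closing tag.
    content = "".join(to_html(child) for child in children)
    return f"<{name}{attributes}>{content}</{name}>"

print(to_html(("p", {}, ["Hello, family & friends"])))
# prints: <p>Hello, family &amp; friends</p>
```

The order of the checks plays the role of rule precedence: the specific abstract rule is tried before the generic element rule.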

Compared to XSLT, the transform language is very simple and has few features; it also lacks a more compact syntax such as the XML Path Language (XPath) for matching content. But it is currently sufficient to generate the not completely trivial output seen here.

The reader may get the impression that expressions such as { escape-html { value-of } } look suspiciously like S-expressions, the main difference being that Dodo uses braces instead of parens (braces being much rarer than parens in ordinary prose). So transforms can be implemented easily with a SICP-style Lisp evaluator working on the output of the Dodo parser, and we can transform link elements to HTML anchor tags, using the link attribute as href and the content of the element as the content of the anchor, defaulting to the link URL itself if there is no content. In order to do that, two possible outputs are combined with a regular boolean or:

{ match { element link }
    <a href="{ attribute link }">{
        or { apply { content-of } } { attribute link }
    }</a>
}
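
For readers more used to other languages, the fallback in the rule above behaves like Python's own or, which yields its first truthy operand. A tiny sketch, where link_to_html is a hypothetical helper and content stands for the already transformed content of the element:

```python
def link_to_html(url, content=""):
    """Render a link element as an anchor, falling back to the URL as its text."""
    # An empty content is falsy, so `or` falls through to the URL itself.
    return f'<a href="{url}">{content or url}</a>'

print(link_to_html("/index.html", "a new website"))
# prints: <a href="/index.html">a new website</a>
print(link_to_html("/index.html"))
# prints: <a href="/index.html">/index.html</a>
```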

But it turns out that Dodo markup and S-expressions are actually not the same thing at all. To Dodo, everything that is not an element, an attribute or an attribute value is just a continuous text string, with some funky whitespace handling rules (whitespace is trimmed at the beginning and the end of an element, but not inside it). So it is not quite possible to just turn a Lisp function to compute Fibonacci numbers like the one below into a form suitable for evaluation only by replacing parens with braces.

(define (fib x)
    (if (< x 2)
        x (+ (fib (- x 1)) (fib (- x 2)))
    )
)

Fibonacci in Lisp

For example, converting (< x 2) to { < x 2 } would give us an element with name < (that’s fine, it is suitable for looking up a function or a special form to be evaluated) and content x 2, which is just a string of three characters, and not an identifier x and a literal number 2. Even if the string were tokenized properly to produce two arguments for <, these would still be string values; so we would need something more verbose like { < { get x } { number 2 } } to get the value of x from the environment, and the number value 2. This is starting to look more and more like trying to fit a square peg in a round hole. But... what if the section of the peg was more like a squircle?

Seven people on a stroll — Seven strollers

Lisp has a special form, quote, and syntactic sugar for it, ', to prevent content from being evaluated. Well, Dodo has one extra special character, `, which does the opposite, and unquotes the following string, turning `x into { get x } and `2 into { number 2 }; like its Lisp counterpart, a whole element can be unquoted to turn it into a list. So now the squircle peg can be jammed into the round hole like so:

{ define `{ fib x }
    { if { < `x `2 }
        `x { + { fib { - `x `1 } } { fib { - `x `2 } } }
    }
}

Fibonacci in Dodo Lisp
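
One way to picture the unquote character is as a rewrite applied before evaluation. The Python sketch below does it on the source text with a regular expression, which is an assumption made for illustration: the real parser may well do this on the tree instead. The names desugar and expand are hypothetical, and only a backtick followed by a bare word is handled.

```python
import re

NUMBER = re.compile(r"-?\d+(\.\d+)?")

def expand(match):
    """Turn one unquoted word into the verbose element it stands for."""
    token = match.group(1)
    form = "number" if NUMBER.fullmatch(token) else "get"
    return f"{{ {form} {token} }}"

def desugar(source):
    """Rewrite `word into { get word } or { number word }."""
    # A backtick in front of a whole element (as in `{ fib x }) is left alone here.
    return re.sub(r"`([^\s{}`]+)", expand, source)

print(desugar("{ < `x `2 }"))
# prints: { < { get x } { number 2 } }
```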

One last tweak is to allow elements to be anonymous, i.e., to not have a name or attributes, but only child elements. This permits the evaluation of the first term of a function application (note also how attributes can be abused in the lambda definition to avoid unquoting):

{ { λ: x { + `x `1 } } `2 }

((lambda (x) (+ x 1)) 2) in Dodo Lisp
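
To show that these forms are enough to compute something, here is a minimal sketch of an evaluator in Python. It is not the evaluator used for this website: it assumes the document has already been parsed and unquoted into nested lists (whitespace text between elements dropped), it treats the λ attribute as a single parameter name, and all the names in it are made up for the example.

```python
import operator

def evaluate(expr, env):
    """Evaluate a nested-list expression; `env` maps names to values."""
    head, *args = expr
    if head == "number":                       # { number 2 }
        return int(args[0])
    if head == "get":                          # { get x }
        return env[args[0]]
    if head == "if":                           # { if test then else }
        test, then, otherwise = args
        return evaluate(then if evaluate(test, env) else otherwise, env)
    if head == "λ":                            # { λ: x body }
        param, body = args
        return lambda value: evaluate(body, {**env, param: value})
    if head == "define":                       # { define `{ name param... } body }
        (name, *params), body = args
        env[name] = lambda *values: evaluate(body, {**env, **dict(zip(params, values))})
        return None
    # Application: an anonymous element has an expression as its first term.
    function = evaluate(head, env) if isinstance(head, list) else env[head]
    return function(*[evaluate(arg, env) for arg in args])

env = {"+": operator.add, "-": operator.sub, "<": operator.lt}
x, one, two = ["get", "x"], ["number", "1"], ["number", "2"]

evaluate(["define", ["fib", "x"],
          ["if", ["<", x, two],
           x,
           ["+", ["fib", ["-", x, one]], ["fib", ["-", x, two]]]]], env)
print(evaluate(["fib", ["number", "10"]], env))                     # prints: 55
print(evaluate([["λ", "x", ["+", x, one]], ["number", "2"]], env))  # prints: 3
```

The last line is where anonymous elements matter: the first term is itself a list, so it is evaluated to obtain the function before the function is applied.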

So this can work, at the cost of a few splinters. One thing this has going for it though is that it is starting to look a little bit less like Lisp and a little bit more like Logo (in particular, UCB Logo, with its : notation). This opens up a lot of doors for future work on Dodo as a programming language, but in the meantime, document transforms can rely on a proper, if wonky, programming language to support them; this is used to format dates for example. ⚁⚂

The core of this work, namely turning the markup language into a Lisp, updating the parser and writing the evaluator, was done during the Winter 2 Impossible Stuff Day at the Recurse Center.