(Translations: English • 日本語)
In this series of posts, we will define a small programming language using formal methods. We will also produce proofs about our definitions using the Lean theorem prover.
There are many programming languages, but most do not have a formal definition. The majority of programming languages are defined by their implementation.
Given some programming language
We could then say
However, a programming language may have multiple implementations. And though these implementations may strive for compatibility between one another, it sometimes happens that different implementations behave differently given the same program as input.
In such a case, which one of these implementations is the one that defines the programming language?
A fix for this problem is to write a specification for the programming language. Developers then use the specification as a reference when writing implementations of the language.
Formally specifying a language in this way is not a trivial task, but the benefits are real.
For one, a mathematical specification is completely unambiguous. This contrasts with specifications written in human languages, which can be ambiguous. For instance: in that last sentence, what is the thing that I am saying can be ambiguous? Is it specifications or human languages?
Furthermore, using formal methods allows us to state and prove theorems about the specification. This gives us a high degree of confidence that our specification is free of bugs and mistakes.
Let us explore this method of specification by defining our own programming language.
First, we will give a name to our language.
There’s an excellent series of blog posts about implementing a programming language. As in that series, we’ll name our language by translating the English word “utterance” into some other human language. I’ll choose the target language to be Japanese, given my interests. Thus, we will call the language “Hatsugen”.
For now, Hatsugen will just have integers and booleans.
We’ll represent integers with integer literal expressions like
We’ll represent booleans with the literal expressions
As one extra bit, we’ll also support a conditional expression
which will evaluate the condition
It’s intentional that Hatsugen is a very small language, at least for now. This will allow us to get comfortable with all the formal stuff without getting bogged down with the complexity of Hatsugen itself.
We’ll use a BNF-style grammar to describe the syntax of expressions in Hatsugen.
Since there are infinitely many integers, and writing them all out would take quite a while, we’ll represent an arbitrary integer literal with
This says that an expression
Note that the conditional expression contains other expressions:
We now define how to evaluate an expression. This is called the dynamic semantics. There are various ways to do this, but the one we’ll use for Hatsugen is often called structural operational semantics.
To do this, we will define two judgments. To define these judgments, we will write inference rules.
The first judgement is written
We first define that any integer literal is a value. We can express this definition with an inference rule:
An inference rule looks like a big fraction, with the premises on top and the single conclusion on the bottom. There are no premises for this rule, so the top is empty.
We can also define that the boolean literals
In Hatsugen, these are the only expressions that are values. But what about conditional expressions?
For expressions which are not values, we must now define how to evaluate them. The second dynamics judgement,
“Evaluating an expression” is thus defined as repeatedly stepping an expression until it is a value.
In Hatsugen, the only expressions that can take a step are conditional expressions. First, we define that a conditional expression can take a step if its condition
Now, we define what happens when
And if it was
With that, we’ve defined the dynamic semantics.
But we immediately run into a problem.
We earlier informally defined what it means to evaluate an expression:
“Evaluating an expression” is… defined as repeatedly stepping an expression until it is a value.
However, under this definition, there exist expressions that cannot be evaluated. For instance:
This expression cannot take a step, because none of the rules for
Yet, this expression is also not a value, because none of the rules for
There are generally two things we can do here. One is to go back and add more rules to the dynamics to define how to evaluate these kinds of expressions. For instance, we could emulate C-like languages with the following rules:
The other thing we could do define a notion of a “valid” expression, and only permit evaluation of these valid expressions. This is the approach we will use.
To define which expressions are valid, we will introduce a static semantics.
The static semantics tell us which expressions are valid, and thus permitted to evaluate.
First, we introduce the notion of a type. For now, we have only two: integer and boolean.
Next, we introduce another judgement,
Now, we define
For conditional expressions, we require that
Note that because
are now disallowed by the static semantics.
This completes the definition of the static semantics.
We can now state and prove theorems about Hatsugen.
The first crucial theorem is that of progress. Progress says that well-typed expressions are either done evaluating or can keep evaluating.
More formally, progress states:
and , if , then or there exists such that .
Note that, before we introduced the static semantics, we had the problem of certain expressions neither being values nor being able to step. Now that we have the statics at our disposal, we can use them to strengthen the precondition of the theorem so that it is provable.
Next, we have preservation. Preservation states that well-typed expressions that can keep evaluating preserve their types when evaluating.
Again, more formally:
and and , if and , then .
Note that taken together, we get the following safety theorem:
and , if , then or there exists such that and .
Crucially, note that in the case where
The proofs are available on GitHub.
In the next post, we’ll add functions to Hatsugen.