In the previous post, we introduced Hatsugen, a small programming language with integer and boolean types.
In this post, we'll add functions to Hatsugen. With this addition, Hatsugen becomes approximately as powerful as the simply-typed lambda calculus.
We update the expression syntax to add variables, function literals, and function application. We also update the type syntax to add function types.
For variables, we just write things like or or or . There are infinitely many variable names to choose from.
is a function literal, taking one argument of type and evaluating when applied to an argument. may appear in the expression .
is an application expression, representing the application of the function to the argument .
is the type of functions taking as input and returning as output.
Consider the expression . It should have type .
The very similar expression should have type .
Notice that both of these example expressions contain the sub-expression , as the body of the function. But in each, the type of is different: in the first and in the second.
As this example illustrates, the type of variables is determined by how the variable is declared. This is the first time that the same expression (in this case ) can have a different type depending on the context.
We'll need to fundamentally change the structure of the static semantics to account for variables.
First, we'll need to formalize the notion of a context. We'll use to represent a context, which will just be a list of variable-type pairs.
A context can either be empty, or it can be an existing context augmented with a new variable-type pair.
The old typing judgement was , read " has type ". The new judgement is written , read " entails has type ".
We'll need to update all the statics rules from the first post. All of these rules just use the context without changing it.
We can now add the typing rules for the new constructs.
For variables, we need to be able to look up the variable in question in the context to get its type. For that, we'll define a small helper judgement for context lookup, written .
Note that these rules engender shadowing, which is where multiple variable-type pairs with the same variable exist in the context at once, but the one furthest to the right in the context determines the type of the variable under the context. For instance, using the rules we can derive
We can now use this helper lookup judgement in the typing rule for variables.
For function literals, the type of the parameter is the input type, and the type of the function body is the output type.
However, the bound variable may appear in the function body. So we add the variable and its type to the context when determining the function body's type.
For application, the parameter and argument types must match.
Functions are values, regardless of whether the function body is a value:
For application expressions, we first step the function expression. Then, once it's a value, we step the argument to a value as well:
Once both expressions are values, we can prove the first will be a function literal, since it had function type.
We will step into the body of the function, substituting all free occurrences of the variable bound by the function with the value of the argument. We write to mean "substituting all free occurrences of with in yields ".
To define the dynamics for function application, we must now define substitution for expressions.
Substitution does nothing to integer and boolean literals:
For conditional and application expressions, we simply recurse on the sub-expressions:
For variables, we case on whether the variable in the expression is the variable being substituted. If it is, we replace the variable with . If not, we leave it alone.
For function literals, we again case on whether the variables are the same. If they are, we leave the function literal untouched. This is consistent with how we treat variables with the same name in the context .
For the case when the variables are different, we must take care to avoid variable capture. For example, if we define
we can use the rules to prove that
In this case, the variable has been captured by the binding for in the function literal.
To avoid this, we require that the variable bound by the function literal not appear free in . We thus revise the rule, writing to denote the free variables in .
We must now define the free variables of an expression.
Integer and boolean literals have no free variables:
For conditional and application expressions, we simply recurse and union:
Variables alone are free:
Function literals bind a single variable:
Since the judgments have changed a bit, we'll need to restate the theorems slightly.
For all and and , if and , then or there exists such that .
We require that have no free variables at all. This will allow us to perform substitution as we evaluate.
To see why this is necessary, consider choosing to be
This expression has one free variable, . According to the rules, this expression is neither a value nor can it step. This is because appears free in the argument but is bound by the function being applied.
The expression is an even smaller counterexample, since bare variables are neither values nor can they step. Variables are given meaning by substitution.
For all and and and , if and and , then and .
Preservation not only preserves typing, but also the presence of free variables. This is important, since we need there to be no free variables in in order to be able to feed back into the progress theorem.
Once again, taken together we have the following safety theorem:
For all and and , if and , then or there exists such that and and .
The proofs are once again on GitHub.
In the next post, we'll add product types, also known as structs or tuples.