Proofs about programs
An interactive tutorial
The halting problem be damned — we can prove all kinds of things about programs, and we can even check those proofs with computers! In this chapter, we’ll use a language called Lean to prove whether the famous Ackermann function always halts. Along the way, we’ll introduce Noetherian induction and formal systems.
To warm up, here’s a JavaScript function called fac
:
function fac(n) {
if (n <= 0) return 1;
else return n * fac(n-1);
}
If we run fac(0)
, will it halt?
Right, it will immediately return 1
.
Actually, it will halt: the test 0 <= 0
passes, so it immediately returns 1
.
But what about fac(1)
, will that halt?
Right, it will return 1 * fac(0)
, which is 1 * 1
, which is 1
.
No, this too will halt. The test 1 <= 0
fails, so it calls itself, returning 1 * fac(0)
. We just saw that fac(0)
returns 1
, so this reduces to 1 * 1
, so the whole thing returns 1
.
But what about fac(42)
?
Fine, fine! But this illustrates something important. You didn’t just run fac(42)
in your head — instead you just figured, “I can see that fac(n)
halts, whatever n
is.” But how did you prove that to yourself?
Mathematical induction 🍳
Let’s play the game once more, but this time for a function bar
. This time, I won’t show you the source code for bar
; I’ll just tell you two things about it:
Fact 1: bar(0)
halts.
Fact 2: If bar(n)
halts, then bar(n+1)
halts.
Does bar(55378008)
halt?
Indeed, we can use fact #2 to conclude that bar(1)
halts, then use it again to conclude that bar(2)
halts, et cetera, all the way up to bar(55378008)
.
I would argue it does halt. We’re told that bar(0)
halts. Then, using fact #2, we can conclude that bar(0+1)
halts, i.e. bar(1)
halts. Using fact #2 again, we can conclude that bar(1+1)
halts. Using fact #2 another 55,378,006 times, we can conclude that bar(55378008)
halts.
Let’s generalize this. We’ll say H(n) means “the function bar
halts on input n
”. Then the first fact says H(0), and we call this the “base case”. And the second fact says H(n) implies H(n+1), and we call this the “induction step”.
Then let’s say we want to prove H(4). We can do so by starting with the base case, and applying the induction step four times:
Is H(n) true for every natural number n?
The same works for any other meaning of H, and is a common way to prove a “for all” statement:
🔐 Unlocked: Proof by Induction
If P(0), and if P(n) implies P(n+1), then P holds for every natural number n.
Once more, in Lean
So, induction is one way to prove to ourselves that a program halts. But what about proving it to the computer? Enter Lean 4, a language that lets us prove things about our programs.
Here’s fac
again, but rewritten in Lean:
def fac: Nat -> Nat
| 0 => 1
| n+1 => (n+1) * fac n
You don’t need to understand it all, but you can read it if you squint. Nat
means “natural number”, i.e. the numbers 0
, 1
, 2
and so on. And the | n+1 =>
says, “if the input number looks like n+1
, return the following”.
If you copy-paste this into a file and run lean
on it, congratulations! You’ve written your first computer-verified proof that a program halts!
That’s because Lean only lets you write functions that halt. And Lean accepts the above program without trouble, by saying: “I can see that fac 0
halts. And I can see that fac (n+1)
halts if fac n
halts. So by induction, fac n
must halt for any n
. I therefore accept this function definition.”
The Ackermann function ♾️
Let’s try a famous tricky function called the Ackermann function:
function ack(m, n) {
if (m === 0) return n+1;
else if (n === 0) return ack(m-1, 1);
else return ack(m-1, ack(m, n-1));
}
Does this function always halt? Try out a small example: what is ack(1,1)
?
That was a bit of work! But not too hard.
No, it’s 3
, and here’s how we could compute it:
ack(1, 1) =
ack(0, ack(1, 0)) =
ack(0, ack(0, 1)) =
ack(0, 2) =
3
Next, let’s look at ack(4,2)
.
No! Wait! Come back! The last student that tried it was found dead, crushed under piles of notepaper and Red Bull!
We should convince ourselves that our programs will halt before we go off calculating them. Luckily, we have a way to test this! We can just ask Lean:
def ack : Nat -> Nat -> Nat
| 0 , n => n+1
| m+1 , 0 => ack m 1
| m+1 , n+1 => ack m (ack (m+1) n)
Unfortunately, running Lean on this file, it complains:
failed to prove termination
Darn! Does that mean ack
doesn’t halt?
No — Lean says it failed to prove that it does halt, but that’s different from proving that it doesn’t halt. Lean is just saying: “try harder; I’m not yet convinced.”
Indeed, Lean didn’t prove that it doesn’t halt. We’re just going to have to think harder.
Here’s a way to visualize what ack
is doing. We can write out its values in a spreadsheet:
Above, we’ve got the first argument m
down the side, and the second argument n
along the top. We’re filling in the cells in normal reading order: row by row, and each row left to right. Now here’s the ack
algorithm in English: “To fill in a cell, take the value of the cell to the left, and use it as an index into the row above.”
For example, we just filled in ack(1,2)
. We looked at the cell to the left, which has value 3
. Then we looked at the row above, and took the cell at index 3
, i.e. ack(0,3)
. That cell has value 4
, so that’s our answer.
Our next cell to fill in is ack(1, 3)
. What’s its value?
No, it’s 5
. We first look at the value of the previous cell, which is 4
. Then we look up index 4
from the row above, i.e. row 0
. That cell has value 5
, so that’s the answer.
Right, we take index 4
from row 0
.
Perhaps now you’re seeing an argument that ack
always halts? Completing the table in normal reading/writing order, each cell only uses previous cells: the cell to the left, and a cell in the row above.
So, perhaps we can apply induction, just like we did with fac
: “ack
halts for cell 0. And if it halts for cells 0…n, it will halt for cell n+1. So it will halt for all cells.”
This argument is actually using a variant of induction called complete induction:
🔐 Unlocked: Complete Induction
If we can show P(n) from the assumption that P(m) for every m<n,
Then we can conclude that P(n) for every n.
To apply this argument, we need to number the cells in our spreadsheet. The first cell ack(0,0)
has cell number 0. The cell ack(0,1)
has cell number 1.
What is the cell number of ack(1,0)
?
Yeah ... the problem is that our cells are in an infinite grid, not a line. So assigning every cell a number doesn’t work (at least, not in reading order). Also, we conveniently imagined that we’d completed the first row, even though that would take forever!
This is why Lean didn’t accept our definition of ack
: ordinary induction doesn’t work.
Noetherian induction
Enter Emmy Noether. She improved the induction method so that we don’t have to number everything.
We can draw a graph of the data flow in our spreadsheet. Here’s the graph for ack(2,1)
:
We can explore ack(2,1)
’s call stack by starting at the red box (function call), and following arrows backwards. The green boxes are ‘base cases’, which have no recursive function calls, and therefore no arrows into them.
Is it possible to follow arrows backwards from ack(2,1)
, but never reach a green box?
I think you always reach a green box. For example, going from (2,1)
to (1,3)
to (1,2)
to (0,3)
reaches a green box. If you can find a counter example in that diagram, email me!
That should make it clear that ack(2,1)
halts. Now, here are the data flow graphs for four unknown function calls. Which one halts?
Right! All the others have loops, or infinite recursion.
No, C doesn’t halt. It never “loops,” but it recurses forever, never reaching a base case. Only D has the correct structure.
No, B doesn’t halt: after four levels of recursion, it calls itself again, in an infinite loop. Only D has the correct structure.
No, A doesn’t halt, because it makes a function call that then calls itself in a loop. Only D has the correct structure.
We want to convince Lean that, for any m
and n
, ack(m,n)
always has a graph with that property: that you’ll always reach a base case within a finite number of steps.
Well-ordering
Let’s return to our earlier argument: we argued that ack
always halts because “each cell only uses previous cells”. This doesn’t require numbering all the cells; it just requires some definition of “previous”. The definition we’re using is: “previous by normal reading order”: check the row number first, then tie-break using the column number.
We write this relation with <
, just like with ordinary numbers. For example, which of these is true?
No, (3,3)
is not less than (2,9)
. To compare them, we first compare the first number. We find that 2 < 3
, so (2,9) < (3,3)
.
Emmy Noether said: you can consider this relation, (a,b) < (c,d)
to have a graph, too! Here, an arrow from box a to box b means that a<b:
Imagine starting at a random box in this infinite graph — say, (5,42)
. Then repeatedly follow arrows backwards from it to a “previous” box. Will you always eventually reach the box (0,0)
?
I think you will. Imagine, starting at, say, (m=4,n=23)
, and trying to avoid hitting (0,0)
for as long as possible. We can decrease n
until we get to (m=4,n=0)
. But then we have to go to m=3
. We could pick, say, n=99999
or some other large number — but we’ll always go back to some finite number. Then after 99,999 steps, we’ll have to go back to m=2
. Et cetera.
A mathematician might say that “<
is a well-founded relation on pairs of natural numbers.” This is just a fancy way to say that you’ll always reach (0,0)
.
We can now make another attempt at a formal argument that ack
always halts: “Pick any ack(m,n)
. By following its function calls, we’re really just following paths through the <
graph on pairs of numbers. And because that always halts, ack(m,n)
must halt, too.”
Lean understands this kind of argument! We can prove to Lean that our function halts, by appending:
termination_by ack m n => (m, n)
Lean reads this and thinks: “Okay, you’re telling me to associate each function call ack m n
with the pair of numbers (m,n)
. I know about <
on pairs of numbers. And I know that this relation is ‘well-founded.’ And I can see that each call to ack
follows this relation backwards to a smaller pair of numbers. I therefore accept that this function always halts.”
🔐 Unlocked: Noetherian induction
Given some well-founded relation a<b,
if we can show P(x) from the assumption that P(y) for every y<x,
then we can conclude that P(x) for every x.
(You might notice that the above is the same as complete induction, just generalizing from natural numbers to anything where we can define “less than”.)
Proving machines halt 🤚
So this is how we can prove that Lean functions halt. But what if we want to talk about other models of computing — say, Turing machines, or JavaScript? Can we talk about those in Lean?
Yes! We can write an interpreter for the language we care about, then prove things about the interpreter. Here’s an extremely restricted form of computer in JavaScript:
function step(state) {
switch (state) {
case "A": return "B";
case "B": return "Halt";
case "C": return "D";
case "D": return "C";
case "Halt": return "Halt";
}
}
To run this useless computer, you repeatedly apply the step
function until reaching the Halt
state.
Consider starting with state A
, and repeatedly applying the step
function. Does it eventually reach the Halt
state?
No, it will halt, after two steps: after A
, it steps to B
, and then from B
, it steps to Halt
.
What about starting with state C
? Will that reach the Halt
state?
I think it won’t halt. After C
, it steps to D
. But then from D
, it steps back to C
again. Then this cycle repeats forever, never reaching the Halt
state.
So, if you need to heat your house, start with C
or D
! But can we prove this to Lean? First, we have to translate the program into Lean:
inductive State where
| A | B | C | D | Halt
def step : State -> State
| A => B
| B => Halt
| C => D
| D => C
| Halt => Halt
Recall that Lean only accepts a function definition if it can see that it always halts. Will Lean accept our definition of step
?
Right, step
has no recursive calls — or indeed, any function calls at all — so it’s easy to see that it halts.
This is a bit tricky. Lean will accept step
, because it has no recursive calls — or indeed, any function calls at all. For example, step C
immediately halts, returning D
. Note that Lean has no clue here that we’re intending to iterate the step
function, and has nothing to say on whether that will halt.
Next, we have say what it means to run this machine. This function runs it for n
steps:
def run : Nat -> State -> State
| 0, s => s
| n+1, s => step (run n s)
What is run 42 A
?
No, it’s Halt
. Notice that the Halt
state just loops back to Halt
. For run 42 A
, it follows the series A
→ B
→ Halt
→ Halt
→ ..., so for any n
greater than 1, the state will be Halt
.
(Note in passing that Lean accepts the definition of run
. Why? Because it can see that n
always decreases.)
So far, we’ve used Lean as a programming language — but Lean is also a formal system that lets us makes logical propositions and prove them. Here’s our first proposition:
def A_halts_prop: Prop :=
exists n: Nat, run n A = Halt
This says, “A_halts_prop
is the proposition that there exists some number n
such that run n A
equals the Halt
state.”
Note this doesn't prove it! It just makes the claim, and we can claim all kinds of false things! However, is this proposition true?
Indeed, it’s true. This proposition is just saying, “run n A
halts,” and it does so after just two steps.
No, I would say it’s true: there does exist such an n
, for example, n = 2
. That is, after 2 steps, it will reach the Halt
state.
To prove the proposition that an n
exists, one way is just to provide such an n
, like n = 2
. And we can write this proof in Lean:
theorem A_halts : A_halts_prop := by
exists 2
To check this proof, Lean evaluates run 2 A
, then checks that it equals Halt
. It does, so it accepts our proof.
What would have happened if we wrote exists 999999999
instead?
Right — it would just take a bit longer to evaluate! 🙂
No, Lean would still accept the proof, but it would take longer. It would evaluate run 999999999 A
, which eventually yields the value Halt
.
Proving programs don’t halt ➿
We’ve spent all this time proving that programs do halt, but what if we want to prove that a program does not halt? This ability will be important in our next chapter on Gödel’s incompleteness theorem.
Here’s the step
function again, this time as a diagram:
Just like before, we can propose that the machine halts when running from C
:
def C_halts_prop: Prop :=
exists n: Nat, run n C = Halt
Will Lean accept this definition?
Right, we’re not yet saying anything about whether the proposition is true, we’re just giving it a name.
This is tricky — Lean actually will accept this definition! The thing is, we’re not yet saying anything about whether the proposition is true, we’re just giving it a name.
But yes, the proposition is false, so we actually want to propose that there does not exist any n
such that run n C
is the Halt
state:
def C_hangs_prop :=
¬ exists n: Nat, run n C = Halt
To prove that a program does halt, we basically told Lean: “Just run it! You’ll see!” Can we use the same approach to prove that a program doesn’t halt?
Right, it’s a bit trickier.
No, we can’t — if the program doesn’t halt, then running it will never halt. Remember also that Lean will only run something if it knows beforehand that it will halt.
To prove this one, we’ll have to go back to our friend, induction. Remember that induction lets us prove that something is true for all values. Therefore, we need to change our proposition a bit, so that instead of saying ¬ exists n
, we instead say forall n
:
def C_hangs_prop :=
forall n: Nat, run n C ≠ Halt
This says, “for every natural number n
, run n C
does not equal the Halt
state.” Is this equivalent to our previous proposition?
I would say it’s equivalent. Generally, “for all x, it’s not the case that P(x)” is equivalent to saying “there does not exist an x such that P(x)”.
Now here’s an attempt at proving this by induction:
Base case: If we started at C
, the first state was not Halt
.
Induction step: If we started at C
, and the current state is not Halt
, the next state will not be Halt
either.
Read the induction step carefully! If you have a state that is not Halt
, does it follow that the next state is not Halt
?
The problem here is that our induction hypothesis, “the current state is not Halt
”, is not enough to conclude that the next state is not Halt
. For example, maybe the current state is B
— then the next state would be Halt
.
We need more information in the induction step. So here’s a second attempt at a proof by induction:
Base case: If we started at C
, the first state was either C
or D
.
Induction step: If we started at C
, and the current state is C
or D
, then the next state will either be C
or D
.
Again, read the induction step carefully! Can we prove it?
This time, the induction hypothesis is strong enough to let us prove the induction step. We can say, “well, if it’s C
, the next state will be D
. And otherwise if it’s D
, then the next state will be C
.”
🔐 Unlocked: Strengthening the induction hypothesis
If your proof is failing at the induction step, it might be easier to prove something stronger!
This principle can be counter-intuitive! It’s usually harder to prove something stronger, but when using induction, it can actually be easier.
Here’s the stronger theorem that we informally proved: “starting from C
, every future state will be either C
or D
.” And here’s that proof written in Lean — but you don’t need to understand it all:
theorem C_D_loop:
forall n,
run n C = C ∨ run n C = D
:= by
intro n
induction n with
| zero => apply Or.inl; rw [run]
| succ n ih =>
cases ih with
| inl hc => apply Or.inr; rw [run, step, hc]
| inr hd => apply Or.inl; rw [run, step, hd]
Using this theorem, we can weaken it to prove that the machine does not halt from C
. We just observe that neither of C
or D
are the Halt
state:
theorem C_hangs: C_hangs_prop := by
intro n
have is_c_or_d := C_D_loop n
cases is_c_or_d with
| inl is_c => rw [is_c]; simp
| inr is_d => rw [is_d]; simp
Conclusion
From here, we could add features like “input”, “output”, and “memory”. We’d quickly get to Turing Machines. We could go further, modelling JavaScript, or even Lean itself. To reason about them, the same principles will apply: somehow prove whether or not there exists some n:Nat
such that the program will halt after n
steps.
In this chapter, you’ve learned how to prove things using induction, and you’ve learned how to use this technique in Lean, to prove whether programs halt or not. Perhaps you’re wondering — just how much can I prove in Lean? Are there things it can’t prove? In the next chapter, you’ll learn Gödel’s first incompleteness theorem: a famous result that says, yes, for any formal system like Lean, there are true things that it cannot prove.
Endnotes
This chapter is free this week! If you like it and want to read the other chapters, please consider supporting me by buying the course!
Other reading material:
Here’s a nice intro to Noetherian induction by Joseph Sullivan.
Read more about theorem proving in Lean 4.
The Natural Number Game is super fun, although it’s for Lean 3.