Gödel’s first incompleteness theorem
An interactive tutorial
Back in 1931, Kurt Gödel published his first mathematical mic-drop: “Our formal systems of logic can make statements that they can neither prove nor disprove.” In this chapter, you’ll learn what this famous theorem means, and you’ll learn a proof of it that builds upon Turing’s solution to the Halting Problem.
What is a formal system? 🤵
Gödel’s theorem is about “formal systems”, but what are they? The standard examples on Wikipedia have weird names like “Zermelo–Fraenkel set theory with the axiom of choice”, and they use fancy math symbols like “∃x,y∈N”. You’d be forgiven for thinking ‘formality’ means black tie.
Yes, because we’ll instead illustrate the theorem with a dirty, concrete formal system: the Lean proof assistant, a program made by Microsoft Research.
Suppose we wanted to say that “for all natural numbers n, 0+n=n”. In Lean, we could write this as
(n : Nat) : 0 + n = n
You don’t need to understand the syntax precisely. Just scan over it and notice the 0 + n = n
. Now, we could also use Lean to state that “there exists a natural number n, such that n=n+1”. What do you think of this?
Mm ... I don’t think so. Take an example, say n=7. Here, n+1=8, and 7=8. This reasoning works for any n. If you can find an n where n=n+1, please email me!
I don’t think so either.
So, we need some way to distinguish true statements from false statements! Mathematicians do this using proofs. In Lean, we can prove our first statement with these magic words:
theorem zero_add (n : Nat) : 0 + n = n
:= by induction n with
| zero => rfl
| succ n ih => rw [Nat.add_succ, ih]
Don’t sweat the details here! You just need to know that if we run the lean
program on this file, it derives the statement that (n : Nat) : 0 + n = n
. We can imagine lean
as a function that takes a proof string, and either returns null
(meaning the proof string is nonsense), or returns the statement string that it proves. Examples:
// An accepted proof yields a theorem
lean("theorem zero_add ...")
=== "(n : Nat) : 0 + n = n";
// A nonsense proof proves nothing
lean("nonsense") === null;
(Note that lean
will always halt with either null
or a theorem. It just runs the finite list of commands in the proof; it cannot loop forever.)
Now let’s see what makes Lean a ‘formal system.’ A formal system has three things:
A language to make statements. In Lean, those are things like (n : Nat) : 0 + n = n
.
A language to make proofs. In Lean, those are things like theorem zero_add ...
, which we saw earlier.
A way to interpret a proof string as a way to generate a proved statement, which we call a theorem. In the Lean formal system, that’s the program lean
.
Enumerating theorems ... 🔢
So now the computer can help check our proofs. But finding theorems is still hard work! Can’t we get the computer to prove everything for us? How might we do that?
Here’s an idea ... just list out all the possible strings! Every Lean theorem is just a string, so it must be somewhere in the list of all possible strings! Here’s how we can list all strings, in order of increasing length, using a JavaScript generator:
// Long list of all Unicode chars!
const alphabet = "abc...012...αβγ...";
function* strings() {
yield "";
for (const s of strings())
for (const c of alphabet)
yield (s + c);
}
This generates strings in length order, shortest first: all 0-char strings, then all 1-char strings, then all 2-char strings, and so on. The list looks like this:
"", "a", "b", "c", ..., "aa", "ab", ...
Will the string "(n : Nat) : 0 + n = n"
appear in this list?
Right! This specific string will appear somewhere in the block of 21-char strings.
Actually, it will appear, but it will take a while. Since the string "(n : Nat) : 0 + n = n"
has 21 characters, it will turn up somewhere in the block of 21-char strings.
But now we have a problem. For each string in the list, we want to ask whether it’s a theorem. Can we do that by passing it to the lean
program?
The lean
program cannot tell us whether a statement is a theorem, because it does not accept statements as input. It accepts proofs as inputs! So instead, we will iterate through all strings, and try to interpret each string as a proof:
function* theorems() {
for (const s of strings()) {
const t = lean(s);
if (t !== null) yield t;
}
}
This procedure theorems()
goes through every possible string, and for each, calls lean
on it. If lean
says the string is a valid proof, it outputs the theorem that it proves.
Does theorems()
do what it says on the tin? 🥫
Before we get too excited, let’s distinguish two things:
The set of strings outputted by theorems()
The set of strings that are Lean theorems
Are these the same sets? Let’s check. First, consider the earlier Lean statement, (n : Nat) : 0 + n = n
. Will this statement eventually be found by theorems()
?
Right: we saw earlier that this statement has a proof that is accepted by Lean. theorems()
will eventually find that proof string, then output it.
Actually, it will be found; it will just take a while. We know that this statement has a proof that Lean accepts: theorem zero_add ...
. That string is 115 characters long. That string will turn up in strings()
, in the block of all 115-char strings.
Next, consider some string x outputted by theorems()
. Is it possible that x is not a Lean theorem? Read the source of theorems()
to check.
Remember the definition of ‘theorem’: any string that’s returned by lean
. Check the source: each value outputted by theorems()
came from lean
, so it must be a theorem.
Right: theorems()
only outputs a string if it has a proof that passes lean
— that is to say, it’s a theorem.
So, we’ve found that theorems()
is indeed one and the same as the set of all theorems! Although, here’s one wrinkle ... let’s say we wait around until the theorem (n : Nat) : 0 + n = n
is outputted by theorems()
. After that, is it possible that that same theorem could be outputted again?
Right.
Actually, in general, it is possible!
A theorem will be outputted once for every possible proof of that theorem. But this turns out to be unimportant; we can just de-duplicate these outputs.
Well ... the computer can generate all the Lean theorems. And that’s really cool! In fact, let’s take a moment to see just how cool it is ...
This was a preview of chapter 5 of Busy Beavers!. To read all chapters, buy the course for $19. Yours forever.
Statements about programs! 👩💻
Statements like “x+0=x” seem a bit dull. Is Lean limited to arithmetic, or can it tackle more interesting math? In the previous chapter, we learned about the halting problem. That topic is surely mathematics! So can we ‘formalize’ it in Lean?
It turns out, yes! We can teach Lean a simple language called the Turing Machine, and teach it how to run one. Then we can talk about whether a Turing Machine machine eventually halts. And here it is: a Lean (v3) proof of Turing’s theorem!:
theorem halting_problem (n) :
¬ computable_pred (λ c, (eval c n).dom)
| h :=
rice {f | (f n).dom} h nat.partrec.zero
nat.partrec.none trivial
So Lean can prove that the halting problem is undecidable. But if you run theorems()
, will it discover this theorem?
Right: theorems()
really does find every theorem!
Remember: theorems()
really does work! Since there is a Lean proof of Turing’s theorem, theorems()
will eventually find it, then output the theorem.
And not only that: theorems()
will find the shortest proof. Perhaps it’s much more elegant than the one we humans found. So, theorems()
really is amazing! Let’s try it out, and see if we can win ourselves a mathematical prize!
Theorem mining — better than crypto?
Instead of just listing out all theorems, let’s say we’re just interested in one particular mathematical claim, and we want to know whether it’s true. Consider the following famous function:
function collatz(n: number) {
while (n != 1) {
if (n % 2 == 0) n = n / 2;
else n = 3*n + 1;
}
}
Try it out! Does collatz(3)
halt?
Actually, it halts after the following steps:
n = 3. Odd, so triple it and add 1.
n = 10. Even, so halve it.
n = 5. Odd, so triple it and add 1.
n = 16. Even, so halve it.
n = 8. Even, so halve it.
n = 4. Even, so halve it.
n = 2. Even, so halve it.
n = 1. Halt!
Okay, now what about for any starting number? In 1937, Herr Collatz himself conjectured: “collatz(n)
always halts, for every natural number n
.” What do you think of this claim?
Well, Wikipedia says that nobody knows! If you genuinely have a proof, then glory and power awaits you! But my point is, the claim is either true or false.
That’s an interesting philosophical position! I disagree, though. Collatz’s claim seems well-defined, a bit like “all natural numbers n are either even or odd.” Imagine someone tomorrow revealed a genuine proof that the claim is true. Surely then it’s also true right now, even though we don’t know it yet?
Don’t worry, nobody else knows either! The point is, the claim is either true or false.
There’s a $1M prize for proving the truth or non-truth of this famous conjecture. So let’s deploy our amazing function theorems()
to earn some easy money! First, we write the conjecture in Lean:
import Init.Data.Nat.Basic
def step (n: Nat): Nat :=
if n%2 == 0 then n/2 else 3*n + 1
def coll_conj :=
∀n, n>0 -> ∃s,
Nat.repeat step s n = 1
Then we search through theorems()
to see whether there’s a proof:
const coll_conj = "...";
function is_collatz_true() {
for (const t of theorems()) {
if (t === coll_conj) {
// Proof found! It must be true!
return true;
}
}
// No proof found! It must be false!
return false;
}
Here’s how it should work: if the conjecture is true, is_collatz_true
should return true
, and if the conjecture is false, is_collatz_true
should return false
.
Now, imagine that the Collatz conjecture is false. Read the source: what does is_collatz_true
do in this case?
Interesting. You seem to be saying that Lean has a proof that the conjecture is true, even though it’s false! And actually, that’s possible, and we’ll talk about that in a few minutes! But for now, let’s assume Lean is not that broken.
Remember that theorems()
searches the infinite list of all strings for proofs. Assuming the Collatz conjecture is false, and Lean is not broken, it should never find a valid proof of the conjecture. It will keep searching forever, so the loop will never finish, and is_collatz_true
will never return.
Remember that theorems()
searches the infinite list of all strings. It will keep searching forever, so the loop will never finish, so is_collatz_true
will never reach the return false
line.
The problem is that we never actually search for a proof that the conjecture is false; we only search for a proof that it’s true!
In Lean, the word ‘not’ is written as ¬
. So the inverse of the conjecture is ¬coll_conj
. So let’s just add that to our search!
function is_stmt_true(s: string) {
for (const t of theorems()) {
if (t == s) return true;
if (t == "¬"+s) return false;
}
}
function is_collatz_true() {
return is_stmt_true(coll_conj);
}
This is looking better! We have an amazing function, is_stmt_true
, that can tell us whether anything is true or false, given enough time!
Read it through, and try to prove to yourself that is_collatz_true
really does eventually return the truth of the Collatz conjecture.
Eventually, I think you’ll realize the algorithm relies on this assumption: “theorems()
contains either coll_conj
or ¬coll_conj
. It does not contain both, or neither.”
This assumption is quite natural. We’ve already accepted that the Collatz conjecture is either true or false. So, similarly, theorems()
should contain either coll_conj
or ¬coll_conj
. This gets to the heart of what Godel wanted to prove.
Consistency 👎👍
But what if theorems()
contains both coll_conj
and ¬coll_conj
? Is that possible?
That’s a reasonable answer: maybe Lean’s rules allow for this! Although if you know of an example, you should file a bug report with the Lean developers. 😉
You must really trust Lean! And you’re not alone: people working with formal systems generally assume that this can’t happen. But in general, a formal system can have both S
and ¬S
as theorems. It really just depends on the proof rules that they allow.
Me neither! Really, it depends on the rules of Lean. If we trust Lean, it shouldn’t be possible. But in general, formal systems can have both S
and ¬S
as theorems. It really just depends on the proof rules that they allow.
This property is known as consistency. Lean is ‘consistent’ if there are no statements S
where both S
and ¬S
are theorems.
Recall the ‘proof by contradiction’ method that we used in Chapter 1 to prove Turing’s theorem. The idea is that, if an assumption leads to a contradiction, the assumption must be false. The Lean system lets you use this proof method to generate theorems.
Now, imagine that both coll_conj
and ¬coll_conj
are both theorems. Could we prove that 1+1=3
in Lean?
Right! Even though 1+1=3 is surely false, we could prove it in Lean using a proof by contradiction: “Let’s assume that ¬(1+1=3)
. But this leads to a contradiction: coll_conj
and ¬coll_conj
. We can therefore deduce that our assumption must be false, so 1+1=3
.”
So, even though 1+1=3 is surely false, we could prove it in Lean like so: “Let’s assume that ¬(1+1=3)
. But this leads to a contradiction: coll_conj
and ¬coll_conj
. We can therefore deduce that our assumption must be false, so 1+1=3
.”
Worse, we could use this method to prove anything in Lean! So, inconsistency would be disastrous. Mathematicians generally assume that their systems are consistent. (But it turns out to be rather hard to show it! We’ll see why in a future chapter!)
Completeness
This leaves one final possibility. What if theorems()
contains neither coll_conj
nor ¬coll_conj
? How disastrous would that be? Could you use that to prove that 1+1=3
in Lean?
If you know how, let me know!
I don’t think so either. At least, the proof by contradiction method can’t apply, since we don’t have a contradiction to work with.
That’s fine. I don’t quite know either. the proof by contradiction method can’t apply, since we don’t have a contradiction to work with.
In general, the absence of a proof doesn’t lead to a complete collapse. It would just mean that Lean is not powerful enough to comment on the truth of the statement.
However, it would still be extremely nice if Lean were able to say, one way or another, whether the Collatz conjecture is true. And it would be even nicer if Lean were able to say, one way or the other, whether any statement is true.
If this were the case, we would say that Lean is complete. Precisely: if Lean is complete, then we can make any statement S
, and either S
or ¬S
is in theorems()
.
So, earlier when we tried to prove that is_stmt_true
is a correct algorithm, we relied on the assumption that Lean is both consistent and complete.
A proof using Turing’s theorem
So, is Lean consistent and complete? Here’s Gödel’s famous answer: No, Lean is either inconsistent or incomplete!
Our proof works by way of contradiction. The assumption: Lean is consistent and complete. We decided earlier that, if this is true, then our is_stmt_true
function works perfectly.
Now, recall the Halting Problem from the previous chapter. This problem asks for a function halts(tm)
that returns true
or false
telling us whether the Turing Machine tm
halts. With our is_stmt_true
function, we can easily solve the Halting Problem!:
function halts(tm: string) {
const lean_stmt = `halts ${tm}`;
return is_stmt_true(lean_stmt);
}
We saw earlier that Lean can talk about whether Turing Machines halt. So we can construct a Lean statement claiming that the Turing Machine tm
halts. Then we just ... ask whether that statement is true!
Yes! We also learned in Chapter 1 that the Halting Problem is impossible to solve! So we got a contradiction!
The problem must be our initial assumption, that “Lean is consistent and complete.” We can therefore conclude that Lean is either inconsistent or incomplete!
What about other formal systems?
In this chapter, we applied this reasoning to Lean, one specific formal system. But the same reasoning can be applied to any formal system F, with fairly minimal assumptions:
1. F is enumerable. (This comes for free via the theorems
trick.)
2. We can reason about programs and halting in F.
Gödel’s original proof was slightly stronger: it only required that F can reason about arithmetic, rather than programs and halting. But it turns out that pretty much any formal system can reason about programs — we’ll learn how easily in a future chapter!
Conclusion
You might be wondering ... Okay, perhaps we can’t show that our systems of logic are complete, but perhaps we can at least show that they are consistent. But Gödel has some more bad news for you, coming up in the next chapter ...
If you want to know when the next chapter comes out, sign up for updates here! You might also enjoy our course Everyday Data Science, which teaches fundamental data science using the same interactive format.
Next in Busy Beavers!:
6. Godel’s second incompleteness theorem
Not satisfied with one bombshell, Kurt quickly arrived at a second. So what if there are things our systems can’t prove? Perhaps those things are unimportant anyway. Well, Kurt’s second theorem finds a very important thing that these systems can’t prove: their own consistency!
In this chapter, we will learn why our systems, if they are consistent, cannot prove their own consistency. This builds neatly on top of the first theorem.