Formalizing Rubin's theorem

2024-02-10 Back to blog list

During my semester abroad at the University of Saarland, I followed Pr. Laurent Bartholdi’s course on computer-assisted proofs in Lean, both because I wanted to get back into the field of computer proofs, and because I was excited at the idea of contributing to Mathlib, which is an impressively large collection of theorems and mathematical tools, all proven in Lean.

About a month into the lecture, Laurent handed me his work-in-progress formalization of Rubin’s theorem, which was still written for Lean 3, with the task to port it to Lean 4 and to make some progress with this formalization. Since then, I have successfully finished this formalization, and I am currently in the process of cleaning it up and slowly getting it merged into Mathlib!

In this blog entry, I will try my best to introduce the necessary knowledge required to understand what Rubin’s theorem is about, and then go into the interesting moments of the process of formalizing this theorem in Lean.

Groups, topologies and group actions

This is a crash course on the necessary concepts of groups, topologies and group actions to be able to formulate Rubin’s theorem, and hopefully develop an intuition for what it means.

These are structures that you most likely have already encountered somewhere, and they have a lot of interesting and useful properties besides the elementary ones listed here.

If you are already familiar with these concepts, then you can safely jump to the last section of this chapter.

Groups

Groups are a common structure that arises when studying objects like addition, multiplication, matrices or sequences of moves on a Rubik’s cubes. All four of these examples have a binary operation that can be cancelled out or “undone”, yielding a “do nothing” element (respectively 00, 11, the identity matrix and the empty sequence).

A group mirrors these properties in a more abstract and generic way, allowing us to forget about the specifics of how each of the examples above work. As such, every group is made up of four components, each mirroring one of the interesting aspects of these structures:

  • The carrier set,

    which we will refer to as G\mathbf{G}. It contains all of the elements of the structure we wish to study.

  • An identity element,

    here written 11 (or 1G1_{G}). It represents the “do nothing” element.

  • A binary operation,

    here represented using the multiplication symbol *. It represents the act of combining or chaining together two elements gg and hh to get a compound element ghg * h, that does what hh does followed by what gg does.

    Note: the left-to-right interpretation is also possible, but I will use the right-to-left interpretation, since this is how matrices naturally multiply together.

  • The inverse operation,

    which is often written g1g^{-1}. It represents the act of “undoing” an element. For instance, the Rubik’s cube move R1R^{-1} (also written RR') undoes the move RR.

Then, these four components follow a couple of rules:

  • Identity:

    Multiplying by 1G1_{G} must do nothing, so for any gGg \in \mathbf{G}, we have 1g=g1 * g = g and g1=gg * 1 = g.

  • Associativity:

    For any g,h,iGg, h, i \in \mathbf{G}, we have g(hi)=(gh)ig * (h * i) = (g * h) * i. This essentially lets us forget about parentheses.

  • Inverse:

    For each element gGg \in \mathbf{G}, multiplying it by g1g^{-1} cancels it out: gg1=1g * g^{-1} = 1 and g1g=1g^{-1} * g = 1.

  • Closure:

    If gg and hh are in G\mathbf{G}, then ghg * h and g1g^{-1} are also in G\mathbf{G}

There are many examples of groups in the wild, some of which you might already be familiar with. For instance:

  • Integers, armed with addition: G=Z\mathbf{G} = \mathbb{Z}, 1G=01_{G} = 0, gGh=g+hg *_{G} h = g + h and g1=gg^{-1} = -g
  • Non-zero rational numbers, armed with multiplication: G=Q0\mathbf{G} = \mathbb{Q} \setminus {0}, 1G=11_{G} = 1, gGh=ghg *_{G} h = g * h and g1=1/gg^{-1} = 1/g
  • Invertible, square matrices
  • Sequences of moves on a Rubik’s cube, with * representing the chaining of two sequences of moves, and M1M^{-1} undoing all of the moves in MM

I highly recommend watching 3Blue1Brown’s video on group theory for a more visual introduction to group theory, and Lingua Mathematica’s video on the associativity rule for a more intuitive interpretation of the group axioms.

Note that multiplication doesn’t need to commute: for any two elements gg and hh, ghhgg * h \ne h * g is allowed.
This is notably the case for matrix multiplication, or on a Rubik’s cube (see example below). In fact, we will see later that groups in which multiplication always commutes (also known as abelian groups) cannot be used in Rubin’s theorem.

Drawing of how to get to U*R' and R'*U On a Rubik’s cube, when doing U after R' (ie. U*R'), we get a different position than when doing R' after U (ie. R'*U).

Lastly a subgroup of G\mathbf{G} is a subset H\mathbf{H} of G\mathbf{G} containing 1G1_{G}, and in which the multiplication and inverse operations are closed: gH,g1H\forall g \in \mathbf{H}, g^{-1} \in \mathbf{H} and g,hH,ghH\forall g, h \in \mathbf{H}, g * h \in \mathbf{H}.

The subgroups of G\mathbf{G} can be ordered with the relationship, and there are two subgroups found in every group:

  • G\mathbf{G} itself, which is the biggest subgroup of G\mathbf{G}, and is therefore usually referred to as the top subgroup, or \top
  • {1G}\{1_{G}\}, which only contains G\mathbf{G}‘s identity element 1G1_{G}. This is the smallest possible subgroup of G\mathbf{G}, so it is also referred to as the bottom subgroup, or \bot

Topological spaces

You might remember from calculus the concept of a continuous function: it is a function which takes as input a real number, and outputs another real number, such that you never have to “lift your pen” to draw the graph of this function.

However, this “pen”-based definition quickly falls short when we consider continuity for other kinds of functions:

  • What does it mean for a function to be continuous on the rational numbers Q\mathbb{Q}? The “pen” always has to “jump” from one rational to the other, but that shouldn’t be a reason to discard the function f(x)=x2f(x) = x^2 as discontinuous.
  • For functions that take as input two numbers instead of one, the “pen” would need to draw surfaces instead of lines, which is something that pens are usually incapable of doing.
  • What about functions that operate on other functions? Translating a function (for instance, ϕ(f)=(xf(x)+1)\phi(f) = (x \mapsto f(x) + 1)) sounds like a continuous operation, but I doubt I would convince anyone if I said that I never lifted my “function pen” while doing so.

A discontinuous transformation of functions “Can you tell where I lifted my function pen?”

To remedy this, mathematicians came up with the concept of a topological space. Like groups, a topological space is made up of a few basic building blocks, which follow a set of rules:

  • A set X\mathbf{X}, called the universe.
  • A collection of subsets of X\mathbf{X}, referred to as τ\tau or the open sets of X\mathbf{X}.
  • The empty set must be in τ\tau: τ\empty \in \tau.
  • X\mathbf{X} itself must be in τ\tau: Xτ\mathbf{X} \in \tau.
  • If SS and TT are both open, then their intersection must also be open: SτTτSTτS \in \tau \Rightarrow T \in \tau \Rightarrow S \cap T \in \tau.
  • If we have a collection of open sets C\mathbf{C} (such that SC,Sτ\forall S \in \mathbf{C}, S \in \tau), then the union of all of these sets must be open.
  • A set SS is closed if XS\mathbf{X} \setminus S is open.

In the case of the real numbers, τ(R)\tau(\mathbb{R}) contains all of the open intervals, and unions of open intervals.
(0,1)(0, 1) and (,0)(-\infty, 0) are both open sets, but (0,1](0, 1] and [2,2]={2}[2, 2] = \{2\} aren’t.
In fact, {2}\{2\} is closed, since R{2}=(,2)(2,+)\mathbb{R} \setminus \{2\} = (-\infty, 2) ∪ (2, +\infty) is open.

Using this definition, a function ff is then said to be continuous if for every open set SτS \in \tau, the set of points that ff maps inside of SS must also be open: f1(S)τf^{-1}(S) \in \tau. For functions from R\mathbb{R} to R\mathbb{R}, this condition is equivalent to the pen-based definition.

An example of a discontinuous function, which doesn't respect the topological property of continuity The function ff is discontinuous, because for S=(2,3)τS = (2, 3) \in \tau, f1(S)=[2,3)f^{-1}(S) = [2, 3) isn’t open.

Conditions on topological spaces

You might have noticed that the above definition for a topological space leaves a lot of freedom on the choice of τ\tau, perhaps even too much freedom.
On the one hand, if τ={,X}\tau = \{\empty, \mathbf{X}\} (the trivial topology of X\mathbf{X}), then the only open sets are the universe and the empty set, which is not very useful.
On the other hand, if τ={S    SX}\tau = \{S \;|\; S \subseteq \mathbf{X}\} (the discrete topology of X\mathbf{X}), then all the sets are open (and coincidentally closed), so every function from X\mathbf{X} is continuous.

For this reason, we often impose some common, additional conditions that τ\tau and X\mathbf{X} must respect. Here are the ones that Rubin’s theorem requires:

X\mathbf{X} must be Hausdorff: for any two xyx \ne y, you can find an open set SS around xx and an open set TT around yy, that do not intersect (ST=S \cap T = \empty).

For the reals, assuming that x<yx < y, the sets S=(,x+y2)S = (-\infty, \frac{x + y}{2}) and T=(x+y2,+)T = (\frac{x + y}{2}, +\infty) satisfy this condition. A similar construction can be done for x>yx > y.

A depiction of the Hausdorff property on the real plane If X\mathbf{X} is Hausdorff, then no matter how close xx and yy are, we can draw disjoint open sets around both of them.

X\mathbf{X} must not have any isolated points: for every xXx \in X, the set that only contains it, {x}\{x\}, must not be open. This condition essentially forbids us from choosing the discrete topology.

We saw before that for the reals, {x}=[x,x]\{x\} = [x, x] is never an open set (it is instead only a closed set), so R\mathbb{R} has no isolated points.

An example of an isolated point In the topological space R({0}[1,2])\mathbb{R} \cap (\{0\} \cup [1, 2]), the point 00 is isolated.

X\mathbf{X} must be locally compact: for every point xXx \in X, it should be possible to obtain a compact set in the neighborhood of xx; that is, there is a set TT such that every sequence of points in TT must converge to a point in TT, and there exists a set SτS \in \tau such that xSx \in S and STS ⊆ T.

The reals are an example of a locally compact space, since every closed set is compact (for instance, any sequence in [0,1][0, 1] cannot converge to any number outside of that range). We thus simply have to take the interval T=[x1,x+1]T = [x - 1, x + 1] to obtain a compact set in the neighborhood of some xx, and S=(x1/2,x+1/2)S = (x - 1/2, x + 1/2).

Interior and closure

There are two common operations in topology that allows one to obtain an open or a closed set from any arbitrary set SS:

  • The interior of SS is the largest open subset of SS: int(S)Sint(S) \subseteq S and Tτ,TSTint(S)\forall T \in \tau, T \subseteq S \Rightarrow T \subseteq int(S). It can be obtained by taking the union of all open subsets of SS.
  • The closure of SS is the smallest closed superset of SS, it is the dual of the interior operation: cl(S)=Xint(XS)cl(S) = X \setminus int(X \setminus S)

Note that the interior of SS may be empty if SS is too small, and the closure of SS can be X\mathbf{X} itself if SS is too big.

Group actions

A group G\mathbf{G} is said to acts on some set X\mathbf{X} if, for every xXx \in \mathbf{X}:

  • For every gGg \in \mathbf{G}, the result of the action of gg on xx (which we write gxg \bullet x) is in X\mathbf{X}: gxXg \bullet x \in \mathbf{X}
  • For every gGg \in \mathbf{G} and hGh \in \mathbf{G}, the action of ghg * h corresponds to the compound action of hh and gg: (gh)x=g(hx)(g * h) \bullet x = g \bullet (h \bullet x)
  • The action of 11 does not move xx: 1x=x1 \bullet x = x

An example of such a group action is the matrix-vector multiplication: when evaluating (M1M2)v(M_1 * M_2) \bullet v, we can first transform vv by M2M_2 and then transform it by M1M_1, or we can compute M=M1M2M = M_1 * M_2 once and then transform vv by MM.

Another famous example of a group action is the action of the cyclic group by rotating points on a 2D plane:

The cyclic group C₃, acting by rotation on ℝ² The group action of (r3)ix(r_3)^i \bullet x corresponds to rotating xx by 120i120 * i degrees.

Conditions on group actions

Just like with topological spaces, Rubin’s theorem requires that the group action respects a few additional rules, namely:

The action must be faithful: for any g,hGg, h \in \mathbf{G}, if gg and hh move points in the same fashion, then they are equal: (x,gx=hx)g=h(\forall x, g \bullet x = h \bullet x) \Rightarrow g = h.
In other words, we can tell group elements apart simply by looking at how they move points in space.

The action must be continuous: for every gGg \in \mathbf{G}, the function f(x)=gxf(x) = g \bullet x must be continuous.

The action must be locally dense: this is a somewhat complex condition that is specific to Rubin’s theorem, but it can be summarized as follows:

  • Pick an open set SτS \in \tau and some xSx \in S.
  • Find all the elements gGg \in \mathbf{G} that leave points outside of SS unmoved (y∉S,gy=y\forall y \not\in S, g \bullet y = y); the set of all of these elements forms a subgroup of G\mathbf{G}, which we will call HH.
  • Construct Hx={hx    hH}H \bullet x = \{ h \bullet x \;|\; h \in H \}, the set of all the points towards which xx gets moved by the elements of HH.
  • Compute the closure of that set: cl(Hx)cl(H \bullet x).
  • Then, there must exist an open set TT containing xx that is a subset of the previously-constructed set: xTτx \in T \in \tau and Tcl(Hx)T \subseteq cl(H \bullet x).

Essentially, this last condition requires that for arbitrarily small open sets SS, we must always find enough elements in G\mathbf{G} to “fill in” enough of the space in SS around xx.

Visual representation of local denseness A visual representation of the condition that an action is locally dense.

Homeomorphism

A homeomorphism is a function f:XYf : \mathbf{X} \rightarrow \mathbf{Y} such that:

  • ff is bijective: it maps one-to-one values in X\mathbf{X} to values in Y\mathbf{Y}
  • ff is continuous
  • f1f^{-1} is also continuous

Homeomorphisms are useful tools, since the open sets in YY can be obtained from the open sets in XX by mapping them through ff, and vice-versa (τ(Y)=f(τ(X))\tau(\mathbf{Y}) = f(\tau(\mathbf{X})) and τ(X)=f1(τ(Y))\tau(\mathbf{X}) = f^{-1}(\tau(\mathbf{Y}))).
As such, if one can construct a homeomorphism between two topological spaces, then these two spaces will share a lot properties and be essentially topologically equivalent.

Rubin’s theorem

We can now finally state Rubin’s theorem!

An action of a group G\mathbf{G} on a topological space X\mathbf{X} is called a Rubin action if:

Then, if we have two Rubin actions of G\mathbf{G} on X\mathbf{X} and of G\mathbf{G} on Y\mathbf{Y}, then there exists a homeomorphism ϕ:XY\phi : \mathbf{X} \rightarrow \mathbf{Y} that preserves the action of G\mathbf{G}: gϕ(x)=ϕ(gx)g \bullet \phi(x) = \phi(g \bullet x).

The existence of this homeomorphism means that the two spaces and their respective group action by G\mathbf{G} are essentially unique: we can losslessly go from one to the other.

Computer-assisted theorem proving

Computer theorem provers are programs that provide ways to express and verify mathematical theorems. The advantage of using such a theorem prover is that you reduce human error in the proof verification process to almost zero: as long as the theorem prover functions well, and your definitions are accurate, you can be guaranteed that your proof is 100% correct if the theorem prover accepts it.

The one downside when one wants to prove things using such a theorem prover, though, is that one has to prove these mathematical statements to a computer. No taking shortcuts, hand-waving details away or leaving lemmas as exercise to the reader; everything needs to be proven thoroughly, from properties as simple as 1 + 1 = 2 to the Fundamental Theorem of Algebra.

The computer rarely accepts proofs by triviality. Lean is equally unphased by my attempts to convince it with snacks.

Theorem provers have been around for a few decades now, notably with Coq and Isabelle.

Lean is, in comparison, a relatively recent addition to the theorem prover landscape, as it was first released in 2013. Its main selling point is that it is easier to use (it provides a VSCode extension whose quality is on-par with rust-analyzer’s), and it powers the community-developped library, Mathlib, which aims to formalize as much of modern mathematics as possible.

While not being a central theorem, Laurent and I believe that Rubin’s theorem would be a neat addition to Mathlib, as it can be used as a stepping stone in a few already-existing theorems (J. Belk et al. have compiled a nice list of its applications in their article).

Our proof closely follows the proof by J. Belk et al, as it is the most concise proof of this theorem to date. However, as I started working on this project, all I had was a link to their paper and a 1000-lines long, unfinished proof, written for Lean 3, covering the first few lemmas of the paper. So, there was and still is a lot left to do to get to this project’s final goal.

Porting to Lean 4

The first official release of version 4 of Lean was only made a couple of months ago, but projects like Mathlib had been preparing themselves for the release and had already ported themselves over to the newer version. This meant that my first order of business was to port the existing proof over to Lean 4.

Thankfully, to aid with porting Mathlib from version 3 to version 4, the Lean community built a tool specifically for this task, called Mathport, which is essentially a transpiler:

  • First, you prepare the port by specifying which files, functions and theorems to port, and how to rename them.
  • Mathport first compiles the old code (which takes a little while).
  • Mathport then compiles the new dependencies and your annotations.
  • It then takes the data it gathered from your functions and theorems written for Lean 3, and it replaces all function/theorem names according to the annotations you just wrote and those maintained within Mathlib, before generating new Lean 4 code based on this.
  • Finally, Mathport does a final compilation step to verify that the new code works.

I only really needed to do one round of Mathport, but one definition in the middle of the file failed to port correctly.
What I should have done at that point would be to split the file in two, port the first half, then remove its corresponding annotations, manually port the problematic definition and automatically port the rest.

… But instead I just ported the second half by hand.

I didn’t really lose much time doing that, though, because it let me familiarize myself with the new semantics of Lean 4, and because I had to clean up some of the ported code anyway, especially code that had been golfed in Lean 3:

An example of code that Mathport munched up into spaghetti. It turns out that golfing code doesn’t play well with automatic code translation tools.

TODO ALERT: These next sections are still unfinished!
I’m glad you could follow so far, but I recommend waiting until I come around to properly writing the rest of this article.

Topological basis

The first step in the proof is to construct a topological basis (a subset of τ(X)\tau(\mathbf{X}) that, among other conditions, can be used to re-construct τ(X)\tau(\mathbf{X})), and a one-to-one mapping from it to a purely group-theoretic construction.

Filters

Filters are a generalization of limits in calculus. They are especially useful in expressing theorems about limits to a computer theorem prover, since you often only need to state the theorem once, and it will hold true for every “flavor” of limits (from above or below, at a value or towards infinities, diverging or converging).

At its core, a filter FF is a subset of a partially ordered set (S,)(\mathbf{S}, \le), such that:

  • FF is non-empty: there is some xFx \in F.
  • FF is upwards-closed: if xFx \in F, then for all yxy \ge x, yFy \in F.
  • FF is downward-directed: if x,yFx, y \in F, then there exists some zFz \in F such that zxz \le x and zyz \le y.

In topology, we usually pick as S\mathbf{S} the powerset of X\mathbf{X} (P(X)\mathcal{P}(\mathbf{X}), the set of all subsets of X\mathbf{X}), and as \le the “is a subset of” relationship \subseteq. With this choice of S\mathbf{S} and \le, the rules are often stated in a different, but equivalent manner (proving that these two are equivalent is a fun little exercise):

  • XF\mathbf{X} \in F: the universe must be in FF.
  • FF is upwards-closed: if a set tFt \in F, then any superset utu \supseteq t will be in FF.
  • FF is closed under intersection: if t,uFt, u \in F, then tuFt \cap u \in F.

The most common filter in topology is the set of all open sets containing some point xXx \in \mathbf{X}, which we call the neighborhood of xx: N(x)={s      tτ(X),  tsxt}\mathcal{N}(x) = \{ s \;|\; \exists\; t \in \tau(\mathbf{X}), \; t \subseteq s \land x \in t \}.

For instance, the Epsilon-Delta definition of a limit has an equivalent definition using the neighborhood filters only:

limcxf(c)=yf(N(x))=N(y)\lim_{c \rightarrow x} f(c) = y \enspace\Leftrightarrow\enspace f(\mathcal{N}(x)) = \mathcal{N}(y)

Ultrafilters

We say that a filter F1F_1 is finer than a filter F2F_2 (F1F2F_1 \preceq F_2) if every element of F2F_2 is in F1F_1 (in other words, and maybe confusingly, F1F2F2F1F_1 \preceq F_2 \Leftrightarrow F_2 \subseteq F_1). The filter that contains every element, S\mathbf{S} itself, will thus be the finest filter: F,SF\forall F, S \preceq F.

Ultrafilters are filters with one added condition: for a filter UU to be an ultrafilter, there must be no filter finer than it, other than UU and S\mathbf{S}. You can think of ultrafilters as the cousins of prime numbers.

Ultrafilters play a major role in the latter parts of the proof of Rubin’s theorem. Pr. M. Rubin uses the order-preserving equivalence seen in the previous section to construct an equivalence from the ultrafilters in the regular support basis to ultrafilters in the algebraic support basis, that preserves the convergence of an ultrafilter.

Thankfully, filters and ultrafilter are really well developped in Mathlib, and are the backbone of its topology section. Unfortunately, at the time of writing, those are defined using the alternative conditions stated earlier, which means they can only be used with (P(X),)(\mathcal{P}(\mathbf{X}), \subseteq): no ultrafilter in the regular support basis, no filters of subgroups.

There is an alternative structure defined, called Order.PFilter, which uses the main conditions, but it lacks a lot of theorems and tools (notably, it doesn’t yet have a way to obtain ultrafilters). Speaking with other members of the Lean community, it turns out that Order.PFilter is planned to be replaced with a new, cleaner structure. This however would take several months of work to happen, which neither I nor those who wanted to do this change currently have.