MCU Bootloaders: Boxing up

2024-11-29

This article is the third one in my series about bootloaders for microcontrollers, where I’m exploring the clever algorithms that make those pieces of software possible, as well as the ones I developed for Polyboot (a bootloader geared towards updating devices that contain multiple microcontrollers).
If this is your first time here, then I strongly recommend that you read the opening article of this series beforehand, where I explain the basics of what a bootloader for microcontrollers is and does.

In the last article, I showed how one can carefully design an algorithm to be resistant to sudden shutdowns — a situation that our little bootloader can unfortunately find itself in. With this algorithm, we now know how a bootloader can update the application from one version to another.

Don’t worry if you didn’t catch everything, I prepared a recap :)

There’s still a bit of work left, though, before we can say that our bootloader can reliably update a microcontroller. It still doesn’t know when to update, or where the application is. Today, we’re going to teach it exactly that!

There isn’t going to be any algorithm in this article, so you should be good if you made it this far.

Partitioning the flash

Just like on regular computers, we need to impose some structure on how we are using the internal storage. On regular computers, the internal storage is split into multiple partitions. For instance, you might have a partition dedicated to the read-only system files, one for the bootloader and one for your files. This is commonly done using MBR or GPT.

On a microcontroller, it would be very wasteful to use these partition schemes: we don’t need all of their features and storage space is at a premium. Instead, we are going to let the developer choose the partitions ahead of time and bake them directly into the bootloader’s source code.

Which partitions do they have to choose?

We are going to need 4 different kinds of partition, each serving a different purpose:

The bootloader partition will contain the code for the bootloader.
The primary partition will contain the current version of the application. This is where the application will be run from.
Secondary partitions will contain the next or the previous versions of the application.
The scratch partition is used by our bootloader to always keep a copy around of the data it is processing, to ensure that it is resistant to sudden shutdowns.

The partitions must, naturally, not overlap and be big enough for what we want to store in them.

They also need to be aligned to the pages of the flash: each partition must start on a page boundary and be made up of a whole number of pages, otherwise we risk losing information from another partition when we erase the memory allocated to a partition.
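To make these constraints concrete, here is a minimal sketch of how a layout could be checked. The `Partition` type and the `layout_is_valid` function are names I made up for this example, and the page size is passed in as a parameter since it depends on the chip.

```rust
/// A partition described by its start address and size, both in bytes.
/// (Hypothetical type, only meant for this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Partition {
    pub start: u32,
    pub size: u32,
}

impl Partition {
    /// Address one past the last byte of the partition.
    /// Computed in 64 bits so that it cannot overflow.
    pub fn end(&self) -> u64 {
        u64::from(self.start) + u64::from(self.size)
    }

    /// A partition is page-aligned when it starts on a page boundary
    /// and spans a whole number of pages. `page_size` must not be zero.
    pub fn is_page_aligned(&self, page_size: u32) -> bool {
        self.start % page_size == 0 && self.size % page_size == 0
    }

    /// Two partitions overlap when each one starts before the other ends.
    pub fn overlaps(&self, other: &Partition) -> bool {
        u64::from(self.start) < other.end() && u64::from(other.start) < self.end()
    }
}

/// Returns true when every partition is non-empty and page-aligned,
/// and no two partitions overlap.
pub fn layout_is_valid(partitions: &[Partition], page_size: u32) -> bool {
    for (i, a) in partitions.iter().enumerate() {
        if a.size == 0 || !a.is_page_aligned(page_size) {
            return false;
        }
        for b in &partitions[i + 1..] {
            if a.overlaps(b) {
                return false;
            }
        }
    }
    true
}
```

Since the partitions are baked into the source code, a check like this one could run at compile time or in a unit test rather than on the device.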

Oh, wait! My recap!

Moxie's flash memory cheat sheet

Welcome to my small recap about how flash memory works!

Flash memory is very cheap but a bit difficult to write to. In microcontrollers, we are using a variant called “NOR” flash memory, which is fast enough to read that we can directly execute the code from it.

We can read from flash memory with little to no restriction, but for economic and physical reasons, writing to it is trickier:

By default, all bits are set to 1.
We can write to a word, setting the bits we want to 0 and leaving the others untouched.
A word is usually 2 to 8 bytes long.
We can erase an entire page, setting all bits back to 1.
A page is quite big, usually something like 1024 to 4096 bytes!
We can only write each word a set number of times.
On some devices, we can only write each word once! Gasp!

A visual explanation of how flash memory works, as described above.
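If you prefer code to drawings, the rules above can be modeled with a small toy simulation. This is only a sketch: the `Flash` type is hypothetical, the word and page sizes are picked arbitrarily from the ranges given above, and the limit on the number of writes is not modeled at all.

```rust
/// Sizes picked for illustration, not tied to any real chip.
pub const WORD_SIZE: usize = 4;
pub const PAGE_SIZE: usize = 1024;

/// A toy model of NOR flash memory (hypothetical type).
pub struct Flash {
    bytes: Vec<u8>,
}

impl Flash {
    /// A fresh flash has every bit set to 1.
    pub fn new(pages: usize) -> Self {
        Flash { bytes: vec![0xFF; pages * PAGE_SIZE] }
    }

    /// Reads the word at the given word index.
    pub fn read_word(&self, index: usize) -> [u8; WORD_SIZE] {
        let start = index * WORD_SIZE;
        let mut word = [0u8; WORD_SIZE];
        word.copy_from_slice(&self.bytes[start..start + WORD_SIZE]);
        word
    }

    /// Programs a word. Bits can only go from 1 to 0, so the stored
    /// result is the bitwise AND of the old and the new contents.
    pub fn write_word(&mut self, index: usize, value: [u8; WORD_SIZE]) {
        let start = index * WORD_SIZE;
        let word = &mut self.bytes[start..start + WORD_SIZE];
        for (old, new) in word.iter_mut().zip(value.iter()) {
            *old &= *new;
        }
    }

    /// Erases a whole page, setting all of its bits back to 1.
    pub fn erase_page(&mut self, page: usize) {
        let start = page * PAGE_SIZE;
        self.bytes[start..start + PAGE_SIZE].fill(0xFF);
    }
}
```

Writing `0x0F` and then `0xF0` to the same byte leaves `0x00` behind, and the only way to get the 1s back is to erase the whole page it lives in.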

A few examples

For our simple example bootloader, we will require the flash memory to be split into four different partitions: one for the bootloader, a primary and a secondary partition (of similar sizes) and a small scratch partition. This is the most basic layout required for the scratch-swap algorithm to work, but it can get the job done:

The most basic partition layout: bootloader, primary, secondary and scratch

MCUBoot gives you some more freedom: it lets you have multiple primary and secondary partitions, but each secondary partition must be paired with a primary partition. The only way to update the code within a primary partition is to put its new version within its associated secondary partition:

A more complex partition layout under MCUBoot: two primary and two secondaries, with each primary being paired up with a secondary.

Polyboot, on the other hand, lets you have fewer secondary partitions than primary partitions. As long as there is at least one secondary partition, new versions of any of the applications running across the device can be placed in any secondary partition, and the bootloader will take care of the rest:

A partition layout made possible with Polyboot: only one secondary partition serves two primary partitions, freeing up space on the auxiliary MCU.

This means that you can dedicate almost all of a microcontroller’s internal storage to the application it needs to run, allowing board designers to use a cheaper microcontroller!

Nifty!

Images

Okay, now we know where the application and its new version are, but how do we know when to update?

The bootloader indeed cannot just guess when there is an update available. It needs to be able to read what’s inside the primary and secondary partitions to gather several pieces of information:

Whether or not there is an application within that partition.
What version of the software it is.
How many bytes that version of the software uses.
Some kind of checksum, to ensure that nothing got corrupted.
Other metadata, like cryptographic keys or, in the case of Polyboot, to which microcontroller the application belongs.

There is also some information that the bootloader needs to modify at runtime:

Whether or not an update is under way.
The journal for the scratch-swap algorithm.

That’s a lot to manage!

Clearly, we need a way to neatly organize all of this information into an image: a “box” in which we can place the application’s code and the metadata our bootloader needs, that can easily be handled by a machine (which here is our bootloader).
This is not to be confused with the images that a camera takes!

Drawing of a chip character opening one of several boxes; the contents of the opened box are glowing.

There are many, many ways to design an image format, but I chose for Polyboot to use an extension of MCUBoot’s image format. This lets developers using my bootloader reuse the tools they already built around MCUBoot for creating, flashing or analyzing images.

MCUBoot’s image format looks something like this:

A visual breakdown of MCUBoot's image format: the application is preceded by a header with magic bytes, the version and the length of the application. After the application is a TLV area, containing notably the checksum. At the very end is the trailer, with another set of magic bytes, the "image_good" flag and the "is_updating" flag.

The exact details of this format are not important for us today, but we can look at the more interesting aspects:

The first few bytes of the image are called “magic bytes”, which were chosen randomly. Their presence indicates that we are, in fact, looking at an MCUBoot image.
The application’s code and metadata are protected by a checksum, which is stored near the end of the image.
TLV stands for Type-Length-Value: it’s a way to efficiently encode data of various types and sizes.
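To give an idea of what reading such an image involves, here is a small sketch that checks for magic bytes and walks through a TLV area. The layout is an assumption made for this example (a 2-byte little-endian type, a 2-byte little-endian length, then the value) and not a faithful copy of MCUBoot’s format; `Tlv`, `has_magic` and `parse_tlvs` are hypothetical names. It also collects the entries into a `Vec` for simplicity, which a real bootloader would likely avoid.

```rust
/// One Type-Length-Value entry, borrowing its value from the TLV area.
/// (Hypothetical type, only meant for this sketch.)
#[derive(Debug, PartialEq, Eq)]
pub struct Tlv<'a> {
    pub kind: u16,
    pub value: &'a [u8],
}

/// Checks that `image` starts with the expected magic bytes.
pub fn has_magic(image: &[u8], magic: &[u8]) -> bool {
    image.starts_with(magic)
}

/// Parses a TLV area laid out as [type: u16 LE][length: u16 LE][value].
/// Returns None if an entry is cut short.
pub fn parse_tlvs(mut area: &[u8]) -> Option<Vec<Tlv<'_>>> {
    let mut entries = Vec::new();
    while !area.is_empty() {
        // Each entry starts with a 4-byte header.
        if area.len() < 4 {
            return None;
        }
        let kind = u16::from_le_bytes([area[0], area[1]]);
        let len = u16::from_le_bytes([area[2], area[3]]) as usize;
        let rest = &area[4..];
        // The announced length must fit in what is left of the area.
        if rest.len() < len {
            return None;
        }
        entries.push(Tlv { kind, value: &rest[..len] });
        area = &rest[len..];
    }
    Some(entries)
}
```

The nice thing about this encoding is that a parser can skip over entries whose type it doesn’t know, since the length is always there to tell it where the next entry begins.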

The trailer

What’s that thing at the very end of your image?

This is called the trailer. It also contains a few magic bytes, but most importantly, it contains the data that the bootloader will modify.

The data stored in this trailer is different between MCUBoot and Polyboot, but both variants contain the journal and a flag (is_updating) indicating whether the scratch-swap algorithm is running.

Unlike the rest of the image, there is no checksum mechanism in place to ensure that the data remains sound. We also still need to abide by the flash storage medium’s rules, so that means pulling off all the tricks we can!

What kind of tricks?

You might have already noticed an issue with the is_updating flag. During an update, the bootloader would have to set that flag twice: once to turn it on, right before the scratch-swap algorithm begins, and once to turn it back off.

If you remember correctly, some microcontrollers make it impossible to set such a flag twice within a single word!

The trick is to store this flag in two words: one indicating that the update of that image started, and one indicating that it finished.

A table of the different states of the two words used to encode a flag: when both are equal, then the flag is interpreted as "off", but when the first one has been programmed and the second one hasn't, the flag is interpreted as "on". Encoding of the is_updating flag using two words.
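Here is a minimal sketch of that encoding. The `TwoWordFlag` type and the values chosen for a programmed word are assumptions made for this example; the only thing that matters is that each word gets programmed at most once between two erases.

```rust
/// Value of a flash word that has been erased but not programmed yet.
pub const ERASED: u32 = 0xFFFF_FFFF;
/// Value written to mark a word as programmed (arbitrary choice).
pub const PROGRAMMED: u32 = 0x0000_0000;

/// The two words backing the flag (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct TwoWordFlag {
    pub started: u32,
    pub finished: u32,
}

impl TwoWordFlag {
    /// State of the flag right after the page holding it was erased.
    pub fn erased() -> Self {
        TwoWordFlag { started: ERASED, finished: ERASED }
    }

    /// The flag is "on" when the first word has been programmed
    /// and the second one hasn't.
    pub fn is_set(&self) -> bool {
        self.started != ERASED && self.finished == ERASED
    }

    /// Turns the flag on by programming the first word.
    /// The AND mimics how flash can only turn 1s into 0s.
    pub fn set(&mut self) {
        self.started &= PROGRAMMED;
    }

    /// Turns the flag back off by programming the second word.
    pub fn clear(&mut self) {
        self.finished &= PROGRAMMED;
    }
}
```

Starting from an erased trailer, calling `set()` and then `clear()` takes the flag from off to on and back to off, all without ever writing to the same word twice.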

When to update

With all of this in place, we can finally teach our bootloader when to update!

Obviously, we want it to update if the version read in the secondary partition is newer than the version read in the primary partition. But, if you recall from the last article, we also would like for it to revert a faulty update.

How is it going to know when an update is faulty?

This is a surprisingly tricky problem. The solution that MCUBoot uses and that I have decided to keep is to ask the application to tell the bootloader when it thinks it is “good”. If any new version is faulty, then it will crash before telling the bootloader that it is good. The next time the device reboots, the bootloader can detect that it happened, and revert the faulty update!

We store the fact that an image is good with a flag called image_good inside of the trailer, as it will need to be modified. The reset value of image_good is interpreted as false, and once the application notifies us that it is good, we program it to a value interpreted as true.

Note that this isn’t a perfect solution: a critical bug could very well appear after the application notifies the bootloader. There is sadly not much else that one can do to prevent this kind of scenario, besides notifying the bootloader as late as possible.

Here are the different rules that we can teach to our bootloader so it finally knows when to update:

If the secondary partition contains a newer version, then update the application.
If the primary partition was not flagged as good when the device booted up, then revert the update.

Illustration of the update flow: first, swap V1 and V2, then boot into V2. If V2 is good, keep it, otherwise revert the update. A typical update sequence using the rules above.
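The two rules can be sketched as a small decision function that runs when the device boots. This is a simplification under a few assumptions of mine: versions are plain integers, `ImageInfo`, `Action` and `decide` are hypothetical names, the revert rule is checked first, and a revert only makes sense if the secondary partition still holds an image to go back to.

```rust
/// What the bootloader read from a partition (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ImageInfo {
    pub version: u32,
    pub good: bool,
}

/// What the bootloader should do next.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Boot the primary image as-is.
    Boot,
    /// Swap the newer secondary image into the primary partition.
    Update,
    /// Swap the previous image back into the primary partition.
    Revert,
}

/// Applies the two rules, in an order chosen for this sketch.
/// `secondary` is None when the secondary partition holds no image.
pub fn decide(primary: ImageInfo, secondary: Option<ImageInfo>) -> Action {
    match secondary {
        // The primary image was never flagged as good: go back.
        Some(_) if !primary.good => Action::Revert,
        // The secondary partition holds a newer version: update.
        Some(s) if s.version > primary.version => Action::Update,
        // Nothing to do, boot what we have.
        _ => Action::Boot,
    }
}
```

Note that this function only looks at the state found when the device boots: right after a swap, the bootloader jumps straight into the new image without asking again. It also doesn’t remember that an image was just reverted, so the extra state needed to avoid installing a faulty image over and over is left out of this sketch.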

Letting developers skip bumping the version

We can also choose to add the rule that if the secondary partition contains the same version as the primary partition, but isn’t flagged as good, then we update the application.

This scenario doesn’t happen normally (see the graph above), but it makes it so that developers don’t need to bump the version number whenever they want to try out a new version locally.

I implemented it by comparing the tuple (version, !good) of each partition, instead of just comparing their version.
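In Rust, this comparison comes almost for free, because tuples are ordered lexicographically and `false` sorts before `true`. Here is a minimal sketch of the idea, with versions simplified to a single integer and with hypothetical names (`ImageState`, `update_key`, `should_update`):

```rust
/// Version and "good" flag read from a partition (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct ImageState {
    pub version: u32,
    pub good: bool,
}

impl ImageState {
    /// The tuple used for the comparison: the version comes first,
    /// and "not good" breaks ties between equal versions.
    fn update_key(&self) -> (u32, bool) {
        (self.version, !self.good)
    }
}

/// Returns true when the secondary image should replace the primary one.
pub fn should_update(primary: &ImageState, secondary: &ImageState) -> bool {
    secondary.update_key() > primary.update_key()
}
```

With a good version 3 in the primary partition, a secondary image that is also version 3 but not yet flagged as good compares as greater, so it gets installed. An older image never wins, whatever its flag says.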

A working bootloader

One last step is to give our bootloader the ability to, well, boot the application. Thankfully, on Cortex-M microcontrollers, this is pretty easy :)

In C, you will have to write some assembly to read and set the starting address and the stack address, while in Rust the authors of cortex_m lovingly put together cortex_m::asm::bootload, which does this exact thing for you.
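To show where those two addresses come from, here is a small sketch that can run on a regular computer. On Cortex-M, the vector table begins with the initial stack pointer, followed by the address of the reset handler, each stored as a 32-bit little-endian word. The functions `read_vector_table` and `has_thumb_bit` are hypothetical helpers written for this example, and the slice is assumed to start right at the application’s vector table (so after any image header).

```rust
/// Reads the first two entries of a Cortex-M vector table:
/// the initial stack pointer and the address of the reset handler.
/// Returns None if the slice is too short to hold both.
pub fn read_vector_table(table: &[u8]) -> Option<(u32, u32)> {
    let b = table.get(0..8)?;
    let stack_pointer = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
    let reset_handler = u32::from_le_bytes([b[4], b[5], b[6], b[7]]);
    Some((stack_pointer, reset_handler))
}

/// Cortex-M cores only execute Thumb code, so the lowest bit of the
/// reset handler's address must be set. This makes for a cheap
/// sanity check before jumping into the application.
pub fn has_thumb_bit(reset_handler: u32) -> bool {
    reset_handler & 1 == 1
}
```

On the actual device, the last step is to load the stack pointer and jump to the reset handler, which is the part that needs assembly or a helper like the one from cortex_m.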

With this out of the way, we have a working bootloader that is almost resilient to sudden shutdowns!

“almost”?! I thought we were fully resilient!

Well, the scratch-swap algorithm from the last article is resilient, but it can only be used to update the body of the image, not the trailer.

The next step in our journey will be to see how MCUBoot handles this problem, as well as my different attempts at solving this issue in Polyboot, where there isn’t just one primary and one secondary partition. You will also finally get to see a new algorithm :D

Let’s meet again in the next article for… a “detour”?

I’ll explain when we get there :)