Toybox Linux in QEMU

Let’s try to spin up a minimal Linux system. Specifically, we’re going to run a self-compiled Linux kernel in QEMU with a Toybox-based userland. Why are we doing this? I think a minimal system like this is a great starting point to poke around and explore without getting lost. It’s also a good base for some kernel hacking. At least conceptually, it is also pretty much how a lot of Linux firmware for embedded devices is built, if that is the route you want to go. Setting up such a basic system is also surprisingly simple with modern tools. So let’s get started.

Step 1: Building the Kernel

A simple build of the Linux Kernel is surprisingly straightforward. The Kernel is probably one of the most self-sufficient pieces of software around, and it has to be. Being the Kernel, it does not have the luxury of a libc or a hodgepodge of shared libraries. It’s just raw CPU instructions that set up the base for everything else.

A good place to get the Kernel sources is https://www.kernel.org/. You could also fetch them via git, of course, but for our purposes a release tarball is still a convenient way to get the source. Choose one of the latest stable releases and you should be fine. At the time of writing this is 6.14.6.

wget https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.14.6.tar.xz
tar xvf linux-6.14.6.tar.xz
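If you want to script the download for a different release, the URL follows a simple pattern on cdn.kernel.org: the major version picks the v<major>.x directory. Here is a minimal sketch, assuming that layout holds for whatever release you choose; kernel_url is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: build the cdn.kernel.org tarball URL for a release.
# Assumes the v<major>.x directory layout used in the wget command above.
kernel_url() {
    version="$1"
    major="${version%%.*}"   # "6.14.6" -> "6"
    printf 'https://cdn.kernel.org/pub/linux/kernel/v%s.x/linux-%s.tar.xz\n' \
        "$major" "$version"
}

# Usage (commented out so the sketch also runs offline):
# wget "$(kernel_url 6.14.6)" && tar xvf linux-6.14.6.tar.xz
```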

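Once you have the sources, it’s time to compile. There are a lot of options for a kernel build. Since we will be running in a virtual QEMU machine later, we can use a set of config options that is already prepared for exactly this scenario: kvm_guest.config. We start from the default configuration for our architecture and then merge the KVM guest options on top of it.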

cd linux-6.14.6
make defconfig
make kvm_guest.config
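After merging the fragment it can be reassuring to confirm that the options you care about actually ended up in .config. A small sketch follows; check_config is a hypothetical helper, and the option names in the usage line are my own pick of what matters for this setup (KVM guest support, serial console, initramfs with gzip, devtmpfs), not a list the Kernel defines.

```shell
#!/bin/sh
# Hypothetical helper: report which options are not set to "y" in a
# kernel .config file. Returns non-zero if any option is missing.
check_config() {
    config="$1"; shift
    missing=0
    for opt in "$@"; do
        if ! grep -q "^${opt}=y\$" "$config"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Example, run inside linux-6.14.6 after `make kvm_guest.config`:
# check_config .config CONFIG_KVM_GUEST CONFIG_SERIAL_8250_CONSOLE \
#     CONFIG_BLK_DEV_INITRD CONFIG_RD_GZIP CONFIG_DEVTMPFS
```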

Now that the configuration is done, we can start the build. Unless you have a beast of a machine it’s probably a good time to fetch a coffee or tea. Then enjoy watching page after page of compilation output fly by.

make -j$(nproc)

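At the end of this process we end up with a fresh, self-compiled Linux Kernel image, the bzImage. The compile output will tell you where this image is located. For me it is in arch/x86/boot/bzImage. On x86_64 the build also creates a symlink to it at arch/x86_64/boot/bzImage, which is the path the QEMU commands later in this post use.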

A bzImage is not just any regular executable. It is a wild hodgepodge of raw machine code pulling itself up by its own hair.
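To get a first feel for what is inside, you can peek at the small setup header near the start of a bzImage that bootloaders read before handing over control. A minimal sketch, assuming the offsets from the kernel’s x86 boot protocol documentation (boot flag 0xAA55 at 0x1FE, magic "HdrS" at 0x202, protocol version at 0x206); bzimage_info is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: peek at the x86 boot protocol header of a bzImage.
# Offsets per the x86 boot protocol docs:
#   0x1FE (510) boot_flag 0xAA55, 0x202 (514) magic "HdrS", 0x206 (518) version
bzimage_info() {
    img="$1"
    flag=$(od -An -tx1 -j510 -N2 "$img" 2>/dev/null | tr -d ' \n')
    magic=$(dd if="$img" bs=1 skip=514 count=4 2>/dev/null)
    if [ "$flag" != "55aa" ] || [ "$magic" != "HdrS" ]; then
        echo "not a bzImage: $img" >&2
        return 1
    fi
    # The version is little-endian: low byte = minor, high byte = major.
    set -- $(od -An -tu1 -j518 -N2 "$img")
    echo "boot protocol $2.$1"
}

# Usage: bzimage_info linux-6.14.6/arch/x86/boot/bzImage
```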

(Image: The Linux Kernel booting up. “Rare snapshot of a Linux Kernel doing its thing in the wild™”)

The Kernel sets up the environment in which all other executables run. The hoops it has to jump through just to get itself up are wild. Good resources as to what the bzImage exactly contains and why are not easy to find. Nevertheless here are some starters if you’re interested:

As intriguing as it is, we must now leave the bzImage behind and continue crafting our little virtual machine. Don’t worry, we’ll come back to our Kernel image at the end.

Step 2: Building the user space

A Kernel’s job is to abstract the hardware and manage the environment in which everything else can run. Right now we have a Kernel which we can boot into, with some drivers for our (virtual) hardware.

In fact, you could already try booting that Kernel in QEMU:

qemu-system-x86_64 \
    -m 2G \
    -smp 2 \
    -kernel ./linux-6.14.6/arch/x86_64/boot/bzImage \
    -enable-kvm \
    -nographic \
    -pidfile vm.pid \
    -append "console=ttyS0 earlyprintk=serial net.ifnames=0 nokaslr" \
    2>&1 | tee ./vm.log

It will probably print a bunch of stuff, but ultimately it will panic because it is missing a crucial piece: a root filesystem. Without a root filesystem a Kernel is pretty useless. You need some place to put all the other stuff. The filesystem is where the rest of the Operating System and also your user data lives. A Kernel alone is just one link in the chain, albeit an important one.
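Once the Kernel has panicked it just sits there; with -nographic you can leave QEMU by pressing Ctrl-A followed by X. Since the command tees the serial output into vm.log, you can also check for this outcome afterwards. The exact panic text may differ between Kernel versions (it typically reads along the lines of “VFS: Unable to mount root fs on unknown-block(0,0)”), so this sketch only matches the stable part; no_rootfs_panic is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: check a QEMU serial log for the classic
# "no root filesystem" panic. Only matches the stable part of the message.
no_rootfs_panic() {
    grep -q 'VFS: Unable to mount root fs' "$1"
}

# Usage after the boot attempt above:
# no_rootfs_panic vm.log && echo "Kernel booted, but found no root filesystem"
```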

The structure of a root filesystem is pretty well defined. Most recognizable is the familiar folder structure: you’ll likely know /bin, /boot, /dev, /proc, etc. These are not just random folders. In fact, the Kernel and the programs running on top of it expect some of them to be there. You might have heard the Linux phrase “everything is a file”. If you poke into folders like /dev or /proc, you’ll find that the Kernel exposes certain internal functionality and information there as easily consumable files.
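To make this structure a bit more tangible, here is a sketch that lays out a bare skeleton by hand. Toybox will do this for us later, so this is purely illustrative; the directory list is my own minimal pick rather than a complete layout, and make_skeleton is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: create a bare root filesystem skeleton under $1.
# The directory list is an illustrative minimum, not a complete FHS layout.
make_skeleton() {
    root="$1"
    for dir in bin sbin etc dev proc sys tmp root usr/bin usr/sbin; do
        mkdir -p "$root/$dir"
    done
    chmod 1777 "$root/tmp"   # world-writable with the sticky bit, as usual for /tmp
}

# /proc and /sys stay empty on disk. At runtime the Kernel's own
# filesystems get mounted on top of them, e.g.:
#   mount -t proc proc /proc
#   mount -t sysfs sysfs /sys
```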

Besides the filesystem and folder structure, we will need the mother of all processes, an init program. Init is pretty much the next link in the chain after the Kernel: the Kernel starts up and hands over control to init, which in turn initializes the rest of the OS. In modern distributions the init process is often systemd, but there are many others. On a quick side note, many Docker containers do not run a proper init process, which is actually an issue. As the first process in the process tree (PID 1), init has a special place and is expected to take on a couple of responsibilities, such as handling signals and reaping zombie processes. When a Docker container starts a regular user program as PID 1, these things are usually not handled properly.
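When booting from an initramfs, the Kernel looks for /init and runs it as PID 1. To show how small such an init can be, here is a sketch that writes a deliberately tiny one. It is a hypothetical example (write_init is a made-up helper), not what Toybox generates, and it skips the PID 1 duties mentioned above on purpose.

```shell
#!/bin/sh
# Hypothetical helper: write a deliberately tiny /init into the
# initramfs tree at $1. Toybox's `make root` generates its own init.
write_init() {
    root="$1"
    cat > "$root/init" <<'EOF'
#!/bin/sh
# Mount the pseudo filesystems the Kernel provides.
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs dev /dev
# Hand PID 1 over to a shell. A real init would instead handle signals,
# reap orphaned zombie processes and start/supervise services.
exec /bin/sh
EOF
    chmod 755 "$root/init"
}
```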

Anyway, to do anything useful we will also want a couple of basic programs. If you have ever used the Unix command line, you have probably heard of and used something like the venerable GNU Coreutils: a set of small but powerful utilities to manage the system, read, write and find files, search, replace, spawn processes, execute small scripts, and so on.

Instead of the full suite of GNU Coreutils, we will be using something specifically created for minimal systems like ours: Toybox. Toybox is basically a BusyBox replacement with a more permissive license. It provides re-implementations of many of the GNU Coreutils (and other standard command line tools) in a single binary and has a convenient Make target to set up a complete root filesystem for us. It is pretty much an all-in-one solution for a minimal system like ours.
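The “single binary” part works through the multicall trick: commands like ls or cat are links to the same executable, which looks at the name it was invoked under (argv[0]) to decide which tool to run. Below is a toy sketch of that idea in shell. It is purely illustrative and not how Toybox is implemented (Toybox is written in C); make_multicall_demo and the applet names hello and upper are made up.

```shell
#!/bin/sh
# Hypothetical demo of the multicall idea: one program, several names.
make_multicall_demo() {
    dir="$1"
    mkdir -p "$dir"
    cat > "$dir/multicall" <<'EOF'
#!/bin/sh
# Dispatch on the basename of $0, like a multicall binary checks argv[0].
case "${0##*/}" in
    hello) echo "hello, $1" ;;
    upper) echo "$1" | tr '[:lower:]' '[:upper:]' ;;
    *)     echo "usage: link me as hello or upper" >&2; exit 1 ;;
esac
EOF
    chmod 755 "$dir/multicall"
    # Each "command" is just a symlink pointing at the same file.
    ln -sf multicall "$dir/hello"
    ln -sf multicall "$dir/upper"
}
```

After `make_multicall_demo /tmp/demo`, running `/tmp/demo/hello world` and `/tmp/demo/upper toybox` executes the very same file, yet each behaves like a different command.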

To make our life simpler, we want a statically linked build of Toybox. This way the binary can run on its own without the need for shared libraries and the complexity that comes with them. Unfortunately, creating a fully static build is not straightforward either, especially if your host system is glibc based. Glibc is notoriously hard to link statically, since features such as NSS rely on loading shared libraries at runtime. Instead, we aim for a musl based build. The downside is that your host system is likely not musl based, so building this on your machine means jumping through some hoops.
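Once you have a binary, a quick way to verify that it really is static is the file utility, which the Dockerfile in this post installs in the build container anyway. A minimal sketch, assuming file’s usual wording (“statically linked”, or “static-pie linked” for static PIE executables); is_static is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the ELF file at $1 looks statically linked.
# Relies on the wording of the `file` utility's output; dynamically linked
# binaries mention "dynamically linked" and an interpreter instead.
is_static() {
    file -L "$1" 2>/dev/null | grep -Eq 'statically linked|static-pie linked'
}

# Example, once the Toybox build is done:
# is_static path/to/toybox && echo "no shared libraries needed"
```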

Things must have been much harder in the old days. Nowadays, however, we have awesome container technology which makes this a breeze. A well known musl-based distribution is Alpine Linux, so let’s create a simple container with a toolchain for our Toybox build.

First, the Dockerfile:

FROM alpine:latest

RUN apk add gcc musl-dev linux-headers bash \
            bc bison file findutils make \
            cpio flex wget elfutils-dev \
            ncurses-dev libressl-dev

Then build the toybox-builder image:

docker build --no-cache -t toybox-builder .

And finally start a container with a shared folder from our host mounted inside:

docker run -it -v ./shared:/build toybox-builder

This should drop you into a shell inside the Alpine container. Now we can start a fully statically linked Toybox build. Head over to http://landley.net/toybox/downloads/ and find the latest release tarball. Then download and extract it in the Alpine build container.

cd /tmp
wget http://landley.net/toybox/downloads/toybox-0.8.12.tar.gz
tar xzf toybox-0.8.12.tar.gz

As mentioned before, Toybox makes it incredibly easy to create a minimal root filesystem and compile everything you need in one go. All you have to do is:

cd toybox-0.8.12
make root

At the end it will tell you where it created the new filesystem, in my case in /tmp/toybox-0.8.12/root/host. There you should find a compressed initramfs.cpio.gz file. This is a gzip-compressed cpio archive, very similar to a gzip’ed tarball, and it is what we will hand to QEMU. Next to it there’s also a folder named fs with the uncompressed filesystem in case you want to poke around a bit.
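If you want to double-check what you got: the Kernel expects an initramfs to be a cpio archive in the so-called newc format (magic 070701, or 070702 for the variant with checksums), optionally compressed, here with gzip. A quick sketch that checks exactly that, assuming gzip and head are available (they are in the Alpine container); check_initramfs is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check an initramfs image, i.e. a gzip stream
# wrapping a cpio archive in the "newc" format.
check_initramfs() {
    magic=$(gzip -dc "$1" 2>/dev/null | head -c 6)
    case "$magic" in
        070701|070702) echo "gzip'ed newc cpio archive" ;;
        *) echo "unexpected format" >&2; return 1 ;;
    esac
}

# To list the contents instead (cpio is installed by the Dockerfile):
#   gzip -dc initramfs.cpio.gz | cpio -it
```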

Since we will only need the initramfs.cpio.gz, we copy this file back to the host (remember we mounted a host folder to /build in the container):

cp /tmp/toybox-0.8.12/root/host/initramfs.cpio.gz /build
exit

Step 3: Running our TYBX/Linux OS

Now you probably also understand a little better why the GNU project insists on calling GNU-based Linux systems GNU/Linux. The Kernel is an extremely important part, but on its own it is just one piece of the puzzle and not that useful.

We now have a Kernel and a filesystem filled by Toybox with all the essentials. Back on the host you should find the initramfs.cpio.gz in the shared folder. Now it’s finally time to go back to our lovely bzImage and run our minimal TYBX/Linux OS in its full glory on a virtual QEMU machine.

qemu-system-x86_64 \
    -m 2G \
    -smp 2 \
    -kernel ./linux-6.14.6/arch/x86_64/boot/bzImage \
    -initrd ./shared/initramfs.cpio.gz \
    -enable-kvm \
    -nographic \
    -pidfile vm.pid \
    -append "console=ttyS0 earlyprintk=serial net.ifnames=0 nokaslr" \
    -no-reboot \
    2>&1 | tee ./vm.log
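The -pidfile option makes QEMU write its process ID to vm.pid, which comes in handy for stopping the VM from a second terminal (QEMU shuts down when it receives SIGTERM). A minimal sketch, assuming the VM was started from the current directory; stop_vm is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: stop a QEMU instance started with `-pidfile vm.pid`.
stop_vm() {
    pidfile="${1:-vm.pid}"
    [ -f "$pidfile" ] || { echo "no pidfile: $pidfile" >&2; return 1; }
    # Send SIGTERM and clean up the pidfile (rm -f is fine if QEMU removed it).
    kill -TERM "$(cat "$pidfile")" && rm -f "$pidfile"
}

# From inside the VM console itself, Ctrl-A followed by X quits QEMU as well.
```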

Step 4: Have Fun

Yeah I know, this was not on the menu. Consider it a petit four.

You may be sitting there wondering what to do next with your new toy. Here are some ideas:

Finally, a huge shoutout to https://blog.leonardotamiano.xyz/tech/linux-kernel-qemu-setup/. It’s a slightly different setup, but a significant part of this post is based on it.