Containers Made Easy part 1

When you type “containers” or “docker” into any search engine, you get a lot of pages about using Docker and comparing it with virtual machines. Writing another post about just that would be a waste of my time and, more importantly, yours. So I would like to do it in a slightly different (and probably a lot longer) way. The general idea is to write a series of posts in which I’ll show you how to write your own container engine that is capable of running Docker images. Of course it won’t be as feature-rich and bulletproof as the original, but it will work and will be great for learning both containers and Golang. Before we jump to the coding part, let’s refresh some theory.

Docker: what is it?

Docker: we hear that term all over the internet. Everybody seems to use it or is starting to use it, but what is it really? In fact it’s a tool that vastly simplifies the use of two Linux kernel features: namespaces and cgroups. I’ll describe them in more detail later. In short, it allows you to run processes (programs) in isolation from other processes.

Because it relies on Linux kernel features, at the beginning it ran only on Linux. If you had it on Windows or OS X, you in fact had a small Linux VM running in the background. Microsoft has since started an initiative to support containers natively in Windows, so it is now possible to run native, non-virtualized containers there.

So if it only uses Linux kernel features, are there any alternatives? Yes, for example rkt from CoreOS. The whole concept is also nothing new. Similar solutions were available as Solaris Zones, BSD jails or LXC containers many years before Docker (at the beginning Docker even used LXC to start its containers). This is completely different from a traditional virtual machine, where isolation happens at the level of the whole OS and not at the level of a single process.

But to the point: Docker is a tool that allows you to run programs in isolation with ease. It also simplifies distribution of those programs by allowing you to pack them into immutable images.

Image vs Container

Before Docker can start your application, the application needs to be packed into something called an image. The image contains your application and all of its dependencies. This allows easy migration from host to host, as the image contains everything needed to run the application. Even better, all libraries are in exactly the required version, so there are no more library differences among servers. With this approach you can build the app on a dev machine or a dedicated build server, pack it into an image and distribute it to servers. Because the image carries all of its dependencies, we can be sure that the app behaves the same on the developer machine, in test and in production. Images are also immutable. This means that if you want to change one, you need to build a new version of the image. Docker images can also be tagged in a way similar to a git tag, which makes it possible to keep track of image history.

Containers are just running instances of images. You can run multiple containers from the same image simultaneously. Each of those running containers is independent and by default unaware of the other running instances. As images are immutable, every new container starts in exactly the same state. Containers are also ephemeral: every change made at run time is gone after the container is destroyed. Treat this as a clear separation of the application from the data it processes. All application data should be stored outside of the container.

At some point all the major stakeholders in the container world came together and formed the Open Container Initiative. After some time they produced an official specification of the image format; you can read about it here. As this is quite new, not all container runtimes implement it. In this series I’ll focus on the Docker way of handling images; it’s so similar that once you get the idea it will all be super clear.
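To make the tagging idea above a bit more concrete, here is a minimal sketch of how an image reference such as ubuntu:16.04 can be split into a repository and a tag. The helper name splitImageRef is my own invention for illustration, and the logic is deliberately simplified: it ignores digests and does no validation, so treat it as a rough picture rather than how Docker itself parses references.

```go
package main

import (
	"fmt"
	"strings"
)

// splitImageRef splits a reference such as "ubuntu:16.04" into its
// repository and tag. Hypothetical, simplified helper: it ignores
// digests ("name@sha256:...") and performs no validation.
func splitImageRef(ref string) (repo, tag string) {
	// Only a colon after the last slash separates the tag; an earlier
	// colon belongs to a registry port, e.g. "localhost:5000/app".
	lastSlash := strings.LastIndex(ref, "/")
	lastColon := strings.LastIndex(ref, ":")
	if lastColon > lastSlash {
		return ref[:lastColon], ref[lastColon+1:]
	}
	// When no tag is given, Docker falls back to "latest".
	return ref, "latest"
}

func main() {
	for _, ref := range []string{"ubuntu:16.04", "alpine", "localhost:5000/app"} {
		repo, tag := splitImageRef(ref)
		fmt.Printf("%s -> repository=%s tag=%s\n", ref, repo, tag)
	}
}
```

Two references with the same repository but different tags simply point at two different immutable images, which is what lets you keep old versions around next to new ones.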

How does it work internally?

Namespaces are, as the name implies, virtual kernel name spaces. There are a few namespaces available, such as NET, PID or mount. When, for example, a new PID namespace is created, every new process started in that namespace will “see” only processes started in the same namespace. If we create a new PID namespace for every container we start, then every container will see only the processes it started itself. So there is no virtualization, as every container still sees the same OS kernel and hardware. There is only isolation from other processes running on the same system. Other namespaces work in exactly the same way, just isolating different areas of the operating system: network, file system mounts, inter-process communication, etc.

Cgroups are responsible for controlling and limiting resource usage. By default no resource limits are applied to a container. But if we want to limit, for example, CPU usage, we can instruct Docker to apply that limit through a control group when starting the container. There are multiple cgroup controllers available for different types of resources, such as CPU, memory or IO.

What will we actually create?

First of all we will create a Golang package that is capable of handling the Docker image format. This part will cover downloading, unpacking and mounting an image using the overlay file system. We will do it from scratch to better understand how it really works. After that we will create another Golang package capable of executing processes from the mounted image within Linux namespaces. In the next step we will wrap this into a simple CLI app capable of running a single container. Once we have this simple app running, we will extend it so that it can run containers in the background and collect their output, and we will add network namespace support. At the end I’ll sum up the whole process and describe how each part relates to Docker itself.

What you will need to prepare:

A working Go 1.8+ build environment and knowledge of how to build a Go app (reading https://golang.org/doc/install and https://tour.golang.org will be enough)
A Linux machine with a kernel that is not too old. 4.8+ is preferred, as older kernels might have problems with overlay file system mounts. It can be a VM; I write and test this on Arch.
Willingness to learn a lot of new stuff
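Since overlay file system mounts come up both in the plan and in the kernel requirement above, here is a rough preview of what such a mount boils down to. The sketch only builds the option string that the overlay file system expects; the function name overlayOptions and all the /tmp/demo paths are placeholders of mine, and the real code in later parts may well look different.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayOptions builds the option string for an overlay mount.
// layers are the read-only directories, listed from top to bottom,
// upper receives all writes, and work is a scratch directory that
// overlay requires next to upper. Hypothetical helper for illustration.
func overlayOptions(layers []string, upper, work string) string {
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s",
		strings.Join(layers, ":"), upper, work)
}

func main() {
	opts := overlayOptions(
		[]string{"/tmp/demo/layer2", "/tmp/demo/layer1"}, // placeholder paths
		"/tmp/demo/upper", "/tmp/demo/work")
	fmt.Println(opts)
	// On Linux, as root, the mount itself would then be roughly:
	//   syscall.Mount("overlay", "/tmp/demo/rootfs", "overlay", 0, opts)
}
```

The read-only layers map nicely onto immutable image layers, while the upper directory collects the changes of a single container and can be thrown away when the container is destroyed.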