Computers and the Internet are not magical… and it’s really not that hard to explain

I’ve been working in the tech industry, close to software development, for about 20 years; more recently at the infrastructure and operations level. Over that time, like many of my colleagues, I’ve met a number of people who saw me as “the computer guy” and decided to ask: how does it actually work? When we say “downloading a video”, “re-tweeting a joke”, or “sharing a picture on Instagram”, what is going on in there? What exactly is coming in and out of those “Internet pipes”? That curiosity usually sparks after they feel shamed by one of Dara Ó Briain’s stand-up routines.

To the common user, computers are magic boxes with buttons and a color display. It’s a wonderful machine you can use to play games, share moments, buy stuff, talk to friends, meet new people… But when a tech-savvy person tries to explain how anything works under the hood, things go south really fast. Attempts at an explanation range from how binary numbers work to all sorts of analogies, down to trucks and a series of tubes. Most of the time, an in-depth explanation quickly devolves into a Rockwell Retro Encabulator presentation.

No wonder so many movies depict people destroying computers by trashing the monitor. People just give up, and social life keeps going.

The analogy I tend to use is somewhat simpler. It tries to explain how computers and the Internet work in a way that’s simple enough, yet still captures the fundamental concept of what is going on under the hood. It makes no attempt to dive into the complex operations performed inside the CPU, or how bits of raw data move between caches, registers, main memory, permanent storage, or network interfaces. Forget the Retro Encabulator for a moment and focus on the actual purpose. Remember what it all means. What it’s actually used for.

Buying a computer is like buying a deck of cards

When you go to your favorite store and order a brand new device, be it a smartphone or a laptop you can work on, you are, no doubt, buying what can be considered the apex of human technological evolution: a delicate, complex machine that uses electricity to perform billions of operations every second.

But behind that shiny exterior and the fancy tech jargon you see on the box, those tiny electronic components inside hide a much simpler concept; one that most people take for granted.


So… now at home, you have your brand new deck of cards. What can you do with it, aside from the obvious choice of playing games? You are free to do whatever you want. The deck is a physical object. It is truly yours. Paint over the cards, build a castle, count them, arrange them on the table to form a beautiful pattern. The choice is yours.

Imagine you managed to arrange all of the cards on the table to create some pattern that you like. It’s a work of art!


Let’s say that you really, really like what you see and immediately want to share it with your best friend, except he lives hundreds of miles away. The fastest move would be to pick up the phone, call him and pass on instructions so he can recreate the same pattern using his own deck of cards. When you are done passing instructions, your friend will have a replica.

Now… here are some important things you need to take note of right now. Your friend does not have the original work you created. Not a single card on his table is yours. He does not have a picture of your work, not a fake mockup, not a carbon copy, not even a description. It truly is an exact replica. Most importantly, you should realize that your own creation never left your house. You still have all of your cards with you, lined up the same way as before. No image or physical object was literally “transmitted over the wire” or through some “magical tube”. The only thing that went over the phone was your voice: instructions on how to recreate your work. Since you and your friend speak the same language, he can understand the instructions and create a perfect replica from miles away, no talent required!

Wait, that’s it?

Yes. It really is. Trust me! My Computer Science degree tells me that. Think of the “phone conversation” you had with your friend as your “Internet connection”, and the “decks of cards” as “computers”. Each person has their own machine and is free to rearrange its contents any way they want.

If you ever wondered what digital information actually means, that concept is perfectly laid out in your deck of cards. Information we care about is represented using smaller pieces that can be managed individually. Fundamentally, every piece is identical in nature, and the only thing that matters to make sense of them is the order and the purpose for which they are presented.

Digital information never arrives anywhere because it never leaves. The Internet really is a web of electronic signals, wired or wireless. Those signals are instructions to recreate exact replicas of the information from one computer inside another.

If you ever want to have a sensible discussion about anything in the digital era we live in, that basic understanding needs to be clear. It becomes relevant when you consider that every computer on Earth is built the same way, and works the same way. For example, if we want to write better laws for copyright infringement, patents, net neutrality, privacy on social networks, cybercrime, cryptocurrency… Remember the deck of cards, because that’s the underlying scenario on top of which all of them actually happen.

Everything else you hear from experts is just fancy words for the complex ways they found to manufacture machines that operate on that same simple idea.

There is a way to create valid Let’s Encrypt certificates for local development… sometimes

Caveats and a quick Docker example included 👍

A trick that has become popular among developers is to work with valid domain names handled by magic DNS resolvers, like nip.io, xip.io, or sslip.io. If you embed an IP address in one of their subdomains, their DNS servers will authoritatively resolve that name to the embedded IP.

For example, you can use the domain 127.0.0.1.sslip.io anywhere and it will resolve globally to 127.0.0.1. The IP address itself only ever points back to your own machine, but the domain is a valid, resolvable name worldwide. This gives developers a chance to work locally without resorting to localhost, 127.0.0.1, or tricky hacks like editing the /etc/hosts file.
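You can check that with any DNS tool; for example, an nslookup should come back with something like this (the resolver shown will match whatever your system uses):

$ nslookup 127.0.0.1.sslip.io
Server:         208.67.220.220
Address:        208.67.220.220#53

Non-authoritative answer:
Name:   127.0.0.1.sslip.io
Address: 127.0.0.1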

These magic DNS servers can help here because they also work with public IP addresses. In other words, they can assign a valid domain name to a public IP that would not normally have one.

Now, let’s go off topic for a moment and take a look at your Internet connection. As you may know, most broadband providers are rolling out IPv6 to their customers as part of the worldwide IPv6 adoption effort. If that’s the case for you, it’s very likely that the most modern devices connected to your home Wi-Fi already have a public IPv6 address. Smartphones, laptops, tablets, desktop computers… Yes! In some creepy way, they are ALL available online, reachable from anywhere in the world where IPv6 is routable.

So, go ahead and check out the IP addresses you have in your network interface.
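On Linux, assuming the iproute2 tools are available, one way to do that is shown below. The interface name and address are just illustrative (this one reuses the address from the example further down):

$ ip -6 addr show scope global
2: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2800:3f0:4001:810::200e/64 scope global dynamic
       valid_lft 86398sec preferred_lft 86398sec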

If you see an inet6 address with global scope, that means you are hot, online, right now!

If you’ve already put 2 and 2 together, you’ve realized you have the pieces you need to create a Let’s Encrypt certificate:

Public IP address + Valid domain name that resolves to that address

Here’s an example:

$ nslookup 2800-3f0-4001-810-0-0-0-200e.sslip.io
Server:         208.67.220.220
Address:        208.67.220.220#53

Non-authoritative answer:
Name:   2800-3f0-4001-810-0-0-0-200e.sslip.io
Address: 2800:3f0:4001:810::200e

Note that the IPv6 address needs to be reformatted before it can be used with sslip.io. Following the usual IPv6 rules, expand :: into the zero groups it stands for, then replace every colon (:) with a dash (-). Each all-zero group can still be written as a single 0.
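If you would rather not do the substitution by hand, here is a quick sketch: expand :: yourself (so 2800:3f0:4001:810::200e becomes 2800:3f0:4001:810:0:0:0:200e) and let tr swap the colons for dashes:

$ echo "2800:3f0:4001:810:0:0:0:200e" | tr ':' '-'
2800-3f0-4001-810-0-0-0-200e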

Now you can use certbot to issue a valid certificate. If you don’t have certbot installed, you can easily run it using the official Docker image.

docker run --rm -it --name certbot --network host \
  -v $(pwd)/data:/etc/letsencrypt \
  certbot/certbot \
  --test-cert --dry-run \
  --standalone \
  certonly --agree-tos \
  -m "myemail@my.domain.com" \
  -d <FORMATTED-IPv6>.sslip.io

The above runs certbot in dry-run mode against the Let’s Encrypt staging endpoint. It won’t create a certificate, but it will test the waters and tell you whether issuing one is possible.

To go ahead and create a real certificate, remove the --test-cert and --dry-run parameters.

The local volume directory ./data will contain everything generated inside /etc/letsencrypt, including certbot configuration files and the certificate. To renew it, change certonly to renew, keeping the same ./data directory.
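Sticking to what is described above, a sketch of that renewal run would reuse the same volume and simply swap the subcommand (adjust flags as needed for your setup):

docker run --rm -it --name certbot --network host \
  -v $(pwd)/data:/etc/letsencrypt \
  certbot/certbot \
  --standalone \
  renew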

This is pretty much the same task that would be automated on a real server with public IP addresses, except here you can do it manually using a valid domain that points to your local machine.

Installing the certificate is still up to you. That depends on the software, platform and frameworks you are using.

But now, you can test even more scenarios with a valid SSL certificate 😉

Some caveats to remember

As usual, for Let’s Encrypt to work you still need a public IP address, either v4 or v6. The hint about using IPv6 comes from the fact that its adoption has been increasing worldwide, so chances are you already have one by now.

This same trick can also come in handy for your home-brew projects, like Raspberry Pi boxes. In any case, you might be out of luck if your broadband provider hasn’t rolled out IPv6 in your area yet.

When using your IPv6 address, you must also make sure that the software you are running, or developing on, supports IPv6. Besides clients being able to resolve the name on any public DNS server, the server software itself must be able to bind to an IPv6 address for the <IPv6>.sslip.io domain to be of any use. Otherwise, the certificate will be valid, but useless.
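A quick, hedged sanity check, assuming your dev server listens on port 8080: look for a listening socket bound to [::] (or to your specific IPv6 address). The output below is only illustrative:

$ ss -ltn | grep 8080
LISTEN  0  511  [::]:8080  [::]:*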

You might also be out of luck if your local machine is behind a corporate network that does not support or provide public IPv6 addresses. Corporate policies will likely not allow you to have one to begin with.

Either way, if you happen to be on an IPv6 network, you might also want to check your local firewall rules to make sure they allow incoming connections on port 80, at least temporarily.
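How you do that depends on your firewall. As illustrative examples only, with ufw or plain ip6tables it could look like this:

$ sudo ufw allow 80/tcp
$ sudo ip6tables -I INPUT -p tcp --dport 80 -j ACCEPT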

Let’s get something straight for beginners: A Container is NOT a Virtual Machine

UPDATE 2020/Jan/29: As pointed out by some of the feedback, the term Virtual Machine in this article refers specifically to full x86/x64 virtualization, as described in the current Wikipedia article. It relates to the use of hypervisor and similar technologies to emulate an entire physical machine in software. Please, be aware of this reference while reading, so it is not confused with other types of virtual machines, such as JVM, .NET or interpreted programming language environments.

I’ve been working with Docker, containers, and Kubernetes for over three years now. And, from my perspective, I managed to catch up with this new trend just before it picked up full steam in most developer forums. I’ll admit, it took me more than a few months to understand what a container actually is and how it works.

If you’ve been working in software operations and infrastructure for quite some time, and by any chance are only now beginning to catch up, do not be scared. You have A LOT to take in. It takes effort and getting used to. I remember the feeling, the confusion, the will to give up and go back to provisioning stuff the old-fashioned way. I distinctly remember the wish to find a nice blog post describing things in a simple way, without making so many assumptions. By now, I’m pretty sure some folks at /r/docker are used to watching thread after thread after thread of people rambling about their frustration: they need to migrate a full stack to containers and nothing seems to make sense.

So, I decided to write this simple, quick introduction to welcome beginners into a new era. I will try to uncover some of the magic behind containers, so you don’t feel so lost in the dark. It is meant to introduce containers themselves before you are introduced to Docker; something I feel is missing from most tutorials and guides.

Hopefully, it’ll help you deal with the frustration, clear some of the most basic concepts and pave the way for a better experience with Docker, Kubernetes and everything else.

First things first

If you work with Linux, the basic idea is not really hard to grasp. I wish someone had told me this from the beginning:

To understand Linux Containers, you should first understand what makes a Linux Distribution.

Me, to myself, when I finally got it

Ubuntu, CentOS, Arch, Alpine, Debian, Fedora… We each have our favorite. But, whatever flavor you love, they all have one important thing in common: the Linux Kernel. Making a new Linux Distribution almost never means writing your own kernel from scratch. There already is a very good one, driven by a strong community. For the most part, you just take it, compile it, and bundle it with other stuff to create your distribution.

Inside every common Linux Distro, you will find basically the same types of components grouped into directories in the filesystem:

  • /boot – The kernel, along with whatever it needs to be bootstrapped.
  • /bin – Basic program binaries like cp, ls, cat, grep, echo…
  • /sbin – Program binaries reserved for the root user.
  • /etc – System-wide configuration files.
  • /lib – System-wide libraries.
  • /usr – User-installed software, its binaries and libraries.
  • /opt – Proprietary software that doesn’t follow the above directory structure.
  • /home – User files.

Of course, there’s more to that structure: variations and additional directories. But that is the basic overview. The cherry on top is a Package Manager so that users can install and manage additional software: dpkg, apt, yum, synaptic, pacman, zypper, rpm… One is enough, so take your pick.
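You can see that basic layout at a glance on almost any distro (the exact entries vary a bit from one to another):

$ ls /
bin  boot  dev  etc  home  lib  opt  proc  root  run  sbin  srv  sys  tmp  usr  var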

Bundle all that into an ISO image that boots as an installer program, and voilà! You’ll have yourself a working Linux Distribution.

Remember the basics: how programs work

When you run a program, a copy of it is loaded into RAM and becomes a process managed by the kernel. From there, it expects all of its dependencies to be in place and readily accessible. Among other things, it will usually:

  • Load configuration files from /etc
  • Load libraries from directories like /lib or /usr/lib
  • Write data to /var/some/directory

As long as everything is in place, exactly as expected, a process will run happily ever after.
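You can peek at some of those expectations yourself. For example, ldd lists the shared libraries a binary expects to find in place (the paths and addresses below are illustrative and vary by distro):

$ ldd /bin/ls
        linux-vdso.so.1 (0x00007ffc8a9d2000)
        libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f6a1c200000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6a1be00000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f6a1c65e000)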

So, what’s the problem?

Dedicated servers typically run a small number of dedicated processes. For example, in order to host a WordPress blog, a single Linux host can be easily configured with a LAMP stack: MySQL, Apache, and PHP packages installed.

But… what if you need to host more than one WordPress installation? What if each one is required to have its own MySQL instance? Let’s keep going… What if you need to deploy older stacks that require different PHP versions? Different modules? Conflicting libraries? Binaries compiled with different flags and modules?

We are used to solving this problem very bluntly: increase the cost and pay for more resources. The standard response to complex requirements has been the same for so many decades:

We just can’t run everything in one host. Either give me more hosts or create more Virtual Machines. We need to keep things ISOLATED!

Isolation! That’s the key word here

From very early on, the Linux community has been looking for ways to isolate running processes, both to avoid dependency conflicts and to improve security. Solutions like chroot and BSD jails were notable precursors of what came to be known as Linux Containers (LXC). While chroot and jails were popular and relatively easy to use, they lacked advanced features; the complexity of LXC, on the other hand, stood in the way of wide adoption.

Up until now, the traditional way of isolating services with security and quality guarantees has mostly meant one thing: running different services on different hosts, each with its own Linux installation and dedicated kernel.

The kernel has evolved, and most people never even noticed

For quite some time now, the Linux Kernel has been gaining exciting new features. Today, several different ways to isolate processes are baked into the kernel itself and are quite ready for production: control groups, namespaces, virtual network interfaces… all kinds of interesting features are there. LXC was the first real attempt to harness those features, but it failed to keep things simple.

Putting it the simplest way possible:

Creating a container means running a Linux process, much like any other, except with very strong isolation, the likes of which no one had ever seen before.

In practice, it means:

  • Create a directory dedicated to your application.
  • Place the application binary, along with everything it needs, inside that directory: dependent libraries, configuration files, data directories…
  • Spawn the application process asking the kernel to isolate EVERYTHING, giving it restrictions like:
    • Its own user space, where even the container’s root can map to a less privileged user outside, with no visibility of, or UID/GID conflicts with, users already created outside the container.
    • Its own filesystem structure, with the most important parts (like /etc/hosts) as read-only, even for the container’s root user.
    • Its own process space, with no visibility of any other processes or PIDs running on the same kernel.
    • Its own network interface, where it can have its own IP address and not worry about conflicting ports.
    • Limits to how much time it can spend consuming CPU cycles.
    • Limits to how much memory it can use.

Think of it as chroot or jails on steroids.
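You can get a tiny taste of those kernel features with unshare, from util-linux. The sketch below (run as root; the ps output is illustrative) gives a shell its own PID namespace, so it sees itself as PID 1 and nothing else; the # line is the prompt of the new shell:

$ sudo unshare --fork --pid --mount-proc /bin/sh
# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 10:00 pts/0    00:00:00 /bin/sh
root           2       1  0 10:00 pts/0    00:00:00 ps -ef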

You can literally cram an entire different Linux Distribution inside the container directory. A process running inside a container shares the same kernel as every other process, but it can easily believe that it’s running completely alone, as part of an entirely different operating system.

If it walks like Alpine, and it quacks like Alpine… Well, I guess I REALLY AM running in the Alpine OS!

Says the process running in a container sharing the kernel bootstrapped by an Ubuntu host.

Containers start much faster than Virtual Machines because they are not bootstrapping a different kernel into a new memory space, along with every other process a full operating system needs. They are simply spawning a new process in the same kernel. The isolation is what makes that process special.
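You can see that shared kernel for yourself. A container built from a completely different distribution still reports the host’s kernel (the version string below is just an example):

$ uname -r
5.15.0-91-generic
$ docker run --rm alpine uname -r
5.15.0-91-generic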

And now, you are ready for Docker

Docker came along as a bold rewrite and recycling of the ideas behind LXC. It completely re-imagined how containers are created, managed, and distributed, and made things much, much easier, especially at large scale.

Instead of manually creating all of that structure, you simply need dockerd installed and running as a normal system daemon. Containers can be created by writing intuitive Dockerfiles, compressed into tarballs, and easily ported to other hosts. Under the hood, Docker uses OverlayFS to share and merge multiple directory layers into a single view inside the container; a powerful trick that makes Docker containers so versatile.
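As a tiny, hypothetical taste of that workflow, here is a sketch that writes a three-line Dockerfile, builds an image, and runs it (the script name and image tag are made up for the example):

echo 'echo "Hello from inside a container"' > hello.sh

cat > Dockerfile <<'EOF'
FROM alpine:3.19
COPY hello.sh /usr/local/bin/hello.sh
CMD ["sh", "/usr/local/bin/hello.sh"]
EOF

docker build -t hello-demo .
docker run --rm hello-demo   # prints: Hello from inside a container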

I’m sure you will find many people listing all the advantages (and disadvantages) of running containerized applications. But the most important one, IMHO, is automation. The infrastructure provisioning related to each application becomes code: code that you write, commit to a VCS, trace, share, and integrate with other tools into a pipeline. It’s a bold new way of thinking about infrastructure, and it changes a lot. But at large scale, full automation from development, to testing, to production becomes a concrete reality.

Don’t worry! There’s a lot more to learn. But, hopefully, this introduction has given you a foothold so you can dive into the official docs with more confidence: https://docs.docker.com

Godspeed, and best of luck!