
systemd Sucks, Long Live systemd

systemd seems to be a dividing force in the Linux community. There doesn’t seem to be a middle ground with systemd: the polarized opinions suggest that you must either love it or want to kill it with fire. I aim to provide a middle ground. First, let’s discuss the awful things about systemd.

The Bad and the Ugly

systemd-escape

The fact that systemd-escape exists screams that something is horrifyingly wrong. If you haven’t seen or used this command in the wild, consider yourself blessed.

The use case is running a command like this:

/bin/bash -c 'while true; do \
    /usr/bin/etcdctl set my-container \
        "{\"host\": \"1\", \"port\": $(/usr/bin/docker port my-container 5000 | cut -d":" -f2)}" \
    --ttl 60; \
    sleep 45; \
done'

Now, to be fair, this seems like a bad idea in general, but sometimes you’re writing cloud-init for CoreOS and this is your best option. The newline escapes are mine to make the command more intelligible.

If we were to use this as the contents of an ExecStart directive, systemd would choke on the quotation marks: it isn’t running a shell, so a command that works in your shell won’t necessarily work in a systemd unit. The straightforward solution would be for systemd to implement something like Python’s shlex or Ruby’s Shellwords, but instead a bandaid was forged in the bowels of the underworld: systemd-escape.

$ man systemd-escape | head
SYSTEMD-ESCAPE(1)                                            systemd-escape                                            SYSTEMD-ESCAPE(1)

NAME
       systemd-escape - Escape strings for usage in system unit names

SYNOPSIS
       systemd-escape [OPTIONS...] [STRING...]

DESCRIPTION
       systemd-escape may be used to escape strings for inclusion in systemd unit names.

Let’s convert the script above into something systemd will accept:

$ systemd-escape 'while true;do /usr/bin/etcdctl set my-container "{\"host\": \"1\", \"port\": $(/usr/bin/docker port my-container 5000 | cut -d":" -f2)}" --ttl 60;sleep 45;done'
while\x20true\x3bdo\x20-usr-bin-etcdctl\x20set\x20my\x2dcontainer\x20\x22\x7b\x5c\x22host\x5c\x22:\x20\x5c\x221\x5c\x22\x2c\x20\x5c\x22port\x5c\x22:\x20\x24\x28-usr-bin-docker\x20port\x20my\x2dcontainer\x205000\x20\x7c\x20cut\x20\x2dd\x22:\x22\x20\x2df2\x29\x7d\x22\x20\x2d\x2dttl\x2060\x3bsleep\x2045\x3bdone

OH GOD WHY

Now, agreed: if your workflow demands that you embed a Bash while loop in a unit, you’re already in a bad place, but there are times when this is required for templating purposes.
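For what it’s worth, the kind of parsing I’m wishing for already exists elsewhere. Here is Python’s shlex chewing through a shortened version of that command, purely as an illustration of shell-style word splitting; this is not anything systemd ships:

$ python3 -c 'import shlex, sys; print(shlex.split(sys.argv[1]))' \
    'etcdctl set my-container "{\"host\": \"1\"}" --ttl 60'
['etcdctl', 'set', 'my-container', '{"host": "1"}', '--ttl', '60']

The quoting comes out exactly as a shell would split it, no hex soup required.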

Binary Logs

If you weren’t aware, journald stores its logs in a binary format. This breaks the typical tools we’re accustomed to using to monitor a system: tail, cat, less, and grep aren’t useful against the journal files themselves. A binary logging format also makes log corruption a very real possibility. If a plaintext log accidentally picks up some binary content, tools like vim and less handle it gracefully; if a binary log gets bad data in the wrong place, your logs are toast.

The justification for storing logs in a binary format was speed and performance: binary journals are more easily indexed and faster to search. It was still a difficult choice, with obvious consequences for end users on either side of the debate. Fast logging is certainly achieved, but users have to learn the new journalctl command and can’t point their familiar tools directly at the log files.
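In fairness, the familiar workflows aren’t completely gone: journalctl writes plain text to standard output, so piping to grep still works, structured exports are available for log shippers, and there is even a built-in integrity check. The unit name below is just an example:

# sshd.service is just an example unit
$ journalctl -u sshd.service --since today | grep "Failed password"
$ journalctl -u sshd.service -o json | head -n 1
# built-in consistency check for journal files
$ journalctl --verify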

I don’t see binary logs as a bad thing, but it was yet another hurdle to systemd adoption. I’ll review logging later on in the post and defend my position on why I think that journald was a good idea.

The Good

Now, let us turn our attention to the benefits that systemd brings us. I believe these are the reasons that nearly every major Linux distribution has adopted systemd.

Sanity

Let’s start by looking at the SysV init script for ZooKeeper, which is 169 lines of fragile shell script, as comments throughout its own source code admit:

# for some reason these two options are necessary on jdk6 on Ubuntu
#   accord to the docs they are not necessary, but otw jconsole cannot
#   do a local attach
...

Let’s realize the above as a systemd unit:

[Unit]
Description=ZooKeeper

[Service]
Type=simple
Restart=always
RestartSec=5
EnvironmentFile=/etc/sysconfig/zookeeper
ExecStart=/usr/bin/java -cp ${ZK_CLASSPATH} ${JVM_FLAGS} org.apache.zookeeper.server.quorum.QuorumPeerMain ${ZOO_CFG_FILE}

[Install]
WantedBy=multi-user.target

I wrote that in less than ten minutes. Admittedly, it requires an environment file which defines the following variables:

ZK_CLASSPATH=/opt/zookeeper:/opt/zookeeper/lib:/etc/zookeeper
JVM_FLAGS=-Xmx2g
ZOO_CFG_FILE=/etc/zookeeper/zoo.cfg

But… that’s it. It’s done.
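Assuming the unit was saved somewhere systemd looks, say /etc/systemd/system/zookeeper.service (the filename is my choice here), putting it into service is two commands:

# path and unit name are whatever you chose above
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now zookeeper.service

enable --now both enables the unit at boot and starts it immediately.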

If this process just logs to standard output and standard error, its logs will be recorded by the journal, and can be followed, indexed, searched, and exported using syslog-ng or rsyslog. I’ll review logging below.

Process Supervision

Back in the day, we used something like supervisord to make sure our processes stayed running, because before systemd, if you didn’t write it, it didn’t happen. Don’t assume that the init scripts starting your system services actually monitored the processes they started, because they didn’t. Services could segfault and stay down until a human intervened.

Enter systemd:

[Service]
...
Restart=always
RestartSec=1

This tells systemd that if the process dies, it should wait one second and restart it, no matter how it exited. If you stop the service, it will stay stopped until you start it again, just as you’d expect. Additionally, systemd logs when and why the process died, so finding issues later is straightforward.
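If you’re worried about a service crash-looping forever, the restart behavior can also be rate-limited. A minimal sketch with purely illustrative numbers (on recent systemd these settings live in [Unit]; older releases spelled them slightly differently under [Service]):

[Unit]
# give up if the unit fails more than 5 times within 10 minutes
StartLimitIntervalSec=600
StartLimitBurst=5

[Service]
Restart=always
RestartSec=1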

Process Scheduling

Back in the dark days of SysV init scripts, what were our options for starting a service after another service? Further, what were our options for starting service A after service B but before service C? The best option was something like this:

while true ; do
  if pgrep serviceB > /dev/null ; then
    start_service   # placeholder for whatever actually starts service A
    break
  else
    sleep 1
  fi
done

For starting service A before service C, we’d need to amend service C’s init script and add a similar while loop to detect and wait for service A. Needless to say, this is a disaster.

Enter systemd:

[Unit]
Description=Service A
After=serviceB.service
Before=serviceC.service

And that’s all. There is nothing left to do. systemd will create a service dependency graph and will start the services in the correct order, and you’ll have a guarantee that serviceA will start after serviceB but before serviceC.
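One nuance worth knowing: After= and Before= only express ordering; they don’t cause serviceB to be started if it wasn’t going to start anyway. To both pull in serviceB and order against it, pair the ordering with Wants= (or Requires= for a hard dependency):

[Unit]
Description=Service A
Wants=serviceB.service
After=serviceB.service
Before=serviceC.service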

What’s even better is unit drop-ins, which I’ll cover shortly. In a nutshell, they make it easy to layer additional configuration onto a unit without rewriting the source unit file.

Bonus Points: Conditional Units

systemd also makes it easy to conditionally start units:

[Unit]
Description=Service A
ConditionPathExists=/etc/sysconfig/serviceA

This will make Service A only start if the /etc/sysconfig/serviceA file is present. There are many different conditionals available, and all of them can be inverted.
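Inverting a condition is just a matter of prefixing its value with an exclamation mark. For instance, to start Service A only when that file is absent:

[Unit]
Description=Service A
ConditionPathExists=!/etc/sysconfig/serviceA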

Bonus Points: Parallelism

Since systemd knows the dependency ordering of all of its units, booting a Linux machine with systemd is much faster than with older init systems: units that don’t depend on one another are started in parallel.
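You can see this for yourself with systemd-analyze, which reports how long boot took and which units dominated it:

$ systemd-analyze
$ systemd-analyze blame
$ systemd-analyze critical-chain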

Unit Overloading

As discussed above, systemd makes it trivial to drop in additional configuration to extend a given unit. Let’s say that we need rsyslog to start only after cloud-final has run; cloud-final is the final stage of a cloud-init run.

The source file for the rsyslog.service unit lives at /usr/lib/systemd/system/rsyslog.service, but we won’t be editing that file. We will create a systemd drop-in unit at /etc/systemd/system/rsyslog.service.d/after-cloudinit.conf:

[Unit]
After=cloud-final.service

The name of the file isn’t particularly important, so long as it ends in .conf. Whatever is defined in this file is applied on top of the default unit file. This small drop-in makes sure that rsyslog does not start until cloud-final.service has finished starting.
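After dropping the file in place, reload systemd and use systemctl cat to confirm the merge; it prints the unit together with every drop-in applied to it (systemctl edit rsyslog.service will even create the drop-in directory and open an editor for you):

$ sudo systemctl daemon-reload
$ systemctl cat rsyslog.service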

EDIT: It was pointed out to me on Twitter that systemd loads these files in alphabetical order. In order to maintain sanity amid chaos, it’s probably a good idea to give them numerical prefixes so that load order is intelligible, e.g. %02d-%s.conf.

Overwriting Units

What if certain bits of the underlying unit need to be removed entirely? Removing them is simple:

[Service]
ExecStartPre=

What we have done here in our overloading drop-in is clear out every ExecStartPre directive from the upstream unit. If we add another ExecStartPre line underneath the empty one, we can provide our own pre-start commands, completely different from those provided upstream.
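Putting that together, a drop-in that discards the upstream pre-start steps and substitutes its own might look like this (the script path is a made-up example):

[Service]
ExecStartPre=
# hypothetical script, substitute your own
ExecStartPre=/usr/local/bin/my-prestart-checks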

Logging

Logging with systemd is incredibly straightforward and sports all the bells and whistles one would want. If a process simply logs to standard output or standard error, by default its logs will go into the journal. Looking up those logs is then trivial:

$ sudo journalctl -u rsyslog.service

This opens the log history of rsyslog.service in a pager (less, by default). Following a unit is also easy:

$ sudo journalctl -u rsyslog.service -f

This is basically the equivalent of tailing a log file.
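The journal also indexes by time, priority, and boot, so filtering is built in rather than bolted on with grep. A few examples (the unit name and time window are arbitrary):

# everything from the last hour
$ journalctl -u rsyslog.service --since "1 hour ago"
# only warnings and worse
$ journalctl -u rsyslog.service -p warning
# only the current boot
$ journalctl -b -u rsyslog.service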

Logs are rotated automatically by the journal, and rotation is configurable, so there’s no more logrotate nonsense: the journal just handles it.
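Rotation and retention are tuned in /etc/systemd/journald.conf (or a drop-in under journald.conf.d). A minimal sketch, with values that are purely illustrative:

[Journal]
# keep the journal across reboots
Storage=persistent
# cap total disk usage
SystemMaxUse=500M
# and/or expire entries by age
MaxRetentionSec=1month

For one-off cleanup there is also journalctl --vacuum-size= and --vacuum-time=.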

Plugging rsyslog or syslog-ng into the journal is simple, which means that none of your applications need to speak syslog: their standard output goes into the journal and gets imported and forwarded according to your syslog configuration.
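As one example, rsyslog can read directly from the journal via its imjournal module; on distributions that package it, the relevant bit of /etc/rsyslog.conf is roughly this (a sketch, check your distribution’s defaults):

# read messages from the systemd journal
module(load="imjournal" StateFile="imjournal.state")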

Go Forth and Learn

We’ve covered a lot of ground here. I have personally bookmarked several pieces of systemd documentation to help me write units.

I haven’t even covered glorious systemd mount points, timers, or many of the security-related options that systemd affords. I have also not gone deep on the userspace tools systemctl and journalctl, both of which have thorough documentation of their own.

I was definitely in the “systemd sucks” camp for a long time, until I started investigating what systemd actually made possible. I now see systemd as a necessary part of my system-level infrastructure and it has become increasingly difficult to do without it on older distributions.