systemd Sucks, Long Live systemd
January 12, 2017
systemd seems to be a dividing force in the Linux community. There doesn’t seem to be a middle ground on systemd: polarized opinions suggest that you must either love it or want to kill it with fire. I aim to provide a middle ground. First, let’s discuss the awful things about systemd.
The Bad and the Ugly
systemd-escape
The fact that systemd-escape exists screams that there’s something horrifyingly wrong. If you haven’t seen or used this command in the wild, consider yourself blessed.
The use case is running a command like this:
/bin/bash -c 'while true; do \
/usr/bin/etcdctl set my-container \
"{\"host\": \"1\", \"port\": $(/usr/bin/docker port my-container 5000 | cut -d":" -f2)}" \
--ttl 60; \
sleep 45; \
done'
Now, to be fair, this seems like a bad idea in general, but sometimes you’re writing cloud-init for CoreOS and this is your best option. The newline escapes are mine to make the command more intelligible.
If we were to create an ExecStart command with this as the contents, systemd would not interpret the quoting the way a shell does, as it’s not running a shell, and the command which works in your shell won’t work in a systemd unit. The straightforward solution would be for systemd to implement something like Python’s shlex or Ruby’s Shellwords, but instead, a bandaid was forged in the bowels of the underworld, systemd-escape:
$ man systemd-escape | head
SYSTEMD-ESCAPE(1) systemd-escape SYSTEMD-ESCAPE(1)
NAME
systemd-escape - Escape strings for usage in system unit names
SYNOPSIS
systemd-escape [OPTIONS...] [STRING...]
DESCRIPTION
systemd-escape may be used to escape strings for inclusion in systemd unit names.
Let’s convert the script above to be acceptable to systemd:
$ systemd-escape 'while true;do /usr/bin/etcdctl set my-container "{\"host\": \"1\", \"port\": $(/usr/bin/docker port my-container 5000 | cut -d":" -f2)}" --ttl 60;sleep 45;done'
while\x20true\x3bdo\x20-usr-bin-etcdctl\x20set\x20my\x2dcontainer\x20\x22\x7b\x5c\x22host\x5c\x22:\x20\x5c\x221\x5c\x22\x2c\x20\x5c\x22port\x5c\x22:\x20\x24\x28-usr-bin-docker\x20port\x20my\x2dcontainer\x205000\x20\x7c\x20cut\x20\x2dd\x22:\x22\x20\x2df2\x29\x7d\x22\x20\x2d\x2dttl\x2060\x3bsleep\x2045\x3bdone
Now agreed, if your workflow demands that you embed a Bash while loop in a unit, you’re already in a bad place, but there are times where this is required for templating purposes.
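Where your templating allows it, one way to sidestep the escaping problem is to move the loop into a script file and point ExecStart at that file. The following is only a minimal sketch under that assumption, not part of the original setup: the function names port_of, payload, and register_forever are hypothetical, and it assumes docker port prints a single address:port pair such as 0.0.0.0:49153.

```shell
#!/bin/sh
# Hypothetical helper script holding the loop from above, so that a unit's
# ExecStart only needs the script's path and no shell quoting.

# Extract the port from an "address:port" mapping (same cut as the original).
port_of() {
    printf '%s\n' "$1" | cut -d: -f2
}

# Build the JSON value stored in etcd for a given port.
payload() {
    printf '{"host": "1", "port": %s}' "$1"
}

# Re-register the container's port every 45 seconds with a 60 second TTL.
register_forever() {
    while true; do
        mapping=$(/usr/bin/docker port my-container 5000)
        /usr/bin/etcdctl set my-container "$(payload "$(port_of "$mapping")")" --ttl 60
        sleep 45
    done
}

# Only start the loop when asked to, so the functions can be sourced and tested.
if [ "${1:-}" = "run" ]; then
    register_forever
fi
```

Saved at a path of your choosing and made executable, the script would be started from the unit with just that path followed by run, leaving nothing in the unit file that needs escaping.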
Binary Logs
If you weren’t aware, journald stores its logs in a binary format. This breaks the typical tools we are accustomed to using for monitoring a system: tail, cat, less, and grep can no longer read the log files directly. With binary logging formats, the possibility of log corruption also becomes real. If a plaintext log accidentally gets binary content in it, most tools like vim and less will handle it gracefully. If a binary log gets binary data in the wrong place, your logs are toast.
The justification for storing logs in a binary format was speed and performance: they are more easily indexed and faster to search. However, it was definitely a difficult choice to make, with obvious consequences to end users on either side of the debate. Fast logging can be had, but users need to learn the new journalctl command and can’t use the tools they’re familiar with.
I don’t see binary logs as a bad thing, but it was yet another hurdle to systemd adoption. I’ll review logging later on in the post and defend my position on why I think that journald was a good idea.
The Good
Now, let us turn our attention to the benefits that systemd brings us. I believe that these are the reasons that most major Linux distributions have adopted systemd.
Sanity
Let’s start with a SysV init script for ZooKeeper, which is 169 lines of fragile shell script, as the comments throughout its source indicate:
# for some reason these two options are necessary on jdk6 on Ubuntu
# accord to the docs they are not necessary, but otw jconsole cannot
# do a local attach
...
Let’s realize the above as a systemd unit:
[Unit]
Description=ZooKeeper
[Service]
Type=simple
Restart=always
RestartSec=5
EnvironmentFile=/etc/sysconfig/zookeeper
ExecStart=/usr/bin/java -cp ${ZK_CLASSPATH} ${JVM_FLAGS} org.apache.zookeeper.server.quorum.QuorumPeerMain ${ZOO_CFG_FILE}
[Install]
WantedBy=multi-user.target
I wrote that in less than ten minutes. Admittedly, it requires an environment file which defines the following variables:
ZK_CLASSPATH=/opt/zookeeper:/opt/zookeeper/lib:/etc/zookeeper
JVM_FLAGS=-Xmx2g
ZOO_CFG_FILE=/etc/zookeeper/zoo.cfg
But… that’s it. It’s done.
If this process just logs to standard output and standard error, its logs will be recorded by the journal, and can be followed, indexed, searched, and exported using syslog-ng or rsyslog. I’ll review logging below.
Process Supervision
Back in the day, we used something like supervisord to make sure our processes stayed running. This was because before systemd, if you didn’t write it, it didn’t happen. Don’t think that the init scripts running your system services would actually monitor the processes that they started, because that didn’t happen. Services could segfault and stay stopped until someone intervened manually.
Enter systemd:
[Service]
...
Restart=always
RestartSec=1
This tells systemd that if this process crashes, wait one second and always restart it. If you stop the service, it will stay off until you start it again, just as you’d expect. Additionally, systemd will log when and why the process crashed, so finding issues later on is straightforward.
Process Scheduling
Back in the dark days of SysV init scripts, what were our options for starting a service after another service? Further, what were our options for starting service A after service B but before service C? The best option was this:
while true ; do
    if pgrep serviceB ; then
        start_service
        break
    else
        sleep 1
    fi
done
For starting service A before service C, we’d need to amend service C’s init script and add a similar while loop to detect and wait for service A. Needless to say, this is a disaster.
Enter systemd:
[Unit]
Description=Service A
After=serviceB.service
Before=serviceC.service
And that’s all. There is nothing left to do. systemd will create a service dependency graph and will start the services in the correct order, and you’ll have a guarantee that serviceA will start after serviceB but before serviceC.
What’s even better is unit drop-ins, which I’ll cover shortly. In a nutshell, they make it easy to add configuration to a unit without rewriting the source unit file.
Bonus Points: Conditional Units
systemd also makes it easy to conditionally start units:
[Unit]
Description=Service A
ConditionPathExists=/etc/sysconfig/serviceA
This will make Service A start only if the /etc/sysconfig/serviceA file is present. There are many different conditionals available, and all of them can be inverted by prefixing the value with an exclamation mark.
Bonus Points: Parallelism
Since systemd knows the dependency ordering of all of its units, starting up a Linux machine using systemd is much faster than on older init systems. This is because systemd starts units that do not depend on one another in parallel.
Unit Overloading
As discussed above, systemd makes it trivial to drop in additional configuration for a given unit to extend it. Let’s say that we need to start rsyslog only after cloud-final has run. cloud-final is the final stage of a cloud-init run.
The source file for the rsyslog.service unit lives at /usr/lib/systemd/system/rsyslog.service, but we won’t be editing that file. We will create a systemd drop-in unit at /etc/systemd/system/rsyslog.service.d/after-cloudinit.conf:
[Unit]
After=cloud-final.service
The final name of the file isn’t entirely relevant, so long as it ends in .conf. Whatever is defined in this file is applied on top of the default unit file. This small drop-in will make sure that rsyslog does not start until cloud-final.service has finished starting.
EDIT: It was pointed out to me on Twitter that systemd loads these files in alphabetical order. In order to maintain sanity amid chaos, it would probably be a good idea to name these with numerical prefixes so that load order is intelligible, i.e. %02d-%s.conf.
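The %02d-%s.conf pattern maps directly onto printf. As a small sketch (the function name dropin_path, the priority numbers, and the drop-in name limits are made up for illustration), this builds drop-in paths whose alphabetical order matches the intended load order:

```shell
#!/bin/sh
# Build the path of a drop-in for a unit, using a two-digit numeric prefix
# so that alphabetical order matches the intended load order.
# Usage: dropin_path UNIT PRIORITY NAME
# Pass PRIORITY without a leading zero; printf adds the padding.
dropin_path() {
    printf '/etc/systemd/system/%s.d/%02d-%s.conf\n' "$1" "$2" "$3"
}

dropin_path rsyslog.service 10 after-cloudinit
# prints /etc/systemd/system/rsyslog.service.d/10-after-cloudinit.conf
dropin_path rsyslog.service 5 limits
# prints /etc/systemd/system/rsyslog.service.d/05-limits.conf
```

With the padding in place, 05-limits.conf sorts before 10-after-cloudinit.conf, which would not be the case for an unpadded 5 and 10.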
Overwriting Units
What if the underlying unit needs to have certain bits entirely removed from the unit? Removing them is simple:
[Service]
ExecStartPre=
What we have done here in our overloading unit is to remove all ExecStartPre blocks from the upstream unit. If we add another ExecStartPre line underneath the empty one, we can provide our own pre-start scripts, completely different from those provided upstream.
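Putting the two steps together, a drop-in that first clears and then replaces the pre-start commands could be generated like this. It is only a sketch: the function name override_prestart and the script path /usr/local/bin/my-prestart are placeholders, not anything shipped upstream.

```shell
#!/bin/sh
# Print a drop-in that resets ExecStartPre and then sets a single new command.
# The empty assignment must come first: it clears what upstream defined.
override_prestart() {
    printf '[Service]\nExecStartPre=\nExecStartPre=%s\n' "$1"
}

# /usr/local/bin/my-prestart is a placeholder path for illustration.
override_prestart /usr/local/bin/my-prestart
```

The output would be saved as a .conf file in the unit’s drop-in directory, after which systemctl daemon-reload makes systemd pick up the change.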
Logging
Logging with systemd is incredibly straightforward and sports all the bells and whistles one would want. If a process simply logs to standard output or standard error, by default its logs will go into the journal. Looking up those logs is then trivial:
$ sudo journalctl -u rsyslog.service
This will launch a less-like browser to scan through the log history of rsyslog.service. Following a unit is also easy:
$ sudo journalctl -u rsyslog.service -f
This is basically the equivalent of tailing a log file.
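Because journalctl writes plain text to standard output, it can also be combined with the usual text tools. A rough sketch follows, with hypothetical function names unit_errors and unit_grep; the flags used (-u, -p, --since, --no-pager, -n) are standard journalctl options. Depending on your permissions, you may need to prefix the calls with sudo as in the commands above.

```shell
#!/bin/sh
# Show messages of priority "err" or worse for a unit since a given time.
# Usage: unit_errors UNIT SINCE   (e.g. unit_errors rsyslog.service "1 hour ago")
unit_errors() {
    journalctl -u "$1" -p err --since "$2" --no-pager
}

# Search the last 1000 log lines of a unit with grep.
# Usage: unit_grep UNIT PATTERN
unit_grep() {
    journalctl -u "$1" --no-pager -n 1000 | grep -e "$2"
}
```

For example, unit_grep rsyslog.service start would print only the recent rsyslog lines containing the word start.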
Logs are rotated automatically by the journal and this can be configured, so no more logrotate nonsense: the journal just handles it.
Plugging rsyslog or syslog-ng into the journal is simple, and this means that none of your applications need to speak syslog: their standard output will go into the journal and will be imported and sent according to your syslog configuration.
Go Forth and Learn
We’ve covered a lot of ground here. I have personally bookmarked the following pieces of documentation for systemd to help me write units:
I haven’t even covered glorious systemd mount points, timers, or many of the security related options that systemd affords. I have also not covered the userspace tools systemctl and journalctl, which are documented here:
I was definitely in the “systemd sucks” camp for a long time, until I started investigating what systemd actually made possible. I now see systemd as a necessary part of my system-level infrastructure and it has become increasingly difficult to do without it on older distributions.