systemd Sucks, Long Live systemd

systemd seems to be a dividing force in the Linux community. There doesn’t seem to be a middle ground: polarized opinions suggest that you must either love it or want to kill it with fire. I aim to provide a middle ground. First, let’s discuss the awful things about systemd.

The Bad and the Ugly


The fact that systemd-escape exists screams that there’s something horrifyingly wrong. If you haven’t seen or used this command in the wild, consider yourself blessed.

The use case is running a command like this:

/bin/bash -c 'while true; do \
    /usr/bin/etcdctl set my-container \
        "{\"host\": \"1\", \"port\": $(/usr/bin/docker port my-container 5000 | cut -d":" -f2)}" \
    --ttl 60; \
    sleep 45; \
done'

Now, to be fair, this seems like a bad idea in general, but sometimes you’re writing cloud-init for CoreOS and this is your best option. The newline escapes are mine to make the command more intelligible.

If we were to create an ExecStart command with this as the contents, systemd fails to understand quotation marks, as it’s not running a shell, and the command which works in your shell won’t work in a systemd unit. The straightforward solution would be for systemd to implement something like Python’s shlex or Ruby’s Shellwords, but instead, a bandaid was forged in the bowels of the underworld, systemd-escape:

$ man systemd-escape | head
SYSTEMD-ESCAPE(1)                                            systemd-escape                                            SYSTEMD-ESCAPE(1)

       systemd-escape - Escape strings for usage in system unit names

       systemd-escape [OPTIONS...] [STRING...]

       systemd-escape may be used to escape strings for inclusion in systemd unit names.

Let’s convert the script above to be acceptable to systemd:

$ systemd-escape 'while true;do /usr/bin/etcdctl set my-container "{\"host\": \"1\", \"port\": $(/usr/bin/docker port my-container 5000 | cut -d":" -f2)}" --ttl 60;sleep 45;done'


Now agreed, if your workflow demands that you embed a Bash while loop in a unit, you’re already in a bad place, but there are times where this is required for templating purposes.
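For contrast, here is a minimal sketch of the shell-style tokenizing that Python’s shlex provides, which is the kind of parsing I wish ExecStart had. The command string is a shortened, hypothetical stand-in for the etcdctl loop above, not the real thing.

```python
import shlex

# A simplified, hypothetical command line (not the full etcdctl loop above).
command = """/bin/bash -c 'while true; do echo "still alive"; sleep 45; done'"""

# shlex.split applies POSIX shell quoting rules without running a shell.
argv = shlex.split(command)

for index, arg in enumerate(argv):
    print(index, repr(arg))
```

The whole loop arrives as a single third argument with its inner double quotes intact, and no escaping tool is involved.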

Binary Logs

If you weren’t aware, journald stores its logs in a binary format. This breaks the typical tools we are accustomed to using for monitoring a system: tail, cat, less, and grep can’t read the journal files directly. With binary logging formats, the potential for log corruption also becomes real. If a plaintext log accidentally gets binary content in it, most tools like vim and less will handle it gracefully. If a binary log gets binary data in the wrong place, your logs are toast.

The justification for storing logs in a binary format was speed and performance: they are more easily indexed and faster to search. However, it was definitely a difficult choice to make, with obvious consequences to end users on either side of the debate. If fast logs/logging are desired, that can be accomplished, but users need to learn the new journalctl command and can’t use the tools they’re familiar with.

I don’t see binary logs as a bad thing, but it was yet another hurdle to systemd adoption. I’ll review logging later on in the post and defend my position on why I think that journald was a good idea.

The Good

Now, let us turn our attention to the benefits that systemd brings us. I believe that these are the reasons that most major Linux distributions have adopted systemd.


Let’s just start by comparing a SysV init script for ZooKeeper, which is 169 lines of fragile shell script, as indicated by comments throughout their source code:

# for some reason these two options are necessary on jdk6 on Ubuntu
#   accord to the docs they are not necessary, but otw jconsole cannot
#   do a local attach

Let’s realize the above as a systemd unit:


ExecStart=/usr/bin/java -cp ${ZK_CLASSPATH} ${JVM_FLAGS} org.apache.zookeeper.server.quorum.QuorumPeerMain ${ZOO_CFG_FILE}


I wrote that in less than ten minutes. Admittedly, it requires an environment file which defines the variables referenced above: ZK_CLASSPATH, JVM_FLAGS, and ZOO_CFG_FILE.


But… that’s it. It’s done.

If this process just logs to standard output and standard error, its logs will be recorded by the journal, and can be followed, indexed, searched, and exported using syslog-ng or rsyslog. I’ll review logging below.

Process Supervision

Back in the day, we used something like supervisord to make sure our processes stayed running. This was because before systemd, if you didn’t write it, it didn’t happen. Don’t think that the init scripts running your system services would actually monitor the processes that they started, because that didn’t happen. Services could segfault and stay stopped until manual user intervention was made.

Enter systemd:

[Service]
Restart=always
RestartSec=1


This tells systemd that if this process crashes, wait one second and always restart it. If you stop the service, it will stay off until you start it again, just as you’d expect. Additionally, systemd will log when and why the process crashed, so finding issues later on is straightforward.

Process Scheduling

Back in the dark days of SysV init scripts, what were our options for starting a service after another service? Further, what were our options for starting service A after service B but before service C? The best option was this:

while true ; do
  if pgrep serviceB > /dev/null ; then
    break  # serviceB is up, stop waiting
  fi
  sleep 1
done

For starting service A before service C, we’d need to amend service C’s init script and add a similar while loop to detect and wait for service A. Needless to say, this is a disaster.

Enter systemd:

[Unit]
Description=Service A
After=serviceB.service
Before=serviceC.service

And that’s all. There is nothing left to do. systemd will create a service dependency graph and will start the services in the correct order, and you’ll have a guarantee that serviceA will start after serviceB but before serviceC.
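To make the dependency graph idea concrete, here is a small sketch of ordering by topological sort. It is not systemd’s implementation; the start_order helper and its after/before arguments are hypothetical names of mine, and it assumes Python 3.9 or newer for graphlib.

```python
from graphlib import TopologicalSorter


def start_order(after, before):
    """Return one valid start order from After=/Before= style constraints.

    after:  maps a unit to the units it must start after.
    before: maps a unit to the units it must start before.
    """
    predecessors = {}
    for unit, deps in after.items():
        predecessors.setdefault(unit, set()).update(deps)
    for unit, dependents in before.items():
        predecessors.setdefault(unit, set())
        for dependent in dependents:
            # "unit Before=dependent" means the dependent waits for the unit.
            predecessors.setdefault(dependent, set()).add(unit)
    return list(TopologicalSorter(predecessors).static_order())


order = start_order(
    after={"serviceA": {"serviceB"}},
    before={"serviceA": {"serviceC"}},
)
print(order)
```

With the constraints from the unit above, the only valid order is serviceB, then serviceA, then serviceC.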

What’s even better is unit drop-ins, which I’ll cover shortly. In a nutshell, it means that it’s easy to drop in additional unit files to a unit without rewriting the source unit file.

Bonus Points: Conditional Units

systemd also makes it easy to conditionally start units:

[Unit]
Description=Service A
ConditionPathExists=/etc/sysconfig/serviceA

This will make Service A only start if the /etc/sysconfig/serviceA file is present. There are many different conditionals available, and all of them can be inverted.

Bonus Points: Parallelism

Since systemd knows the dependency ordering of all of its units, starting up a Linux machine using systemd is much faster than on older init systems. This is because systemd is parallel and will start non-dependent services in parallel.
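A rough sketch of why knowing the ordering enables parallelism: units whose predecessors have all started can be launched together. The unit names and the parallel_batches helper are made up for illustration, and this models only the idea, not systemd’s actual job scheduler. It assumes Python 3.9 or newer.

```python
from graphlib import TopologicalSorter


def parallel_batches(predecessors):
    """Group units into batches; everything in a batch can start at once."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # units with no unmet ordering
        batches.append(ready)
        sorter.done(*ready)  # pretend the whole batch finished starting
    return batches


# Hypothetical units: each maps to the units it must start after.
units = {
    "network": set(),
    "syslog": set(),
    "database": {"network"},
    "webapp": {"network", "database"},
}

batches = parallel_batches(units)
print(batches)
```

Here network and syslog have no ordering between them, so they land in the same batch, while database and webapp each have to wait their turn.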

Unit Overloading

As discussed above, systemd makes it trivial to drop in additional configuration for a given unit to extend it. Let’s say that we need to only start rsyslog after cloud-final has run. cloud-final is the final stage of cloud-init running.

The source file for the rsyslog.service unit lives at /usr/lib/systemd/system/rsyslog.service, but we won’t be editing that file. We will create a systemd drop-in unit at /etc/systemd/system/rsyslog.service.d/after-cloudinit.conf:

[Unit]
After=cloud-final.service


The final name of the file isn’t entirely relevant, so long as it ends in .conf. Whatever is defined in this file will be appended into the default unit file. This small drop-in will make sure that rsyslog does not start until cloud-final.service has started/finished.

EDIT: It was pointed out to me on Twitter that systemd loads these files in alphabetical order. In order to maintain sanity amid chaos, it would probably be a good idea to name these with numerical prefixes so that load order is intelligible, i.e. %02d-%s.conf.

Overwriting Units

What if the underlying unit needs to have certain bits entirely removed from the unit? Removing them is simple:

[Service]
ExecStartPre=


What we have done here in our overloading unit is to remove all ExecStartPre blocks from the upstream unit. If we add another ExecStartPre line underneath the empty one, we can provide our own pre-start scripts, completely different from those provided upstream.
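The reset-then-append behavior is easier to see in a toy model. The sketch below is not how systemd parses units (it ignores sections and everything except one list-type setting), and the effective_commands helper and the file paths are hypothetical.

```python
def effective_commands(setting, *fragments):
    """Toy model of a list-type setting merged across a unit and its drop-ins.

    Each assignment appends a command; an empty assignment clears
    everything collected so far.
    """
    commands = []
    for fragment in fragments:
        for line in fragment.splitlines():
            key, sep, value = line.partition("=")
            if not sep or key.strip() != setting:
                continue  # section headers and other settings are ignored
            value = value.strip()
            if value:
                commands.append(value)
            else:
                commands.clear()
    return commands


# Hypothetical upstream unit and drop-in contents.
UPSTREAM = """\
[Service]
ExecStartPre=/usr/bin/upstream-check
ExecStart=/usr/bin/daemon
"""

DROP_IN = """\
[Service]
ExecStartPre=
ExecStartPre=/usr/local/bin/my-check
"""

result = effective_commands("ExecStartPre", UPSTREAM, DROP_IN)
print(result)
```

Without the empty ExecStartPre= line in the drop-in, both commands would run; with it, only the replacement survives.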


Logging with systemd is incredibly straightforward and sports all the bells and whistles one would want. If a process simply logs to standard output or standard error, by default its logs will go into the journal. Looking up those logs is then trivial:

$ sudo journalctl -u rsyslog.service

This will launch a less-like browser to scan through the log history of rsyslog.service. Following a unit is also easy:

$ sudo journalctl -u rsyslog.service -f

This is basically the equivalent of tailing a log file.

Logs are rotated automatically by the journal and this can be configured, so no more logrotate nonsense; the journal just handles it.

Plugging rsyslog or syslog-ng into the journal is simple, and this means that none of your applications need to speak syslog: their standard output will go into the journal and will be imported and sent according to your syslog configuration.
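If you’d rather post-process logs yourself, journalctl can emit one JSON object per line with -o json. Here is a minimal sketch of consuming that output. The two sample records are invented for illustration, the parse_journal_export helper is a hypothetical name, and real entries carry many more fields than the four used here.

```python
import json
from datetime import datetime, timezone

# Invented sample records shaped like `journalctl -u rsyslog.service -o json`.
SAMPLE_RECORDS = [
    {"__REALTIME_TIMESTAMP": "1480550400000000", "_SYSTEMD_UNIT": "rsyslog.service",
     "PRIORITY": "6", "MESSAGE": "hypothetical: started"},
    {"__REALTIME_TIMESTAMP": "1480550401000000", "_SYSTEMD_UNIT": "rsyslog.service",
     "PRIORITY": "3", "MESSAGE": "hypothetical: something failed"},
]
SAMPLE_EXPORT = "\n".join(json.dumps(record) for record in SAMPLE_RECORDS)


def parse_journal_export(text):
    """Turn newline-delimited journal JSON into simple dictionaries."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # The journal stores wall-clock time as microseconds since the epoch.
        micros = int(record["__REALTIME_TIMESTAMP"])
        entries.append({
            "when": datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc),
            "unit": record.get("_SYSTEMD_UNIT"),
            "priority": int(record.get("PRIORITY", 6)),
            "message": record["MESSAGE"],
        })
    return entries


entries = parse_journal_export(SAMPLE_EXPORT)
# Syslog priorities: lower is more severe, 3 is "err".
errors = [entry for entry in entries if entry["priority"] <= 3]
for entry in errors:
    print(entry["when"].isoformat(), entry["unit"], entry["message"])
```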

Go Forth and Learn

We’ve covered a lot of ground here. I have personally bookmarked the following pieces of documentation for systemd to help me write units:

I haven’t even covered glorious systemd mount points, timers, or many of the security-related options that systemd affords. I have also not covered the userspace tools systemctl and journalctl, which are documented in their man pages, systemctl(1) and journalctl(1).

I was definitely in the “systemd sucks” camp for a long time, until I started investigating what systemd actually made possible. I now see systemd as a necessary part of my system-level infrastructure and it has become increasingly difficult to do without it on older distributions.

PSA: Don't Break Public APIs

The date is December 1st, 2016. Amazon announces an interesting new feature for CloudFront that allows running Lambda functions at CloudFront edge locations. This is a powerful addition to the AWS arsenal for running code in locations geographically closest to users. This feature is a “preview” feature, and it’s opt-in only.

What was not mentioned, however, is a change to the CloudFront API. Namely, Amazon added a field to DefaultCacheBehavior objects in CloudFront Distributions which is documented as being not required, but is nevertheless required, resulting in the following error message if UpdateDistribution is called:

InvalidArgument: The parameter Lambda function associations is required.

Their documentation states:


A complex type that contains zero or more Lambda function associations for a cache behavior.

Type: LambdaFunctionAssociations

Required: No

Emphasis on “no” is mine.

Of course, the reality is that this parameter is required, and not passing that XML element breaks all API calls, as seen and documented in this Terraform bug report. A simple hack works around the issue by always creating an empty <LambdaFunctionAssociations> block for every request:

diff --git a/builtin/providers/aws/cloudfront_distribution_configuration_structure.go b/builtin/providers/aws/cloudfront_distribution_configuration_structure.go
index b891bd26b..1eff7689f 100644
--- a/builtin/providers/aws/cloudfront_distribution_configuration_structure.go
+++ b/builtin/providers/aws/cloudfront_distribution_configuration_structure.go
@@ -261,6 +261,9 @@ func expandCacheBehavior(m map[string]interface{}) *cloudfront.CacheBehavior {
                MinTTL:               aws.Int64(int64(m["min_ttl"].(int))),
                MaxTTL:               aws.Int64(int64(m["max_ttl"].(int))),
                DefaultTTL:           aws.Int64(int64(m["default_ttl"].(int))),
+               LambdaFunctionAssociations: &cloudfront.LambdaFunctionAssociations{
+                       Quantity: aws.Int64(0),
+               },
        if v, ok := m["trusted_signers"]; ok {
                cb.TrustedSigners = expandTrustedSigners(v.([]interface{}))

A full fix is thankfully forthcoming from the Terraform community, but this isn’t Terraform’s problem. Amazon broke the interface to this API without warning, in contradiction of their own documentation, which says that the field isn’t required. This change would have broken their CLI if not for a fix in botocore, and it even appears to have broken part of their web interface: adding origin access identities to S3 origins for a distribution doesn’t appear to work.
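The same workaround can be sketched against a plain dictionary shaped like a CloudFront distribution config, for anyone hitting this outside of Terraform. The with_empty_lambda_associations helper is a hypothetical name, the sample config is trimmed to the keys that matter here, and nothing in this sketch talks to AWS.

```python
import copy


def with_empty_lambda_associations(distribution_config):
    """Return a copy where every cache behavior carries the 'optional' field."""
    config = copy.deepcopy(distribution_config)
    behaviors = [config["DefaultCacheBehavior"]]
    behaviors += config.get("CacheBehaviors", {}).get("Items", [])
    for behavior in behaviors:
        # Leave real associations alone; only fill in the missing block.
        behavior.setdefault("LambdaFunctionAssociations", {"Quantity": 0})
    return config


# Trimmed, hypothetical config; a real one has many more required keys.
original = {
    "DefaultCacheBehavior": {"TargetOriginId": "example-origin"},
    "CacheBehaviors": {
        "Quantity": 1,
        "Items": [{"PathPattern": "/assets/*", "TargetOriginId": "example-origin"}],
    },
}

patched = with_empty_lambda_associations(original)
print(patched["DefaultCacheBehavior"]["LambdaFunctionAssociations"])
```

Working on a deep copy keeps the caller’s config untouched, which makes it easy to compare what was sent against what was intended.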

All of this added up to finding myself in a predicament: I had tampered with my origin configuration on CloudFront and I had no way of returning to a sane state. I couldn’t use Terraform to revert, the CLI was very hard to work with, and I couldn’t use the web interface to revert.


I was ultimately able to get around the issue by patching the Terraform source code and compiling it myself. After recompiling, I was able to apply changes again and get my distribution working. Again, Terraform is not at fault here; it’s entirely Amazon’s fault for breaking a public API.

Lessons Learned

API breakage, whether we like it or not, happens.

However, the fact that Amazon could release software that breaks things like this, along with documentation that contradicts the actual behavior, all without some testing alarm going off, is a serious breach of trust for me as a user of Amazon Web Services.

It should go without saying to developers of REST APIs that if you introduce backwards incompatible changes, you :clap: must :clap: bump :clap: the API version in the URL.

Amazon, please update your documentation or please make LambdaFunctionAssociations a truly optional field in DefaultCacheBehavior. In the meantime, everyone should scramble and try to work around this API breakage.

A New Era

After many years of internet content management dysfunction, I have finally begun to consolidate everything and solidify my approach to writing articles and publishing content. This site uses Jekyll as a content management system for hosting a static site, a modified version of Lanyon as a theme, and a fleet of other technologies to create a pretty comprehensive system. Since it’s all the rage to talk about how we each choose to do things, I’ll spend a moment describing how all of this is set up.

Development and Writing

All content for the site is managed in version control in Git and hosted privately on GitHub. I primarily use Atom as my editor, editing Markdown files by hand and previewing them locally.

While I greatly prefer Less for stylesheet management, Lanyon uses Sass, so I just try to pretend I’m writing Less when working in style-land.

My typical local development workflow uses Vagrant to create a VM for each software project I work on or maintain so as to have a reproducible environment in which work happens, and this project is no exception. Ansible is used to provision the CentOS 7.2 VM to install Ruby and do other needful things. While I personally use elementary OS which is based on Ubuntu, I would never run Ubuntu in production, it’s setenforce 1 or GTFO, and AppArmor is a terribly ineffective mandatory access control system… oh right, we were talking about my blog :blush:

I write my posts and do my theming on my local machine; the files are shared between my host and the Vagrant VM, and I use this little systemd unit to automatically regenerate my site as I change files:

Description=Jekyll Static Site Serving

ExecStartPre=/home/vagrant/bin/jekyll clean
ExecStart=/home/vagrant/bin/jekyll serve --force_polling --host --config _config.yml,_config_dev.yml


Using a port-forward to my local machine, I can browse my site as I work on it at localhost:8080, making things absolutely fabulous :ok_hand:

Typically, posts are written in a feature branch, submitted as a pull request, some baseline minimal tests are run, and output can be previewed. When things are ready, I git merge --no-ff -S by hand and push to master. When this happens, deployment starts.


Content for my site is stored in Amazon S3, cached and fronted by Amazon CloudFront as a CDN. Additionally, in order to serve content at the apex of the domain name(s), I use Amazon’s Route 53 for DNS. I’m not doing anything super fancy for geolocated superfast DNS, as I’m preferring reduced cost over ultimate performance victory™, at least for now :wink:

I’m also not making a big deal about getting DNSSEC set up for my domains, though I may in the future. I’m not convinced that it solves the problem it aims to solve, and I’m not entirely even clear about which problems it does solve well or at all. If you are so enlightened, please drop me a line.

TLS certificates are provided by Amazon’s Certificate Manager and are cheap as free™ for CloudFront and for a few other Amazon resources. I get an A on Qualys’ SSL Labs and I have no management/maintenance overhead.

The minimum hosting cost of the aforementioned setup is $1 USD per month per hosted zone in Route 53. Yes, you heard that right: one US dollar per month. I am hosting two domains, so my minimum is $2. Everything else is variable but very cheap: it’s pennies for S3 storage, CloudFront charges by transfer, and Route 53 charges by bulk counts of requests. I don’t expect to come anywhere near the point where these costs become significant, so :muscle:

If you want a private GitHub repository, that’s another $7 USD per month, for a total of $8 USD per month for unlimited private GitHub repositories and one hosted zone serving content out in the described fashion.

Presently, my infrastructure is all automated in Amazon’s CloudFormation, which I have lost a lot of blood, sweat, and tears to over the years. No less evil is Terraform, which is probably what I’ll migrate my resources to in time.


Part of what was alluded to previously is continuous integration and continuous delivery/deployment. While most organizations I’ve worked for use Travis CI, I find its individual plan prohibitively expensive: at the time of writing, it is $70 USD per month for personal private repositories. Since my actual usage consists of less than 30 minutes of build/deployment time per month, I found this kind of unacceptable.

I shopped around and found CircleCI, which is actually free for my purposes, allowing one concurrent build across all repositories and 1,500 build minutes per month. For me this was perfect, as my private repositories are few and far between, and I can use Travis for any public repositories.

EDIT: Whereas before I had some Bash monstrosity, I have migrated to something a little bit better.

After trying to work around an unpredictable Bash deployment script that worked… uh, sometimes ¯\_(ツ)_/¯, I have created a Python script which does the same thing and is far more reusable: s3cf-deploy. It essentially uses the AWS CLI to sync assets to S3, and then interprets output in order to generate a CloudFront invalidation for only those assets which have changed, which is pretty cool.
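To give a feel for the approach, here is a minimal sketch of turning sync output into an invalidation request. This is not the s3cf-deploy source. It assumes `aws s3 sync` prints lines of the form "upload: LOCAL to s3://BUCKET/KEY" and "delete: s3://BUCKET/KEY", and the bucket name, sample output, and helper names are all hypothetical.

```python
def changed_paths(sync_output, bucket):
    """Extract the object paths that a sync run uploaded or deleted."""
    prefix = f"s3://{bucket}/"
    paths = set()
    for raw in sync_output.splitlines():
        line = raw.strip()
        if line.startswith("upload: "):
            # The destination is whatever follows the last " to ".
            target = line.rpartition(" to ")[2]
        elif line.startswith("delete: "):
            target = line[len("delete: "):]
        else:
            continue
        if target.startswith(prefix):
            paths.add("/" + target[len(prefix):])
    return sorted(paths)


def invalidation_batch(paths, caller_reference):
    """Build the batch structure a CloudFront invalidation request expects."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        "CallerReference": caller_reference,
    }


# Invented sample output for a hypothetical bucket.
SAMPLE_OUTPUT = """\
upload: _site/index.html to s3://example-bucket/index.html
upload: _site/css/site.css to s3://example-bucket/css/site.css
delete: s3://example-bucket/old-post/index.html
"""

paths = changed_paths(SAMPLE_OUTPUT, "example-bucket")
batch = invalidation_batch(paths, "deploy-0001")
print(batch)
```

Invalidating only the changed paths keeps the request small compared with flushing the whole distribution on every deploy.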

I can now write and deploy things without thinking too hard about it.


In any case, it’s nice to finally have a consolidated place on the internet to host and write things, and I anticipate that I’ll be migrating many of my old posts from previous blogs. It’s also nice that this entire setup costs a fraction of what I’ve been paying for years to $TRADITIONAL_HOSTING_PROVIDER for something very similar with many more limitations.

I don’t work for Amazon, so there’s no reason specifically that I have chosen them, other than the selling points of it being cheap, working, being relatively fast, and not requiring too much maintenance at all. The last three places I have worked, I have been involved in infrastructure automation in Amazon Web Services, so needless to say I have a bit more experience in it as opposed to other services. If someone finds a cheaper way to do this on Azure or GCE, :clap: that’s awesome and I’d love to hear about it.

For now, the only limitation is that everything here is by definition static: there is no server executing code to render content or pages. I do have a plan to experiment with Amazon API Gateway and AWS Lambda, encrypted similarly with Amazon’s Certificate Manager, to have on-demand compute resources for arbitrary things I’d like to trigger, but that remains for another post :raised_hands: :sun_with_face: