Disconsolate Puppeteering


Basically a giant rant about Puppet adventures circa 2012-2017.

“The three types of terror: The Gross-out: the sight of a severed head tumbling down a flight of stairs, it’s when the lights go out and something green and slimy splatters against your arm. The Horror: the unnatural, spiders the size of bears, the dead waking up and walking around, it’s when the lights go out and something with claws grabs you by the arm. And the last and worse one: Terror, when you come home and notice everything you own had been taken away and replaced by an exact substitute. It’s when the lights go out and you feel something behind you, you hear it, you feel its breath against your ear, but when you turn around, there’s nothing there …” — Stephen King

Intro

There are numerous people out there who are better and more competent than me. I’m happy to have my mistakes corrected (maybe not ridiculed though, but hey — it’s the Internet) and hope to learn from them.

I started working with Puppet around 5 years ago when I volunteered to fill the vacant position of “operations guy” at a small-to-mid-sized company. Having some years of experience with Ruby, I found this opportunity exciting, and Puppet didn’t scare me much since I knew that if worst came to worst I could always throw in some binding.pry statements and feel very much at home.

Configuration management was a new field to me, and the major competitors at the time were (and still are) Puppet and Chef. After I spent some time learning the differences between the two, and even though Chef was pure Ruby inside-out, Puppet won me over by virtue of using a declarative/functional approach to describe infrastructure details and dependencies, and leaving the nitty-gritty details of OS-specific file/package/service management to the tool. Chef just felt much more mutable and procedural, maybe because Ruby inherently is (yes, yes, I know you can do functional Ruby) and people inescapably mess things up over time.

I firmly believe functional/declarative is the right approach to manage infrastructure, but after all these years I’ve witnessed how even this amazing idea can be horribly undermined by poor execution. This post has been brewing for the last ~4 years and spans multiple releases (0.28–4.5) and multiple minor and major upgrade events.

The Gross

This section is mostly about syntax warts and syntax/semantic choices of the language. As mentioned, the original promise of a functional/declarative language was what sold me initially, but the implementation and subsequent developments have been a continuous source of disappointment.

Ruby is a beautiful language. I don’t know why Puppet developers went with Ruby, but it happened and I consider it generally a good thing.

The Puppet language, on the other hand, is gross. It almost seems as if it was deliberately designed to be removed from its Ruby roots. The feature set and the syntax/semantic choices made by Puppet all went in a direction that is the opposite of good. I didn’t find a comprehensive history of how Puppet’s syntax or type system evolved over time, but suffice it to say that it is quite contrived and often very inconvenient, and calling Puppet functional is a major insult to most other functional languages out there.

Here are just some examples of what frustrates me at the language level.

Undef

The nil value has trivial meaning and mechanics in Ruby: it represents nothing, it is falsey when used in conditionals, and it coerces into an empty string. That’s about it. For some mysterious reason, rather than simply adopting a nil value/type directly, Puppet decided to introduce its own implementation in the form of undef/undefined, with a few minor subtleties.

First, the syntactic representation of an undefined value is/was the token undef, and considering that in Puppet quotes around most strings aren’t mandatory, things often got out of hand with people abusing and misusing string values to represent undef values. This was further amplified by the fact that internally, in Ruby land, undefs were represented by the :undef and/or :undefined symbols (perfectly valid, truthy values) as well as by Ruby’s nil.
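To make the difference concrete, here is a small plain-Ruby sketch. It does not touch Puppet's real internals; the helpers truthy? and blank_value? are made up for illustration, and the list of "blank" spellings is an assumption about what such a defensive check might cover.

```ruby
# Plain Ruby only; nothing here uses Puppet's actual code.

# Report whether Ruby treats a value as true in a conditional.
def truthy?(value)
  value ? true : false
end

# Hypothetical helper: the kind of defensive check that piles up once
# "nothing" can arrive as nil, as a symbol, or as a string.
def blank_value?(value)
  value.nil? ||
    [:undef, :undefined].include?(value) ||
    ['', 'undef'].include?(value)
end

puts truthy?(nil)         # nil is falsey
puts nil.to_s.empty?      # nil stringifies to an empty string
puts truthy?(:undef)      # a symbol is an ordinary, truthy value
puts :undef.to_s          # and it stringifies to its own name
puts blank_value?(:undefined)
```

The point of the sketch is only that a symbol standing in for "nothing" behaves like any other value in conditionals, so every consumer has to check for it explicitly.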

Second, the conversion of undef into an empty string underwent a major change when empty strings stopped being falsey somewhere around Puppet versions 3.7–3.8 with the “future” parser enabled.

So in the end, this is what your code begins to look like: https://gist.github.com/frimik/4482463.

At first these sound like minor issues to whine about, but take into account that all of this is happening in the DevOps world, where testing, DRY and other good practices of software development were on a rather slow adoption journey, and the Puppet module ecosystem was largely driven by cargo-culting and hacking on other people’s chunks of code until it “sort of did what you needed”. All of this means an extremely slow upgrade/maintenance schedule due to the amount of testing, forking and fixing of 3rd-party code. Not many things are more frustrating than nil-hunting, and hunting for Puppet’s nils was the worst.

No functional building blocks

Whining about syntax and semantics is often controversial, but I believe objectively bad choices can be determined, especially considering the domain and original promise of functional/declarative. Building a functional language without builtin support for closures and adequate syntactic- or library-provided functional primitives like map, reduce, filter, etc is just plain bad. Ruby provides all of that for free in a very straightforward way — why not just grab it?

Same goes for non-value statements, like if, unless and case. There is literally no reason for those not to return a value, even if inner branches are emitting side-effects.

Lastly, being based around manipulating large nested data-structures, not having a convenient set of functions to manipulate said data-structures is robbing every developer of productivity, code-quality and happiness.
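For comparison, here is a rough sketch of what Ruby offers out of the box for this kind of work. The inventory data, the bigger_than and owner_for helpers, and the 'root' fallback are all invented for the example.

```ruby
# Hypothetical inventory data, invented for this example.
NODES = [
  { name: 'web1', role: 'web', disks: [20, 100] },
  { name: 'web2', role: 'web', disks: [20] },
  { name: 'db1',  role: 'db',  disks: [20, 500, 500] }
].freeze

# filter + map: names of all web nodes.
WEB_NAMES = NODES.select { |n| n[:role] == 'web' }.map { |n| n[:name] }

# reduce: total disk capacity across all nodes.
TOTAL_DISK = NODES.reduce(0) { |sum, n| sum + n[:disks].sum }

# Closure: the returned lambda captures `threshold` from its environment.
def bigger_than(threshold)
  ->(node) { node[:disks].sum > threshold }
end

BIG_NODES = NODES.select(&bigger_than(100)).map { |n| n[:name] }

# `case` is an expression, so its value can be returned or assigned
# directly. The 'root' fallback is an assumption for this sketch.
def owner_for(fqdn)
  case fqdn
  when /dev/  then 'nobody'
  when /prod/ then 'web'
  else 'root'
  end
end

puts WEB_NAMES.inspect
puts TOTAL_DISK
puts BIG_NODES.inspect
puts owner_for('app1.prod.example.com')
```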

Inconsistencies

case has a twin brother — the selector syntax, which does return a value, except they are different in more than one way:

case $some_var {
  /regexp/: { $a_value = "hello" }
  default:  { $a_value = "default" }
}

$a_value = $some_var ? {
  /regexp/ => 'hello',
  default  => 'default'
}

Case statements are made to “execute code”, selector syntax is made to return value, matchers are separated from respective branches by colon in case and by fat arrow in selectors, branches have to be wrapped in curly braces in case but not in selectors. case is the epitome of inconsistency. There are plenty more sprinkled around the language, but case just trumps everything.

The basic idea of Puppet is to declare resources using a type, a title and some parameters, and let the Puppet internals figure out which changes need to be applied to the system for it to match your declarations. Here’s a contrived example:

$user = $::fqdn ? {
  /dev/  => nobody,
  /prod/ => web
}

file { "/some/path":
  content => foo,
  owner   => $user
}

This code will create a file with certain ownership and content attributes. If the file already exists, Puppet will make sure that owner and content match the declaration. Notice how “foo” and the user name values aren’t quoted, because quotes are optional in Puppet (a syntax choice that was probably meant to improve readability?). After many years of working with various Puppet versions (seriously, this might be the only technology where extensive experience works against you) I am not sure what happens if you replace some of these values with the undef literal. For sure it won’t be interpreted as the “undef” string, but beyond that I’d have to actually run Puppet to check. And it wouldn’t be consistent between Puppet versions. The effect can be anything from the argument defaulting to some value specified in the class declaration, to ending up as an empty string, being passed as a literal undef value, preventing defaults from taking effect, or being properly treated as a nil value.

What good is there in “clean” syntax when it impairs the readability and maintenance of your code? The easy solution is to simply always quote your strings, and it seems Puppet and the community around it have been moving that way for a while, but how many man-hours could have been saved if this “feature” had simply been deprecated many years ago?

There is no native syntax to declare a resource by passing it a hash-map with arguments. There is a create_resources function, but the absence of syntactic sugar for it made for some obnoxious patterns and workarounds.

Strings lack a simple concatenation operation. “Just use string interpolation!”, I hear you say, but then how do I avoid gigantic lines of code and adhere to the 80-column rule (yeah, I’m that guy), since there’s no way to break up a string without a concatenation operation? OK, I’ll use the join function; that is almost acceptable.

Every syntax complaint I have about Puppet is fairly minor, but working through codebases where these syntactic warts are layered atop each other, where dozens of developers before you have made all the different compromises on how much effort they wanted to put into making their code readable, maintainable and correct, is very frustrating.

Migration story

It’s not hard to conclude, judging from sections above, that a major contributor to man-hours spent dealing with Puppet codebase is upgrading versions. Infrastructure automation code is usually a slow moving beast with parts that get phased out and/or neglected over time but are still well in use and just chug along wherever they are relevant. A major version change is problematic across the entire codebase due to changes in syntax and semantics. Even minor changes are shipped with extreme caution and often reveal surprising code-paths that, due to a combination of factors, either stopped doing what they were intended to, or worse — are now doing something that’s wrong.

In general I very much like going with Kent Beck’s “make the change easy, then make the easy change”. While I was preparing for the largest codebase migration, from Puppet 3 to Puppet 4, it didn’t help with making the change easy to find that a decision had been made to change the default locations of config files. Not only did the top-level /etc/puppet have to become /etc/puppetlabs/puppet, some contents of /etc/puppet had to be shuffled around as well. Because, you know, there’s nothing better than frustrating the hell out of your users.

All of the above is just the tip of the iceberg. Below the waterline we encountered, with varying degrees of impact, all of the following and then some:

  • major performance regressions
  • new function API radically changing what a Puppet function is internally, which broke a ton of tests
  • stdlib functions moved to the new function API, subtly changing their inputs and outputs
  • previously working modules would stop working
  • previously working modules would start working in a different way
  • inconsistencies with variable interpolation in various contexts
  • etc

Imagine all that sprinkled evenly across your infrastructure, no machine spared. It was a long and strenuous migration.

Internals

Oh God, the internals! The code is removed from the actual implementation by as many layers of indirection as is humanly possible. Navigating stack traces is a pure nightmare. And it’s not like there are simple ways to reduce the complexity without major refactoring of entire subsystems, but there are a few suggestions that I’ll make in the Conclusion section.

The Horror

In this section I’m mostly ranting about things that accompany Puppet, that I get exposed to on a daily basis, and that I spend most of my time dealing with and lamenting about on IRC channels.

Sources of truth

Hiera, ENC, argument defaults, facts. These are the four sources of truth that Puppet takes as inputs for its codebase to deliver you the (often gigantic) blob of a nested data-structure.

Only facts are relatively hassle-free, and I don’t have any major complaints about them.

Argument defaults have more than one way of defaulting to some value — the one you define in class declaration and the one you might have in per-module or global Hiera data. Have fun tracking all that down.

ENC, the external node classifier, is a neat way for Puppet to ask for supplementary inputs based on the fully-qualified domain name of a host. You specify the ENC command, and Puppet will invoke it, passing the fqdn as an argument. The resulting JSON will be parsed and merged into root inputs for catalog compilation. The question is: why only the fqdn? Puppet invokes the command on the master, which has access to the full set of facts about the host. Why not allow trivial interpolation, like it is allowed for Hiera keys and values? It seems like an arbitrary restriction that doesn’t have to be there.
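As a rough illustration of how little an ENC has to do, here is a minimal sketch in Ruby. The classification rules, class names and parameter values are invented; the top-level keys classes, parameters and environment are the ones an ENC is expected to emit. To my understanding Puppet's documentation describes the output format as YAML, so the sketch prints YAML.

```ruby
require 'yaml'

# Hypothetical classification rules; class and parameter names are
# invented for this example.
def classify(node_name)
  role = node_name.include?('.prod.') ? 'web' : 'nobody'
  {
    'classes'     => ['base', "role_#{role}"],
    'parameters'  => { 'owner' => role },
    'environment' => 'production'
  }
end

# An ENC receives the node name as its only argument and prints the
# classification to stdout. Nothing is printed when no argument is given.
if __FILE__ == $PROGRAM_NAME && ARGV.first
  puts classify(ARGV.first).to_yaml
end
```

Note that the only input is the node name, which is exactly the restriction complained about above: there is no place for facts to come in.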

The problem with Hiera is that it’s an (initially) naive attempt at encoding data transformation and inheritance rules into YAML. Subsequently it has become a grossly complex naive attempt at the same thing. I’ve spent hundreds of hours stepping through stack traces, jumping wildly between different Hiera contexts, hierarchies and locations, and I still don’t have a clear picture of what’s going on underneath.

In Puppet code there are a couple of ways to use Hiera:

  1. Let class arguments default to values behind the corresponding hierarchy/key. You risk losing your sanity when you notice that after an eternity of using :: as the key separator, Puppet 4 suddenly introduced . separators, which breaks all your keys that have a dot in them. Like, I don’t know, versions of things or some alternative topology you’re constructing in your module to work around Hiera’s shortcomings.
  2. Use one of the lookup functions hiera, hiera_hash, hiera_array. These exist for when you need to look up values outside of your context. The difference between them is in the manner in which values are collected across hierarchies. It is a clear hint that trying to encode data manipulation using the same data without putting some thought into it leads to very ugly solutions.
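The three collection strategies can be sketched in a few lines of Ruby. This is a simplified model and not Hiera's implementation: the hierarchy is just an array of hashes ordered from most to least specific, and the data, keys and function names (lookup_first, lookup_array, lookup_hash) are hypothetical.

```ruby
# Hypothetical hierarchy, most specific level first.
HIERARCHY = [
  { 'ntp_servers' => ['ntp.dc1.example.com'],
    'limits'      => { 'nofile' => 65_536 } },               # node level
  { 'ntp_servers' => ['ntp.example.com', 'ntp.dc1.example.com'],
    'limits'      => { 'nofile' => 1024, 'nproc' => 512 } }  # common level
].freeze

# Priority lookup: the first level that has the key wins.
def lookup_first(key)
  level = HIERARCHY.find { |data| data.key?(key) }
  level && level[key]
end

# Array lookup: collect values from every level, flatten, de-duplicate.
def lookup_array(key)
  HIERARCHY.select { |data| data.key?(key) }
           .flat_map { |data| [data[key]].flatten }
           .uniq
end

# Hash lookup: merge hashes from every level; more specific levels win
# because they are merged last.
def lookup_hash(key)
  HIERARCHY.select { |data| data.key?(key) }
           .reverse
           .reduce({}) { |merged, data| merged.merge(data[key]) }
end

puts lookup_first('ntp_servers').inspect
puts lookup_array('ntp_servers').inspect
puts lookup_hash('limits').inspect
```

The same key yields three different answers depending on which function the caller happened to pick, which is the ugliness described above.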

During migrations I had to go through many debugging sessions to track down inconsistencies between Puppet 3 and Puppet 4 Hiera lookup warts. Previously accessible keys would end up outside your context / ability to interpolate. Previously erring undefined variable interpolations would yield an empty string. Previously working nested interpolations would stop working, forcing upgrade to even more recent Puppet version in the middle of upgrading the codebase, forcing re-checking and re-fixing everything you’d done until this point. I had to implement some ugly (and admittedly still buggy) hacks just to keep the thing from falling apart.

Composability

Puppet is not the UNIX way.

Let me expand by roughly describing how Puppet works (fairly late into this post, now that I think about it).

  1. When a node wants to run Puppet, it establishes an HTTPS connection to a Puppet-master, sends a set of facts about itself and waits for a catalog.
  2. Puppet master retrieves that set of facts, optionally runs the ENC classifier to merge in some additional data, parses your puppet codebase into an AST and walks that AST, executing instructions, that potentially have side-effects — they can yield resources or resource edges.
  3. Those resources and edges are then collected into a giant data-structure and sent back to the client, which walks over them, compares resource declaration to resource state on local machine and applies the differences if any.

Notice how this splits neatly into separate isolated steps? It is only logical, that for convenience, testability, reproducibility, cache-ability, introspect-ability and all the other good made-up -bilities — those steps should all be invokable in isolation!
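As a toy model of that split, the three steps can be written as three independent functions that only exchange plain data. Everything here is hypothetical (function names, the hard-coded fact, the shape of the catalog); it is a sketch of the idea and bears no relation to Puppet's actual code.

```ruby
# Step 1 (node): gather facts. Hard-coded here for the sake of the example.
def collect_facts
  { 'fqdn' => 'web1.prod.example.com' }
end

# Step 2 (master): turn facts into a catalog, i.e. plain data describing
# the desired resources.
def compile_catalog(facts)
  owner = facts['fqdn'].include?('.prod.') ? 'web' : 'nobody'
  [{ 'type' => 'file', 'title' => '/some/path',
     'params' => { 'owner' => owner, 'content' => 'foo' } }]
end

# Step 3 (node): compare desired state with current state and return only
# the attributes that would need to change.
def diff_catalog(catalog, current_state)
  catalog.each_with_object({}) do |resource, changes|
    current = current_state.fetch(resource['title'], {})
    delta = resource['params'].reject { |key, value| current[key] == value }
    changes[resource['title']] = delta unless delta.empty?
  end
end

catalog = compile_catalog(collect_facts)
current = { '/some/path' => { 'owner' => 'root', 'content' => 'foo' } }
puts diff_catalog(catalog, current).inspect
```

Because each step takes data in and hands data out, any of them can be run, cached or tested on its own, which is the property being asked for.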

I should be able to write something like

facter --for-catalog | \
  puppet agent --facts - --get-catalog-request | \
  curl --something-something - | \
  puppet agent apply

on a client, and something like

(cat facts.yaml; cat enc.yaml) | \
  puppet master --facts - --compile

and it should be easy to find out how to do it! It is not. It took me quite some poking to find the right (non-composable) command sequence to run on client and MUCH more poking to find the same for master. And of course both turned out subtly different between major Puppet versions.

The problem on the client was that getting Puppet to log what requests it was sending out was not trivial (due to the aforementioned amounts of indirection). Another problem on the client was that, after some poking, I just couldn’t figure out how to make puppet find work (it is/was supposed to fetch a catalog from the master?).

Problems on master were numerous:

  • can’t tell master “use this file for facts”, it only knows to look in a folder for specific fqdn
  • can’t tell master “use this data as ENC”, it only knows to execute a command (a trivial #!/bin/cat sufficed)
  • for some inexplicable reason puppet master mixes in log notice into catalog JSON output so you have to sed/head it out
  • least trivial of them all — catalog that master produces by default in that mode is completely useless because of how it references static resources, so I had to figure out the right flag values to make it work

None of this should be so complex for all sorts of reasons, most prominent of which — it does not make sense.

The Terror

Where today’s design choices are making me question if I ever want to touch it again.

Strange Typing

A type system was introduced in Puppet 4 with seemingly the sole purpose of replacing validations, but it isn’t flexible enough to express arbitrary validators (regular expressions, for instance), and it isn’t used in any capacity to help performance. That leaves me wondering: why burden the codebase and the users with this novelty? Why add this gigantic amount of complexity, as if you don’t have enough of it already? And more importantly, why not spend the hundreds of man-hours that went into development, and will go into maintenance and bug-fixing, on doing something useful, like making Puppet faster, leaner and simpler?

All aboard the Template

For generating file contents out of some free-form data, Puppet has been and still is using ERB templates. ERB parses the template content looking for the special tokens <% %> and executes the Ruby code in between according to some simple rules. The point is that there was a refuge from the Puppet language, a place where one could find a little peace of mind. That is probably coming to an end with the announcement of the EPP template language in Puppet 4, which looks just like ERB, except the code between those tokens is not Ruby. Consistency in developer struggle is at least some form of consistency, I guess? Of course ERB is not gone yet, and I haven’t seen anybody using EPP for anything in the past couple of years, so there is some hope after all.
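For readers who haven’t used it, this is roughly what the ERB refuge looks like from plain Ruby, using the erb library that ships with Ruby. The template text, the render_ntp_conf helper and the server names are made up for the example; inside Puppet the variables would come from the catalog scope instead of a method argument.

```ruby
require 'erb'

# Hypothetical template. The heredoc is single-quoted so that Ruby leaves
# #{...} and \n alone until ERB evaluates the embedded code.
TEMPLATE = <<~'TPL'
  # Managed by a hypothetical ntp module
  <%= servers.map { |s| "server #{s} iburst" }.join("\n") %>
TPL

# Render the template; `binding` exposes the local variable `servers`
# to the Ruby code between the <%= %> tokens.
def render_ntp_conf(servers)
  ERB.new(TEMPLATE).result(binding)
end

puts render_ntp_conf(['a.example.com', 'b.example.com'])
```

Whatever sits between the tokens is ordinary Ruby, so map, join and string interpolation are all available without ceremony.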

Java to the rescue

Puppet Server is a Clojure app that proxies requests to Puppet master instances in JRuby threads. I’m not sure exactly how these decisions are made, but between “let’s wrap our Ruby codebase in a thread and run a proxy in front of it” and “let’s simplify and reduce our Ruby codebase” I would’ve certainly gone for the latter. I mean, supposedly you have a whole bunch of Ruby developers already?

Clojure is amazing and I wish the Puppet language was emulating it, but layering complexity atop complexity is not helping the core problem — the amount of complexity.

Conclusion

My main beef with Puppet has evolved over time and underwent a significant shift once I learned Lisp (in the form of Clojure).

A functional approach, if implemented rigorously, would allow valuable additions to the language. In the end Puppet is primarily a big nested-data-structure-munging machine. You get a big data-structure as input and you spit out another one as output. Some graph shenanigans in the middle. Have first-class support for manipulating those data-structures and you’re halfway there. Add Lispy homoiconicity to enable representing code using your data-structures and you’re 80% there. And the remaining 20% could have been all that hard work that went into implementing types in a language that doesn’t need types, implementing a proxy layer for a server that doesn’t need a proxy layer, and a whole bunch of useful and scalable features following naturally from functional-ness and homoiconicity: partially evaluated catalogs, completely transparent and introspect-able processes, a way to properly encode data manipulation and inheritance rules in Hiera and, most importantly, a coherent and consistent language at the bottom.

Open source

“But it’s open source — don’t whine, go fix it!”. That is true and I definitely lack the tenacity to do it. It’s hard to stay motivated after you’ve put so much effort and fought so many battles against the codebase. Still, I have tried sending pull requests to fix some of the issues I’ve encountered, but even though I can’t claim that I was persistent or collaborative enough to get them merged, I didn’t get the feeling that it would have been easy. The chance I will ever go back to poking around Puppet internals has diminished to near zero.
