Beyond configuration management

Configuration management is the problem, not the solution. Start managing applications instead.

For decades, operations engineers have focused on configuration files and how to manage them. And that’s the problem.

All configuration management tools trap you in the idea that you should be writing configuration files at all. It doesn’t matter if you prefer Ansible or Salt, or Puppet or Chef. These are just different styles of trap.

With ever more pieces of software to integrate, from an ever more diffuse range of open source publishers, and ever faster releases, there are too many configuration files with too many permutations and combinations to keep track of. It is becoming impossible to find the combination of low-level skills needed to integrate the stack.

Kubernetes is not a solution either, because declarative infrastructure requires you to declare every detail that you want, in YAML, which is just another configuration file. It’s FAST, but to run a thousand things in K8s still requires dedicated in-house expertise in those thousand things so that you can write the YAML.

That’s the real problem. Running an application should not require detailed knowledge of the internals of that application.

To reach the next level of productivity we need to stop focusing on configuration files altogether.

Integration makes the problem harder

If each application ran completely independently, then we could share configuration management code as we know it today. The problem is that applications in the enterprise always need to be integrated with other applications. Each business makes its own choices about which applications to integrate for particular situations.

Configuration management tools force us to blend all of those choices into a single Tower of Babel that reflects the unique landscape inside that enterprise. Whether it is Chef, Puppet, Salt or Ansible, that enterprise configuration management code cannot simply be taken to another enterprise, because it mixes up knowledge of the applications with knowledge of the particular business and the situations in which those applications are being used.

Application management is concrete

We can easily describe what we want from our applications. We want Spark on those three machines, and HDFS on these 20 machines. We want Kubeflow integrated with our Active Directory. We want this Postgres to be highly available and we want that Postgres to be fast for test runs even if it’s not persisting its data to disk at all.

We draw pictures of this on whiteboards all over the planet. You could walk into a conference room on the other side of the world and recognise what the engineers who just left were cobbling together from the block diagram on the whiteboard. Of course, some pieces would be unique to them, but a lot would be… Kafka, Postgres, Kubeflow, Spark.

Our goal in the Juju operator lifecycle manager is to make that high-level application management language the actual language you use for your operations.

Model-driven operations capture intentions clearly

Instead of expressing what we want from integration as the resulting configuration file soup, we express our goals in a model. The model says which applications we want, on which machines or Kubernetes clusters, integrated in which way, with some high-level policy guidance such as whether to optimise for throughput, reliability, low latency or persistence in a database.

The model is just like the whiteboard — you can glance at any model and see immediately what the intention is, what standard components are being integrated, where this is all supposed to run, and at what scale.
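As a rough illustration of how little a model needs to say, here is a minimal Python sketch. It is not Juju's data format or API; the `Application` and `Model` classes, their fields and the policy strings are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Application:
    """Hypothetical description of one application in the model."""
    name: str                     # e.g. "spark"
    units: int                    # how many machines or pods should run it
    substrate: str                # "machines" or "kubernetes"
    policy: str = "reliability"   # high-level guidance, not a config value


@dataclass
class Model:
    """Hypothetical model: which applications, where, integrated how."""
    applications: dict[str, Application] = field(default_factory=dict)
    integrations: set[frozenset[str]] = field(default_factory=set)

    def deploy(self, app: Application) -> None:
        self.applications[app.name] = app

    def integrate(self, a: str, b: str) -> None:
        # Only applications already in the model can be integrated.
        for name in (a, b):
            if name not in self.applications:
                raise KeyError(f"unknown application: {name}")
        self.integrations.add(frozenset((a, b)))

    def summary(self) -> list[str]:
        # One line per application, then one line per integration.
        lines = [
            f"{a.name}: {a.units} unit(s) on {a.substrate}, optimise for {a.policy}"
            for a in sorted(self.applications.values(), key=lambda a: a.name)
        ]
        lines += [
            " <-> ".join(sorted(pair))
            for pair in sorted(self.integrations, key=sorted)
        ]
        return lines


# The whiteboard example from earlier in the article, as a model.
model = Model()
model.deploy(Application("spark", 3, "machines", "throughput"))
model.deploy(Application("hdfs", 20, "machines"))
model.integrate("spark", "hdfs")
```

Nothing in this sketch mentions a configuration file. It records only the intent, which is what makes it readable at a glance.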

Operators translate model to config

We then use shared code called an ‘operator’ to turn that high-level model into configuration files. The idea of an operator is just that it is software (in a container or on the machine filesystem) which has the responsibility to drive an application lifecycle and integration.

Most importantly, the users of the system never need to actually write the configuration files, and for standard workloads that everybody uses, they don’t even need to know where those configuration files live and how they work.
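To make the translation step concrete, here is a minimal Python sketch of an operator turning a high-level policy into configuration text. The `PostgresOperator` class and its policy names are hypothetical and are not how any real charm is written. The setting names (`fsync`, `synchronous_commit`, `max_connections`) are real PostgreSQL parameters, but the values chosen for each policy are purely illustrative, not tuning advice.

```python
class PostgresOperator:
    """Hypothetical operator: turns high-level intent into config text."""

    # Illustrative mapping from policy to settings (an assumption of this
    # sketch). A real operator would encode far more specialist knowledge.
    _POLICY_SETTINGS = {
        "reliability": {"fsync": "on", "synchronous_commit": "on"},
        "throughput": {"fsync": "off", "synchronous_commit": "off"},
    }

    def render_config(self, policy: str, max_clients: int) -> str:
        """Return configuration file text for the requested policy."""
        if policy not in self._POLICY_SETTINGS:
            raise ValueError(f"unsupported policy: {policy}")
        settings = dict(self._POLICY_SETTINGS[policy])
        settings["max_connections"] = str(max_clients)
        # Sort keys so the same intent always yields the same file.
        body = "\n".join(f"{key} = {value}" for key, value in sorted(settings.items()))
        return body + "\n"


# The user states intent; the operator owns the file contents.
test_db_config = PostgresOperator().render_config("throughput", max_clients=50)
```

The person asking for a fast test database passes only a policy and a client count. Which settings that implies, and where the file lives, stays inside the operator.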

The operator is packaged as a charm. The operator lifecycle manager, Juju, fetches and installs charms and then tells them about the model they are living in so that they can do the application management correctly.

Operators are reusable code

Since each operator is focused only on a single application, and since the operator can only respond to a standardised model, that code can be reused in any organisation that uses the same lifecycle manager.

Reuse of code is the key ingredient in improving quality, performance, security and features. Traditional configuration management code is a huge pile of almost-done software, but operators can be developed iteratively, improved continuously and receive scrutiny and contributions widely because they are valuable code for many different people.

Knowledge distilled into code

Most organisations don’t write their own kernels and databases. It’s too hard and too expensive to reinvent something that does not differentiate your business. They reuse commercial products or open source projects that are standard in their industry, and through that reuse, they get access to code which they could not write themselves.

Very high-quality automation of an application requires substantial experience and sophisticated skills which are rare and expensive.

It takes a long time for a very smart person to automate a database and handle all of the different situations an organisation might encounter. That database will be used in many different places — on clouds, on Kubernetes, in testing, in staging, in production. It will be used on tiny machines and on huge machines. It will be used with fast disks and slow disks. In each case, a specialist would be able to optimise the configuration for that situation, and a very good software developer would be able to turn that knowledge into code which delivers that configuration perfectly every time.

Creating that code is too expensive for a small organisation. Specialists and very good developers are rare. If you have them, it is better to have them focused on things which differentiate your business.

Shared ops code packages

The goal of the Juju operator lifecycle manager is to make it possible for organisations to share operations code for that database, perfectly, just like they share the code for the kernel and the code for the database itself.

Open source principles and practices

We treat operations code as open source packages, just like the standard commodity applications themselves. And we get all the same benefits of quality and cost that open source packages give you for your applications.

An open source operator will not be perfect initially. It will likely handle just the situations that its original developer needed. But it serves to attract a community and contributors, each of whom brings new insights, perspectives and experience. Just as open source gains momentum and depth, an open source operator delivers better and better operations, until it is the world’s best expert in a package.

Consistent experience

For operations to become simple despite the richness and diversity of the software we are required to integrate and run, the administration experience of diverse applications must become consistent.

Today, administering a particular application requires detailed knowledge of much more than configuration files. The admin needs to know how to restart the application, where it sends its log files, and how to back up and restore it. Those details vary from application to application, and it takes time to learn them, which slows down the adoption of new software by the organisation.

Unifying the experience of application management means creating a standard set of mechanisms that do not involve low-level details. Expressing every application in this standard way means that a new application can be embraced and integrated much more quickly because there is less unique knowledge to be acquired before an admin can be proficient.

With operators and the Juju operator lifecycle manager, every new application presents exactly the same way to administrators. A charm provides consistent metadata about versions and substrates (Kubernetes, machines, operating systems). The charm also declares the set of daily operations actions that can be performed (backup, restore, reset and so on), all of which can be triggered and monitored in exactly the same way. Integration possibilities are declared explicitly. Scale, storage and networking are all standardised.
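The idea of a uniform surface can be sketched in a few lines of Python. This is not the real charm or Juju API; `CharmSketch`, `declare_action` and `run` are hypothetical names, and the handlers are stand-ins for whatever each application actually needs to do.

```python
from typing import Callable


class CharmSketch:
    """Hypothetical uniform surface; not the real Juju or charm API."""

    def __init__(self, name: str, substrates: list[str]) -> None:
        self.name = name
        self.substrates = substrates  # consistent metadata across charms
        self._actions: dict[str, Callable[[], str]] = {}

    def declare_action(self, action: str, handler: Callable[[], str]) -> None:
        # Each charm registers its own implementation under a shared name.
        self._actions[action] = handler

    def actions(self) -> list[str]:
        # What an administrator can ask of this application.
        return sorted(self._actions)

    def run(self, action: str) -> str:
        if action not in self._actions:
            raise KeyError(f"{self.name} does not declare action: {action}")
        return self._actions[action]()


# Two very different applications, each with its own backup logic.
postgres = CharmSketch("postgres", ["machines", "kubernetes"])
postgres.declare_action("backup", lambda: "postgres: database dumped")

kafka = CharmSketch("kafka", ["kubernetes"])
kafka.declare_action("backup", lambda: "kafka: topics snapshotted")

# The administrator triggers both in exactly the same way.
results = [charm.run("backup") for charm in (postgres, kafka)]
```

The administrator's side of the call is identical for both applications. The application-specific knowledge lives behind the declared action name.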

In order to give administrators a clean and immediately understandable way of operating an application, we must use a consistent language to describe all applications.

The Juju operator lifecycle manager provides a model-driven language that unifies applications on bare metal, virtualisation, cloud instances, and Kubernetes. It allows the administrator to be effective with a new application even if they have no underlying knowledge of the details of that application, and no access to the actual machines or containers that are hosting the application.