Multi-cloud operations

Companies want to operate across multiple different public and private clouds. It should be possible to deploy and operate any standard application or component on any cloud in exactly the same way, with exactly the same declarative descriptions. It should also be straightforward to integrate software deployed across diverse clouds, and integrate cloud SAAS with local applications.

What are multi-cloud operations?

Many organisations have invested in relationships with multiple separate public clouds. Since all of the major cloud providers offer the raw materials of compute, network, and storage, it makes sense to be able to deploy onto those clouds in a standardized, repeatable and portable way.

Case Study: Volkswagen Group multi-cloud operations

Watch Scania present at Ubuntu Masters in 2020.

From bare metal high performance computing to on-premises VMware clusters and multiple diverse public cloud regions, Scania, part of the Volkswagen Group, have created a seamless operations regime which puts the right workload in the right place and preserves the ability to move workloads overnight.

Charms are portable operations code

With the Juju operator lifecycle manager, we ensure that it is easy to:

  • Deploy legacy apps on any public cloud
  • Take advantage of unique capabilities or services of particular clouds
  • Deploy on premises, on bare metal, VMware, or OpenStack
  • Deploy Kubernetes workloads on any cluster
  • Integrate legacy and Kubernetes workloads
  • Integrate cross-cloud applications

The Charmhub hosts the Open Operator Collection, a portfolio of operators for many applications that work across all public and private clouds as well as bare metal.
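
As a rough sketch of what that portability looks like in practice (the model, cloud and charm names here are illustrative, and command details vary slightly between Juju releases), the admin issues the same kind of deploy request whatever substrate sits behind the model:

    # One controller, two models on very different clouds
    juju add-model web-aws aws
    juju add-model web-k8s myk8s
    juju deploy -m web-aws mysql          # the OLM provisions AWS machines and installs the operator
    juju deploy -m web-k8s mysql-k8s      # the Kubernetes flavour of the operator runs in pods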

The business case for multi-cloud operations

Each of the large clouds differentiates with specialist services. While the clouds are superficially the same, offering commodity resources, in practice there are amazing capabilities that are unique to each provider. Engaging with multiple public clouds ensures that a business has access to those unique capabilities.

It is relatively expensive for a large organisation to engage deeply with a big public cloud. Security policies and practices need to be defined, identity management needs to be arranged and maintained, and monitoring and management need to be put in place for resources on that cloud. While a startup can hop onto a cloud in a very agile way, large organisations typically run a complex process to choose their cloud providers, and then spend substantial amounts of time and money investing in the relationship.

So fragmentation of an estate comes at a price, which is why Juju is so useful. Enabling repeatable and portable operations across clouds lowers the friction and cost of those relationships.

There are other benefits to having multiple cloud providers. Competition between providers is fierce and keeps costs lower. In a regulatory or physical disaster scenario, redundancy of relationships may be very valuable to the buyer, enabling rapid movement from one platform to another, as long as those workloads are encapsulated and portable as Juju operators.

Technical methodology and advantages

The Kubernetes operator pattern encapsulates container operations in an ‘operator container’.

Juju generalises the operator pattern, taking it beyond containers. With the Juju operator lifecycle manager, operators can drive software outside of Kubernetes too. Juju provides a universal operator framework for multi-cloud operations that transcends Kubernetes and addresses legacy workloads.

Much of the software that companies wish to operate in a multi-cloud way is not yet containerized, and it may never be containerized, so a universal operator mechanism is important to the multi-cloud goal.

Provisioning is a lifecycle manager function

Operators should not provision themselves, nor should the admin directly provision them. This is why the Juju operator lifecycle manager (OLM) handles provisioning on behalf of operators.

The admin requests the OLM to install an application, and the lifecycle manager finds the right operator for that application on that substrate, provisions any needed containers or machines with the correct networking and storage, and installs the operator. This approach means the operator is able to drive application installation without having to worry about any of the compute, networking and storage setup that was handled by the lifecycle manager during provisioning.

This design decision ensures that operators do not need any code specific to machine or container provisioning, or to network and storage setup, which would otherwise tie them to the particular substrates they know how to provision. Any Kubernetes cluster where Juju can stand up containers, or any cloud where Juju can stand up machines, will work with any operator.

Handling provisioning in the lifecycle manager is particularly important for associated resources like networking and storage, which can vary widely between clouds or even Kubernetes clusters. Specifying network and storage at a conceptual level again simplifies the task of the human operator and ensures greater portability, reuse and value in any given operator.
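
As a hedged illustration of that conceptual level (the charm name, storage label and constraint values are illustrative), the admin describes capacity and storage in cloud-neutral terms and the lifecycle manager maps them to instance types, volumes or storage classes on whatever substrate the model sits on:

    # Abstract requests; Juju translates them for the chosen cloud or cluster
    juju deploy mysql --constraints "cores=4 mem=16G" --storage database=100G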

The Juju approach greatly simplifies the human element of software operations. Yes, trailblazing tech pioneers love to work at a low level, specifying every detail. However, to bring the benefits of operators to the widest possible enterprise market, Juju decouples the idea of the application from the specific operator version that drives it, so that the human administrator can focus on the key thing they want to achieve and not have to interact with low-level system details on Kubernetes or the cloud. This dramatically simplifies the administration task - if you know what application you want, ask for it and the lifecycle manager figures out which operator will work best. Asking for “MySQL” is exactly the same on Kubernetes, on bare metal or on VMware; Juju handles the details.

Unlike other operator lifecycle managers, the Juju OLM can manage operators on multiple different substrates at the same time. You can have a Juju controller (operator lifecycle manager) running on Kubernetes that manages operators for a model on bare metal, or the other way around.
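
A minimal sketch of that arrangement, assuming a machine cloud registered as aws and a Kubernetes cluster registered as myk8s (all names are illustrative):

    # One controller, models on two very different substrates
    juju bootstrap aws ctrl                # the controller itself runs on AWS machines
    juju add-k8s myk8s --controller ctrl   # register a Kubernetes cluster with that controller
    juju add-model k8s-apps myk8s          # workloads in this model run as pods
    juju add-model vm-apps aws             # workloads in this model run on AWS VMs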

Substrate-agnostic operators

An operator is expected to work on every cloud, or every Kubernetes cluster, and not be limited to a single environment unless it is offering an exclusive service from that environment, such as Amazon RDS, Azure HDInsight or Google Cloud Spanner.

Each operator is code that executes in a machine or container. It is typically a Python application which handles events delivered by the Python Operator Framework. The operator in turn drives the application.

Exactly the same operator code can handle both ‘machine’ substrates and ‘container’ substrates, but in practice it is more common for these to be related codebases rather than exactly the same binary code. Typically, if you want an application available on both machine and container substrates, you will have a lot of common code that handles the app-specific logic, some substrate-specific code for each substrate, and separately published binaries for the different substrates.

To restate: you can have a single codebase for both container and machine flavours of your operator, but generally people don’t. They share common code between the two flavours, and build, test and release each flavour independently.
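
For example, with the Charmcraft tooling a publisher might pack and release each flavour as its own charm; the names, file paths, revisions and channels below are purely illustrative:

    # Build and release the machine flavour from its own tree
    charmcraft pack
    charmcraft upload ./mysql_ubuntu-20.04-amd64.charm
    charmcraft release mysql --revision=12 --channel=stable

    # The Kubernetes flavour is built, tested and released independently
    charmcraft pack
    charmcraft upload ./mysql-k8s_ubuntu-20.04-amd64.charm
    charmcraft release mysql-k8s --revision=7 --channel=stable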

Cross-cloud integration

The Juju OLM natively supports integrating applications deployed on two different clouds.

Operators are installed into a ‘model’ which represents a coherent set of apps that run in the same place. Integration between applications is declarative, since the operators have typed integration points defined in their metadata which specify the sorts of operators that they can integrate with. The graph of integrations is called the application graph.

So, all of the operators in a single model can be integrated directly with any other operators in the same model which declare that they have matching integration points. When two operators in the same model are integrated, they exchange information in order to configure their apps to talk to one another.
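
A minimal sketch of same-model integration, assuming a wordpress charm whose endpoint matches the one the mysql charm provides (charm and cloud names are illustrative; older Juju releases spell the last command juju add-relation, and Juju 3 spells it juju integrate):

    juju add-model wiki aws
    juju deploy wordpress
    juju deploy mysql
    juju relate wordpress mysql   # both operators are in the same model, so one step is enough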

Juju supports ‘cross-model integration’, whereby a specific integration point on an operator in one model is ‘offered’ from that model, and another operator in a different model ‘consumes’ that offer. Both offer and consume require local admin permission in each model. In effect, to the operators, the exchange of integration information is exactly the same as if the two operators were in the same model.
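
A hedged sketch of that flow, with two models that could sit on entirely different clouds (model, offer and endpoint names are illustrative):

    # In the model that hosts the database, offer one endpoint to other models
    juju offer db-model.mysql:database db-offer

    # In a model elsewhere, possibly owned by a different team, consume and relate to it
    juju consume admin/db-model.db-offer
    juju relate wordpress:db db-offer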

The extra step of ‘offer’ and ‘consume’ reflects the fact that such an integration line crosses an administrative boundary. The people who have permission to manipulate operators in one model may not have permission to do the same in the other model. By offering their endpoint to the other model, they allow the people who do have permission in the other model to complete the integration by consuming their offer.

A given model has to be on a single substrate. But cross-model integration can connect two models in two completely different clouds. So, for example, a model on VMware could integrate with the S3 service on AWS for backups. Or a set of applications on Azure Kubernetes could integrate with a Google SAAS for real time search trend data.

Cloud-specific application features

It is important not to be stuck with a lowest-common-denominator approach to multi-cloud operations.

Many clouds have unique features that are valuable and beneficial for a particular application. Operators can take advantage of these features if they happen to be running in a particular environment, but it is bad practice for them to depend on those features unless they are clearly cloud-specific operators. For example, a Kubeflow operator could be extended to take advantage of a particular GPU on AWS, but it shouldn’t fail to work on VMware if that GPU is not present.

How those features are enabled or disabled is a configuration choice for the operator publisher, but it is good practice to adopt ‘sensible defaults’, or to discover the preference of that particular user through some appropriate mechanism. For example, it is reasonable to assume that a workload would not have been provisioned onto a VM with a GPU if that GPU were not intended to be used, so use it by default. After all, the user has chosen that cloud, and that VM type, for a reason.
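
A purely hypothetical sketch (the kubeflow-worker charm, its use-gpu option and the instance type are invented for illustration) of how a publisher might expose the choice while defaulting to what the substrate provides:

    # Ask for a GPU-backed instance on AWS
    juju deploy kubeflow-worker --constraints "instance-type=p3.2xlarge"
    # A hypothetical 'auto' default: use the GPU if one is detected, otherwise run without it
    juju config kubeflow-worker use-gpu=auto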

In the Open Operator Collection, we celebrate operators that take advantage of substrate-specific capabilities while working consistently across the widest range of substrates.

Cloud-specific SAAS

Many clouds offer SAAS, for example Amazon’s Relational Database Service (RDS). Such MySQL-as-a-service offerings are common, and there are hundreds of such API-driven capabilities which can be turned on or off and integrated into applications. They are a valuable part of the cloud ecosystem. It is important to integrate these services into an operator-driven environment.

In the Juju OLM it is easy to integrate SAAS offerings with components that are being driven locally by operators. In fact, a SAAS operator is presented in the Juju model as just another application, with a charm of its own and all the same integration mechanisms as any other operator in the collection.

We talk about operators as ‘driving software’, managing the lifecycle of application installation and upgrade and configuration. But there is no reason for that software to be a local application. SAAS offerings have a lifecycle too. ‘Install the application’ maps to ‘call the API to instantiate the service’. ‘Remove the application’ maps to ‘call the API to terminate the service’. Each service has a set of things which need to be negotiated in order to integrate with it, just like an application does.

So, in the Open Operator Collection, you will find operators for common SAAS solutions. You can install these operators into your model, you can configure them, you can integrate with them, and they behave in the model exactly as any other operator does. The only difference is that they drive the APIs of the SAAS provider, rather than driving local software installation.

A key idea in the Juju OLM is typed integration points on operators. These allow declarative integration between operators in a model. Where a SAAS offers a standard service, like MySQL, it will offer up integration points with exactly the same type as the equivalent local MySQL. That way, apps can integrate with the SAAS MySQL as if it were the local MySQL. In fact, they may not even know it is not the local MySQL.
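
A hypothetical sketch of what that looks like from the admin’s side (the aws-rds-mysql charm, its configuration keys and its endpoint names are invented for illustration):

    # This operator drives the RDS API rather than installing MySQL locally
    juju deploy aws-rds-mysql rds
    juju config rds region=eu-west-1 instance-class=db.m5.large
    # Applications relate to it through the same typed endpoint as a local MySQL charm
    juju relate wordpress:db rds:mysql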

The Juju OLM has a focus on usability, and this transparent integration of SAAS is a great example of the benefit of model-driven integration for long term usability.