My lessons learned from using GitOps with FluxCD
I've been using GitOps-based approaches to Platform management for years. For my Kubernetes-based Smart Factory projects at MaibornWolff I use a stack based on FluxCD. In this post I want to share some lessons learned from multiple years and projects building and running Kubernetes platforms with FluxCD.
I have already written about Why I use GitOps and how I build multi-cluster GitOps setups with FluxCD. You should read those first to understand my approach. Otherwise, some lessons might make less sense.
In the following sections I want to touch on a few things I've learned working with FluxCD and GitOps in Kubernetes.
Helm is a double-edged sword
Helm is the de-facto standard package manager for Kubernetes. Packages are called Charts and basically consist of template files for manifests (using Go templating) and a values.yaml with default values for the templates. This can make Helm Charts very powerful, as a lot of logic can be packed into the templates. They can also expose lots of configuration parameters, allowing admins to adapt a package to their specific setup.
Although Helm charts are a great way to distribute and install software for Kubernetes, they carry some disadvantages.
Charts can grow very big. As an example, at the time of writing, the values.yaml file for the kube-prometheus-stack has 5417 lines. Granted, it is a complex chart for a complex and powerful piece of software, but it is easy to get lost in this massive values.yaml file and overlook options. That makes it harder to keep a complete overview of a chart and increases the risk of something going wrong.
CRD handling is outright bad. Custom Resource Definitions (CRDs) are a way to extend the Kubernetes API with custom types and actions (usually with Kubernetes Operators). If I install an operator that ships CRDs via a Helm Chart, the official way to do this is to place them in a special crds folder in the chart. Helm will then install them before the other manifests. But Helm will never update them if they change in newer Chart versions. It will also not delete them on uninstall (which could be considered a safety feature, so an unintended uninstall doesn't have a cascading effect).
There are two alternatives. One is to place the CRDs with the other manifests, which can lead to problems if the operator starts before the CRD is installed. The other is to publish a second Chart with only the CRDs and place the burden on the admin to install the CRDs first.
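When a chart is installed through a FluxCD HelmRelease, the CRD behavior can be adjusted per release. The following is only a sketch, assuming the HelmRelease v2 API; the chart name example-operator, the repository example-charts and the namespaces are hypothetical:

```yaml
# Sketch only: "example-operator" and "example-charts" are hypothetical names.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example-operator
  namespace: operators
spec:
  interval: 30m
  chart:
    spec:
      chart: example-operator
      sourceRef:
        kind: HelmRepository
        name: example-charts
        namespace: flux-system
  install:
    # Create the CRDs from the chart's crds folder, replacing existing ones.
    crds: CreateReplace
  upgrade:
    # Also apply CRD changes on upgrades (the default here is Skip).
    crds: CreateReplace
```

Whether replacing CRDs on upgrade is safe depends on the operator, so this should be tested per chart before relying on it.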
Helm by itself is not good at detecting changes. It normally only compares its internal state with what a chart defines, not with the actual state of the cluster. This means if I manually edit a resource, Helm will not detect it. FluxCD has a way around that with drift detection, which you should definitely enable for all your HelmReleases to make sure your cluster stays consistent with your repository and no one tries to manually edit things, which would just lead to problems later down the road.
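Enabling drift detection could look roughly like this. It is a fragment, assuming the HelmRelease v2 API; the release name, the namespace and the ignore rule are hypothetical illustrations:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example-app   # hypothetical release
  namespace: apps     # hypothetical namespace
spec:
  # chart, interval and values omitted
  driftDetection:
    # "enabled" detects and corrects drift, "warn" only reports it.
    mode: enabled
    ignore:
      # Assumed example: replicas are managed by an autoscaler,
      # so changes to them should not count as drift.
      - paths: ["/spec/replicas"]
        target:
          kind: Deployment
```

Starting with mode: warn on an existing cluster can help to see what would be corrected before letting FluxCD actually revert manual changes.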
Another problem with Helm and FluxCD is the handling of failures and rollbacks, which can run in two modes. If you leave the defaults and a HelmRelease fails to install or upgrade, FluxCD will not try again and leaves the release in a half-arsed state. No continuous reconcile, no retries. It will only retry if changes are made to the chart or a manual reconcile is triggered.
You can change this in FluxCD and configure remediation with retries. In this mode FluxCD will either uninstall the chart (if the error happens during installation) or roll back to the previous state and then try again, in theory indefinitely. Not only can this lead to a lot of flip-flopping, it can also cause problems if CRDs are in the mix. More rigorous testing and finely tuned timeouts are a must with complex charts.
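A remediation setup with a bounded number of retries might look like the following fragment. It assumes the HelmRelease v2 API; the release name and the concrete numbers are placeholders to be tuned per chart:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example-app   # hypothetical release
  namespace: apps     # hypothetical namespace
spec:
  # chart, interval and values omitted
  # Upper bound for each Helm action; complex charts often need more time.
  timeout: 10m
  install:
    remediation:
      # Uninstall and retry up to 3 times if the installation fails.
      retries: 3
  upgrade:
    remediation:
      # Roll back and retry up to 3 times if an upgrade fails.
      # A negative value would mean unlimited retries.
      retries: 3
      strategy: rollback
      # Also remediate after the last retry has failed.
      remediateLastFailure: true
```

Bounding the retries is one way to limit the flip-flopping described above, at the cost of needing a manual look once the retries are exhausted.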
Even though the previous paragraphs all paint a rather negative picture of Helm, I still find it to be a powerful and useful tool if used correctly. As a package manager for installing software it works well, as long as the charts are kept simple and not overburdened with logic and options.
Explicit templating is better than overlay patches
The alternative to Helm charts is using Kubernetes Kustomize via FluxCD Kustomizations. In contrast to Helm, Kustomize does not use templating. Instead, the approach favored by the FluxCD makers (e.g. in the flux2-multi-tenancy example repo) is the base and overlay approach. It means you write base manifests (e.g. Deployment and Service) and then use overlays to specialize (or customize) the manifests for a cluster. These overlays either add completely new manifests (e.g. a ConfigMap with cluster-specific configuration) or use patches to modify existing ones, e.g. changing the resources or adding a securityContext.
A slightly edited example from the FluxCD Kustomization Patches docs that adds a securityContext to a Deployment:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: podinfo
  namespace: default
spec:
  # ...
  patches:
    - patch: |
        - op: add
          path: /spec/template/spec/securityContext
          value:
            runAsUser: 10000
            fsGroup: 1337
        - op: add
          path: /spec/template/spec/containers/0/securityContext
          value:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            capabilities:
              drop:
                - ALL
      target:
        kind: Deployment
        name: podinfo
        namespace: default
```
I have found the overlay approach rather hard to read and understand. For one, just by reading the base manifests I have no idea what the final result for a cluster will be. I always have to read the patches as well (they can change any part of the resource, even delete existing fields), mentally apply them and work out the combined result. It also makes modifying the base riskier, because I'm never sure whether an existing patch might interfere with my change. And last, I find most patches are not very concise, with a lot of overhead for small changes.
I prefer the variable substitution mechanism that FluxCD provides for Kustomizations. With it, I can see the variables in the base, so I directly know that this part is specific per cluster. It is also efficient: instead of at least 6 lines for a patch I only need one per variable.
Variable substitution has its limits: it can only handle single values, not entire blocks. The securityContext example above would not be possible that way, so it is less powerful than a templating engine. In these cases, when using Kustomizations, patches are sadly the only real way, unless I want to copy entire manifests or use a Helm chart, which carries its own overhead and negatives, as I have described in the previous section.
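To illustrate the variable substitution mentioned above, here is a minimal sketch. The variable names, the values, the path and the ConfigMap cluster-vars are hypothetical; postBuild.substitute and postBuild.substituteFrom are the fields FluxCD provides for this:

```yaml
# Flux Kustomization for one cluster (names and values are hypothetical).
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: podinfo
  namespace: flux-system
spec:
  interval: 10m
  path: ./modules/podinfo
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  postBuild:
    # One line per cluster-specific value.
    substitute:
      cluster_name: factory-a
      cluster_domain: factory-a.example.com
    # Optionally load further variables from a ConfigMap or Secret.
    substituteFrom:
      - kind: ConfigMap
        name: cluster-vars
---
# Base manifest in ./modules/podinfo: the variables are visible in place.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: podinfo
  namespace: default
  labels:
    cluster: ${cluster_name}
spec:
  rules:
    - host: podinfo.${cluster_domain}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: podinfo
                port:
                  number: 9898
```

A default can be given inline with the ${var:=default} syntax, which keeps the base usable even when a cluster does not set the variable.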
Don't DRY yourself out
In software engineering DRY (Don't repeat yourself) is an important principle to reduce repetition and improve abstraction. In a very simplified manner it means: If you write the same or very similar code multiple times, abstract it away so it exists only once and is used multiple times. It reduces the amount of code and potential bugs among other things.
For GitOps and platform engineering, DRY also holds true. But I have found that there is a middle ground between abstraction and repetition. With GitOps, too much abstraction only increases complexity instead of reducing it. This is mainly due to both Helm and Kustomize being less powerful than full programming languages, but also due to the fact that Kubernetes manifests naturally have a lot of repeating and boilerplate structures.
To reduce complexity and improve readability, and therefore understandability, I have found that it is better to accept a certain level of repetition. It is better to copy a single manifest than to build an entire module just to avoid the duplication.
In one recent project we were deploying a number of PostgreSQL databases using the Zalando Postgres Operator. In that project we distinguish between cloud and on-premise clusters. Both run an overlapping subset of databases and use slightly different configurations. Instead of having just one Kustomization I could have separated out the databases into three (only cloud, only on-premise, both). The configuration differences unfortunately were more complex than could have been handled via simple variable substitutions, so I would have had to use overlay patches. Instead, I opted to have two Kustomizations, one for cloud, one for on-premise, with the manifests for the databases running in both sets of clusters being duplicated between the modules. This leads to repetition, but it makes the setup easier to read and understand in my opinion.
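As a rough sketch of that decision, each cluster would reference exactly one of the two database modules. The names and paths below are hypothetical and not the project's actual layout:

```yaml
# Hypothetical Flux Kustomization in the Git folder of a cloud cluster.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: databases
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  # Cloud clusters point here. On-premise clusters would instead use
  # ./modules/databases-on-premise, which contains its own copies of
  # the manifests for the databases that run in both kinds of clusters.
  path: ./modules/databases-cloud
```

The price is that a change to a shared database has to be made in two folders, but each folder can be read top to bottom without applying any patches in your head.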
Prefer clear, repeating structures
Even if we try to aim for a platform in which every cluster is (or behaves) the same, there are always small differences. Be it things like cluster-specific names, or differing software versions as updates are slowly rolled out through environments and clusters. In the past I have tried to minimize repetition, building complex abstractions (often using Helm Charts) to have as little as possible to define per cluster.
But I have found that this only creates problems. For one, the abstractions introduce complexity and indirection, making them harder to understand. And if one cluster needs to be different (be it due to a different environment or use case, or temporarily for testing or rolling out a change), the existing abstractions will most likely not support these differences. So we end up extending them, adding more complexity and instability.
The better approach in my experience is to accept some repetition per cluster. It makes it easier to introduce differences per cluster. Making a change for all clusters is more effort this way, but if the structures are the same per cluster, the change is always the same so does not require mental effort (or could even be scripted easily).
This similarity in structures should not just exist between clusters, it should also exist between different parts of the GitOps platform. Your entire platform could consist of multiple layers or multiple teams with their own GitOps repositories, all deploying onto the platform. Having the same structures throughout all repos makes it so much easier to jump between repos and understand the code. At first glance it might make the structure a bit more bloated and complex if we have the same repeating patterns everywhere, but in the end it reduces complexity and mental load because everything looks and behaves the same way.
Or to say it in the words of George Lucas, creator of Star Wars: "It's like poetry, they rhyme".
Use small independent units
When using FluxCD with Kubernetes Kustomize there is a big temptation to just pull all manifests for a cluster together into one big Kustomization. Kustomize makes that easy: we can just list the folders and files we want. But I have found this to be bad practice for multiple reasons:
- You always have dependencies between parts of your platform. For example, you will want cert-manager to wait for the kube-prometheus-stack so it can deploy a ServiceMonitor so metrics are collected. If they are all thrown together, that is impossible to express and will lead to instability if not outright failure.
- If there is a problem with one manifest in your Kustomization, it blocks the entire reconcile loop until that is resolved.
- It makes it harder to get an overview and sense of your cluster from a FluxCD perspective, you can't get a good progress indication, because it is all just one big ball.
Therefore, the better way in my experience is to split the platform into small independent modules or units, all with their own FluxCD Kustomization. Following the old Latin saying "divide et impera" lets us conquer GitOps.
This has several advantages:
- FluxCD allows us to express dependencies between Kustomizations, meaning a Kustomization will not run until all of its dependencies have successfully reconciled. This also avoids running into walls of errors just because one unit/manifest deep down had a problem. With the dependsOn relationship, the others will patiently wait.
- Every Kustomization is independent (sans dependencies) and can run on its own, meaning less blockage if something goes wrong.
- If every component is its own Kustomization, you can easily get an overview over your deployment and cluster just by looking at the Kustomizations.
Unfortunately, FluxCD does not allow expressing dependsOn relationships between Kustomizations and HelmReleases. To work around this I always "package" a HelmRelease into a Kustomization, so that modules are always Kustomizations. Not only does this unify the structure, it also allows me to always express dependencies between units.
If a Kustomization is configured with wait: true, FluxCD will only declare it successful if all child resources are ready. This means it will wait for Deployments to become ready, but also for HelmReleases to be installed.
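Put together, two such units could be sketched like this. The module paths and the GitRepository name are hypothetical; dependsOn, wait and timeout are fields of the FluxCD Kustomization API:

```yaml
# Unit 1: wraps the HelmRelease for kube-prometheus-stack.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kube-prometheus-stack
  namespace: flux-system
spec:
  interval: 10m
  # Hypothetical folder containing the HelmRelease and its namespace.
  path: ./modules/kube-prometheus-stack
  prune: true
  # Only report success once all child resources are ready,
  # including the HelmRelease inside this folder.
  wait: true
  timeout: 15m
  sourceRef:
    kind: GitRepository
    name: flux-system
---
# Unit 2: cert-manager, which deploys a ServiceMonitor and therefore
# waits until the monitoring stack has been reconciled successfully.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager
  namespace: flux-system
spec:
  dependsOn:
    - name: kube-prometheus-stack
  interval: 10m
  path: ./modules/cert-manager
  prune: true
  wait: true
  timeout: 10m
  sourceRef:
    kind: GitRepository
    name: flux-system
```

Because both units are Kustomizations, the dependency can be expressed even though the actual software is installed by a HelmRelease.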
Comments save time
Comments are an important aspect of writing code. The prevailing opinion is that comments should be used sparingly, to explain why something is done, and that code should be if possible self-explanatory so that comments are not needed.
For Kubernetes manifests with GitOps I have found that this only partially holds true. After years of trying to understand manifests my colleagues wrote and ones I wrote months ago, I recommend the following rules for comments in GitOps repositories:
- Explain why something was set/configured the way it is, especially if it deviates from the defaults for a package/chart.
- Link to documentation sections, GitHub issues and forum posts as explanation if you add/change options for specific reasons, but also summarize it in one sentence in the comment.
- If parts in different files are related, link them together by having the path to the other file in a comment, with an explanation about the relationship.
- Better explain too much in comments. There is a chance you will have to make changes while half-asleep on a Sunday at 3am because of a production outage. And you will want to hug your previous self for any explaining comments that reduce your cognitive load.
- If you have changes where you are not sure if they are the correct way to go or are experimenting, also document the alternatives. Again, your future self will be grateful during an outage situation.
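Applied to a HelmRelease, comments following these rules might look like the fragment below. The reasons, paths and link placeholders are made up for illustration; only the shape of the comments matters:

```yaml
# Fragment of a HelmRelease for kube-prometheus-stack (illustrative only).
spec:
  values:
    prometheus:
      prometheusSpec:
        # WHY: We keep metrics longer than the chart default because the
        # monthly capacity reports need four weeks of data.
        # RELATED: The volume size in ./storage-patch.yaml (hypothetical
        # path) must grow together with this value.
        retention: 30d
    alertmanager:
      # WHY: Disabled because alerts are routed through a central
      # Alertmanager outside of this cluster.
      # DOCS: <link to the relevant docs section or issue>, in short:
      # two Alertmanagers would send duplicate notifications.
      # ALTERNATIVE: Keep it enabled but without receivers. Not chosen
      # because it still consumes resources on small clusters.
      enabled: false
```

The comments are longer than the configuration itself, which is exactly the point for the 3am scenario described above.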
Don't fear eventual consistency
In distributed computing and especially databases there is a concept called eventual consistency. In simplified terms it means that if you make changes to a distributed system like a database, this change will not necessarily immediately be visible on all parts of the system, but at some point in the future it will. For someone running a system where consistency is key, like the central database of your bank, the "eventual" is deadly. If you wire money, it must be immediately visible and consistent in your account, the balance must be correct, otherwise bad things can happen. But in other scenarios, like if Amazon records what products you searched for, it doesn't matter if one entry is not immediately visible.
Kubernetes with FluxCD works in a way that I like to equate to eventual consistency. The real state in Kubernetes (e.g. number of Pods and their configuration) will not immediately reflect the declared state you have provided, but Kubernetes works to converge the real to the declared state. It might run into temporary problems along the way, maybe because some other resource is not yet available, but eventually it will achieve the desired state.
Many people believe that a GitOps setup with FluxCD needs to be sort of strongly consistent at first try. They will define endless dependencies and fret over how to write the manifests so that everything deploys correctly on the first try.
But in my experience, this is neither necessary nor desirable. Kubernetes already works in this eventual consistency manner. If a secret a pod wants is not yet there, it will wait and try again. If an image can't be pulled, it will try again. That is completely normal in a complex, dynamic distributed system. So there is no use putting extra effort into getting a FluxCD setup into a "one-shot" deployment. Accept that a complex deployment might take a few tries to get into gear; there is nothing bad about it. As stated earlier, some dependsOn definitions can make life easier, but don't overdo it.
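In FluxCD terms, leaning on eventual consistency mostly means letting the reconcile loop retry instead of modelling every ordering constraint. A minimal sketch, with a hypothetical unit name and path and with intervals that are only placeholders:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: example-app   # hypothetical unit
  namespace: flux-system
spec:
  # Regular reconciliation interval.
  interval: 10m
  # After a failed attempt, try again sooner than the regular interval.
  retryInterval: 2m
  # Give up on a single attempt after this time and let the next one retry.
  timeout: 5m
  path: ./modules/example-app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  # No long dependsOn chain here: if a Secret or CRD is not there yet,
  # this attempt fails and a later one succeeds.
```

A few failed reconciliations during bootstrap are then expected noise rather than a sign of a broken setup.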
Conclusion
All of my above lessons learned can be boiled down to two lessons:
Even though GitOps is slotted into the configuration-as-code category, it does not always pay off to think too much like a software engineer writing code. Software engineering principles and practices can help, but they can also make things worse. GitOps is different from writing software, so don't force patterns onto it when they don't fit.
There are a lot of tools out there for GitOps. None of them are perfect, none of them can do everything. Be mindful of their limitations and either work with or around them, or use a different tool for a different use case. Here I like to go with the legendary Montgomery Scott from Star Trek: "The right tool for the right job".