CVE in the heart of Kubernetes apiserver
What’s the big fuss over the latest Kubernetes apiserver vulnerability?
Early on Monday, December 3rd, a boulder splashed into the placidly silent Kubernetes security channels. A potentially high-severity authentication bypass was disclosed with scant explanation the same day that K8s version 1.13 went golden master. For Kubernetes administrators with PTSD from 2014’s Heartbleed, the CVE blast and its 37-line fix triggered palpitations in anticipation of sleepless patchfests to come.
In this post, we’ll explain the “verify backend upgrade connection” commit and the bug’s actual impact. We have also whipped up a proof of concept of the vulnerability, which we could not find elsewhere, in case you want to see if your clusters are affected.
Explanation of CVE-2018-1002105 root cause
Kubernetes apiserver can proxy HTTP requests to other Kubernetes services, which lets the K8s API itself be extended through its Aggregation Layer. This is the same facility that allows RBAC or namespace-constrained users to magically kubectl exec, kubectl attach, and kubectl port-forward directly from their laptops to pods running within live clusters.
The Kubernetes cluster-local authentication model is largely based on full mutual TLS authentication (mTLS), and the various “microservice” components that make up a live K8s cluster use signed certificates to trust each other. When a “master” apiserver process establishes a connection to an “aggregate” layer, the master uses its certificate to authenticate the connection with this peer. The lower-level peer, aka the “aggregate layer”, verifies the certificate and trusts that the apiserver has validated the required credentials on the other side of the proxied connection.
Where the palpitations start is that the kubernetes API isn’t just basic HTTPS. To support remote administrative tasks, K8s also allows upgrading apiserver connections to full, live, end-to-end HTTP/1.1 websockets.
The CVE-2018-1002105 vulnerability comes from the way this websocket upgrade was handled: if the request contained the Connection: Upgrade HTTP header, the master apiserver would forward the request and bridge the live socket to the aggregate. The problem occurs when the websocket upgrade fails to complete. Prior to the fix, the apiserver could be tricked into assuming the pass-through connection had succeeded even when the backend had returned an error code. From that “half-open” but authenticated state, the connected client could send follow-up HTTP requests to the aggregated endpoint, essentially masquerading as the master apiserver.
Because the apiserver has bridged the connections between client and server, this allowed the client to continue to use the connection to make requests, bypassing all security and audit controls on the master. Finding the exploit in logs would be extremely difficult.
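To make the flow above concrete, here is a toy model of the pre-fix behaviour. It is only a sketch: the class names, paths, and the 400 status are made up for illustration, and nothing here talks to a real cluster or mirrors the actual apiserver code. The point is simply that authorization runs once, for the first request, and everything sent afterwards on the bridged connection reaches the backend unchecked.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBackend:
    """Stands in for a kubelet or aggregated server that trusts the proxy."""
    handled: list = field(default_factory=list)

    def handle(self, path: str) -> int:
        self.handled.append(path)
        # Assumption for the sketch: the malformed upgrade is rejected with
        # 400, and any other request is served.
        return 400 if path == "/bad-upgrade" else 200


@dataclass
class BridgedConnection:
    """A raw pipe to the backend: nothing on it is authorized or audited."""
    backend: ToyBackend

    def send(self, path: str) -> int:
        return self.backend.handle(path)


@dataclass
class ToyProxy:
    """Hypothetical pre-fix proxy: authorizes once, then bridges blindly."""
    backend: ToyBackend
    allowed_paths: set
    authz_checks: list = field(default_factory=list)

    def open_upgrade(self, user: str, path: str):
        # The only authorization decision made for this connection.
        self.authz_checks.append((user, path))
        if path not in self.allowed_paths:
            raise PermissionError(path)
        status = self.backend.handle(path)
        # Modelled flaw: the status is never inspected, so the connection is
        # bridged even though the backend refused the upgrade.
        return BridgedConnection(self.backend), status


backend = ToyBackend()
proxy = ToyProxy(backend, allowed_paths={"/bad-upgrade"})
conn, status = proxy.open_upgrade("alice", "/bad-upgrade")
follow_up = conn.send("/some-other-namespace/pod/exec")
print(status, follow_up, proxy.authz_checks)
```

In this model the backend sees two requests while the proxy records a single authorization check, which is also why the follow-up traffic leaves so little trace on the master.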
How was CVE-2018-1002105 fixed?
The patch causes Kubernetes apiserver to check the result of the proxied connection attempt. If the “aggregated” server responds to the “upgrade” request by successfully switching protocols to the websocket connection, apiserver sends the response and bridges the connection. If there is any other result, the apiserver sends the response from the aggregated server without bridging the connection, preventing the authentication bypass.
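The decision the patch introduces can be sketched in a few lines. This is not the Go code from the commit; `BackendResponse` and `handle_upgrade` are hypothetical names, and the two return values are just labels for the two outcomes described above.

```python
from dataclasses import dataclass
from http import HTTPStatus


@dataclass
class BackendResponse:
    """Hypothetical stand-in for the aggregated server's reply."""
    status: int
    body: bytes = b""


def handle_upgrade(resp: BackendResponse) -> str:
    """Decide what the proxy does with the backend's reply to an upgrade."""
    # Only a 101 Switching Protocols reply means the upgrade really happened.
    if resp.status == HTTPStatus.SWITCHING_PROTOCOLS:
        return "bridge"
    # Anything else is passed back to the client and the connection is not
    # bridged, so no unchecked follow-up requests can ride on it.
    return "relay-response-and-close"


print(handle_upgrade(BackendResponse(101)))
print(handle_upgrade(BackendResponse(400)))
```

Note that a plain 200 is also treated as "not an upgrade" here: the check is for the one status that confirms the protocol switch, not for the absence of an error.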
What is the potential impact?
There’s a good reason this only misses a perfect ten on the Common Vulnerability Scoring System (CVSS) by two tenths of a point – the impact is quite severe given not-uncommon runtime conditions and deployment choices on many production clusters.
Anonymous access to the apiserver is enabled by default in upstream so that basic interactions with the cluster, such as a load balancer checking health or a client discovering API endpoints, can be completed without requiring authentication. And while the maintainer of your certified Kubernetes service or distribution may have disabled anonymous access or limited it using RBAC, out-of-the-box upstream components that add K8s API functionality via “aggregates” (eg the scenario of anonymous HTTP requests forwarded amongst distinct components) are extremely widely used.
In short, while certain hardened deployments require luck to fully exploit, CVE-2018-1002105 could only be worse if it gave root to all of your machines.
I’ve disabled anonymous Kubernetes API access, am I still affected?
Disabling anonymous access only restricts the bug to be exploitable by authenticated users. If any K8s API user (eg, any kubectl user) is allowed to exec into a pod – even restricted to pods within their “namespace” – they could exploit CVE-2018-1002105 to exec into any pod handled by the same kubelet (server).
Once they’re within a pod that they don’t normally have access to, they can potentially use that pod’s access to do other things within the cluster, such as pivot into your cluster control plane, and then use the controllers’ credentials to deploy additional privileged pods, change configuration, etc.
In simplest terms, CVE-2018-1002105 allows some level of privilege escalation within the cluster, which depending on somewhat random runtime factors could lead to total control of the cluster.
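One way to picture the blast radius for an authenticated user is a small model of which pods become reachable. The sketch below assumes a simplified world where a user may exec only into pods in one namespace; the pod names, namespaces, and node names are invented, and the `vulnerable` flag just switches between the intended behaviour and the escalation described above (any pod served by the same kubelet).

```python
from typing import NamedTuple


class Pod(NamedTuple):
    name: str
    namespace: str
    node: str  # the kubelet (server) that runs this pod


def exec_reach(pods, user_namespace: str, vulnerable: bool) -> set:
    """Return the names of pods the user could exec into in this toy model."""
    own = [p for p in pods if p.namespace == user_namespace]
    if not vulnerable:
        # Intended behaviour: only pods in the user's own namespace.
        return {p.name for p in own}
    # With the bug: every pod on a node where the user can legitimately exec.
    nodes = {p.node for p in own}
    return {p.name for p in pods if p.node in nodes}


pods = [
    Pod("web", "team-a", "node-1"),
    Pod("db", "team-b", "node-1"),
    Pod("cache", "team-b", "node-2"),
]
print(exec_reach(pods, "team-a", vulnerable=False))
print(exec_reach(pods, "team-a", vulnerable=True))
```

Which pods happen to share a node with yours is decided by the scheduler, which is the "somewhat random runtime factor" in play.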
How do you tell if you are affected?
If you have live kubeconfig (kubectl) credentials, you can simply download and run a CVE-2018-1002105 vulnerability checker created by one of our Kubernetes engineers.
All current mainline Kubernetes releases (latest v1.10.x, 1.11.x, 1.12.x & 1.13.0) now include the fix, and most vendors with long-term support (LTS) policies have cherry-picked the fix into older K8s branches no longer supported by the community. For Gravity users, we published updates to our downloads page the day the exploit was released. You can download the latest version here.
See our Kubernetes Release Cycles post if you’re curious why a K8s release from less than a year ago is unsupported by the primary Kubernetes maintainers.
Teams that leverage managed Kubernetes offerings such as GCP GKE, AWS EKS or Azure AKS will likely have had their apiservers patched for them, albeit with a slight delay or a subtly different remediation method depending on the service provider. (Note that the linked vulnerability checker may throw false positives if it encounters a configuration we haven’t tested – pull requests are welcome!)
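If you only have a version string to go on (for example, from `kubectl version`), a rough first check is to compare it against the first upstream releases that carry the fix. The sketch below assumes those are v1.10.11, v1.11.5, v1.12.3 and v1.13.0-rc.1, which is our reading of the upstream advisory rather than something stated in this post, so verify them before relying on the result. The function names are ours, and vendor backports into older branches are deliberately not modelled.

```python
import re

# Assumption: first upstream patch release per minor line that has the fix.
FIRST_FIXED_PATCH = {(1, 10): 11, (1, 11): 5, (1, 12): 3}

_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-(alpha|beta|rc)\.(\d+))?")


def parse_version(version: str):
    """Split 'v1.11.4' or 'v1.13.0-rc.1' into numeric parts plus pre-release."""
    m = _VERSION_RE.match(version)
    if not m:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = (int(m.group(i)) for i in (1, 2, 3))
    pre_kind = m.group(4)
    pre_num = int(m.group(5)) if m.group(5) is not None else None
    return major, minor, patch, pre_kind, pre_num


def is_patched_upstream(version: str) -> bool:
    """True if an *upstream* build with this version includes the fix.

    Vendor suffixes (e.g. '-gke.11') are ignored, so a vendor build that
    received a backport may still be reported as unpatched.
    """
    major, minor, patch, pre_kind, pre_num = parse_version(version)
    if (major, minor) > (1, 13):
        return True
    if (major, minor) == (1, 13):
        if patch > 0 or pre_kind is None:
            return True
        # Pre-releases of 1.13.0: only rc.1 and later carry the fix.
        return pre_kind == "rc" and pre_num >= 1
    first_fixed = FIRST_FIXED_PATCH.get((major, minor))
    if first_fixed is None:
        # Older minor lines received no upstream fix.
        return False
    return patch >= first_fixed


for v in ("v1.9.11", "v1.11.4", "v1.11.5", "v1.13.0"):
    print(v, is_patched_upstream(v))
```

A "not patched" answer for a vendor build is a prompt to check that vendor's advisory, not a verdict; the vulnerability checker mentioned above tests actual behaviour rather than version numbers.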
What can be done if you’re vulnerable and can’t upgrade your cluster?
Since the initial drop, @liggitt and other leaders in the Kubernetes community have added lots of nitty-gritty detail to the issue at https://github.com/kubernetes/kubernetes/issues/71411. The potential mitigations revolve around disabling anonymous API access, then either removing features or downgrading access permissions for authenticated users. None of the workarounds look particularly appetizing if you have a diverse set of end users and workloads: patching and a rolling restart of your apiservers is pretty much required.
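If you do go down the "downgrade permissions" route, a first step is finding which roles grant the pod subresources involved. The sketch below assumes you have already loaded Role or ClusterRole manifests into plain dictionaries (for instance from `kubectl get roles -o json`); `risky_rules` is a hypothetical helper of ours, and it only understands exact resource names and the bare `*` wildcard, so treat a clean result as a hint rather than proof.

```python
# Pod subresources that ride on the upgraded (proxied) connections.
SENSITIVE_RESOURCES = {"pods/exec", "pods/attach", "pods/portforward"}


def risky_rules(role: dict) -> list:
    """Return the rules in a Role/ClusterRole dict that grant those subresources."""
    hits = []
    for rule in role.get("rules", []):
        resources = set(rule.get("resources", []))
        # "*" grants every resource, including the subresources above.
        if "*" in resources or resources & SENSITIVE_RESOURCES:
            hits.append(rule)
    return hits


# Example manifest; the role name and namespace are made up.
example_role = {
    "kind": "Role",
    "metadata": {"name": "dev-debug", "namespace": "team-a"},
    "rules": [
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
        {"apiGroups": [""], "resources": ["pods/exec"], "verbs": ["create"]},
    ],
}

for rule in risky_rules(example_role):
    print(example_role["metadata"]["name"], rule["resources"], rule["verbs"])
```

Any role this flags is a candidate for tightening until the apiservers are patched, which is exactly the trade-off that makes the workaround unappetizing for teams that rely on exec for day-to-day debugging.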
Kubernetes is generally a tribute to well-designed, community-driven open-source software that follows current security best practices. It is also a large project, which leaves plenty of surface area where exploitable bugs can linger. For a project that has grown so rapidly over the past 4 years, we find it impressive that more severe vulnerabilities haven’t been found.