Untaint controller

Using the untaint controller

If Kubernetes can start pods on a node before the istio-cni agent has configured node networking, those pods will not have traffic redirection configured correctly. This can lead to a short period during which traffic is not controlled by Istio and can bypass any configured policy.

To avoid this race condition, you can take advantage of node taints: nodes are created with a taint that prevents pods from being scheduled onto them, and the taint is only removed by Istio's untaint controller once the istio-cni agent on the node is ready.
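
For illustration only (this is a generic Kubernetes node fragment, not output from a real cluster), the taint looks like this in a node's spec while the node is waiting to be configured:

spec:
  taints:
    - key: cni.istio.io/not-ready
      effect: NoSchedule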

Configure Istio

Install Istio with the following values to enable the untaint controller:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    pilot:
      taint:
        enabled: true
      env:
        PILOT_ENABLE_NODE_UNTAINT_CONTROLLERS: "true"
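
For example, if you save these values to a file (the filename here is illustrative), you can apply them with istioctl:

$ istioctl install -f untaint-values.yaml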

Certain environments may require istio-cni to be installed in a different namespace from istiod. You can specify the namespace to watch by setting the pilot.taint.namespace value:

spec:
  values:
    pilot:
      taint:
        enabled: true
        namespace: kube-system
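
Before setting this, you can check which namespace the istio-cni agent pods run in; the k8s-app=istio-cni-node label assumed here is the default label on the istio-cni DaemonSet:

$ kubectl get pods -n kube-system -l k8s-app=istio-cni-node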

Creating your nodes

Configure your node deployment (node pool, auto-scaling group, CI template, etc.) to add the cni.istio.io/not-ready taint to nodes when they are created. This is sometimes called a startup taint. For example, when using Karpenter:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        billing-team: my-team
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      startupTaints:
        - key: cni.istio.io/not-ready
          effect: NoSchedule

In Google Kubernetes Engine, you can specify node taints with the --node-taints flag on cluster or node pool creation.
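
As a sketch (the cluster and node pool names are placeholders, and you should confirm the exact taint syntax gcloud expects), this might look like:

$ gcloud container node-pools create my-pool \
    --cluster my-cluster \
    --node-taints=cni.istio.io/not-ready=:NoSchedule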

This taint means that no pods can be scheduled onto the node unless they tolerate the cni.istio.io/not-ready taint. (System add-ons, such as the istio-cni agent itself, are usually configured to tolerate all taints.) When the agent starts and becomes ready, the untaint controller removes the taint from the node, and pods can then be scheduled.
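
For reference, a component that must run on the node before the taint is removed, as the istio-cni agent does, typically carries a blanket toleration in its pod spec. This is standard Kubernetes configuration, not anything Istio-specific:

tolerations:
  # An empty key with operator Exists matches every taint
  - operator: Exists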

Debugging the untaint controller

The untaint controller runs as part of istiod. You can see the status of the controller by connecting to a debug page on the istiod instance:

$ kubectl port-forward deployment/istiod -n istio-system 8080:8080

Navigate to http://localhost:8080/debug/krtz?pretty.
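
If you prefer the command line, you can fetch the same endpoint with curl while the port-forward is running:

$ curl -s 'http://localhost:8080/debug/krtz?pretty'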

You can see that the untaint controller is running and inspect its state; specifically, node-untaint/nodes shows the status of nodes, and node-untaint/ready-cni-nodes shows the CNI agents that are ready.

At the default info log level, you can see the untaint controller start:

$ kubectl logs -f deployment/istiod -n istio-system
2025-07-18T01:23:00.952072Z	info	krt	node-untaint/nodes synced	owner=node-untaint/nodes
2025-07-18T01:23:00.952085Z	info	krt	node-untaint/pods synced	owner=node-untaint/pods
2025-07-18T01:23:00.954215Z	info	krt	node-untaint/cni-pods synced	owner=node-untaint/cni-pods
2025-07-18T01:23:00.956745Z	info	controllers	starting	controller=untaint nodes
2025-07-18T01:23:00.956785Z	info	krt	node-untaint/ready-cni-nodes synced	owner=node-untaint/ready-cni-nodes

You can set the untaint controller's log level to debug to see events as nodes are created:

$ istioctl admin log --level untaint:debug

If you add the taint to a node, the untaint controller will notice and remove it:

$ kubectl taint nodes ambient-worker2 cni.istio.io/not-ready:NoSchedule
2025-07-18T01:35:11.698525Z	debug	untaint	adding node to queue event: ambient-worker2
2025-07-18T01:35:11.698838Z	debug	untaint	reconciling node ambient-worker2
2025-07-18T01:35:11.698855Z	debug	untaint	removing readiness taint from node ambient-worker2
2025-07-18T01:35:11.705994Z	debug	untaint	removed readiness taint from node ambient-worker2
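
You can verify that the taint is gone by inspecting the node's spec; an empty result means no taints remain on the node:

$ kubectl get node ambient-worker2 -o jsonpath='{.spec.taints}'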

Restore the log level to default:

$ istioctl admin log --level untaint:info