Production readiness

Prepare for production usage

When you’re ready to take ambient mesh to production, there are a few things we recommend you review and monitor.

Configuration changes for production

Resource allocation & scaling

Ztunnel runs as a DaemonSet, with one instance per node. While ztunnel's resource usage scales with the size of the cluster (in terms of pods and services) as well as traffic rates (connections, requests, and throughput), it is designed to have a small footprint and to handle large-scale clusters out of the box. Typically, you should not expect to need any configuration changes unless:

  • The cluster has over 100,000 pods or 20,000 services
  • An individual node is serving over 20,000 connections, 100,000 requests per second, or 5Gb/s of traffic

Note: these are not limits of ztunnel, which can scale beyond them. They are merely the thresholds at which we recommend analyzing usage and vertically scaling the CPU and memory reservations to match what you observe.
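
The cluster-level thresholds above can be checked with a short script. This is a minimal sketch rather than an official tool: the function name needs_sizing_review is hypothetical, and the per-node thresholds (connections, requests per second, throughput) are not covered here because they have to come from your own monitoring.

```shell
#!/bin/sh
# Cluster-level thresholds copied from the list above. Beyond them, the
# recommendation is to analyze usage and review ztunnel's reservations.
MAX_PODS=100000
MAX_SERVICES=20000

# needs_sizing_review PODS SERVICES   (hypothetical helper)
# Prints "review" when either count is over its threshold, else "default".
needs_sizing_review() {
  if [ "$1" -gt "$MAX_PODS" ] || [ "$2" -gt "$MAX_SERVICES" ]; then
    echo "review"
  else
    echo "default"
  fi
}

# Usage against a live cluster (requires kubectl access):
#   pods=$(kubectl get pods --all-namespaces --no-headers | wc -l | tr -d ' ')
#   services=$(kubectl get services --all-namespaces --no-headers | wc -l | tr -d ' ')
#   needs_sizing_review "$pods" "$services"
```

A result of "review" does not mean ztunnel has hit a limit, only that it is worth comparing observed usage with the current reservations.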

By default, ztunnel runs with a small CPU and memory reservation and no limits. However, it runs with a limited number of worker threads, which implicitly caps its maximum CPU usage. You can inspect the current setting:

$ istioctl ztunnel-config all -ojson | jq .config.numWorkerThreads
2

We recommend the following:

  • We do not recommend setting CPU limits on the Pod itself; tune the number of threads instead. This ensures the system does not excessively throttle ztunnel. If you are required to set a limit, set it to match the number of threads (for example, a limit of 2 CPUs for 2 threads). Do not set the CPU limit below the number of threads, as this results in substantial performance degradation.
    • To tune the thread allocation, set the environment variable ZTUNNEL_WORKER_THREADS. It accepts a fixed number such as 8 (for 8 threads) or a percentage such as 25% (1 thread for every 4 cores on the node, which is useful in environments with mixed node sizes).
  • We do not recommend setting a memory limit. If you are required to set one, set it as high as you possibly can. While ztunnel is highly optimized for a low memory footprint, a memory limit carries the risk that, if the threshold is exceeded (possibly on many nodes at the same time), all ztunnel instances could be killed, causing a cluster-wide outage.

In general, resource limits are designed to protect co-located applications from starving each other of resources. However, since ztunnel is a shared resource for all applications on the node, throttling (from CPU limits) or killing (from memory limits) has the opposite effect and hurts the networking of every application on the node.
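
To make the percentage form of ZTUNNEL_WORKER_THREADS concrete, the sketch below estimates how many threads a given setting yields on a node. The helper name worker_threads is hypothetical, and the rounding (round up, with a minimum of one thread) is an assumption made for illustration; check the value ztunnel actually reports with the istioctl command shown earlier.

```shell
#!/bin/sh
# worker_threads CORES SETTING   (hypothetical helper)
# SETTING is a fixed count ("8") or a percentage of node cores ("25%").
# Assumption: percentages round up, with a minimum of one thread.
worker_threads() {
  cores=$1
  setting=$2
  case "$setting" in
    *%)
      percent=${setting%\%}                        # strip the trailing "%"
      threads=$(( (cores * percent + 99) / 100 ))  # integer ceiling
      if [ "$threads" -lt 1 ]; then
        threads=1
      fi
      echo "$threads"
      ;;
    *)
      echo "$setting"                              # fixed count, used as-is
      ;;
  esac
}

# Example: a 16-core node with 25% gives 4 threads, so a CPU limit
# (if one is required at all) should not be set below 4.
#   worker_threads 16 25%
#
# Applying the setting, assuming the default DaemonSet name "ztunnel"
# in the "istio-system" namespace:
#   kubectl -n istio-system set env daemonset/ztunnel ZTUNNEL_WORKER_THREADS=25%
```

If ztunnel is managed through Helm or another installer, set the variable through that tool's configuration instead, so the change is not overwritten on the next upgrade.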

Upgrading

See the upgrading page for how to safely upgrade Istio when a new version is released.

Considerations for users already familiar with Istio

For people familiar with Istio in sidecar mode, there are some considerations when using it in ambient mode.

Multiple clusters

As of v1.26.2, Istio in ambient mode only supports running in a single Kubernetes cluster.

Multi-cluster support is available as a feature of Gloo Mesh, an enterprise distribution of ambient mesh.

VMs

Istio does not yet support adding external (VM) workloads to an ambient mesh.

VM workloads are available as a feature of Gloo Mesh, an enterprise distribution of ambient mesh.

Envoy extensibility

Waypoints can be extended with WebAssembly, but the EnvoyFilter extension point is not supported.

Extension via EnvoyFilter is available as a feature of Gloo Mesh, an enterprise distribution of ambient mesh.