Prepare for production usage
When you’re ready to take ambient mesh to production, there are a few things we recommend that you review and monitor.
Configuration changes for production
Resource allocation & scaling
Ztunnel runs as a DaemonSet, with one instance per node. While ztunnel scales with the size of the cluster (in terms of pods and services) as well as traffic rates (connections, requests, and throughput), it is designed to have a small footprint and to handle large-scale clusters out of the box. Typically, you should not need any configuration changes unless:
- The cluster has over 100,000 pods or 20,000 services
- An individual node is serving over 20,000 connections, 100,000 requests per second, or 5Gb/s of traffic
Note: These are not limits of ztunnel, which can scale beyond them. They are the thresholds at which we recommend analyzing usage and vertically scaling the CPU and memory reservations to match observed usage.
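As a rough illustration of the cluster-size thresholds above, the following sketch compares pod and service counts against them. The helper name `check_cluster_scale` is hypothetical and not part of Istio or istioctl, and the commented-out `kubectl` lines assume you have read access to all namespaces. The per-node traffic thresholds are not covered here, because they depend on your metrics setup.

```shell
# Thresholds taken from the list above. Exceeding them is not a hard limit,
# only a signal to review ztunnel's CPU/memory reservations.
POD_THRESHOLD=100000
SERVICE_THRESHOLD=20000

# Prints "review" if either count exceeds its threshold, otherwise "ok".
# check_cluster_scale is a hypothetical helper, not an Istio command.
check_cluster_scale() {
  local pods="$1" services="$2"
  if [ "$pods" -gt "$POD_THRESHOLD" ] || [ "$services" -gt "$SERVICE_THRESHOLD" ]; then
    echo "review"
  else
    echo "ok"
  fi
}

# Example against a live cluster (requires kubectl access):
#   pods=$(kubectl get pods --all-namespaces --no-headers | wc -l | tr -d ' ')
#   services=$(kubectl get services --all-namespaces --no-headers | wc -l | tr -d ' ')
#   check_cluster_scale "$pods" "$services"
```

A result of "review" only suggests comparing the ztunnel reservations with observed usage; it does not indicate that ztunnel has reached a limit.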
By default, ztunnel runs with a small CPU and memory reservation and no limits. However, ztunnel runs with a limited number of worker threads, which implicitly caps its maximum CPU usage. You can inspect the current setting:
$ istioctl ztunnel-config all -ojson | jq .config.numWorkerThreads
2
We recommend the following:
- We recommend tuning the number of threads instead of setting CPU limits on the Pod itself. This ensures the system will not excessively throttle ztunnel. If you are required to set a limit, set it to match the number of threads. Do not set the CPU limit below the number of threads, as this results in substantial performance degradation.
- To tune the thread allocation, set the environment variable ZTUNNEL_WORKER_THREADS. This accepts a fixed number like 8 (for 8 threads) or a percentage like 25% (for 1 thread for every 4 cores on the node, useful in environments with mixed nodes).
- We do not recommend setting a memory limit. If you are required to set one, set it as high as you possibly can. While ztunnel is highly optimized for a low memory footprint, a memory limit carries the risk that, if the threshold is exceeded (possibly at the same time across the cluster), all ztunnel instances could be killed, causing a cluster-wide outage.
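The thread-tuning recommendation above could be applied along these lines. This is a minimal sketch with several assumptions: the DaemonSet is named `ztunnel` and lives in the `istio-system` namespace (adjust both for your installation), and the helper names `is_valid_worker_threads` and `set_ztunnel_worker_threads` are hypothetical. If ztunnel is managed by Helm or another installer, a direct edit like this may be overwritten on the next upgrade, so set the variable through that tool instead.

```shell
# Accepts a fixed thread count (for example 8) or a percentage of node cores
# (for example 25%). Hypothetical helper, not part of ztunnel or istioctl.
is_valid_worker_threads() {
  printf '%s' "$1" | grep -Eq '^[1-9][0-9]*%?$'
}

# Sets ZTUNNEL_WORKER_THREADS on the ztunnel DaemonSet.
# Assumption: the DaemonSet is named "ztunnel"; the namespace defaults to
# "istio-system" and can be overridden with the second argument.
set_ztunnel_worker_threads() {
  local value="$1" namespace="${2:-istio-system}"
  if ! is_valid_worker_threads "$value"; then
    echo "invalid ZTUNNEL_WORKER_THREADS value: $value" >&2
    return 1
  fi
  kubectl set env daemonset/ztunnel -n "$namespace" "ZTUNNEL_WORKER_THREADS=$value"
}

# Example usage (requires kubectl access to the cluster):
#   set_ztunnel_worker_threads 4
#   set_ztunnel_worker_threads 25%
```

After the DaemonSet rolls out, you can rerun the `istioctl ztunnel-config` command shown earlier to confirm the new value of `numWorkerThreads`.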
Upgrading
See the upgrading page for how to safely upgrade Istio when a new version is released.
Considerations for users already familiar with Istio
For people familiar with Istio in sidecar mode, there are some considerations when using it in ambient mode.
- Layer 7 metrics are only collected for workloads with a waypoint deployed, or when using HTTP observability in Gloo Mesh.
- Authorization policies with Layer 7 conditions will only work when bound to a waypoint.
Multiple clusters
As of v1.26.2, Istio in ambient mode only supports running in a single Kubernetes cluster.
VMs
Istio does not yet support adding external (VM) workloads to an ambient mesh.
Envoy extensibility
Waypoints can be extended with WebAssembly, but the EnvoyFilter extension point is not supported.