Observing the ambient mesh

July 8, 2025·Craig Box

Every component in a service mesh generates telemetry data that can be used for the observability of your environment. In sidecar mode, you had full Layer 7 processing at every step, which gave you the opportunity for full observability at Layer 7 (by way of configurable generation of metrics, traces and logs).

Because ambient mode makes Layer 7 processing opt-in through waypoints, the observability picture changes a little. You should use a waypoint for any workloads where HTTP observability is required, but be aware that you only get telemetry from the server side of the connection.

In brief:

  • Most traffic into a mesh goes through an ingress gateway, which does Layer 7 processing, and thus generates metrics and logs, and can create trace headers
  • ztunnels provide Layer 4 metrics and logging
  • Waypoints can provide Layer 7 metrics, but are only on the receiving side of east-west traffic
  • Telemetry generation is still configured using the Telemetry API, though the attachment method changes
  • Solo offers an enhanced ztunnel which includes HTTP observability features as part of Gloo Mesh

Let’s jump in.

Differences in the data

Metrics

The Prometheus-format metrics vary between sidecar mode and ambient mode, due to the different proxies processing traffic at different layers.

In sidecar mode

An HTTP request from pod curl to pod httpbin would result in the following (abbreviated) metrics from the source and destination sidecars:

istio_requests_total { 
  reporter="source" 
  source_workload="curl"
  destination_workload="httpbin"
  response_code="200"
} 1
istio_requests_total { 
  reporter="destination"
  source_workload="curl"
  destination_workload="httpbin"
  response_code="200"
} 1

The reporter label tells you which sidecar generated the metric.

In ambient mode, the generated metrics differ depending on whether or not a waypoint is used.

Ambient mode, no waypoint

Both ztunnels generate Istio standard TCP metrics, because Layer 7 processing is not performed and so request-level data is not available.

istio_tcp_connections_opened_total { 
  reporter="source" 
  source_workload="curl"
  destination_workload="httpbin"
} 1
istio_tcp_connections_opened_total { 
  reporter="destination"
  source_workload="curl"
  destination_workload="httpbin"
} 1

See below to learn more about the enhanced Gloo Mesh ztunnel, which includes Layer 7 telemetry.

Ambient mode, waypoint used

When a workload is configured to use a waypoint, the source and destination labels of the TCP metrics change to reflect this. Istio’s HTTP metrics are emitted by the waypoint, with a reporter label of waypoint.

istio_tcp_connections_opened_total { 
  reporter="source" 
  source_workload="curl"
  destination_workload="waypoint"
} 1
istio_requests_total { 
  reporter="waypoint"
  source_workload="curl"
  destination_workload="httpbin"
  response_code="200"
} 1
istio_tcp_connections_opened_total { 
  reporter="destination"
  source_workload="waypoint"
  destination_workload="httpbin"
} 1

Current versions of Istio-aware telemetry tools, such as Kiali and the Istio Grafana dashboards, have been updated to parse these metrics correctly. If you use your own dashboards or PromQL queries, you will have to update them yourself.
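As a rough illustration of the kind of query change involved, here is a minimal Python sketch that builds a PromQL expression for a workload's inbound HTTP request rate. It accepts either reporter value, so the same query matches metrics from a destination sidecar or from a waypoint. The function name, the default 5m window and the choice of a regex matcher are assumptions made for this example, not something Istio prescribes.

```python
from typing import Sequence


def http_request_rate(
    workload: str,
    reporters: Sequence[str] = ("destination", "waypoint"),
    window: str = "5m",
) -> str:
    """Build a PromQL expression for the inbound HTTP request rate of a workload.

    Hypothetical helper: the reporter values come from the metrics shown
    above; everything else is an illustrative choice.
    """
    # A regex matcher lets one query cover both sidecar and waypoint reporters.
    reporter_matcher = "|".join(reporters)
    return (
        "sum(rate(istio_requests_total{"
        f'reporter=~"{reporter_matcher}",'
        f'destination_workload="{workload}"'
        f"}}[{window}]))"
    )


if __name__ == "__main__":
    print(http_request_rate("httpbin"))
```

If you only ever run one mode, passing a single reporter keeps the matcher narrow, for example http_request_rate("httpbin", reporters=("waypoint",)).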

Traces

As in sidecar mode, using distributed tracing requires mesh-wide definition of a tracing provider, which will then be referenced by a Telemetry API object.

Trace spans are reported by gateways and waypoints. The first span relating to a trace is generally created by a gateway when a request enters the mesh, with spans created for a percentage of requests based on the configured sampling rate.

For requests that are generated inside the mesh, a single span will be reported by a configured waypoint.

This is one of the trade-offs of the ambient model. With a sidecar as both a client and a server, you end up with two spans in a trace. This allows you to measure the time taken for a given request. When a client calls a server in ambient mode, the ztunnels, which only operate at Layer 4, are unable to generate a span. You only get tracing if you’re using a waypoint (or Gloo Mesh), and the result is not as detailed as in sidecar mode.

Initiating a trace at a gateway works well for traffic generated from outside the mesh, which is a large part of what people want to profile. For requests generated by your own services — entirely inside the mesh — you could choose to use an OpenTelemetry library to augment what Istio can report by generating your own spans in your services.

As of Istio 1.26, the node_id tag of a span in ambient mode identifies the waypoint that reports the span, not the destination pod. This means you cannot easily correlate a trace with a call graph, such as in Kiali. This behavior is expected to change in future releases.

Logs

ztunnel generates TCP access logs in a key-value format, logging source and destination attributes as well as bytes sent, bytes received and duration:

src.addr=10.244.1.35:52196
src.workload="bookinfo-gateway-istio-574fdf9755-59kqb"
src.namespace="default"
src.identity="spiffe://cluster.local/ns/default/sa/bookinfo-gateway-istio"
dst.addr=10.244.1.44:15008
dst.hbone_addr=10.244.1.44:9080
dst.service="productpage-v1.default.svc.cluster.local"
dst.workload="productpage-v1-c5b7f7dbc-bd876"
dst.namespace="default"
dst.identity="spiffe://cluster.local/ns/default/sa/bookinfo-productpage"
direction="inbound"
bytes_sent=9618
bytes_recv=959
duration="14ms"
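Because the format is plain key=value pairs, it is easy to pull these fields into a script for ad-hoc analysis. The following is a minimal sketch using Python's standard shlex module. The function name is hypothetical, and it assumes values are either bare or wrapped in double quotes, as in the sample above. Tokens without an = sign (for example a timestamp or log level at the start of a real log line) are skipped.

```python
import shlex
from typing import Dict


def parse_ztunnel_fields(text: str) -> Dict[str, str]:
    """Extract key=value fields from a ztunnel access log entry.

    Hypothetical helper, assuming fields are whitespace-separated and
    values are bare or double-quoted, as in the sample in this post.
    """
    fields: Dict[str, str] = {}
    # shlex.split honors quoting, so quoted values stay in one token
    # and the surrounding quotes are removed.
    for token in shlex.split(text):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


if __name__ == "__main__":
    sample = (
        'src.workload="bookinfo-gateway-istio-574fdf9755-59kqb" '
        'dst.workload="productpage-v1-c5b7f7dbc-bd876" '
        'direction="inbound" bytes_sent=9618 bytes_recv=959 duration="14ms"'
    )
    entry = parse_ztunnel_fields(sample)
    print(entry["dst.workload"], entry["bytes_sent"], entry["duration"])
```

All values come back as strings, so convert numeric fields such as bytes_sent yourself before aggregating them.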

Logs are enabled for connection completion by default, and you can control log output with environment variables.

Gateway and waypoint access logs are off by default, and can be enabled using Istio’s Telemetry API as you would for sidecars. The Telemetry API allows targeting by entire mesh, by namespace, or by workload selector/targetRefs. Mesh-wide and namespace-wide policies work the same in ambient mode as in sidecar mode. If you have a policy at the most granular level, be sure to update it to use targetRefs instead of a workload selector.

Configuration changes required

As in our previous posts, the attachment model for the Telemetry API changes when using Gateway API objects such as waypoints. Instead of targeting sidecars with a workload selector, your Telemetry policies must use the targetRefs field to attach to one or more services, and will be bound to the corresponding waypoint.
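To make the shape of such a policy concrete, here is a small Python sketch that builds a Telemetry manifest attached to a Service through targetRefs and prints it as JSON, which kubectl apply -f also accepts. Treat it as an illustration under assumptions: the helper name, the resource name, the httpbin service and the envoy access log provider are example values, and you should check the apiVersion and targetRefs fields against the Telemetry API reference for your Istio version.

```python
import json


def waypoint_access_log_policy(
    service: str, namespace: str, provider: str = "envoy"
) -> dict:
    """Build a Telemetry manifest that enables access logs for one Service.

    Hypothetical helper: names and the provider are example values.
    """
    return {
        "apiVersion": "telemetry.istio.io/v1",
        "kind": "Telemetry",
        "metadata": {"name": f"{service}-access-logs", "namespace": namespace},
        "spec": {
            # Attach to the Service; the policy is bound to its waypoint.
            # Note there is no workload selector in this policy.
            "targetRefs": [{"group": "", "kind": "Service", "name": service}],
            "accessLogging": [{"providers": [{"name": provider}]}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(waypoint_access_log_policy("httpbin", "default"), indent=2))
```

Generating the manifest from code is only a convenience here. The same structure can be written by hand as YAML.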

ztunnel telemetry cannot currently be customized with the Telemetry API.

Adding HTTP observability to ztunnel

People who don’t require any Layer 7 traffic management functionality are able to benefit from the performance gains of running only in the secure overlay layer provided by ztunnel.

As an alternative to having to use waypoints solely to provide HTTP observability, Solo has built a read-only HTTP telemetry system into the Gloo Mesh version of ztunnel. It is able to generate metrics, logs, and trace spans at Layer 7, outside the critical path of requests. HTTP telemetry in ztunnel has been observed to add less than 1% overhead in request latency and throughput, even under heavy load, which is substantially less overhead than a hop through a waypoint.

Learn more about the features, or contact Solo.io to get started.

Changing your dashboards

The upstream Istio dashboards have been updated for ambient mode. If you have built your own dashboards, you should check if any changes are required. The most obvious thing to look for is anywhere you hard-code the values for the reporter labels, because as mentioned above they change when using waypoints.
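If you have many custom dashboards, a short script can help you find the queries that need attention. The sketch below walks an exported dashboard's JSON and reports every query that contains a hard-coded reporter matcher. It assumes a Grafana-style export in which Prometheus queries are stored under an "expr" key. The function name and the regular expression are illustrative choices, so adapt them to your own tooling.

```python
import json
import re
from typing import Any, List, Tuple

# Matches reporter="...", reporter=~"...", reporter!="..." and reporter!~"...".
REPORTER_MATCHER = re.compile(r'\breporter\s*(=~|!~|!=|=)\s*"([^"]*)"')


def find_reporter_matchers(dashboard: Any) -> List[Tuple[str, str]]:
    """Return (query, reporter value) pairs for each hard-coded reporter matcher.

    Hypothetical helper, assuming queries live under "expr" keys.
    """
    found: List[Tuple[str, str]] = []

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "expr" and isinstance(value, str):
                    for match in REPORTER_MATCHER.finditer(value):
                        found.append((value, match.group(2)))
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(dashboard)
    return found


if __name__ == "__main__":
    exported = json.loads(
        '{"panels": [{"targets": [{"expr": '
        '"sum(rate(istio_requests_total{reporter=\\"destination\\"}[5m]))"}]}]}'
    )
    for query, reporter in find_reporter_matchers(exported):
        print(f"reporter={reporter!r} in: {query}")
```

Each hit is a candidate for review rather than a definite bug: a query that filters on reporter="destination" may still be correct for workloads that are not behind a waypoint.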

The dashboard collection also includes a ztunnel dashboard, which you should add when you move to ambient mesh.
