HTTP observability in ztunnel
Gloo Mesh includes an enhanced version of ztunnel that can provide HTTP observability directly. This ensures telemetry is available even when workloads are not using waypoint proxies. If you are deploying a waypoint exclusively to get HTTP observability, use ztunnel’s metrics instead, which come with substantially lower overhead.
Configuration
Depending on how you installed Gloo Mesh, you may already have HTTP observability enabled. To check the current configuration:
$ istioctl ztunnel-config all -ojson | jq .config.l7Config
{
  "access_log": {
    "enabled": true,
    "skip_connection_log": false
  },
  "enabled": true,
  "metrics": {
    "enabled": true
  },
  "tracing": {
    "enabled": false,
    "otlp_endpoint": "http://gloo-telemetry-collector.gloo-mesh:4317"
  }
}
This is also logged during ztunnel startup.
If the output from the above is null (i.e. you don’t see any l7Config entries), ensure you are using Gloo Mesh images.
To enable access logs and metrics with default settings:
$ kubectl set env ds/ztunnel -n istio-system L7_ENABLED=true
To fully customize the configuration, you can set the following values during installation:
Value | Description |
---|---|
l7Telemetry.enabled | Globally enables or disables HTTP telemetry. A telemetry type is active only when both this option and that type's own option are enabled. |
l7Telemetry.metrics.enabled | Enables or disables HTTP metrics. |
l7Telemetry.accessLog.enabled | Enables or disables HTTP access logs. |
l7Telemetry.accessLog.skipConnectionLog | If enabled, connections that carry only HTTP requests log their TCP connection log at the ‘debug’ level (which is typically disabled, so the connection is effectively not logged). This is particularly useful when dealing with short-lived connections, where logging both TCP connections and HTTP requests causes excessive noise. Note: if the connection does not carry HTTP, the TCP connection event will always be logged. If disabled (default), both HTTP requests and TCP connections will be logged. |
l7Telemetry.distributedTracing.enabled | Enables or disables HTTP tracing. |
l7Telemetry.distributedTracing.otlpEndpoint | OTLP endpoint to send traces to. For example: http://opentelemetry-collector:4317 |
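The rule that a telemetry type only takes effect when both the global switch and its own switch are enabled can be sketched in Python. This is a minimal illustration and not ztunnel's implementation: it assumes the JSON shape printed by the istioctl command earlier in this section, treats a null config as "nothing enabled", and the function name effective_l7_telemetry is hypothetical.

```python
import json


def effective_l7_telemetry(l7_config):
    """Return which HTTP telemetry types are effectively active.

    Assumes the shape printed by `jq .config.l7Config`. A null value
    (None in Python) is treated as "no HTTP telemetry".
    """
    kinds = ("access_log", "metrics", "tracing")
    if not l7_config:
        return {kind: False for kind in kinds}
    global_on = bool(l7_config.get("enabled", False))
    # A type is active only if the global flag AND its own flag are set.
    return {
        kind: global_on and bool(l7_config.get(kind, {}).get("enabled", False))
        for kind in kinds
    }


# Sample copied from the output shown earlier in this section.
sample = json.loads("""
{
  "access_log": {"enabled": true, "skip_connection_log": false},
  "enabled": true,
  "metrics": {"enabled": true},
  "tracing": {
    "enabled": false,
    "otlp_endpoint": "http://gloo-telemetry-collector.gloo-mesh:4317"
  }
}
""")
print(effective_l7_telemetry(sample))
```

With the sample above, access logs and metrics are reported as active and tracing as inactive. Setting the top-level "enabled" field to false would turn all three off regardless of the individual flags.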
Logs
HTTP access logs have mostly the same format as TCP access logs, with a few variations.
- HTTP logs are logged per HTTP request, while TCP logs are per connection.
- While TCP logs have the bytes_sent and bytes_recv attributes, HTTP logs have method, path, protocol, response_code, host, and user_agent. For example: method=GET path="/productpage" protocol=HTTP1 response_code=200 host="productpage:9080" user_agent="curl/8.10.1"
- In addition to logging HTTP requests, TLS information will also be added for TLS requests. This adds the tls.sni and tls.alpn attributes. For example: tls.sni=example.com tls.alpn=h2.
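If you want to post-process these attributes, the key=value fragments shown in the examples can be split with Python's standard shlex module, which honors the double quotes. This is a minimal sketch: it only handles the attribute portion of a log line (any timestamp or log-level prefix is assumed to have been removed already), and the helper name parse_log_attributes is hypothetical.

```python
import shlex


def parse_log_attributes(attributes):
    """Split space-separated key=value pairs, honoring double quotes."""
    fields = {}
    for token in shlex.split(attributes):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


# Attribute strings taken from the examples in this section.
http_example = (
    'method=GET path="/productpage" protocol=HTTP1 response_code=200 '
    'host="productpage:9080" user_agent="curl/8.10.1"'
)
tls_example = "tls.sni=example.com tls.alpn=h2"

http_fields = parse_log_attributes(http_example)
tls_fields = parse_log_attributes(tls_example)
print(http_fields["method"], http_fields["path"], http_fields["response_code"])
print(tls_fields["tls.sni"], tls_fields["tls.alpn"])
```

All values come back as strings, so response_code needs an explicit int() conversion before numeric comparisons. Values containing unbalanced quote characters would make shlex raise a ValueError, which a production parser would need to handle.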
Metrics
In addition to TCP-level metrics, ztunnel also provides the following HTTP metrics:
- istio_requests_total: Indicates the total count of HTTP requests. The response_code label distinguishes the result.
- istio_request_duration_milliseconds: Indicates the distribution of the duration of each HTTP request.
- istio_request_bytes: Indicates the distribution of HTTP request sizes.
- istio_response_bytes: Indicates the distribution of HTTP response sizes.
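As an illustration of how the response_code label on istio_requests_total can be used, the following Python sketch sums request counts per response code from Prometheus text-format output and derives the share of 5xx responses. The sample values are invented for the example, real series carry additional labels, and the function names are hypothetical. In practice this kind of aggregation is usually done with a query in your metrics backend rather than by parsing text.

```python
import re
from collections import Counter

# Matches one sample line such as: istio_requests_total{response_code="200"} 95
SAMPLE_LINE = re.compile(
    r'^istio_requests_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def requests_by_response_code(exposition_text):
    """Sum istio_requests_total samples, grouped by response_code."""
    totals = Counter()
    for line in exposition_text.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if not match:
            continue  # comments, blank lines, and other metrics
        labels = dict(LABEL.findall(match.group("labels")))
        code = labels.get("response_code", "unknown")
        totals[code] += float(match.group("value"))
    return totals


def server_error_ratio(totals):
    """Share of requests whose response code starts with 5."""
    total = sum(totals.values())
    errors = sum(v for code, v in totals.items() if code.startswith("5"))
    return errors / total if total else 0.0


# Hypothetical sample data, not real ztunnel output.
sample_metrics = """\
# TYPE istio_requests_total counter
istio_requests_total{response_code="200"} 95
istio_requests_total{response_code="404"} 3
istio_requests_total{response_code="503"} 2
"""
totals = requests_by_response_code(sample_metrics)
print(server_error_ratio(totals))
```

The simple label pattern assumes label values contain no closing brace or escaped quotes, which holds for the sample but is not guaranteed in general.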
Performance and safety
ztunnel telemetry is specifically designed to be highly performant and safe.
Capturing telemetry does not modify requests at all. If a request cannot be parsed (whether due to being invalid HTTP, bugs, etc), the connection is not impacted (telemetry collection is, however, disabled for the remainder of the connection).
Request processing occurs after requests are forwarded (keeping this processing time outside the critical path) and typically takes ~100ns per request. Even under heavy load, this has been observed to add less than 1% overhead to request latency and throughput.
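To put the ~100ns figure in perspective, the following back-of-envelope Python sketch estimates how much of one CPU core the telemetry processing would consume at a given request rate. It assumes the per-request cost stays constant and uses a hypothetical rate of 10,000 requests per second. It is not a measurement and says nothing about latency, since the processing happens off the critical path.

```python
PROCESSING_NS_PER_REQUEST = 100  # approximate per-request figure quoted above
NS_PER_SECOND = 1_000_000_000


def telemetry_cpu_fraction(requests_per_second,
                           ns_per_request=PROCESSING_NS_PER_REQUEST):
    """Fraction of one CPU core spent on telemetry processing."""
    return requests_per_second * ns_per_request / NS_PER_SECOND


# Hypothetical load: 10,000 requests per second through one ztunnel.
print(f"{telemetry_cpu_fraction(10_000):.2%}")
```

At that rate the estimate is 1ms of processing per second of wall-clock time, or about 0.1% of a single core.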