Request timeouts
When a client encounters latency in an upstream microservice, it can wait indefinitely for a response, tying up its own resources, becoming unavailable in turn, and propagating the failure through the network. This problem can be mitigated with request timeouts, where the client severs the connection after a set period.
In this guide you will learn how to set up request timeouts in your ambient mesh.
Prerequisites
Set up a cluster
You should have a running Kubernetes cluster with Istio installed in ambient mode. Ensure your default namespace is added to the ambient mesh:
$ kubectl label ns default istio.io/dataplane-mode=ambient
Deploy a waypoint
Request timeouts are a Layer 7 feature, applied to HTTP requests, and therefore require the use of waypoints.
If you don’t already have a waypoint installed for the default namespace, install one:
$ istioctl waypoint apply -n default --enroll-namespace --wait
For more information on using waypoints, see Configuring waypoint proxies.
Configure request timeouts in ambient mesh
Deploy sample services
To test request timeouts, you will deploy a service, httpbin, and a client, curl.
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.24/samples/httpbin/httpbin.yaml
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.24/samples/curl/curl.yaml
Test latency
The httpbin application has a /delay/{delay} endpoint, which simulates a response delay of the requested length.
To call the endpoint with a 2-second delay:
$ kubectl exec deploy/curl -- curl httpbin:8000/delay/2
In the output, confirm that “Time Total” and “Time Spent” show a value of 2 seconds (0:00:02):
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   430  100   430    0     0    213      0  0:00:02  0:00:02 --:--:--   213
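Optionally, curl’s write-out feature can report just the total request time as a single number:

$ kubectl exec deploy/curl -- curl -s -o /dev/null -w '%{time_total}\n' httpbin:8000/delay/2

The printed value should be slightly above 2 seconds.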
Configure a request timeout
Configure calls to httpbin with a 500ms timeout:
$ kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin
spec:
  parentRefs:
  - group: ""
    kind: Service
    name: httpbin
    port: 8000
  rules:
  - backendRefs:
    - name: httpbin
      port: 8000
    timeouts:
      request: 500ms
EOF
The Gateway API allows you to configure both a request timeout and a backendRequest timeout duration. Consult the documentation for the specific meaning of each timeout field.
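For example, the same route could set both fields. Broadly, request bounds the entire request, while backendRequest bounds each individual attempt to a backend (relevant when retries are configured), and per the Gateway API specification backendRequest may not exceed request. The values below are illustrative, not a recommendation:

$ kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin
spec:
  parentRefs:
  - group: ""
    kind: Service
    name: httpbin
    port: 8000
  rules:
  - backendRefs:
    - name: httpbin
      port: 8000
    timeouts:
      request: 1s
      backendRequest: 500ms
EOF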
Verify the timeout
Make another call to httpbin:
$ kubectl exec deploy/curl -- curl -v httpbin:8000/delay/2
Observe the 504 “Gateway Timeout” response in the output:
* IPv6: (none)
* IPv4: 10.43.131.250
* Trying 10.43.131.250:8000...
* Connected to httpbin (10.43.131.250) port 8000
* using HTTP/1.x
> GET /delay/2 HTTP/1.1
> Host: httpbin:8000
> User-Agent: curl/8.11.0
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 504 Gateway Timeout
< content-length: 24
< content-type: text/plain
< date: Wed, 04 Dec 2024 17:47:01 GMT
< server: istio-envoy
< x-envoy-decorator-operation: httpbin.default.svc.cluster.local:8000/*
<
* Connection #0 to host httpbin left intact
upstream request timeout
You can confirm the timeout using the time command:
$ time kubectl exec deploy/curl -- curl -v httpbin:8000/delay/2
The total execution time will be slightly above the 500ms timeout value you set.
Executed in 618.05 millis...
Clean up
Delete the HTTPRoute:
$ kubectl delete httproute httpbin
Deprovision the sample applications:
$ kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.24/samples/httpbin/httpbin.yaml
$ kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.24/samples/curl/curl.yaml
Tips on configuring timeouts
- Is it important to first remove existing timeouts from my applications, or can I just overlay mesh timeouts on top of them as they are?
- It is simpler to remove the logic from your applications and maintain all of your resilience configuration consistently in the mesh layer. That said, leaving application timeouts in place doesn't necessarily do harm, but it is important to reason about which timeout "wins." For example, an application-level timeout of 100ms will preempt a 200ms timeout configured through Istio.
- What should I set my timeout to?
-
In Understanding Distributed Systems, author Roberto Vitillo suggests setting timeouts to the 99.9th percentile latency of a service. That is, if the client has been waiting longer than the time within which 99.9% of requests normally complete, sever the connection.
This implies an acceptable false timeout rate of 0.1%: you are willing to accept that 1 in every 1000 requests will time out erroneously.
You can use the observability features of ambient mesh to determine this latency number.
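As a sketch of that nearest-rank percentile calculation, assuming you have exported per-request latency samples one per line (the generated 1–1000ms values and the latencies.txt file name below are placeholders, not real telemetry):

```shell
# Placeholder data: 1000 latency samples of 1..1000 ms, one value per line.
# In practice, substitute samples exported from your observability stack.
seq 1 1000 > latencies.txt

# Nearest-rank 99.9th percentile: sort ascending, then pick the value at
# rank ceil(NR * 0.999), computed with integer-friendly arithmetic.
p999=$(sort -n latencies.txt | awk '
  { a[NR] = $1 }
  END {
    rank = int((NR * 999 + 999) / 1000)  # ceil(NR * 0.999)
    print a[rank]
  }')
echo "suggested timeout: ${p999}ms"
```

For the placeholder data this prints `suggested timeout: 999ms`; the printed value is a starting point for the HTTPRoute request timeout, not a rule.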