One of the Istio service mesh's most popular and robust features is its advanced observability. Because all service-to-service communication is routed through Envoy proxies, and Istio's control plane is able to gather logs and metrics from these proxies, the service mesh can provide us with deep insights about the state of the network and the behavior of services. This provides operators with unique ways of troubleshooting, managing, and optimizing their services, without imposing any additional burdens on application developers.
Operators thus gain a deep understanding of how monitored services interact, in both inbound and outbound directions. These metrics provide a diverse array of information, including the overall volume of traffic, error rates, and response times for requests.

We see the service mesh as a key component of every modern Cloud Native stack. To make this a reality, we are on a mission to make Istio simple to use and manage for everyone. We have built a product called Backyards (now Cisco Service Mesh Manager), the Banzai Cloud operationalized and automated service mesh, which makes setting up and operating an Istio-based mesh a cinch. Backyards provides unmatched out-of-the-box observability and an extensive set of tooling.
```
# TYPE envoy_cluster_internal_upstream_rq_200 counter
envoy_cluster_internal_upstream_rq_200{cluster_name="xds-grpc"} 2
# TYPE envoy_cluster_upstream_rq_200 counter
envoy_cluster_upstream_rq_200{cluster_name="xds-grpc"} 2
# TYPE envoy_cluster_upstream_rq_completed counter
envoy_cluster_upstream_rq_completed{cluster_name="xds-grpc"} 3
# TYPE envoy_cluster_internal_upstream_rq_503 counter
envoy_cluster_internal_upstream_rq_503{cluster_name="xds-grpc"} 1
# TYPE envoy_cluster_upstream_cx_rx_bytes_total counter
envoy_cluster_upstream_cx_rx_bytes_total{cluster_name="xds-grpc"} 2056154
# TYPE envoy_server_memory_allocated gauge
envoy_server_memory_allocated{} 15853480
```
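Each sidecar exposes these raw Envoy statistics on its Prometheus endpoint (port 15090, path `/stats/prometheus`). As a rough sketch of how they are typically collected, a scrape job along the lines of the one bundled with Istio's Prometheus setup looks something like this (the job name and relabeling rules are illustrative and vary between Istio versions):

```yaml
# Sketch: Prometheus scrape job for the sidecars' raw Envoy stats.
# Loosely based on the job shipped with Istio's bundled Prometheus config;
# the exact relabeling differs between Istio releases.
scrape_configs:
  - job_name: envoy-stats
    metrics_path: /stats/prometheus
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods that expose an Envoy Prometheus port
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: ".*-envoy-prom"
```

In contrast to these low-level per-cluster counters, the mesh-level `istio_requests_total` metric below carries the full set of source and destination labels: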
```
# TYPE istio_requests_total counter
istio_requests_total{
connection_security_policy="mutual_tls",
destination_app="analytics",
destination_principal="cluster.local/ns/backyards-demo/sa/default",
destination_service="analytics.backyards-demo.svc.cluster.local",
destination_service_name="analytics",
destination_service_namespace="backyards-demo",
destination_version="v1",
destination_workload="analytics-v1",
destination_workload_namespace="backyards-demo",
permissive_response_code="none",
permissive_response_policyid="none",
reporter="destination",
request_protocol="http",
response_code="200",
response_flags="-",
source_app="bookings",
source_principal="cluster.local/ns/backyards-demo/sa/default",
source_version="v1",
source_workload="bookings-v1",
source_workload_namespace="backyards-demo"
} 1855
```
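Because every request increments `istio_requests_total` with both workload and response-code labels, questions about traffic volume and error rates reduce to straightforward PromQL. A minimal sketch (the recording-rule names are made up for illustration; the label names come from the metric above):

```yaml
# Sketch: Prometheus recording rules over the standard Istio request metric.
groups:
  - name: istio-traffic
    rules:
      # Per-service request rate, as reported by the destination proxies
      - record: service:istio_requests:rate5m
        expr: |
          sum by (destination_service_name, destination_service_namespace) (
            rate(istio_requests_total{reporter="destination"}[5m])
          )
      # Share of 5xx responses per destination service
      - record: service:istio_requests_error_ratio:rate5m
        expr: |
          sum by (destination_service_name) (
            rate(istio_requests_total{reporter="destination", response_code=~"5.."}[5m])
          )
          /
          sum by (destination_service_name) (
            rate(istio_requests_total{reporter="destination"}[5m])
          )
```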
The Envoy sidecars call Mixer after each request to report telemetry, and Mixer provides a Prometheus metrics endpoint to expose the collected metrics, thus making them available for scraping. In each report, the proxies send data about the source and destination side of the request, most importantly the unique IDs of the source and destination workloads (essentially a unique Pod ID in a Kubernetes environment), and it is Mixer's responsibility to fetch additional metadata from Kubernetes and expose the metrics on a specific endpoint for Prometheus to scrape. Although the Envoy sidecars buffer the outgoing telemetry requests, this architecture generated significant resource consumption in larger environments: an active connection was needed between every proxy and Mixer, which caused higher CPU and memory consumption in the proxies and, subsequently, higher latencies as well.

If you're a history buff, you might enjoy taking a look at our detailed blog post, [Istio telemetry with Mixer]({{< relref "/blog/istio-telemetry.md" >}}).
According to the Istio documentation, the new telemetry system cuts latency in half: 90th percentile latency has been reduced from 7 ms to 3.3 ms. Not only that, but the elimination of Mixer reduces total CPU consumption by 50%, to 0.55 vCPUs per 1,000 requests per second.
In-proxy service-level metrics in Telemetry V2 are provided by two custom plugins, `metadata-exchange` and `stats`. In another post we'll write more about Envoy WASM plugins in general, and about how we use this new extensibility option in Supertubes to provide mTLS-based RBAC for Kafka with Istio.
By default, in Istio 1.5, Telemetry V2 is enabled with these filters compiled into the Istio proxy, mainly for performance reasons. The same filters are also compiled to WebAssembly (WASM) modules and shipped with the Istio proxy. Performance will be continuously improved in forthcoming releases.
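If you'd rather run the WASM variants of these filters instead of the compiled-in ones, the toggle lives under the `telemetry.v2` installation values. A rough sketch using an `IstioOperator` resource; the exact value paths are version-dependent, so treat this as an assumption to verify against your Istio release:

```yaml
# Sketch: switching Telemetry V2 to the WASM-based filters (Istio 1.5 era).
# The value paths below are assumptions based on the 1.5-era installation
# values and may differ in other releases.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    telemetry:
      enabled: true
      v2:
        enabled: true
        metadataExchange:
          wasmEnabled: true   # run metadata-exchange as a WASM module
        prometheus:
          wasmEnabled: true   # run the stats plugin as a WASM module
```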
For HTTP traffic, the metadata exchange plugin adds dedicated attributes (`envoy.wasm.metadata_exchange.upstream`, `envoy.wasm.metadata_exchange.downstream`) to the request/response that contain the metadata attributes of the other side.
For generic TCP traffic, the metadata exchange uses ALPN-based tunneling and a prefix-based protocol. A new protocol, `istio-peer-exchange`, is defined, which is advertised and prioritized by the client and server sidecars in the mesh. ALPN negotiation resolves the protocol to `istio-peer-exchange` for connections between Istio-enabled proxies, but not between an Istio-enabled proxy and any other client.
For HTTP, HTTP/2, and gRPC traffic, the proxies generate the following standard metrics:

| Name | Description |
|---|---|
| `istio_requests_total` | A COUNTER incremented for every request handled by an Istio proxy. |
| `istio_request_duration_milliseconds` | A DISTRIBUTION which measures the duration of requests. |
| `istio_request_bytes` | A DISTRIBUTION which measures HTTP request body sizes. |
| `istio_response_bytes` | A DISTRIBUTION which measures HTTP response body sizes. |

For TCP traffic, the standard metrics are:

| Name | Description |
|---|---|
| `istio_tcp_sent_bytes_total` | A COUNTER which measures the total bytes sent in responses over a TCP connection. |
| `istio_tcp_received_bytes_total` | A COUNTER which measures the total bytes received in requests over a TCP connection. |
| `istio_tcp_connections_opened_total` | A COUNTER incremented for every opened connection. |
| `istio_tcp_connections_closed_total` | A COUNTER incremented for every closed connection. |
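Since `istio_request_duration_milliseconds` is a distribution (a Prometheus histogram), latency percentiles can be computed with `histogram_quantile` over its buckets. A minimal sketch, assuming the standard `_bucket` series exposed by the proxies (the rule name is made up):

```yaml
# Sketch: 90th percentile request latency per destination service,
# computed from the istio_request_duration_milliseconds histogram.
groups:
  - name: istio-latency
    rules:
      - record: service:request_duration_ms:p90_5m
        expr: |
          histogram_quantile(
            0.90,
            sum by (destination_service_name, le) (
              rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m])
            )
          )
```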
Each of these standard metrics is recorded with a set of labels (dimensions), whose default values are defined with expressions over the request attributes:

```yaml
reporter: conditional((context.reporter.kind | "inbound") == "outbound", "source", "destination")
source_workload: source.workload.name | "unknown"
source_workload_namespace: source.workload.namespace | "unknown"
source_principal: source.principal | "unknown"
source_app: source.labels["app"] | "unknown"
source_version: source.labels["version"] | "unknown"
destination_workload: destination.workload.name | "unknown"
destination_workload_namespace: destination.workload.namespace | "unknown"
destination_principal: destination.principal | "unknown"
destination_app: destination.labels["app"] | "unknown"
destination_version: destination.labels["version"] | "unknown"
destination_service: destination.service.host | "unknown"
destination_service_name: destination.service.name | "unknown"
destination_service_namespace: destination.service.namespace | "unknown"
request_protocol: api.protocol | context.protocol | "unknown"
response_code: response.code | 200
connection_security_policy: conditional((context.reporter.kind | "inbound") == "outbound", "unknown", conditional(connection.mtls | false, "mutual_tls", "none"))
response_flags: context.proxy_error_code | "-"
```
Telemetry V2 in Istio 1.5 also introduces a set of new labels that capture the canonical service and revision of the workloads:

- `source_canonical_service`
- `source_canonical_revision`
- `destination_canonical_service`
- `destination_canonical_revision`
You can find more info about the labels in the Istio docs. The stats plugin in Istio 1.5 not only includes the standard metrics, but also experimental support for modifying them. Be aware that the API for configuring the metrics will change in Istio 1.6, due to the new extensions API design.
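To give a feel for that configuration surface, here is a rough sketch of adding an extra dimension to `requests_total`, written in the shape the Istio 1.6-style installation values take. The `configOverride` paths and the handling of new tags are assumptions here, so verify them against the release you run:

```yaml
# Sketch: adding a custom dimension to the standard requests_total metric.
# The configOverride value paths follow the Istio 1.6-era installation values
# and are assumptions in this post; check them against your Istio version.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    telemetry:
      v2:
        prometheus:
          configOverride:
            inboundSidecar:
              metrics:
                - name: requests_total
                  dimensions:
                    # expression evaluated by the stats plugin over the attributes
                    destination_port: string(destination.port)
                  tags_to_remove:
                    - request_protocol
```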
Mixer-based telemetry also lacked cluster information, which is one of the reasons Backyards (now Cisco Service Mesh Manager) has always had its own Istio distribution. As mentioned earlier, there aren't really any ways of extending the metrics Telemetry V2 provides, but with Backyards (now Cisco Service Mesh Manager) we can pre-configure the proxies to hold cluster information in their node metadata, which is then propagated to the metrics.

The following metrics are generated for traffic between the `catalog` service on cluster `host` and the `movies` service on cluster `peer`. Notice the `source_cluster_id` and `destination_cluster_id` labels.
```
istio_requests_total{
connection_security_policy="mutual_tls",
destination_app="movies",
destination_canonical_revision="v2",
destination_canonical_service="movies",
destination_cluster_id="peer",
destination_principal="spiffe://cluster.local/ns/backyards-demo/sa/default",
destination_service="movies.backyards-demo.svc.cluster.local",
destination_service_name="movies",
destination_service_namespace="backyards-demo",
destination_version="v2",
destination_workload="movies-v2",
destination_workload_namespace="backyards-demo",
grpc_response_status="0",
instance="10.20.1.222:15090",
job="envoy-stats",
namespace="backyards-demo",
pod_name="movies-v2-85bdf95c7d-89klz",
pod_template_hash="85bdf95c7d",
reporter="destination",
request_protocol="grpc",
response_code="200",
response_flags="-",
security_istio_io_tlsMode="istio",
service_istio_io_canonical_name="movies",
service_istio_io_canonical_revision="v2",
source_app="catalog",
source_canonical_revision="v1",
source_canonical_service="catalog",
source_cluster_id="host",
source_principal="spiffe://cluster.local/ns/backyards-demo/sa/default",
source_version="v1",
source_workload="catalog-v1",
source_workload_namespace="backyards-demo",
version="v2"
} 279
```
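With the cluster ID labels in place, cross-cluster traffic can be sliced per cluster pair directly in PromQL. A minimal sketch, assuming the `source_cluster_id` and `destination_cluster_id` labels shown above (the rule name is made up):

```yaml
# Sketch: request rate broken down by source and destination cluster.
groups:
  - name: istio-multicluster
    rules:
      - record: cluster_pair:istio_requests:rate5m
        expr: |
          sum by (source_cluster_id, destination_cluster_id, destination_service_name) (
            rate(istio_requests_total{reporter="destination"}[5m])
          )
```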
Those additions are essential for Backyards to be able to provide multi-cluster service graphs.
Want to know more? Get in touch with us, or delve into the details of the latest release. Or just take a look at some of the Istio features that Backyards automates and simplifies for you, and which we've already blogged about. Check out Backyards' observability in action.