Published on 12/21/2021
Last updated on 02/05/2024
Troubleshooting Using Prometheus Metrics in Epsagon
Observability is crucial for modern application development, as it enables organizations to achieve tighter control over dynamic systems. In addition to the inherent complexities of an application workflow, various cloud-native ecosystems, such as Kubernetes and Prometheus, introduce a number of components within an already distributed framework. The resultant system involves numerous interconnected services that increase failure points as well as the complexity of debugging and general maintenance.
Prometheus is an open-source observability platform that supports the discovery and monitoring of services scheduled in Kubernetes clusters. The platform typically relies on the Kubernetes API to discover targets in order to observe the state and change of cluster components.
Extending the features of Prometheus, Epsagon provides an end-to-end observability solution for containerized workloads in Kubernetes, making it easier to trace bugs and troubleshoot efficiently.
This article delves into various Prometheus metrics, the benefits of using the platform, and the steps required for you to integrate Epsagon with Prometheus to properly observe your Kubernetes clusters.
At a high level, the setup involves three steps, each covered in detail below: installing the Epsagon agent to send cluster resource data to Epsagon's Kubernetes Explorer, configuring Prometheus's remote write feature to forward metrics to Epsagon, and installing kube-state-metrics to take full advantage of Epsagon's Kubernetes dashboards. Once these steps are complete, the Epsagon explorer can be used to access Kubernetes metrics.
Figure 1: A typical view of the Epsagon Kubernetes Node Explorer
Figure 2: The Epsagon Application Metrics Dashboard
Figure 3: The Epsagon Infrastructure Metrics Dashboard
Prometheus Metrics for Kubernetes
As one of the CNCF's graduated projects, Prometheus has been widely adopted for monitoring Kubernetes applications due to its efficiency in collecting metrics for varied services. The platform leverages an instrumentation framework capable of processing large amounts of data, making it ideal for complex, distributed workloads. Prometheus collects performance data using a pull-based system: it sends an HTTP request based on a component's configuration and scrapes metric data from the response, using exporters to ensure that the scraped data is correctly exposed and formatted. Prometheus's service discovery spans multiple components of a Kubernetes cluster, including:
- Nodes
- Endpoints
- Services
- Pods
- Ingress
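To make the pull model concrete, here is a minimal sketch, using only the Python standard library, of what a scrape target exposes. The metric names, labels, and port are illustrative assumptions, not part of Prometheus's or Epsagon's tooling:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_metrics(samples):
    """Render (name, labels, value) samples in the Prometheus text exposition format."""
    lines = []
    for name, labels, value in samples:
        label_str = ",".join('%s="%s"' % (k, v) for k, v in sorted(labels.items()))
        if label_str:
            lines.append("%s{%s} %s" % (name, label_str, value))
        else:
            lines.append("%s %s" % (name, value))
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    """Minimal scrape target: Prometheus pulls GET /metrics from this server."""
    def do_GET(self):
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        # Illustrative samples; a real service would report live values.
        body = render_metrics([
            ("http_requests_total", {"method": "get", "code": "200"}, 1027),
            ("process_open_fds", {}, 42),
        ]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    # Prometheus would be pointed at this endpoint via a scrape config
    # or Kubernetes service discovery.
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Prometheus then discovers and scrapes this endpoint on its own schedule; the service never needs to know where the monitoring system lives.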
Benefits of Prometheus
Though use cases may vary for different organizations, the Prometheus platform offers a range of observability benefits, including:
- A multidimensional data model: Prometheus stores data as key-value pairs, similar to how Kubernetes component metadata is configured in YAML files. The Prometheus Query Language (PromQL) enables flexible and precise queries over this time-series data.
- Simple data formats and protocols: The platform collects data in a self-explanatory, human-readable format served over standard HTTP, making exposing and checking metrics straightforward.
- A built-in alert manager: Developers can specify rules for Prometheus notifications and alerts. This reduces disruptions and your developers' workload, since there is no need to source an external system or API for notifications.
- Whitebox and blackbox monitoring: The platform includes exporters and client libraries to enable monitoring of both performance and user experience. Prometheus can consume metrics from labels and annotations in configuration files for efficient monitoring and tracking of component status, and it also covers metrics exposed by each component's internals, such as logs, interfaces, and internal HTTP handlers.
- Pull-based metrics: With the pull-based collection system, teams can simply expose metrics as HTTP endpoints and let Prometheus scrape them, without exposing the monitor's location to the services.
Metric Types
Prometheus's out-of-the-box client libraries support four core metric types:
- Counter: A cumulative metric that either rises monotonically or resets to zero when metric collection restarts. Counters represent indicators such as tasks completed, errors, or the number of requests received.
- Gauge: A single numeric value that can rise or fall arbitrarily. Gauges expose measured values such as memory usage or temperature, or counts that can go up and down.
- Histogram: Uses buckets to represent the frequency distribution of sampled observations. Bucket counts are cumulative, and a histogram also exposes the sum and count of all observations.
- Summary: Similar to a histogram, a summary samples observations and provides their total count and sum. In contrast to a histogram, however, the summary type calculates configurable quantiles over a sliding time window.
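As a rough illustration of these semantics, the first three types can be modeled in a few lines of standard-library Python. This is a sketch for intuition only, not the official prometheus_client library, and the summary type's sliding-window quantiles are omitted for brevity:

```python
import bisect

class Counter:
    """Cumulative metric: only rises, or resets to zero on restart."""
    def __init__(self):
        self.value = 0.0
    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount

class Gauge:
    """Single numeric value that can rise or fall arbitrarily."""
    def __init__(self):
        self.value = 0.0
    def set(self, value):
        self.value = value
    def inc(self, amount=1.0):
        self.value += amount
    def dec(self, amount=1.0):
        self.value -= amount

class Histogram:
    """Counts observations into buckets, plus a running sum and count."""
    def __init__(self, buckets=(0.1, 0.5, 1.0, 5.0)):
        self.upper_bounds = sorted(buckets)
        # One counter per bucket, plus a final +Inf bucket.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.sum = 0.0
        self.count = 0
    def observe(self, value):
        # "le" semantics: the observation lands in the first bucket
        # whose upper bound is >= the observed value.
        self.bucket_counts[bisect.bisect_left(self.upper_bounds, value)] += 1
        self.sum += value
        self.count += 1
    def cumulative(self):
        """Prometheus exposes buckets cumulatively, labelled le="<bound>"."""
        total, out = 0, []
        for bound, count in zip(self.upper_bounds + [float("inf")],
                                self.bucket_counts):
            total += count
            out.append((bound, total))
        return out
```

For example, a histogram with buckets (1, 2) that observes 0.5, 1.0, and 3.0 reports cumulative counts of 2 for le="1", 2 for le="2", and 3 for le="+Inf".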
Prometheus Metrics & KPIs for Kubernetes
Prometheus exposes metrics that help you observe the various components of a Kubernetes ecosystem. These fall into the four groups reviewed below.
Cluster & Node Metrics
These indicators focus on an entire cluster's or a specific node's health status. They include:
- Node resource metrics: disk and memory utilization, network bandwidth, and CPU usage
- Number of nodes
- Number of running pods per node
- Memory/CPU requests and limits
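For instance, assuming kube-state-metrics and node_exporter are running in the cluster (exact metric names can vary between versions), PromQL queries along these lines surface cluster- and node-level indicators:

```promql
# Number of nodes in the cluster (kube-state-metrics)
count(kube_node_info)

# Running pods per node (kube-state-metrics)
count(kube_pod_info) by (node)

# Node CPU usage as a fraction, per instance (requires node_exporter)
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)

# Total memory requested by containers
sum(kube_pod_container_resource_requests{resource="memory"})
```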
Deployment and Pod Metrics
These include:
- Current Deployments and DaemonSets
- Missing and failed pods
- Pod restarts
- Pods in CrashLoopBackOff
- Running vs. desired pods
- Pod resource usage vs. requests and limits
- Available and unavailable pods
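As an illustration, PromQL queries over kube-state-metrics series such as the following (metric names may vary by version) cover several of these indicators:

```promql
# Pod restarts over the last hour
increase(kube_pod_container_status_restarts_total[1h])

# Pods currently in CrashLoopBackOff
kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1

# Available vs. desired replicas per Deployment
kube_deployment_status_replicas_available / kube_deployment_spec_replicas
```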
Container Metrics
These help teams establish how close container resource consumption is to the configured limits. Such metrics include:
- Container CPU usage
- Container memory utilization
- Network usage
Application Metrics
These measure whether the applications running in pods are healthy and available. They include:
- Application availability
- Application health and performance
Troubleshooting Using Prometheus Metrics in Epsagon
Epsagon offers observability support for clusters running on all open-source Kubernetes distributions. The platform allows a simple and seamless integration with Prometheus to automatically discover and generate metrics for an entire application workload. Epsagon also provides access to cluster logs and traces for monitoring dynamic, containerized environments. With complete visibility into cluster health and application performance issues, organizations can detect bottlenecks, troubleshoot issues, and optimize resource configuration to help developers enhance productivity.
Epsagon's integration with Prometheus lets you collect and analyze various metrics for actionable intelligence and troubleshooting. Such metrics include:
- Container logs
- Cluster performance metrics, insights, and alerts
- Detailed mappings of cluster components for health verification
Installing the Epsagon Agent
You can install the Epsagon agent in your Kubernetes cluster using the Helm package manager. If Helm is not already installed in your cluster, install it using the following commands:
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
$ chmod 700 get_helm.sh
$ ./get_helm.sh
- Generate an Epsagon token to connect your application with an associated account.
- Create a simple cluster name that will be shown on the Epsagon dashboard.
- Complete the installation using the command:
$ helm repo add epsagon https://helm.epsagon.com
$ helm install <RELEASE_NAME> \
    --set epsagonToken=<EPSAGON_TOKEN> \
    --set clusterName=<CLUSTER_NAME> \
    epsagon/cluster-agent
Setting up Prometheus to Send Metrics to Epsagon
Before collecting Prometheus metrics, make sure Prometheus is installed in the cluster, for example via the community Helm chart:
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm install [RELEASE_NAME] prometheus-community/prometheus \
    --set serviceAccounts.alertmanager.create=false \
    --set serviceAccounts.nodeExporter.create=false \
    --set serviceAccounts.pushgateway.create=false \
    --set alertmanager.enabled=false \
    --set nodeExporter.enabled=false \
    --set pushgateway.enabled=false \
    --set server.persistentVolume.size=10Gi

Next, configure the remote write feature for sending Prometheus metrics to Epsagon by adding the following lines to the chart's values:
server:
  remoteWrite:
    - url: https://collector.epsagon.com/ingestion?<EPSAGON_TOKEN>
      basic_auth:
        username: <EPSAGON_TOKEN>
      write_relabel_configs:
        - target_label: cluster_name
          replacement: <CLUSTER_NAME>

Note: The EPSAGON_TOKEN and CLUSTER_NAME values should match those specified when installing the Epsagon agent.
For full utilization of Epsagon's Kubernetes dashboards, you can install kube-state-metrics via the following commands:
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install [RELEASE_NAME] bitnami/kube-state-metrics
Enabling Trace-to-Log Correlation
The Epsagon platform automatically correlates logs and traces, allowing developers to view logs for a specific time span. This eliminates the need for a manual log search or for injecting logs with span IDs. To enable trace-to-log correlation, developers have to:
- Trace their containers. (Note: Only applications in Java, Python, and Node.js support log correlation.)
- Set up FluentD as a DaemonSet to send logs to AWS CloudWatch.
- View a trace’s logs by opening the trace, selecting a node, and accessing the logs in one click.
How to Send Metrics
Teams can use the Prometheus StatsD exporter to translate StatsD metrics into Prometheus metrics using pre-configured mapping rules; this is achieved by downloading and installing the exporter in the cluster. Alternatively, you can implement the native Prometheus instrumentation client to send custom metrics into Prometheus, either by pushing them through the Prometheus Pushgateway or by having Prometheus scrape the metrics directly from the client.
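For the Pushgateway route, a push is just an HTTP PUT of text-format samples to /metrics/job/<job>. The sketch below uses only the Python standard library; the helper names and the pushgateway hostname are illustrative assumptions:

```python
from urllib import request

def push_url(base, job, **grouping):
    """Build the Pushgateway push URL: <base>/metrics/job/<job>[/<label>/<value>...]."""
    parts = [base.rstrip("/"), "metrics", "job", job]
    for key, value in sorted(grouping.items()):
        parts += [key, value]
    return "/".join(parts)

def gauge_payload(name, value, help_text=None):
    """Render a single gauge in the text exposition format the Pushgateway accepts."""
    lines = []
    if help_text:
        lines.append("# HELP %s %s" % (name, help_text))
    lines.append("# TYPE %s gauge" % name)
    lines.append("%s %s" % (name, value))
    return "\n".join(lines) + "\n"

def push_gauge(base, job, name, value, **grouping):
    """PUT one sample to the Pushgateway; Prometheus then scrapes the gateway."""
    req = request.Request(
        push_url(base, job, **grouping),
        data=gauge_payload(name, value).encode(),
        method="PUT",
        headers={"Content-Type": "text/plain"},
    )
    with request.urlopen(req) as resp:
        return resp.status

# Example (assumes a Pushgateway is reachable at this hypothetical address):
# push_gauge("http://pushgateway:9091", "batch", "job_duration_seconds", 12.5,
#            instance="worker-1")
```

In practice, the official Python client library provides an equivalent push_to_gateway helper; the sketch simply shows what happens on the wire.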
Summary
As a workload's ecosystem grows, a single Prometheus instance is often not enough to keep up with the growing volume of time-series data. While deploying multiple instances of Prometheus is always an option, federating the data of those instances through a common, centralized channel such as Epsagon is usually the better approach. Having successfully deployed Prometheus alongside the Epsagon agent, organizations can efficiently track the overall health, performance, and behavior of their Kubernetes clusters.
Prometheus is a metrics-based monitoring system that enables DevOps teams to observe, repair, and maintain distributed, microservices-based Kubernetes workloads. And with Epsagon, teams can access a comprehensive dashboard of logs, traces, and metrics that enhances observability and simplifies troubleshooting. Epsagon also integrates easily with a wide range of data sources and lets you create custom dashboards.
Check out our demo environment or try Epsagon for FREE for up to 10 million traces per month!