Key Service Metrics
Service monitoring is part art, part science. What I attempt to do in this series of posts on metrics is to lay down some basic principles I have found helpful, without being too prescriptive.
I’d like to round out this series for now by summarizing some metrics I have found helpful for monitoring any service.
Availability
We spent a fair amount of time in the past three posts on availability. The value of computing, displaying, and monitoring availability is that it is the most straightforward way to define what “good” means for any service.
Availability = (SUM of successful responses) / (SUM of valid requests)
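As a quick sketch of that calculation (the counter names and numbers here are made up), availability over a window is just the ratio of two counters:

```python
# Minimal sketch: availability over a time window from two hypothetical counters.
# "Valid requests" excludes caller errors (e.g. malformed input) so that bad
# input does not count against the service.
def availability(successful_responses: int, valid_requests: int) -> float:
    if valid_requests == 0:
        return 1.0  # no valid traffic in the window; treat as fully available
    return successful_responses / valid_requests

print(availability(successful_responses=99_812, valid_requests=100_000))  # 0.99812
```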
Drops in availability have a measurable business impact that can easily be computed in dollars and cents.
(Total Revenue / Availability) - Total Revenue ≈ Revenue Impact
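A back-of-the-envelope version of that calculation, with purely illustrative numbers:

```python
# Rough revenue impact of an availability drop (illustrative numbers only).
total_revenue = 1_000_000.00   # revenue during the measurement window
availability = 0.995           # measured availability over the same window

revenue_impact = total_revenue / availability - total_revenue
print(f"~${revenue_impact:,.2f} left on the table")  # ~$5,025.13
```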
Now let’s take a look at additional metrics that are important to monitor for any service.
Latency
Caveat: I am not going to cover end-user experience latency here, which is a complex and highly specialized topic. For this article, assume the latency of a microservice…
Latency is the total amount of time it takes to complete a single transaction, typically measured in milliseconds for consistency across services. While availability is more of a high-level gauge, latency is a window into the application that can reveal a lot about what’s happening under the hood. It’s like taking an X-ray. Take, for example, the following graph. The first question I would ask is: what is causing the spikes at p90?
If I were to wager a guess, it’s more than likely contention for a shared resource, such as a database hitting its I/O limits. Is this a big deal? Well, it depends. How much traffic? How much headroom is available in the fleet? How often does it happen?
Latency lets us infer a number of things about a service. Sudden drops in latency can often indicate an availability drop, as errors sometimes cause abnormal termination of the request. Sudden spikes in latency are indications of exhaustion of a shared resource, whether it’s a consistency lock on a database table, availability of a new connection from a connection pool, or lack of available CPU.
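As a rough sketch, here is how raw latency samples might be boiled down into the p50/p90/p99 figures a dashboard would plot. The samples below are randomly generated, not real traffic, and the nearest-rank percentile function is a simplification of what a real metrics pipeline does:

```python
# Sketch: summarizing raw request latencies (in milliseconds) into percentiles.
# The samples are synthetic; a real service would get these from its metrics pipeline.
import random

random.seed(1)
latencies_ms = [random.lognormvariate(3.0, 0.5) for _ in range(10_000)]

def percentile(samples, pct):
    """Nearest-rank percentile: good enough for a dashboard sketch."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]

for pct in (50, 90, 99):
    print(f"p{pct}: {percentile(latencies_ms, pct):.1f} ms")
```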
Latency can also be used to compute concurrency using Little’s law:
Concurrency = Arrival rate * Wait time
Concurrency is a useful aid for estimating the concurrent resources needed to satisfy traffic at a given level. An example would be sizing connection limits on a load balancer as a simplistic load shedding device.
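A quick worked example of using Little’s law for sizing, with made-up numbers:

```python
# Little's law: concurrency = arrival rate * wait time.
# Illustrative sizing of a connection limit on a load balancer.
arrival_rate_rps = 500     # requests per second at peak
mean_latency_s = 0.120     # average time a request is in flight

concurrency = arrival_rate_rps * mean_latency_s
print(f"~{concurrency:.0f} requests in flight on average")  # ~60
# A connection limit sized somewhat above this (with headroom) acts as a
# simplistic load shedding device: excess connections are rejected quickly.
```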
One of the strongest signals latency can give comes during stress testing, where it is often used to identify the breaking point of an application on a specific resource configuration. The breaking point, as described by Amdahl’s law, is the point at which a specific application on a specific hardware configuration becomes overwhelmed; it is at this point that latency spikes dramatically. Some sophisticated load shedding techniques use latency rather than concurrency to predict when load shedding should occur, protecting the fleet in traffic overload scenarios.
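To make the idea concrete, here is one hedged sketch of latency-based shedding: compare a rolling p90 against a threshold and start rejecting new work when it is exceeded. The window size and threshold are assumptions, and real implementations are considerably more careful than this:

```python
# Sketch of latency-based load shedding: reject new work when a rolling p90
# exceeds a threshold. Window size and threshold are assumptions, not recommendations.
from collections import deque

class LatencyShedder:
    def __init__(self, window: int = 1000, p90_limit_ms: float = 250.0):
        self.samples = deque(maxlen=window)   # rolling window of recent latencies
        self.p90_limit_ms = p90_limit_ms

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def should_shed(self) -> bool:
        if len(self.samples) < 100:           # not enough data to judge yet
            return False
        ordered = sorted(self.samples)
        p90 = ordered[int(len(ordered) * 0.9) - 1]
        return p90 > self.p90_limit_ms
```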
Errors
Using metrics to monitor errors can be extremely helpful. However, there is one caveat: high-cardinality metrics can be both expensive and noisy. It can be really helpful to summarize errors by type into a lower-cardinality list. Timeouts are a good example of an error metric; any timeout of a service endpoint is an availability drop. Any other availability-impacting errors (or server faults) should have some way to be summarized on a dashboard at low cardinality for monitoring purposes.
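One way to do that, sketched below with hypothetical error-type names, is to map raw exceptions onto a small fixed set of types before they ever become metrics:

```python
# Sketch: collapsing raw exceptions into a low-cardinality set of error types
# before emitting them as metrics. The type names here are assumptions.
from collections import Counter

def classify(exc: Exception) -> str:
    if isinstance(exc, TimeoutError):
        return "timeout"            # always an availability drop
    if isinstance(exc, ConnectionError):
        return "dependency_error"
    if isinstance(exc, ValueError):
        return "validation_error"   # caller's fault, not a server fault
    return "internal_error"

error_counts = Counter()
for exc in (TimeoutError(), ValueError("bad id"), ConnectionError()):
    error_counts[classify(exc)] += 1
print(error_counts)
```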
Physical Resources
All services depend on physical resources in order to run and respond to traffic. It is important to monitor critical resources that may reach physical limits due to increased traffic or other application issues. CPU and memory are good examples, but it can also be important to monitor other resources that are either heavily constrained (e.g. deliberately limited) or known to cause contention issues, such as thread pools, connection pools, and dependency rate limits. If caching is an important part of the service design, a good graph of cache-hit ratio is a must. Latency can often act as a proxy for resource contention signals, but it is valuable to have the important resource metrics on the dashboard as well, so that once contention arises you can rapidly diagnose where it is.
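As a trivial sketch (the counter names are hypothetical), the cache-hit ratio is just another two-counter calculation:

```python
# Sketch: cache-hit ratio from hit/miss counters (names are hypothetical).
def cache_hit_ratio(hits: int, misses: int) -> float:
    total = hits + misses
    return hits / total if total else 1.0

print(f"{cache_hit_ratio(hits=9_400, misses=600):.1%}")  # 94.0%
```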
Dependencies
For each critical dependency, whether it’s a storage resource such as a database or search index, a third-party service, or an internal service, a couple of quick graphs per dependency should be on the primary service dashboard. It makes sense to monitor Availability, Latency, and Timeouts per dependency. If the number of dependencies is large, it can make sense to put these on a secondary dashboard used for deep dives during operational issues.
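One common pattern, sketched here with a stand-in emit_metric() rather than any particular metrics client, is to wrap every dependency call so availability, latency, and timeouts fall out per dependency:

```python
# Sketch: wrapping a dependency call to record availability, latency, and
# timeouts per dependency. emit_metric() is a stand-in for a real metrics client.
import time

def emit_metric(name: str, value: float) -> None:
    print(f"{name}={value}")  # placeholder: send to your metrics system here

def call_dependency(name: str, fn, *args, **kwargs):
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
        emit_metric(f"{name}.success", 1)
        return result
    except TimeoutError:
        emit_metric(f"{name}.timeout", 1)
        raise
    except Exception:
        emit_metric(f"{name}.error", 1)
        raise
    finally:
        emit_metric(f"{name}.latency_ms", (time.monotonic() - start) * 1000)

# Usage (fetch_order is a hypothetical dependency call):
# call_dependency("orders_db", fetch_order, order_id=42)
```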
Monitoring Each Endpoint
When monitoring a service, an aggregate monitor across all endpoints is really only useful if all endpoints do similar work (such as multiple web pages on a web server). Generally, services with multiple endpoints require individual latency, availability, throughput, and error metrics per endpoint.
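As a minimal sketch (the endpoint and field names are illustrative), this just means keying every metric series by endpoint rather than rolling them all up together:

```python
# Sketch: keeping separate stats per endpoint so each one gets its own
# latency, availability, throughput, and error series. Names are illustrative.
from collections import defaultdict

per_endpoint = defaultdict(lambda: {"requests": 0, "errors": 0, "latency_ms": []})

def record(endpoint: str, latency_ms: float, ok: bool) -> None:
    stats = per_endpoint[endpoint]
    stats["requests"] += 1
    stats["errors"] += 0 if ok else 1
    stats["latency_ms"].append(latency_ms)

record("get_user", 12.5, ok=True)
record("create_order", 480.0, ok=False)
print(dict(per_endpoint))
```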
In summary, understanding availability, latency, throughput, and errors should be a key design goal of a service dashboard. Monitoring by endpoint is important. Understanding utilization and issues with physical resources and dependencies can help to quickly diagnose and respond to issues.