Improve observability in Rust repositories
Published on Jun 07, 2024 by math-almeida.
Overview
In complex systems, such as distributed software and cloud infrastructures, understanding how the system is behaving can be a challenge. Observability lets us understand a system from the outside and ask questions about it without knowing its inner workings. It also helps us deal with new problems (i.e. “unknown unknowns”) and answer the question: “Why is this happening?” A lack of observability means there are states or behaviors of the system that cannot be discerned or predicted just by looking at its outputs.
Why is it important?
Even though the benefits of having Observability are apparent, many organizations do not prioritize its implementation. The issue lies in outdated assumptions about how systems function. Your application is no longer a single entity residing within your control but a vast, ever-changing ecosystem of services, often dependent on elements you don’t own. Failures are always present at all stages of the project and it is impossible to predict all possible failures that may occur.
The pillars
Observability is based on three fundamental elements: logs, metrics and tracing, known as the three pillars of Observability.
Logs
Logs are records of events that occur within a system. They provide details about activities, errors, exceptions, and other information relevant to the functioning of the system.
Tracing
Tracing involves tracking individual requests as they move through the entire system. This allows you to understand the path a request follows, including all interactions between the different components of the system.
Metrics
Metrics are quantitative measurements that provide information about the performance and behavior of a system over a given period of time. They can include information such as CPU usage, API response time, number of requests per second, among others.
Core concepts
Structured Log
It consists of recording important events in systems in an organized and consistent way. These records follow a standardized format, facilitating analysis and extracting insights into system behavior.
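As a minimal sketch of the idea, using only the standard library, the function below renders an event as a single line of key=value pairs. The function name `format_event` and the field names are illustrative assumptions, not part of any library.

```rust
/// Renders one event as a single key=value line (a logfmt-like layout).
/// `format_event` is a hypothetical helper written for this illustration.
fn format_event(level: &str, message: &str, fields: &[(&str, &str)]) -> String {
    // `{:?}` wraps the string in quotes and escapes special characters.
    let mut line = format!("level={} msg={:?}", level, message);
    for (key, value) in fields {
        line.push_str(&format!(" {}={:?}", key, value));
    }
    line
}

fn main() {
    // Every event carries the same keys in the same order, so it is easy to parse.
    let line = format_event(
        "info",
        "request handled",
        &[("path", "/users"), ("status", "200")],
    );
    println!("{}", line);
}
```

Because every record has the same shape, a tool can filter on `status` or `path` without parsing free-form text.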
Log centralization
It involves sending logs from all system components to a centralized location where they can be efficiently stored, searched, and analyzed.
Distributed Tracing
It involves instrumenting system-wide requests to track their path as they traverse various components. This is typically done by generating unique tracking IDs that are propagated throughout the request flow.
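The propagation step can be sketched with the standard library alone. In this illustration the header name `x-trace-id`, the helper names, and the counter-based ID generator are all assumptions made for the example. A real system would more likely use a standard such as W3C Trace Context and random IDs.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical header name used to carry the ID between services.
const TRACE_HEADER: &str = "x-trace-id";

// Process-local counter: unique within this process only (an assumption for the sketch).
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

fn new_trace_id() -> String {
    format!("{:016x}", NEXT_ID.fetch_add(1, Ordering::Relaxed))
}

/// The caller attaches the trace ID to the outgoing request headers.
fn inject(trace_id: &str, headers: &mut HashMap<String, String>) {
    headers.insert(TRACE_HEADER.to_string(), trace_id.to_string());
}

/// The callee reuses the incoming ID, or starts a new trace if none was sent.
fn extract_or_start(headers: &HashMap<String, String>) -> String {
    headers.get(TRACE_HEADER).cloned().unwrap_or_else(new_trace_id)
}

fn main() {
    let trace_id = new_trace_id();
    let mut outgoing = HashMap::new();
    inject(&trace_id, &mut outgoing);

    // The downstream component sees the same ID, so its logs can be correlated.
    let downstream_id = extract_or_start(&outgoing);
    println!("upstream={} downstream={}", trace_id, downstream_id);
}
```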
Metrics Collection
It consists of regularly collecting performance metrics, such as CPU usage, memory usage, request response time, among others. These metrics are generally collected at regular intervals and stored for later analysis.
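A rough sketch of what collecting one such metric might look like in plain Rust is shown below. The `RequestMetrics` type and its methods are hypothetical names for this example. A production setup would normally rely on a metrics library and an exporter instead.

```rust
use std::time::{Duration, Instant};

/// Hypothetical in-memory accumulator for request latency.
#[derive(Default)]
struct RequestMetrics {
    count: u32,
    total_latency: Duration,
}

impl RequestMetrics {
    /// Records one observed request duration.
    fn record(&mut self, latency: Duration) {
        self.count += 1;
        self.total_latency += latency;
    }

    /// Average latency so far, or `None` if nothing has been recorded.
    fn average_latency(&self) -> Option<Duration> {
        if self.count == 0 {
            None
        } else {
            Some(self.total_latency / self.count)
        }
    }
}

fn main() {
    let mut metrics = RequestMetrics::default();

    // Time a stand-in for real work and record how long it took.
    let start = Instant::now();
    let _sum: u64 = (0..1_000u64).sum();
    metrics.record(start.elapsed());

    println!("requests={} avg={:?}", metrics.count, metrics.average_latency());
}
```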
Alerts and Corrective Actions
It consists of configuring alerts based on specific metrics or events that indicate problems or anomalies in the system. When an alert is triggered, corrective actions can be automatically taken to mitigate the issue.
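One simple way to express such a rule is a threshold that must be exceeded for several consecutive samples before the alert fires. The sketch below assumes that policy. The `ThresholdRule` type, its fields, and the sample values are invented for illustration.

```rust
#[derive(Debug, PartialEq)]
enum AlertState {
    Healthy,
    Firing,
}

/// Hypothetical rule: fire when the most recent `min_consecutive`
/// samples are all above `threshold`.
struct ThresholdRule {
    name: &'static str,
    threshold: f64,
    min_consecutive: usize,
}

fn evaluate(rule: &ThresholdRule, samples: &[f64]) -> AlertState {
    // Not enough data yet (or a degenerate rule): stay healthy.
    if rule.min_consecutive == 0 || samples.len() < rule.min_consecutive {
        return AlertState::Healthy;
    }
    let recent = &samples[samples.len() - rule.min_consecutive..];
    if recent.iter().all(|&value| value > rule.threshold) {
        AlertState::Firing
    } else {
        AlertState::Healthy
    }
}

fn main() {
    let rule = ThresholdRule {
        name: "high_cpu",
        threshold: 90.0,
        min_consecutive: 3,
    };
    // Made-up CPU usage samples, in percent.
    let cpu_samples = [40.0, 95.0, 97.0, 99.0];
    println!("{}: {:?}", rule.name, evaluate(&rule, &cpu_samples));
}
```

Requiring several consecutive samples is a design choice that trades a slower alert for fewer false alarms on short spikes.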
Observability in Rust
One excellent crate that provides the APIs necessary for instrumenting libraries and applications to emit trace data is tracing. Developed by the Tokio team, it was built from the ground up for async code, which makes it a good fit for logging in Rust web applications. It is built around the concept of “spans”, which record the flow of execution through a program.
You can use tracing to:
- Emit distributed traces to an OpenTelemetry collector.
- Debug your application with Tokio Console.
- Log to stdout, a log file or journald.
- Profile where your application is spending time.
Getting Started with Tracing
You can get started by adding the crate to your project.
cargo add tracing
If your program compiles to a binary (not a library), you also need to install a logging subscriber.
cargo add tracing-subscriber
And now we need to implement it in our project:
use tracing;
use tracing_subscriber;

fn main() {
    tracing::subscriber::set_global_default(
        tracing_subscriber::FmtSubscriber::new()
    ).expect("setting default subscriber failed");

    let result = compute(5);
    tracing::info!("The result is {}", result);
}

#[tracing::instrument(ret)]
fn compute(n: i32) -> i32 {
    if n > 10 {
        tracing::warn!("The number is greater than 10");
    } else if n < 1 {
        tracing::error!("The number is less than 1");
    }
    n * 2
}
In the above code, we first set a default subscriber for the tracing events. Then, we use the info! macro to record an event at the info level. In the compute function, we use the warn! and error! macros to record events at different levels based on the value of n.
We also use the instrument macro to record the parameters and the return value of the compute function. It can also be used to record errors or other fields in the span, which increases observability even further.
This simple example shows the power of the tracing crate and how we can use it to improve observability in Rust repositories. In more complex systems we can add tracing-futures, which provides instrumentation for futures, and tracing-serde, which improves serialization, but I will cover these topics in another post.
Thanks for reading!