Metrics
When operating a distributed system like Camunda 8, it is important to put proper monitoring in place. To facilitate this, Camunda leverages Micrometer, a library that provides a convenient facade for exporting metrics to one or more supported monitoring systems (e.g. Prometheus, OpenTelemetry, Datadog, Dynatrace).
Configuration
Configuration for metrics is done via the built-in Spring Boot Micrometer configuration, as documented in the Spring Boot reference documentation.
Defaults
Camunda comes built-in with support for Prometheus and OpenTelemetry. By default, it is configured to export only Prometheus metrics via a scraping endpoint, and OpenTelemetry is disabled.
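For illustration, these defaults roughly correspond to the following properties; you do not need to set them explicitly, they are shown only to make the default state visible:

management:
  # Prometheus scraping endpoint is enabled by default
  prometheus.metrics.export.enabled: true
  # OpenTelemetry (OTLP) export is disabled by default
  otlp.metrics.export.enabled: false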
Prometheus
The scraping endpoint for Prometheus is located under the management context (by default :9600/actuator/prometheus). This is configured via the following properties:

management:
  endpoint.prometheus.access: unrestricted
  prometheus.metrics.export.enabled: true
In order to collect the metrics, Prometheus needs to be made aware of the new scraping endpoint. To do so, add the following scraping job:
- job_name: camunda
  scrape_interval: 30s
  metrics_path: /actuator/prometheus
  scheme: http
  static_configs:
    - targets:
        - localhost:9600
If you've configured your management context to be served over HTTPS, you will also need to update the scheme above; the same applies if you changed the management port.
The scrape interval is 30s by default, which means you will get new data points in Prometheus every 30 seconds. This is a good default to minimize the storage requirements for Prometheus. If you want to run alerting or auto-scaling based on these metrics, you may wish to configure a shorter interval; keep in mind that this results in more ingested data, so use it deliberately.
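For example, a scrape job combining a shorter interval with a TLS-enabled management context on a custom port might look like the following sketch (the 15s interval, HTTPS scheme, and port 9601 are illustrative assumptions, not defaults):

- job_name: camunda
  # Shorter interval for alerting/auto-scaling (assumption, not the default)
  scrape_interval: 15s
  metrics_path: /actuator/prometheus
  # Only if the management context is served over TLS
  scheme: https
  static_configs:
    - targets:
        # Adjust if you changed the management port (9601 is illustrative)
        - localhost:9601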
OpenTelemetry Protocol
Zeebe also comes with built-in support for exporting metrics via OpenTelemetry (through the micrometer-registry-otlp registry). To configure it, set the following properties:
management:
  # Disable Prometheus
  prometheus.metrics.export.enabled: false
  # Configure OpenTelemetry Metrics
  otlp:
    metrics:
      export:
        # Enable OTLP
        enabled: true
        # Since metrics are pushed, you will need to configure at least one endpoint
        url: "https://otlp.example.com:4318/v1/metrics"
You can find a more extensive list of configuration options for OTLP on the Micrometer website.
When using the OTLP exporter, be sure to check the requirements of your target endpoint, as it may require additional configuration.
For example, you may need to pass a client ID and secret for authentication via the otlp.metrics.export.headers option, or your system may not support cumulative aggregation temporality and instead require delta (e.g. Dynatrace).
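As a sketch, assuming your backend authenticates via an API token header and requires delta temporality (check your vendor's documentation; the header name and token below are placeholders), the export could be configured like this:

management:
  otlp:
    metrics:
      export:
        enabled: true
        url: "https://otlp.example.com:4318/v1/metrics"
        # Delta aggregation temporality, required by some backends (e.g. Dynatrace)
        aggregation-temporality: delta
        # Authentication headers; header name and token value are placeholders
        headers:
          Authorization: "Api-Token <your-token>"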
Note that a wide variety of existing monitoring systems also support ingesting OpenTelemetry data (e.g. Dynatrace, Datadog). We recommend using OTLP over the vendor-specific Micrometer implementations where possible.
Using a different monitoring system
To use a different monitoring system, refer to the Spring Boot documentation. Note that Zeebe only ships with built-in support for the Prometheus and OTLP systems.
If you wish to use a different system, you would need to add the required dependencies to your Zeebe installation, specifically to the distribution's lib/ folder.
When using the container image, you will need to add it to the following path based on your image:

- camunda/zeebe: /usr/local/zeebe/lib
- camunda/camunda: /usr/local/camunda/lib
For example, if you want to export to Datadog, you would download the io.micrometer:micrometer-registry-datadog JAR and place it in the ./lib folder of the distribution.
Running from the root of the distribution, you can leverage Maven to do this for you:
mvn dependency:copy -Dartifact=io.micrometer:micrometer-registry-datadog:1.14.4 -Dtransitive=false -DoutputDirectory=./lib
Make sure the version is the same as the Micrometer version used by Camunda. You can find this by checking the distribution artifact on Maven Central: select the distribution version you're using, then filter for micrometer to see the expected Micrometer version.
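Once the registry JAR is on the classpath, it still needs to be enabled and configured. As a sketch, assuming the standard Spring Boot Datadog export properties (verify against the Micrometer and Spring Boot documentation for your versions; the API key is a placeholder):

management:
  datadog:
    metrics:
      export:
        enabled: true
        # Placeholder; use your real Datadog API key
        api-key: "<your-datadog-api-key>"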
Customizing metrics
You can modify and filter the metrics exposed in Camunda via configuration.
Common tags
Tags provide a convenient way of aggregating metrics over common attributes. Via configuration, you can ensure that all metrics for a specific instance of Camunda share common tags. For example, let's say you deploy two different clusters, and want to differentiate them.
The first one could be configured as:
management:
  metrics:
    tags:
      cluster: "foo"
And the second one as:
management:
  metrics:
    tags:
      cluster: "bar"
Filtering
You can additionally disable certain metrics. This can be useful for high-cardinality metrics that you do not need, but that may be expensive to store in your target system.
To filter a metric called zeebe.foo, you would configure the following property:

management:
  metrics:
    enable:
      zeebe:
        foo: false
Filtering applies not only to exact name matches (e.g. zeebe.foo), but also as a prefix: any metric whose name starts with zeebe.foo in the example above would also be filtered out and not exported.
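For example, to drop everything under one of the prefixes listed in the next section, such as all atomix metrics, a single prefix entry is enough (shown as a sketch; verify which metrics are present in your deployment before disabling them):

management:
  metrics:
    enable:
      # Disables every metric whose name starts with "atomix"
      atomix: false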
Available metrics
Spring already exposes various metrics out of the box (such as JVM and system metrics), some of which are made available through Camunda.
Additionally, Camunda will expose several custom metrics, most of them under the zeebe, atomix, operate, tasklist, or optimize prefixes.
While all nodes in a Camunda cluster expose metrics, each exposes the metrics relevant to its role. For example, brokers expose processing-related metrics, while gateways expose REST API-related metrics.
Metrics related to process processing:
- zeebe_stream_processor_records_total: The number of events processed by the stream processor. The action label separates processed, skipped, and written events.
- zeebe_exporter_events_total: The number of events processed by the exporter processor. The action label separates exported and skipped events.
- zeebe_element_instance_events_total: The number of occurred process element instance events. The action label separates the number of activated, completed, and terminated elements. The type label separates different BPMN element types.
- zeebe_job_events_total: The number of job events. The action label separates the number of created, activated, timed out, completed, failed, and canceled jobs.
- zeebe_incident_events_total: The number of incident events. The action label separates the number of created and resolved incident events.
- zeebe_pending_incidents_total: The number of currently pending incidents, i.e. incidents that have not yet been resolved.
Metrics related to performance:
Zeebe has a backpressure mechanism by which it rejects requests when it receives more requests than it can handle without incurring high processing latency.
Monitor backpressure and processing latency of the commands using the following metrics:
- zeebe_dropped_request_count_total: The number of user requests rejected by the broker due to backpressure.
- zeebe_backpressure_requests_limit: The limit for the number of inflight requests used for backpressure.
- zeebe_stream_processor_latency_bucket: The processing latency for commands and events.
Metrics related to health:
The health of partitions in a broker can be monitored by the metric zeebe_health.
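A sketch of an alert on partition health might look like the following, assuming the gauge reports 1 for a healthy partition (verify the exact value semantics for your version before relying on this):

groups:
  - name: camunda-health
    rules:
      - alert: CamundaPartitionUnhealthy
        expr: zeebe_health < 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "A Zeebe partition has been unhealthy for 5 minutes"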
Execution latency metrics
The brokers can export optional execution latency metrics. To enable them, set the ZEEBE_BROKER_EXECUTION_METRICS_EXPORTER_ENABLED environment variable to true in your Zeebe configuration.
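For example, in a container-based deployment this can be done via the environment, as in the following Docker Compose sketch (the service layout and image tag are illustrative):

services:
  zeebe:
    image: camunda/zeebe:latest
    environment:
      # Enables the optional execution latency metrics
      ZEEBE_BROKER_EXECUTION_METRICS_EXPORTER_ENABLED: "true"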
Grafana
Zeebe
Zeebe comes with a pre-built dashboard, available in the repository: monitor/grafana/zeebe.json.
Import it into your Grafana instance and select the correct Prometheus data source (important if you have more than one). You will then see a dashboard displaying the cluster topology, general throughput metrics, handled requests, exported events per second, disk and memory usage, and more.
You can also try out an interactive version, where you can explore help messages for every panel and get a feel for what data is available.