Backup and restore Zeebe data
A backup of a Zeebe cluster consists of a consistent snapshot of all partitions. The backup is taken asynchronously in the background while Zeebe continues processing, so backups can be taken with minimal impact on normal processing. Backups can be used to restore a cluster after failures that lead to full data loss or data corruption.
Zeebe provides a REST API to create backups and to query and manage existing backups.
The backup management API is a custom Spring Boot Actuator endpoint, backups, which is accessible via the management port of the gateway. The API documentation is also available as an OpenAPI specification.
Configuration
To use the backup feature in Zeebe, you must choose which external storage system you will use. Make sure to set the same configuration on all brokers in your cluster.
Zeebe supports S3 and Google Cloud Storage (GCS) for external storage.
Backups created with one store are not available or restorable from another store.
This is especially relevant if you were using GCS through the S3 compatibility mode and want to switch to the new built-in GCS support. Even when the underlying storage bucket is the same, backups created through one store are not compatible with the other.
S3 backup store
To store your backups in any S3-compatible storage system, such as AWS S3 or MinIO, set the backup store to S3 and tell Zeebe how to connect to your bucket:
zeebe:
  broker:
    data:
      backup:
        store: S3
        s3:
          bucketName:
          basePath:
          region:
          endpoint:
          accessKey:
          secretKey:
Alternatively, you can configure the backup store using environment variables:
- ZEEBE_BROKER_DATA_BACKUP_STORE: Set this to S3 to store backups in S3 buckets.
- ZEEBE_BROKER_DATA_BACKUP_S3_BUCKETNAME: The backup is stored in this bucket. The bucket must already exist.
- ZEEBE_BROKER_DATA_BACKUP_S3_BASEPATH: If the bucket is shared with other Zeebe clusters, a unique basePath must be configured.
- ZEEBE_BROKER_DATA_BACKUP_S3_ENDPOINT: If no endpoint is provided, it is determined based on the configured region.
- ZEEBE_BROKER_DATA_BACKUP_S3_REGION: If no region is provided, it is determined from the environment.
- ZEEBE_BROKER_DATA_BACKUP_S3_ACCESSKEY: If either accessKey or secretKey is not provided, the credentials are determined from the environment.
- ZEEBE_BROKER_DATA_BACKUP_S3_SECRETKEY: Specify the secret key.
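The environment variable names above follow a mechanical pattern relative to the YAML keys. The following Python sketch derives them from dotted property paths, assuming Spring Boot's relaxed-binding rule for environment variables (dots become underscores, dashes are dropped, everything is upper-cased) applies to every key; the helper name to_env_var and the example values are hypothetical.

```python
def to_env_var(property_path: str) -> str:
    """Map a dotted property path to its environment variable name.

    Assumes Spring Boot's relaxed-binding convention: dots become
    underscores, dashes are removed, and the result is upper-cased."""
    return property_path.replace("-", "").replace(".", "_").upper()


# Hypothetical example values; replace them with your own bucket settings.
s3_settings = {
    "zeebe.broker.data.backup.store": "S3",
    "zeebe.broker.data.backup.s3.bucketName": "my-zeebe-backups",
    "zeebe.broker.data.backup.s3.basePath": "cluster-a",
    "zeebe.broker.data.backup.s3.region": "eu-central-1",
}

# Translate every property into its environment variable form.
env = {to_env_var(key): value for key, value in s3_settings.items()}

if __name__ == "__main__":
    for name, value in sorted(env.items()):
        print(f"{name}={value}")
```

Remember to apply the same variables to every broker, since all brokers in the cluster need identical backup configuration.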
Backup encryption
Zeebe does not support backup encryption natively, but it can use encrypted S3 buckets. For AWS S3, this means enabling default bucket encryption.
Using default bucket encryption gives you control over the encryption keys and algorithms while remaining completely transparent to Zeebe.
Combined with TLS between Zeebe and the S3 API, backups are fully encrypted in transit and at rest. Other S3-compatible services might offer similar features that should work as well.
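As an illustration of enabling default bucket encryption on AWS S3, the following Python sketch uses the boto3 S3 client's put_bucket_encryption call. It assumes boto3 is installed and AWS credentials are available from the environment; the bucket name, the optional KMS key ID, and the helper names are hypothetical. Zeebe itself needs no configuration change for this.

```python
from typing import Optional


def default_encryption_config(kms_key_id: Optional[str] = None) -> dict:
    """Build a ServerSideEncryptionConfiguration for put_bucket_encryption.

    Without a key ID, S3-managed keys (SSE-S3, AES256) are used; with a
    key ID, objects are encrypted with that AWS KMS key (SSE-KMS)."""
    if kms_key_id is None:
        default = {"SSEAlgorithm": "AES256"}
    else:
        default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": default}]}


def enable_default_encryption(bucket: str, kms_key_id: Optional[str] = None) -> None:
    """Enable default encryption on an existing bucket.

    Assumes boto3 is installed and AWS credentials come from the environment."""
    import boto3  # imported lazily so the config builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=default_encryption_config(kms_key_id),
    )


if __name__ == "__main__":
    # Print the configuration that would be applied (no AWS call is made here).
    print(default_encryption_config())
```

For example, enable_default_encryption("my-zeebe-backups") would apply SSE-S3 to a hypothetical bucket named my-zeebe-backups; pass a KMS key ID to control the key yourself.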
Backup compression
Backups can be large depending on your usage of Zeebe. To reduce S3 storage costs and upload times, you can enable backup compression.
Zeebe compresses backup data immediately before uploading it to S3 and buffers the compressed files in a temporary directory. Compression and buffering of compressed files can degrade performance if Zeebe is heavily resource-constrained.
You can enable compression by specifying the compression algorithm to use. We recommend using zstd, as it provides a good trade-off between compression ratio and resource usage.
More compression algorithms are available; check commons-compress for a full list.
zeebe.broker.data.backup.s3.compression: zstd # or use environment variable ZEEBE_BROKER_DATA_BACKUP_S3_COMPRESSION
GCS backup store
To store your backups in Google Cloud Storage (GCS), choose the GCS backup store and tell Zeebe which bucket to use:
zeebe:
  broker:
    data:
      backup:
        store: GCS
        gcs:
          bucketName: # or use environment variable ZEEBE_BROKER_DATA_BACKUP_GCS_BUCKETNAME
          basePath: # or use environment variable ZEEBE_BROKER_DATA_BACKUP_GCS_BASEPATH
The bucket specified with bucketName must already exist; Zeebe will not try to create one for you.
To prevent misconfiguration, Zeebe will check at startup that the specified bucket exists and can be accessed.
Setting a basePath is not required, but it is useful if you want to use the same bucket for multiple Zeebe clusters.
When basePath is set, Zeebe will only create and access objects under this path.
This can be any string that is a valid object name, for example, the name of your cluster.
Authentication is handled by Application Default Credentials.
In many cases, these credentials are automatically provided by the runtime environment.
If you need more control, you can customize authentication by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS.
Backup encryption
GCS offers multiple data encryption options, some of which are supported by Zeebe:
- Default server-side encryption is fully supported. This is enabled by default for all GCS buckets.
- Customer-managed encryption keys are supported if they are set as the default key for your bucket.
- Customer-supplied encryption keys are not supported.
- Client-side encryption keys are not supported.
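For the customer-managed key case, the key must be configured as the bucket's default key. The following Python sketch does that with the google-cloud-storage client library. It assumes the library is installed, Application Default Credentials are available, and the Cloud Storage service agent already has permission to use the Cloud KMS key; the project, location, key ring, key, and helper names are hypothetical placeholders.

```python
def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Build the Cloud KMS resource name of a crypto key."""
    return (
        f"projects/{project}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )


def set_default_kms_key(bucket_name: str, key_name: str) -> None:
    """Make key_name the default customer-managed key of an existing bucket."""
    from google.cloud import storage  # imported lazily; needs google-cloud-storage

    client = storage.Client()  # authenticates with Application Default Credentials
    bucket = client.get_bucket(bucket_name)
    bucket.default_kms_key_name = key_name
    bucket.patch()  # send the updated bucket metadata to GCS


if __name__ == "__main__":
    # Placeholder values for illustration only; no GCS call is made here.
    print(kms_key_name("my-project", "europe-west1", "zeebe", "backup-key"))
```

Objects Zeebe writes afterwards are encrypted with that key without any Zeebe configuration change.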
Create backup API
The following request can be used to start a backup.
Request
POST actuator/backups
{
"backupId": <backupId>
}
A backupId is an integer and must be greater than the IDs of all previous backups, whether they completed, failed, or were deleted.
Zeebe does not take two backups with the same ID. If a backup fails, a new backupId must be provided to trigger a new backup.
The backupId cannot be reused, even if the backup with that ID has been deleted.
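The list API only returns backups that still exist, so computing the next ID from it can pick an ID that is not greater than that of a backup you already deleted. One common approach, sketched here in Python as an assumption rather than a Zeebe requirement, is to derive the ID from the current time in whole seconds. This keeps IDs increasing as long as at most one backup is requested per second and the clock never moves backwards. The function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def backup_id_from_time(now: Optional[datetime] = None) -> int:
    """Derive a backupId from UTC time as whole seconds since the Unix epoch.

    Assumption: IDs only increase if at most one backup is requested per
    second and the system clock never moves backwards."""
    if now is None:
        now = datetime.now(timezone.utc)
    return int(now.timestamp())


if __name__ == "__main__":
    print(backup_id_from_time())
```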
Example request
curl --request POST 'http://localhost:9600/actuator/backups' \
-H 'Content-Type: application/json' \
-d '{ "backupId": 100 }'
Response
Code | Description |
---|---|
202 Accepted | A backup has been successfully scheduled. To determine whether the backup has completed, use the get backup info API. |
400 Bad Request | Indicates issues with the request, for example, when the backupId is not valid or backup is not enabled on the cluster. |
409 Conflict | Indicates that a backup with the same backupId or a higher ID already exists. |
500 Internal Server Error | All other errors. Refer to the returned error message for more details. |
502 Bad Gateway | Zeebe has encountered issues while communicating with the brokers. |
504 Gateway Timeout | Zeebe failed to process the request within a predetermined timeout. |
Example response body with 202 Accepted
{
"message": "A backup with id 100 has been scheduled. Use GET actuator/backups/100 to monitor the status."
}
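The same request can be sent from a script. The following Python sketch uses only the standard library and assumes the gateway's management port is reachable at http://localhost:9600, as in the curl example; the function names are hypothetical. Building the request is kept separate from sending it so the request can be inspected without a running cluster.

```python
import json
import urllib.error
import urllib.request

# Assumed gateway management URL, taken from the curl example.
MANAGEMENT_URL = "http://localhost:9600"


def create_backup_request(backup_id: int, base_url: str = MANAGEMENT_URL) -> urllib.request.Request:
    """Build the POST actuator/backups request for the given backupId."""
    body = json.dumps({"backupId": backup_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/actuator/backups",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_backup(backup_id: int, base_url: str = MANAGEMENT_URL) -> str:
    """Schedule a backup and return the message of the 202 Accepted response.

    Error statuses (400, 409, 5xx) are raised by urlopen as HTTPError; the
    error body contains Zeebe's error message."""
    request = create_backup_request(backup_id, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["message"]


if __name__ == "__main__":
    try:
        print(trigger_backup(100))
    except urllib.error.HTTPError as error:
        print(error.code, error.read().decode("utf-8"))
    except urllib.error.URLError as error:
        print("Gateway not reachable:", error.reason)
```

A 202 response only means the backup was scheduled; check its state with the get backup info API before relying on it.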
Get backup info API
Information about a specific backup can be retrieved using the following request:
Request
GET actuator/backups/{backupId}
Example request
curl --request GET 'http://localhost:9600/actuator/backups/100'
Response
Code | Description |
---|---|
200 OK | Backup state could be determined and is returned in the response body (see example below). |
400 Bad Request | There is an issue with the request. Refer to the returned error message for details. |
404 Not Found | A backup with that ID does not exist. |
500 Internal Server Error | All other errors. Refer to the returned error message for more details. |
502 Bad Gateway | Zeebe has encountered issues while communicating with the brokers. |
504 Gateway Timeout | Zeebe failed to process the request within a predetermined timeout. |
When the response is 200 OK, the response body consists of a JSON object describing the state of the backup.
- backupId: The ID from the request.
- state: The overall status of the backup. The state can be one of the following:
  - COMPLETED if all partitions have completed the backup.
  - FAILED if at least one partition has failed. In this case, failureReason contains a string describing the reason for failure.
  - INCOMPLETE if at least one partition's backup does not exist.
  - IN_PROGRESS if at least one partition's backup is in progress.
- details: The state of each partition's backup.
- failureReason: The reason for failure if the state is FAILED.
Example response body with 200 OK
{
"backupId": 100,
"details": [
{
"brokerVersion": "8.2.0-SNAPSHOT",
"checkpointPosition": 5,
"createdAt": "2022-12-08T13:00:55.344276672Z",
"lastUpdatedAt": "2022-12-08T13:00:55.805351556Z",
"partitionId": 1,
"snapshotId": "2-1-3-2",
"state": "COMPLETED"
},
{
"brokerVersion": "8.2.0-SNAPSHOT",
"checkpointPosition": 7,
"createdAt": "2022-12-08T13:00:55.370965069Z",
"lastUpdatedAt": "2022-12-08T13:00:55.84756566Z",
"partitionId": 2,
"snapshotId": "3-1-5-3",
"state": "COMPLETED"
}
],
"state": "COMPLETED"
}
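Because the create API only schedules the backup, a script typically polls this endpoint until the state is no longer IN_PROGRESS. The following Python sketch uses the standard library and assumes the same management URL as the curl examples; the function names, polling interval, and timeout are hypothetical choices. The fetch function is a parameter so the polling logic can run without a cluster.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed gateway management URL, taken from the curl examples.
MANAGEMENT_URL = "http://localhost:9600"


def fetch_backup(backup_id: int, base_url: str = MANAGEMENT_URL) -> dict:
    """GET actuator/backups/{backupId} and return the parsed JSON body."""
    url = f"{base_url}/actuator/backups/{backup_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read())


def wait_for_backup(
    backup_id: int,
    fetch: Callable[[int], dict] = fetch_backup,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> dict:
    """Poll until the backup leaves IN_PROGRESS and return the final info.

    Raises TimeoutError if it is still in progress after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        info = fetch(backup_id)
        if info["state"] != "IN_PROGRESS":
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(f"backup {backup_id} still in progress")
        time.sleep(interval_s)


def failed_partitions(info: dict) -> list:
    """Return the partitionIds whose backup is not COMPLETED."""
    return [d["partitionId"] for d in info.get("details", []) if d["state"] != "COMPLETED"]
```

For example, wait_for_backup(100) returns the final backup info; if its state is FAILED, failureReason and failed_partitions show where to look.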
List backups API
Information about all backups can be retrieved using the following request:
Request
GET actuator/backups
Example request
curl --request GET 'http://localhost:9600/actuator/backups'
Response
Code | Description |
---|---|
200 OK | Backup state could be determined and is returned in the response body (see example below). |
400 Bad Request | There is an issue with the request. Refer to the returned error message for details. |
500 Internal Server Error | All other errors. Refer to the returned error message for more details. |
502 Bad Gateway | Zeebe has encountered issues while communicating with the brokers. |
504 Gateway Timeout | Zeebe failed to process the request within a predetermined timeout. |
When the response is 200 OK, the response body consists of a JSON array of backup info objects. See the get backup info API response for a description of each field.
Example response body with 200 OK
[
{
"backupId": 100,
"details": [
{
"brokerVersion": "8.2.0-SNAPSHOT",
"createdAt": "2022-12-08T13:00:55.344276672Z",
"partitionId": 1,
"state": "COMPLETED"
},
{
"brokerVersion": "8.2.0-SNAPSHOT",
"createdAt": "2022-12-08T13:00:55.370965069Z",
"partitionId": 2,
"state": "COMPLETED"
}
],
"state": "COMPLETED"
},
{
"backupId": 200,
"details": [
{
"brokerVersion": "8.2.0-SNAPSHOT",
"createdAt": "2022-12-08T13:01:15.27750375Z",
"partitionId": 1,
"state": "COMPLETED"
},
{
"brokerVersion": "8.2.0-SNAPSHOT",
"createdAt": "2022-12-08T13:01:15.279995106Z",
"partitionId": 2,
"state": "COMPLETED"
}
],
"state": "COMPLETED"
}
]
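When preparing a restore, you typically need the most recent backup that completed on all partitions. The following Python sketch picks it from the list response. The function name is hypothetical, and the sketch treats only backups whose overall state is COMPLETED as usable.

```python
import json
from typing import Optional


def latest_completed_backup_id(backups: list) -> Optional[int]:
    """Return the highest backupId whose overall state is COMPLETED, or None."""
    completed = [b["backupId"] for b in backups if b.get("state") == "COMPLETED"]
    return max(completed, default=None)


if __name__ == "__main__":
    # Abbreviated version of the example response body above.
    body = json.loads("""[
      {"backupId": 100, "state": "COMPLETED", "details": []},
      {"backupId": 200, "state": "COMPLETED", "details": []}
    ]""")
    print(latest_completed_backup_id(body))  # prints 200
```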
Delete backup API
A backup can be deleted using the following request:
Request
DELETE actuator/backups/{backupId}
Example request
curl --request DELETE 'http://localhost:9600/actuator/backups/100'
Response
Code | Description |
---|---|
204 No Content | The backup has been deleted. |
400 Bad Request | There is an issue with the request. Refer to the returned error message for details. |
500 Internal Server Error | All other errors. Refer to the returned error message for more details. |
502 Bad Gateway | Zeebe has encountered issues while communicating with the brokers. |
504 Gateway Timeout | Zeebe failed to process the request within a predetermined timeout. |
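Deleting old backups is often automated as a retention policy. The following Python sketch keeps the newest N completed backups and selects everything else that is not still IN_PROGRESS (for example, FAILED or INCOMPLETE backups) for deletion. This policy, the keep count, and the function names are hypothetical rather than something Zeebe provides; the sketch assumes the same management URL as the curl examples.

```python
import urllib.request

# Assumed gateway management URL, taken from the curl examples.
MANAGEMENT_URL = "http://localhost:9600"


def backups_to_delete(backups: list, keep: int = 3) -> list:
    """Hypothetical retention policy: select every backupId except the
    `keep` newest COMPLETED backups and any backup still IN_PROGRESS."""
    completed = sorted(
        (b["backupId"] for b in backups if b["state"] == "COMPLETED"), reverse=True
    )
    retained = set(completed[:keep])
    return sorted(
        b["backupId"]
        for b in backups
        if b["backupId"] not in retained and b["state"] != "IN_PROGRESS"
    )


def delete_backup(backup_id: int, base_url: str = MANAGEMENT_URL) -> int:
    """Send DELETE actuator/backups/{backupId}; returns the status (204 on success)."""
    request = urllib.request.Request(
        f"{base_url}/actuator/backups/{backup_id}", method="DELETE"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

The IDs returned by backups_to_delete can then be passed to delete_backup one by one. Deleted IDs still cannot be reused for new backups.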
Restore
A new Zeebe cluster can be created from a specific backup. Camunda provides a standalone app which must be run on each node where a Zeebe broker will be running. This is a Spring Boot application similar to the broker and can be run using the binary provided as part of the distribution. The app can be configured the same way as a broker: via environment variables or the configuration file located in config/application.yaml.
To restore a Zeebe cluster, run the following on each node where a broker will be running:
mkdir -p zeebe
tar -xzf zeebe-distribution-X.Y.Z.tar.gz -C zeebe/
# Run from the extracted distribution directory that contains bin/
./bin/restore --backupId=<backupId>
If the restore is successful, the app exits with the log message "Successfully restored broker from backup".
Restore fails if:
- There is no valid backup with the given backupId.
- The backup store is not configured correctly.
- The configured data directory is not empty.
- Any other unexpected error occurs.
If the restore fails, you can re-run the application after fixing the root cause.
When restoring, provide the same configuration (node ID, data directory, cluster size, and replication count) as the broker that will be running on this node. The partition count must be the same as in the backup.
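Because restore fails when the data directory is not empty, a small pre-flight check can avoid a failed run. The following Python sketch checks the directory and then invokes the restore app with the --backupId argument from the command above. The data directory path and function names are hypothetical. The sketch also assumes that it is run from the directory that contains bin/restore, that a missing data directory counts as empty, and that the app exits with a non-zero code on failure.

```python
import subprocess
from pathlib import Path


def data_dir_is_empty(data_dir: str) -> bool:
    """True if the directory does not exist yet or contains no entries
    (assumption: a missing directory is acceptable to the restore app)."""
    path = Path(data_dir)
    return not path.exists() or not any(path.iterdir())


def restore_command(backup_id: int, restore_bin: str = "./bin/restore") -> list:
    """Build the argument list for the restore app (assumes cwd is the distribution dir)."""
    return [restore_bin, f"--backupId={backup_id}"]


def run_restore(backup_id: int, data_dir: str) -> None:
    """Run the restore app after checking the data directory; raises on failure."""
    if not data_dir_is_empty(data_dir):
        raise RuntimeError(f"data directory {data_dir} is not empty; restore would fail")
    # check=True raises CalledProcessError if the app exits with a non-zero code
    subprocess.run(restore_command(backup_id), check=True)
```

Run it once per node, with the same configuration as the broker that will run on that node.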