RDBMS troubleshooting and operations
This page covers troubleshooting common issues, TLS configuration, and post-deployment operations for RDBMS deployments. For configuration reference, see configure RDBMS in Helm charts.
- Validate RDBMS connectivity - Quick validation checklist with database client examples.
- Schema management - Schema creation and lifecycle.
- JDBC drivers - Managing database drivers.
Connection failures
Symptom: Pod fails to connect to the database (connection timeout, connection refused).
Diagnosis:
- Verify network connectivity from the pod to the database:
kubectl exec <pod-name> -- nc -zv <database-hostname> <port>
- Check the JDBC URL in your configuration:
kubectl get secret camunda-db-secret -o jsonpath='{.data.<key>}' -n camunda | base64 -d
- Verify the database is running and accepting connections.
Fix: Confirm that the JDBC URL, hostname, and port are correct, and that network policies allow traffic between the pods and the database.
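To avoid copy errors when moving the host and port from the JDBC URL into the nc check above, the following sketch extracts both values from the URL. The helper name jdbc_host_port is hypothetical. It assumes a URL that contains "//host:port" (as in the PostgreSQL example on this page) and does not handle Oracle descriptor-style URLs or IPv6 literals.

```shell
# Hypothetical helper: print "host port" from a JDBC URL shaped like
#   jdbc:<vendor>://host:port/database?params
# Assumption: the URL contains "//host:port". Oracle DESCRIPTION-style
# URLs and IPv6 literals are not supported by this sketch.
jdbc_host_port() {
  local url="$1" authority host port
  case "$url" in
    *//*) ;;
    *) echo "unsupported JDBC URL format" >&2; return 1 ;;
  esac
  authority="${url#*//}"             # drop everything up to and including "//"
  authority="${authority%%[/?;]*}"   # drop path, query string, or ;properties
  host="${authority%%:*}"
  port="${authority##*:}"
  if [ "$host" = "$authority" ]; then
    echo "no explicit port in URL" >&2
    return 1
  fi
  printf '%s %s\n' "$host" "$port"
}

# Usage (JDBC_URL holds the value read from your secret or Helm values):
#   set -- $(jdbc_host_port "$JDBC_URL")
#   kubectl exec <pod-name> -- nc -zv "$1" "$2"
```

If the URL has no explicit port, the helper returns a non-zero status instead of guessing the vendor default.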
Authentication errors
Symptom: "Authentication failed" or "Invalid password" in logs.
Diagnosis:
- Verify the secret exists and contains the correct password:
kubectl get secret camunda-db-secret -o jsonpath='{.data.<key>}' -n camunda | base64 -d
- Check the username in your Helm values matches the database user.
- Test connection credentials manually (if possible from a pod or bastion host).
Fix: Ensure the username, password, and secret key reference are correct in your Helm values.
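A cause of authentication failures that is easy to miss when reading the decoded secret is a trailing newline in the stored password, for example when the value was encoded with echo instead of echo -n. The sketch below checks for that case. The helper name secret_has_trailing_newline is hypothetical, and it assumes GNU-style base64 -d is available.

```shell
# Hypothetical helper: exit 0 if the base64-encoded argument decodes to a
# value whose last byte is a newline (0x0a), exit 1 otherwise.
secret_has_trailing_newline() {
  local last_byte
  # Inspect the raw last byte; command substitution alone would strip it.
  last_byte="$(printf '%s' "$1" | base64 -d | tail -c 1 | od -An -tx1 | tr -d ' \n')"
  [ "$last_byte" = "0a" ]
}

# Usage, with the secret name and <key> placeholder used on this page:
#   data="$(kubectl get secret camunda-db-secret -o jsonpath='{.data.<key>}' -n camunda)"
#   secret_has_trailing_newline "$data" && echo "password ends with a newline"
```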
JDBC driver not found
Symptom: ClassNotFoundException or "No suitable JDBC driver" in logs.
Diagnosis:
- Verify the driver JAR file was loaded:
kubectl exec <pod-name> -- ls -la /driver-lib/
- Check init container logs:
kubectl logs <pod-name> -c fetch-jdbc-drivers
- Verify the JDBC URL matches the driver type (e.g., Oracle URL with Oracle driver).
Fix: Re-apply the init container configuration or verify the custom image includes the driver. See JDBC driver management.
Schema creation failure
Symptom: Liquibase errors; tables not created.
Diagnosis:
- Check Liquibase logs:
kubectl logs <pod-name> | grep -i liquibase
- Verify autoDDL is enabled (default: true):
orchestration:
  extraConfiguration:
    - file: "manual-schema-management.yaml"
      content: |
        camunda:
          data:
            secondary-storage:
              rdbms:
                auto-ddl: false # Disables automatic schema creation; the default is true
- Test database user permissions (see Schema management).
Fix: Ensure the database user has DDL permissions, or disable autoDDL and apply the schema manually. See Schema management.
Liquibase lock after pod crash or restart
Symptom: Pod startup appears stuck on Liquibase, or repeated restarts fail while waiting for databasechangeloglock.
Cause: A previous pod may have been terminated while Liquibase was still running, leaving a lock row behind.
Behavior: Camunda waits for a stale Liquibase DDL lock to be released for up to the duration set in camunda.data.secondary-storage.rdbms.ddl-lock-wait-timeout (default: PT15M).
Fix:
- Increase the timeout for slow migrations (for example, large index creation):
orchestration:
  extraConfiguration:
    - file: "rdbms-liquibase-lock-timeout.yaml"
      content: |
        camunda:
          data:
            secondary-storage:
              rdbms:
                ddl-lock-wait-timeout: PT30M
- Avoid terminating Orchestration Cluster pods while Liquibase is actively applying migrations.
- Only release databasechangeloglock manually when you have verified no migration is still running.
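For the manual release mentioned in the last item, the following sketch prints SQL to inspect and to clear the lock row. It assumes the table is named databasechangeloglock as on this page, uses Liquibase's standard lock columns (id, locked, lockgranted, lockedby), and lives in a schema on the connection's search path. The boolean literal is written for PostgreSQL; other databases may need 0 instead of FALSE. The function names and the psql connection string are hypothetical.

```shell
# Print SQL that shows whether the lock is held, by whom, and since when.
liquibase_lock_status_sql() {
  cat <<'SQL'
SELECT id, locked, lockgranted, lockedby FROM databasechangeloglock;
SQL
}

# Print SQL that clears the lock row.
# Run it only after confirming that no migration is still active.
liquibase_lock_release_sql() {
  cat <<'SQL'
UPDATE databasechangeloglock
SET locked = FALSE, lockgranted = NULL, lockedby = NULL
WHERE id = 1;
SQL
}

# Example with PostgreSQL (placeholder connection string):
#   liquibase_lock_status_sql  | psql "postgresql://<user>@<host>:5432/<database>"
#   liquibase_lock_release_sql | psql "postgresql://<user>@<host>:5432/<database>"
```

Checking lockedby and lockgranted first helps confirm that the lock belongs to a pod that no longer exists before you clear it.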
Slow data export
Symptom: Data takes a long time to appear in the database after process events.
Cause: Flush interval or queue size not tuned for your workload.
Diagnosis:
- Check current flush interval in logs:
kubectl logs <pod-name> | grep -i flushinterval
- Verify queue size settings in your Helm values.
Fix: Adjust these settings:
orchestration:
  extraConfiguration:
    - file: "flush-interval.yaml"
      content: |
        camunda:
          data:
            secondary-storage:
              rdbms:
                flush-interval: PT1S # More frequent flushes
                queue-size: 5000 # Larger queue for buffering
                queue-memory-limit: 50 # Increase if needed
- Smaller flushInterval → more frequent writes (increases DB load).
- Larger queueSize → more events buffered before flush (increases memory).
TLS/SSL configuration
PostgreSQL with TLS
Add SSL parameters to the JDBC URL:
orchestration:
  data:
    secondaryStorage:
      rdbms:
        url: jdbc:postgresql://hostname:5432/camunda?ssl=true&sslmode=require
Oracle with TLS
Oracle uses TCPS (TLS over Oracle protocol):
orchestration:
  data:
    secondaryStorage:
      rdbms:
        url: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=hostname)(PORT=2484))(CONNECT_DATA=(SERVICE_NAME=FREEPDB1)))
Self-signed certificates
If your database uses self-signed certificates:
- Extract the certificate from your database server.
- Create a Kubernetes secret:
kubectl create secret generic db-certs \
  --from-file=ca.crt=/path/to/ca.crt \
  -n camunda
- Mount the certificate and configure trust (consult your database vendor's JDBC documentation).
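For PostgreSQL, one way to configure trust once the certificate is mounted is through the JDBC driver's sslmode and sslrootcert URL parameters, which make the driver verify the server certificate against the given CA file. The sketch below builds such a URL. The helper name pg_jdbc_tls_url and the mount path /certs/ca.crt are hypothetical, and how the secret is mounted depends on your Helm values. Other vendors use different mechanisms.

```shell
# Hypothetical helper: append certificate-verification parameters to a
# PostgreSQL JDBC URL. Uses "&" if the URL already has a query string.
pg_jdbc_tls_url() {
  local base="$1" ca_path="$2" sep="?"
  case "$base" in
    *\?*) sep="&" ;;
  esac
  printf '%s%ssslmode=verify-full&sslrootcert=%s\n' "$base" "$sep" "$ca_path"
}

# Usage (the mount path is an assumption):
#   pg_jdbc_tls_url 'jdbc:postgresql://hostname:5432/camunda' '/certs/ca.crt'
```

With verify-full the driver also checks that the server hostname matches the certificate, so the hostname in the URL must match a name in the certificate.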
Post-deployment operations
These operations are officially supported on running Camunda clusters:
Database password rotation
Password rotations are safe:
- Update the password in your RDBMS.
- Update the Kubernetes secret:
kubectl patch secret camunda-db-secret \
  -p '{"data":{"db-password":"'$(echo -n 'new-password' | base64 | tr -d '\n')'"}}' \
  -n camunda
- Restart the Orchestration Cluster pods:
kubectl rollout restart deployment/camunda-orchestration -n camunda
JDBC driver updates
To update bundled drivers or replace custom drivers:
- For custom drivers via init container: Update the JAR source in your Helm values.
- For bundled drivers: Update the Camunda version.
- Redeploy:
helm upgrade camunda camunda/camunda-platform -f values.yaml -n camunda
Schema validation
Verify schema integrity after upgrades or restores:
-- PostgreSQL: Count expected tables
SELECT COUNT(*) FROM information_schema.tables
  WHERE table_schema = 'public'
  AND (table_name LIKE 'zeebe_%' OR table_name LIKE 'process_%');
-- Oracle: Count expected tables
SELECT COUNT(*) FROM user_tables
  WHERE table_name LIKE 'ZEEBE_%' OR table_name LIKE 'PROCESS_%';
Expect roughly 20-30 tables depending on your Camunda version.
Connectivity health checks
After deployment, verify the cluster is healthy:
- Check pod readiness:
kubectl get pods -n camunda | grep orchestration
- Check exporter logs:
kubectl logs <pod-name> | grep RdbmsExporter
- Verify table creation:
SELECT COUNT(*) FROM zeebe_process;
- Deploy a test process and verify it appears in the database.
For a complete post-deployment checklist, see validate RDBMS connectivity.
Transaction isolation levels
This section applies to:
- Microsoft SQL Server
- Azure SQL Database
SQL Server and Azure SQL Database use lock-based behavior for the READ COMMITTED isolation level by default. Under high concurrent load, this can increase deadlock frequency in Camunda workloads.
Example deadlock error:
com.microsoft.sqlserver.jdbc.SQLServerException: An error occurred during the current command (Done status 0). Transaction (Process ID 83) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
If you observe this deadlock pattern in your MSSQL installation, enable READ_COMMITTED_SNAPSHOT on the Camunda database:
ALTER DATABASE [database-name]
SET READ_COMMITTED_SNAPSHOT ON;
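To confirm whether the setting is active before or after running the statement above, you can query the is_read_committed_snapshot_on column of sys.databases. The sketch below prints that query for a given database name. The helper name rcsi_status_sql and the sqlcmd invocation are illustrative assumptions. The database name is interpolated as-is, so pass only trusted values.

```shell
# Hypothetical helper: print a query that reports whether
# READ_COMMITTED_SNAPSHOT is on (1) or off (0) for the named database.
rcsi_status_sql() {
  printf "SELECT name, is_read_committed_snapshot_on FROM sys.databases WHERE name = N'%s';\n" "$1"
}

# Usage (server and authentication flags depend on your environment):
#   sqlcmd -S <server> -d master -Q "$(rcsi_status_sql <database-name>)"
```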