Zero-downtime migration from Bitnami subcharts
Migrate a Camunda 8 Helm installation from Bitnami-managed infrastructure to operator-managed or managed service equivalents without planned application downtime. Instead of the freeze-backup-restore-switch pattern used in the standard migration, this approach keeps source and target synchronized with real-time data replication before cutover.
This guide describes an advanced migration strategy that eliminates the downtime window present in the standard migration. The commands and examples in this guide are provided for informational purposes only. You must test them on a staging environment that mirrors your production setup before executing them in production. You are expected to familiarize yourself with the underlying concepts and write your own cutover runbook that accounts for your specific constraints, network topology, and data volumes.
For most deployments, Camunda recommends the simpler standard migration with a 5–60 minute maintenance window.
When to use this guide
Use this guide only if all of the following are true:
- You have already ruled out the standard migration because even a short maintenance window is unacceptable.
- Your team is comfortable operating PostgreSQL logical replication and one of the supported Elasticsearch synchronization strategies.
- You can monitor replication lag and validate consistency before cutover.
- You are prepared to adapt the examples to your topology, especially if the targets are managed services instead of in-cluster operators.
Read the topic overview to learn why you should migrate.
How it works
The zero-downtime migration replaces the backup/restore phases with continuous replication:
| Phase | Name | Downtime | Description |
|---|---|---|---|
| 1 | Deploy targets | None | Install operators and create target clusters alongside Bitnami |
| 2 | Enable replication | None | Set up PG logical replication and ES CCR / continuous snapshots |
| 3 | Sync and verify | None | Wait for replication lag to reach 0, verify data consistency |
| 4 | Instantaneous cutover | None | Helm upgrade to switch backends (rolling restart, no freeze) |
| 5 | Validate and clean up | None | Verify health, tear down replication, remove old resources |
Key differences from the standard migration
| Aspect | Standard migration | Zero-downtime migration |
|---|---|---|
| Downtime | 5–60 minutes (Phase 3 freeze) | None |
| Data transfer | pg_dump/pg_restore + ES _reindex | Logical replication + CCR/continuous snapshot |
| Complexity | Low — scripted and automated | High — manual setup, monitoring required |
| Risk | Low — rollback via Helm values | Medium — replication lag must be monitored |
| PostgreSQL version requirement | Any | PostgreSQL 10+ (logical replication) |
| Elasticsearch requirement | _reindex API (reindex from remote) | CCR (Platinum license) or continuous snapshots |
Prerequisites
Before starting the migration, ensure you have the following general prerequisites:
- A running Camunda 8 installation using the Helm chart with Bitnami subcharts enabled
- `kubectl` configured and pointing to your cluster
- `helm` with the `camunda/camunda-platform` repository added
- Sufficient cluster resources to temporarily run both old and new infrastructure side by side
- A tested backup of your current installation (see Precautions)
In addition to the general prerequisites:
- PostgreSQL source must support logical replication (`wal_level = logical`). This may require a restart of the Bitnami PostgreSQL StatefulSet.
- Deep understanding of your data volumes, replication lag tolerances, and network throughput between source and target.
- A monitoring solution to track replication lag (for example, Prometheus, Grafana, or manual queries).
Elasticsearch requires an explicit tradeoff:
- Cross-cluster replication (CCR)
  - Use if you need the closest possible parity at cutover.
  - This option requires an Elastic Platinum license.
- Continuous snapshots
  - Use if a small lag window is acceptable.
  - This option requires the ability to run continuous snapshots at very short intervals.
Precautions
Review the general precautions that apply to all migration paths.
Additionally, note that while this migration path removes the planned downtime window, it does not remove the need for rehearsal, monitoring, and rollback planning. Treat it as a custom migration pattern rather than a push-button alternative to the standard workflow.
Review the operational readiness checklist, including the staging rehearsal and pre-migration checklist, before starting a production migration.
PostgreSQL logical replication limitations
- DDL not replicated: Schema changes (`CREATE TABLE`, `ALTER TABLE`, and so on) are not replicated. If the source schema changes during migration, you must apply the same changes to the target manually.
- Large objects: `pg_largeobject` data is not replicated via logical replication.
- Sequences: Sequence values are not replicated. After cutover, sequences on the target may need to be reset. For example, for a table `my_table` with a serial column `id`:

  -- Run on each target database after cutover, once per sequence-backed column
  SELECT setval(pg_get_serial_sequence('my_table', 'id'),
                COALESCE((SELECT max(id) FROM my_table), 1));

- TRUNCATE: `TRUNCATE` is replicated only in PostgreSQL 11 and later.
Elasticsearch limitations
- CCR requires Platinum license: The open-source and Basic tiers do not include cross-cluster replication.
- Continuous snapshots have lag: The snapshot approach introduces a replication delay equal to the snapshot interval.
- Index mapping conflicts: If the source creates new indices during replication, they must be manually added to the CCR follow configuration.
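To reason about the snapshot-lag tradeoff, the worst-case staleness at cutover is roughly the snapshot interval plus the time to take and restore one snapshot. The durations below are placeholder assumptions; measure your own in staging:

```shell
# Back-of-envelope worst-case staleness for the snapshot approach.
# All three durations are placeholder assumptions; measure yours in staging.
snapshot_interval_s=300   # SLM schedule: every 5 minutes
snapshot_duration_s=60    # time to take one snapshot (assumed)
restore_duration_s=120    # time to restore it on the target (assumed)

worst_case_s=$(( snapshot_interval_s + snapshot_duration_s + restore_duration_s ))
echo "worst-case staleness: ${worst_case_s}s"
```

If the result exceeds what Zeebe re-export can comfortably cover after cutover, shorten the SLM schedule or reconsider CCR.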
Keycloak considerations
Keycloak data is stored in PostgreSQL, so it is covered by the PostgreSQL logical replication. The Keycloak Operator CR will start using the replicated data in the CNPG cluster after the Helm upgrade.
However, be aware of Keycloak session data:
- Active user sessions stored in PostgreSQL will be replicated.
- In-memory Infinispan caches will be rebuilt on the new Keycloak pods.
- Users may need to re-authenticate after the cutover (session cookies point to the old Keycloak pods).
Assumed target infrastructure
The commands and snippets in this guide assume CloudNativePG (CNPG) as the PostgreSQL target and ECK as the Elasticsearch target. If you are migrating to managed services (for example, AWS RDS or Elastic Cloud), replace the target hostnames, credentials, and connection methods accordingly.
Clone the deployment references repository
This guide uses scripts from the Camunda deployment references repository. Clone the repository and navigate to the migration directory:
git clone https://github.com/camunda/camunda-deployment-references.git
cd camunda-deployment-references/generic/kubernetes/migration
Configure the migration by editing env.sh to match your current Camunda installation, then source it:
source env.sh
For a full description of configuration variables, see configure the migration.
Phase 1: Deploy target infrastructure
This phase is identical to Phase 1 of the standard migration. Deploy the target operators and clusters alongside the existing Bitnami components:
bash 1-deploy-targets.sh
After this phase, both the old Bitnami infrastructure and the new operator-managed infrastructure run side by side. No traffic is routed to the new targets yet.
Phase 2: Enable real-time replication
PostgreSQL: Logical replication
PostgreSQL logical replication allows streaming changes in real time from the Bitnami PostgreSQL instances to the CNPG (or managed service) targets without stopping the source.
Step 1: Enable logical replication on the source
The source Bitnami PostgreSQL must have wal_level = logical. Check the current setting:
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -c "SHOW wal_level;"
If it returns replica (the default), you need to change it:
# Patch the Bitnami PostgreSQL ConfigMap or StatefulSet
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -c "ALTER SYSTEM SET wal_level = 'logical';"
Changing wal_level requires a PostgreSQL restart. This is the only brief interruption in the zero-downtime approach — a PostgreSQL restart typically completes in a few seconds, and Camunda components reconnect automatically.
kubectl rollout restart statefulset ${CAMUNDA_RELEASE_NAME}-postgresql -n ${NAMESPACE}
kubectl rollout status statefulset ${CAMUNDA_RELEASE_NAME}-postgresql -n ${NAMESPACE} --timeout=120s
Also ensure max_replication_slots and max_wal_senders are sufficient (at least 4 each — one per database plus overhead):
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -c "SHOW max_replication_slots; SHOW max_wal_senders;"
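If either value is too low, you can raise it with `ALTER SYSTEM`. This is a sketch with example values; like `wal_level`, both settings take effect only after a restart:

```sql
-- Run as superuser on the source, then restart PostgreSQL.
-- 4 is an example floor: one slot/sender per replicated database plus headroom.
ALTER SYSTEM SET max_replication_slots = 4;
ALTER SYSTEM SET max_wal_senders = 4;
```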
Step 2: Create publications on the source
For each database, create a publication that includes all tables:
# Identity database
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "CREATE PUBLICATION identity_migration FOR ALL TABLES;"
# Keycloak database
KEYCLOAK_STS="${CAMUNDA_RELEASE_NAME}-keycloak-postgresql"
kubectl exec -it ${KEYCLOAK_STS}-0 -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "CREATE PUBLICATION keycloak_migration FOR ALL TABLES;"
# Web Modeler database
WEBMODELER_STS="${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler"
kubectl exec -it ${WEBMODELER_STS}-0 -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "CREATE PUBLICATION webmodeler_migration FOR ALL TABLES;"
Depending on your Helm chart version, each component may use a separate Bitnami PostgreSQL StatefulSet or share one. Adjust the StatefulSet names accordingly.
Step 3: Perform initial data sync
Before enabling subscriptions, perform a one-time schema and data sync. Logical replication only replicates DML (INSERT/UPDATE/DELETE), not DDL (schema changes):
Show details: initial PostgreSQL sync example
# For each component, dump the schema + data and restore to the target
for COMPONENT in identity keycloak webmodeler; do
SOURCE_STS=$(kubectl get statefulset -n ${NAMESPACE} -o name | grep -i "${COMPONENT}.*postgresql" | head -1 | sed 's|statefulset.apps/||')
SOURCE_HOST="${SOURCE_STS}.${NAMESPACE}.svc.cluster.local"
# Determine the target based on operator or external
if [[ "$COMPONENT" == "identity" ]]; then
TARGET_HOST="${CNPG_IDENTITY_CLUSTER}-rw.${NAMESPACE}.svc.cluster.local"
TARGET_SECRET="${CNPG_IDENTITY_CLUSTER}-secret"
elif [[ "$COMPONENT" == "keycloak" ]]; then
TARGET_HOST="${CNPG_KEYCLOAK_CLUSTER}-rw.${NAMESPACE}.svc.cluster.local"
TARGET_SECRET="${CNPG_KEYCLOAK_CLUSTER}-secret"
elif [[ "$COMPONENT" == "webmodeler" ]]; then
TARGET_HOST="${CNPG_WEBMODELER_CLUSTER}-rw.${NAMESPACE}.svc.cluster.local"
TARGET_SECRET="${CNPG_WEBMODELER_CLUSTER}-secret"
fi
echo "Syncing ${COMPONENT}: ${SOURCE_HOST} → ${TARGET_HOST}"
# Dump and restore (this is a one-time operation, not a freeze)
kubectl exec -it ${SOURCE_STS}-0 -n ${NAMESPACE} -- \
pg_dump -U ${COMPONENT} -d ${COMPONENT} -F custom -f /tmp/${COMPONENT}.dump
kubectl cp ${NAMESPACE}/${SOURCE_STS}-0:/tmp/${COMPONENT}.dump ./${COMPONENT}.dump
# Get target password
TARGET_PWD=$(kubectl get secret ${TARGET_SECRET} -n ${NAMESPACE} -o jsonpath='{.data.password}' | base64 -d)
# Restore to target via a temporary pod
kubectl run pg-restore-${COMPONENT} --rm -i --restart=Never \
--image=postgres:16 -n ${NAMESPACE} \
--env="PGPASSWORD=${TARGET_PWD}" -- \
pg_restore -h ${TARGET_HOST} -U ${COMPONENT} -d ${COMPONENT} \
--clean --if-exists --no-owner --no-privileges /dev/stdin < ./${COMPONENT}.dump
done
Step 4: Create subscriptions on the target
On each CNPG target cluster, create a subscription pointing to the source:
Show details: subscription creation example
# Get source password
SOURCE_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-postgresql -n ${NAMESPACE} \
-o jsonpath='{.data.postgres-password}' | base64 -d)
# Identity — target the -rw service to ensure writes land on the current primary
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_IDENTITY_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "
CREATE SUBSCRIPTION identity_sub
CONNECTION 'host=${CAMUNDA_RELEASE_NAME}-postgresql.${NAMESPACE}.svc.cluster.local port=5432 dbname=identity user=postgres password=${SOURCE_PWD}'
PUBLICATION identity_migration
WITH (copy_data = false);
"
# Keycloak
KEYCLOAK_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-keycloak-postgresql -n ${NAMESPACE} \
-o jsonpath='{.data.postgres-password}' | base64 -d)
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_KEYCLOAK_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "
CREATE SUBSCRIPTION keycloak_sub
CONNECTION 'host=${CAMUNDA_RELEASE_NAME}-keycloak-postgresql.${NAMESPACE}.svc.cluster.local port=5432 dbname=keycloak user=postgres password=${KEYCLOAK_PWD}'
PUBLICATION keycloak_migration
WITH (copy_data = false);
"
# Web Modeler
WEBMODELER_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler -n ${NAMESPACE} \
-o jsonpath='{.data.postgres-password}' | base64 -d)
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_WEBMODELER_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "
CREATE SUBSCRIPTION webmodeler_sub
CONNECTION 'host=${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler.${NAMESPACE}.svc.cluster.local port=5432 dbname=webmodeler user=postgres password=${WEBMODELER_PWD}'
PUBLICATION webmodeler_migration
WITH (copy_data = false);
"
The `copy_data = false` flag is important because the initial sync was already performed in Step 3. The subscription now streams only new changes in real time.
Elasticsearch: Continuous synchronization
Unlike PostgreSQL, Elasticsearch does not have a built-in logical replication feature available in the open-source version. Choose one of the following approaches:
| Strategy | Best when | Tradeoff |
|---|---|---|
| CCR | You need the closest possible real-time replica and have an Elastic Platinum license | Highest operational complexity |
| Continuous snapshots | You can tolerate a small lag window and want an open-source-compatible approach | Recent writes may be missing until re-export catches up |
- Cross-cluster replication (Platinum)
- Continuous snapshots (open-source)
If you have an Elastic Platinum license, you can use cross-cluster replication (CCR) to replicate indices in real time:
Show details: CCR setup example
# Get ECK ES password
ECK_PWD=$(kubectl get secret ${ECK_CLUSTER_NAME}-es-elastic-user -n ${NAMESPACE} \
-o jsonpath='{.data.elastic}' | base64 -d)
# Get source ES password
SOURCE_ES_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-elasticsearch -n ${NAMESPACE} \
-o jsonpath='{.data.elasticsearch-password}' | base64 -d)
# Configure the target ECK cluster to recognize the source as a remote
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X PUT \
"http://localhost:9200/_cluster/settings" \
-H 'Content-Type: application/json' \
-d '{
"persistent": {
"cluster": {
"remote": {
"bitnami_source": {
"seeds": ["'${CAMUNDA_RELEASE_NAME}'-elasticsearch-master-0.'${CAMUNDA_RELEASE_NAME}'-elasticsearch-master-headless.'${NAMESPACE}'.svc.cluster.local:9300"]
}
}
}
}
}'
# Create follower indices for each Camunda index pattern
for PATTERN in zeebe operate tasklist optimize connectors camunda; do
# List source indices matching the pattern
INDICES=$(kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" \
"http://localhost:9200/_cat/indices/${PATTERN}-*?h=index" | tr -d '\r' | tr '\n' ' ')
for IDX in $INDICES; do
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X PUT \
"http://localhost:9200/${IDX}/_ccr/follow" \
-H 'Content-Type: application/json' \
-d '{
"remote_cluster": "bitnami_source",
"leader_index": "'${IDX}'"
}'
done
done
If you don't have a Platinum license, use continuous snapshot/restore with SLM (Snapshot Lifecycle Management) to keep the target close to the source. This approach has a small replication lag (typically 5–15 minutes):
Show details: continuous snapshot setup example
# Get source ES password
SOURCE_ES_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-elasticsearch -n ${NAMESPACE} \
-o jsonpath='{.data.elasticsearch-password}' | base64 -d)
# Register a shared snapshot repository on the source (using the backup PVC)
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" -X PUT \
"http://localhost:9200/_snapshot/migration_continuous" \
-H 'Content-Type: application/json' \
-d '{"type":"fs","settings":{"location":"/backup/elasticsearch/continuous"}}'
# Create an SLM policy for frequent snapshots (every 5 minutes)
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" -X PUT \
"http://localhost:9200/_slm/policy/migration_continuous" \
-H 'Content-Type: application/json' \
-d '{
"schedule": "0 */5 * * * ?",
"name": "<migration-snap-{now/m{yyyyMMdd-HHmmss}}>",
"repository": "migration_continuous",
"config": {
"indices": ["*"],
"ignore_unavailable": true,
"include_global_state": false
},
"retention": {
"expire_after": "1h",
"min_count": 1,
"max_count": 5
}
}'
Before cutover, you will restore the latest snapshot to the target ECK cluster.
Phase 3: Verify synchronization
Before performing the cutover, verify that replication is caught up and data is consistent.
Monitor PostgreSQL replication lag
Check the replication lag on each subscription:
Show details: PostgreSQL lag check example
# On each CNPG target, check subscription status
for CLUSTER in ${CNPG_IDENTITY_CLUSTER} ${CNPG_KEYCLOAK_CLUSTER} ${CNPG_WEBMODELER_CLUSTER}; do
echo "=== ${CLUSTER} ==="
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -c "
SELECT subname, received_lsn, latest_end_lsn,
pg_wal_lsn_diff(received_lsn, latest_end_lsn) AS lag_bytes
FROM pg_stat_subscription;
"
done
Wait until lag_bytes is consistently 0 or near-zero before proceeding.
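Rather than polling by hand, a small helper can require several consecutive zero readings before declaring a subscription caught up (a single zero sample can coincide with a quiet moment on the source). This is an illustrative sketch: `wait_for_zero_lag` is a hypothetical name, and the command you pass in would wrap the psql lag query above so that it prints a single number:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command that prints the current lag in bytes,
# and return once it has printed 0 for N consecutive checks.
wait_for_zero_lag() {
  local check_cmd=$1 required_streak=${2:-3} interval=${3:-10}
  local streak=0 lag
  while (( streak < required_streak )); do
    lag=$("$check_cmd")
    if [[ "$lag" == "0" ]]; then
      streak=$(( streak + 1 ))
    else
      streak=0          # any nonzero reading resets the streak
    fi
    (( streak < required_streak )) && sleep "$interval"
  done
  echo "lag at 0 for ${required_streak} consecutive checks"
}
```

In a cutover runbook, you would call this once per subscription and only proceed when all three return.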
Monitor Elasticsearch sync
- Cross-cluster replication (Platinum)
- Continuous snapshots (open-source)
Show details: CCR status check example
# Check CCR follower status
ECK_PWD=$(kubectl get secret ${ECK_CLUSTER_NAME}-es-elastic-user -n ${NAMESPACE} \
-o jsonpath='{.data.elastic}' | base64 -d)
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" \
"http://localhost:9200/_ccr/stats" | jq '.follow_stats.indices[].shards[].leader_global_checkpoint'
Show details: snapshot status check example
# Check the latest snapshot status
SOURCE_ES_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-elasticsearch -n ${NAMESPACE} \
-o jsonpath='{.data.elasticsearch-password}' | base64 -d)
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" \
"http://localhost:9200/_slm/policy/migration_continuous" | jq '.last_success'
Verify row counts
Compare row counts between source and target for each database to confirm data consistency:
Show details: row count verification example
for COMPONENT in identity keycloak webmodeler; do
SOURCE_STS=$(kubectl get statefulset -n ${NAMESPACE} -o name | grep -i "${COMPONENT}.*postgresql" | head -1 | sed 's|statefulset.apps/||')
echo "=== ${COMPONENT} ==="
echo "Source:"
kubectl exec -it ${SOURCE_STS}-0 -n ${NAMESPACE} -- \
psql -U ${COMPONENT} -d ${COMPONENT} -c "
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC LIMIT 10;
"
CNPG_CLUSTER_VAR="CNPG_${COMPONENT^^}_CLUSTER"
echo "Target (${!CNPG_CLUSTER_VAR}):"
kubectl exec -it ${!CNPG_CLUSTER_VAR}-1 -n ${NAMESPACE} -- \
psql -U ${COMPONENT} -d ${COMPONENT} -c "
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC LIMIT 10;
"
done
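To turn the two listings into a pass/fail signal, you can diff them programmatically. The helper below is a sketch with a hypothetical name; it expects two files of sorted `relname count` lines (for example, produced with `psql -At` piped through `sort`) and fails if any table's counts differ:

```shell
#!/usr/bin/env bash

# Hypothetical helper: compare two sorted "relname count" listings.
# Prints each mismatching table and exits non-zero if any counts differ.
compare_counts() {
  local src=$1 tgt=$2
  join "$src" "$tgt" | awk '
    $2 != $3 { print $1 ": source=" $2 " target=" $3; bad = 1 }
    END      { exit bad }'
}
```

Note that live row counts drift while replication is streaming; treat small transient differences as expected and re-run the comparison until it stabilizes.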
Phase 4: Instantaneous cutover
Once replication is confirmed in sync, perform the cutover. This phase uses a rolling Helm upgrade instead of the freeze-then-restore approach, resulting in zero downtime.
Before starting the cutover, confirm this checklist:
- PostgreSQL subscriptions show `lag_bytes` at `0` or close to `0` for a sustained interval.
- Your chosen Elasticsearch sync method is healthy and up to date.
- The target services are reachable from the Camunda namespace.
- You have the rollback command, values, and on-call contacts ready before the Helm upgrade.
Step 1: Stop replication (PostgreSQL)
Drop the subscriptions on the target to stop replication and allow the targets to accept writes:
Show details: stop PostgreSQL replication example
# Drop subscriptions — target the current primary by label selector
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_IDENTITY_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "ALTER SUBSCRIPTION identity_sub DISABLE; DROP SUBSCRIPTION identity_sub;"
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_KEYCLOAK_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "ALTER SUBSCRIPTION keycloak_sub DISABLE; DROP SUBSCRIPTION keycloak_sub;"
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_WEBMODELER_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "ALTER SUBSCRIPTION webmodeler_sub DISABLE; DROP SUBSCRIPTION webmodeler_sub;"
Step 2: Stop Elasticsearch replication
- Cross-cluster replication (Platinum)
- Continuous snapshots (open-source)
Promote follower indices to regular indices:
Show details: CCR cutover example
ECK_PWD=$(kubectl get secret ${ECK_CLUSTER_NAME}-es-elastic-user -n ${NAMESPACE} \
-o jsonpath='{.data.elastic}' | base64 -d)
# Pause and unfollow each replicated index
INDICES=$(kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" "http://localhost:9200/_cat/indices?h=index" | tr -d '\r' | grep -E "^(zeebe|operate|tasklist|optimize|connectors|camunda)-")
for IDX in $INDICES; do
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_ccr/pause_follow"
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_close"
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_ccr/unfollow"
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_open"
done
Restore the latest snapshot to the target ECK cluster:
Show details: snapshot restore example
# Delete the SLM policy
SOURCE_ES_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-elasticsearch -n ${NAMESPACE} \
-o jsonpath='{.data.elasticsearch-password}' | base64 -d)
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" -X DELETE \
"http://localhost:9200/_slm/policy/migration_continuous"
# Get the latest snapshot name
LATEST_SNAP=$(kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" \
"http://localhost:9200/_snapshot/migration_continuous/_all" | jq -r '.snapshots[-1].snapshot' | tr -d '\r')
# Restore to target ECK
ECK_PWD=$(kubectl get secret ${ECK_CLUSTER_NAME}-es-elastic-user -n ${NAMESPACE} \
-o jsonpath='{.data.elastic}' | base64 -d)
# Register the repo on the target
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X PUT \
"http://localhost:9200/_snapshot/migration_continuous" \
-H 'Content-Type: application/json' \
-d '{"type":"fs","settings":{"location":"/backup/elasticsearch/continuous"}}'
# Restore
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST \
"http://localhost:9200/_snapshot/migration_continuous/${LATEST_SNAP}/_restore?wait_for_completion=true" \
-H 'Content-Type: application/json' \
-d '{"indices":"*","ignore_unavailable":true,"include_global_state":false}'
With the continuous snapshot approach, there is a small window (up to the snapshot interval, for example 5 minutes) where recent Elasticsearch writes may not be captured. Zeebe will re-export these events after the cutover.
Step 3: Helm upgrade (rolling restart)
Perform the Helm upgrade to switch Camunda to the new backends. Because there is no freeze, pods are restarted in rolling fashion.
The zero-downtime approach does not use 3-cutover.sh — that script freezes the application, which defeats the purpose. Instead, run the Helm upgrade manually with the operator-based values:
Show details: Helm upgrade example
helm upgrade ${CAMUNDA_RELEASE_NAME} camunda/camunda-platform \
-n ${NAMESPACE} \
--version ${CAMUNDA_HELM_CHART_VERSION} \
-f operator-based-values.yaml \
--wait --timeout 10m
Build the values file by combining the operator-based Helm values files from the reference architecture (for example, camunda-identity-values.yml, camunda-elastic-values.yml, camunda-keycloak-domain-values.yml) to point Camunda at the new backends. Ensure Bitnami subcharts are disabled.
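As an illustrative sketch only (all key names and hostnames below are assumptions; verify each one against your chart version's values reference and the reference architecture files named above), the combined file disables the Bitnami subcharts and points the components at the new endpoints:

```yaml
# Sketch only: verify every key against your chart version before use.
elasticsearch:
  enabled: false            # disable the Bitnami Elasticsearch subchart
identityKeycloak:
  postgresql:
    enabled: false          # disable Keycloak's bundled Bitnami PostgreSQL
global:
  elasticsearch:
    enabled: true
    url:
      protocol: http
      host: my-eck-cluster-es-http.camunda.svc.cluster.local   # assumed ECK service name
      port: 9200
```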
The Helm upgrade triggers a rolling restart of Camunda pods. During this process:
- Zeebe StatefulSet pods restart one at a time, maintaining quorum.
- Operate, Tasklist, Optimize, and other deployments restart with zero-downtime rollout strategy.
- There is a brief period where some pods use old backends and others use new ones, but this is safe because the data has already been replicated.
Step 4: Clean up source publications
After the cutover is confirmed working, clean up the publications on the source:
Show details: source publication cleanup example
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "DROP PUBLICATION IF EXISTS identity_migration;"
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-keycloak-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "DROP PUBLICATION IF EXISTS keycloak_migration;"
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler-0 -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "DROP PUBLICATION IF EXISTS webmodeler_migration;"
Phase 5: Validate and clean up
Validate
Run the standard validation to confirm all components are healthy on the new infrastructure:
bash 4-validate.sh
Do not clean up immediately after validation. Operate with the new infrastructure through at least one full business cycle (for example, a complete weekday with peak traffic) to confirm stability. Once Bitnami resources are deleted, rollback is no longer possible without restoring from backup. If you need to fail back, run bash rollback.sh before this phase (see rollback).
Rollback (if needed)
If the zero-downtime migration reveals issues after cutover:
- Immediate rollback (within minutes): If detected quickly, the source Bitnami databases still have all data (publications were only cleaned up in the last step). Run the standard rollback: `bash rollback.sh`
- Late rollback (after publications are dropped): You would need to perform a reverse data migration — dump from the CNPG targets back to the Bitnami sources. This is the same process in reverse.
Clean up Bitnami resources
This phase permanently deletes old Bitnami StatefulSets, PVCs, and the migration backup PVC. After cleanup, rollback to Bitnami subcharts is no longer possible.
Before running this phase, strongly consider:
- Taking a full backup of all databases (`pg_dumpall` or equivalent)
- Taking PVC or storage volume snapshots (cloud provider snapshots)
- Storing backups in cold storage—for example, S3 Glacier or GCS Archive
- Keeping rollback artifacts in `.state/` as a safety net
After confirming the migration is successful, remove old Bitnami StatefulSets, PVCs, services, and the migration backup PVC:
bash 5-cleanup-bitnami.sh
What happens:
- The script requires cutover and validation to be completed and displays a destructive operation warning with a confirmation prompt.
- Deletes old Bitnami PostgreSQL StatefulSets, their PVCs, and headless services (for each migrated component: Identity, Keycloak, and Web Modeler).
- Deletes old Bitnami Elasticsearch StatefulSet, PVCs, and services.
- Deletes old Bitnami Keycloak StatefulSet.
- Deletes the migration backup PVC.
- Reverifies that all Camunda components and operator-managed targets remain healthy after cleanup.
- Suggests removing the `reindex.remote.whitelist` setting from the ECK Elasticsearch configuration as a post-cleanup step.
The script checks whether each resource exists before attempting deletion, so it can be safely rerun if interrupted.
Operational readiness
Before running this migration in production, use the checklist below to reduce risk and confirm the cutover plan is ready.
Staging rehearsal
- Clone your production environment to a staging cluster with the same Helm chart version, same component configuration, and comparable data volumes.
- Run the full migration end to end in staging, including all five phases: deploy targets, enable replication, verify synchronization, cutover, and validate/cleanup.
- Measure replication convergence: record how long it takes for PostgreSQL subscriptions to reach `lag_bytes = 0` and for Elasticsearch to fully synchronize. These timings determine how long you must wait before cutover.
- Test failback: after a successful staging migration, verify you can roll back cleanly with `bash rollback.sh`.
Use a representative data set. Small databases converge almost instantly but hide the replication lag behavior you will face with production-sized workloads. Include realistic write load during the staging rehearsal to observe replication behavior under pressure.
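When sizing the rehearsal, a back-of-envelope estimate helps set expectations for the initial sync duration. The figures below are placeholder assumptions; substitute your measured database size and the dump/restore throughput observed in staging:

```shell
# Rough initial-sync estimate: time = data size / sustained transfer throughput.
# Both figures are placeholder assumptions; substitute your measured values.
db_size_gb=50          # total size of the replicated databases
throughput_mb_s=40     # sustained dump/restore throughput seen in staging

sync_minutes=$(( db_size_gb * 1024 / throughput_mb_s / 60 ))
echo "estimated initial sync: ~${sync_minutes} minutes"
```

Replication convergence after the initial sync then depends on the source's write rate, which is why the rehearsal should include realistic write load.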
Pre-migration checklist
Before starting the migration in production:
- Verify replication prerequisites: confirm `wal_level = logical`, sufficient `max_replication_slots` and `max_wal_senders`, and (if using CCR) a valid Elastic Platinum license.
- Notify stakeholders: although there is no planned downtime, inform them of the migration. If the PostgreSQL restart for `wal_level` is required, coordinate it during a low-traffic period.
- Verify backups: confirm your existing backup strategy (Velero, volume snapshots, or cloud provider backups) has a recent successful backup.
- Check cluster resources: ensure the cluster has enough CPU, memory, and storage to run both old and new infrastructure simultaneously — they coexist for the entire replication phase.
- Review `env.sh`: double-check all variables, especially `NAMESPACE`, `CAMUNDA_RELEASE_NAME`, and target cluster names.
- Prepare monitoring: set up dashboards for PostgreSQL replication lag, Elasticsearch sync status, pod health, and storage capacity.
Post-migration monitoring
After completing the cutover, monitor the following for at least 48 hours:
- Pod restarts: `kubectl get pods -n ${NAMESPACE} --watch`
- CNPG cluster health: `kubectl get clusters -n ${NAMESPACE}` (should show `Cluster in healthy state`)
- ECK cluster health: `kubectl get elasticsearch -n ${NAMESPACE}` (should show `green`)
- Camunda component logs: check for connection errors, authentication failures, or data inconsistencies.
- Process instance completion: verify that in-flight process instances continue to execute correctly.
- Zeebe export lag: confirm that Zeebe exporters are writing to the new Elasticsearch without delays.
- Sequence values: verify PostgreSQL sequences are correct after cutover (see PostgreSQL logical replication limitations).