Version: 8.9 (unreleased)

Zero-downtime migration from Bitnami subcharts

Migrate a Camunda 8 Helm installation from Bitnami-managed infrastructure to operator-managed or managed service equivalents without planned application downtime. Instead of the freeze-backup-restore-switch pattern used in the standard migration, this approach keeps source and target synchronized with real-time data replication before cutover.

Advanced topic — commands provided for reference only

This guide describes an advanced migration strategy that eliminates the downtime window present in the standard migration. The commands and examples in this guide are provided for informational purposes only. You must test them on a staging environment that mirrors your production setup before executing them in production. You are expected to familiarize yourself with the underlying concepts and write your own cutover runbook that accounts for your specific constraints, network topology, and data volumes.

For most deployments, Camunda recommends the simpler standard migration with a 5–60 minute maintenance window.

When to use this guide

Use this guide only if all of the following are true:

  • You have already ruled out the standard migration because even a short maintenance window is unacceptable.
  • Your team is comfortable operating PostgreSQL logical replication and one of the supported Elasticsearch synchronization strategies.
  • You can monitor replication lag and validate consistency before cutover.
  • You are prepared to adapt the examples to your topology, especially if the targets are managed services instead of in-cluster operators.

Read the topic overview to learn why you should migrate.

How it works

The zero-downtime migration replaces the backup/restore phases with continuous replication:

| Phase | Name | Downtime | Description |
| --- | --- | --- | --- |
| 1 | Deploy targets | None | Install operators and create target clusters alongside Bitnami |
| 2 | Enable replication | None | Set up PostgreSQL logical replication and Elasticsearch CCR / continuous snapshots |
| 3 | Sync and verify | None | Wait for replication lag to reach 0, verify data consistency |
| 4 | Instantaneous cutover | None | Helm upgrade to switch backends (rolling restart, no freeze) |
| 5 | Validate and clean up | None | Verify health, tear down replication, remove old resources |

Key differences from the standard migration

| Aspect | Standard migration | Zero-downtime migration |
| --- | --- | --- |
| Downtime | 5–60 minutes (Phase 3 freeze) | None |
| Data transfer | pg_dump/pg_restore + ES _reindex | Logical replication + CCR/continuous snapshots |
| Complexity | Low — scripted and automated | High — manual setup, monitoring required |
| Risk | Low — rollback via Helm values | Medium — replication lag must be monitored |
| PostgreSQL version requirement | Any | PostgreSQL 10+ (logical replication) |
| Elasticsearch requirement | _reindex API (reindex from remote) | CCR (Platinum license) or continuous snapshots |

Prerequisites

Before starting the migration, ensure you have the following general prerequisites:

  • A running Camunda 8 installation using the Helm chart with Bitnami subcharts enabled
  • kubectl configured and pointing to your cluster
  • helm with the camunda/camunda-platform repository added
  • Sufficient cluster resources to temporarily run both old and new infrastructure side-by-side
  • A tested backup of your current installation (see Precautions)

In addition to the general prerequisites:

  • PostgreSQL source must support logical replication (wal_level = logical). This may require a restart of the Bitnami PostgreSQL StatefulSet.
  • Deep understanding of your data volumes, replication lag tolerances, and network throughput between source and target.
  • A monitoring solution to track replication lag (for example, Prometheus, Grafana, or manual queries).

Elasticsearch requires an explicit tradeoff:

  • Cross-cluster replication
    • Use if you need the closest possible parity at cutover.
    • This option requires an Elastic Platinum license.
  • Continuous snapshots
    • Use if a small lag window is acceptable.
    • This option requires the ability to run continuous snapshots with very short intervals.

Precautions

Review the general precautions that apply to all migration paths.

Additionally, note that while this migration path removes the planned downtime window, it does not remove the need for rehearsal, monitoring, and rollback planning. Treat it as a custom migration pattern rather than a push-button alternative to the standard workflow.

Before running in production

Review the operational readiness checklist, including the staging rehearsal and pre-migration checklist, before starting a production migration.

PostgreSQL logical replication limitations

  • DDL not replicated: Schema changes (CREATE TABLE, ALTER TABLE, etc.) are not replicated. If the source schema changes during migration, you must apply the same changes to the target manually.

  • Large objects: pg_largeobject data is not replicated via logical replication.

  • Sequences: Sequence values are not replicated. After cutover, sequences on the target may need to be reset:

    -- Run on each target database after cutover, once per serial column.
    -- Replace table_name/column_name with the real identifiers; note that
    -- pg_get_serial_sequence takes the names as string literals.
    SELECT setval(pg_get_serial_sequence('table_name', 'column_name'),
                  coalesce(max(column_name), 1))
    FROM table_name;
  • TRUNCATE: TRUNCATE is replicated only in PostgreSQL 11+.
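Applying the per-column statement from the sequences note above is tedious on schemas with many serial columns. As a sketch (the query is illustrative, not part of the migration scripts), psql can generate a `setval()` call for every serial column in a database and execute each one with `\gexec`:

```shell
# Hypothetical helper: generate one setval() statement per serial column and
# run each generated statement with \gexec. Adapt connection flags to your target.
psql -U postgres -d identity <<'SQL'
SELECT format(
  'SELECT setval(%L, coalesce((SELECT max(%I) FROM %I.%I), 1));',
  pg_get_serial_sequence(format('%I.%I', n.nspname, c.relname), a.attname),
  a.attname, n.nspname, c.relname)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum > 0 AND NOT a.attisdropped
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND pg_get_serial_sequence(format('%I.%I', n.nspname, c.relname), a.attname) IS NOT NULL
\gexec
SQL
```

Repeat for each migrated database (keycloak, webmodeler).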

Elasticsearch limitations

  • CCR requires a Platinum license: The open-source and Basic tiers do not include cross-cluster replication.
  • Continuous snapshots have lag: The snapshot approach introduces a replication delay equal to the snapshot interval.
  • Index mapping conflicts: If the source creates new indices during replication, they must be manually added to the CCR follow configuration.

Keycloak considerations

Keycloak data is stored in PostgreSQL, so it is covered by the PostgreSQL logical replication. The Keycloak Operator CR will start using the replicated data in the CNPG cluster after the Helm upgrade.

However, be aware of Keycloak session data:

  • Active user sessions stored in PostgreSQL will be replicated.
  • In-memory Infinispan caches will be rebuilt on the new Keycloak pods.
  • Users may need to re-authenticate after the cutover (session cookies point to the old Keycloak pods).

Assumed target infrastructure

The commands and snippets in this guide assume CloudNativePG (CNPG) as the PostgreSQL target and ECK as the Elasticsearch target. If you are migrating to managed services (for example, AWS RDS or Elastic Cloud), replace the target hostnames, credentials, and connection methods accordingly.

Clone the deployment references repository

This guide uses scripts from the Camunda deployment references repository. Clone the repository and navigate to the migration directory:

git clone https://github.com/camunda/camunda-deployment-references.git
cd camunda-deployment-references/generic/kubernetes/migration

Configure the migration by editing env.sh to match your current Camunda installation, then source it:

source env.sh

For a full description of configuration variables, see configure the migration.
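The snippets in this guide rely on a handful of exported variables. A minimal sketch of the shape env.sh might take — the variable names match the commands used below, but the values and the full variable set in the repository are assumptions:

```shell
# Illustrative env.sh sketch — align every value with your installation.
export NAMESPACE="camunda"                    # namespace of the Camunda release
export CAMUNDA_RELEASE_NAME="camunda"         # Helm release name
export CAMUNDA_HELM_CHART_VERSION="<current chart version>"

# Target clusters created in Phase 1 (names are placeholders)
export CNPG_IDENTITY_CLUSTER="identity-db"
export CNPG_KEYCLOAK_CLUSTER="keycloak-db"
export CNPG_WEBMODELER_CLUSTER="webmodeler-db"
export ECK_CLUSTER_NAME="camunda-es"
```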

Phase 1: Deploy target infrastructure

This phase is identical to Phase 1 of the standard migration. Deploy the target operators and clusters alongside the existing Bitnami components:

bash 1-deploy-targets.sh

After this phase, both the old Bitnami infrastructure and the new operator-managed infrastructure run side by side. No traffic is routed to the new targets yet.
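A quick way to confirm the side-by-side state (the CNPG and ECK resource kinds are the same ones used by the health checks later in this guide):

```shell
# Bitnami StatefulSets and operator-managed clusters should coexist.
kubectl get statefulsets -n ${NAMESPACE}     # Bitnami PostgreSQL/Elasticsearch/Keycloak
kubectl get clusters -n ${NAMESPACE}         # CNPG clusters should report healthy
kubectl get elasticsearch -n ${NAMESPACE}    # ECK cluster should report green
```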

Phase 2: Enable real-time replication

PostgreSQL: Logical replication

PostgreSQL logical replication allows streaming changes in real time from the Bitnami PostgreSQL instances to the CNPG (or managed service) targets without stopping the source.

Step 1: Enable logical replication on the source

The source Bitnami PostgreSQL must have wal_level = logical. Check the current setting:

kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -c "SHOW wal_level;"

If it returns replica (the default), you need to change it:

# Patch the Bitnami PostgreSQL ConfigMap or StatefulSet
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -c "ALTER SYSTEM SET wal_level = 'logical';"
Restart required

Changing wal_level requires a PostgreSQL restart. This is the only brief interruption in the zero-downtime approach — a PostgreSQL restart typically completes in a few seconds, and Camunda components reconnect automatically.

kubectl rollout restart statefulset ${CAMUNDA_RELEASE_NAME}-postgresql -n ${NAMESPACE}
kubectl rollout status statefulset ${CAMUNDA_RELEASE_NAME}-postgresql -n ${NAMESPACE} --timeout=120s

Also ensure max_replication_slots and max_wal_senders are sufficient (at least 4 each — one per database plus overhead):

kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -c "SHOW max_replication_slots; SHOW max_wal_senders;"

Step 2: Create publications on the source

For each database, create a publication that includes all tables:

# Identity database
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "CREATE PUBLICATION identity_migration FOR ALL TABLES;"

# Keycloak database
KEYCLOAK_STS="${CAMUNDA_RELEASE_NAME}-keycloak-postgresql"
kubectl exec -it ${KEYCLOAK_STS}-0 -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "CREATE PUBLICATION keycloak_migration FOR ALL TABLES;"

# Web Modeler database
WEBMODELER_STS="${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler"
kubectl exec -it ${WEBMODELER_STS}-0 -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "CREATE PUBLICATION webmodeler_migration FOR ALL TABLES;"

Depending on your Helm chart version, each component may use a separate Bitnami PostgreSQL StatefulSet or share one. Adjust the StatefulSet names accordingly.
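To see which layout you have, list the PostgreSQL StatefulSets in the release (the same pattern the initial sync script in Step 3 greps for):

```shell
# One entry per component means separate StatefulSets; a single entry means shared.
kubectl get statefulsets -n ${NAMESPACE} -o name | grep -i postgresql
```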

Step 3: Perform initial data sync

Before enabling subscriptions, perform a one-time schema and data sync. Logical replication only replicates DML (INSERT/UPDATE/DELETE), not DDL (schema changes):

Show details: initial PostgreSQL sync example
# For each component, dump the schema + data and restore to the target
for COMPONENT in identity keycloak webmodeler; do
SOURCE_STS=$(kubectl get statefulset -n ${NAMESPACE} -o name | grep -i "${COMPONENT}.*postgresql" | head -1 | sed 's|statefulset.apps/||')
SOURCE_HOST="${SOURCE_STS}.${NAMESPACE}.svc.cluster.local"

# Determine the target based on operator or external
if [[ "$COMPONENT" == "identity" ]]; then
TARGET_HOST="${CNPG_IDENTITY_CLUSTER}-rw.${NAMESPACE}.svc.cluster.local"
TARGET_SECRET="${CNPG_IDENTITY_CLUSTER}-secret"
elif [[ "$COMPONENT" == "keycloak" ]]; then
TARGET_HOST="${CNPG_KEYCLOAK_CLUSTER}-rw.${NAMESPACE}.svc.cluster.local"
TARGET_SECRET="${CNPG_KEYCLOAK_CLUSTER}-secret"
elif [[ "$COMPONENT" == "webmodeler" ]]; then
TARGET_HOST="${CNPG_WEBMODELER_CLUSTER}-rw.${NAMESPACE}.svc.cluster.local"
TARGET_SECRET="${CNPG_WEBMODELER_CLUSTER}-secret"
fi

echo "Syncing ${COMPONENT}: ${SOURCE_HOST} -> ${TARGET_HOST}"

# Dump and restore (this is a one-time operation, not a freeze)
kubectl exec -it ${SOURCE_STS}-0 -n ${NAMESPACE} -- \
pg_dump -U ${COMPONENT} -d ${COMPONENT} -F custom -f /tmp/${COMPONENT}.dump

kubectl cp ${NAMESPACE}/${SOURCE_STS}-0:/tmp/${COMPONENT}.dump ./${COMPONENT}.dump

# Get target password
TARGET_PWD=$(kubectl get secret ${TARGET_SECRET} -n ${NAMESPACE} -o jsonpath='{.data.password}' | base64 -d)

# Restore to target via a temporary pod
kubectl run pg-restore-${COMPONENT} --rm -i --restart=Never \
--image=postgres:16 -n ${NAMESPACE} \
--env="PGPASSWORD=${TARGET_PWD}" -- \
pg_restore -h ${TARGET_HOST} -U ${COMPONENT} -d ${COMPONENT} \
--clean --if-exists --no-owner --no-privileges /dev/stdin < ./${COMPONENT}.dump
done

Step 4: Create subscriptions on the target

On each CNPG target cluster, create a subscription pointing to the source:

Show details: subscription creation example
# Get source password
SOURCE_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-postgresql -n ${NAMESPACE} \
-o jsonpath='{.data.postgres-password}' | base64 -d)

# Identity — target the -rw service to ensure writes land on the current primary
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_IDENTITY_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "
CREATE SUBSCRIPTION identity_sub
CONNECTION 'host=${CAMUNDA_RELEASE_NAME}-postgresql.${NAMESPACE}.svc.cluster.local port=5432 dbname=identity user=postgres password=${SOURCE_PWD}'
PUBLICATION identity_migration
WITH (copy_data = false);
"

# Keycloak
KEYCLOAK_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-keycloak-postgresql -n ${NAMESPACE} \
-o jsonpath='{.data.postgres-password}' | base64 -d)

kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_KEYCLOAK_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "
CREATE SUBSCRIPTION keycloak_sub
CONNECTION 'host=${CAMUNDA_RELEASE_NAME}-keycloak-postgresql.${NAMESPACE}.svc.cluster.local port=5432 dbname=keycloak user=postgres password=${KEYCLOAK_PWD}'
PUBLICATION keycloak_migration
WITH (copy_data = false);
"

# Web Modeler
WEBMODELER_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler -n ${NAMESPACE} \
-o jsonpath='{.data.postgres-password}' | base64 -d)

kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_WEBMODELER_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "
CREATE SUBSCRIPTION webmodeler_sub
CONNECTION 'host=${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler.${NAMESPACE}.svc.cluster.local port=5432 dbname=webmodeler user=postgres password=${WEBMODELER_PWD}'
PUBLICATION webmodeler_migration
WITH (copy_data = false);
"

The copy_data = false flag is important because the initial sync in Step 3 already copied the data. The subscription now streams only new changes in real time.

Elasticsearch: Continuous synchronization

Unlike PostgreSQL, Elasticsearch does not have a built-in logical replication feature available in the open-source version. Choose one of the following approaches:

| Strategy | Best when | Tradeoff |
| --- | --- | --- |
| CCR | You need the closest possible real-time replica and have an Elastic Platinum license | Highest operational complexity |
| Continuous snapshots | You can tolerate a small lag window and want an open-source-compatible approach | Recent writes may be missing until re-export catches up |
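If you take the continuous-snapshot path instead of CCR, the usual building blocks are a snapshot repository reachable by both clusters plus a snapshot lifecycle management (SLM) policy with a short interval on the source, restored periodically on the target. A sketch only — the repository name, bucket, schedule, and index patterns are assumptions, and SOURCE_ES_PWD is retrieved as in the CCR example:

```shell
# Register a snapshot repository on the SOURCE cluster (placeholder bucket;
# use storage the target cluster can also read).
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
  curl -sf -u "elastic:${SOURCE_ES_PWD}" -X PUT \
  "http://localhost:9200/_snapshot/migration_repo" \
  -H 'Content-Type: application/json' \
  -d '{"type": "s3", "settings": {"bucket": "camunda-migration-snapshots"}}'

# Snapshot the Camunda indices every 15 minutes via SLM.
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
  curl -sf -u "elastic:${SOURCE_ES_PWD}" -X PUT \
  "http://localhost:9200/_slm/policy/migration-sync" \
  -H 'Content-Type: application/json' \
  -d '{
    "schedule": "0 */15 * * * ?",
    "name": "<migration-{now/d}>",
    "repository": "migration_repo",
    "config": {"indices": ["zeebe-*", "operate-*", "tasklist-*", "optimize-*", "connectors-*", "camunda-*"]},
    "retention": {"expire_after": "1d"}
  }'
```

On the target, register the same repository (read-only) and restore the latest snapshot on a matching cadence; the lag window equals the snapshot interval plus restore time.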

If you have an Elastic Platinum license, you can use cross-cluster replication (CCR) to replicate indices in real-time:

Show details: CCR setup example
# Get ECK ES password
ECK_PWD=$(kubectl get secret ${ECK_CLUSTER_NAME}-es-elastic-user -n ${NAMESPACE} \
-o jsonpath='{.data.elastic}' | base64 -d)

# Get source ES password
SOURCE_ES_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-elasticsearch -n ${NAMESPACE} \
-o jsonpath='{.data.elasticsearch-password}' | base64 -d)

# Configure the target ECK cluster to recognize the source as a remote
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X PUT \
"http://localhost:9200/_cluster/settings" \
-H 'Content-Type: application/json' \
-d '{
"persistent": {
"cluster": {
"remote": {
"bitnami_source": {
"seeds": ["'${CAMUNDA_RELEASE_NAME}'-elasticsearch-master-0.'${CAMUNDA_RELEASE_NAME}'-elasticsearch-master-headless.'${NAMESPACE}'.svc.cluster.local:9300"]
}
}
}
}
}'

# Create follower indices for each Camunda index pattern
for PATTERN in zeebe operate tasklist optimize connectors camunda; do
# List source indices matching the pattern
INDICES=$(kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" \
"http://localhost:9200/_cat/indices/${PATTERN}-*?h=index" | tr -d '\r' | tr '\n' ' ')

for IDX in $INDICES; do
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X PUT \
"http://localhost:9200/${IDX}/_ccr/follow" \
-H 'Content-Type: application/json' \
-d '{
"remote_cluster": "bitnami_source",
"leader_index": "'${IDX}'"
}'
done
done

Phase 3: Verify synchronization

Before performing the cutover, verify that replication is caught up and data is consistent.

Monitor PostgreSQL replication lag

Check the replication lag on each subscription:

Show details: PostgreSQL lag check example
# On each CNPG target, check subscription status
for CLUSTER in ${CNPG_IDENTITY_CLUSTER} ${CNPG_KEYCLOAK_CLUSTER} ${CNPG_WEBMODELER_CLUSTER}; do
echo "=== ${CLUSTER} ==="
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -c "
SELECT subname, received_lsn, latest_end_lsn,
latest_end_lsn - received_lsn AS lag_bytes
FROM pg_stat_subscription;
"
done

Wait until lag_bytes is consistently 0 or near-zero before proceeding.
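The wait can be scripted instead of re-running the query by hand. A sketch with two hypothetical helpers — the query mirrors the one above, the 30-second interval is arbitrary, and an empty result (no subscriptions) counts as done:

```shell
# Return 0 only if every line on stdin is the number 0 (blank lines ignored).
all_lag_zero() {
  while read -r lag; do
    [ -z "$lag" ] && continue
    [ "$lag" -eq 0 ] 2>/dev/null || return 1
  done
  return 0
}

# Hypothetical poller: re-check one CNPG cluster every 30 s until lag is gone.
wait_for_zero_lag() {
  cluster="$1"
  primary=$(kubectl get pod -n "${NAMESPACE}" \
    -l "cnpg.io/cluster=${cluster},cnpg.io/instanceRole=primary" \
    -o jsonpath='{.items[0].metadata.name}')
  until kubectl exec "${primary}" -n "${NAMESPACE}" -- psql -U postgres -At -c \
      "SELECT coalesce(latest_end_lsn - received_lsn, -1) FROM pg_stat_subscription;" \
    | tr -d '\r' | all_lag_zero; do
    echo "replication on ${cluster} still lagging; retrying in 30s"
    sleep 30
  done
}

# Usage: wait_for_zero_lag "${CNPG_IDENTITY_CLUSTER}"
```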

Monitor Elasticsearch sync

Show details: CCR status check example
# Check CCR follower status
ECK_PWD=$(kubectl get secret ${ECK_CLUSTER_NAME}-es-elastic-user -n ${NAMESPACE} \
-o jsonpath='{.data.elastic}' | base64 -d)

kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" \
"http://localhost:9200/_ccr/stats" | jq '.follow_stats.indices[].shards[].leader_global_checkpoint'
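Checkpoint progress alone does not prove completeness; comparing per-index document counts between source and target is a cheap extra check. The helper below is hypothetical, and the commented usage reuses the connection patterns from the earlier snippets:

```shell
# Return 0 when two non-empty numeric doc counts match exactly.
counts_match() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" -eq "$2" ] 2>/dev/null
}

# Example usage (run manually — needs cluster access):
#   SRC=$(kubectl exec ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
#     curl -sf -u "elastic:${SOURCE_ES_PWD}" "http://localhost:9200/_cat/count/${IDX}?h=count" | tr -d '[:space:]')
#   TGT=$(kubectl exec ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
#     curl -sf -u "elastic:${ECK_PWD}" "http://localhost:9200/_cat/count/${IDX}?h=count" | tr -d '[:space:]')
#   counts_match "$SRC" "$TGT" && echo "${IDX} in sync" || echo "${IDX} differs"
```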

Verify row counts

Compare row counts between source and target for each database to confirm data consistency:

Show details: row count verification example
for COMPONENT in identity keycloak webmodeler; do
SOURCE_STS=$(kubectl get statefulset -n ${NAMESPACE} -o name | grep -i "${COMPONENT}.*postgresql" | head -1 | sed 's|statefulset.apps/||')

echo "=== ${COMPONENT} ==="
echo "Source:"
kubectl exec -it ${SOURCE_STS}-0 -n ${NAMESPACE} -- \
psql -U ${COMPONENT} -d ${COMPONENT} -c "
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC LIMIT 10;
"

CNPG_CLUSTER_VAR="CNPG_${COMPONENT^^}_CLUSTER"
echo "Target (${!CNPG_CLUSTER_VAR}):"
kubectl exec -it ${!CNPG_CLUSTER_VAR}-1 -n ${NAMESPACE} -- \
psql -U ${COMPONENT} -d ${COMPONENT} -c "
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC LIMIT 10;
"
done

Phase 4: Instantaneous cutover

Once replication is confirmed in sync, perform the cutover. This phase uses a rolling Helm upgrade instead of the freeze-then-restore approach, resulting in zero downtime.

Before starting the cutover, confirm this checklist:

  • PostgreSQL subscriptions show lag_bytes at 0 or close to 0 for a sustained interval.
  • Your chosen Elasticsearch sync method is healthy and up to date.
  • The target services are reachable from the Camunda namespace.
  • You have the rollback command, values, and on-call contacts ready before the Helm upgrade.

Step 1: Stop replication (PostgreSQL)

Drop the subscriptions on the target to stop replication and allow the targets to accept writes:

Show details: stop PostgreSQL replication example
# Drop subscriptions — target the current primary by label selector
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_IDENTITY_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "ALTER SUBSCRIPTION identity_sub DISABLE; DROP SUBSCRIPTION identity_sub;"

kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_KEYCLOAK_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "ALTER SUBSCRIPTION keycloak_sub DISABLE; DROP SUBSCRIPTION keycloak_sub;"

kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_WEBMODELER_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "ALTER SUBSCRIPTION webmodeler_sub DISABLE; DROP SUBSCRIPTION webmodeler_sub;"

Step 2: Stop Elasticsearch replication

Promote follower indices to regular indices:

Show details: CCR cutover example
ECK_PWD=$(kubectl get secret ${ECK_CLUSTER_NAME}-es-elastic-user -n ${NAMESPACE} \
-o jsonpath='{.data.elastic}' | base64 -d)

# Pause and unfollow each replicated index
INDICES=$(kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" "http://localhost:9200/_cat/indices?h=index" | tr -d '\r' | grep -E "^(zeebe|operate|tasklist|optimize|connectors|camunda)-")

for IDX in $INDICES; do
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_ccr/pause_follow"
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_close"
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_ccr/unfollow"
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_open"
done

Step 3: Helm upgrade (rolling restart)

Perform the Helm upgrade to switch Camunda to the new backends. Because there is no freeze, pods are restarted in rolling fashion.

The zero-downtime approach does not use 3-cutover.sh — that script freezes the application, which defeats the purpose. Instead, run the Helm upgrade manually with the operator-based values:

Show details: Helm upgrade example
helm upgrade ${CAMUNDA_RELEASE_NAME} camunda/camunda-platform \
-n ${NAMESPACE} \
--version ${CAMUNDA_HELM_CHART_VERSION} \
-f operator-based-values.yaml \
--wait --timeout 10m

Build the values file by combining the operator-based Helm values files from the reference architecture (for example, camunda-identity-values.yml, camunda-elastic-values.yml, camunda-keycloak-domain-values.yml) to point Camunda at the new backends. Ensure Bitnami subcharts are disabled.

The Helm upgrade triggers a rolling restart of Camunda pods. During this process:

  • Zeebe StatefulSet pods restart one at a time, maintaining quorum.
  • Operate, Tasklist, Optimize, and other deployments restart with zero-downtime rollout strategy.
  • There is a brief period where some pods use old backends and others use new ones, but this is safe because the data has already been replicated.
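To follow the rollout, watch each workload report completion. The workload names below are assumptions — list the actual Deployments and StatefulSets in your release first:

```shell
# Confirm the names first: kubectl get deploy,sts -n ${NAMESPACE}
kubectl rollout status statefulset ${CAMUNDA_RELEASE_NAME}-zeebe -n ${NAMESPACE} --timeout=10m
for DEPLOY in operate tasklist optimize connectors; do
  kubectl rollout status deployment ${CAMUNDA_RELEASE_NAME}-${DEPLOY} -n ${NAMESPACE} --timeout=10m
done
```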

Step 4: Clean up source publications

After the cutover is confirmed working, clean up the publications on the source:

Show details: source publication cleanup example
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "DROP PUBLICATION IF EXISTS identity_migration;"

kubectl exec -it ${CAMUNDA_RELEASE_NAME}-keycloak-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "DROP PUBLICATION IF EXISTS keycloak_migration;"

kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler-0 -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "DROP PUBLICATION IF EXISTS webmodeler_migration;"

Phase 5: Validate and clean up

Validate

Run the standard validation to confirm all components are healthy on the new infrastructure:

bash 4-validate.sh
Wait before cleanup

Do not clean up immediately after validation. Operate with the new infrastructure through at least one full business cycle (for example, a complete weekday with peak traffic) to confirm stability. Once Bitnami resources are deleted, rollback is no longer possible without restoring from backup. If you need to fail back, run bash rollback.sh before this phase (see rollback).

Rollback (if needed)

If the zero-downtime migration reveals issues after cutover:

  1. Immediate rollback (within minutes): If detected quickly, the source Bitnami databases still have all data (publications were only cleaned up in the last step). Run the standard rollback:

    bash rollback.sh
  2. Late rollback (after publications are dropped): You would need to perform a reverse data migration — dump from the CNPG targets back to the Bitnami sources. This is the same process in reverse.

Clean up Bitnami resources

Destructive and irreversible

This phase permanently deletes old Bitnami StatefulSets, PVCs, and the migration backup PVC. After cleanup, rollback to Bitnami subcharts is no longer possible.

Before running this phase, strongly consider:

  1. Taking a full backup of all databases (pg_dumpall or equivalent)
  2. Taking PVC or storage volume snapshots (cloud provider snapshots)
  3. Storing backups in cold storage (for example, S3 Glacier or GCS Archive)
  4. Keeping rollback artifacts in .state/ as a safety net

After confirming the migration is successful, remove old Bitnami StatefulSets, PVCs, services, and the migration backup PVC:

bash 5-cleanup-bitnami.sh

What happens:

  1. The script requires cutover and validation to be completed and displays a destructive operation warning with a confirmation prompt.
  2. Deletes old Bitnami PostgreSQL StatefulSets, their PVCs, and headless services (for each migrated component: Identity, Keycloak, and Web Modeler).
  3. Deletes old Bitnami Elasticsearch StatefulSet, PVCs, and services.
  4. Deletes old Bitnami Keycloak StatefulSet.
  5. Deletes the migration backup PVC.
  6. Reverifies that all Camunda components and operator-managed targets remain healthy after cleanup.
  7. Suggests removing the reindex.remote.whitelist setting from the ECK Elasticsearch configuration as a post-cleanup step.

The script checks whether each resource exists before attempting deletion, so it can be safely rerun if interrupted.

Show details: Cleanup script reference
generic/kubernetes/migration/5-cleanup-bitnami.sh

Operational readiness

Before running this migration in production, use the checklist below to reduce risk and confirm the cutover plan is ready.

Staging rehearsal

  1. Clone your production environment to a staging cluster with the same Helm chart version, same component configuration, and comparable data volumes.
  2. Run the full migration end to end in staging, including all five phases: deploy targets, enable replication, verify synchronization, cutover, and validate/cleanup.
  3. Measure replication convergence: record how long it takes for PostgreSQL subscriptions to reach lag_bytes = 0 and for Elasticsearch to fully synchronize. These timings determine how long you must wait before cutover.
  4. Test failback: after a successful staging migration, verify you can roll back cleanly with bash rollback.sh.
tip

Use a representative data set. Small databases converge almost instantly but hide the replication lag behavior you will face with production-sized workloads. Include realistic write load during the staging rehearsal to observe replication behavior under pressure.

Pre-migration checklist

Before starting the migration in production:

  • Verify replication prerequisites: confirm wal_level = logical, sufficient max_replication_slots and max_wal_senders, and (if using CCR) a valid Elastic Platinum license.
  • Notify stakeholders: although there is no planned downtime, inform them of the migration. If the PostgreSQL restart for wal_level is required, coordinate it during a low-traffic period.
  • Verify backups: confirm your existing backup strategy (Velero, volume snapshots, or cloud provider backups) has a recent successful backup.
  • Check cluster resources: ensure the cluster has enough CPU, memory, and storage to run both old and new infrastructure simultaneously — they coexist for the entire replication phase.
  • Review env.sh: double-check all variables, especially NAMESPACE, CAMUNDA_RELEASE_NAME, and target cluster names.
  • Prepare monitoring: set up dashboards for PostgreSQL replication lag, Elasticsearch sync status, pod health, and storage capacity.

Post-migration monitoring

After completing the cutover, monitor the following for at least 48 hours:

  • Pod restarts: kubectl get pods -n ${NAMESPACE} --watch
  • CNPG cluster health: kubectl get clusters -n ${NAMESPACE} (should show Cluster in healthy state)
  • ECK cluster health: kubectl get elasticsearch -n ${NAMESPACE} (should show green)
  • Camunda component logs: check for connection errors, authentication failures, or data inconsistencies.
  • Process instance completion: verify that in-flight process instances continue to execute correctly.
  • Zeebe export lag: confirm that Zeebe exporters are writing to the new Elasticsearch without delays.
  • Sequence values: verify PostgreSQL sequences are correct after cutover (see PostgreSQL logical replication limitations).