Zero-downtime migration from Bitnami subcharts
Migrate a Camunda 8 Helm installation from Bitnami-managed infrastructure to operator-managed or managed service equivalents without planned application downtime. Instead of the freeze-backup-restore-switch pattern used in the standard migration, this approach keeps source and target synchronized with real-time data replication before cutover.
This guide describes an advanced migration strategy that eliminates the downtime window present in the standard migration. The commands and examples in this guide are provided for informational purposes only. You must test them on a staging environment that mirrors your production setup before executing them in production. You are expected to familiarize yourself with the underlying concepts and write your own cutover runbook that accounts for your specific constraints, network topology, and data volumes.
For most deployments, Camunda recommends the simpler standard migration with a 5–60 minute maintenance window.
When to use this guide
Use this guide only if all of the following are true:
- You have already ruled out the standard migration because even a short maintenance window is unacceptable.
- Your team is comfortable operating PostgreSQL logical replication and one of the supported Elasticsearch synchronization strategies.
- You can monitor replication lag and validate consistency before cutover.
- You are prepared to adapt the examples to your topology, especially if the targets are managed services instead of in-cluster operators.
Read the topic overview to learn why you should migrate.
How it works
The zero-downtime migration replaces the backup/restore phases with continuous replication:
| Phase | Name | Downtime | Description |
|---|---|---|---|
| 1 | Deploy targets | None | Install operators and create target clusters alongside Bitnami |
| 2 | Enable replication | None | Set up PG logical replication and ES CCR / continuous snapshots |
| 3 | Sync and verify | None | Wait for replication lag to reach 0, verify data consistency |
| 4 | Instantaneous cutover | None | Helm upgrade to switch backends (rolling restart, no freeze) |
| 5 | Validate and clean up | None | Verify health, tear down replication, remove old resources |
Key differences from the standard migration
| Aspect | Standard migration | Zero-downtime migration |
|---|---|---|
| Downtime | 5–60 minutes (Phase 3 freeze) | None |
| Data transfer | pg_dump/pg_restore + ES _reindex | Logical replication + CCR/continuous snapshot |
| Complexity | Low — scripted and automated | High — manual setup, monitoring required |
| Risk | Low — rollback via Helm values | Medium — replication lag must be monitored |
| PostgreSQL version requirement | Any | PostgreSQL 10+ (logical replication) |
| Elasticsearch requirement | _reindex API (reindex from remote) | CCR (Platinum license) or continuous snapshots |
Prerequisites
Before starting the migration, ensure you have the following general prerequisites:
- A running Camunda 8 installation using the Helm chart with Bitnami subcharts enabled
- `kubectl` configured and pointing to your cluster
- `helm` with the `camunda/camunda-platform` repository added
- Sufficient cluster resources to temporarily run both old and new infrastructure side by side
- A tested backup of your current installation (see Precautions)
In addition to the general prerequisites:
- PostgreSQL source must support logical replication (`wal_level = logical`). This may require a restart of the Bitnami PostgreSQL StatefulSet.
- Deep understanding of your data volumes, replication lag tolerances, and network throughput between source and target.
- A monitoring solution to track replication lag (for example, Prometheus, Grafana, or manual queries).
Elasticsearch requires an explicit tradeoff:
- Cross-cluster replication (CCR)
  - Use if you need the closest possible parity at cutover.
  - This option requires an Elastic Platinum license.
- Continuous snapshots
  - Use if a small lag window is acceptable.
  - This option requires the ability to run continuous snapshots at very short intervals.
Precautions
Review the general precautions that apply to all migration paths.
Additionally, note that while this migration path removes the planned downtime window, it does not remove the need for rehearsal, monitoring, and rollback planning. Treat it as a custom migration pattern rather than a push-button alternative to the standard workflow.
Review the operational readiness checklist, including the staging rehearsal and pre-migration checklist, before starting a production migration.
PostgreSQL logical replication limitations
- DDL not replicated: Schema changes (`CREATE TABLE`, `ALTER TABLE`, and so on) are not replicated. If the source schema changes during migration, you must apply the same changes to the target manually.
- Large objects: `pg_largeobject` data is not replicated via logical replication.
- Sequences: Sequence values are not replicated. After cutover, sequences on the target may need to be reset. For example, for a table `my_table` with a serial column `id`:

  -- Run on each target database after cutover, once per sequence-backed column
  SELECT setval(pg_get_serial_sequence('my_table', 'id'),
                COALESCE((SELECT max(id) FROM my_table), 1));

- TRUNCATE: `TRUNCATE` is replicated only in PostgreSQL 11 and later.
Elasticsearch limitations
- CCR requires Platinum license: The open-source and Basic tiers do not include cross-cluster replication.
- Continuous snapshots have lag: The snapshot approach introduces a replication delay equal to the snapshot interval.
- Index mapping conflicts: If the source creates new indices during replication, they must be manually added to the CCR follow configuration.
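To reason about the snapshot-lag tradeoff, the worst-case staleness at cutover is roughly the snapshot interval plus the time to take and restore one snapshot. The durations below are placeholder assumptions; measure your own in staging:

```shell
# Back-of-envelope worst-case staleness for the snapshot approach.
# All three durations are placeholder assumptions; measure yours in staging.
snapshot_interval_s=300   # SLM schedule: every 5 minutes
snapshot_duration_s=60    # time to take one snapshot (assumed)
restore_duration_s=120    # time to restore it on the target (assumed)

worst_case_s=$(( snapshot_interval_s + snapshot_duration_s + restore_duration_s ))
echo "worst-case staleness: ${worst_case_s}s"
```

If the result exceeds what Zeebe re-export can comfortably cover after cutover, shorten the SLM schedule or reconsider CCR.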
Keycloak considerations
Keycloak data is stored in PostgreSQL, so it is covered by the PostgreSQL logical replication. The Keycloak Operator CR will start using the replicated data in the CNPG cluster after the Helm upgrade.
However, be aware of Keycloak session data:
- Active user sessions stored in PostgreSQL will be replicated.
- In-memory Infinispan caches will be rebuilt on the new Keycloak pods.
- Users may need to re-authenticate after the cutover (session cookies point to the old Keycloak pods).
Assumed target infrastructure
The commands and snippets in this guide assume CloudNativePG (CNPG) as the PostgreSQL target and ECK as the Elasticsearch target. If you are migrating to managed services (for example, AWS RDS or Elastic Cloud), replace the target hostnames, credentials, and connection methods accordingly.
Clone the deployment references repository
This guide uses scripts from the Camunda deployment references repository. Clone the repository and navigate to the migration directory:
git clone https://github.com/camunda/camunda-deployment-references.git
cd camunda-deployment-references/generic/kubernetes/migration
Configure the migration by editing env.sh to match your current Camunda installation, then source it:
source env.sh
For a full description of configuration variables, see configure the migration.
Phase 1: Deploy target infrastructure
This phase is identical to Phase 1 of the standard migration. Deploy the target operators and clusters alongside the existing Bitnami components:
bash 1-deploy-targets.sh
After this phase, both the old Bitnami infrastructure and the new operator-managed infrastructure run side by side. No traffic is routed to the new targets yet.
Phase 2: Enable real-time replication
PostgreSQL: Logical replication
PostgreSQL logical replication allows streaming changes in real time from the Bitnami PostgreSQL instances to the CNPG (or managed service) targets without stopping the source.
Step 1: Enable logical replication on the source
The source Bitnami PostgreSQL must have wal_level = logical. Check the current setting:
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -c "SHOW wal_level;"
If it returns replica (the default), you need to change it:
# Patch the Bitnami PostgreSQL ConfigMap or StatefulSet
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -c "ALTER SYSTEM SET wal_level = 'logical';"
Changing wal_level requires a PostgreSQL restart. This is the only brief interruption in the zero-downtime approach — a PostgreSQL restart typically completes in a few seconds, and Camunda components reconnect automatically.
kubectl rollout restart statefulset ${CAMUNDA_RELEASE_NAME}-postgresql -n ${NAMESPACE}
kubectl rollout status statefulset ${CAMUNDA_RELEASE_NAME}-postgresql -n ${NAMESPACE} --timeout=120s
Also ensure max_replication_slots and max_wal_senders are sufficient (at least 4 each — one per database plus overhead):
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -c "SHOW max_replication_slots; SHOW max_wal_senders;"
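If either value is too low, you can raise it with `ALTER SYSTEM`. This is a sketch with example values; like `wal_level`, both settings take effect only after a restart:

```sql
-- Run as superuser on the source, then restart PostgreSQL.
-- 4 is an example floor: one slot/sender per replicated database plus headroom.
ALTER SYSTEM SET max_replication_slots = 4;
ALTER SYSTEM SET max_wal_senders = 4;
```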
Step 2: Create publications on the source
For each database, create a publication that includes all tables:
# Identity database
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "CREATE PUBLICATION identity_migration FOR ALL TABLES;"
# Keycloak database
KEYCLOAK_STS="${CAMUNDA_RELEASE_NAME}-keycloak-postgresql"
kubectl exec -it ${KEYCLOAK_STS}-0 -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "CREATE PUBLICATION keycloak_migration FOR ALL TABLES;"
# Web Modeler database
WEBMODELER_STS="${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler"
kubectl exec -it ${WEBMODELER_STS}-0 -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "CREATE PUBLICATION webmodeler_migration FOR ALL TABLES;"
Depending on your Helm chart version, each component may use a separate Bitnami PostgreSQL StatefulSet or share one. Adjust the StatefulSet names accordingly.
Step 3: Perform initial data sync
Before enabling subscriptions, perform a one-time schema and data sync. Logical replication only replicates DML (INSERT/UPDATE/DELETE), not DDL (schema changes):
Show details: initial PostgreSQL sync example
# For each component, dump the schema + data and restore to the target
for COMPONENT in identity keycloak webmodeler; do
SOURCE_STS=$(kubectl get statefulset -n ${NAMESPACE} -o name | grep -i "${COMPONENT}.*postgresql" | head -1 | sed 's|statefulset.apps/||')
SOURCE_HOST="${SOURCE_STS}.${NAMESPACE}.svc.cluster.local"
# Determine the target based on operator or external
if [[ "$COMPONENT" == "identity" ]]; then
TARGET_HOST="${CNPG_IDENTITY_CLUSTER}-rw.${NAMESPACE}.svc.cluster.local"
TARGET_SECRET="${CNPG_IDENTITY_CLUSTER}-secret"
elif [[ "$COMPONENT" == "keycloak" ]]; then
TARGET_HOST="${CNPG_KEYCLOAK_CLUSTER}-rw.${NAMESPACE}.svc.cluster.local"
TARGET_SECRET="${CNPG_KEYCLOAK_CLUSTER}-secret"
elif [[ "$COMPONENT" == "webmodeler" ]]; then
TARGET_HOST="${CNPG_WEBMODELER_CLUSTER}-rw.${NAMESPACE}.svc.cluster.local"
TARGET_SECRET="${CNPG_WEBMODELER_CLUSTER}-secret"
fi
echo "Syncing ${COMPONENT}: ${SOURCE_HOST} → ${TARGET_HOST}"
# Dump and restore (this is a one-time operation, not a freeze)
kubectl exec -it ${SOURCE_STS}-0 -n ${NAMESPACE} -- \
pg_dump -U ${COMPONENT} -d ${COMPONENT} -F custom -f /tmp/${COMPONENT}.dump
kubectl cp ${NAMESPACE}/${SOURCE_STS}-0:/tmp/${COMPONENT}.dump ./${COMPONENT}.dump
# Get target password
TARGET_PWD=$(kubectl get secret ${TARGET_SECRET} -n ${NAMESPACE} -o jsonpath='{.data.password}' | base64 -d)
# Restore to target via a temporary pod
kubectl run pg-restore-${COMPONENT} --rm -i --restart=Never \
--image=postgres:16 -n ${NAMESPACE} \
--env="PGPASSWORD=${TARGET_PWD}" -- \
pg_restore -h ${TARGET_HOST} -U ${COMPONENT} -d ${COMPONENT} \
--clean --if-exists --no-owner --no-privileges /dev/stdin < ./${COMPONENT}.dump
done
Step 4: Create subscriptions on the target
On each CNPG target cluster, create a subscription pointing to the source:
Show details: subscription creation example
# Get source password
SOURCE_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-postgresql -n ${NAMESPACE} \
-o jsonpath='{.data.postgres-password}' | base64 -d)
# Identity — target the -rw service to ensure writes land on the current primary
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_IDENTITY_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "
CREATE SUBSCRIPTION identity_sub
CONNECTION 'host=${CAMUNDA_RELEASE_NAME}-postgresql.${NAMESPACE}.svc.cluster.local port=5432 dbname=identity user=postgres password=${SOURCE_PWD}'
PUBLICATION identity_migration
WITH (copy_data = false);
"
# Keycloak
KEYCLOAK_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-keycloak-postgresql -n ${NAMESPACE} \
-o jsonpath='{.data.postgres-password}' | base64 -d)
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_KEYCLOAK_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "
CREATE SUBSCRIPTION keycloak_sub
CONNECTION 'host=${CAMUNDA_RELEASE_NAME}-keycloak-postgresql.${NAMESPACE}.svc.cluster.local port=5432 dbname=keycloak user=postgres password=${KEYCLOAK_PWD}'
PUBLICATION keycloak_migration
WITH (copy_data = false);
"
# Web Modeler
WEBMODELER_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler -n ${NAMESPACE} \
-o jsonpath='{.data.postgres-password}' | base64 -d)
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_WEBMODELER_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "
CREATE SUBSCRIPTION webmodeler_sub
CONNECTION 'host=${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler.${NAMESPACE}.svc.cluster.local port=5432 dbname=webmodeler user=postgres password=${WEBMODELER_PWD}'
PUBLICATION webmodeler_migration
WITH (copy_data = false);
"
The `copy_data = false` flag is important because the initial sync was already performed in Step 3. The subscription now streams only new changes in real time.
Elasticsearch: Continuous synchronization
Unlike PostgreSQL, Elasticsearch does not have a built-in logical replication feature available in the open-source version. Choose one of the following approaches:
| Strategy | Best when | Tradeoff |
|---|---|---|
| CCR | You need the closest possible real-time replica and have an Elastic Platinum license | Highest operational complexity |
| Continuous snapshots | You can tolerate a small lag window and want an open-source-compatible approach | Recent writes may be missing until re-export catches up |
- Cross-cluster replication (Platinum)
- Continuous snapshots (open-source)
If you have an Elastic Platinum license, you can use cross-cluster replication (CCR) to replicate indices in real time:
Show details: CCR setup example
# Get ECK ES password
ECK_PWD=$(kubectl get secret ${ECK_CLUSTER_NAME}-es-elastic-user -n ${NAMESPACE} \
-o jsonpath='{.data.elastic}' | base64 -d)
# Get source ES password
SOURCE_ES_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-elasticsearch -n ${NAMESPACE} \
-o jsonpath='{.data.elasticsearch-password}' | base64 -d)
# Configure the target ECK cluster to recognize the source as a remote
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X PUT \
"http://localhost:9200/_cluster/settings" \
-H 'Content-Type: application/json' \
-d '{
"persistent": {
"cluster": {
"remote": {
"bitnami_source": {
"seeds": ["'${CAMUNDA_RELEASE_NAME}'-elasticsearch-master-0.'${CAMUNDA_RELEASE_NAME}'-elasticsearch-master-headless.'${NAMESPACE}'.svc.cluster.local:9300"]
}
}
}
}
}'
# Create follower indices for each Camunda index pattern
for PATTERN in zeebe operate tasklist optimize connectors camunda; do
# List source indices matching the pattern
INDICES=$(kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" \
"http://localhost:9200/_cat/indices/${PATTERN}-*?h=index" | tr -d '\r' | tr '\n' ' ')
for IDX in $INDICES; do
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X PUT \
"http://localhost:9200/${IDX}/_ccr/follow" \
-H 'Content-Type: application/json' \
-d '{
"remote_cluster": "bitnami_source",
"leader_index": "'${IDX}'"
}'
done
done
If you don't have a Platinum license, use continuous snapshot/restore with SLM (Snapshot Lifecycle Management) to keep the target close to the source. This approach has a small replication lag (typically 5–15 minutes):
Show details: continuous snapshot setup example
# Get source ES password
SOURCE_ES_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-elasticsearch -n ${NAMESPACE} \
-o jsonpath='{.data.elasticsearch-password}' | base64 -d)
# Register a shared snapshot repository on the source (using the backup PVC)
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" -X PUT \
"http://localhost:9200/_snapshot/migration_continuous" \
-H 'Content-Type: application/json' \
-d '{"type":"fs","settings":{"location":"/backup/elasticsearch/continuous"}}'
# Create an SLM policy for frequent snapshots (every 5 minutes)
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" -X PUT \
"http://localhost:9200/_slm/policy/migration_continuous" \
-H 'Content-Type: application/json' \
-d '{
"schedule": "0 */5 * * * ?",
"name": "<migration-snap-{now/m{yyyyMMdd-HHmmss}}>",
"repository": "migration_continuous",
"config": {
"indices": ["*"],
"ignore_unavailable": true,
"include_global_state": false
},
"retention": {
"expire_after": "1h",
"min_count": 1,
"max_count": 5
}
}'
Before cutover, you will restore the latest snapshot to the target ECK cluster.
Phase 3: Verify synchronization
Before performing the cutover, verify that replication is caught up and data is consistent.
Monitor PostgreSQL replication lag
Check the replication lag on each subscription:
Show details: PostgreSQL lag check example
# On each CNPG target, check subscription status
for CLUSTER in ${CNPG_IDENTITY_CLUSTER} ${CNPG_KEYCLOAK_CLUSTER} ${CNPG_WEBMODELER_CLUSTER}; do
echo "=== ${CLUSTER} ==="
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -c "
SELECT subname, received_lsn, latest_end_lsn,
pg_wal_lsn_diff(received_lsn, latest_end_lsn) AS lag_bytes
FROM pg_stat_subscription;
"
done
Wait until lag_bytes is consistently 0 or near-zero before proceeding.
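Rather than polling by hand, a small helper can require several consecutive zero readings before declaring a subscription caught up (a single zero sample can coincide with a quiet moment on the source). This is an illustrative sketch: `wait_for_zero_lag` is a hypothetical name, and the command you pass in would wrap the psql lag query above so that it prints a single number:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command that prints the current lag in bytes,
# and return once it has printed 0 for N consecutive checks.
wait_for_zero_lag() {
  local check_cmd=$1 required_streak=${2:-3} interval=${3:-10}
  local streak=0 lag
  while (( streak < required_streak )); do
    lag=$("$check_cmd")
    if [[ "$lag" == "0" ]]; then
      streak=$(( streak + 1 ))
    else
      streak=0          # any nonzero reading resets the streak
    fi
    (( streak < required_streak )) && sleep "$interval"
  done
  echo "lag at 0 for ${required_streak} consecutive checks"
}
```

In a cutover runbook, you would call this once per subscription and only proceed when all three return.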
Monitor Elasticsearch sync
- Cross-cluster replication (Platinum)
- Continuous snapshots (open-source)
Show details: CCR status check example
# Check CCR follower status
ECK_PWD=$(kubectl get secret ${ECK_CLUSTER_NAME}-es-elastic-user -n ${NAMESPACE} \
-o jsonpath='{.data.elastic}' | base64 -d)
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" \
"http://localhost:9200/_ccr/stats" | jq '.follow_stats.indices[].shards[].leader_global_checkpoint'
Show details: snapshot status check example
# Check the latest snapshot status
SOURCE_ES_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-elasticsearch -n ${NAMESPACE} \
-o jsonpath='{.data.elasticsearch-password}' | base64 -d)
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" \
"http://localhost:9200/_slm/policy/migration_continuous" | jq '.last_success'
Verify row counts
Compare row counts between source and target for each database to confirm data consistency:
Show details: row count verification example
for COMPONENT in identity keycloak webmodeler; do
SOURCE_STS=$(kubectl get statefulset -n ${NAMESPACE} -o name | grep -i "${COMPONENT}.*postgresql" | head -1 | sed 's|statefulset.apps/||')
echo "=== ${COMPONENT} ==="
echo "Source:"
kubectl exec -it ${SOURCE_STS}-0 -n ${NAMESPACE} -- \
psql -U ${COMPONENT} -d ${COMPONENT} -c "
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC LIMIT 10;
"
CNPG_CLUSTER_VAR="CNPG_${COMPONENT^^}_CLUSTER"
echo "Target (${!CNPG_CLUSTER_VAR}):"
kubectl exec -it ${!CNPG_CLUSTER_VAR}-1 -n ${NAMESPACE} -- \
psql -U ${COMPONENT} -d ${COMPONENT} -c "
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC LIMIT 10;
"
done
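To turn the two listings into a pass/fail signal, you can diff them programmatically. The helper below is a sketch with a hypothetical name; it expects two files of sorted `relname count` lines (for example, produced with `psql -At` piped through `sort`) and fails if any table's counts differ:

```shell
#!/usr/bin/env bash

# Hypothetical helper: compare two sorted "relname count" listings.
# Prints each mismatching table and exits non-zero if any counts differ.
compare_counts() {
  local src=$1 tgt=$2
  join "$src" "$tgt" | awk '
    $2 != $3 { print $1 ": source=" $2 " target=" $3; bad = 1 }
    END      { exit bad }'
}
```

Note that live row counts drift while replication is streaming; treat small transient differences as expected and re-run the comparison until it stabilizes.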
Phase 4: Instantaneous cutover
Once replication is confirmed in sync, perform the cutover. This phase uses a rolling Helm upgrade instead of the freeze-then-restore approach, resulting in zero downtime.
Before starting the cutover, confirm this checklist:
- PostgreSQL subscriptions show `lag_bytes` at `0` or close to `0` for a sustained interval.
- Your chosen Elasticsearch sync method is healthy and up to date.
- The target services are reachable from the Camunda namespace.
- You have the rollback command, values, and on-call contacts ready before the Helm upgrade.
Step 1: Stop replication (PostgreSQL)
Drop the subscriptions on the target to stop replication and allow the targets to accept writes:
Show details: stop PostgreSQL replication example
# Drop subscriptions — target the current primary by label selector
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_IDENTITY_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "ALTER SUBSCRIPTION identity_sub DISABLE; DROP SUBSCRIPTION identity_sub;"
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_KEYCLOAK_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "ALTER SUBSCRIPTION keycloak_sub DISABLE; DROP SUBSCRIPTION keycloak_sub;"
kubectl exec -it $(kubectl get pod -n ${NAMESPACE} -l cnpg.io/cluster=${CNPG_WEBMODELER_CLUSTER},cnpg.io/instanceRole=primary -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "ALTER SUBSCRIPTION webmodeler_sub DISABLE; DROP SUBSCRIPTION webmodeler_sub;"
Step 2: Stop Elasticsearch replication
- Cross-cluster replication (Platinum)
- Continuous snapshots (open-source)
Promote follower indices to regular indices:
Show details: CCR cutover example
ECK_PWD=$(kubectl get secret ${ECK_CLUSTER_NAME}-es-elastic-user -n ${NAMESPACE} \
-o jsonpath='{.data.elastic}' | base64 -d)
# Pause and unfollow each replicated index
INDICES=$(kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" "http://localhost:9200/_cat/indices?h=index" | tr -d '\r' | grep -E "^(zeebe|operate|tasklist|optimize|connectors|camunda)-")
for IDX in $INDICES; do
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_ccr/pause_follow"
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_close"
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_ccr/unfollow"
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST "http://localhost:9200/${IDX}/_open"
done
Restore the latest snapshot to the target ECK cluster:
Show details: snapshot restore example
# Delete the SLM policy
SOURCE_ES_PWD=$(kubectl get secret ${CAMUNDA_RELEASE_NAME}-elasticsearch -n ${NAMESPACE} \
-o jsonpath='{.data.elasticsearch-password}' | base64 -d)
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" -X DELETE \
"http://localhost:9200/_slm/policy/migration_continuous"
# Get the latest snapshot name
LATEST_SNAP=$(kubectl exec -it ${CAMUNDA_RELEASE_NAME}-elasticsearch-master-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${SOURCE_ES_PWD}" \
"http://localhost:9200/_snapshot/migration_continuous/_all" | jq -r '.snapshots[-1].snapshot' | tr -d '\r')
# Restore to target ECK
ECK_PWD=$(kubectl get secret ${ECK_CLUSTER_NAME}-es-elastic-user -n ${NAMESPACE} \
-o jsonpath='{.data.elastic}' | base64 -d)
# Register the repo on the target
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X PUT \
"http://localhost:9200/_snapshot/migration_continuous" \
-H 'Content-Type: application/json' \
-d '{"type":"fs","settings":{"location":"/backup/elasticsearch/continuous"}}'
# Restore
kubectl exec -it ${ECK_CLUSTER_NAME}-es-masters-0 -n ${NAMESPACE} -- \
curl -sf -u "elastic:${ECK_PWD}" -X POST \
"http://localhost:9200/_snapshot/migration_continuous/${LATEST_SNAP}/_restore?wait_for_completion=true" \
-H 'Content-Type: application/json' \
-d '{"indices":"*","ignore_unavailable":true,"include_global_state":false}'
With the continuous snapshot approach, there is a small window (up to the snapshot interval, for example 5 minutes) where recent Elasticsearch writes may not be captured. Zeebe will re-export these events after the cutover.
Step 3: Helm upgrade (rolling restart)
Perform the Helm upgrade to switch Camunda to the new backends. Because there is no freeze, pods are restarted in rolling fashion.
The zero-downtime approach does not use 3-cutover.sh — that script freezes the application, which defeats the purpose. Instead, run the Helm upgrade manually with the operator-based values:
Show details: Helm upgrade example
helm upgrade ${CAMUNDA_RELEASE_NAME} camunda/camunda-platform \
-n ${NAMESPACE} \
--version ${CAMUNDA_HELM_CHART_VERSION} \
-f operator-based-values.yaml \
--wait --timeout 10m
Build the values file by combining the operator-based Helm values files from the reference architecture (for example, camunda-identity-values.yml, camunda-elastic-values.yml, camunda-keycloak-domain-values.yml) to point Camunda at the new backends. Ensure Bitnami subcharts are disabled.
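As an illustrative sketch only (all key names and hostnames below are assumptions; verify each one against your chart version's values reference and the reference architecture files named above), the combined file disables the Bitnami subcharts and points the components at the new endpoints:

```yaml
# Sketch only: verify every key against your chart version before use.
elasticsearch:
  enabled: false            # disable the Bitnami Elasticsearch subchart
identityKeycloak:
  postgresql:
    enabled: false          # disable Keycloak's bundled Bitnami PostgreSQL
global:
  elasticsearch:
    enabled: true
    url:
      protocol: http
      host: my-eck-cluster-es-http.camunda.svc.cluster.local   # assumed ECK service name
      port: 9200
```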
The Helm upgrade triggers a rolling restart of Camunda pods. During this process:
- Zeebe StatefulSet pods restart one at a time, maintaining quorum.
- Operate, Tasklist, Optimize, and other deployments restart with zero-downtime rollout strategy.
- There is a brief period where some pods use old backends and others use new ones, but this is safe because the data has already been replicated.
Step 4: Clean up source publications
After the cutover is confirmed working, clean up the publications on the source:
Show details: source publication cleanup example
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -d identity -c "DROP PUBLICATION IF EXISTS identity_migration;"
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-keycloak-postgresql-0 -n ${NAMESPACE} -- \
psql -U postgres -d keycloak -c "DROP PUBLICATION IF EXISTS keycloak_migration;"
kubectl exec -it ${CAMUNDA_RELEASE_NAME}-postgresql-web-modeler-0 -n ${NAMESPACE} -- \
psql -U postgres -d webmodeler -c "DROP PUBLICATION IF EXISTS webmodeler_migration;"
Phase 5: Validate and clean up
Validate
Run the standard validation to confirm all components are healthy on the new infrastructure:
bash 4-validate.sh
Do not clean up immediately after validation. Operate with the new infrastructure through at least one full business cycle (for example, a complete weekday with peak traffic) to confirm stability. Once Bitnami resources are deleted, rollback is no longer possible without restoring from backup. If you need to fail back, run bash rollback.sh before this phase (see rollback).
Rollback (if needed)
If the zero-downtime migration reveals issues after cutover:
- Immediate rollback (within minutes): If detected quickly, the source Bitnami databases still have all data (publications were only cleaned up in the last step). Run the standard rollback: `bash rollback.sh`
- Late rollback (after publications are dropped): You would need to perform a reverse data migration — dump from the CNPG targets back to the Bitnami sources. This is the same process in reverse.
Clean up Bitnami resources
This phase permanently deletes old Bitnami StatefulSets, PVCs, and the migration backup PVC. After cleanup, rollback to Bitnami subcharts is no longer possible.
Before running this phase, strongly consider:
- Taking a full backup of all databases (`pg_dumpall` or equivalent)
- Taking PVC or storage volume snapshots (cloud provider snapshots)
- Storing backups in cold storage—for example, S3 Glacier or GCS Archive
- Keeping rollback artifacts in `.state/` as a safety net
After confirming the migration is successful, remove old Bitnami StatefulSets, PVCs, services, and the migration backup PVC:
bash 5-cleanup-bitnami.sh
What happens:
- The script requires cutover and validation to be completed and displays a destructive operation warning with a confirmation prompt.
- Deletes old Bitnami PostgreSQL StatefulSets, their PVCs, and headless services (for each migrated component: Identity, Keycloak, and Web Modeler).
- Deletes old Bitnami Elasticsearch StatefulSet, PVCs, and services.
- Deletes old Bitnami Keycloak StatefulSet.
- Deletes the migration backup PVC.
- Reverifies that all Camunda components and operator-managed targets remain healthy after cleanup.
- Suggests removing the `reindex.remote.whitelist` setting from the ECK Elasticsearch configuration as a post-cleanup step.
The script checks whether each resource exists before attempting deletion, so it can be safely rerun if interrupted.
Operational readiness
Before running this migration in production, use the checklist below to reduce risk and confirm the cutover plan is ready.
Staging rehearsal
- Clone your production environment to a staging cluster with the same Helm chart version, same component configuration, and comparable data volumes.
- Run the full migration end to end in staging, including all five phases: deploy targets, enable replication, verify synchronization, cutover, and validate/cleanup.
- Measure replication convergence: record how long it takes for PostgreSQL subscriptions to reach `lag_bytes = 0` and for Elasticsearch to fully synchronize. These timings determine how long you must wait before cutover.
- Test failback: after a successful staging migration, verify you can roll back cleanly with `bash rollback.sh`.
Use a representative data set. Small databases converge almost instantly but hide the replication lag behavior you will face with production-sized workloads. Include realistic write load during the staging rehearsal to observe replication behavior under pressure.
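When sizing the rehearsal, a back-of-envelope estimate helps set expectations for the initial sync duration. The figures below are placeholder assumptions; substitute your measured database size and the dump/restore throughput observed in staging:

```shell
# Rough initial-sync estimate: time = data size / sustained transfer throughput.
# Both figures are placeholder assumptions; substitute your measured values.
db_size_gb=50          # total size of the replicated databases
throughput_mb_s=40     # sustained dump/restore throughput seen in staging

sync_minutes=$(( db_size_gb * 1024 / throughput_mb_s / 60 ))
echo "estimated initial sync: ~${sync_minutes} minutes"
```

Replication convergence after the initial sync then depends on the source's write rate, which is why the rehearsal should include realistic write load.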
Pre-migration checklist
Before starting the migration in production:
- Verify replication prerequisites: confirm `wal_level = logical`, sufficient `max_replication_slots` and `max_wal_senders`, and (if using CCR) a valid Elastic Platinum license.
- Notify stakeholders: although there is no planned downtime, inform them of the migration. If the PostgreSQL restart for `wal_level` is required, coordinate it during a low-traffic period.
- Verify backups: confirm your existing backup strategy (Velero, volume snapshots, or cloud provider backups) has a recent successful backup.
- Check cluster resources: ensure the cluster has enough CPU, memory, and storage to run both old and new infrastructure simultaneously — they coexist for the entire replication phase.
- Review `env.sh`: double-check all variables, especially `NAMESPACE`, `CAMUNDA_RELEASE_NAME`, and target cluster names.
- Prepare monitoring: set up dashboards for PostgreSQL replication lag, Elasticsearch sync status, pod health, and storage capacity.
Post-migration monitoring
After completing the cutover, monitor the following for at least 48 hours:
- Pod restarts: `kubectl get pods -n ${NAMESPACE} --watch`
- CNPG cluster health: `kubectl get clusters -n ${NAMESPACE}` (should show `Cluster in healthy state`)
- ECK cluster health: `kubectl get elasticsearch -n ${NAMESPACE}` (should show `green`)
- Camunda component logs: check for connection errors, authentication failures, or data inconsistencies.
- Process instance completion: verify that in-flight process instances continue to execute correctly.
- Zeebe export lag: confirm that Zeebe exporters are writing to the new Elasticsearch without delays.
- Sequence values: verify PostgreSQL sequences are correct after cutover (see PostgreSQL logical replication limitations).