Hello, I'm looking for guidance on how to configure MirrorMaker 2 so that no data is lost during normal replication or during planned maintenance windows.
I recently encountered an issue where not all records were replicated to the target cluster: only 974,345 out of 1 million records were present, as verified with the kafka-get-offsets script. I have only been able to reproduce this once.
The environment consists of two Kubernetes clusters configured in an active/standby topology. The Strimzi Operator is used to deploy Kafka (three replicas) and MirrorMaker 2.
Before performing a switchover, I scale down MirrorMaker 2 and delete the heartbeats topic, because otherwise it gets replicated under additional names such as source.heartbeats, source.source.heartbeats, and so on.
The configuration currently in use is the following:

> apiVersion: kafka.strimzi.io/v1beta2
> kind: KafkaMirrorMaker2
> metadata:
>   name: kafka-main
> spec:
>   clusters:
>   - alias: source
>     bootstrapServers: kafka-external-kafka-mcs-bootstrap.kafka.svc.clusterset.local:9095
>     config:
>       consumer.fetch.max.wait.ms: 500
>       consumer.fetch.min.bytes: 1048576
>   - alias: target
>     bootstrapServers: kafka-main-kafka-mcs-bootstrap.kafka.svc.clusterset.local:9095
>     config:
>       producer.batch.size: 65536
>       producer.compression.type: lz4
>       producer.linger.ms: 10
>       producer.max.request.size: 10485760
>   connectCluster: target
>   jvmOptions:
>     -Xms: 1g
>     -Xmx: 2g
>   mirrors:
>   - checkpointConnector:
>       config:
>         checkpoints.topic.replication.factor: 3
>         emit.checkpoints.interval.seconds: 30
>         replication.policy.class: org.apache.kafka.connect.mirror.IdentityReplicationPolicy
>         sync.group.offsets.enabled: "true"
>         sync.group.offsets.interval.seconds: 60
>         sync.topic.configs.enabled: "true"
>       tasksMax: 1
>     groupsPattern: .*
>     heartbeatConnector:
>       config:
>         emit.heartbeats.interval.seconds: 5
>         heartbeats.topic.replication.factor: 3
>       tasksMax: 1
>     sourceCluster: source
>     sourceConnector:
>       config:
>         consumer.auto.offset.reset: latest
>         offset-syncs.topic.location: target
>         offset-syncs.topic.replication.factor: 3
>         offset.lag.max: 100
>         refresh.groups.interval.seconds: 60
>         refresh.topics.interval.seconds: 60
>         replication.factor: 3
>         replication.policy.class: org.apache.kafka.connect.mirror.IdentityReplicationPolicy
>         sync.topic.acls.enabled: "true"
>         sync.topic.configs.enabled: "true"
>         topics: .*
>         topics.blacklist: .*[\-\.]internal, __consumer_offsets, __transaction_state, connect-.*, exclude-.*
>       tasksMax: 10
>     targetCluster: target
>     topicsPattern: .*
>   replicas: 3
>   resources:
>     limits:
>       cpu: "2"
>       memory: 4Gi
>     requests:
>       cpu: "1"
>       memory: 2Gi
>   version: 4.0.0

Any suggestions to improve the current configuration are welcome.
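
In case it helps with reproducing the record-count check described above, here is a minimal sketch of how per-topic totals from the two clusters could be compared. It assumes the kafka-get-offsets output has been saved as text in the topic:partition:offset form, and that end offsets equal record counts (no retention or compaction has removed records, and no transactional producers are adding marker offsets). The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def parse_offsets(text):
    """Sum per-partition end offsets by topic from `topic:partition:offset` lines."""
    totals = defaultdict(int)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right: the last two fields are partition and offset.
        topic, _partition, offset = line.rsplit(":", 2)
        totals[topic] += int(offset)
    return dict(totals)


def missing_records(source, target):
    """Return {topic: shortfall} for topics where the target total is behind the source."""
    return {
        topic: count - target.get(topic, 0)
        for topic, count in source.items()
        if target.get(topic, 0) < count
    }


if __name__ == "__main__":
    # Hypothetical sample data, not the real output from the incident.
    source_out = "orders:0:600\norders:1:400\n"
    target_out = "orders:0:600\norders:1:350\n"
    # prints {'orders': 50}
    print(missing_records(parse_offsets(source_out), parse_offsets(target_out)))
```

Comparing per topic rather than one grand total makes it easier to see whether the shortfall is concentrated in a single topic or spread across all of them.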
