Hi Viquar,

Thank you for taking the time to provide such detailed feedback on KIP-1279. I really appreciate your thorough review and the opportunity to clarify several aspects of the design. Let me address each of your points:
VK1 - Thundering Herd and Memory Isolation

I want to clarify the throttling mechanisms in the design, as I believe there may be some misunderstanding about how this compares to MM2 in production:

- Throttling is built in: the KIP explicitly includes the mirror.replication.throttled.rate (broker-level) configuration. It operates identically to intra-cluster replication throttling, which has proven stable at scale for years. We are also planning a follow-up KIP that extends the same protocol with source-side throttling, similar to leader.replication.throttled.rate.

- MM2 doesn't isolate impact: in production, MM2 acts as a noisy neighbor. During catch-up phases it saturates network bandwidth, competes for broker disk I/O on the source cluster, and creates backpressure on the destination cluster through produce requests. The "separate JVM" doesn't eliminate this; it just moves the memory pressure to a different process that still shares the same physical resources (network, disk).

I'd be interested to hear if you have specific production experience with MM2 that differs from this characterization.

VK2 - Blast Radius (Poison Pill)

Great question about error isolation. The KIP uses Kafka's standard Fetch protocol, which already handles malformed batches safely:

- Thread pool isolation: as described in the MirrorFetcherManager section, mirror fetchers run in a dedicated thread pool (mirror.num.replica.fetchers), completely separate from the intra-cluster replication threads.

- Existing precedent: when a ReplicaFetcherThread encounters a corrupt batch today, it marks that partition as failed and continues processing the other partitions. The same exception handling applies to MirrorFetcherThread: the partition transitions to the FAILED state while other partitions continue mirroring.

- No node-wide panic: Kafka brokers don't crash when individual fetch threads encounter errors. This is well-tested behavior across billions of partitions in production.

Interestingly, the blast radius is actually smaller than MM2's, where a malformed batch causes the entire Connect task to restart, potentially disrupting hundreds of partitions simultaneously.

VK3 - Control Plane Saturation

I think there may be some confusion about how state transitions work in the design. Let me clarify:

- State transitions during link flaps: partition state changes (MIRRORING → FAILED → PREPARING → MIRRORING) are written to __mirror_state, not the controller's metadata log. This is a dedicated coordinator topic, analogous to __consumer_offsets. State flaps don't touch the controller.

- Metadata synchronization: the MirrorMetadataManager queries source cluster metadata on a configurable interval (mirror.metadata.refresh.interval.ms, default 30s). This is a background operation that doesn't generate heavy controller write traffic.

- Controller interaction: the only controller interaction occurs during addTopicsToMirror/removeTopicsFromMirror API calls, which are operator-driven administrative actions, not automatic responses to link flaps.

Could you help me understand what specific 'metadata updates blocking ISR changes' scenario you're concerned about? I want to make sure the design explicitly addresses your use case.

VK4 - Transactional Integrity

This is an important point, and I think it highlights a common misunderstanding of Kafka's transactional protocol. Let me break down how it works. Consumer-side transaction visibility requires only log-level markers, not coordinator state.
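To make the consumer side concrete, here is a minimal sketch of what a destination-side consumer looks like. There is nothing mirroring-specific in it: this is just the standard Java client with isolation.level=read_committed (the bootstrap address, group id, and topic name are placeholders), and the LSO it reads up to is derived by the broker from the mirrored control markers:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadCommittedConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "dest-cluster:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "dr-readiness-check");          // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // The only transaction-related knob a consumer has: read up to the LSO.
        // The broker derives the LSO from the COMMIT/ABORT markers in the log
        // (which mirroring copies byte-for-byte); the consumer never talks to a
        // transaction coordinator and never sees producer IDs.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("mirrored-topic")); // placeholder topic name
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                // Only records below the Last Stable Offset are ever returned here.
                System.out.printf("%s-%d@%d%n", record.topic(), record.partition(), record.offset());
            }
        }
    }
}

The same code works unchanged against the source cluster; mirroring does not change the consumer contract in any way.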
- What consumers need: the Last Stable Offset (LSO) is computed from the control markers (COMMIT/ABORT) in the log. Consumers reading with isolation.level=read_committed only see records up to the LSO. They never interact with the transaction coordinator or validate PIDs.

- What the KIP does: it mirrors all data records and control markers byte-for-byte; during failover (the STOPPING → STOPPED transition) it truncates to the last mirrored LSO to ensure no incomplete transactions remain; and topics are read-only until failover, so no new transactions can be started on the destination.

- Why coordinator state is irrelevant: no producers write to mirrored topics (they're read-only); control markers are replicated as data, not generated by the destination's coordinator; and after failover, producers reconnect with new PIDs and transactional IDs.

There are no 'zombie transactions' because incomplete transactions are truncated before the topic becomes writable. I've added additional clarification on this in the "Transactional Consumer Guarantees" section of the KIP.

VK5 - Infinite Loop Prevention

I believe this concern relates to MM2's active-active use case, which is outside the scope of this KIP. Let me explain why loops aren't possible in our design:

- Read-only enforcement: mirrored topics cannot accept produce requests (they throw ReadOnlyTopicException). This physically prevents A→B→A loops, because cluster B cannot write data back to the topic while it is being mirrored.

- Failover is explicit: the removeTopicsFromMirror API is a deliberate operator action that makes a topic writable. At that point the topic is no longer mirrored from A, so there is no loop.

- Bidirectional mirroring ≠ active-active: the KIP supports bidirectional mirroring of different topics (e.g., A mirrors topic-x from B, B mirrors topic-y from A). Same-topic loops are impossible due to the read-only semantics: you cannot set up a mirror on a topic that is itself actively mirrored.

As stated in the 'Active-Active Writes' section, this KIP doesn't attempt to replace MM2 for multi-master scenarios. It is focused on DR, failover, and migration, where operational simplicity is paramount. MM2's header-based loop detection is necessary precisely because it allows active-active writes. That complexity is a feature of MM2, not a requirement for all replication systems.

VK6 - Data Divergence and Epoch Reconciliation

The KIP explicitly documents this limitation in the 'Non-Goals: Unclean Leader Election' section. This is a conscious design decision, not an oversight:

- Clear documentation: the KIP states that when unclean.leader.election.enable=true, brokers log a warning at every sync cycle. Operators are explicitly informed that this configuration is unsupported.

- Shared epoch requirement: resolving unclean elections across clusters requires synchronous cross-cluster communication to establish a shared leader epoch. This introduces cross-datacenter latency into the critical path of leader elections, which contradicts the asynchronous design principle.

- Alternatives for zero data loss: operators who require protection against unclean elections should either disable unclean leader election (best practice for mission-critical data; see the small Admin-client sketch at the end of this section) or wait for the follow-up KIP on this. Stretched clusters are another option, though as noted they provide no DR protection.

- Fallback behavior: during failback, if the source cluster experienced an unclean election, the LastMirroredOffset API will detect the divergence. The operator must then choose either to truncate to the last known good offset (accepting the loss of post-divergence records) or to manually reconcile the logs. This is no worse than MM2, which has no mechanism to detect or resolve log divergence after unclean elections.
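Coming back to the zero-data-loss alternatives above: disabling unclean leader election is just the standard topic (or broker-default) configuration. A minimal Admin-client sketch, with a placeholder topic name and bootstrap address:

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class DisableUncleanElectionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "source-cluster:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // Topic-level override for a single example topic; the same key can be
            // applied as a broker default for the whole cluster.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders"); // placeholder
            AlterConfigOp disableUnclean = new AlterConfigOp(
                    new ConfigEntry("unclean.leader.election.enable", "false"),
                    AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates =
                    Map.of(topic, List.of(disableUnclean));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}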
VK7 - Tiered Storage Operational Gaps

You're right that a roadmap would be helpful here. The KIP explicitly states: 'Tiered Storage is not initially supported, but a detailed design will be provided in a follow-up KIP.' This is intentional phasing, not an oversight:

- Complexity separation: tiered storage support requires changes to fetch semantics (fetching from remote storage instead of local logs) and to offset management. Designing this correctly requires dedicated focus.

- Current workaround: for clusters using tiered storage today, operators can mirror only recent data by configuring source cluster retention to keep data in local storage for the required retention period, and can use the FAILED state as a signal to manually intervene (e.g., temporarily increase source retention).

- Roadmap commitment: the 'Future Work' section explicitly lists tiered storage as a follow-up. This is standard KIP practice: core functionality first, extensions later.

- Comparison: MM2 doesn't have native tiered storage integration either. It fetches from brokers, which serve data from local storage (reading from remote storage when needed). However, MM2 doesn't guarantee the destination topic is also tiered; the limitation is the same here.

VK8 - Transactional State and PID Mapping

This relates back to VK4. Let me restate the transactional protocol more explicitly. There are three independent components:

- Transaction coordinator (source cluster): tracks active transactions via __transaction_state and times out hanging transactions. It only matters for producers writing to the source cluster.

- Log-level markers (mirrored): COMMIT/ABORT markers are appended to topic partitions and determine the LSO. This is what gets replicated byte-for-byte.

- Consumer read isolation (destination cluster): consumers read up to the LSO based on the markers in the log. They never query the transaction coordinator.

Why the destination doesn't need coordinator state: mirrored topics are read-only, so no producers write to them and the destination's transaction coordinator never gets involved. When failover occurs, the KIP truncates to the LSO, removing any records without markers. After failover, producers reconnect to the destination cluster with new producer IDs and transactional IDs, and the destination's transaction coordinator manages these new transactions normally.

The PID rewriting (-(sourceProducerId + 2)) is purely to avoid ID conflicts. It doesn't affect transactional semantics because the LSO calculation uses markers, not PIDs, and consumers validate transaction state via markers, not PIDs. There is no zombie transaction risk because the destination coordinator is never responsible for transactions that originated on the source cluster.
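To show why the rewrite can never collide with IDs handed out by the destination cluster, here is a tiny sketch of the formula above (the class and method names are mine, and the remark about the -1 sentinel is my reading rather than KIP wording):

public final class MirroredPidMapping {

    private MirroredPidMapping() { }

    // The deterministic rewrite quoted above: destinationPid = -(sourcePid + 2).
    // Broker-assigned producer IDs are always >= 0, so every rewritten ID is
    // strictly negative and can never clash with an ID handed out by the
    // destination cluster. (My reading: the +2 also keeps source PID 0 away
    // from the -1 sentinel that means "no producer ID".)
    public static long toDestination(long sourceProducerId) {
        return -(sourceProducerId + 2);
    }

    // The mapping is trivially reversible, which keeps failback bookkeeping simple.
    public static long toSource(long destinationProducerId) {
        return -destinationProducerId - 2;
    }

    public static void main(String[] args) {
        long sourcePid = 1000L;                  // example source PID
        long destPid = toDestination(sourcePid); // -> -1002
        System.out.printf("source=%d destination=%d roundTrip=%d%n",
                sourcePid, destPid, toSource(destPid));
    }
}

So, for example, source PID 1000 lands at -1002 on the destination, a value the destination's coordinator can never assign.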
Updates to the KIP

To help address these questions, I've updated the KIP with:

- A detailed comparison table between MM2 and KIP-1279 (in the Rejected Alternatives section)
- Additional clarification in the "Transactional Consumer Guarantees" section

I hope this addresses your concerns! I'm happy to discuss any of these points further, or hop on a call if that would be helpful. Your feedback is helping make this KIP stronger.

Best,
Omnia

> On 14 Feb 2026, at 20:37, vaquar khan <[email protected]> wrote:
>
> Hi Fede,
>
> I reviewed the KIP-1279 proposal yesterday and corrected the KIP number. I
> now have time to share my very detailed observations. While I fully support
> the goal of removing the operational complexity of Kafka , the design
> appears to trade that complexity for broker stability.
>
> By moving WAN replication into the broker’s core runtime, we are
> effectively removing the failure domain isolation that MirrorMaker 2
> provides. We risk coupling the stability of our production clusters to the
> instability of cross-datacenter networks.Before this KIP moves to a vote, I
> strongly recommend you and other authors to address the following stability
> gaps. Without concrete answers here, the risk profile is likely too high
> for mission-critical deployments.
>
> 1. The Thundering Herd and Memory Isolation Risk
> In the current architecture, MirrorMaker 2 (MM2) Connect workers provide a
> physical failure domain through a separate JVM heap. This isolates the
> broker from the memory pressure and Garbage Collection (GC) impact caused
> by replication surges. In this proposal, that pressure hits the broker’s
> core runtime directly.
>
> The Gap: We need simulation data for a sustained link outage (e.g., 6 hours
> on 10Gbps). When 5,000 partitions resume fetching, does the resulting
> backfill I/O and heap pressure cause GC pauses that push P99 Produce
> latency on the target cluster over 10ms? We must ensure that a massive
> catch-up phase does not starve the broker's Request Handler threads or
> destabilize the JVM.
>
>
> 2. Blast Radius (Poison Pill Problem)
> The Gap: If a source broker sends a malformed batch (e.g., bit rot), does
> it crash the entire broker process? In MM2, this kills a single task. We
> need confirmation that exceptions are isolated to the replication thread
> pool and will not trigger a node-wide panic.
>
> 3. Control Plane Saturation
> The Gap: How does the system handle a "link flap" event where 50,000
> partitions transition states rapidly? We need to verify that the resulting
> flood of metadata updates will not block the Controller from processing
> critical ISR changes for local topics.
>
> 4. Transactional Integrity
> "Byte-for-byte" replication copies transaction markers but not the
> Coordinator’s state (PIDs).
> The Gap: How does the destination broker validate an aborted transaction
> without the source PID? We should avoid creating "zombie" transactions that
> look valid but cannot be authoritatively managed.
>
> 5. Infinite Loop Prevention
> Since byte-for-byte precludes injecting lineage headers e.g., dc-source, we
> lose the standard mechanism for detecting loops in mesh topologies (A→B→A).
> The Gap: Relying solely on topic naming conventions is operationally
> fragile. What is the deterministic mechanism to prevent infinite recursion?
>
> 6. Data Divergence and Epoch Reconciliation
> The current proposal explicitly excludes support for unclean leader
> election because there is no mechanism for a "shared leader epoch" between
> clusters.
> The Gap: Without epoch reconciliation, if the source cluster experiences an
> unclean election, the source and destination logs will diverge. If an
> operator later attempts a failback (reverse mirroring), the clusters will
> contain inconsistent data for the same offset, leading to potential silent
> data corruption or permanent replication failure.
>
> 7. Tiered Storage Operational Gaps
> The design states that Tiered Storage is not initially supported and that a
> mirror follower encountering an OffsetMovedToTieredStorageException will
> simply mark the partition as FAILED.
> The Gap: For mission-critical clusters using Tiered Storage for long-term
> retention, this creates an operational cliff. Mirroring will fail as soon
> as the source cluster offloads data to remote storage. We need a roadmap
> for how native mirroring will eventually interact with tiered segments
> without failing the partition.
>
> 8. Transactional State and PID Mapping
> While the KIP proposes a deterministic formula for rewriting Producer IDs
> ,calculated as destinationProducerId= (sourceProducerId+2) it does not
> replicate the transaction_state metadata.
> The Gap: How does the destination broker authoritatively validate or expire
> hanging transactions if the source PID state is rewritten but the
> transaction coordinator state is missing?
> We risk a scenario where consumers encounter zombie transactions that can
> never be decided on the destination cluster.
>
> This is a big change to how our system is built. We need to make sure it
> doesn't create a weak link that could bring the whole system down,We should
> ensure it does not introduce a new single point of failure.
>
> Regards,
> Viquar Khan
> *Linkedin *-https://www.linkedin.com/in/vaquar-khan-b695577/
> *Book *-
> https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
> *GitBook*-https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
> *Stack *-https://stackoverflow.com/users/4812170/vaquar-khan
> *github*-https://github.com/vaquarkhan
>
> On Sat, 14 Feb 2026 at 01:18, Federico Valeri <[email protected]> wrote:
>
>> Hi, we would like to start a discussion thread about KIP-1279: Cluster
>> Mirroring.
>>
>> Cluster Mirroring is a new Kafka feature that enables native,
>> broker-level topic replication across clusters. Unlike MirrorMaker 2
>> (which runs as an external Connect-based tool), Cluster Mirroring is
>> built into the broker itself, allowing tighter integration with the
>> controller, coordinator, and partition lifecycle.
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1279%3A+Cluster+Mirroring
>>
>> There are a few missing bits, but most of the design is there, so we
>> think it is the right time to involve the community and get feedback.
>> Please help validating our approach.
>>
>> Thanks
>> Fede
>>
