Fabien LD created KAFKA-6915:
--------------------------------
Summary: MirrorMaker: avoid duplicates when source cluster is
unreachable for more than session.timeout.ms
Key: KAFKA-6915
URL: https://issues.apache.org/jira/browse/KAFKA-6915
Project: Kafka
Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Fabien LD
According to the documentation, see
[https://kafka.apache.org/11/documentation.html#semantics], exactly-once
delivery can be achieved by storing offsets in the same store as the produced
data:
{quote}
When writing to an external system, the limitation is in the need to coordinate
the consumer's position with what is actually stored as output. The classic way
of achieving this would be to introduce a two-phase commit between the storage
of the consumer position and the storage of the consumers output. But this can
be handled more simply and generally by letting the consumer store its offset
in the same place as its output
{quote}
With the current implementation, where the consumer stores its offsets in the
source cluster, duplicates can occur if the network makes the source cluster
unreachable for more than {{session.timeout.ms}}.
Once that amount of time has passed, the source cluster rebalances the consumer
group. Later, when the network is back, the generation has changed and the
consumers cannot commit the offsets for the last batches of records they
consumed (in fact, all records processed during the last
{{auto.commit.interval.ms}}). All those records are therefore processed again
when the consumers rejoin the group.
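The window of records that gets re-processed in that scenario is bounded by the
two consumer settings named above. A small illustrative snippet (the values
shown are examples only, not a recommendation):
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class MirrorConsumerTimeouts {
    public static Properties timeouts() {
        Properties props = new Properties();
        // After this much silence the source cluster evicts the consumer and
        // rebalances the group; commits from the old generation are rejected.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "10000");
        // Offsets are auto-committed at most this often, so up to this much of
        // already-consumed work can be left uncommitted and replayed.
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000");
        return props;
    }
}
{code}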
Storing the offsets in the target cluster would eliminate this risk of
duplicate records and would be a nice feature to have.
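A minimal sketch of that idea, assuming a single source partition and a
compacted {{mirror-offsets}} topic on the target cluster (all names, addresses
and the single-partition layout are hypothetical, not MirrorMaker's actual
implementation): the mirrored records and the consumed source offset are
written in the same target-side transaction, and on restart the position is
restored by seeking to the stored offset instead of relying on the source-side
consumer group.
{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class TargetClusterOffsetMirror {

    // Hypothetical names, for illustration only.
    static final String SOURCE_TOPIC = "events";
    static final String OFFSETS_TOPIC = "mirror-offsets"; // compacted, on the target cluster

    public static void main(String[] args) {
        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "source:9092");
        // Never commit offsets to the source cluster.
        c.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "target:9092");
        // A transaction makes the mirrored data and the stored offset atomic on the target.
        p.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "mirror-events-0");
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(c);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p)) {
            producer.initTransactions();

            // Static assignment: a rebalance of a source-side consumer group can no
            // longer invalidate progress, because progress lives in the target cluster.
            TopicPartition source = new TopicPartition(SOURCE_TOPIC, 0);
            consumer.assign(Collections.singletonList(source));

            long resume = readStoredOffset(source);
            if (resume >= 0) {
                consumer.seek(source, resume);
            } else {
                consumer.seekToBeginning(Collections.singletonList(source));
            }

            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(500);
                if (records.isEmpty()) continue;
                producer.beginTransaction();
                long next = -1;
                for (ConsumerRecord<byte[], byte[]> r : records) {
                    producer.send(new ProducerRecord<>(r.topic(), r.key(), r.value()));
                    next = r.offset() + 1;
                }
                // The offset is stored in the same place, and the same transaction, as the output.
                producer.send(new ProducerRecord<>(OFFSETS_TOPIC,
                        source.toString().getBytes(), Long.toString(next).getBytes()));
                producer.commitTransaction(); // on failure: abortTransaction() and re-seek (omitted)
            }
        }
    }

    // Scan the offsets topic on the target cluster and return the last offset stored
    // for the given source partition, or -1 if none was found.
    static long readStoredOffset(TopicPartition source) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "target:9092");
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        long stored = -1;
        try (KafkaConsumer<byte[], byte[]> reader = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition(OFFSETS_TOPIC, 0);
            reader.assign(Collections.singletonList(tp));
            reader.seekToBeginning(Collections.singletonList(tp));
            long end = reader.endOffsets(Collections.singletonList(tp)).get(tp);
            while (reader.position(tp) < end) {
                for (ConsumerRecord<byte[], byte[]> r : reader.poll(500)) {
                    if (source.toString().equals(new String(r.key()))) {
                        stored = Long.parseLong(new String(r.value()));
                    }
                }
            }
        }
        return stored;
    }
}
{code}
A real implementation would still need some coordination to spread source
partitions across several mirroring instances; the static assignment above is
only there to keep the sketch short.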