[ 
https://issues.apache.org/jira/browse/KAFKA-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6915:
------------------------------
    Component/s: mirrormaker

> MirrorMaker: avoid duplicates when source cluster is unreachable for more 
> than session.timeout.ms
> -------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6915
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6915
>             Project: Kafka
>          Issue Type: Improvement
>          Components: mirrormaker
>    Affects Versions: 1.1.0
>            Reporter: Fabien LD
>            Priority: Major
>
> According to the documentation (see 
> [https://kafka.apache.org/11/documentation.html#semantics]), exactly-once 
> delivery can be achieved by storing offsets in the same store as the produced 
> data:
> {quote}
> When writing to an external system, the limitation is in the need to 
> coordinate the consumer's position with what is actually stored as output. 
> The classic way of achieving this would be to introduce a two-phase commit 
> between the storage of the consumer position and the storage of the consumers 
> output. But this can be handled more simply and generally by letting the 
> consumer store its offset in the same place as its output
> {quote}
> With the current implementation, where the consumer stores its offsets in 
> the source cluster, duplicates can occur if a network problem makes the 
> source cluster unreachable for longer than {{session.timeout.ms}}.
> Once that time has passed, the source cluster rebalances the consumer group. 
> When the network comes back, the generation has changed and the consumers can 
> no longer commit the offsets for the last batches of records they consumed 
> (in practice, all records processed during the last 
> {{auto.commit.interval.ms}}). All of those records are therefore processed 
> again when the group's consumers rejoin.
> Storing the offsets in the target cluster would resolve this risk of 
> duplicate records and would be a nice feature to have.
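
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MirrorMaker code and not the Kafka client API: `Source`, `Target`, `mirror_offsets_in_source`, `mirror_offsets_in_target` and `run_demo` are hypothetical names, the rebalance is modelled as a simple generation counter, and the target is assumed to update its records and its stored offset together.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Toy stand-in for the source cluster and its committed group offset."""
    log: list
    committed: int = 0
    generation: int = 0

    def commit(self, offset, generation):
        # A commit from a stale generation is rejected, as after a rebalance.
        if generation != self.generation:
            return False
        self.committed = offset
        return True


@dataclass
class Target:
    """Toy stand-in for the target cluster: mirrored records plus an offset."""
    records: list = field(default_factory=list)
    offset: int = 0  # next source offset to mirror, kept next to the records


def mirror_offsets_in_source(source, target, batch, outage_before_commit):
    """Mirror one batch, committing the position back to the source."""
    generation = source.generation
    start = source.committed
    chunk = source.log[start:start + batch]
    target.records.extend(chunk)
    if outage_before_commit:
        # Assumption: the outage outlasted the session timeout,
        # so the group was rebalanced and the generation moved on.
        source.generation += 1
    return source.commit(start + len(chunk), generation)


def mirror_offsets_in_target(source, target, batch):
    """Mirror one batch, keeping the position in the target itself."""
    start = target.offset
    chunk = source.log[start:start + batch]
    # Assumption: records and offset are updated together (atomically).
    target.records.extend(chunk)
    target.offset = start + len(chunk)


def run_demo():
    log = [f"r{i}" for i in range(6)]

    # Offsets in the source: the first commit is lost, so r0..r2 are redone.
    src_a, tgt_a = Source(log=list(log)), Target()
    mirror_offsets_in_source(src_a, tgt_a, 3, outage_before_commit=True)
    mirror_offsets_in_source(src_a, tgt_a, 3, outage_before_commit=False)
    mirror_offsets_in_source(src_a, tgt_a, 3, outage_before_commit=False)

    # Offsets in the target: a rebalance on the source does not matter.
    src_b, tgt_b = Source(log=list(log)), Target()
    mirror_offsets_in_target(src_b, tgt_b, 3)
    src_b.generation += 1  # same simulated rebalance
    mirror_offsets_in_target(src_b, tgt_b, 3)

    return tgt_a.records, tgt_b.records


if __name__ == "__main__":
    with_source_offsets, with_target_offsets = run_demo()
    print(with_source_offsets)
    print(with_target_offsets)
```

In this model the first variant mirrors the first batch twice because its commit is rejected after the simulated rebalance, while the second variant resumes from the offset held in the target and mirrors each record once. How the records and the offset would be written together in a real target cluster is deliberately left out of the sketch.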



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
