[ 
https://issues.apache.org/jira/browse/KAFKA-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994511#comment-14994511
 ] 

Guozhang Wang commented on KAFKA-2759:
--------------------------------------

We discussed the default reset policy in KAFKA-1006; I would like to ping 
[~toddpalino] for his opinion, since AFAIR the main motivations for keeping 
it as "latest" are operations-related.

> Mirror maker can leave gaps of missing messages if the process dies after a 
> partition is reset and before the first offset commit
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2759
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2759
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.2
>            Reporter: Ewen Cheslack-Postava
>            Priority: Minor
>
> Based on investigation of KAFKA-2747. When mirror maker first starts or if it 
> picks up new topics/partitions, it will use the reset policy to choose where 
> to start. By default this uses 'latest'. If it starts reading messages and 
> then dies before committing offsets for the first time, then the mirror maker 
> that takes over that partition will also reset. This can result in some 
> messages making it to the consumer, then a gap of messages that were 
> skipped, and then messages that get processed by the new MM process.
> One solution to this problem would be to make sure that offsets are committed 
> after they are reset but before the first message is passed to the producer. 
> In other words, in the case of a reset, MM should record where it's going to 
> start reading data from before processing any messages. This guarantees all 
> messages after the first one delivered by MM will appear at least once.
> This is minor since it should be very rare, but it does break an assumption 
> that people probably make about the output -- once you start receiving data, 
> you aren't guaranteed all subsequent messages will appear at least once.
> This same issue could affect Copycat as well. In fact, it may be generally 
> useful to allow consumers to know when the offset was reset so they can 
> handle cases like this.
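
To make the scenario in the description concrete, here is a minimal sketch in
plain Java. It is a toy simulation, not mirror maker code and not the Kafka
client API: the class ResetCommitSketch, the methods simulate and hasGap, the
commitOnReset flag, and all offsets are hypothetical names and values chosen
for illustration. It compares the current behavior (reset to 'latest', die
before the first commit) with the proposed fix (commit the reset position
before mirroring anything).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ResetCommitSketch {

    /**
     * Simulates two hypothetical MM workers taking turns on one partition.
     * Returns the source offsets that reached the destination, in order.
     */
    static List<Long> simulate(boolean commitOnReset) {
        List<Long> mirrored = new ArrayList<>();
        Long committed = null;   // assumed offset store: nothing committed yet
        long logEnd = 10;        // source partition currently holds offsets 0..9

        // Worker 1: no committed offset, so it resets to "latest".
        long position = logEnd;
        if (commitOnReset) {
            committed = position; // proposed fix: record the start point first
        }
        logEnd = 15;              // five new messages arrive
        int delivered = 0;
        while (position < logEnd && delivered < 3) {
            mirrored.add(position++);
            delivered++;
        }
        // Worker 1 dies here, before its first regular offset commit.

        logEnd = 20;              // messages keep arriving in the meantime

        // Worker 2: resumes from the committed offset, or resets to "latest".
        position = (committed != null) ? committed : logEnd;
        logEnd = 22;
        while (position < logEnd) {
            mirrored.add(position++);
        }
        return mirrored;
    }

    /** True if some offset between the first delivered and the highest delivered is missing. */
    static boolean hasGap(List<Long> mirrored) {
        if (mirrored.isEmpty()) {
            return false;
        }
        long first = mirrored.get(0);
        long last = Collections.max(mirrored);
        for (long offset = first; offset <= last; offset++) {
            if (!mirrored.contains(offset)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("no commit on reset: " + simulate(false)
                + " gap=" + hasGap(simulate(false)));
        System.out.println("commit on reset:    " + simulate(true)
                + " gap=" + hasGap(simulate(true)));
    }
}
```

In this toy model the fix trades the gap for duplicates: the second worker
re-reads the messages the first one already mirrored, which is consistent with
the at-least-once guarantee described above.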



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)