[ https://issues.apache.org/jira/browse/KAFKA-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994452#comment-14994452 ]
Jiangjie Qin commented on KAFKA-2759:
-------------------------------------

[~ewencp] By the way, committing offsets alone on startup does not completely solve this problem. If a log segment is deleted after the offset commit, the consumer will still reset to the largest offset and might miss the gap. I think we have to change the default auto reset anyway. I am not sure whether changing auto reset to smallest would break things. It does not look like a regression to me. For current users, this only has an impact when the committed offsets are out of range. If users ever want to rely on the auto reset, they just need to change the setting to latest.

> Mirror maker can leave gaps of missing messages if the process dies after a
> partition is reset and before the first offset commit
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2759
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2759
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.2
>            Reporter: Ewen Cheslack-Postava
>            Priority: Minor
>
> Based on investigation of KAFKA-2747. When mirror maker first starts or if it
> picks up new topics/partitions, it will use the reset policy to choose where
> to start. By default this uses 'latest'. If it starts reading messages and
> then dies before committing offsets for the first time, then the mirror maker
> that takes over that partition will also reset. This can result in some
> messages making it to the consumer, then a gap of messages that were skipped,
> and then messages that get processed by the new MM process.
> One solution to this problem would be to make sure that offsets are committed
> after they are reset but before the first message is passed to the producer.
> In other words, in the case of a reset, MM should record where it's going to
> start reading data from before processing any messages. This guarantees all
> messages after the first one delivered by MM will appear at least once.
> This is minor since it should be very rare, but it does break an assumption
> that people probably make about the output: that once you start receiving
> data, all subsequent messages are guaranteed to appear at least once.
> This same issue could affect Copycat as well. In fact, it may be generally
> useful to allow consumers to know when the offset was reset so they can
> handle cases like this.
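
As a rough illustration of the scenario in the issue description, here is a toy Python model of a single partition. It is only a sketch under stated assumptions: it does not use Kafka's API, the names `Partition`, `start_mirror`, and `simulate` are hypothetical, and it does not model log segment deletion, so it does not cover the out-of-range case raised in the comment.

```python
# Toy model of one partition; not Kafka's API. All names are hypothetical.

class Partition:
    def __init__(self):
        self.log_end = 0       # next offset to be written
        self.committed = None  # last committed consumer offset, if any

    def produce(self, n):
        """Append n messages to the log."""
        self.log_end += n


def start_mirror(partition, commit_on_reset):
    """Return the offset a freshly started mirror process reads from."""
    if partition.committed is None:
        # No committed offset: apply a 'latest' reset policy.
        position = partition.log_end
        if commit_on_reset:
            # Proposed fix: record the start position before processing anything.
            partition.committed = position
        return position
    return partition.committed


def simulate(commit_on_reset):
    """Return offsets at or after the first start position that were never mirrored."""
    p = Partition()
    p.produce(10)                              # offsets 0..9 predate the mirror
    first = start_mirror(p, commit_on_reset)   # first process resets to offset 10
    p.produce(5)                               # offsets 10..14 arrive
    mirrored = list(range(first, first + 3))   # mirrors 10..12, then dies uncommitted
    p.produce(5)                               # offsets 15..19 arrive
    second = start_mirror(p, commit_on_reset)  # replacement process takes over
    p.produce(3)                               # offsets 20..22 arrive
    mirrored += range(second, p.log_end)       # replacement mirrors up to the log end
    expected = set(range(first, p.log_end))
    return sorted(expected - set(mirrored))


print(simulate(commit_on_reset=False))  # offsets lost in the gap
print(simulate(commit_on_reset=True))   # no gap, at the cost of some duplicates
```

In this model, committing the reset position means the replacement process re-reads from offset 10, so offsets 10..12 are mirrored twice. That matches the at-least-once behaviour the description asks for rather than exactly-once.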