[ https://issues.apache.org/jira/browse/KAFKA-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ewen Cheslack-Postava updated KAFKA-2759:
-----------------------------------------
    Description:
Based on investigation of KAFKA-2747. When mirror maker first starts or if it picks up new topics/partitions, it will use the reset policy to choose where to start. By default this uses 'latest'. If it starts reading messages and then dies before committing offsets for the first time, then the mirror maker that takes over that partition will also reset. This can result in some messages making it to the consumer, then a gap of messages that were skipped, and then messages that get processed by the new MM process.

One solution to this problem would be to make sure that offsets are committed after they are reset but before the first message is passed to the producer. In other words, in the case of a reset, MM should record where it's going to start reading data from before processing any messages. This guarantees all messages after the first one delivered by MM will appear at least once.

This is minor since it should be very rare, but it does break an assumption that people probably make about the output -- once you start receiving data, you aren't guaranteed that all subsequent messages will appear at least once.

This same issue could affect Copycat as well. In fact, it may be generally useful to allow consumers to know when the offset was reset so they can handle cases like this.

> Mirror maker can leave gaps of missing messages if the process dies after a partition is reset and before the first offset commit
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2759
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2759
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.2
>            Reporter: Ewen Cheslack-Postava
>            Priority: Minor

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
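
To make the failure mode and the proposed fix concrete, here is a small in-memory simulation. It is only a sketch and does not use the Kafka client or mirror maker code: `Partition`, `MirrorInstance`, `run_scenario`, and the `commit_on_reset` flag are hypothetical names invented for this illustration, and offsets stand in for message payloads.

```python
from typing import List, Optional


class Partition:
    """Hypothetical stand-in for one source partition plus its committed offset."""

    def __init__(self) -> None:
        self.log: List[int] = []              # each message is just its own offset
        self.committed: Optional[int] = None  # None means nothing committed yet

    def append(self, count: int) -> None:
        """Append `count` new messages to the end of the log."""
        start = len(self.log)
        self.log.extend(range(start, start + count))


class MirrorInstance:
    """Hypothetical mirror process: reads from a partition, writes to a sink."""

    def __init__(self, partition: Partition, sink: List[int],
                 commit_on_reset: bool) -> None:
        self.partition = partition
        self.sink = sink
        if partition.committed is None:
            # No committed offset: apply the 'latest' reset policy.
            self.position = len(partition.log)
            if commit_on_reset:
                # Proposed fix: record the start position before
                # any message is handed to the producer.
                partition.committed = self.position
        else:
            self.position = partition.committed

    def mirror(self, max_messages: int) -> None:
        """Forward up to `max_messages` messages to the sink without committing."""
        end = min(len(self.partition.log), self.position + max_messages)
        self.sink.extend(self.partition.log[self.position:end])
        self.position = end

    def commit(self) -> None:
        self.partition.committed = self.position


def run_scenario(commit_on_reset: bool) -> List[int]:
    """Return the offsets delivered downstream when the first instance crashes."""
    partition = Partition()
    sink: List[int] = []
    partition.append(5)   # offsets 0-4 exist before mirroring starts
    first = MirrorInstance(partition, sink, commit_on_reset)
    partition.append(3)   # offsets 5-7 arrive
    first.mirror(2)       # forwards 5 and 6, then dies before its first commit
    partition.append(2)   # offsets 8-9 arrive while nothing is running
    second = MirrorInstance(partition, sink, commit_on_reset)
    partition.append(1)   # offset 10 arrives
    second.mirror(100)
    second.commit()
    return sink


if __name__ == "__main__":
    print("without commit on reset:", run_scenario(False))
    print("with commit on reset:   ", run_scenario(True))
```

Without the commit on reset, the second instance resets again and the output is 5, 6, 10, so offsets 7 through 9 are lost. With it, the second instance resumes from offset 5, so 5 and 6 are delivered twice but nothing after the first delivered message is missing, which matches the at-least-once expectation described above.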