Hi Becket, I made some changes and clarified the motivation for this KIP. :)It should be easier to understand now since I included a diagram. Thanks,Richard Yu On Tuesday, July 17, 2018, 4:38:11 PM GMT+8, Richard Yu <yrichar...@yahoo.com.INVALID> wrote: Hi Becket, Thanks for reviewing this KIP. :) I probably did not explicitly state what we were trying to avoid by introducing this mode. As mentioned in the KIP, there is a offset lag which could result after a crash. Our main goal is to avoid this lag (i.e. the latency in terms of time that results from the crash, not to reduce the number of records reprocessed). I could provide a couple of diagrams with what I am envisioning because some points in my KIP might otherwise be hard to grasp (I will also include some diagrams to give you a better idea of an use case). As for your questions, I could provide a couple of answers: 1. Yes, the two consumers will in fact be processing in parallel. We do this because we want to accelerate the processing speed of the records to make up for the latency caused by the crash. 2. After the recovery point, records will not be processed twice. Let me describe the scenario I was envisioning: we would let the consumer that crashed seek to the end of the log using KafkaConsumer#seekToEnd. Meanwhile, a secondary consumer will start processing from the latest checkpointed offset and continue until it has hit the place where the first consumer that crashed began processing after seekToEnd was first called. Since the consumer that crashed skipped from the recovery point to the end of the log, the intermediate offsets will be processed only by the secondary consumer. So it is important to note that the offset ranges which the two threads process will not overlap. (This is important as it prevents offsets from being processed more than once)
3. As for the committed offsets, the possibility of rewinding is not likely. If my understanding is correct, you are probably worried that after the crash, offsets that has already been previously committed will be committed again. The current design prevents that from happening, as the policy of where to start processing after a crash is universal across all Consumer instances -- we will begin processing from the latest offset committed. I hope that you at least got some of your questions answered. I will update the KIP soon, so please stay tuned. Thanks,Richard Yu On Tuesday, July 17, 2018, 2:14:07 PM GMT+8, Becket Qin <becket....@gmail.com> wrote: Hi Richard, Thanks for the KIP. I am a little confused on what is proposed. The KIP suggests that after recovery from a consumer crash, there will be two consumers consuming from the same partition. One consumes starting from the log end offset at the point of recovery, and another consumes starting from the last committed offset and keeping consuming with the first consumer in parallel? Does that mean the messages after the recovery point will be consumed twice? If those two consumer commits offsets, does that mean the committed offsets may rewind? The proposal sounds a little hacky and introduce some non-deterministic behavior. It would be useful to have a concrete use case example to explain what is actually needed. If the goal is to reduce the number of records that are reprocessed when consume crashes, maybe we can have an auto commit interval based on number of messages. If the application just wants to read from the end of the log after recovery from crash, would calling seekToEnd explicitly work? Thanks, Jiangjie (Becket) Qin On Thu, Jul 5, 2018 at 6:46 PM, Richard Yu <yohan.richard...@gmail.com> wrote: > Hi all, > > I would like to discuss KIP-333 (which proposes a faster mode of > rebalancing). > Here is the link for the KIP: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > 333%3A+Add+faster+mode+of+rebalancing > > Thanks, > Richard Yu >