[ https://issues.apache.org/jira/browse/KAFKA-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ewen Cheslack-Postava updated KAFKA-2894:
-----------------------------------------
    Priority: Blocker  (was: Major)
    Fix Version/s: 0.10.1.0

More generally, I think we need to handle things like rewind() after *any* connector method is invoked. The rewind() in poll() handles Connector.start() and Connector.put(), but we also need to handle the rebalance callbacks (where we can ignore Connector.close() and only do this after Connector.open()) and Connector.flush().

> WorkerSinkTask doesn't handle rewinding offsets on rebalance
> ------------------------------------------------------------
>
>                 Key: KAFKA-2894
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2894
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 0.9.0.0
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Liquan Pei
>            Priority: Blocker
>             Fix For: 0.10.1.0
>
>
> rewind() is only invoked at the beginning of each poll(). This means that if
> a rebalance occurs in the poll, it's feasible to get data that doesn't match
> a request to change offsets during the rebalance. I think the consumer will
> hold on to consumer data across the rebalance if it is reassigned the same
> offset, so there may already be data ready to be delivered. Additionally we
> may already have data in an incomplete messageBatch that should be discarded
> when the rewind is requested.
> While connectors that care about this (i.e. ones that manage their own
> offsets) can handle this correctly by tracking the offsets they're expecting
> to see, it's a hassle, error-prone, and pretty unintuitive.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
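
For illustration only, the behaviour requested in this issue can be sketched in plain Java as follows. This is not the WorkerSinkTask code and it uses no Kafka classes: RewindSketch, Context, FakeConsumer and invokeThenRewind are hypothetical stand-ins, and partitions are modelled as plain strings. The idea, under those assumptions, is to apply any pending offset request and drop the buffered batch right after every connector callback, rather than only at the start of poll().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these types are Kafka APIs.
class RewindSketch {
    /** Stand-in for the task context: the connector records offsets it wants to seek to. */
    static class Context {
        final Map<String, Long> requestedOffsets = new HashMap<>();

        void offset(String partition, long offset) {
            requestedOffsets.put(partition, offset);
        }
    }

    /** Stand-in for the consumer: tracks only the current position per partition. */
    static class FakeConsumer {
        final Map<String, Long> positions = new HashMap<>();

        void seek(String partition, long offset) {
            positions.put(partition, offset);
        }
    }

    final Context context = new Context();
    final FakeConsumer consumer = new FakeConsumer();
    final List<String> messageBatch = new ArrayList<>();

    /** Applies pending seek requests and drops buffered records; returns true if a rewind happened. */
    boolean rewind() {
        if (context.requestedOffsets.isEmpty()) {
            return false;
        }
        for (Map.Entry<String, Long> e : context.requestedOffsets.entrySet()) {
            consumer.seek(e.getKey(), e.getValue());
        }
        context.requestedOffsets.clear();
        // Simplification: discard the whole incomplete batch, not just the rewound partitions.
        messageBatch.clear();
        return true;
    }

    /** Runs a connector callback (open, put, flush, ...) and honors any rewind it requested. */
    boolean invokeThenRewind(Runnable connectorCallback) {
        connectorCallback.run();
        return rewind();
    }
}
```

In this model a rebalance callback that requests a new offset would be wrapped in invokeThenRewind, so the seek takes effect and the stale batch is dropped before any further records are delivered.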