If there are no offsets stored in ZK, I think it's possible to get some dups during startup. Once the offsets are in ZK, there shouldn't be dups during subsequent rebalances.
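If occasional dups at startup are a problem, one common workaround is to make the downstream write idempotent. Here is a rough sketch of that idea, not anything from the Kafka codebase. It assumes each consumed message can be identified by its topic, partition, and offset; whether your consumer API exposes those depends on the Kafka version. The `IdempotentSink` class and its in-memory dict are hypothetical stand-ins for a store shared by all consumers, such as a database table with a unique key.

```python
class IdempotentSink:
    """Hypothetical sink that stores each (topic, partition, offset) at most once.

    The dict stands in for a store shared by every consumer in the group;
    a per-process dict alone would not catch dups across consumers.
    """

    def __init__(self):
        # Maps (topic, partition, offset) -> payload.
        self._rows = {}

    def write(self, topic, partition, offset, payload):
        """Store the payload; return False if this message was already written."""
        key = (topic, partition, offset)
        if key in self._rows:
            return False  # re-delivered message, ignore it
        self._rows[key] = payload
        return True

    def payloads(self):
        """Return stored payloads ordered by topic, partition, then offset."""
        return [self._rows[key] for key in sorted(self._rows)]


if __name__ == "__main__":
    sink = IdempotentSink()
    # The first consumer prefetches offsets 0 and 1 before any offset is in ZK.
    first_consumer = [("logs", 0, 0, "a"), ("logs", 0, 1, "b")]
    # After the rebalance, a second consumer starts the same partition from 0.
    second_consumer = [("logs", 0, 0, "a"), ("logs", 0, 1, "b"), ("logs", 0, 2, "c")]
    for message in first_consumer + second_consumer:
        sink.write(*message)
    print(sink.payloads())
```

The point is that the duplicate check has to live somewhere all consumers can see, since the dups come from two different consumers reading the same partition.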

Thanks,

Jun

On Fri, Jun 14, 2013 at 2:04 PM, Philip O'Toole <phi...@loggly.com> wrote:

> On Thu, Jun 13, 2013 at 9:15 PM, Jun Rao <jun...@gmail.com> wrote:
> > Are your messages compressed in batches? If so, some dups are expected
> > during rebalance. In 0.8, such dups are eliminated. Other than that,
> > rebalance shouldn't cause dups since we commit consumed offsets to ZK
> > before doing a rebalance.
>
> Jun -- quick clarification. Is this guarantee valid even if there is
> no state in Zookeeper? If the consumers that will rebalance are coming
> up for the *very first time*? I.e.:
>
> [zk: localhost:2181(CONNECTED) 1] ls /consumers
> Node does not exist: /consumers
> [zk: localhost:2181(CONNECTED) 2]
>
> Philip
>
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Jun 13, 2013 at 7:34 PM, Philip O'Toole <phi...@loggly.com> wrote:
> >
> >> Hello -- is it possible for our code to stall a ConsumerConnector from
> >> doing any consuming for, say, 30 seconds, until we can be sure that
> >> all other ConsumerConnectors are rebalanced?
> >>
> >> It seems that the first ConsumerConnector to come up is prefetching
> >> some data, and we end up with duplicate messages. We looked at the
> >> code for the high-level consumer
> >> (core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala)
> >> and it looks like it empties some queues after a rebalance, but we
> >> still see duplicate messages.
> >>
> >> I'm sure this question has been asked before :-) but this is our first
> >> time really working with the high-level consumer, and this caught us
> >> by surprise. When there is *no* data in Kafka, and we wait until
> >> everything balances and then send data in, everything works fine, but
> >> if there is some data sitting in the brokers, we seem to get dupes,
> >> even when each thread sleeps for many seconds after creating the
> >> ConsumerConnector.
> >>
> >> Are we missing something?
> >>
> >> Thanks,
> >>
> >> Philip