Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

Philip O'Toole Fri, 14 Jun 2013 14:05:41 -0700

On Thu, Jun 13, 2013 at 9:15 PM, Jun Rao <jun...@gmail.com> wrote:
> Are your messages compressed in batches? If so, some dups are expected
> during rebalance. In 0.8, such dups are eliminated. Other than that,
> rebalance shouldn't cause dups since we commit consumed offsets to ZK
> before doing a rebalance.


Jun -- quick clarification. Is this guarantee valid even if there is
no state in Zookeeper? If the consumers that will rebalance are coming
up for the *very first time*? I.e.:

[zk: localhost:2181(CONNECTED) 1] ls /consumers
Node does not exist: /consumers
[zk: localhost:2181(CONNECTED) 2]

Philip

>
> Thanks,
>
> Jun
>
>
> On Thu, Jun 13, 2013 at 7:34 PM, Philip O'Toole <phi...@loggly.com> wrote:
>
>> Hello -- is it possible for our code to stall a ConsumerConnector from
>> doing any consuming for, say, 30 seconds, until we can be sure that
>> all other ConsumerConnectors are rebalanced?
>>
>> It seems that the first ConsumerConnector to come up is prefetching
>> some data, and we end up with duplicate messages. We looked at the
>> code for the high-level consumer
>> (core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala)
>> and it looks like it empties some queues after a rebalance, but we
>> still see duplicate messages.
>>
>> I'm sure this question has been asked before :-) but this is our first
>> time really working with the high-level consumer, and this caught us
>> by surprise. When there is *no* data in Kafka and we wait until
>> everything balances before sending data in, everything works fine.
>> But if there is already some data sitting in the brokers, we seem to
>> get dupes, even when each thread sleeps for many seconds after
>> creating the ConsumerConnector.
>>
>> Are we missing something?
>>
>> Thanks,
>>
>> Philip
>>
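
If some duplicates around a rebalance cannot be avoided (for example with
compressed batches, as Jun notes above), one common workaround is to drop
repeats at the point where messages are finally processed. The following is
only a minimal sketch of that idea, not anything from the Kafka consumer API.
It assumes each message carries a unique ID assigned by the producer; that ID,
and the names RecentIdFilter and deduplicate, are hypothetical and chosen for
illustration.

```python
from collections import OrderedDict


class RecentIdFilter:
    """Remembers the most recent `capacity` message IDs and flags repeats.

    Assumption: every message has a unique, producer-assigned ID.
    """

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self._seen = OrderedDict()  # insertion order doubles as recency order

    def is_duplicate(self, message_id):
        if message_id in self._seen:
            # Seen before: refresh its recency and report it as a repeat.
            self._seen.move_to_end(message_id)
            return True
        self._seen[message_id] = None
        if len(self._seen) > self.capacity:
            # Keep memory bounded by evicting the least recently seen ID.
            self._seen.popitem(last=False)
        return False


def deduplicate(messages, dedup_filter):
    """Yield (message_id, payload) pairs whose ID was not seen recently."""
    for message_id, payload in messages:
        if not dedup_filter.is_duplicate(message_id):
            yield message_id, payload
```

An in-memory filter like this only catches repeats that reach the same
process. If a partition moves to a different consumer during the rebalance,
the set of seen IDs would have to live in a store that all consumers share.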
