Re: Proper use of ConsumerConnector

2012-12-23 Thread Neha Narkhede
Tom, That is a good suggestion. Some of us started thinking about re-designing the consumer client a while ago and wrote up some ideas here - https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design. In addition to this, we have a working prototype of stage 1 of that re-design h

Re: Proper use of ConsumerConnector

2012-12-21 Thread Tom Brown
It seems that a common thread is that while ConsumerConnector works well for the standard case, it just doesn't work for any case where manual offset management (explicit checkpoints, rollbacks, etc) is required. If any Kafka devs are looking for a way to improve it, I think modifying it to be mor

Re: Proper use of ConsumerConnector

2012-12-21 Thread Yonghui Zhao
In our project we use senseidb to consume kafka data. Senseidb processes each message immediately but does not flush to disk immediately. So if senseidb crashes, all results not yet flushed are lost, and we want to rewind kafka. The offset we want to rewind to is the flush checkpoint. In this case,

Re: Proper use of ConsumerConnector

2012-12-21 Thread Jun Rao
It tracks the consumed offset, not the fetched offset.
Thanks, Jun
On Fri, Dec 21, 2012 at 9:46 AM, Tom Brown wrote:
> Does the ConsumerConnector keep track of the offsets of data
> downloaded from the server (and queued for consumption by the end user
> of the API), or does it keep track of t

Re: Proper use of ConsumerConnector

2012-12-21 Thread Tom Brown
Does the ConsumerConnector keep track of the offsets of data downloaded from the server (and queued for consumption by the end user of the API), or does it keep track of the actual offset that has been consumed by the end user?
--Tom
On Fri, Dec 21, 2012 at 10:37 AM, Neha Narkhede wrote:
>> But

Re: Proper use of ConsumerConnector

2012-12-21 Thread Neha Narkhede
> But if crash happens just after offset committed, then unprocessed
> message in consumer will be skipped after reconnected.
If the consumer crashes, you will get duplicates, not lose any data.
Thanks, Neha
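
To illustrate the duplicates-not-loss behaviour described in this thread, here is a minimal sketch in Python. It is a toy model, not Kafka's API: the class `ToyConsumer`, its methods, and the in-memory `log` are hypothetical names invented for this example. It assumes the committed offset is the consumed offset (not the fetched one) and that the application finishes processing a message before asking for the next one.

```python
class ToyConsumer:
    """Toy model (hypothetical, not Kafka's API) that commits the consumed offset."""

    def __init__(self, log, committed=0):
        self.log = log
        self.committed = committed  # offset persisted across crashes
        self.consumed = committed   # next offset to hand to the application
        self.fetched = committed    # next offset to fetch from the broker

    def fetch(self, n):
        # Prefetch up to n messages into the local queue; `consumed` does not move.
        self.fetched = min(self.fetched + n, len(self.log))

    def next_message(self):
        # Hand one already-fetched message to the application.
        if self.consumed >= self.fetched:
            raise IndexError("nothing fetched yet")
        msg = self.log[self.consumed]
        self.consumed += 1
        return msg

    def commit(self):
        # Persist the consumed offset, ignoring anything merely fetched.
        self.committed = self.consumed


log = ["m0", "m1", "m2", "m3", "m4"]

c = ToyConsumer(log)
c.fetch(5)                                   # all five messages are queued locally
seen = [c.next_message() for _ in range(2)]  # application consumes m0 and m1
c.commit()                                   # committed offset is now 2, not 5
seen.append(c.next_message())                # m2 is consumed but never committed

# Simulated crash: only the committed offset survives.
restarted = ToyConsumer(log, committed=c.committed)
restarted.fetch(5)
replayed = [restarted.next_message() for _ in range(3)]
```

In this model the restarted consumer resumes at the committed offset, so `m2` is delivered a second time while `m3` and `m4`, which had been fetched but not consumed, are not skipped.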

Re: Proper use of ConsumerConnector

2012-12-21 Thread 永辉 赵
has more than 1 simpleConnector. We need to manage these connectors with ZK and merge the results from each connector. I tend toward option 1. Thanks, Yonghui From: Neha Narkhede Date: Friday, December 21, 2012, 2:13 AM To: Cc: 永辉 赵 Subject: Re: Proper use of ConsumerConnector > An alternative to using simpleco

Re: Proper use of ConsumerConnector

2012-12-20 Thread Neha Narkhede
> Is that the correct interpretation?
Correct.

Re: Proper use of ConsumerConnector

2012-12-20 Thread Tom Brown
In order to support rollbacks and checkpoints, there would have to be a way to both supply partition offsets to the consumer before reading, as well as retrieve partition offsets from the consumer once reading is complete. From what I've read here, it appears that neither the ConsumerConnector n

Re: Proper use of ConsumerConnector

2012-12-20 Thread Neha Narkhede
> An alternative to using simpleconsumer in this use case is to use the
> zookeeper consumer connector and turn off auto commit.
Keep in mind that this works only if you don't care about controlling per partition rewind capability. The high level consumer will not give you control over which par

Re: Proper use of ConsumerConnector

2012-12-20 Thread Joel Koshy
“unless you have a good reason to load balance and manage offsets manually” > > In general, one consumer connector consumes more than one partition. > On the client side, we want to get the offsets of all partitions for any message. If a > crash happens (some message is fetched from kafka but the result is not > f

Re: Proper use of ConsumerConnector

2012-12-20 Thread Neha Narkhede
> Do you think this is a good reason to use SimpleConsumer rather than
> ConsumerConnector?
Yes, if you want to be able to rewind to some offset, SimpleConsumer is the right API for this purpose.
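
The rewind-to-checkpoint pattern discussed in this thread can be sketched as follows. This is a toy model in Python, not SimpleConsumer's actual API: `ToyPartitionReader`, `consume`, `flush`, and `crash` are hypothetical names, and the partitions are plain in-memory lists. The only assumption carried over from the discussion is that the caller supplies the start offset for each partition and records its own per-partition checkpoint at flush time.

```python
class ToyPartitionReader:
    """Toy stand-in (hypothetical) for a reader where the caller picks the offset."""

    def __init__(self, partitions):
        self.partitions = partitions  # partition id -> list of messages

    def read(self, partition, offset, max_messages):
        # Return messages starting at `offset` plus the next offset to read.
        msgs = self.partitions[partition][offset:offset + max_messages]
        return msgs, offset + len(msgs)


reader = ToyPartitionReader({0: ["a0", "a1", "a2", "a3"], 1: ["b0", "b1", "b2"]})

checkpoint = {0: 0, 1: 0}    # per-partition offsets covered by the last flush
position = dict(checkpoint)  # per-partition offsets read so far (in memory only)
buffer = []                  # processed but not yet flushed
durable = []                 # stands in for data flushed to disk


def consume(partition, n):
    msgs, next_offset = reader.read(partition, position[partition], n)
    position[partition] = next_offset
    buffer.extend(msgs)


def flush():
    # Make buffered results durable and record the matching offsets together.
    durable.extend(buffer)
    buffer.clear()
    checkpoint.update(position)


def crash():
    # Unflushed work is lost; rewind every partition to its flush checkpoint.
    buffer.clear()
    position.update(checkpoint)


consume(0, 2)
consume(1, 1)
flush()        # checkpoint is now {0: 2, 1: 1}
consume(0, 2)  # a2 and a3 are buffered but not flushed
crash()        # a2 and a3 are lost from the buffer; partition 0 rewinds to offset 2
consume(0, 2)  # a2 and a3 are read again from the checkpoint
flush()
```

Because each partition is rewound independently to its own checkpoint, the unflushed messages are re-read after the crash and the durable store ends up with each message exactly once in this toy run.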

Re: Proper use of ConsumerConnector

2012-12-20 Thread 永辉 赵
Hi Joel, “unless you have a good reason to load balance and manage offsets manually” In general, one consumer connector consumes more than one partition. On the client side, we want to get the offsets of all partitions for any message. If a crash happens (some message is fetched from kafka but the result is not

Re: Proper use of ConsumerConnector

2012-12-19 Thread Joel Koshy
In general, you should use the consumer connector - unless you have a good reason to load balance and manage offsets manually (which is taken care of in the consumer connector).
> - Does the ConsumerConnector manage connections to multiple brokers,
> or just a single broker?
Multiple brokers.