Thanks, Neha and Joel. My understanding of offsets is:
1. The offset stored in ZooKeeper is only used when the consumer connects again.
2. Joel's suggestion, "in fact setting an autocommit interval and being willing to deal with duplicates is almost equivalent.", makes sense. But if a crash happens just after an offset is committed, the unprocessed messages still held in the consumer will be skipped after it reconnects.

Please correct me if I am wrong.

In ConsumerConnector, if ConsumerIterator could return the partition offset together with each message, then we could save the offset on the client side and, with autoCommit turned off, commit an offset only after all messages before it have been processed. I have gone through the code roughly; to use this option I would need to change some code.

Another option is to use simpleConnector, as we discussed before, but this requires more code on the client side, since one consumer may have more than one simpleConnector. We would need to manage these connectors with ZooKeeper and merge the results from each connector.

I lean toward the first option.

Thanks,
Yonghui

From: Neha Narkhede <neha.narkh...@gmail.com>
Date: Friday, December 21, 2012, 2:13 AM
To: <users@kafka.apache.org>
Cc: 永辉 赵 <zhaoyong...@gmail.com>
Subject: Re: Proper use of ConsumerConnector

> An alternative to using simpleconsumer in this use case is to use the
> zookeeper consumer connector and turn off auto commit.

Keep in mind that this works only if you don't care about controlling per partition rewind capability. The high level consumer will not give you control over which partitions your consumer consumes and which partitions it commits the offsets for. If you need to rewind consumption for a subset of those partitions, then ZookeeperConsumerConnector will not work for you.

Thanks,
Neha
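
The client-side bookkeeping proposed in the first option (save offsets locally and commit one only once every earlier message is done) could be sketched roughly as below. This is a minimal illustration in Python, not Kafka code: `OffsetTracker` and its method names are hypothetical, and it assumes the iterator has been changed to expose each message's partition and offset, which the thread notes would require code changes. Whether the value to store is this offset or the one following it depends on the consumer's convention and is left open here.

```python
from collections import defaultdict, deque


class OffsetTracker:
    """Hypothetical helper: tracks, per partition, which delivered
    messages have finished processing."""

    def __init__(self):
        # partition -> offsets in delivery order that are not yet retired
        self._pending = defaultdict(deque)
        # partition -> offsets finished out of order, waiting on earlier ones
        self._done = defaultdict(set)
        # partition -> highest offset with no unfinished message at or before it
        self._safe = {}

    def delivered(self, partition, offset):
        """Record that a message was handed to the application."""
        self._pending[partition].append(offset)

    def processed(self, partition, offset):
        """Record that a message is done, then advance the safe offset
        past every leading message that has also finished."""
        self._done[partition].add(offset)
        pending = self._pending[partition]
        done = self._done[partition]
        while pending and pending[0] in done:
            head = pending.popleft()
            done.remove(head)
            self._safe[partition] = head

    def safe_offset(self, partition):
        """Highest fully processed offset, or None if nothing is safe yet."""
        return self._safe.get(partition)


tracker = OffsetTracker()
for off in (100, 130, 170):
    tracker.delivered("topic-0", off)

tracker.processed("topic-0", 130)         # finished out of order
before = tracker.safe_offset("topic-0")   # 100 is still in flight

tracker.processed("topic-0", 100)
after = tracker.safe_offset("topic-0")    # 100 and 130 are both done
```

With auto commit turned off, the application would periodically commit `safe_offset(...)` for each partition. A crash then replays at most the messages after that point (duplicates) instead of skipping unprocessed ones.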