Re: hadoop-consumer code in contrib package

2013-01-17 Thread Jun Rao
That may be a feasible alternative approach. You can call ConsumerConnector.shutdown() to close the consumer cleanly. Thanks, Jun
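For reference, a minimal sketch of what that clean shutdown could look like with the 0.8-style Java client (in the 0.7.x client the property keys are zk.connect and groupid instead). The ZooKeeper address and group id below are placeholders, not values from the thread:

import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class CleanShutdownSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder ZooKeeper address
        props.put("group.id", "hdfs-loader");             // placeholder consumer group

        final ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // shutdown() closes the ZooKeeper/broker connections, commits offsets,
        // and lets the blocking stream iterators return, so a consumer that
        // would otherwise run forever can exit cleanly on SIGTERM.
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                connector.shutdown();
            }
        });

        // ... create message streams and consume here ...
    }
}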

Re: hadoop-consumer code in contrib package

2013-01-17 Thread navneet sharma
That makes sense. I tried an alternate approach: I am using the high-level consumer, going through the Hadoop HDFS APIs, and pushing data into HDFS. I am not creating any Hadoop jobs for that. The only problem I am seeing here is that the consumer is designed to run forever, which means I need to find out how to shut it down cleanly.
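A rough sketch of that kind of job-less pipeline (high-level consumer writing straight into HDFS through the FileSystem API) is below. The topic, group, and output path are made-up placeholders, and the consumer calls follow the 0.8-style Java client:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class KafkaToHdfsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "hdfs-loader");             // placeholder
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        String topic = "events"; // placeholder topic
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap(topic, 1));
        ConsumerIterator<byte[], byte[]> it = streams.get(topic).get(0).iterator();

        // Write directly into HDFS through the FileSystem API, no MapReduce job.
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/data/kafka/events.log")); // placeholder path
        try {
            while (it.hasNext()) {              // blocks forever unless the consumer is shut down
                out.write(it.next().message()); // raw message bytes
                out.write('\n');
            }
        } finally {
            out.close();
            connector.shutdown();
        }
    }
}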

Re: hadoop-consumer code in contrib package

2013-01-16 Thread Jun Rao
I think the main reason for using SimpleConsumer is to manage offsets explicitly. For example, this is useful when Hadoop retries failed tasks. Another reason is that Hadoop already does load balancing, so there is not much need to balance the load again using the high-level consumer. Thanks, Jun
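To illustrate the explicit-offsets point, here is a rough sketch of a fetch through the SimpleConsumer API (0.8-style signatures; the 0.7 contrib code is similar but uses byte offsets and a different constructor). The host, topic, partition, and offset values are placeholders; in the Hadoop case the starting offset would come from the task's input, so a retried task can simply re-fetch from the same offset:

import java.nio.ByteBuffer;

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

public class ExplicitOffsetFetchSketch {
    public static void main(String[] args) {
        String topic = "events";   // placeholder
        int partition = 0;         // placeholder
        long offset = 0L;          // in Hadoop this would come from the task's input / previous run

        SimpleConsumer consumer =
            new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "hadoop-task"); // placeholders

        FetchRequest request = new FetchRequestBuilder()
            .clientId("hadoop-task")
            .addFetch(topic, partition, offset, 1024 * 1024) // start exactly at the chosen offset
            .build();
        FetchResponse response = consumer.fetch(request);

        for (MessageAndOffset mo : response.messageSet(topic, partition)) {
            ByteBuffer payload = mo.message().payload();
            byte[] bytes = new byte[payload.limit()];
            payload.get(bytes);
            // ... process bytes ...
            offset = mo.nextOffset(); // record it; a retried task restarts from here
        }
        consumer.close();
    }
}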

Re: hadoop-consumer code in contrib package

2013-01-16 Thread navneet sharma
Thanks, Felix. One question still remains: why SimpleConsumer? Why not the high-level consumer? If I change the code to use the high-level consumer, will it create any challenges? Navneet

Re: hadoop-consumer code in contrib package

2013-01-15 Thread Felix GV
Please read the Kafka design paper. It may look a little long, but it's as short as it can be. Kafka differs from other messaging systems in a couple of ways, and it's important to understand the fundamental design choices that were made in order to understand it.

Re: hadoop-consumer code in contrib package

2013-01-15 Thread navneet sharma
Thanks, Felix, for sharing your work. The contrib hadoop-consumer looks like it works the same way. I think I need to really understand this offset stuff. So far I have only used the high-level consumer. When the consumer is done reading all the messages, I used to kill the process (because it won't stop on its own).
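One way to avoid killing the process by hand (a hedged suggestion, not something stated in the thread) is the consumer.timeout.ms setting: with it set, the high-level consumer's iterator throws ConsumerTimeoutException once no message has arrived for that long, which can be treated as "caught up, time to exit". A sketch of the relevant configuration and loop shape, with placeholder names:

import java.util.Properties;

import kafka.consumer.ConsumerIterator;
import kafka.consumer.ConsumerTimeoutException;
import kafka.javaapi.consumer.ConsumerConnector;

public class DrainAndExitSketch {
    static void configure(Properties props) {
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "hdfs-loader");             // placeholder
        props.put("consumer.timeout.ms", "10000");        // give up after 10s with no new messages
    }

    static void drain(ConsumerIterator<byte[], byte[]> it, ConsumerConnector connector) {
        try {
            while (it.hasNext()) {
                byte[] message = it.next().message();
                // ... write message to HDFS ...
            }
        } catch (ConsumerTimeoutException e) {
            // No messages for consumer.timeout.ms: assume the topic is drained.
        } finally {
            connector.shutdown(); // commit offsets and exit instead of being killed
        }
    }
}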

Re: hadoop-consumer code in contrib package

2013-01-14 Thread Felix GV
I think you may be misunderstanding the way Kafka works. A Kafka broker is never supposed to clear messages just because a consumer has read them. The broker will instead clear messages after their retention period ends, though it will not delete the messages at the exact time when they expire.
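To make the retention point concrete: deletion is driven by broker-side settings such as log.retention.hours, and the cleaner only checks periodically, which is why messages are not removed at the exact moment they expire. A small sketch of that setting, shown as a Java Properties override for consistency with the other sketches (the 168-hour value is just the common default, not something from the thread):

import java.util.Properties;

public class RetentionConfigSketch {
    static Properties brokerOverrides() {
        Properties props = new Properties();
        // Log segments older than this are eligible for deletion, regardless of
        // whether any consumer has read them.
        props.put("log.retention.hours", "168");
        return props;
    }
}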