Re: Atlanta AJUG presentation on Kafka March 19th

2013-01-16 Thread Jun Rao
Chris, That's great. Thanks for doing this. For architecture slides, you may want to take a look at the Kafka ApacheCon 2011 slides in our wiki (just fixed the link). Thanks, Jun On Wed, Jan 16, 2013 at 7:26 AM, Chris Curtin wrote: > Hi, > > I'm going to be presenting an introduction to Kafka

Re: Atlanta AJUG presentation on Kafka March 19th

2013-01-16 Thread Jay Kreps
Also, this wiki has a pretty good collection of presentations which may give you ideas. If you want the source ppt or omnigraffle for any of the presentations we made let us know. https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations -Jay On Wed, Jan 16, 2013 at 7:26

Kafka, Work Distribution, and Work Stealing

2013-01-16 Thread David Ross
Hello, We use Kafka to distribute batches of work across several boxes. These batches may take anywhere from 1 hour to 24 hours to complete. Currently, we have N partitions, each allocated to one of N consumer worker boxes. We find that as the batch nears completion, with only M < N partitions still

Re: Kafka, Work Distribution, and Work Stealing

2013-01-16 Thread Neha Narkhede
David, It looks like the consumer throughput suffers because of an imbalance of data across partitions. When you say the batch nears completion, it seems like the number of partitions that have new data shrinks, leaving fewer consumer instances to process a large amount of data. Is that true? In Ka
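The imbalance described in this thread can be made concrete with a toy scheduling simulation (plain Python, not Kafka code; all names here are hypothetical). With static partition assignment, the slowest partition dictates total completion time; with work stealing from a shared pool, idle workers pick up remaining tasks:

```python
def makespan_static(partitions):
    """Static assignment: one worker per partition; the busiest partition
    determines when the whole batch finishes."""
    return max(sum(tasks) for tasks in partitions)

def makespan_stealing(partitions, n_workers):
    """Work stealing approximated by greedy scheduling: pool all tasks and
    always hand the next (largest) task to the least-loaded worker."""
    tasks = sorted((t for p in partitions for t in p), reverse=True)
    loads = [0.0] * n_workers
    for t in tasks:
        i = min(range(n_workers), key=loads.__getitem__)
        loads[i] += t
    return max(loads)

# One partition holds three 8-hour batches; the other three hold one 1-hour batch each.
partitions = [[8, 8, 8], [1], [1], [1]]
print(makespan_static(partitions))        # 24: three workers sit idle
print(makespan_stealing(partitions, 4))   # 8: the long partition's work is spread out
```

This is only an illustration of why skewed partitions hurt: a statically assigned consumer cannot help a lagging peer, which is exactly the behavior the original poster observed near the end of a batch.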

Re: hadoop-consumer code in contrib package

2013-01-16 Thread navneet sharma
Thanks Felix. One question still remains: why SimpleConsumer? Why not the high-level Consumer? If I change the code to the high-level consumer, will it create any challenges? Navneet On Tue, Jan 15, 2013 at 11:46 PM, Felix GV wrote: > Please read the Kafka design paper

Re: hadoop-consumer code in contrib package

2013-01-16 Thread Jun Rao
I think the main reason for using SimpleConsumer is to manage offsets explicitly. For example, this is useful when Hadoop retries failed tasks. Another reason is that Hadoop already does load balancing. So, there is not much need to balance the load again using the high level consumer. Thanks, Ju
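Jun's point about explicit offset management can be sketched with a toy model (plain Python, not the SimpleConsumer API; the class and method names are invented for illustration). The key property is that a task commits its offset only after it fully succeeds, so a Hadoop-style retry safely re-reads the same range:

```python
class OffsetCheckpoint:
    """Toy model of explicit offset management: the consumer, not the
    framework, decides when an offset is committed."""

    def __init__(self):
        self.committed = 0  # offset of the next unprocessed record

    def run_task(self, log, fail_at=None):
        """Process records starting at the committed offset.
        On failure, nothing is committed, so a retry restarts cleanly."""
        pos = self.committed
        out = []
        while pos < len(log):
            if fail_at is not None and pos == fail_at:
                return None  # simulated task failure: no partial commit
            out.append(log[pos])
            pos += 1
        self.committed = pos  # commit only once the whole task succeeded
        return out

cp = OffsetCheckpoint()
log = ["a", "b", "c", "d"]
cp.run_task(log, fail_at=2)   # fails: committed offset stays at 0
print(cp.run_task(log))       # retry reprocesses from 0: ['a', 'b', 'c', 'd']
```

With the high-level consumer of that era, offsets were committed automatically behind the scenes, which is why it was a poor fit for retryable Hadoop tasks.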

Consumer Question

2013-01-16 Thread Bo Sun
I've got a problem like this. 1. I used the groupname "GourpA" to consume the Kafka topic "topicA". Several days later, we could not get new data from the consumer. 2. Then I used the groupname "groupB" to consume the Kafka topic "topicA". In this new consumer, I got the new data, and I get the ne
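The behavior Bo describes follows from offsets being tracked per consumer group. A toy model (plain Python, not the Kafka API; the `poll` function and its parameters are invented for illustration) shows why a brand-new group name sees data even when an existing group's position is stuck:

```python
def poll(log, group_offsets, group, reset="earliest"):
    """Toy model of per-group offsets: each group reads from its own
    committed position, independent of every other group."""
    if group not in group_offsets:
        # A group seen for the first time starts according to its reset policy,
        # loosely analogous to Kafka's auto.offset.reset setting.
        group_offsets[group] = 0 if reset == "earliest" else len(log)
    start = group_offsets[group]
    records = log[start:]
    group_offsets[group] = len(log)  # commit up to the end of the log
    return records

offsets = {}
log = ["m1", "m2"]
print(poll(log, offsets, "groupA"))  # ['m1', 'm2']
log.append("m3")
print(poll(log, offsets, "groupA"))  # ['m3'] -- only the new record
print(poll(log, offsets, "groupB"))  # ['m1', 'm2', 'm3'] -- fresh group, fresh position
```

If GroupA truly stopped receiving new data while GroupB worked, the likely culprit in a real deployment is GroupA's consumer or its committed state, not the topic itself, since the broker serves both groups the same log.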