Best practices for changing partition numbers

2013-01-07 Thread David Ross
Hello, We have found that, for our application, it is beneficial for the total number of partitions to be a multiple of the number of consumer hosts. Because of this, whenever we add or remove consumer hosts, we have to change the number of partitions in the server config. What are best practices for

LinkedIn's Kafka->Hadoop ETL pipeline is open source

2013-01-07 Thread Jay Kreps
Hey All, There has been interest in getting something a little more sophisticated than the Input- and OutputFormat we include in contrib for reading Kafka data into HDFS. Internally at LinkedIn we have had a pretty sophisticated system that we use for Kafka ETL. It automatically discovers topi

Re: ETL with Kafka

2013-01-07 Thread Ken Krugler
On Jan 7, 2013, at 2:05pm, Russell Jurney wrote: > I previously posted a link to contrib in this thread. Thanks, I missed that - all I saw was the long URL to the Talend integration doc on Hortonworks. > No, it's not a > cascading tap. It's a complete job. One to read kafka events to hdfs, one t

Re: ETL with Kafka

2013-01-07 Thread Russell Jurney
I previously posted a link to contrib in this thread. No, it's not a cascading tap. It's a complete job. One to read kafka events to hdfs, one to generate kafka events from hdfs. ETL can happen in between. On Jan 7, 2013 1:51 PM, "Ken Krugler" wrote: > Hi Russell, > > On Jan 7, 2013, at 12:48pm, Ru

Re: ETL with Kafka

2013-01-07 Thread Ken Krugler
Hi Russell, On Jan 7, 2013, at 12:48pm, Russell Jurney wrote: > Just to be clear - a Kafka 'Tap' of sorts exists in contrib: it scans > Hadoop records, which may be ETL'd first, and emits new Kafka events. Can you point me at the code? And just to confirm, you're talking about a Cascading Tap,

Re: Can't start Kafka server with 0.8.0.

2013-01-07 Thread Jason Huang
OK. thanks! Jason On Mon, Jan 7, 2013 at 3:55 PM, Neha Narkhede wrote: > Jason, > > In 0.8, we changed the zookeeper data structures as well. You might want to > either use a new zk namespace or delete all of your zk data and restart 0.8. > > Thanks, > Neha > > > On Mon, Jan 7, 2013 at 12:28 PM

Re: Can't start Kafka server with 0.8.0.

2013-01-07 Thread Neha Narkhede
Jason, In 0.8, we changed the zookeeper data structures as well. You might want to either use a new zk namespace or delete all of your zk data and restart 0.8. Thanks, Neha On Mon, Jan 7, 2013 at 12:28 PM, Jason Huang wrote: > Never mind - I was able to start the server after removing the > p

Re: ETL with Kafka

2013-01-07 Thread Russell Jurney
Just to be clear - a Kafka 'Tap' of sorts exists in contrib: it scans Hadoop records, which may be ETL'd first, and emits new Kafka events. On Mon, Jan 7, 2013 at 9:57 AM, Ken Krugler wrote: > Hi Guy, > > On Jan 6, 2013, at 11:11pm, Guy Doulberg wrote: > > > Hi, > > Thanks David, > > > > I am lo

Re: Can't start Kafka server with 0.8.0.

2013-01-07 Thread Jason Huang
Never mind - I was able to start the server after removing the previously installed 0.7.2 instance of Kafka. Jason On Mon, Jan 7, 2013 at 2:56 PM, Jason Huang wrote: > Hello, > > I am trying out Kafka 0.8 using only one broker and I am unable to > start the server. > > With the instruction from th

Can't start Kafka server with 0.8.0.

2013-01-07 Thread Jason Huang
Hello, I am trying out Kafka 0.8 using only one broker and I am unable to start the server. Following the instructions at this link - https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.8+Quick+Start, I was able to download and install 0.8. Since I only have one machine, I did the following co

Re: Is anyone able to consume from Kafka 0.7.x and write into Hadoop CDH 4.x ?

2013-01-07 Thread Felix GV
OOOH that's awesome :D !! I'll take a look at this shiny stuff right away! Thanks a lot :D !! -- Felix On Mon, Jan 7, 2013 at 2:28 PM, Neha Narkhede wrote: > > Finally, I haven't seen anything mentioned about the LinkedIn > > kafka/avro/hadoop ETL stuff we've been hearing about for a while. >

Re: Is anyone able to consume from Kafka 0.7.x and write into Hadoop CDH 4.x ?

2013-01-07 Thread Neha Narkhede
> Finally, I haven't seen anything mentioned about the LinkedIn > kafka/avro/hadoop ETL stuff we've been hearing about for a while. > The LinkedIn ETL kafka/avro/hadoop project is open sourced. See here - https://github.com/linkedin/camus/wiki/Camus-Overview Thanks, Neha

Is anyone able to consume from Kafka 0.7.x and write into Hadoop CDH 4.x ?

2013-01-07 Thread Felix GV
Hello all, I haven't been reading the list for the past couple of weeks, I've been quite busy... but I've searched and didn't find any discussions related to my current issue, so I thought I'd ask while I'm still investigating on my own...! We've been running a Kafka 0.7.0 cluster without problems for a w

Re: Graceful termination of kafka broker after draining all the data consumed

2013-01-07 Thread Bae, Jae Hyeon
0.8 sounds really great! OK, I will try it after you release a stable build of 0.8. Thank you. Best, Jae On Sun, Jan 6, 2013 at 10:36 AM, Neha Narkhede wrote: > In 0.8, we will provide a way for you to shut down the broker in a > controlled fashion. What that would include is moving all the leaders aw

Re: Consumer rebalance per topic

2013-01-07 Thread Pablo Barrera González
Thank you, Jun and Neha. I was trying to avoid adding more partitions. I have enough partitions if you count all partitions in all topics. I understand the problem with different data load per topic, but the current scheme does not solve this problem either, so we shouldn't be worse off if we consider all

Re: Kafka 0.8 - KeyedMessage?

2013-01-07 Thread Jason Huang
I see. This makes sense. Thanks, Neha. Jason On Mon, Jan 7, 2013 at 1:52 PM, Neha Narkhede wrote: > Jason, > > If you specify a key for a message but do not explicitly wire in a > partitioner, messages with the same key will still land up in the same > partition. This is because we use a defaul

Re: Kafka 0.8 - KeyedMessage?

2013-01-07 Thread Neha Narkhede
Jason, If you specify a key for a message but do not explicitly wire in a partitioner, messages with the same key will still land up in the same partition. This is because we use a default partitioner that does a simple hash(key) % num_partitions. Thanks, Neha On Mon, Jan 7, 2013 at 9:30 AM, Ja
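The hash(key) % num_partitions rule mentioned in the message above can be illustrated with a small sketch. This is not Kafka's partitioner code: the function name partition_for is hypothetical, and zlib.crc32 is used only as a stand-in for a deterministic hash (an assumption; the broker and producer use their own hash function).

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index via hash(key) % num_partitions.

    zlib.crc32 is a stand-in hash chosen because it is deterministic
    across runs; it is NOT the hash Kafka itself uses.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    # The same key always maps to the same partition for a fixed partition count.
    for key in (b"user-1", b"user-2", b"user-1"):
        print(key, partition_for(key, 8))
```

One consequence of this scheme is that the mapping depends on num_partitions, so changing the partition count can send an existing key to a different partition.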

Re: Consumer rebalance per topic

2013-01-07 Thread Neha Narkhede
Pablo, That is a good suggestion. Ideally, the partitions across all topics should be distributed evenly across consumer streams instead of a per-topic based decision. There is no particular advantage to the current scheme of per-topic rebalancing that I can think of. Would you mind filing a JIRA

Re: ETL with Kafka

2013-01-07 Thread Ken Krugler
Hi Guy, On Jan 6, 2013, at 11:11pm, Guy Doulberg wrote: > Hi, > Thanks David, > > I am looking for a product (open source or not), something like Talend or > Pentaho, in which I can design the ETL (from and to kafka), and run > the ETL in Storm/ IronCount or even maybe I can run it in

Re: Kafka 0.8 - KeyedMessage?

2013-01-07 Thread Jason Huang
Jun, Thanks for the response. If I understand you correctly, messages with the same key will not be automatically stored at the same partition unless I implement a partition function to route the message based on the key? The quick start guide for 0.7 has the following: "Send a message with a par

Re: Consumer rebalance per topic

2013-01-07 Thread Jun Rao
Pablo, Currently, the partition is the smallest unit by which we distribute data among consumers (in the same consumer group). So, if the # of consumers is larger than the total number of partitions in a Kafka cluster (across all brokers), some consumers will never get any data. Such a decision is made on
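The effect discussed in this thread, where whole partitions are handed out topic by topic, can be sketched as follows. This is a simplified illustration and not Kafka's rebalance code: the function assign_per_topic, the topic names, and the consumer names are all hypothetical, and the contiguous-range split is an assumption made for the example.

```python
from typing import Dict, List, Tuple


def assign_per_topic(
    topic_partitions: Dict[str, int], consumers: List[str]
) -> Dict[str, List[Tuple[str, int]]]:
    """Assign whole partitions to consumers, deciding each topic independently.

    For every topic, partitions are split into contiguous ranges over the
    sorted consumer list. A consumer past the partition count of a topic
    receives nothing from that topic.
    """
    ordered = sorted(consumers)
    assignment: Dict[str, List[Tuple[str, int]]] = {c: [] for c in ordered}
    for topic, count in sorted(topic_partitions.items()):
        per_consumer, extra = divmod(count, len(ordered))
        start = 0
        for index, consumer in enumerate(ordered):
            # The first `extra` consumers take one additional partition.
            size = per_consumer + (1 if index < extra else 0)
            assignment[consumer].extend((topic, p) for p in range(start, start + size))
            start += size
    return assignment


if __name__ == "__main__":
    # Two topics with two partitions each, shared by four consumers.
    result = assign_per_topic({"a": 2, "b": 2}, ["c1", "c2", "c3", "c4"])
    for consumer, partitions in result.items():
        print(consumer, partitions)
```

In this sketch, four partitions exist in total, yet two of the four consumers stay idle because each topic is split on its own. That is the situation the thread describes, and distributing partitions across all topics at once would avoid it.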

Re: Kafka 0.8 - KeyedMessage?

2013-01-07 Thread Jun Rao
Jason, In 0.8, each message can optionally have a key. The key is retained as part of the message and will be stored in the broker. One can design a partition function to route the message based on the key. The default partitioner ignores the key and selects a partition at random. Thanks, Jun O

Consumer rebalance per topic

2013-01-07 Thread Pablo Barrera González
Hello We are starting to use Kafka in production but we found an unexpected (at least for me) behavior with the use of partitions. We have a bunch of topics with a few partitions each. We try to consume all data from several consumers (just one consumer group). The problem is in the rebalance ste

Kafka 0.8 - KeyedMessage?

2013-01-07 Thread Jason Huang
Hello, I searched the web but couldn't find any documentation for 0.8, so I am asking here: KeyedMessage is introduced in 0.8.0: class KeyedMessage[K, V](val topic: String, val key: K, val message: V) Is the parameter "key" the partition key? If I build a KeyedMessage with a s