Hello,
We have found that, for our application, it helps to make the total number
of partitions a multiple of the number of consumer hosts.
Because of this, whenever we add or remove consumer hosts, we have to
change the number of partitions in the server config.
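To illustrate why a multiple helps, here is a minimal Scala sketch that splits partition ids into contiguous ranges, one range per consumer. The object name PartitionSpread and the function assign are hypothetical, made up for this example; the split only approximates a range-style assignment and is not taken from Kafka's code.

```scala
// Hypothetical helper for illustration; not part of Kafka.
object PartitionSpread {
  // Splits partition ids 0 until numPartitions into contiguous ranges,
  // one range per consumer. The first (numPartitions % numConsumers)
  // consumers receive one extra partition.
  def assign(numPartitions: Int, numConsumers: Int): Vector[Vector[Int]] = {
    require(numPartitions >= 0 && numConsumers > 0)
    val base = numPartitions / numConsumers
    val extra = numPartitions % numConsumers
    (0 until numConsumers).map { i =>
      val start = i * base + math.min(i, extra)
      val size = base + (if (i < extra) 1 else 0)
      (start until start + size).toVector
    }.toVector
  }

  def main(args: Array[String]): Unit = {
    // Compare an even split (12 over 4) with an uneven one (12 over 5).
    for ((partitions, consumers) <- Seq((12, 4), (12, 5))) {
      val sizes = assign(partitions, consumers).map(_.size)
      println(s"$partitions partitions, $consumers consumers -> sizes ${sizes.mkString(", ")}")
    }
  }
}
```

With 12 partitions and 4 consumers every consumer owns 3 partitions; with 5 consumers two of them own 3 and the rest own 2, so the load is uneven unless the partition count changes as well.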
What are best practices for
Hey All,
There has been interest in getting something a little more sophisticated
than the Input- and OutputFormat we include in contrib for reading Kafka
data into HDFS.
Internally at LinkedIn we have had a pretty sophisticated system that we
use for Kafka ETL. It automatically discovers topi
On Jan 7, 2013, at 2:05pm, Russell Jurney wrote:
> I previously posted a link to contrib in this thread.
Thanks, I missed that - all I saw was the long URL to the Talend integration
doc on Hortonworks.
> No, it's not a
> cascading tap. It's a complete job. One to read kafka events to hdfs, one t
I previously posted a link to contrib in this thread. No, it's not a
cascading tap. It's a complete job. One to read kafka events to hdfs, one to
generate kafka events from hdfs. ETL can happen in between.
On Jan 7, 2013 1:51 PM, "Ken Krugler" wrote:
> Hi Russell,
>
> On Jan 7, 2013, at 12:48pm, Ru
Hi Russell,
On Jan 7, 2013, at 12:48pm, Russell Jurney wrote:
> Just to be clear - a Kafka 'Tap' of sorts exists in contrib: it scans
> Hadoop records, which may be ETL'd first, and emits new Kafka events.
Can you point me at the code?
And just to confirm, you're talking about a Cascading Tap,
OK.
thanks!
Jason
On Mon, Jan 7, 2013 at 3:55 PM, Neha Narkhede wrote:
> Jason,
>
> In 0.8, we changed the zookeeper data structures as well. You might want to
> either use a new zk namespace or delete all of your zk data and restart 0.8.
>
> Thanks,
> Neha
>
>
> On Mon, Jan 7, 2013 at 12:28 PM
Jason,
In 0.8, we changed the zookeeper data structures as well. You might want to
either use a new zk namespace or delete all of your zk data and restart 0.8.
Thanks,
Neha
On Mon, Jan 7, 2013 at 12:28 PM, Jason Huang wrote:
> Never mind - I was able to start the server after removing the
> p
Just to be clear - a Kafka 'Tap' of sorts exists in contrib: it scans
Hadoop records, which may be ETL'd first, and emits new Kafka events.
On Mon, Jan 7, 2013 at 9:57 AM, Ken Krugler wrote:
> Hi Guy,
>
> On Jan 6, 2013, at 11:11pm, Guy Doulberg wrote:
>
> > Hi,
> > Thanks David,
> >
> > I am lo
Never mind - I was able to start the server after removing the
previous installed 0.7.2 instance of Kafka.
Jason
On Mon, Jan 7, 2013 at 2:56 PM, Jason Huang wrote:
> Hello,
>
> I am trying out Kafka 0.8 using only one broker and I am unable to
> start the server.
>
> With the instruction from th
Hello,
I am trying out Kafka 0.8 using only one broker and I am unable to
start the server.
With the instruction from this link -
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.8+Quick+Start,
I was able to download and install 0.8. Since I only have one
machine, I did the following co
OOOH that's awesome :D !!
I'll take a look at this shiny stuff right away!
Thanks a lot :D !!
--
Felix
On Mon, Jan 7, 2013 at 2:28 PM, Neha Narkhede wrote:
> > Finally, I haven't seen anything mentioned about the LinkedIn
> > kafka/avro/hadoop ETL stuff we've been hearing about for a while.
>
The LinkedIn ETL kafka/avro/hadoop project is open sourced. See here -
https://github.com/linkedin/camus/wiki/Camus-Overview
Thanks,
Neha
Hello all,
I haven't been reading the list for the past couple weeks, I've been quite
busy... but I've searched and didn't find any discussions related to my
current issue, so I thought I'd ask while I'm still investigating on my
own...!
We've been running a Kafka 0.7.0 cluster without problem for a w
0.8 sounds really great!
OK, I will try it after you release a stable build of 0.8.
Thank you
Best, Jae
On Sun, Jan 6, 2013 at 10:36 AM, Neha Narkhede wrote:
> In 0.8, we will provide a way for you to shut down the broker in a
> controlled fashion. What that would include is moving all the leaders aw
Thank you Jun and Neha
I was trying to avoid adding more partitions. I have enough partitions if
you count all partitions in all topics. I understand the problem with
different data load per topic but the current scheme does not solve this
problem either so we shouldn't be worse off if we consider all
I see.
This makes sense.
thanks Neha,
Jason
On Mon, Jan 7, 2013 at 1:52 PM, Neha Narkhede wrote:
> Jason,
>
> If you specify a key for a message but do not explicitly wire in a
> partitioner, messages with the same key will still land up in the same
> partition. This is because we use a defaul
Jason,
If you specify a key for a message but do not explicitly wire in a
partitioner, messages with the same key will still land up in the same
partition. This is because we use a default partitioner that does a simple
hash(key) % num_partitions.
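A minimal Scala sketch of the hash(key) % num_partitions idea, assuming the key's hashCode is used as the hash. The names HashPartitionSketch and partitionFor are hypothetical and this is not Kafka's own partitioner code; it only shows why equal keys map to the same partition.

```scala
// Hypothetical helper for illustration; not Kafka's partitioner class.
object HashPartitionSketch {
  // Maps a non-null key to a partition index in 0 until numPartitions.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    // hashCode can be negative, so floorMod is used instead of a plain %
    // to keep the result inside the valid partition range.
    java.lang.Math.floorMod(key.hashCode, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    val numPartitions = 8
    // The same key is printed twice to show it maps to the same partition.
    for (key <- Seq("user-42", "user-42", "user-7")) {
      println(s"$key -> partition ${partitionFor(key, numPartitions)}")
    }
  }
}
```

Note that the mapping depends on num_partitions, so changing the partition count generally moves existing keys to different partitions.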
Thanks,
Neha
On Mon, Jan 7, 2013 at 9:30 AM, Ja
Pablo,
That is a good suggestion. Ideally, the partitions across all topics should
be distributed evenly across consumer streams instead of a per-topic based
decision. There is no particular advantage to the current scheme of
per-topic rebalancing that I can think of. Would you mind filing a JIRA
Hi Guy,
On Jan 6, 2013, at 11:11pm, Guy Doulberg wrote:
> Hi,
> Thanks David,
>
> I am looking for a product (open source or not), something like Talend or
> Pentaho, in which I can design the ETL (from and to kafka), and run
> the ETL in Storm/IronCount or even maybe I can run it in
Jun,
Thanks for the response. If I understand you correctly, messages with
the same key will not be automatically stored in the same partition
unless I implement a partition function to route the message based on
the key?
The quick start guide for 0.7 has the following:
"Send a message with a par
Pablo,
Currently, a partition is the smallest unit by which we distribute data among
consumers (in the same consumer group). So, if the # of consumers is larger
than the total number of partitions in a Kafka cluster (across all
brokers), some consumers will never get any data. Such a decision is made
on
Jason,
In 0.8, each message can optionally have a key. The key is retained as part
of the message and will be stored in the broker. One can design a partition
function to route the message based on the key. The default partitioner
ignores the key and selects a partition at random.
Thanks,
Jun
O
Hello
We are starting to use Kafka in production but we found an unexpected (at
least for me) behavior with the use of partitions. We have a bunch of
topics with a few partitions each. We try to consume all data from several
consumers (just one consumer group).
The problem is in the rebalance ste
Hello,
I did some searching on the web but couldn't find any documentation for
0.8 so I am trying to ask here:
KeyedMessage is introduced in 0.8.0:
class KeyedMessage[K, V](val topic: String, val key: K, val message: V)
Does the parameter "key" = "partition key"?
If I build a KeyedMessage with a s
24 matches