Kafka High Level Consumer

2014-09-12 Thread Rahul Mittal
Hi ,
Is there a way in Kafka for a consumer group to read data from all topics
dynamically, without specifying the topics up front?
That is, if new topics are created on the Kafka brokers, the consumer group
should discover them and start reading from the new topics as well, without
the new topics being explicitly defined for the consumer.

-- 
*With Regards *
*RAHUL MITTAL*


Re: Kafka High Level Consumer

2014-09-12 Thread Joe Stein
You want to use createMessageStreamsByFilter and pass in a Whitelist
with a regex that includes everything you want... here is an example of how
to use it:
https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/consumer/ConsoleConsumer.scala#L196
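For anyone unfamiliar with topic filters, here is a minimal stand-alone sketch of the whitelist idea - shown in Python rather than the Scala consumer API, with made-up topic names - just to illustrate how a regex selects topics:

```python
import re

def matching_topics(whitelist_regex, topics):
    """Return the topics a whitelist pattern would select (illustration only)."""
    pattern = re.compile(whitelist_regex)
    return [t for t in topics if pattern.fullmatch(t)]

topics = ["orders", "orders-eu", "clicks", "internal.metrics"]
print(matching_topics(".*", topics))        # every topic matches
print(matching_topics("orders.*", topics))  # ['orders', 'orders-eu']
```

The high-level consumer's whitelist works along these lines: a broad pattern such as `.*` can pick up newly created topics as the filter is re-evaluated.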

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
***/




Dynamic partitioning

2014-09-12 Thread István
Hi all,

My understanding is that with 0.8.1.x you can manually change the number of
partitions on the broker, and this change is going to be picked up by the
producers and consumers (high level).

kafka-topics.sh --alter --zookeeper zk.net:2181/stream --topic test
--partitions 3

Is that the case?

Is there a way to adjust the number of partitions based on the load? This
might not be the best way of scaling Kafka up and down (auto-scaling), so
if there is a better way I would like to know about it.

Thank you in advance,
Istvan


-- 
the sun shines for all


Re: Dynamic partitioning

2014-09-12 Thread Joe Stein
That command will change how many partitions the topic has.

What you are looking for, I think, is
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-6.ReassignPartitionsTool
which allows you to change which partitions are running on which replicas
and which replica is the preferred leader... there is more documentation on
that at https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion
(that entire section, https://kafka.apache.org/documentation.html#basic_ops,
has the type of information you are looking for, I think).
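One caveat worth noting when altering partition counts: with a hash-based partitioner, a key's partition is roughly hash(key) % numPartitions, so increasing the count changes where keyed messages land. A rough sketch of the effect (the md5 hash here is a stand-in, not Kafka's actual partitioner):

```python
import hashlib

def partition_for(key, num_partitions):
    # Stand-in hash partitioner; Kafka's real partitioner uses a different hash.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % num_partitions

keys = ["user-1", "user-2", "user-3", "user-4"]
before = {k: partition_for(k, 3) for k in keys}  # mapping with 3 partitions
after = {k: partition_for(k, 4) for k in keys}   # mapping after --alter to 4
moved = [k for k in keys if before[k] != after[k]]
print("keys whose partition changed:", moved)
```

Any consumer relying on per-key ordering should be aware that keys may move to different partitions after the change.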

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
***/




Getting replicas back in sync

2014-09-12 Thread Cory Watson
I noticed this morning that a few of our partitions do not have their full
complement of ISRs:

Topic:migration PartitionCount:16 ReplicationFactor:3
Configs:retention.bytes=32985348833280
Topic: migration Partition: 0 Leader: 1 Replicas: 1,4,5 Isr: 1,5,4
Topic: migration Partition: 1 Leader: 1 Replicas: 2,5,1 Isr: 1,5
Topic: migration Partition: 2 Leader: 1 Replicas: 3,1,2 Isr: 1,2
Topic: migration Partition: 3 Leader: 4 Replicas: 4,2,3 Isr: 4,2
Topic: migration Partition: 4 Leader: 5 Replicas: 5,3,4 Isr: 3,5,4
Topic: migration Partition: 5 Leader: 1 Replicas: 1,5,2 Isr: 1,5
Topic: migration Partition: 6 Leader: 2 Replicas: 2,1,3 Isr: 1,2
Topic: migration Partition: 7 Leader: 3 Replicas: 3,2,4 Isr: 2,4,3
Topic: migration Partition: 8 Leader: 4 Replicas: 4,3,5 Isr: 4,5
Topic: migration Partition: 9 Leader: 5 Replicas: 5,4,1 Isr: 1,5,4
Topic: migration Partition: 10 Leader: 1 Replicas: 1,2,3 Isr: 1,2
Topic: migration Partition: 11 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4
Topic: migration Partition: 12 Leader: 3 Replicas: 3,4,5 Isr: 3,4,5
Topic: migration Partition: 13 Leader: 4 Replicas: 4,5,1 Isr: 1,5,4
Topic: migration Partition: 14 Leader: 5 Replicas: 5,1,2 Isr: 1,2,5
Topic: migration Partition: 15 Leader: 1 Replicas: 1,3,4 Isr: 1,4

I'm a bit confused by partitions with only 2 ISRs, yet that same broker is
leading other healthy partitions.

What is the appropriate way to kick a broker into re-syncing? I see lots of
chatter in the docs and on the mailing list about watching for this, but from
what I can find it's supposed to come back into sync on its own. Mine aren't.

I considered just restarting the affected brokers (3 and 2 in this example)
but thought I'd ask first.
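A quick way to spot the lagging replicas in output like the above is to diff the Replicas and Isr sets per partition; a small sketch, assuming the kafka-topics.sh --describe text format shown above:

```python
def under_replicated(describe_output):
    """Return (partition, missing replica ids) where the ISR trails the replica set."""
    lagging = []
    for line in describe_output.splitlines():
        tokens = line.split()
        if "Replicas:" not in tokens:
            continue  # skip the topic summary line
        fields = dict(zip(tokens[::2], tokens[1::2]))
        replicas = set(fields["Replicas:"].split(","))
        isr = set(fields["Isr:"].split(","))
        missing = replicas - isr
        if missing:
            lagging.append((int(fields["Partition:"]), sorted(missing)))
    return lagging

sample = """Topic: migration Partition: 0 Leader: 1 Replicas: 1,4,5 Isr: 1,5,4
Topic: migration Partition: 1 Leader: 1 Replicas: 2,5,1 Isr: 1,5"""
print(under_replicated(sample))  # [(1, ['2'])]
```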

-- 
Cory Watson
Principal Infrastructure Engineer // Keen IO


Re: Setting log.default.flush.interval.ms and log.default.flush.scheduler.interval.ms

2014-09-12 Thread Jun Rao
One of the differences between 0.7.x and 0.8.x is that the latter does the I/O
flushing in the background. So, in 0.7.x, more frequent I/O flushing will
increase the producer latency.
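A toy model of that difference (illustrative numbers only, not measurements):

```python
def produce_latency_ms(flush_in_foreground, append_ms=0.1, flush_ms=10.0):
    """Toy model: 0.7.x-style foreground flushing pays the flush cost on the
    produce path; 0.8.x-style background flushing does not."""
    return append_ms + flush_ms if flush_in_foreground else append_ms

print(produce_latency_ms(True))   # 10.1: flush cost lands on the produce call
print(produce_latency_ms(False))  # 0.1: flush happens off the produce path
```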

Thanks,

Jun

On Thu, Sep 11, 2014 at 5:48 PM, Hemanth Yamijala 
wrote:

> Neha,
>
> Thanks. We are on 0.7.2. I have written on another thread on the list here
> about one of the reasons we are stuck - the absence of a PHP client for our
> front end producer systems. (On a side note, would appreciate if any inputs
> can be given on that thread as well)
>
> When you mean performance, do you mean throughput ? We did measure
> throughput with our default configuration of 1000 ms for the flush interval
> value, and the much lower 100 ms value I proposed on this thread. Our
> numbers were identical - for a single broker we were clocking at around
> 20,000 messages read per second on the consumer side. Using a small 'n'
> brokers we can easily exceed our target numbers. (The load was
> synthetically generated - using a likely message size and at a rate that
> seems reasonable for our producing side).
>
> Given this observation, do you suggest any further tests / measurements for
> us to be sure ? Would appreciate any inputs.
>
> Thanks
> Hemanth
>
> On Fri, Sep 12, 2014 at 1:32 AM, Neha Narkhede 
> wrote:
>
> > I should mention that the impact of doing so is much higher wrt to
> taking a
> > hit on performance, on versions < 0.8.1. As long as you're on 0.8.1 or
> > later, it should mostly be fine. You might want to keep a close tab on
> how
> > your iostat numbers are doing, to be sure.
> >
> > On Wed, Sep 10, 2014 at 5:46 PM, Hemanth Yamijala 
> > wrote:
> >
> > > Thanks Jun.
> > >
> > > On Thu, Sep 11, 2014 at 4:13 AM, Jun Rao  wrote:
> > >
> > > > As long as the I/O load is reasonable, this is probably ok.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Wed, Sep 10, 2014 at 4:59 AM, Hemanth Yamijala <
> yhema...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > In order to meet latency requirements for a system we are building,
> > we
> > > > > tested with different values of the above two parameters and found
> > that
> > > > > settings as low as 100 work best for us, balancing the required
> > > > throughput
> > > > > and latencies.
> > > > >
> > > > > I just wanted to check if 100 is a sane value, notwithstanding we
> are
> > > > > getting good results in our tests, anything we need to be aware of
> > > while
> > > > > setting to low values like this (apart from the throughput, which
> we
> > > see
> > > > is
> > > > > OK for us) ?
> > > > >
> > > > > Any experience reports will help.
> > > > >
> > > > > Thanks
> > > > > Hemanth
> > > > >
> > > >
> > >
> >
>


Re: Getting replicas back in sync

2014-09-12 Thread Kashyap Paidimarri
We're seeing the same behaviour today on our cluster. It is not as though a
single broker dropped out of the cluster; rather, a few partitions seem lazy
on every broker.




-- 
“The difference between ramen and varelse is not in the creature judged,
but in the creature judging. When we declare an alien species to be ramen,
it does not mean that *they* have passed a threshold of moral maturity. It
means that *we* have.

—Demosthenes, *Letter to the Framlings*
”


Re: Getting replicas back in sync

2014-09-12 Thread Cory Watson
What follows is a guess on my part, but here's what I *think* was happening:

We hit an OOM that seems to have killed some of the replica fetcher threads.
I had a mishmash of replicas that weren't making progress, as determined by
the JMX stats for the replica. The thread the JMX attribute was named after
was also not running in the JVM…

We ended up having to roll through the cluster and increase the heap from
1G to 4G. This was pretty brutal, since neither our readers (storm spout) nor
our writers (python) dealt well with leadership changes.

The upside is that things are hunky dory again. This was a failure on my part
to monitor the under-replicated partitions, which would have detected the
problem far sooner.




-- 
Cory Watson
Principal Infrastructure Engineer // Keen IO


Right Tool

2014-09-12 Thread Patrick Barker
Hey, I'm new to Kafka and I'm trying to get a handle on how it all works. I
want to integrate polyglot persistence into my application. Kafka looks
like exactly what I want, just at a smaller scale. I am currently only
dealing with about 2,000 users, which may grow, but is Kafka a good fit
here, or is there another technology that's better suited?

Thanks


Re: Right Tool

2014-09-12 Thread Stephen Boesch
Hi Patrick, Kafka can be used at any scale, including small ones (initially,
anyway). The issues I personally ran into were various problems with
ZooKeeper management and a bug in deleting topics (is that fixed yet?). In
any case you might try out Kafka, given its highly performant, scalable,
and flexible backbone. After that you will have little worry about scale,
given Kafka's use within massive web-scale deployments.



Re: Setting log.default.flush.interval.ms and log.default.flush.scheduler.interval.ms

2014-09-12 Thread Neha Narkhede
Hemanth,

Specifically, you'd want to monitor
kafka:type=kafka.SocketServerStats:getMaxProduceRequestMs and
kafka:type=kafka.LogFlushStats:getMaxFlushMs. If the broker is under load
due to frequent flushes, it will almost certainly show up as spikes in the
flush latency and consequently the produce request latency. A side effect
of that is that your producer queue will back up and your producer will
eventually lose data.
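One rough way to keep a close tab on those metrics is to compare each sample against a running baseline and flag spikes; a sketch with made-up latency numbers:

```python
def spike_indexes(samples, factor=3.0, warmup=3):
    """Flag samples that exceed `factor` times the running mean of prior samples."""
    alerts = []
    for i in range(warmup, len(samples)):
        baseline = sum(samples[:i]) / i
        if samples[i] > factor * baseline:
            alerts.append(i)
    return alerts

flush_latency_ms = [5, 6, 4, 5, 40, 6, 55]  # made-up getMaxFlushMs samples
print(spike_indexes(flush_latency_ms))  # [4, 6]
```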

Thanks,
Neha



Re: [Java New Producer Configuration] Maximum time spent in Queue in Async mode

2014-09-12 Thread Jun Rao
This is controlled by linger.ms in the new producer in trunk.

Thanks,

Jun

On Thu, Sep 11, 2014 at 5:56 PM, Bhavesh Mistry 
wrote:

> Hi Kafka team,
>
> How do I configure the maximum amount of time a message spends in the queue?
> In the old producer there is a property called queue.buffering.max.ms, and it
> is not present in the new one. Basically, if I just send one message that is
> smaller than the batch size, how long will the message sit in the local
> producer queue?
>
> How do I control the time a message spends in the queue via configuration
> when the batch size never reaches its configured limit?
>
> Thanks,
>
> Bhavesh
>
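A sketch of the linger.ms semantics described above (the names and numbers are illustrative, not the producer's internals):

```python
def time_until_send_ms(batch_bytes, batch_size_bytes, linger_ms):
    """Sketch: a batch is sent as soon as it is full; otherwise it waits out
    the linger window, which bounds a lone message's time in the queue."""
    if batch_bytes >= batch_size_bytes:
        return 0      # full batch goes out immediately
    return linger_ms  # partial batch waits at most linger_ms

print(time_until_send_ms(100, 16384, 5))    # 5: lone small message lingers
print(time_until_send_ms(16384, 16384, 5))  # 0: batch is full, no wait
```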


Re: Right Tool

2014-09-12 Thread cac...@gmail.com
I would say that it depends on what you mean by persistence. I don't
believe Kafka is intended to be your permanent data store, but it would
work if you were basically write-once with appropriate query patterns. It
would be an odd way to describe it, though.

Christian



Re: Right Tool

2014-09-12 Thread Patrick Barker
Oh, I'm not trying to use it for persistence; I want to sync 3
databases: SQL, Mongo, and a graph DB. I want to publish to Kafka and then
have it update the databases. I want to keep this as efficient as possible.



Re: Right Tool

2014-09-12 Thread cac...@gmail.com
Right, that makes much more sense. You will probably want to make sure that
your updates are idempotent (or you could just accept the risk), though in
the SQL case you could commit your offset to the DB as part of the same
transaction (this requires more custom work).
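A minimal sketch of the offset-in-the-same-transaction idea, using SQLite for illustration (the table and column names are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE offsets (part INTEGER PRIMARY KEY, last_offset INTEGER)")
db.execute("INSERT INTO offsets VALUES (0, -1)")

def apply_message(part, offset, user_id, name):
    """Apply one message and commit its offset in the same transaction,
    skipping it if that offset was already committed (replay-safe)."""
    with db:  # one transaction covers both the data write and the offset
        (committed,) = db.execute(
            "SELECT last_offset FROM offsets WHERE part = ?", (part,)
        ).fetchone()
        if offset <= committed:
            return False  # duplicate delivery; already applied
        db.execute("INSERT OR REPLACE INTO users VALUES (?, ?)", (user_id, name))
        db.execute("UPDATE offsets SET last_offset = ? WHERE part = ?",
                   (offset, part))
        return True

print(apply_message(0, 0, "u1", "Ada"))  # True: applied
print(apply_message(0, 0, "u1", "Ada"))  # False: replayed message skipped
```

Because the data write and the offset commit succeed or fail together, a crash-and-replay delivers the message again but the duplicate is detected and skipped.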

Christian



Re: Right Tool

2014-09-12 Thread Steve Morin
What record format are you writing to Kafka with?



Re: Getting replicas back in sync

2014-09-12 Thread Stephen Sprague
i find this situation occurs frequently in my setup - only takes one day -
and blam - the leader board is all skewed to a single one.  not really sure
to overcome that once it happens so if there is a solution out there i'd be
interested.



Re: Getting replicas back in sync

2014-09-12 Thread Joe Stein
Hey Stephen, two things on that.

1) You need to figure out the root cause that is making the leader election
occur. It could be that the brokers are having ZK timeouts and leader
election is occurring as a result... if so, you need to dig into why (look
at all your logs). You should look for some type of flapping in your
monitoring system metrics that matches the time the leader change happens.

2) After this does happen you can run bin/kafka-preferred-replica-election.sh
--zookeeper $zklist, which will make the preferred replicas the leaders again
for the entire cluster and every topic.
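A small sketch of how to spot the skew the tool fixes - the preferred leader is the first replica in the assignment list, so any partition whose current leader differs is a candidate (the data structure here is illustrative):

```python
def skewed_partitions(assignments):
    """Partitions whose current leader is not the preferred (first) replica."""
    return [p for p, (leader, replicas) in sorted(assignments.items())
            if leader != replicas[0]]

# partition -> (current leader, replica list), echoing kafka-topics --describe
assignments = {
    0: (1, [1, 4, 5]),  # leader 1 is the preferred replica: healthy
    1: (1, [2, 5, 1]),  # preferred leader is 2, but 1 is leading: skewed
    2: (1, [3, 1, 2]),  # preferred leader is 3, but 1 is leading: skewed
}
print(skewed_partitions(assignments))  # [1, 2]
```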

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop 
***/


On Fri, Sep 12, 2014 at 9:04 PM, Stephen Sprague  wrote:

> i find this situation occurs frequently in my setup - only takes one day -
> and blam - the leader board is all skewed to a single one.  not really sure
> to overcome that once it happens so if there is a solution out there i'd be
> interested.
>
> On Fri, Sep 12, 2014 at 12:50 PM, Cory Watson  wrote:
>
> > What follows is a guess on my part, but here's what I *think* was
> > happening:
> >
> > We hit an OOM that seems to've killed some of the replica fetcher
> threads.
> > I had a mishmash of replicas that weren't making progress as determined
> by
> > the JMX stats for the replica. The thread for which the JMX attribute was
> > named was also not running in the JVM…
> >
> > We ended up having to roll through the cluster and increase the heap from
> > 1G to 4G. This was pretty brutal since neither our readers (storm spout)
> or
> > our writers (python) dealt well with leadership changes.
> >
> > Upside is that things are hunky dory again. This was a failure on my part
> > to monitor the under replicated partitions, which would've detected this
> > far sooner.
> >
> > On Fri, Sep 12, 2014 at 12:42 PM, Kashyap Paidimarri  >
> > wrote:
> >
> > > We're seeing the same behaviour today on our cluster. It is not like a
> > > single broker went out of the cluster, rather a few partitions seem
> lazy
> > on
> > > every broker.
> > >
> > > On Fri, Sep 12, 2014 at 9:31 PM, Cory Watson  wrote:
> > >
> > > > I noticed this morning that a few of our partitions do not have their
> > > full
> > > > complement of ISRs:
> > > >
> > > > Topic:migration PartitionCount:16 ReplicationFactor:3
> > > > Configs:retention.bytes=32985348833280
> > > > Topic: migration Partition: 0 Leader: 1 Replicas: 1,4,5 Isr: 1,5,4
> > > > Topic: migration Partition: 1 Leader: 1 Replicas: 2,5,1 Isr: 1,5
> > > > Topic: migration Partition: 2 Leader: 1 Replicas: 3,1,2 Isr: 1,2
> > > > Topic: migration Partition: 3 Leader: 4 Replicas: 4,2,3 Isr: 4,2
> > > > Topic: migration Partition: 4 Leader: 5 Replicas: 5,3,4 Isr: 3,5,4
> > > > Topic: migration Partition: 5 Leader: 1 Replicas: 1,5,2 Isr: 1,5
> > > > Topic: migration Partition: 6 Leader: 2 Replicas: 2,1,3 Isr: 1,2
> > > > Topic: migration Partition: 7 Leader: 3 Replicas: 3,2,4 Isr: 2,4,3
> > > > Topic: migration Partition: 8 Leader: 4 Replicas: 4,3,5 Isr: 4,5
> > > > Topic: migration Partition: 9 Leader: 5 Replicas: 5,4,1 Isr: 1,5,4
> > > > Topic: migration Partition: 10 Leader: 1 Replicas: 1,2,3 Isr: 1,2
> > > > Topic: migration Partition: 11 Leader: 2 Replicas: 2,3,4 Isr: 2,3,4
> > > > Topic: migration Partition: 12 Leader: 3 Replicas: 3,4,5 Isr: 3,4,5
> > > > Topic: migration Partition: 13 Leader: 4 Replicas: 4,5,1 Isr: 1,5,4
> > > > Topic: migration Partition: 14 Leader: 5 Replicas: 5,1,2 Isr: 1,2,5
> > > > Topic: migration Partition: 15 Leader: 1 Replicas: 1,3,4 Isr: 1,4
> > > >
> > > > I'm a bit confused by partitions with only 2 ISRs, yet that same
> > > > broker is leading other healthy partitions.
> > > >
> > > > What is the appropriate way to kick a broker into re-syncing? I see
> > > > lots of chatter in the docs and on the mailing list about watching for
> > > > this, but from what I can find it's supposed to come back into sync.
> > > > Mine aren't.
> > > >
> > > > I considered just restarting the affected brokers (3 and 2 in this
> > > > example) but thought I'd ask first.
> > > >
> > > > --
> > > > Cory Watson
> > > > Principal Infrastructure Engineer // Keen IO
> > > >
> > >
> > >
> > >
> > > --
> > > “The difference between ramen and varelse is not in the creature
> judged,
> > > but in the creature judging. When we declare an alien species to be
> > ramen,
> > > it does not mean that *they* have passed a threshold of moral maturity.
> > It
> > > means that *we* have.
> > >
> > > —Demosthenes, *Letter to the Framlings*
> > > ”
> > >
> >
> >
> >
> > --
> > Cory Watson
> > Principal Infrastructure Engineer // Keen IO
> >
>
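Not a fix, but for catching partitions like the ones in the `--describe` output above before they linger: a rough sketch (plain stdlib, assuming the 0.8.x `kafka-topics.sh --describe` line format shown above) that flags any partition whose ISR is missing a replica. The broker also exposes an `UnderReplicatedPartitions` gauge over JMX, which is probably the better thing to alert on.

```python
import re

def under_replicated(describe_output):
    """Parse `kafka-topics.sh --describe` output and return
    (topic, partition) pairs whose ISR is smaller than the replica list."""
    result = []
    for line in describe_output.splitlines():
        m = re.search(
            r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*\S+\s+"
            r"Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]+)", line)
        if not m:
            continue
        topic, part, replicas, isr = m.groups()
        # Any replica id missing from the ISR means the partition is behind.
        if set(replicas.split(",")) - set(isr.split(",")):
            result.append((topic, int(part)))
    return result

sample = """
Topic: migration Partition: 0 Leader: 1 Replicas: 1,4,5 Isr: 1,5,4
Topic: migration Partition: 1 Leader: 1 Replicas: 2,5,1 Isr: 1,5
"""
print(under_replicated(sample))  # [('migration', 1)]
```

Just a monitoring aid, of course; it tells you which partitions are behind, not why.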


Re: Right Tool

2014-09-12 Thread Patrick Barker
I'm just getting familiar with Kafka. Currently I save everything to all
my DBs in a single transaction; if any of them fails, I roll them all
back. However, this is slowing my app down. So, as I understand it, I
could write to Kafka, close the transaction, and then it would keep on
publishing out to my databases. I'm not sure what format I would write it
in yet; I guess JSON.
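In case it helps to see the shape of that: a toy sketch of the write-once, fan-out-later idea. Nothing here is a real Kafka client; `event_log` stands in for a topic and the three lists stand in for the databases.

```python
import json

# Stand-in event log: in the real setup this would be a Kafka topic,
# and each "sink" below would be a consumer reading from it.
event_log = []

def publish(event):
    """The app writes once and returns; the sinks catch up on their own."""
    event_log.append(json.dumps(event))

# Hypothetical per-store targets; real ones would issue SQL / Mongo /
# graph writes, and should be idempotent so replays are safe.
sql_db, mongo_db, graph_db = [], [], []
sinks = [sql_db.append, mongo_db.append, graph_db.append]

def drain():
    """Each sink consumes every logged event independently."""
    for raw in event_log:
        event = json.loads(raw)
        for sink in sinks:
            sink(event)

publish({"type": "user_created", "id": 42})
drain()
print(sql_db == mongo_db == graph_db)  # True
```

The point of the sketch: the app's transaction ends at `publish`, and each store converges later by replaying the log, so the per-store writes need to tolerate being applied more than once.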

On Fri, Sep 12, 2014 at 7:00 PM, Steve Morin  wrote:

> What record format are you writing to Kafka with?
>
> > On Sep 12, 2014, at 17:45, Patrick Barker 
> wrote:
> >
> > O, I'm not trying to use it for persistence, I'm wanting to sync 3
> > databases: sql, mongo, graph. I want to publish to kafka and then have it
> > update the db's. I'm wanting to keep this as efficient as possible.
> >
> >> On Fri, Sep 12, 2014 at 6:39 PM, cac...@gmail.com 
> wrote:
> >>
> >> I would say that it depends upon what you mean by persistence. I don't
> >> believe Kafka is intended to be your permanent data store, but it would
> >> work if you were basically write once with appropriate query patterns.
> It
> >> would be an odd way to describe it though.
> >>
> >> Christian
> >>
> >>> On Fri, Sep 12, 2014 at 4:05 PM, Stephen Boesch 
> wrote:
> >>>
> >>> Hi Patrick, Kafka can be used at any scale, including small ones
> >>> (initially, anyway). The issues I ran into personally were with
> >>> ZooKeeper management and a bug in deleting topics (is that fixed yet?).
> >>> In any case you might try out Kafka, given its highly performant,
> >>> scalable, and flexible backbone. After that you will have little worry
> >>> about scale, given Kafka's use within massive web-scale deployments.
> >>>
> >>> 2014-09-12 15:18 GMT-07:00 Patrick Barker :
> >>>
>  Hey, I'm new to Kafka and I'm trying to get a handle on how it all
>  works. I want to integrate polyglot persistence into my application.
>  Kafka looks like exactly what I want, just on a smaller scale. I am
>  currently only dealing with about 2,000 users, which may grow, but is
>  Kafka a good use case here, or is there another technology that's
>  better suited?
> 
>  Thanks
> >>
>


Re: Right Tool

2014-09-12 Thread Steve Morin
You would need to make sure they were all persisted down properly to each
database. Why are you persisting it to three different databases (sql,
mongo, graph)?
-Steve

On Fri, Sep 12, 2014 at 7:35 PM, Patrick Barker 
wrote:

> I'm just getting familiar with kafka, currently I just save everything to
> all my db's in a single transaction, if any of them fail I roll them all
> back. However, this is slowing my app down. So, as I understand it I could
> write to kafka, close the transaction, and then it would keep on publishing
> out to my databases. I'm not sure what format I would write it in yet, I
> guess json


Re: Right Tool

2014-09-12 Thread Patrick Barker
Yeah, I would want to know they made it there. I like polyglot for the
availability of data: I build my recommendation engine in graph, my bulk
data is in mongo, and SQL is kind of my default/ad hoc store. This is
working really well for me, but I want to ease up on the payload within my
app and provide more streamlined synchronization.
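Since the record format came up earlier in the thread: one possible JSON envelope for these sync events. The field names here are only a suggestion (Kafka itself doesn't care what's in the payload); the `id` field is there so the per-store consumers can deduplicate if a message is redelivered.

```python
import json
import time
import uuid

def make_event(entity, action, payload):
    """Wrap a change in a self-describing envelope so each downstream
    store can decide how to apply it. The field names are made up here."""
    return json.dumps({
        "id": str(uuid.uuid4()),        # lets consumers deduplicate redeliveries
        "ts": int(time.time() * 1000),  # event time, millis
        "entity": entity,               # e.g. "user", "order"
        "action": action,               # e.g. "create", "update", "delete"
        "payload": payload,
    })

evt = json.loads(make_event("user", "create", {"name": "pat"}))
print(sorted(evt))  # ['action', 'entity', 'id', 'payload', 'ts']
```

A self-describing envelope like this also leaves room to evolve the payload later without breaking the three consumers at once.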

On Fri, Sep 12, 2014 at 8:42 PM, Steve Morin  wrote:

> You would need to make sure they were all persisted down properly to each
> database. Why are you persisting it to three different databases (sql,
> mongo, graph)?
> -Steve