Re: why did Kafka choose pull instead of push for a consumer ?

kant kodali Fri, 23 Sep 2016 03:18:08 -0700

@Gerard
Here are my initial benchmarks:
Producer on Machine 1 (m4.xlarge on AWS)
Broker on Machine 2 (m4.xlarge on AWS)
Consumer on Machine 3 (m4.xlarge on AWS)
Data size: 1.2KB
Receive throughput: ~24K
Kafka receive throughput: ~58K (same exact configuration)
All the benchmarks I ran used default options. What the Pulsar guys are saying
is that Kafka doesn't persist every message by default; instead it batches
them for a period of time and then persists them, so if the JVM crashes before
it persists, all the messages in the batch are lost, whereas Pulsar
guarantees strong durability by storing every message to a write-ahead log, so
messages are never lost.
My question now is: what settings do I need to change in Kafka so that it
stores every message? That way I am comparing apples to apples.
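
For anyone trying the same comparison, here is a rough sketch of the settings
usually pointed at for this. The names log.flush.interval.messages (broker),
flush.messages (topic) and acks (producer) are real Kafka settings; the values
and the small to_properties helper are assumptions made for illustration, so
treat this as a starting point and not a recommended configuration.

```python
# Sketch only. The setting names are real Kafka configs; the values and the
# to_properties() helper are illustrative assumptions, not tuned advice.

# Broker-wide (server.properties): force a flush to disk after every message
# instead of leaving it to the OS page cache.
BROKER_SETTINGS = {"log.flush.interval.messages": "1"}

# Per-topic override with the same meaning, if only the benchmark topic
# should pay the cost.
TOPIC_SETTINGS = {"flush.messages": "1"}

# Producer: wait until all in-sync replicas have acknowledged each write.
PRODUCER_SETTINGS = {"acks": "all"}


def to_properties(settings):
    """Render a dict as sorted key=value lines (hypothetical helper)."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    for label, settings in (
        ("broker", BROKER_SETTINGS),
        ("topic", TOPIC_SETTINGS),
        ("producer", PRODUCER_SETTINGS),
    ):
        print(f"# {label}")
        print(to_properties(settings))
```

Flushing on every message is likely to cost a lot of throughput. The Kafka
documentation generally suggests relying on replication for durability
instead of forced flushes, so it may be worth benchmarking both ways.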






On Fri, Sep 23, 2016 12:06 AM, Gerard Klijs [email protected]
wrote:
I haven't tried it myself, and very likely won't in the near future, but
since it's also distributed I guess that with a large enough cluster you
will be able to handle any load. Some of the things Kafka might be better at
are more available connectors, a better at-least-once guarantee, and better
monitoring options. I really don't know, but if latency is really important
Pulsar might be better. They used Kafka before at Yahoo and maybe still do
for some stuff; recent work on https://github.com/yahoo/kafka-manager seems
to suggest so.

Alternatively you could configure a Kafka topic/producer/consumer to limit
latency, and that may also be enough to get a low enough latency. It would
certainly be interesting to compare the two, with the same hardware, and
with high load.
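
On the consumer side, the knobs I would expect to matter are fetch.min.bytes
and fetch.max.wait.ms: as far as I understand, a fetch request is answered as
soon as enough data is available, or when the wait time runs out, so a pull
does not have to mean busy polling. Below is a toy model of that behaviour in
Python. It is not Kafka client code; long_poll, timed_fetch and the in-memory
queue are made-up names for illustration.

```python
import queue
import threading
import time


def long_poll(log, min_records=1, max_wait_s=0.5):
    """Toy fetch: return once min_records arrived or max_wait_s has elapsed."""
    deadline = time.monotonic() + max_wait_s
    records = []
    while len(records) < min_records:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # waited long enough; return whatever we have
        try:
            # Blocks until a record shows up, so an idle consumer does not spin.
            records.append(log.get(timeout=remaining))
        except queue.Empty:
            break
    return records


def timed_fetch(produce_after_s, max_wait_s=0.5):
    """Publish one record after a delay and measure how long the fetch took."""
    log = queue.Queue()
    timer = threading.Timer(produce_after_s, log.put, args=("m1",))
    timer.start()
    started = time.monotonic()
    records = long_poll(log, min_records=1, max_wait_s=max_wait_s)
    elapsed = time.monotonic() - started
    timer.join()
    return records, elapsed


if __name__ == "__main__":
    records, elapsed = timed_fetch(produce_after_s=0.05)
    print(records, f"{elapsed * 1000:.0f} ms")
```

In this model the fetch returns shortly after the record is published instead
of waiting out the full max_wait_s, and an empty log costs one blocked call
per wait interval, not a tight loop of requests.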




On Thu, Sep 22, 2016 at 6:01 PM kant kodali <[email protected]> wrote:

> @Gerard Thanks for this. It looks good. Any benchmarks on this, throughput-wise?
>
> On Thu, Sep 22, 2016 7:45 AM, Gerard Klijs [email protected]
> wrote:
> We have a simple application producing 1 msg/sec, and did nothing to
> optimise the performance and have about a 10 msec delay between consumer
> and producer. When low latency is important, maybe pulsar is a better fit,
> https://www.datanami.com/2016/09/07/yahoos-new-pulsar-kafka-competitor/ .
>
> On Tue, Sep 20, 2016 at 2:24 PM Michael Freeman <[email protected]> wrote:
>
> > Thanks for sharing Radek, great article.
> >
> > Michael
> >
> > > On 17 Sep 2016, at 21:13, Radoslaw Gruchalski <[email protected]> wrote:
> > >
> > > Please read this article:
> > > https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
> > >
> > > –
> > > Best regards,
> > > Radek Gruchalski
> > > [email protected]
> > >
> > > On September 17, 2016 at 9:49:43 PM, kant kodali ([email protected]) wrote:
> > >
> > > Still, it should be possible to implement this using reactive streams, right?
> > > Could you please enlighten me on the major differences you see
> > > between a commit log and a message queue? I see them being different only
> > > in implementation, not functionality-wise, so I would be glad to hear your
> > > thoughts.
> > >
> > > On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski [email protected]
> > > wrote:
> > > Kafka is not a queue. It’s a distributed commit log.
> > >
> > > –
> > > Best regards,
> > > Radek Gruchalski
> > > [email protected]
> > >
> > > On September 17, 2016 at 9:23:09 PM, kant kodali ([email protected])
> > > wrote:
> > >
> > > Hmm... Looks like Kafka is written in Scala. There is this thing called
> > > reactive streams, where a slow consumer can apply back pressure if it is
> > > consuming slowly. Even with Java this is possible with a library called
> > > RxJava, and these ideas will be incorporated in Java 9 as well.
> > >
> > > I still don't see why they would pick poll just to solve this one problem
> > > while compromising on others. Poll just doesn't sound realtime. I heard
> > > from some people that they would set poll to 100ms. Well, 1) that is a lot
> > > of time. 2) Financial applications require microsecond latency. Kafka, from
> > > what I understand, looks like it has very high latency, and here is the
> > > article: http://bravenewgeek.com/dissecting-message-queues/ I usually don't
> > > go by articles, but I ran my own experiments on different queues and my
> > > numbers are very close to this article, so I would say whoever wrote this
> > > article has done a good job. 3) Poll does generate unnecessary traffic when
> > > the data isn't available.
> > >
> > > Finally, I'm still not sure why they would pick poll(). Or do they plan on
> > > introducing reactive streams?
> > > Thanks,
> > > kant
> > >
> > > On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski [email protected]
> > > wrote:
> > >
> > > I'm only guessing here as to whether this is the reason:
> > >
> > > Pull is much more sensible when a lot of data is pushed through. It allows
> > > consumers to consume at their own pace; slow consumers do not slow the
> > > complete system down.
> > >
> > > --
> > > Best regards,
> > > Rad
> > >
> > > On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" <[email protected]>
> > > wrote:
> > >
> > > why did Kafka choose pull instead of push for a consumer? push sounds like
> > > it is more realtime to me than poll, and also wouldn't poll just keep
> > > polling even when there are no messages in the broker, causing more
> > > traffic? please enlighten me