@Gerard Here are my initial benchmarks:

Producer on Machine 1 (m4.xlarge on AWS)
Broker on Machine 2 (m4.xlarge on AWS)
Consumer on Machine 3 (m4.xlarge on AWS)
Data size: 1.2KB
Receive throughput: ~24K
Kafka receive throughput: ~58K (same exact configuration)

All the benchmarks I ran are with default options.

So what the Pulsar guys are saying is that Kafka doesn't persist every message by default; instead it batches them for a period of time and then persists them, so if the JVM crashes before it persists, all the messages in the batch are lost, whereas Pulsar guarantees strong durability by storing every message to a write-ahead log, so messages are never lost. My question now is: what settings do I need to change in Kafka so that it stores every message? That way I am comparing apples to apples.
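To make the durability trade-off above concrete, here is a minimal sketch in Python of an append-only log that forces data to disk either after every message or after every N messages. It is a toy model, not Kafka's or Pulsar's implementation: `AppendLog`, `sync_every`, and `run` are hypothetical names, and the 1200-byte payload only approximates the 1.2KB data size mentioned above. On the Kafka side, the documented settings that seem closest to "sync every message" are the broker option `log.flush.interval.messages` and the per-topic option `flush.messages` (set to 1), usually considered together with the producer `acks` setting; please verify against the documentation for your version.

```python
import os
import tempfile


class AppendLog:
    """Toy append-only log; sync_every controls how often data is forced to disk."""

    def __init__(self, path, sync_every=1):
        self.sync_every = sync_every  # hypothetical knob: messages per fsync
        self.unsynced = 0             # messages written since the last fsync
        self.fsyncs = 0               # fsync calls issued so far
        self.f = open(path, "ab")

    def append(self, payload: bytes) -> None:
        # Length-prefix each record so a reader can find message boundaries.
        self.f.write(len(payload).to_bytes(4, "big") + payload)
        self.unsynced += 1
        if self.unsynced >= self.sync_every:
            self.sync()

    def sync(self) -> None:
        self.f.flush()             # push Python's buffer to the OS
        os.fsync(self.f.fileno())  # ask the OS to push its cache to the device
        self.fsyncs += 1
        self.unsynced = 0

    def close(self) -> None:
        # Flush whatever is still pending before closing the file.
        if self.unsynced:
            self.sync()
        self.f.close()


def run(sync_every, n=1000, size=1200):
    """Append n messages of `size` bytes and return the number of fsync calls."""
    with tempfile.TemporaryDirectory() as d:
        log = AppendLog(os.path.join(d, "log.bin"), sync_every)
        for _ in range(n):
            log.append(b"x" * size)
        log.close()
        return log.fsyncs
```

With `sync_every=1` every message costs one fsync, which is the "never lose an acknowledged message on a crash" end of the spectrum; a larger `sync_every` amortizes that cost, and anything still unsynced at crash time can be lost. Timing `run(1)` against `run(100)` on the same disk gives a rough feel for how much of a throughput gap the sync policy alone can explain.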
On Fri, Sep 23, 2016 12:06 AM, Gerard Klijs gerard.kl...@dizzit.com wrote:

I haven't tried it myself, nor am I very likely to in the near future, but since it's also distributed I guess that with a large enough cluster you will be able to handle any load. Some of the things Kafka might be better at are more connectors available, a better at-least-once guarantee, and better monitoring options. I really don't know, but if latency is really important Pulsar might be better. They used Kafka before at Yahoo and maybe still do for some stuff; recent work on https://github.com/yahoo/kafka-manager seems to suggest so. Alternatively you could configure a Kafka topic/producer/consumer to limit latency, and that may also be enough to get a low enough latency. It would certainly be interesting to compare the two, with the same hardware, and with high load.

On Thu, Sep 22, 2016 at 6:01 PM kant kodali <kanth...@gmail.com> wrote:

> @Gerard Thanks for this. It looks good. Any benchmarks on this, throughput-wise?
>
> On Thu, Sep 22, 2016 7:45 AM, Gerard Klijs gerard.kl...@dizzit.com wrote:
> > We have a simple application producing 1 msg/sec, and did nothing to
> > optimise the performance, and have about a 10 msec delay between consumer
> > and producer. When low latency is important, maybe Pulsar is a better fit:
> > https://www.datanami.com/2016/09/07/yahoos-new-pulsar-kafka-competitor/ .
> >
> > On Tue, Sep 20, 2016 at 2:24 PM Michael Freeman <mikfree...@gmail.com>
> > wrote:
> > >
> > > Thanks for sharing Radek, great article.
> > >
> > > Michael
> > >
> > > > On 17 Sep 2016, at 21:13, Radoslaw Gruchalski <ra...@gruchalski.com>
> > > > wrote:
> > > >
> > > > Please read this article:
> > > >
> > > > https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
> > > >
> > > > --
> > > > Best regards,
> > > > Radek Gruchalski
> > > > ra...@gruchalski.com
> > > >
> > > > On September 17, 2016 at 9:49:43 PM, kant kodali (kanth...@gmail.com)
> > > > wrote:
> > > >
> > > > Still, it should be possible to implement using reactive streams, right?
> > > > Could you please enlighten me on what some of the major differences are
> > > > that you see between a commit log and a message queue? I see them being
> > > > different only in the implementation but not functionality-wise, so I
> > > > would be glad to hear your thoughts.
> > > >
> > > > On Sat, Sep 17, 2016 12:39 PM, Radoslaw Gruchalski ra...@gruchalski.com
> > > > wrote:
> > > >
> > > > Kafka is not a queue. It's a distributed commit log.
> > > >
> > > > --
> > > > Best regards,
> > > > Radek Gruchalski
> > > > ra...@gruchalski.com
> > > >
> > > > On September 17, 2016 at 9:23:09 PM, kant kodali (kanth...@gmail.com)
> > > > wrote:
> > > >
> > > > Hmm... Looks like Kafka is written in Scala. There is this thing called
> > > > reactive streams where a slow consumer can apply back pressure if they
> > > > are consuming slowly. Even with Java this is possible with a library
> > > > called RxJava, and these ideas will be incorporated in Java 9 as well.
> > > >
> > > > I still don't see why they would pick poll just to solve this one
> > > > problem while compromising on others. Poll just doesn't sound realtime.
> > > > I heard from some people that they would set poll to 100ms. Well, 1) that
> > > > is a lot of time. 2) Financial applications require microsecond latency.
> > > > Kafka, from what I understand, looks like it has a very high latency, and
> > > > here is the article: http://bravenewgeek.com/dissecting-message-queues/
> > > > I usually don't go by articles, but I ran my own experiments on different
> > > > queues and my numbers are very close to this article, so I would say
> > > > whoever wrote this article has done a good job. 3) Poll does generate
> > > > unnecessary traffic in case the data isn't available.
> > > >
> > > > Finally, I'm still not sure why they would pick poll(). Or do they plan
> > > > on introducing reactive streams?
> > > >
> > > > Thanks,
> > > > kant
> > > >
> > > > On Sat, Sep 17, 2016 5:14 AM, Radoslaw Gruchalski ra...@gruchalski.com
> > > > wrote:
> > > >
> > > > I'm only guessing here regarding if this is the reason:
> > > >
> > > > Pull is much more sensible when a lot of data is pushed through. It
> > > > allows consumers to consume at their own pace; slow consumers do not
> > > > slow the complete system down.
> > > >
> > > > --
> > > > Best regards,
> > > > Rad
> > > >
> > > > On Sat, Sep 17, 2016 at 11:18 AM +0200, "kant kodali" <kanth...@gmail.com>
> > > > wrote:
> > > >
> > > > Why did Kafka choose pull instead of push for a consumer? Push sounds
> > > > like it is more realtime to me than poll, and also wouldn't poll just
> > > > keep polling even when there are no messages in the broker, causing more
> > > > traffic? Please enlighten me.