Hi Khurrum,

It is ready now: https://github.com/Landoop/stream-reactor

Regards
Andrew

From: Khurrum Nasim
Sent: Thursday, 7 December, 08:36
Subject: Re: Comparing Pulsar and Kafka: unified queuing and streaming
To: dev@kafka.apache.org
Cc: us...@kafka.apache.org

Andrew,

Thank you! Is there any estimate of when I can try out Kafka Connect with Pulsar? Can you also point me to where I can find the Kafka-to-Pulsar source and sink?

- KN

On Wed, Dec 6, 2017 at 2:48 AM, Andrew Stevenson wrote:

> In terms of building out the Apache Pulsar ecosystem, Landoop is working on porting our Kafka Connect connectors to Pulsar's framework. We already have a Kafka-to-Pulsar source and sink.
>
> On 05/12/2017, 19:59, "Jason Gustafson" wrote:
>
> > I believe a lot of users are using the Kafka high-level consumers, which is effectively an **unordered** messaging/streaming pattern. People using high-level consumers don't actually need any ordering guarantees. In this sense, a *shared* subscription in Apache Pulsar seems to be better than Kafka's current consumer group model, as the consumption rate is not limited by the number of partitions and can actually grow beyond it. We do see a lot of operational pain points in production coming from consumer lag, which I think is very commonly seen during partition rebalancing in a consumer group. Selective acking seems to provide finer granularity on acknowledgment, which can actually be good for avoiding consumer lag and avoiding reprocessing of messages during a partition rebalance.
>
> Yeah, I'm not sure about this. I'd be interested to understand the design of this feature a little better. In practice, when ordering is unimportant, adding partitions seems not too big of a deal.
> Also, I'm aware of active efforts to make rebalancing less of a pain point for our users ;)
>
> > The last question, from a user's perspective: since both Kafka and Pulsar are distributed pub/sub messaging systems and both of them are at the ASF, is there any possibility for these two projects to collaborate, e.g. Kafka adopts Pulsar's messaging model, and Pulsar can use Kafka Streams and Kafka Connect? I believe a lot of people on the mailing list might have the same or a similar question. From an end-user perspective, if such collaboration can happen, that is going to be great for users and also the ASF. I would like to hear any thoughts from Kafka committers and PMC members.
>
> I see this a little differently. Although there is some overlap between the projects, they have quite different underlying philosophies (as Marina alluded to) and I hope this will take them on different trajectories over time. That would ultimately benefit users more than having two competing projects solving all the same use cases. We don't need to try to cram Pulsar features into Kafka if it's not a good fit and vice versa. At the same time, where capabilities do overlap, we can try to learn from their experience and they can learn from ours. The example of message retention seemed like one of these instances since there are legitimate use cases and Pulsar's approach has some benefits.
>
> -Jason
>
> > On Tue, Dec 5, 2017 at 9:57 AM, Khurrum Nasim wrote:
> >
> > Hi Marina,
> >
> > On Tue, Dec 5, 2017 at 6:58 AM, Marina Popova <ppine7...@protonmail.com> wrote:
> >
> > > Hi,
> > > I don't think it would be such a great idea to start modifying the very foundation of Kafka's design to accommodate more and more extra use cases.
> > > Kafka became so widely adopted and popular because its creator made a brilliant decision to make it a "dumb broker - smart consumer" type of system, where there are no or minimal dependencies between Kafka brokers and consumers. This is what makes Kafka blazingly fast and truly scalable - able to handle thousands of consumers with no impact on performance.
> >
> > I am not sure I agree with this. I think from an end-user perspective, what users expect is an ultra-simple streaming/messaging system: applications send messages, the messaging system stores and dispatches them, and consumers consume the messages and tell the system that they have consumed them. IMO whether management is centralized or decentralized doesn't really matter here if Kafka is able to do things without impacting performance.
> >
> > Sometimes people assume that smarter brokers (like traditional messaging brokers) cannot offer high throughput and scalability because they do "too many things". But I took a look at the Pulsar documentation and their presentation. A few of the metrics are very impressive:
> >
> > https://image.slidesharecdn.com/apachepulsar-171113225233/95/bdam-multitenant-and-georeplication-messaging-with-apache-pulsar-by-matteo-merli-sijie-guo-from-streamlio-2-638.jpg?cb=1510613990
> >
> > - 1.8 million messages/second per topic partition
> > - 99pct producing latency less than 5ms with stronger durability
> > - support millions of topics
> > - it also supports at-least-once and effectively-once producing
> >
> > Those metrics sound appealing to me if Pulsar supports both streaming and queuing.
> > I am wondering if anyone in the community has tried to do performance testing or a benchmark between Pulsar and Kafka. I would love to see such results, which can help people understand both systems and their pros and cons.
> >
> > - KN
> >
> > > One unfortunate consequence of becoming so popular is that more and more people are trying to fit Kafka into their architectures not because it really fits, but because everybody else is doing so :) And this causes many requests for richer and richer functionality to be added to Kafka - like transactional messages, more complex acks, centralized consumer management, etc.
> > >
> > > If you really need those features, there are other systems that are designed for that.
> > >
> > > I truly worry that if all those changes are added to core Kafka, it will become just another "do it all" enterprise-level monster that will be able to do it all, but at the price of mediocre performance and ten-fold increased complexity (and, thus, management burden and possibility of bugs). Sure, there has to be innovation and new features added - but maybe those that require major changes to Kafka's core principles should go into separate frameworks, plug-ins (like Connectors) or something along those lines, rather than packing it all into core Kafka.
> > >
> > > Just my 2 cents :)
> > >
> > > Marina
> > >
> > > > -------- Original Message --------
> > > > Subject: Re: Comparing Pulsar and Kafka: unified queuing and streaming
> > > > Local Time: December 4, 2017 2:56 PM
> > > > UTC Time: December 4, 2017 7:56 PM
> > > > From: ja...@confluent.io
> > > > To: dev@kafka.apache.org
> > > > Kafka Users
> > > >
> > > > Hi Khurrum,
> > > >
> > > > Thanks for sharing the article.
> > > > I think one interesting aspect of Pulsar that stands out to me is its notion of a subscription and how it impacts message retention. In Kafka, consumers are more loosely coupled and retention is enforced independently of consumption. There are some scenarios I can imagine where the tighter coupling might be beneficial. For example, in Kafka Streams, we often use intermediate topics to store the data in one stage of the topology's computation. These topics are exclusively owned by the application and once the messages have been successfully received by the next stage, we do not need to retain them further. But since consumption is independent of retention, we either have to choose a large retention time and deal with some temporary storage waste or we use a low retention time and possibly lose some messages during an outage.
> > > >
> > > > We have solved this problem to some extent in Kafka by introducing an API to delete the records in a partition up to a certain offset, but this effectively puts the burden of this use case on clients. It would be interesting to consider whether we could do something like Pulsar in the Kafka broker. For example, we have a consumer group coordinator which is able to track the progress of the group through its committed offsets. It might be possible to extend it to automatically delete records in a topic after offsets are committed if the topic is known to be exclusively owned by the consumer group. We already have the DeleteRecords API that we need, so maybe this is "just" a matter of some additional topic metadata. I'd be interested to hear whether this kind of use case is common among our users.
> > > >
> > > > -Jason
> > > >
> > > > On Sun, Dec 3, 2017 at 10:29 PM, Khurrum Nasim khurrumnas...@gmail.com wrote:
> > > >
> > > > > Dear Kafka Community,
> > > > > I happened to read this blog post comparing the messaging models of Apache Pulsar and Apache Kafka. It sounds interesting. Apache Pulsar claims to unify streaming (Kafka) and queuing (RabbitMQ) in one unified API. Pulsar also seems to support the Kafka API. Has anyone taken a look at Pulsar? What does the community think about this? Pulsar is also an Apache project. Is there any collaboration that can happen between these two projects?
> > > > > https://streaml.io/blog/pulsar-streaming-queuing/
> > > > > BTW, I am a Kafka user, loving Kafka a lot. Just trying to see what other people think about this.
> > > > >
> > > > > - KN
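
Jason's suggestion in this thread (delete records automatically once the consumer group that exclusively owns a topic has committed past them) can be illustrated with a toy model. What follows is only a minimal in-memory sketch in Python, not Kafka code: `PartitionLog`, `commit`, `delete_before` and `exclusive_owner` are hypothetical names made up for the illustration, and the "ownership" flag stands in for the additional topic metadata the thread leaves unspecified. The only behaviour taken from the discussion is that deletion removes everything before a given offset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PartitionLog:
    """Toy single-partition log (hypothetical, for illustration only)."""
    exclusive_owner: Optional[str] = None  # assumed topic metadata: owning group, if any
    start_offset: int = 0                  # offset of the oldest retained record
    records: List[str] = field(default_factory=list)
    committed: Dict[str, int] = field(default_factory=dict)

    @property
    def end_offset(self) -> int:
        """Offset that the next appended record will receive."""
        return self.start_offset + len(self.records)

    def append(self, value: str) -> int:
        """Append a record and return its offset."""
        self.records.append(value)
        return self.end_offset - 1

    def delete_before(self, offset: int) -> None:
        """Drop every retained record whose offset is lower than `offset`."""
        offset = max(self.start_offset, min(offset, self.end_offset))
        del self.records[: offset - self.start_offset]
        self.start_offset = offset

    def commit(self, group: str, offset: int) -> None:
        """Store a group's committed offset; truncate only for the owning group."""
        self.committed[group] = offset
        if group == self.exclusive_owner:
            self.delete_before(offset)


log = PartitionLog(exclusive_owner="streams-app")
for i in range(5):
    log.append(f"event-{i}")

log.commit("some-other-group", 4)  # not the owner: nothing is deleted
log.commit("streams-app", 3)       # owner has consumed offsets 0..2
print(log.start_offset, log.records)  # 3 ['event-3', 'event-4']
```

In this sketch other groups can still commit offsets, but only the owner's commits advance the start of the log; how ownership would actually be declared in Kafka is left open in the thread.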