2020-03-10 15:23:51 UTC - Liam Condon: hey y'all - I'm finishing up some work on the NodeJS client to add topic schema support (and documentation for this "new" functionality) and am a bit curious about a few things. most importantly, where should serialization/deserialization actually occur for messages with Protobuf or AVRO schema types? Is it something that developers should be doing before passing the message to `producer.send`? ----
2020-03-10 16:01:53 UTC - Andy Papia: @Andy Papia has joined the channel ----
2020-03-10 16:29:30 UTC - Sijie Guo: It will be the schema implementation doing the serialization and deserialization. The client is configured to use a schema; it passes in objects and the schema handles serialization and deserialization. ----
2020-03-10 16:29:51 UTC - Sijie Guo: You can check the Java schema implementations as references ----
2020-03-10 16:31:58 UTC - Liam Condon: I figured as much after reading more in the schema section of the docs, just haven't looked far enough into the cpp client code to figure out where that happens exactly... ----
2020-03-10 16:42:28 UTC - Sijie Guo: so you need to serialize and deserialize at the nodejs level and pass the serialized bytes and the schema information to the cpp side ----
2020-03-10 16:42:49 UTC - Sijie Guo: python and go use the cpp client, so they are good examples to check. ----
2020-03-10 16:49:17 UTC - Liam Condon: interesting - that makes more sense to me. I was trying to figure out what a message would look like in a nodejs consumer on a topic with a schema if the cpp client was handling the message serialization/deserialization. ----
2020-03-10 16:51:20 UTC - Liam Condon: but now I see that the python client is handling the message serialization/deserialization outside of the binding to the c/cpp client lib, using a set of helper classes ----
2020-03-10 17:29:57 UTC - Sijie Guo: the cpp client doesn't handle the serialization and deserialization. it only handles passing the schema info as part of the wire protocol. ----
2020-03-10 17:30:11 UTC - Sijie Guo: the serialization and deserialization are done at the language client level. ----
2020-03-10 17:31:03 UTC - Liam Condon: yup, thanks for helping me figure that out. looks like I've got a bit more work to do to fully complete schema support in the nodejs client :smile: ----
2020-03-10 17:48:49 UTC - Sijie Guo: looking forward to your contribution ----
2020-03-10 19:32:46 UTC - Evan Furman: @Evan Furman has joined the channel ----
2020-03-10 22:15:21 UTC - Eugen: Just an FYI - we have a client here in Japan who wants us to build their next-generation stock market data feed platform. They suggested using Kafka, we suggested Pulsar, and we tried to convince them of Pulsar's benefits. And although Pulsar shines with incredibly high throughput even when fsyncing, and despite Pulsar's architectural benefits (broker / bookie separation), what counted in the end for our client was Kafka's adoption and user base, i.e. a conservative decision. Before making the decision, they also got a presentation by a big Japanese financial market data provider about their experience using Kafka. Unfortunately (thanks Corona craze!) we were prevented from attending that presentation, but judging from the decision, it seems they were convinced the problem can be tackled with Kafka, albeit, I'd think, with far more hardware resources and operational headaches.
About the task: This client needs to ingest a small number of feeds with high throughput peaks (up to 200-300k msg/sec, 500 bytes/msg on average), where feeds can be partitioned, but in-partition order is crucial. Those feeds come in redundantly and need to be deduplicated, leading to one reliable feed. The data needs to be made available to various clients that process the data, and one of those clients would be making the data available in real time to clients subscribing to very small parts of the feeds. End-to-end latency from ingestion to those clients must be kept within 300 milliseconds. ----
2020-03-10 22:18:31 UTC - Ali Ahmed: if they are in Japan, they can be told Yahoo Japan is a massive user of Pulsar, and they have lots of presentations in Japanese regarding its benefits. ----
2020-03-10 22:39:03 UTC - Eugen: @Ali Ahmed That is a helpful success story, but it did not swing the result ----
2020-03-10 23:04:03 UTC - David Kjerrumgaard: It's sad that the Kafka community can spin its age as a positive. ----
2020-03-10 23:13:07 UTC - Eugen: if - as I believe - Pulsar is really that much better, not for that much longer though. Pulsar, however, still has some work to do in a number of areas, e.g. function state is still beta. The question of course is: how many stream engine features should go into Pulsar, and which should be left to other products (Flink? Heron?) +1 : David Kjerrumgaard, Chris Bartholomew ----
2020-03-11 00:51:12 UTC - Greg Methvin: I think there's a lot of opportunity for Pulsar to capture the market of more traditional message brokers like RabbitMQ. That's our primary use case for it, and it feels close to feature parity. If you're a current user of RabbitMQ, a big selling point is that you can migrate to Pulsar without having to significantly change the semantics of how you interact with the message broker. With Kafka there are a lot of architectural changes you'll have to make to switch to a streaming model, which can be a challenge if you have limited engineering resources. ----
2020-03-11 01:08:54 UTC - Greg Methvin: I think the most significant issues we had were around batching and metrics. We had to disable producer batching because it breaks negative acknowledgements, and it also means the backlog metrics no longer tell you the actual number of messages. ----
2020-03-11 01:10:37 UTC - Alexandre DUVAL: @Greg Methvin is the batching issue the one where exceptions are thrown after running for some time due to negative acks? ----
2020-03-11 01:10:45 UTC - Greg Methvin: Overall it's been totally worth it to migrate to Pulsar, but understanding these differences/limitations required a bit of extra time and effort. ----
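Circling back to the NodeJS schema question at the top of this log, a minimal sketch of the approach Sijie describes (serialize and deserialize in JavaScript and hand raw bytes to the C++ binding), assuming the `pulsar-client` npm package plus the third-party `avsc` library for AVRO; the topic name and schema are illustrative, not the client's eventual schema API:

```javascript
// Sketch only: AVRO (de)serialization done at the JS level, before/after the
// C++ binding, as discussed above. Assumes the `pulsar-client` and `avsc` packages.
const Pulsar = require('pulsar-client');
const avro = require('avsc');

// AVRO schema for the message payload (illustrative).
const userType = avro.Type.forSchema({
  type: 'record',
  name: 'User',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'age', type: 'int' },
  ],
});

(async () => {
  const client = new Pulsar.Client({ serviceUrl: 'pulsar://localhost:6650' });

  // Consumer side: subscribe first so the message produced below is received.
  const consumer = await client.subscribe({
    topic: 'persistent://public/default/users',
    subscription: 'user-sub',
  });

  // Producer side: serialize the object to a Buffer, then pass the bytes to send().
  const producer = await client.createProducer({ topic: 'persistent://public/default/users' });
  await producer.send({ data: userType.toBuffer({ name: 'alice', age: 30 }) });

  // Receive raw bytes and deserialize them back into an object in JS.
  const msg = await consumer.receive();
  const user = userType.fromBuffer(msg.getData()); // -> { name: 'alice', age: 30 }
  consumer.acknowledge(msg);

  await consumer.close();
  await producer.close();
  await client.close();
})();
```

Registering the schema with the topic so the broker can enforce compatibility would still have to go through the schema info the C++ binding passes on the wire protocol, which is the remaining work Liam mentions.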
2020-03-11 01:11:05 UTC - Greg Methvin: @Alexandre DUVAL I'm referring to <https://github.com/apache/pulsar/issues/5969> ----
2020-03-11 01:11:24 UTC - Greg Methvin: there may be other issues as well, but that was the main one that caused pain for us, since we use nacks pretty heavily ----
2020-03-11 01:12:20 UTC - Greg Methvin: and also we use pulsar to schedule huge email campaigns all at once, so we will often enqueue in large batches ----
2020-03-11 01:13:35 UTC - Alexandre DUVAL: ok, will follow it too. I think <https://github.com/apache/pulsar/issues/6195> is related; brokers seem to end up in a strange state on this topic ----
2020-03-11 01:13:53 UTC - Greg Methvin: cool thanks @Alexandre DUVAL ----
2020-03-11 01:14:15 UTC - Alexandre DUVAL: (not sure it's related btw) ----
2020-03-11 01:14:48 UTC - Greg Methvin: I just think it's interesting how much focus there is on talking about pulsar as an alternative to kafka, whereas it's probably much easier to sell it as an alternative to traditional message brokers like rabbitmq ----
2020-03-11 01:14:58 UTC - Greg Methvin: I think it's both, of course ----
2020-03-11 01:15:13 UTC - Greg Methvin: but I feel like the kafka use cases are talked about much more ----
2020-03-11 01:17:13 UTC - Eugen: @Greg Methvin what are the kafka use cases for you? ----
2020-03-11 01:17:50 UTC - Greg Methvin: primarily data ingestion ----
2020-03-11 01:19:47 UTC - Greg Methvin: most of our other use cases are work queues: sending emails, sms, pushes, etc. ----
2020-03-11 01:20:30 UTC - Roman Popenov: RabbitMQ is a nightmare with failed and re-delivered messages ----
2020-03-11 01:20:53 UTC - Greg Methvin: rabbitmq is a nightmare for many reasons… ----
2020-03-11 01:21:17 UTC - Roman Popenov: What about RocketMQ as an alternative to RabbitMQ? ----
2020-03-11 01:22:20 UTC - Eugen: We have both use cases, and Pulsar seems a better fit overall for me personally, for a number of reasons, including multi-tenancy, no need to rebalance partitions when scaling out, etc., and of course fsync! ----
2020-03-11 01:22:41 UTC - Greg Methvin: we did look into rocketmq, though I don't recall exactly why we decided not to go with it, but definitely pulsar supporting both streaming and queuing was a big plus ----
2020-03-11 01:23:21 UTC - Greg Methvin: there are also a lot of use cases we probably could migrate to a streaming model that we're currently using rabbitmq for ----
2020-03-11 01:23:52 UTC - Greg Methvin: in the sense that we can guarantee ordering ----
2020-03-11 01:25:13 UTC - Greg Methvin: oh, we also like pulsar because it supports a large number of topics +1 : Eugen heavy_plus_sign : Roman Popenov ----
2020-03-11 01:25:22 UTC - Greg Methvin: seems like rocketmq doesn't do that well ----
2020-03-11 01:25:40 UTC - Greg Methvin: we're a b2b app and having isolation between customers is very useful ----
2020-03-11 01:25:46 UTC - Roman Popenov: There is also the issue with message size; RabbitMQ can handle VERY large files ----
2020-03-11 01:26:12 UTC - Roman Popenov: I think it will be a big plus when support for chunking is implemented ----
2020-03-11 01:26:18 UTC - Greg Methvin: that's not a huge issue for us at the moment ----
2020-03-11 01:26:37 UTC - Greg Methvin: our average message size is less than 1k ----
2020-03-11 01:28:00 UTC - Rajan Dhabalia: @Roman Popenov what's your use case with large message size? ----
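For reference on the batching workaround Greg describes above (see <https://github.com/apache/pulsar/issues/5969>), a minimal work-queue-style sketch with producer batching disabled so negative acknowledgements redeliver single messages; it assumes the Node.js `pulsar-client` package, and the `batchingEnabled` and `negativeAcknowledge` names should be verified against the client version in use:

```javascript
// Sketch only: producer batching disabled so nacks and backlog counts apply to
// individual messages rather than whole batches. `batchingEnabled` and
// `negativeAcknowledge` are assumptions; check your pulsar-client version.
const Pulsar = require('pulsar-client');

(async () => {
  const client = new Pulsar.Client({ serviceUrl: 'pulsar://localhost:6650' });

  // Work-queue-style consumer: Shared subscription, created before producing.
  const consumer = await client.subscribe({
    topic: 'persistent://public/default/email-jobs',
    subscription: 'email-workers',
    subscriptionType: 'Shared',
  });

  // Batching off: each send() becomes its own entry on the topic.
  const producer = await client.createProducer({
    topic: 'persistent://public/default/email-jobs',
    batchingEnabled: false, // assumed flag mirroring the C++ ProducerConfiguration
  });
  await producer.send({ data: Buffer.from(JSON.stringify({ campaignId: 42 })) });

  const msg = await consumer.receive();
  try {
    // ... process the job ...
    consumer.acknowledge(msg);
  } catch (err) {
    consumer.negativeAcknowledge(msg); // message is redelivered after the nack delay
  }

  await consumer.close();
  await producer.close();
  await client.close();
})();
```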
2020-03-11 01:28:30 UTC - Roman Popenov: Processing email attachments and analysis of the data ----
2020-03-11 01:28:47 UTC - Rajan Dhabalia: We have a PR created to support this feature... checking if that can solve your use case ----
2020-03-11 01:29:35 UTC - Roman Popenov: PIP 37? ----
2020-03-11 01:29:43 UTC - Rajan Dhabalia: Yes ----
2020-03-11 01:30:54 UTC - Roman Popenov: Is there an approximate release when this might see the light of day? ----
2020-03-11 01:32:06 UTC - Rajan Dhabalia: It's been there for a while and we can try to include it in the next release ----
2020-03-11 01:35:04 UTC - Greg Methvin: I also think having many topics would be more useful if there were a way to do a regex subscription without it counting as being "subscribed" to the topic, so the topic could still get automatically deleted after some inactivity. ----
2020-03-11 01:35:27 UTC - Greg Methvin: I already mentioned this to @Sijie Guo and I think there's an issue for it. ----
2020-03-11 01:37:33 UTC - Greg Methvin: essentially we want it so we can have one-time-use queues that have their own rate limit and not have to worry about deleting them manually ----
2020-03-11 01:39:30 UTC - Sijie Guo: @Greg Methvin yeah. I think I created an issue for that. I think it was resolved (but I need to double-check). +1 : Greg Methvin ----
2020-03-11 06:35:48 UTC - Prashant Shandilya: Got a query regarding the I/O connector:
_Does the Cassandra I/O connector support all datatypes, including blob?_ Please share a configuration example; that would help. ----
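On the configuration-example part of the question: the sketch below follows the general shape of the Pulsar IO quickstart for the built-in Cassandra sink, with placeholder values; it does not answer the blob question, so the supported datatypes should be confirmed against the connector docs.

```yaml
# cassandra-sink.yml - placeholder values, adjust for your cluster
configs:
  roots: "localhost:9042"
  keyspace: "pulsar_test_keyspace"
  columnFamily: "pulsar_test_table"
  keyname: "key"
  columnName: "col"
```

```bash
# Create the sink from the config file and attach it to an input topic
bin/pulsar-admin sinks create \
  --tenant public \
  --namespace default \
  --name cassandra-test-sink \
  --sink-type cassandra \
  --sink-config-file cassandra-sink.yml \
  --inputs test_cassandra
```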