2020-03-10 15:23:51 UTC - Liam Condon: hey y'all - I'm finishing up some work on the NodeJS client to add topic schema support (and documentation for this "new" functionality) and am a bit curious about a few things. most importantly, where should serialization/deserialization actually occur for messages with Protobuf or AVRO schema types? Is it something that developers should be doing before passing the message to `producer.send`? ----
2020-03-10 16:01:53 UTC - Andy Papia: @Andy Papia has joined the channel ----
2020-03-10 16:29:30 UTC - Sijie Guo: It will be the schema implementation doing the serialization and deserialization. The client is configured to use a schema; it passes in objects and the schema handles serialization and deserialization. ----
2020-03-10 16:29:51 UTC - Sijie Guo: You can check the Java schema implementations as references ----
2020-03-10 16:31:58 UTC - Liam Condon: I figured as much after reading more in the schema section of the docs, just haven't looked far enough into the cpp client code to figure out where that happens exactly... ----
2020-03-10 16:42:28 UTC - Sijie Guo: so you need to serialize and deserialize at the nodejs level and pass the serialized bytes and the schema information to the cpp side ----
2020-03-10 16:42:49 UTC - Sijie Guo: python and go use the cpp client, so they are good examples to check. ----
2020-03-10 16:49:17 UTC - Liam Condon: interesting - that makes more sense to me. I was trying to figure out what a message would look like in a nodejs consumer on a topic with a schema if the cpp client was handling the message serialization/deserialization. ----
2020-03-10 16:51:20 UTC - Liam Condon: but now I see that the python client is handling the message serialization/deserialization outside of the binding to the c/cpp client lib, using a set of helper classes ----
2020-03-10 17:29:57 UTC - Sijie Guo: the cpp client doesn't handle the serialization and deserialization. it only handles passing the schema info as part of the wire protocol. ----
2020-03-10 17:30:11 UTC - Sijie Guo: the serialization and deserialization are done at the language client level. ----
2020-03-10 17:31:03 UTC - Liam Condon: yup, thanks for helping me figure that out. looks like I've got a bit more work to do to fully complete schema support in the nodejs client :smile: ----
2020-03-10 17:48:49 UTC - Sijie Guo: looking forward to your contribution ----
2020-03-10 19:32:46 UTC - Evan Furman: @Evan Furman has joined the channel ----
2020-03-10 22:15:21 UTC - Eugen: Just an FYI - we have a client here in Japan who wants us to build their next-generation stock market data feed platform. They suggested using Kafka, we suggested Pulsar, and we tried to convince them of Pulsar's benefits. And although Pulsar shines with incredibly high throughput even when fsyncing, and despite Pulsar's architectural benefits (broker / bookie separation), what counted in the end for our client was Kafka's adoption and user base, i.e. a conservative decision. Before making the decision, they also got a presentation by a big Japanese financial market data provider about their experience using Kafka. Unfortunately (thanks Corona craze!) we were prevented from attending that presentation, but judging from the decision, it seems they were convinced the problem can be tackled with Kafka, albeit, I'd think, with far more hardware resources and operational headaches.
About the task: This client needs to ingest a small number of feeds with high throughput peaks (up to 200-300k msg/sec, 500 bytes/msg on average), where feeds can be partitioned, but in-partition order is crucial. Those feeds come in redundantly and need to be deduplicated, leading to one reliable feed. The data needs to be made available to various clients that process the data, and one of those clients would be making the data available in real time to clients subscribing to very small parts of the feeds. End-to-end latency from ingestion to those clients must be kept within 300 milliseconds. ----
2020-03-10 22:18:31 UTC - Ali Ahmed: if they are in Japan, they can be told Yahoo Japan is a massive user of Pulsar, and they have lots of presentations in Japanese regarding its benefits. ----
2020-03-10 22:39:03 UTC - Eugen: @Ali Ahmed That is a helpful success story, but it did not swing the result ----
2020-03-10 23:04:03 UTC - David Kjerrumgaard: It's sad that the Kafka community can spin its age as a positive. ----
2020-03-10 23:13:07 UTC - Eugen: if - as I believe - Pulsar is really that much better, not for that much longer though. Pulsar, however, still has some work to do in a number of areas, e.g. function state is still beta. The question of course is: how many stream engine features should go into Pulsar, and which should be left to other products (Flink? Heron?) +1 : David Kjerrumgaard, Chris Bartholomew ----
2020-03-11 00:51:12 UTC - Greg Methvin: I think there's a lot of opportunity for Pulsar to capture the market of more traditional message brokers like RabbitMQ. That's our primary use case for it, and it feels close to feature parity. If you're a current user of RabbitMQ, a big selling point is that you can migrate to Pulsar without having to significantly change the semantics of how you interact with the message broker. With Kafka there are a lot of architectural changes you'll have to make to switch to a streaming model, which can be a challenge if you have limited engineering resources. ----
2020-03-11 01:08:54 UTC - Greg Methvin: I think the most significant issues we had were around batching and metrics. We had to disable producer batching because it breaks negative acknowledgements, and it also means the backlog metrics no longer tell you the actual number of messages. ----
2020-03-11 01:10:37 UTC - Alexandre DUVAL: @Greg Methvin is the batching issue the one where exceptions are thrown after running for some time due to negative acks? ----
2020-03-11 01:10:45 UTC - Greg Methvin: Overall it's been totally worth it to migrate to Pulsar, but understanding these differences/limitations required a bit of extra time and effort. ----
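Circling back to the NodeJS schema question at the top of this log, a minimal sketch of the approach Sijie describes (serialize and deserialize in JavaScript and hand raw bytes to the C++ binding), assuming the `pulsar-client` npm package plus the third-party `avsc` library for AVRO; the topic name and schema are illustrative, not the client's eventual schema API:

```javascript
// Sketch only: AVRO (de)serialization done at the JS level, before/after the
// C++ binding, as discussed above. Assumes the `pulsar-client` and `avsc` packages.
const Pulsar = require('pulsar-client');
const avro = require('avsc');

// AVRO schema for the message payload (illustrative).
const userType = avro.Type.forSchema({
  type: 'record',
  name: 'User',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'age', type: 'int' },
  ],
});

(async () => {
  const client = new Pulsar.Client({ serviceUrl: 'pulsar://localhost:6650' });

  // Consumer side: subscribe first so the message produced below is received.
  const consumer = await client.subscribe({
    topic: 'persistent://public/default/users',
    subscription: 'user-sub',
  });

  // Producer side: serialize the object to a Buffer, then pass the bytes to send().
  const producer = await client.createProducer({ topic: 'persistent://public/default/users' });
  await producer.send({ data: userType.toBuffer({ name: 'alice', age: 30 }) });

  // Receive raw bytes and deserialize them back into an object in JS.
  const msg = await consumer.receive();
  const user = userType.fromBuffer(msg.getData()); // -> { name: 'alice', age: 30 }
  consumer.acknowledge(msg);

  await consumer.close();
  await producer.close();
  await client.close();
})();
```

Registering the schema with the topic so the broker can enforce compatibility would still have to go through the schema info the C++ binding passes on the wire protocol, which is the remaining work Liam mentions.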
2020-03-11 01:11:05 UTC - Greg Methvin: @Alexandre DUVAL I'm referring to <https://github.com/apache/pulsar/issues/5969> ----
2020-03-11 01:11:24 UTC - Greg Methvin: there may be other issues as well, but that was the main one that caused pain for us, since we use nacks pretty heavily ----
2020-03-11 01:12:20 UTC - Greg Methvin: and also we use pulsar to schedule huge email campaigns all at once, so we will often enqueue in large batches ----
2020-03-11 01:13:35 UTC - Alexandre DUVAL: ok, will follow it too. I think <https://github.com/apache/pulsar/issues/6195> is related; brokers seem to end up in a strange state on this topic ----
2020-03-11 01:13:53 UTC - Greg Methvin: cool thanks @Alexandre DUVAL ----
2020-03-11 01:14:15 UTC - Alexandre DUVAL: (not sure it's related btw) ----
2020-03-11 01:14:48 UTC - Greg Methvin: I just think it's interesting how much focus there is on talking about pulsar as an alternative to kafka, whereas it's probably much easier to sell it as an alternative to traditional message brokers like rabbitmq ----
2020-03-11 01:14:58 UTC - Greg Methvin: I think it's both, of course ----
2020-03-11 01:15:13 UTC - Greg Methvin: but I feel like the kafka use cases are talked about much more ----
2020-03-11 01:17:13 UTC - Eugen: @Greg Methvin what are the kafka use cases for you? ----
2020-03-11 01:17:50 UTC - Greg Methvin: primarily data ingestion ----
2020-03-11 01:19:47 UTC - Greg Methvin: most of our other use cases are work queues: sending emails, sms, pushes, etc. ----
2020-03-11 01:20:30 UTC - Roman Popenov: RabbitMQ is a nightmare with failed and re-delivered messages ----
2020-03-11 01:20:53 UTC - Greg Methvin: rabbitmq is a nightmare for many reasons… ----
2020-03-11 01:21:17 UTC - Roman Popenov: What about RocketMQ as an alternative to RabbitMQ? ----
2020-03-11 01:22:20 UTC - Eugen: We have both use cases, and Pulsar seems a better fit overall for me personally, for a number of reasons, including multi-tenancy, no need to rebalance partitions when scaling out, etc., and of course fsync! ----
2020-03-11 01:22:41 UTC - Greg Methvin: we did look into rocketmq, though I don't recall exactly why we decided not to go with it, but definitely pulsar supporting both streaming and queuing was a big plus ----
2020-03-11 01:23:21 UTC - Greg Methvin: there are also a lot of use cases we probably could migrate to a streaming model that we're currently using rabbitmq for ----
2020-03-11 01:23:52 UTC - Greg Methvin: in the sense that we can guarantee ordering ----
2020-03-11 01:25:13 UTC - Greg Methvin: oh, we also like pulsar because it supports a large number of topics +1 : Eugen heavy_plus_sign : Roman Popenov ----
2020-03-11 01:25:22 UTC - Greg Methvin: seems like rocketmq doesn't do that well ----
2020-03-11 01:25:40 UTC - Greg Methvin: we're a b2b app and having isolation between customers is very useful ----
2020-03-11 01:25:46 UTC - Roman Popenov: There is also the issue with message size; RabbitMQ can handle VERY large files ----
2020-03-11 01:26:12 UTC - Roman Popenov: I think it will be a big plus when support for chunking is implemented ----
2020-03-11 01:26:18 UTC - Greg Methvin: that's not a huge issue for us at the moment ----
2020-03-11 01:26:37 UTC - Greg Methvin: our average message size is less than 1k ----
2020-03-11 01:28:00 UTC - Rajan Dhabalia: @Roman Popenov what's your use case with large message size? ----
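For reference on the batching workaround Greg describes above (see <https://github.com/apache/pulsar/issues/5969>), a minimal work-queue-style sketch with producer batching disabled so negative acknowledgements redeliver single messages; it assumes the Node.js `pulsar-client` package, and the `batchingEnabled` and `negativeAcknowledge` names should be verified against the client version in use:

```javascript
// Sketch only: producer batching disabled so nacks and backlog counts apply to
// individual messages rather than whole batches. `batchingEnabled` and
// `negativeAcknowledge` are assumptions; check your pulsar-client version.
const Pulsar = require('pulsar-client');

(async () => {
  const client = new Pulsar.Client({ serviceUrl: 'pulsar://localhost:6650' });

  // Work-queue-style consumer: Shared subscription, created before producing.
  const consumer = await client.subscribe({
    topic: 'persistent://public/default/email-jobs',
    subscription: 'email-workers',
    subscriptionType: 'Shared',
  });

  // Batching off: each send() becomes its own entry on the topic.
  const producer = await client.createProducer({
    topic: 'persistent://public/default/email-jobs',
    batchingEnabled: false, // assumed flag mirroring the C++ ProducerConfiguration
  });
  await producer.send({ data: Buffer.from(JSON.stringify({ campaignId: 42 })) });

  const msg = await consumer.receive();
  try {
    // ... process the job ...
    consumer.acknowledge(msg);
  } catch (err) {
    consumer.negativeAcknowledge(msg); // message is redelivered after the nack delay
  }

  await consumer.close();
  await producer.close();
  await client.close();
})();
```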
2020-03-11 01:28:30 UTC - Roman Popenov: Processing email attachments and analysis of the data ----
2020-03-11 01:28:47 UTC - Rajan Dhabalia: We have a PR created to support this feature... checking if that can solve your use case ----
2020-03-11 01:29:35 UTC - Roman Popenov: PIP 37? ----
2020-03-11 01:29:43 UTC - Rajan Dhabalia: Yes ----
2020-03-11 01:30:54 UTC - Roman Popenov: Is there an approximate release when this might see the light of day? ----
2020-03-11 01:32:06 UTC - Rajan Dhabalia: It's been there for a while and we can try to include it in the next release ----
2020-03-11 01:35:04 UTC - Greg Methvin: I also think having many topics would be more useful if there were a way to do a regex subscription without it counting as being "subscribed" to the topic, so the topic could still get automatically deleted after some inactivity. ----
2020-03-11 01:35:27 UTC - Greg Methvin: I already mentioned this to @Sijie Guo and I think there's an issue for it. ----
2020-03-11 01:37:33 UTC - Greg Methvin: essentially we want it so we can have one-time-use queues that have their own rate limit and not have to worry about deleting them manually ----
2020-03-11 01:39:30 UTC - Sijie Guo: @Greg Methvin yeah. I think I created an issue for that. I think it was resolved (but I need to double-check). +1 : Greg Methvin ----
2020-03-11 06:35:48 UTC - Prashant Shandilya: Got a query regarding the I/O connector:
_Does the Cassandra I/O connector support all datatypes, including blob?_ Please share a configuration example; that would help. ----
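On the configuration-example part of the question: the sketch below follows the general shape of the Pulsar IO quickstart for the built-in Cassandra sink, with placeholder values; it does not answer the blob question, so the supported datatypes should be confirmed against the connector docs.

```yaml
# cassandra-sink.yml - placeholder values, adjust for your cluster
configs:
  roots: "localhost:9042"
  keyspace: "pulsar_test_keyspace"
  columnFamily: "pulsar_test_table"
  keyname: "key"
  columnName: "col"
```

```bash
# Create the sink from the config file and attach it to an input topic
bin/pulsar-admin sinks create \
  --tenant public \
  --namespace default \
  --name cassandra-test-sink \
  --sink-type cassandra \
  --sink-config-file cassandra-sink.yml \
  --inputs test_cassandra
```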