Hi Yuri,

In today's community meeting, Matteo shared some of his thoughts about this PIP.
You can find some meeting notes here: https://docs.google.com/document/d/19dXkVXeU2q_nHmkG8zURjKnYlvD96TbKf5KjYyASsOE/edit#bookmark=id.rezbt4xmjxpz
Matteo may also chime in.

- Sijie

On Sun, Jan 31, 2021 at 7:21 PM Yuri Mizushima <yumiz...@yahoo-corp.jp> wrote:
> Sijie,
>
> Thank you for your reply!
> I'll check it.
>
> Regards,
> --
> Yuri Mizushima
> yumiz...@yahoo-corp.jp
>
>
> "Sijie Guo" <guosi...@gmail.com> wrote:
>
> > Yuri,
> >
> > Thank you for bringing this up! This is a super helpful proposal!
> >
> > The problem is very similar to the one faced by an RPC framework (like
> > Finagle) with client-side load balancing.
> >
> > An RPC framework with a client-side load-balancing mechanism needs to send
> > requests across multiple nodes. If you have an RPC service that has
> > thousands of nodes, there are thousands of clients connecting to that RPC
> > service. How to reduce the number of connections and how to effectively
> > load-balance requests across thousands of nodes are the problems that a
> > client-side load-balancing technology needs to solve. If you think of a
> > "partition" as a "node" and a "partitioned producer" as an "RPC client",
> > the problem is exactly the same. Finagle (the Twitter RPC framework) has
> > implemented a lot of client-side load-balancing algorithms
> > <https://twitter.github.io/finagle/guide/Clients.html#load-balancing>, and
> > there are some great articles that you can reference
> > <https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.html>.
> >
> > I agree with the direction of introducing a mechanism to reduce the number
> > of producers in a partitioned topic producer. However, I have a concern
> > about introducing `.numPartitionsLimit(10)` directly to the producer
> > builder. It limits the possibility of implementing different algorithms
> > for selecting partitions.
> >
> > So instead of directly implementing the logic within the partitioned topic
> > producer, I think the proposal can be broken into two parts:
> >
> > 1) Introduce some kind of lazy-loading mechanism in the partitioned
> >    producer to initialize the producers for partitions lazily, i.e., only
> >    initialize a producer when the message router selects a partition.
> > 2) Implement a message router that only selects one or N partitions.
> >
> > In this way, the partitioned producer is only responsible for managing a
> > collection of producers, and the message router is responsible for
> > selecting the partitions. This allows people to implement different
> > message routers. We can even adopt the client-side load-balancing
> > algorithms from Finagle.
> >
> > Thanks,
> > Sijie
> >
> > On Wed, Jan 27, 2021 at 7:18 PM Yuri Mizushima <yumiz...@yahoo-corp.jp> wrote:
> >
> > > I noticed that PIP-78 has already been assigned to another issue.
> > >
> > > https://mail-archives.apache.org/mod_mbox/pulsar-dev/202101.mbox/%3CCAG%3DTQOrPH49v9ToDE_aeQzEiDC%2BEgSR61ERoqanpWfQGvEB_Vw%40mail.gmail.com%3E
> > >
> > > So, I'll change the PIP number to 79.
> > >
> > > https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer
> > >
> > > Regards,
> > > --
> > > Yuri Mizushima
> > > yumiz...@yahoo-corp.jp
> > >
> > > "Yuri Mizushima" <yumiz...@yahoo-corp.jp> wrote:
> > >
> > > > Dear Pulsar community,
> > > >
> > > > When a partitioned producer connects to a partitioned topic, it
> > > > sometimes does not need to connect to all of the partitions, depending
> > > > on the message rate, routing mode, etc.
> > > > So, I drafted a PIP about reducing redundant producers in the
> > > > partitioned producer.
> > > > With this feature, I'd like to use system resources (e.g. connections
> > > > between the client and broker, and memory on both the client and
> > > > broker) more efficiently.
> > > >
> > > > https://github.com/apache/pulsar/wiki/PIP-78%3A-Reduce-redundant-producers-from-partitioned-producer
> > > >
> > > > Feel free to send me any questions or suggestions.
> > > >
> > > > Best regards,
> > > > --
> > > > Yuri Mizushima
> > > > yumiz...@yahoo-corp.jp
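Sijie's two-part suggestion (lazily create per-partition producers, and let a message router decide which partitions are used) can be sketched in plain Java. This is a minimal, self-contained sketch under stated assumptions, not Pulsar code: `PartitionRouter`, `SubsetRoundRobinRouter`, `PartitionProducer`, and `LazyPartitionedProducer` are hypothetical names made up for illustration. The real Pulsar client does expose a pluggable `MessageRouter` interface, which `PartitionRouter` only loosely mirrors. The hypothetical router confines one client to a window of N partitions starting at an offset, and the partitioned producer creates a per-partition producer only the first time the router selects that partition, so partitions outside the window never get a producer.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Demo entry point; declared first so `java LazyProducerSketch.java` finds main().
class LazyProducerSketch {
    public static void main(String[] args) {
        // Hypothetical 100-partition topic; this client routes to only 3 partitions starting at 40.
        LazyPartitionedProducer producer =
                new LazyPartitionedProducer(100, new SubsetRoundRobinRouter(40, 3));
        for (int i = 0; i < 1_000; i++) {
            producer.send("key-" + i, "value-" + i);
        }
        // Only the partitions the router actually picked got a producer.
        System.out.println(producer.activePartitions()); // prints [40, 41, 42]
    }
}

// Hypothetical routing interface (not Pulsar's API): returns a partition index.
interface PartitionRouter {
    int choosePartition(String key, int numPartitions);
}

// Part 2 (hypothetical): round-robin over a window of `subsetSize` partitions
// beginning at `offset`, wrapping around the end of the partition range.
final class SubsetRoundRobinRouter implements PartitionRouter {
    private final int offset;
    private final int subsetSize;
    private final AtomicInteger counter = new AtomicInteger();

    SubsetRoundRobinRouter(int offset, int subsetSize) {
        if (subsetSize <= 0) {
            throw new IllegalArgumentException("subsetSize must be positive");
        }
        this.offset = offset;
        this.subsetSize = subsetSize;
    }

    @Override
    public int choosePartition(String key, int numPartitions) {
        // Never use more partitions than the topic actually has.
        int window = Math.min(subsetSize, numPartitions);
        int step = Math.floorMod(counter.getAndIncrement(), window);
        return Math.floorMod(offset + step, numPartitions);
    }
}

// Hypothetical stand-in for a single-partition producer; it only counts sends.
final class PartitionProducer {
    private final int partition;
    private final AtomicInteger sent = new AtomicInteger();

    PartitionProducer(int partition) {
        this.partition = partition;
    }

    void send(String value) {
        sent.incrementAndGet(); // a real producer would publish `value` to `partition` here
    }

    int sentCount() {
        return sent.get();
    }
}

// Part 1 (hypothetical): manages per-partition producers and creates each one
// lazily, the first time the router selects its partition.
final class LazyPartitionedProducer {
    private final int numPartitions;
    private final PartitionRouter router;
    private final Map<Integer, PartitionProducer> producers = new ConcurrentHashMap<>();

    LazyPartitionedProducer(int numPartitions, PartitionRouter router) {
        this.numPartitions = numPartitions;
        this.router = router;
    }

    void send(String key, String value) {
        int partition = router.choosePartition(key, numPartitions);
        // computeIfAbsent builds the producer only on first use of this partition.
        producers.computeIfAbsent(partition, PartitionProducer::new).send(value);
    }

    Set<Integer> activePartitions() {
        return new TreeSet<>(producers.keySet());
    }

    int sentTo(int partition) {
        PartitionProducer p = producers.get(partition);
        return p == null ? 0 : p.sentCount();
    }
}
```

The split follows the division of responsibilities described in the thread: the partitioned producer only manages a collection of producers, while the router alone decides how many and which partitions are used. How the offset is chosen is where a real algorithm would go, for example picking it randomly per client or adopting one of the Finagle load-balancing approaches mentioned above. The fixed offset here only keeps the output deterministic.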