Hi Yuri and Sijie,

I definitely like the idea of lazily creating producers as well as introducing a way to provide custom routing logic.
Which Pulsar clients will benefit from this proposal? I'd love to see this feature in the Go client.

Thanks,
Michael Marshall

> On Feb 7, 2021, at 9:53 PM, Yuri Mizushima <yumiz...@yahoo-corp.jp> wrote:
>
> Sijie,
>
> Thank you for sharing!
>
> First, I considered your suggestion.
> I think these implementations sound good.
>
> I think we should consider the state of the partitioned producer: Ready,
> Connecting, etc.
> Currently, the partitioned producer becomes "Ready" only when all producers
> connect to the broker correctly.
> https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/PartitionedProducerImpl.java#L146
> It seems that we should change the meaning of the state (or change how it is
> handled) if we introduce the lazy-load feature.
> To guarantee message ordering (e.g. when using partitionKey), the partitioned
> producer should stop (or not send messages that would be routed to an
> unavailable partition) when a producer can't connect to one of the partitions.
>
> Secondly, I considered Matteo's comments.
> I couldn't understand the authn/authz issue well. Please tell me more
> details.
>
> I wrote "connection" to mean the number of producers that connect to the
> broker. Also, the number of TCP connections between the partitioned producer
> and the brokers will be less than or equal to the current number in some
> cases. I'll show a case below.
>
> Suppose
> * cluster has Broker0, 1, 2
> * partitioned topic has 5 partitions
> * limit conf is 3 partitions
> * partitions are load-balanced as below
>   - Broker0: partition-0, partition-1
>   - Broker1: partition-2
>   - Broker2: partition-3, partition-4
>
> Currently, the client will create 3 connections (Broker0, 1, 2). If the client
> uses the limit conf and elects partitions such as [0, 1, 2], then it will
> create 2 connections (Broker0, 1). Of course, if the client elects partitions
> such as [0, 2, 3], then it will still create 3 connections.
>
> I'd like to decrease the number of producers.
> I think that broker resources will be improved slightly by this feature
> because the broker keeps lists of producers in classes such as ServerCnx and
> AbstractTopic.
> https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1096-L1097
> https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractTopic.java#L577
>
> In my case, an unspecified number of producers will connect to the same
> partitioned topic at different rates. We need to set the number of
> partitions according to the high-rate producers.
> On the other hand, this number is excessively large for low-rate
> producers.
> I want to reduce such redundant producers for resource efficiency.
>
> Regards,
> --
> Yuri Mizushima
> yumiz...@yahoo-corp.jp
>
>
> "Sijie Guo" <guosi...@gmail.com> wrote:
>
> Hi Yuri,
>
> In today's community meeting, Matteo shared some of his thoughts about this
> PIP.
>
> You can find some meeting notes here:
>
> https://docs.google.com/document/d/19dXkVXeU2q_nHmkG8zURjKnYlvD96TbKf5KjYyASsOE/edit#bookmark=id.rezbt4xmjxpz
>
> Matteo can chime in as well.
>
> - Sijie
>
>> On Sun, Jan 31, 2021 at 7:21 PM Yuri Mizushima <yumiz...@yahoo-corp.jp>
>> wrote:
>> Sijie,
>> Thank you for your reply!
>> I'll check it.
>> Regards,
>> --
>> Yuri Mizushima
>> yumiz...@yahoo-corp.jp
>>
>> "Sijie Guo" <guosi...@gmail.com> wrote:
>> Yuri,
>> Thank you for bringing this up! This is a super helpful proposal!
>> The problem is very similar to what an RPC framework (like Finagle) with
>> client-side load balancing has.
>> An RPC framework with a client-side load-balancing mechanism needs to send
>> requests across multiple nodes. If you have an RPC service that has
>> thousands of nodes, there are thousands of clients connecting to that RPC
>> service.
>> How to reduce the connections and how to effectively load-balance
>> requests across thousands of nodes are the problems that a client-side
>> load-balancing technology needs to solve. If you think about "partition" as
>> "node" and "partitioned producer" as "RPC client", the problem is exactly
>> the same. Finagle (the Twitter RPC framework) has implemented a lot of
>> client-side load-balancing algorithms
>> <https://twitter.github.io/finagle/guide/Clients.html#load-balancing>
>> and there are some great articles that you can reference
>> <https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.html>.
>> I agree with the direction of introducing a mechanism to reduce the number
>> of producers in a partitioned topic producer. However, I have a concern
>> about introducing `.numPartitionsLimit(10)` directly to the producer
>> builder. It limits the possibility of implementing different algorithms for
>> selecting partitions.
>> So instead of directly implementing the logic within the partitioned topic
>> producer, I think the proposal can be broken into two parts:
>> 1) Introduce some kind of lazy-loading mechanism in the partitioned
>> producer to initialize the producers for partitions lazily. I.e., only
>> initialize a producer when the message router selects a partition.
>> 2) Implement a message router that only selects one or N partitions.
>> In this way, the partitioned producer is only responsible for managing a
>> collection of producers, and the message router is responsible for
>> selecting the partitions. This allows people to implement
>> different message routers. We can even adopt the client-side load-balancing
>> algorithms from Finagle.
>> Thanks,
>> Sijie
>>
>> On Wed, Jan 27, 2021 at 7:18 PM Yuri Mizushima <yumiz...@yahoo-corp.jp>
>> wrote:
>>> I noticed that PIP-78 has already been assigned to another issue.
>>> https://mail-archives.apache.org/mod_mbox/pulsar-dev/202101.mbox/%3CCAG%3DTQOrPH49v9ToDE_aeQzEiDC%2BEgSR61ERoqanpWfQGvEB_Vw%40mail.gmail.com%3E
>>> So, I'll change the PIP number to 79.
>>> https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer
>>> Regards,
>>> --
>>> Yuri Mizushima
>>> yumiz...@yahoo-corp.jp
>>>
>>> "Yuri Mizushima" <yumiz...@yahoo-corp.jp> wrote:
>>> Dear Pulsar community,
>>> When a partitioned producer connects to a partitioned topic, it
>>> sometimes doesn't need to connect to all of the partitions, depending on
>>> rate, routing mode, etc.
>>> So, I drafted a PIP about reducing redundant producers from
>>> partitioned producer.
>>> I'd like to use system resources (e.g. connections between Client and
>>> Broker, memory usage of both Client and Broker)
>>> more efficiently with this feature.
>>> https://github.com/apache/pulsar/wiki/PIP-78%3A-Reduce-redundant-producers-from-partitioned-producer
>>> Feel free to send me any questions, suggestions, etc.
>>> Best regards,
>>> --
>>> Yuri Mizushima
>>> yumiz...@yahoo-corp.jp
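
For readers following the thread, the two-part split discussed above (a partitioned producer that creates per-partition producers lazily, plus a message router that only selects a subset of partitions) can be illustrated with a small, client-neutral sketch. This is not Pulsar code and does not use any Pulsar API: the class names LimitedRoundRobinRouter and LazyPartitionedProducer, the choose_partition method, and the demo helper are all hypothetical, and a plain list stands in for a real per-partition producer. Only the partition-to-broker layout is taken from the example in the thread.

```python
from itertools import cycle

# Partition -> broker layout from the example in the thread
# (3 brokers, 5 partitions).
PARTITION_TO_BROKER = {
    0: "Broker0",
    1: "Broker0",
    2: "Broker1",
    3: "Broker2",
    4: "Broker2",
}


class LimitedRoundRobinRouter:
    """Hypothetical router: round-robins over a fixed subset of partitions."""

    def __init__(self, partitions):
        self._next = cycle(list(partitions))

    def choose_partition(self):
        return next(self._next)


class LazyPartitionedProducer:
    """Hypothetical partitioned producer that creates a per-partition
    producer only when the router first selects that partition."""

    def __init__(self, router, partition_to_broker):
        self._router = router
        self._partition_to_broker = partition_to_broker
        # partition -> list of sent messages (stand-in for a real producer)
        self.producers = {}

    def send(self, message):
        partition = self._router.choose_partition()
        # Lazy creation: the entry appears on first use of the partition.
        self.producers.setdefault(partition, []).append(message)
        return partition

    def connected_brokers(self):
        # Brokers that own at least one partition with a live producer.
        return {self._partition_to_broker[p] for p in self.producers}


def demo(partitions, messages=10):
    """Send some messages and return (producer count, broker count)."""
    producer = LazyPartitionedProducer(
        LimitedRoundRobinRouter(partitions), PARTITION_TO_BROKER
    )
    for i in range(messages):
        producer.send(f"msg-{i}")
    return len(producer.producers), len(producer.connected_brokers())
```

With this layout, demo([0, 1, 2]) yields 3 producers on 2 brokers and demo([0, 2, 3]) yields 3 producers on 3 brokers, matching the counts in the thread's example; routing over all five partitions yields 5 producers on 3 brokers. The sketch leaves out the producer state and ordering questions raised in the thread (what "Ready" means and what to do when a partition is unavailable).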