Hi Yuri and Sijie,

I definitely like the idea of lazily creating producers as well as introducing 
a way to provide custom routing logic.

Which Pulsar clients will benefit from this proposal? I’d love to see this 
feature in the Go client.

Thanks,
Michael Marshall

> On Feb 7, 2021, at 9:53 PM, Yuri Mizushima <yumiz...@yahoo-corp.jp> wrote:
> 
> Sijie,
> 
> Thank you for sharing!
> 
> First, I considered your suggestion.
> I think these implementations sound good.
> 
> I think we should consider the state of the partitioned producer: Ready, 
> Connecting, etc.
> Currently, the partitioned producer becomes "Ready" only when all of its 
> producers have connected to the broker successfully.
> https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/PartitionedProducerImpl.java#L146
> It seems that we should change the meaning of the state (or how it is 
> handled) if we introduce the lazy-loading feature.
> To guarantee message ordering (e.g. when using partitionKey), the 
> partitioned producer should stop (or at least not send messages that would 
> be routed to an unavailable partition) when it can't connect to one of the 
> partitions.
> 
> Secondly, I considered Matteo's comments.
> I couldn't fully understand the authn/authz issue. Could you explain it in 
> more detail?
> 
> By "connection" I meant the number of producers that connect to the broker. 
> Also, the number of TCP connections between the partitioned producer and the 
> brokers will be less than or equal to the current number in some cases. I'll 
> show a case below.
> 
> Suppose
> * the cluster has Broker0, 1, 2
> * the partitioned topic has 5 partitions
> * the limit conf is 3 partitions
> * partitions are load-balanced as below
>   - Broker0: partition-0, partition-1
>   - Broker1: partition-2
>   - Broker2: partition-3, partition-4
> 
> Currently, the client will create 3 connections (Broker0, 1, 2). If the 
> client uses the limit conf and selects partitions such as [0, 1, 2], then it 
> will create 2 connections (Broker0, 1). Of course, if the client selects 
> partitions such as [0, 2, 3], then it will still create 3 connections.
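
A quick way to check these numbers: the sketch below counts the distinct
brokers that own the selected partitions. The class name, the OWNER map and
the assumption of one TCP connection per broker are only for illustration;
this is not Pulsar client code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ConnectionCount {
    // Hypothetical partition-to-broker assignment, copied from the example above.
    static final Map<Integer, String> OWNER = Map.of(
            0, "Broker0", 1, "Broker0",
            2, "Broker1",
            3, "Broker2", 4, "Broker2");

    // Counts the distinct brokers owning the given partitions
    // (assumption: one TCP connection per broker).
    static int connections(List<Integer> partitions) {
        Set<String> brokers = new HashSet<>();
        for (int partition : partitions) {
            brokers.add(OWNER.get(partition));
        }
        return brokers.size();
    }

    public static void main(String[] args) {
        System.out.println(connections(List.of(0, 1, 2, 3, 4))); // 3 (no limit)
        System.out.println(connections(List.of(0, 1, 2)));       // 2
        System.out.println(connections(List.of(0, 2, 3)));       // 3
    }
}
```

So whether the limit saves TCP connections depends on which partitions are
selected, while the number of producers drops from 5 to 3 in both cases.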
> 
> I'd like to decrease the number of producers. I think this feature will 
> slightly reduce broker resource usage, because the broker keeps a list of 
> producers in classes such as ServerCnx and AbstractTopic.
> https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1096-L1097
> https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractTopic.java#L577
> 
> In my case, an unspecified number of producers connect to the same 
> partitioned topic at different rates. We need to set the number of 
> partitions according to the highest-rate producer.
> On the other hand, this number is excessively large for low-rate 
> producers.
> I want to reduce such redundant producers for resource efficiency.
> 
> Regards,
> -- 
> Yuri Mizushima
> yumiz...@yahoo-corp.jp
> 
> 
> "Sijie Guo" <guosi...@gmail.com> wrote:
> 
>  Hi Yuri,
> 
>  In today's community meeting, Matteo shared some of his thoughts about this
>  PIP.
> 
>  You can find some meeting notes here:
>  
> https://docs.google.com/document/d/19dXkVXeU2q_nHmkG8zURjKnYlvD96TbKf5KjYyASsOE/edit#bookmark=id.rezbt4xmjxpz
> 
>  Matteo can chime in as well.
> 
>  - Sijie
> 
>>  On Sun, Jan 31, 2021 at 7:21 PM Yuri Mizushima <yumiz...@yahoo-corp.jp>
>>  wrote:
>> Sijie,
>> Thank you for your reply!
>> I'll check it.
>> Regards,
>> --
>> Yuri Mizushima
>> yumiz...@yahoo-corp.jp
>> "Sijie Guo" <guosi...@gmail.com> wrote:
>>  Yuri,
>>  Thank you for bringing this up! This is a super helpful proposal!
>>  The problem is very similar to what an RPC framework (like Finagle) with
>>  client-side load balancing has.
>>  An RPC framework with a client-side load-balancing mechanism needs to send
>>  requests across multiple nodes. If you have an RPC service that has
>>  thousands of nodes, there are thousands of clients connecting to that RPC
>>  service. How to reduce the connections and how to effectively load balance
>>  requests across thousands of nodes are the problems that a client-side
>>  load-balancing technology needs to solve. If you think about "partition" as
>>  "node" and "partitioned producer" as "RPC client", the problem is exactly
>>  the same. Finagle (the Twitter RPC framework) has implemented a lot of
>>  client-side load-balancing algorithms
>>  <https://twitter.github.io/finagle/guide/Clients.html#load-balancing> and
>>  there are some great articles that you can reference
>>  <https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.html>.
>>  I agree with the direction of introducing a mechanism to reduce the number
>>  of producers in a partitioned topic producer. However, I have a concern
>>  about introducing `.numPartitionsLimit(10)` directly to the producer
>>  builder. It limits the possibility to implement different algorithms on
>>  selecting partitions.
>>  So instead of directly implementing the logic within the partitioned topic
>>  producer, I think the proposal can be broken into two parts:
>>  1) Introduce some kind of lazy-loading mechanism in the partitioned
>>  producer to initialize the producers for partitions lazily. I.e., only
>>  initialize a producer when the message router selects a partition.
>>  2) Implement a message router that only selects one or N partitions.
>>  In this way, the partitioned producer is only responsible for managing a
>>  collection of producers, and the message router is responsible for
>>  selecting the partitions. This allows people to be able to implement
>>  different message routers. We can even adopt the client-side load balancing
>>  algorithms from Finagle.
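
To make the two parts concrete, here is a minimal sketch in plain Java. It is
not the Pulsar client implementation: Router, LimitedRouter and
LazyPartitionedProducer are hypothetical names, and the producer type P and
its factory stand in for whatever the real client would create per partition.
It only shows the split of responsibilities: the router picks a partition, and
the partitioned producer creates the per-partition producer on first use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical routing interface (a stand-in, not Pulsar's MessageRouter).
interface Router {
    int choosePartition(String key, int numPartitions);
}

// Part 2: a router that only ever selects the first `limit` partitions.
final class LimitedRouter implements Router {
    private final int limit;

    LimitedRouter(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    @Override
    public int choosePartition(String key, int numPartitions) {
        int candidates = Math.min(limit, numPartitions);
        // The same key always maps to the same partition, so per-key ordering holds.
        return Math.floorMod(key.hashCode(), candidates);
    }
}

// Part 1: the partitioned producer only manages a collection of producers
// and creates each one lazily, when the router first selects its partition.
final class LazyPartitionedProducer<P> {
    private final int numPartitions;
    private final Router router;
    private final Function<Integer, P> factory; // hypothetical per-partition factory
    private final Map<Integer, P> producers = new ConcurrentHashMap<>();

    LazyPartitionedProducer(int numPartitions, Router router, Function<Integer, P> factory) {
        this.numPartitions = numPartitions;
        this.router = router;
        this.factory = factory;
    }

    P producerFor(String key) {
        int partition = router.choosePartition(key, numPartitions);
        return producers.computeIfAbsent(partition, factory);
    }

    int createdProducers() {
        return producers.size();
    }
}

class LazyDemo {
    public static void main(String[] args) {
        LazyPartitionedProducer<String> producer =
                new LazyPartitionedProducer<>(5, new LimitedRouter(3), p -> "producer-" + p);
        for (int i = 0; i < 100; i++) {
            producer.producerFor("key-" + i);
        }
        // At most 3 of the 5 per-partition producers are ever created.
        System.out.println(producer.createdProducers());
    }
}
```

Swapping LimitedRouter for another Router changes the selection algorithm
without touching the lazy-creation logic, which is the point of the split. The
sketch leaves out connection state and failure handling.
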
>>  Thanks,
>>  Sijie
>>  On Wed, Jan 27, 2021 at 7:18 PM Yuri Mizushima <yumiz...@yahoo-corp.jp>
>>  wrote:
>>> I noticed that PIP-78 has already been assigned to another issue.
>> https://mail-archives.apache.org/mod_mbox/pulsar-dev/202101.mbox/%3CCAG%3DTQOrPH49v9ToDE_aeQzEiDC%2BEgSR61ERoqanpWfQGvEB_Vw%40mail.gmail.com%3E
>>> So, I'll change the PIP number to 79.
>> https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer
>>> Regards,
>>> --
>>> Yuri Mizushima
>>> yumiz...@yahoo-corp.jp
>>> "Yuri Mizushima" <yumiz...@yahoo-corp.jp> wrote:
>>>  Dear Pulsar community,
>>>  When a partitioned producer connects to a partitioned topic, it
>>>  sometimes doesn't need to connect to all of the partitions, depending on
>>>  rate, routing mode, etc.
>>>  So, I drafted a PIP about reducing redundant producers from the
>>>  partitioned producer.
>>>  I'd like to use system resources (e.g. connections between Client and
>>>  Broker, memory usage of both Client and Broker)
>>>  more efficiently with this feature.
>> https://github.com/apache/pulsar/wiki/PIP-78%3A-Reduce-redundant-producers-from-partitioned-producer
>>>  Feel free to send me any questions or suggestions.
>>>  Best regards,
>>>  --
>>>  Yuri Mizushima
>>>  yumiz...@yahoo-corp.jp
