Yuri,

Thank you for bringing this up! This is a super helpful proposal!

The problem is very similar to what an RPC framework (like Finagle) with
client-side load balancing has.

An RPC framework with a client-side load-balancing mechanism needs to send
requests across multiple nodes. If you have an RPC service that has
thousands of nodes, there are thousands of clients connecting to that RPC
service. Reducing the number of connections and load balancing requests
effectively across thousands of nodes are the problems that a client-side
load-balancing technology needs to solve. If you think about "partition" as "node"
and "partitioned producer" as "RPC client", the problem is exactly the
same. Finagle (the Twitter RPC framework) has implemented a lot of client-side
load-balancing algorithms
<https://twitter.github.io/finagle/guide/Clients.html#load-balancing> and
there are some great articles that you can reference
<https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.html>
.

I agree with the direction of introducing a mechanism to reduce the number
of producers in a partitioned topic producer. However, I have a concern
about introducing `.numPartitionsLimit(10)` directly to the producer
builder. It limits the ability to implement different algorithms for
selecting partitions.

So instead of directly implementing the logic within the partitioned topic
producer, I think the proposal can be broken into two parts:

1) Introduce some kind of lazy-loading mechanism in the partitioned
producer to initialize the producers for partitions lazily. I.e., only
initialize a producer when the message router selects a partition.
2) Implement a message router that only selects one or N partitions.

In this way, the partitioned producer is only responsible for managing a
collection of producers, and the message router is responsible for
selecting the partitions. This lets people implement different message
routers. We can even adopt the client-side load-balancing algorithms from
Finagle.
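
To make the split concrete, here is a rough sketch of how the two parts could
fit together. It is not the Pulsar client API: `PartitionRouter`,
`LimitedRouter`, and `LazyPartitionedProducer` are hypothetical names, the
producer type is a generic placeholder, and routing by key hash over a fixed
window of partitions is only one possible selection algorithm.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical router interface: picks a partition index for a message key.
interface PartitionRouter {
    int choosePartition(String key, int numPartitions);
}

// Part 2: a router that only selects partitions inside a window of at most
// `limit` partitions, starting at `start` and wrapping around.
final class LimitedRouter implements PartitionRouter {
    private final int start;
    private final int limit;

    LimitedRouter(int start, int limit) {
        if (start < 0 || limit < 1) {
            throw new IllegalArgumentException("start must be >= 0 and limit >= 1");
        }
        this.start = start;
        this.limit = limit;
    }

    @Override
    public int choosePartition(String key, int numPartitions) {
        // Never use a window larger than the topic itself.
        int window = Math.min(limit, numPartitions);
        // floorMod keeps the slot non-negative even for negative hash codes.
        int slot = Math.floorMod(key.hashCode(), window);
        return (start + slot) % numPartitions;
    }
}

// Part 1: the partitioned producer only manages a collection of
// per-partition producers and creates each one on first use.
final class LazyPartitionedProducer<P> {
    private final int numPartitions;
    private final PartitionRouter router;
    private final IntFunction<P> producerFactory;
    private final Map<Integer, P> producers = new ConcurrentHashMap<>();

    LazyPartitionedProducer(int numPartitions, PartitionRouter router,
                            IntFunction<P> producerFactory) {
        this.numPartitions = numPartitions;
        this.router = router;
        this.producerFactory = producerFactory;
    }

    // Returns the producer for the partition the router selects,
    // initializing it only if that partition has not been used before.
    P producerFor(String key) {
        int partition = router.choosePartition(key, numPartitions);
        return producers.computeIfAbsent(partition, producerFactory::apply);
    }

    // Number of per-partition producers that have actually been created.
    int openProducers() {
        return producers.size();
    }
}

final class LazyProducerDemo {
    public static void main(String[] args) {
        // 100 partitions, but the router only ever selects 10 of them.
        LazyPartitionedProducer<String> producer = new LazyPartitionedProducer<>(
                100, new LimitedRouter(95, 10), i -> "producer-" + i);
        for (int i = 0; i < 1000; i++) {
            producer.producerFor("key-" + i);
        }
        System.out.println("open producers: " + producer.openProducers());
    }
}
```

With this shape, the limit lives entirely in the router, so swapping in a
different selection algorithm does not require any change to the partitioned
producer.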

Thanks,
Sijie

On Wed, Jan 27, 2021 at 7:18 PM Yuri Mizushima <yumiz...@yahoo-corp.jp>
wrote:

> I notice that PIP-78 has already been assigned to another issue.
>
> https://mail-archives.apache.org/mod_mbox/pulsar-dev/202101.mbox/%3CCAG%3DTQOrPH49v9ToDE_aeQzEiDC%2BEgSR61ERoqanpWfQGvEB_Vw%40mail.gmail.com%3E
>
> So, I'll change the PIP number to 79.
>
> https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer
>
> Regards,
> --
> Yuri Mizushima
> yumiz...@yahoo-corp.jp
>
> "Yuri Mizushima" <yumiz...@yahoo-corp.jp> wrote:
>
>     Dear Pulsar community,
>
>     When a partitioned producer connects to a partitioned topic, it
>     sometimes doesn't need to connect to all of the partitions, depending
>     on rate, routing mode, etc.
>     So, I drafted a PIP about reducing redundant producers in the
>     partitioned producer.
>     With this feature, I'd like to use system resources (e.g. connections
>     between client and broker, and memory on both client and broker)
>     more efficiently.
>
> https://github.com/apache/pulsar/wiki/PIP-78%3A-Reduce-redundant-producers-from-partitioned-producer
>
>     Feel free to send me any questions or suggestions.
>
>     Best regards,
>     --
>     Yuri Mizushima
>     yumiz...@yahoo-corp.jp
>
>
>
