Sijie,

Thank you for your reply!
I'll check it.

Regards,
-- 
Yuri Mizushima
yumiz...@yahoo-corp.jp
 

"Sijie Guo" <guosi...@gmail.com> wrote:

    Yuri,

    Thank you for bringing this up! This is a super helpful proposal!

    The problem is very similar to the one faced by an RPC framework (like
    Finagle) that does client-side load balancing.

    An RPC framework with a client-side load-balancing mechanism needs to send
    requests across multiple nodes. If an RPC service has thousands of nodes,
    there are also thousands of clients connecting to that service. Reducing
    the number of connections and effectively load balancing requests across
    thousands of nodes are the problems that client-side load balancing needs
    to solve. If you think of a "partition" as a "node" and a "partitioned
    producer" as an "RPC client", the problem is exactly the same. Finagle
    (Twitter's RPC framework) has implemented many client-side load-balancing
    algorithms <https://twitter.github.io/finagle/guide/Clients.html#load-balancing>,
    and there are some great articles you can refer to:
    <https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.html>.
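    To make the "partition as node" analogy concrete, below is a simplified
    sketch of the subsetting idea behind the aperture article: each client
    (here, each partitioned producer) talks to only a small window of the
    backends (here, partitions) instead of all of them. The
    `PartitionSubset.select` helper and its hash-based offset are hypothetical
    and chosen only for illustration; this is not Finagle's actual
    deterministic aperture algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating "subsetting": each producer uses a small
// window of partitions instead of all of them. Simplified sketch only; this
// is NOT Finagle's deterministic aperture algorithm.
final class PartitionSubset {
    private PartitionSubset() {}

    // Returns up to apertureSize partition indexes out of numPartitions,
    // starting at an offset derived from the producer name and wrapping
    // around, so different producers tend to use different windows.
    static List<Integer> select(String producerName, int numPartitions, int apertureSize) {
        if (numPartitions <= 0 || apertureSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        int size = Math.min(apertureSize, numPartitions);
        // floorMod keeps the offset non-negative even for negative hash codes.
        int start = Math.floorMod(producerName.hashCode(), numPartitions);
        List<Integer> window = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            window.add((start + i) % numPartitions);
        }
        return window;
    }
}
```

    A producer would then only need connections for the partitions in its
    window, while producers with different names spread across the topic.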

    I agree with the direction of introducing a mechanism to reduce the number
    of producers in a partitioned topic producer. However, I have a concern
    about adding `.numPartitionsLimit(10)` directly to the producer builder:
    it limits the possibility of implementing different algorithms for
    selecting partitions.

    So instead of directly implementing the logic within the partitioned topic
    producer, I think the proposal can be broken into two parts:

    1) Introduce some kind of lazy-loading mechanism in the partitioned
    producer that initializes the per-partition producers lazily, i.e., only
    initialize a partition's producer when the message router selects that
    partition.
    2) Implement a message router that selects only one partition or N
    partitions.
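
    A minimal sketch of item 1, assuming hypothetical `PartitionSender` and
    `PartitionChooser` interfaces as stand-ins for the per-partition producer
    and the message router (they are not Pulsar client APIs). The producer for
    a partition is created the first time the router picks it, so partitions
    that are never selected never get a producer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical stand-in for a per-partition producer; not a Pulsar API.
interface PartitionSender {
    void send(String message);
}

// Hypothetical routing hook: returns the partition index for a message.
interface PartitionChooser {
    int choose(String message, int numPartitions);
}

// Sketch of a partitioned producer that only manages a collection of
// per-partition producers and creates each one lazily on first use.
final class LazyPartitionedProducer {
    private final int numPartitions;
    private final PartitionChooser chooser;
    private final IntFunction<PartitionSender> factory;
    private final Map<Integer, PartitionSender> producers = new ConcurrentHashMap<>();

    LazyPartitionedProducer(int numPartitions, PartitionChooser chooser,
                            IntFunction<PartitionSender> factory) {
        this.numPartitions = numPartitions;
        this.chooser = chooser;
        this.factory = factory;
    }

    void send(String message) {
        int partition = chooser.choose(message, numPartitions);
        if (partition < 0 || partition >= numPartitions) {
            throw new IllegalStateException("router returned invalid partition " + partition);
        }
        // computeIfAbsent creates the producer for this partition at most once.
        producers.computeIfAbsent(partition, p -> factory.apply(p)).send(message);
    }

    // Number of per-partition producers created so far.
    int initializedProducers() {
        return producers.size();
    }
}
```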

    In this way, the partitioned producer is only responsible for managing a
    collection of producers, and the message router is responsible for
    selecting the partitions. This allows people to implement different
    message routers. We could even adopt the client-side load-balancing
    algorithms from Finagle.
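
    One possible shape for item 2, sketched with a hypothetical
    `PartitionRouter` interface rather than Pulsar's actual `MessageRouter`
    signature: a router that round-robins over a fixed window of N partitions
    starting at a given offset. The class name and constructor parameters are
    assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router interface; not Pulsar's MessageRouter API.
interface PartitionRouter {
    int choosePartition(String message, int numPartitions);
}

// Round-robins over at most `limit` consecutive partitions starting at
// `offset`, so at most `limit` per-partition producers are ever needed.
final class LimitedRoundRobinRouter implements PartitionRouter {
    private final int limit;
    private final int offset;
    private final AtomicInteger counter = new AtomicInteger();

    LimitedRoundRobinRouter(int limit, int offset) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
        this.offset = offset;
    }

    @Override
    public int choosePartition(String message, int numPartitions) {
        int n = Math.min(limit, numPartitions);
        // floorMod keeps the slot in [0, n) even after the counter overflows.
        int slot = Math.floorMod(counter.getAndIncrement(), n);
        // Wrap around the end of the partition list.
        return Math.floorMod(offset + slot, numPartitions);
    }
}
```

    Because the selection logic lives entirely in the router, replacing it
    with another algorithm (for example, one adapted from Finagle's balancers)
    would not require any change to how the partitioned producer manages its
    producers.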

    Thanks,
    Sijie

    On Wed, Jan 27, 2021 at 7:18 PM Yuri Mizushima <yumiz...@yahoo-corp.jp>
    wrote:

    > I noticed that PIP-78 has already been assigned to another issue:
    >
    > https://mail-archives.apache.org/mod_mbox/pulsar-dev/202101.mbox/%3CCAG%3DTQOrPH49v9ToDE_aeQzEiDC%2BEgSR61ERoqanpWfQGvEB_Vw%40mail.gmail.com%3E
    >
    > So, I'll change the PIP number to 79:
    >
    > https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer
    >
    > Regards,
    > --
    > Yuri Mizushima
    > yumiz...@yahoo-corp.jp
    >
    > "Yuri Mizushima" <yumiz...@yahoo-corp.jp> wrote:
    >
    >     Dear Pulsar community,
    >
    >     When a partitioned producer connects to a partitioned topic, it
    >     sometimes doesn't need to connect to all of the partitions, depending
    >     on the rate, routing mode, etc.
    >     So, I drafted a PIP about reducing redundant producers in the
    >     partitioned producer.
    >     With this feature, I'd like to use system resources (e.g. connections
    >     between client and broker, and memory usage on both client and broker)
    >     more efficiently.
    >
    >     https://github.com/apache/pulsar/wiki/PIP-78%3A-Reduce-redundant-producers-from-partitioned-producer
    >
    >     Feel free to send me any questions, suggestions, etc.
    >
    >     Best regards,
    >     --
    >     Yuri Mizushima
    >     yumiz...@yahoo-corp.jp
    >
    >
    >
