Sijie,

After sending previous mail, I watched meeting recording and understand about 
authn/authz issue.
Therefore, I updated the PIP document.
https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer
 
Regards,
-- 
Yuri Mizushima
yumiz...@yahoo-corp.jp
 

"Yuri Mizushima" <yumiz...@yahoo-corp.jp> wrote:

    Sijie,

    > If the lazy-loading approach sounds attractive to you and you like it,
    > maybe the next step is to update the PIP, what do you think?

    I think so too. I will update the PIP after discussing the authn/authz 
issue.

    Regards,
    -- 
    Yuri Mizushima
    yumiz...@yahoo-corp.jp


    "Sijie Guo" <guosi...@gmail.com> wrote:

        Hi Yuri,

        Regarding the authn/authz issue, @Matteo Merli <mme...@apache.org> can
        probably chime in more about that part.

        If the lazy-loading approach sounds attractive to you and you like it,
        maybe the next step is to update the PIP, what do you think?

        - Sijie

        On Mon, Feb 8, 2021 at 6:57 PM Yuri Mizushima <yumiz...@yahoo-corp.jp>
        wrote:

        > Michael,
        >
        > Thank you for your comment!
        >
        > > Which Pulsar Clients will benefit from this proposal?
        > I think that this proposal will be useful to any clients.
        > In my schedule, if this proposal is accepted then I will implement 
this
        > feature to Java client.
        > If needed, then implement same feature to other clients such as C++, 
Go,
        > etc.
        >
        > Regards,
        > --
        > Yuri Mizushima
        > yumiz...@yahoo-corp.jp
        >
        >
        > "Michael Marshall" <mikemars...@gmail.com> wrote:
        >
        >     Hi Yuri and Sijie,
        >
        >     I definitely like the idea of lazily creating producers as well as
        > introducing a way to provide custom routing logic.
        >
        >     Which Pulsar Clients will benefit from this proposal? I’d love to 
see
        > this feature in the go client.
        >
        >     Thanks,
        >     Michael Marshall
        >
        >     > On Feb 7, 2021, at 9:53 PM, Yuri Mizushima 
<yumiz...@yahoo-corp.jp>
        > wrote:
        >     >
        >     > Sijie,
        >     >
        >     > Thank you for sharing!
        >     >
        >     > First, I considered your suggestion.
        >     > I think these implementations sound good.
        >     >
        >     > I think we should consider the State of partitioned producer: 
Ready,
        > Connecting, etc.
        >     > Currently, partitioned producer gets "Ready" only when all 
producers
        > connect to Broker correctly.
        >     >
        > 
https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/PartitionedProducerImpl.java#L146
        >     > It seems that we should change meaning of state (or change 
handling)
        > if we introduce the lazy-load feature.
        >     > To guarantee the message ordering (e.g. using partitionKey),
        > partitioned producer should stop (or don't send messages to be routed 
to
        > unavailable partition) when producer can't connect to one of 
partition.
        >     >
        >     > Secondly, I considered Matteo's comments.
        >     > I couldn't understand well about issue of authn/authz. Please 
tell
        > me more detail.
        >     >
        >     > I wrote "connection" as number of producers which connect to 
broker.
        > Also, TCP connections between partitioned producer and broker will be 
less
        > than or equal to current in some cases. I'll show a case below.
        >     >
        >     > Suppose
        >     > * cluster has Broker0, 1, 2
        >     > * partitioned topic has 5 partitions
        >     > * limit conf is 3 partitions
        >     > * loadbalance partitions as below
        >     > - Broker0: partition-0, partition-1
        >     > - Broker1: partition-2
        >     > - Broker2: partition-3, partition-4
        >     >
        >     > Currently, client will create 3 connections (Broker0, 1, 2). If
        > client uses limit conf and elects partitions such as [0, 1, 2], then 
client
        > will create 2 connections (Broker0, 1). Of course, if client elects
        > partitions such as [0, 2, 3], then client will still create 3 
connections.
        >     >
        >     > I'd like to decrease number of producers. I think that 
resources of
        > broker will be improved slightly by this feature because broker has 
list of
        > producers by some classes such as ServerCnx, AbstractTopic.
        >     >
        > 
https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1096-L1097
        >     >
        > 
https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractTopic.java#L577
        >     >
        >     > In my case, unspecified number of producers will connect to the 
same
        > partitioned topic with different rate. We need to set the number of
        > partitions according to the high-rate producer.
        >     > However, on the other hand, this number is excessively large for
        > low-rate producers.
        >     > I want to reduce such redundant producers for resource 
efficiency.
        >     >
        >     > Regards,
        >     > --
        >     > Yuri Mizushima
        >     > yumiz...@yahoo-corp.jp
        >     >
        >     >
        >     > "Sijie Guo" <guosi...@gmail.com> wrote:
        >     >
        >     >  Hi Yuri,
        >     >
        >     >  In today's community meeting, Matteo shared some of his 
thoughts
        > about this
        >     >  PIP.
        >     >
        >     >  You can find some meeting notes here:
        >     >
        > 
https://docs.google.com/document/d/19dXkVXeU2q_nHmkG8zURjKnYlvD96TbKf5KjYyASsOE/edit#bookmark=id.rezbt4xmjxpz
        >     >
        >     >  Matteo can also chime in as well.
        >     >
        >     >  - Sijie
        >     >
        >     >>  On Sun, Jan 31, 2021 at 7:21 PM Yuri Mizushima <
        > yumiz...@yahoo-corp.jp>
        >     >>  wrote:
        >     >> Sijie,
        >     >> Thank you for your reply!
        >     >> I'll check it.
        >     >> Regards,
        >     >> --
        >     >> Yuri Mizushima
        >     >> yumiz...@yahoo-corp.jp
        >     >> "Sijie Guo" <guosi...@gmail.com> wrote:
        >     >>  Yuri,
        >     >>  Thank you for bringing this up! This is a super helpful 
proposal!
        >     >>  The problem is very similar to what an RPC framework (like 
Finagle)
        >     >> with
        >     >>  client-side load balancing has.
        >     >>  An RPC framework with a client-side load-balancing mechanism 
needs
        > to
        >     >> send
        >     >>  requests across multiple nodes. If you have an RPC service 
that has
        >     >>  thousands of nodes, there are thousands of clients connecting 
to
        > that
        >     >> RPC
        >     >>  service. How to reduce the connections and how to effectively 
load
        >     >> balance
        >     >>  requests across thousands of nodes are the problems that a
        > client-side
        >     >>  loading technology needs to solve. If you think about 
"partition"
        > as
        >     >> "node"
        >     >>  and "partitioned producer" as "RPC client", the problem is 
exactly
        > the
        >     >>  same. Finagle (the Twitter RPC framework) has implemented a 
lot of
        >     >> client-side
        >     >>  load-balancing algorithms
        >     >>  <
        > https://twitter.github.io/finagle/guide/Clients.html#load-balancing>
        >     >> and
        >     >>  there are some great articles that you can reference
        >     >>  <
        >     >>
        > 
https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.html
        >     >>  .
        >     >>  I agree with the direction of introducing a mechanism to 
reduce the
        >     >> number
        >     >>  of producers in a partitioned topic producer. However, I have 
a
        > concern
        >     >>  about introducing `.numPartitionsLimit(10)` directly to the
        > producer
        >     >>  builder. It limits the possibility to implement different
        > algorithms on
        >     >>  selecting partitions.
        >     >>  So instead of directly implementing the logic within the
        > partitioned
        >     >> topic
        >     >>  producer, I think the proposal can be broken into two parts:
        >     >>  1) Introduce some kind of lazy-loading mechanism in the 
partitioned
        >     >>  producer to initialize the producers for partitions lazily. 
I.e.,
        > only
        >     >>  initialize a producer when the message router selects a 
partition.
        >     >>  2) Implement a message router that only selects one or N
        > partitions.
        >     >>  In this way, the partitioned producer is only responsible for
        > managing
        >     >> a
        >     >>  collection of producers, and the message router is 
responsible for
        >     >>  selecting the partitions. This allows people to be able to
        > implement
        >     >>  different message routers. We can even adopt the client-side 
load
        >     >> balancing
        >     >>  algorithms from Finagle.
        >     >>  Thanks,
        >     >>  Sijie
        >     >>  On Wed, Jan 27, 2021 at 7:18 PM Yuri Mizushima <
        > yumiz...@yahoo-corp.jp
        >     >>  wrote:
        >     >>> I notice that PIP-78 has already assigned to another issue.
        >     >>
        > 
https://mail-archives.apache.org/mod_mbox/pulsar-dev/202101.mbox/%3CCAG%3DTQOrPH49v9ToDE_aeQzEiDC%2BEgSR61ERoqanpWfQGvEB_Vw%40mail.gmail.com%3E
        >     >>> So, I'll change the PIP number to 79.
        >     >>
        > 
https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer
        >     >>> Regards,
        >     >>> --
        >     >>> Yuri Mizushima
        >     >>> yumiz...@yahoo-corp.jp
        >     >>> "Yuri Mizushima" <yumiz...@yahoo-corp.jp> wrote:
        >     >>>  Dear Pulsar community,
        >     >>>  When partitioned producer connects to partitioned topic,
        >     >>>  sometimes doesn't need to connect to all of partitions 
depending
        >     >> on
        >     >>> rate, routing mode, etc.
        >     >>>  So, I drafted a PIP about reducing redundant producers from
        >     >>> partitioned producer.
        >     >>>  I'd like to use system resources (e.g. connections between
        >     >> Client and
        >     >>> Broker, memory usage of both Client and Broker)
        >     >>>  more efficiently by this feature.
        >     >>
        > 
https://github.com/apache/pulsar/wiki/PIP-78%3A-Reduce-redundant-producers-from-partitioned-producer
        >     >>>  Feel free to ask me any questions or suggestions, etc.
        >     >>>  Best regards,
        >     >>>  --
        >     >>>  Yuri Mizushima
        >     >>>  yumiz...@yahoo-corp.jp
        >
        >


Reply via email to