Hi Yuri,

Regarding the authn/authz issue, @Matteo Merli <mme...@apache.org> can
probably chime in more about that part.

If the lazy-loading approach sounds attractive to you and you like it,
maybe the next step is to update the PIP, what do you think?

- Sijie

On Mon, Feb 8, 2021 at 6:57 PM Yuri Mizushima <yumiz...@yahoo-corp.jp>
wrote:

> Michael,
>
> Thank you for your comment!
>
> > Which Pulsar Clients will benefit from this proposal?
> I think that this proposal will be useful to any clients.
> In my schedule, if this proposal is accepted then I will implement this
> feature to Java client.
> If needed, then implement same feature to other clients such as C++, Go,
> etc.
>
> Regards,
> --
> Yuri Mizushima
> yumiz...@yahoo-corp.jp
>
>
> "Michael Marshall" <mikemars...@gmail.com> wrote:
>
>     Hi Yuri and Sijie,
>
>     I definitely like the idea of lazily creating producers as well as
> introducing a way to provide custom routing logic.
>
>     Which Pulsar Clients will benefit from this proposal? I’d love to see
> this feature in the go client.
>
>     Thanks,
>     Michael Marshall
>
>     > On Feb 7, 2021, at 9:53 PM, Yuri Mizushima <yumiz...@yahoo-corp.jp>
> wrote:
>     >
>     > Sijie,
>     >
>     > Thank you for sharing!
>     >
>     > First, I considered your suggestion.
>     > I think these implementations sound good.
>     >
>     > I think we should consider the State of partitioned producer: Ready,
> Connecting, etc.
>     > Currently, partitioned producer gets "Ready" only when all producers
> connect to Broker correctly.
>     >
> https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/PartitionedProducerImpl.java#L146
>     > It seems that we should change meaning of state (or change handling)
> if we introduce the lazy-load feature.
>     > To guarantee the message ordering (e.g. using partitionKey),
> partitioned producer should stop (or don't send messages to be routed to
> unavailable partition) when producer can't connect to one of partition.
>     >
>     > Secondly, I considered Matteo's comments.
>     > I couldn't understand well about issue of authn/authz. Please tell
> me more detail.
>     >
>     > I wrote "connection" as number of producers which connect to broker.
> Also, TCP connections between partitioned producer and broker will be less
> than or equal to current in some cases. I'll show a case below.
>     >
>     > Suppose
>     > * cluster has Broker0, 1, 2
>     > * partitioned topic has 5 partitions
>     > * limit conf is 3 partitions
>     > * loadbalance partitions as below
>     > - Broker0: partition-0, partition-1
>     > - Broker1: partition-2
>     > - Broker2: partition-3, partition-4
>     >
>     > Currently, client will create 3 connections (Broker0, 1, 2). If
> client uses limit conf and elects partitions such as [0, 1, 2], then client
> will create 2 connections (Broker0, 1). Of course, if client elects
> partitions such as [0, 2, 3], then client will still create 3 connections.
>     >
>     > I'd like to decrease number of producers. I think that resources of
> broker will be improved slightly by this feature because broker has list of
> producers by some classes such as ServerCnx, AbstractTopic.
>     >
> https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1096-L1097
>     >
> https://github.com/apache/pulsar/blob/fa41d02bebfd841767846240f3ae574047f118f0/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractTopic.java#L577
>     >
>     > In my case, unspecified number of producers will connect to the same
> partitioned topic with different rate. We need to set the number of
> partitions according to the high-rate producer.
>     > However, on the other hand, this number is excessively large for
> low-rate producers.
>     > I want to reduce such redundant producers for resource efficiency.
>     >
>     > Regards,
>     > --
>     > Yuri Mizushima
>     > yumiz...@yahoo-corp.jp
>     >
>     >
>     > "Sijie Guo" <guosi...@gmail.com> wrote:
>     >
>     >  Hi Yuri,
>     >
>     >  In today's community meeting, Matteo shared some of his thoughts
> about this
>     >  PIP.
>     >
>     >  You can find some meeting notes here:
>     >
> https://docs.google.com/document/d/19dXkVXeU2q_nHmkG8zURjKnYlvD96TbKf5KjYyASsOE/edit#bookmark=id.rezbt4xmjxpz
>     >
>     >  Matteo can also chime in as well.
>     >
>     >  - Sijie
>     >
>     >>  On Sun, Jan 31, 2021 at 7:21 PM Yuri Mizushima <
> yumiz...@yahoo-corp.jp>
>     >>  wrote:
>     >> Sijie,
>     >> Thank you for your reply!
>     >> I'll check it.
>     >> Regards,
>     >> --
>     >> Yuri Mizushima
>     >> yumiz...@yahoo-corp.jp
>     >> "Sijie Guo" <guosi...@gmail.com> wrote:
>     >>  Yuri,
>     >>  Thank you for bringing this up! This is a super helpful proposal!
>     >>  The problem is very similar to what an RPC framework (like Finagle)
>     >> with
>     >>  client-side load balancing has.
>     >>  An RPC framework with a client-side load-balancing mechanism needs
> to
>     >> send
>     >>  requests across multiple nodes. If you have an RPC service that has
>     >>  thousands of nodes, there are thousands of clients connecting to
> that
>     >> RPC
>     >>  service. How to reduce the connections and how to effectively load
>     >> balance
>     >>  requests across thousands of nodes are the problems that a
> client-side
>     >>  loading technology needs to solve. If you think about "partition"
> as
>     >> "node"
>     >>  and "partitioned producer" as "RPC client", the problem is exactly
> the
>     >>  same. Finagle (the Twitter RPC framework) has implemented a lot of
>     >> client-side
>     >>  load-balancing algorithms
>     >>  <
> https://twitter.github.io/finagle/guide/Clients.html#load-balancing>
>     >> and
>     >>  there are some great articles that you can reference
>     >>  <
>     >>
> https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/daperture-load-balancer.html
>     >>  .
>     >>  I agree with the direction of introducing a mechanism to reduce the
>     >> number
>     >>  of producers in a partitioned topic producer. However, I have a
> concern
>     >>  about introducing `.numPartitionsLimit(10)` directly to the
> producer
>     >>  builder. It limits the possibility to implement different
> algorithms on
>     >>  selecting partitions.
>     >>  So instead of directly implementing the logic within the
> partitioned
>     >> topic
>     >>  producer, I think the proposal can be broken into two parts:
>     >>  1) Introduce some kind of lazy-loading mechanism in the partitioned
>     >>  producer to initialize the producers for partitions lazily. I.e.,
> only
>     >>  initialize a producer when the message router selects a partition.
>     >>  2) Implement a message router that only selects one or N
> partitions.
>     >>  In this way, the partitioned producer is only responsible for
> managing
>     >> a
>     >>  collection of producers, and the message router is responsible for
>     >>  selecting the partitions. This allows people to be able to
> implement
>     >>  different message routers. We can even adopt the client-side load
>     >> balancing
>     >>  algorithms from Finagle.
>     >>  Thanks,
>     >>  Sijie
>     >>  On Wed, Jan 27, 2021 at 7:18 PM Yuri Mizushima <
> yumiz...@yahoo-corp.jp
>     >>  wrote:
>     >>> I notice that PIP-78 has already assigned to another issue.
>     >>
> https://mail-archives.apache.org/mod_mbox/pulsar-dev/202101.mbox/%3CCAG%3DTQOrPH49v9ToDE_aeQzEiDC%2BEgSR61ERoqanpWfQGvEB_Vw%40mail.gmail.com%3E
>     >>> So, I'll change the PIP number to 79.
>     >>
> https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer
>     >>> Regards,
>     >>> --
>     >>> Yuri Mizushima
>     >>> yumiz...@yahoo-corp.jp
>     >>> "Yuri Mizushima" <yumiz...@yahoo-corp.jp> wrote:
>     >>>  Dear Pulsar community,
>     >>>  When partitioned producer connects to partitioned topic,
>     >>>  sometimes doesn't need to connect to all of partitions depending
>     >> on
>     >>> rate, routing mode, etc.
>     >>>  So, I drafted a PIP about reducing redundant producers from
>     >>> partitioned producer.
>     >>>  I'd like to use system resources (e.g. connections between
>     >> Client and
>     >>> Broker, memory usage of both Client and Broker)
>     >>>  more efficiently by this feature.
>     >>
> https://github.com/apache/pulsar/wiki/PIP-78%3A-Reduce-redundant-producers-from-partitioned-producer
>     >>>  Feel free to ask me any questions or suggestions, etc.
>     >>>  Best regards,
>     >>>  --
>     >>>  Yuri Mizushima
>     >>>  yumiz...@yahoo-corp.jp
>
>

Reply via email to