Thanks for the great progress, Girish. I apologize for the delayed feedback due to Pulsar 4.0 release activities. I'll follow up in more detail next week.
The Pulsar 4.0 blog post mentions: "While Pulsar already supports producer rate limiting, the community is building on this foundation with PIP-385 to improve producer flow control — a key piece in completing Pulsar's end-to-end QoS capabilities." The post explains why rate limiting serves as a foundation for QoS controls and capacity management in multi-tenant environments. You can find the blog post here:
https://pulsar.apache.org/blog/2024/10/24/announcing-apache-pulsar-4-0/#enhanced-quality-of-service-qos-controls

-Lari

On 2024/10/10 11:23:30 Girish Sharma wrote:
> I've updated the proposal with suggestions from Lari about utilization based rate limit exceptions on clients, along with a minor change in the blocking section to ensure ordering is maintained.
> Please have a look again.
>
> Regarding this comment:
>
> > Well, even if we have throttle producer protocol, if client app is keep producing messages then client app will see high timeout and to fast fail this issue, Pulsar Client has internal producer queue and client can always tune that queue. once that queue is fail, client can configure to fast fail or wait by blocking client thread but in both ways, client application will be aware that publish is taking longer time and client app can always do backoff if needed, So, this is very well known issue and it's already solved in Pulsar.
>
> The core issue here is about communicating back to the client about throttling, which is missing today.
> Yes, clients can tune their send timeouts and pending queue size and rely solely on timeouts, but that wastes a lot of resources.
> If clients were aware of throttling, i.e. the server is not reading any more messages anyway, then the client can make smart decisions to fail fast etc.
>
> For example, suppose a client has a contract with its upstream components about retries, then when the client is well aware of throttling, it can inform its upstream about the same as well and fail fast rather than holding on pending connections until the timeout. This is especially true when a REST bus system is using pulsar as a backend and the HTTP call does not exit until a send receipt from pulsar is received.
>
> Moreover, if you now combine this "rely on pending queue size to fail fast or block" approach with "separate client per topic or partition to segregate TCP connection" approach, it leads to more issues, specifically around memory usage. If an app has to produce to 100 partitions, it now has to divide the available memory it has by 100 while setting for each individual pulsar client. This may be very suboptimal. Or otherwise, the app will have to make some assumptions and oversubscribe the available memory between those 100 clients which can lead to OOM if many partitions are throttling.
>
> Hope this helps and gives more context around how the PIP is useful.
>
> Regards
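For anyone following along, here is a minimal sketch of the client-side knobs being discussed above (send timeout, pending queue bound, fail fast vs. block). It only uses the standard Java client builder settings; the service URL, topic name and values are purely illustrative.

import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

public class ProducerTuningSketch {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // illustrative service URL
                .build();

        // Today a producer can only react to broker-side throttling indirectly,
        // by bounding its pending queue and its send timeout.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://tenant/ns/topic")   // illustrative topic
                .sendTimeout(30, TimeUnit.SECONDS)       // sends only fail after this timeout
                .maxPendingMessages(1000)                // bound on the internal producer queue
                .blockIfQueueFull(false)                 // false: fail fast once the queue is full,
                                                         // true: block the calling thread instead
                .create();

        producer.close();
        client.close();
    }
}

None of these settings tell the application why sends are slowing down; they only bound how long it waits, which is the gap the proposed throttle notification is meant to close.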
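And a minimal sketch of the memory-division concern: with one PulsarClient per partition to keep TCP connections separate, the overall memory budget has to be split up front across all clients. It assumes the memoryLimit setting on the Java client builder; the partition count and budget below are illustrative.

import java.util.ArrayList;
import java.util.List;

import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.SizeUnit;

public class MemorySplitSketch {
    public static void main(String[] args) throws PulsarClientException {
        int partitions = 100;
        long totalBudgetMb = 1024;                     // whatever the app can afford in total
        long perClientMb = totalBudgetMb / partitions; // ~10 MB each, fixed up front

        List<PulsarClient> clients = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            // One client per partition keeps connections separate, but each client
            // only ever sees its 1/100th share of the memory budget.
            clients.add(PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650")
                    .memoryLimit(perClientMb, SizeUnit.MEGA_BYTES)
                    .build());
        }

        for (PulsarClient c : clients) {
            c.close();
        }
    }
}

Sizing the other way, by letting every client assume a larger share, risks OOM when many partitions are throttled at the same time, as described above.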
> On Sat, Oct 5, 2024 at 12:53 PM Girish Sharma <scrapmachi...@gmail.com> wrote:
>
> > Hi Rajan,
> > Thanks for taking the time and going through the PIP.
> >
> > >>> Well, even if we have throttle producer protocol, if client app is keep producing messages then client app will see high timeout and to fast fail this issue, Pulsar Client has internal producer queue and client can always tune that queue. once that queue is fail, client can configure to fast fail or wait by blocking client thread but in both ways, client application will be aware that publish is taking longer time and client app can always do backoff if needed, So, this is very well known issue and it's already solved in Pulsar.
> >
> > Your github comments are missing this point about the client timeout, producer queue etc. Could you please paste it there itself so that we can keep the discussion contained at one place?
> >
> > Regards
> >
> > On Sat, Oct 5, 2024 at 4:58 AM Rajan Dhabalia <rdhaba...@apache.org> wrote:
> >
> >> Hi Girish,
> >>
> >> I have gone through the proposal and you mentioned few problems as a motivation of this improvements
> >>
> >> >> Noisy neighbors - Even if one topic is exceeding the quota, since the entire channel read is paused, all topics sharing the same connect (for example - using the same java client object) get rate limited.
> >>
> >> I don't think it's a noisy neighbor issue. There are many ways: clients can use a separate connection for different topics by increasing the number of connections and more specifically create Cache of PulsarClient objects to manage topics belonging to different usecases. If you use one channel for different tenants/usecases and if they get impacted then it's not a noisy neighbor but the application might need design improvement.
> >> For example: If client app use the same topic for different usecases then all usecases can be impacted by each other, and that doesn't mean Pulsar has a noisy neighbor issue but it needs a design change to use separate topics for each usecase. So, this challenge is easily achievable.
> >>
> >> >> Unaware clients - clients are completely unaware that they are being rate limited. This leads to all send calls taking super long time or simply timing out... they can either fail fast or induce back-pressure to their upstream.
> >>
> >> Well, even if we have throttle producer protocol, if client app is keep producing messages then client app will see high timeout and to fast fail this issue, Pulsar Client has internal producer queue and client can always tune that queue. once that queue is fail, client can configure to fast fail or wait by blocking client thread but in both ways, client application will be aware that publish is taking longer time and client app can always do backoff if needed, So, this is very well known issue and it's already solved in Pulsar.
> >>
> >> and we should have server side metrics for topic throttling which should give a clear picture of msgRate and throttling for any further debugging.
> >>
> >> So, I think every issue is already addressed and I don't see any specific need for these issue.
> >>
> >> Thanks,
> >> Rajan
> >>
> >>
> >> On Fri, Oct 4, 2024 at 3:45 PM Lari Hotari <lhot...@apache.org> wrote:
> >>
> >> > Great work on this proposal, Girish!
> >> >
> >> > This improvement addresses a crucial aspect of Pulsar's functionality. You're effectively bridging an important gap in Pulsar's producer flow control. This addition will improve the ability to set and meet SLAs across various Pulsar use cases, which is invaluable for many of our users.
> >> >
> >> > Thank you for driving this important improvement. It's contributions like these that continue to enhance Pulsar's robustness and flexibility.
> >> >
> >> > Looking forward to seeing this develop further.
> >> >
> >> > -Lari
> >> >
> >> > On 2024/10/04 14:48:09 Girish Sharma wrote:
> >> > > Hello Pulsar Community,
> >> > >
> >> > > I would like to propose a new improvement for Pulsar protocol related to rate limiting that the broker imposes to maintain quality of service. This proposal adds a new binary protocol command pair and corresponding server and java client changes. With the new protocol command, clients would be able to understand that they are breaching the quota for a topic and take action accordingly.
> >> > >
> >> > > The full proposal can be found at
> >> > > https://github.com/apache/pulsar/pull/23398
> >> > > Direct link to rendered markdown with mermaid flowcharts -
> >> > > https://github.com/grssam/pulsar/blob/rl-protocol/pip/pip-385.md
> >> > >
> >> > > Please share your thoughts on this proposal along with any concerns or suggestions.
> >> > >
> >> > > Regards
> >> > > --
> >> > > Girish Sharma
> >> > >
> >> >
> >
> >
> > --
> > Girish Sharma
>
>
> --
> Girish Sharma
>