Hi Girish,

I apologize for the delayed response. I have added my comments on the proposal, along with a few questions and suggestions.

Thanks,
Rajan

On Tue, Oct 29, 2024 at 10:48 PM Girish Sharma <scrapmachi...@gmail.com> wrote:

> Hello everyone, gentle reminder.
>
> If there are no further comments, then please close the github comments as
> resolved so that the PR can be merged after voting.
>
> Regards
>
> On Sat, Oct 26, 2024 at 1:52 AM Lari Hotari <lhot...@apache.org> wrote:
>
> > Thanks for the great progress, Girish.
> > I apologize for the delayed feedback due to Pulsar 4.0 release activities.
> > I'll follow up in more detail next week.
> >
> > The Pulsar 4.0 blog post mentions: "While Pulsar already supports producer
> > rate limiting, the community is building on this foundation with PIP-385 to
> > improve producer flow control — a key piece in completing Pulsar's
> > end-to-end QoS capabilities." The post explains why rate limiting serves as
> > a foundation for QoS controls and capacity management in multi-tenant
> > environments.
> >
> > You can find the blog post here:
> >
> > https://pulsar.apache.org/blog/2024/10/24/announcing-apache-pulsar-4-0/#enhanced-quality-of-service-qos-controls
> >
> > -Lari
> >
> > On 2024/10/10 11:23:30 Girish Sharma wrote:
> > > I've updated the proposal with suggestions from Lari about
> > > utilization-based rate limit exceptions on clients, along with a minor
> > > change in the blocking section to ensure ordering is maintained.
> > > Please have a look again.
> > >
> > >
> > > Regarding this comment:
> > >
> > > > Well, even if we have a throttle-producer protocol, if the client app
> > > > keeps producing messages then the client app will see high timeouts,
> > > > and to fail fast on this issue, the Pulsar client has an internal
> > > > producer queue, and the client can always tune that queue.
> > > > Once that queue is full, the client can be configured to fail fast
> > > > or to wait by blocking the client thread, but either way the client
> > > > application will be aware that publishing is taking longer, and the
> > > > client app can always back off if needed. So, this is a very
> > > > well-known issue and it's already solved in Pulsar.
> > >
> > > The core issue here is about communicating back to the client about
> > > throttling, which is missing today.
> > > Yes, clients can tune their send timeouts and pending queue size and rely
> > > solely on timeouts, but that wastes a lot of resources.
> > > If clients were aware of throttling, i.e. that the server is not reading
> > > any more messages anyway, then they could make smart decisions such as
> > > failing fast.
> > >
> > > For example, suppose a client has a contract with its upstream components
> > > about retries. When the client is aware of throttling, it can inform its
> > > upstream about it as well and fail fast rather than holding on to pending
> > > connections until the timeout. This is especially true when a REST bus
> > > system is using Pulsar as a backend and the HTTP call does not exit until
> > > a send receipt from Pulsar is received.
> > >
> > > Moreover, if you now combine this "rely on pending queue size to fail fast
> > > or block" approach with the "separate client per topic or partition to
> > > segregate TCP connection" approach, it leads to more issues, specifically
> > > around memory usage. If an app has to produce to 100 partitions, it now
> > > has to divide its available memory by 100 when configuring each individual
> > > Pulsar client. This may be very suboptimal. Otherwise, the app will have
> > > to make some assumptions and oversubscribe the available memory between
> > > those 100 clients, which can lead to OOM if many partitions are
> > > throttling.
> > >
> > > Hope this helps and gives more context around how the PIP is useful.
> > >
> > > Regards
> > >
> > > On Sat, Oct 5, 2024 at 12:53 PM Girish Sharma <scrapmachi...@gmail.com>
> > > wrote:
> > >
> > > > Hi Rajan,
> > > > Thanks for taking the time and going through the PIP.
> > > >
> > > > >>> Well, even if we have a throttle-producer protocol, if the client
> > > > >>> app keeps producing messages then the client app will see high
> > > > >>> timeouts, and to fail fast on this issue, the Pulsar client has an
> > > > >>> internal producer queue, and the client can always tune that queue.
> > > > >>> Once that queue is full, the client can be configured to fail fast
> > > > >>> or to wait by blocking the client thread, but either way the client
> > > > >>> application will be aware that publishing is taking longer, and the
> > > > >>> client app can always back off if needed. So, this is a very
> > > > >>> well-known issue and it's already solved in Pulsar.
> > > > Your github comments are missing this point about the client timeout,
> > > > producer queue etc. Could you please paste it there as well so that we
> > > > can keep the discussion contained in one place?
> > > >
> > > > Regards
> > > >
> > > > On Sat, Oct 5, 2024 at 4:58 AM Rajan Dhabalia <rdhaba...@apache.org>
> > > > wrote:
> > > >
> > > >> Hi Girish,
> > > >>
> > > >> I have gone through the proposal, and you mentioned a few problems as
> > > >> the motivation for this improvement:
> > > >>
> > > >> >> Noisy neighbors - Even if one topic is exceeding the quota, since the
> > > >> entire channel read is paused, all topics sharing the same connection
> > > >> (for example - using the same java client object) get rate limited.
> > > >>
> > > >> I don't think it's a noisy neighbor issue.
> > > >> There are many ways to handle it: clients can use a separate
> > > >> connection for different topics by increasing the number of
> > > >> connections and, more specifically, create a cache of PulsarClient
> > > >> objects to manage topics belonging to different usecases. If you use
> > > >> one channel for different tenants/usecases and they get impacted, then
> > > >> it's not a noisy neighbor issue; rather, the application might need a
> > > >> design improvement.
> > > >> For example: if a client app uses the same topic for different usecases
> > > >> then all usecases can be impacted by each other, and that doesn't mean
> > > >> Pulsar has a noisy neighbor issue; it needs a design change to use
> > > >> separate topics for each usecase. So, this challenge is easily
> > > >> addressed.
> > > >>
> > > >> >> Unaware clients - clients are completely unaware that they are being
> > > >> rate limited. This leads to all send calls taking super long time or
> > > >> simply timing out... they can either fail fast or induce back-pressure
> > > >> to their upstream.
> > > >>
> > > >> Well, even if we have a throttle-producer protocol, if the client app
> > > >> keeps producing messages then the client app will see high timeouts,
> > > >> and to fail fast on this issue, the Pulsar client has an internal
> > > >> producer queue, and the client can always tune that queue. Once that
> > > >> queue is full, the client can be configured to fail fast or to wait by
> > > >> blocking the client thread, but either way the client application will
> > > >> be aware that publishing is taking longer, and the client app can
> > > >> always back off if needed. So, this is a very well-known issue and it's
> > > >> already solved in Pulsar.
> > > >>
> > > >> Also, we should have server-side metrics for topic throttling, which
> > > >> should give a clear picture of msgRate and throttling for any further
> > > >> debugging.
> > > >>
> > > >> So, I think every issue is already addressed and I don't see any
> > > >> specific need for these issues.
> > > >>
> > > >> Thanks,
> > > >> Rajan
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Oct 4, 2024 at 3:45 PM Lari Hotari <lhot...@apache.org> wrote:
> > > >>
> > > >> > Great work on this proposal, Girish!
> > > >> >
> > > >> > This improvement addresses a crucial aspect of Pulsar's functionality.
> > > >> > You're effectively bridging an important gap in Pulsar's producer flow
> > > >> > control. This addition will improve the ability to set and meet SLAs
> > > >> > across various Pulsar use cases, which is invaluable for many of our
> > > >> > users.
> > > >> >
> > > >> > Thank you for driving this important improvement. It's contributions
> > > >> > like these that continue to enhance Pulsar's robustness and
> > > >> > flexibility.
> > > >> >
> > > >> > Looking forward to seeing this develop further.
> > > >> >
> > > >> > -Lari
> > > >> >
> > > >> > On 2024/10/04 14:48:09 Girish Sharma wrote:
> > > >> > > Hello Pulsar Community,
> > > >> > >
> > > >> > > I would like to propose a new improvement to the Pulsar protocol,
> > > >> > > related to the rate limiting that the broker imposes to maintain
> > > >> > > quality of service. This proposal adds a new binary protocol
> > > >> > > command pair and corresponding server and java client changes. With
> > > >> > > the new protocol command, clients would be able to understand that
> > > >> > > they are breaching the quota for a topic and take action
> > > >> > > accordingly.
> > > >> > >
> > > >> > > The full proposal can be found at
> > > >> > > https://github.com/apache/pulsar/pull/23398
> > > >> > > Direct link to rendered markdown with mermaid flowcharts -
> > > >> > > https://github.com/grssam/pulsar/blob/rl-protocol/pip/pip-385.md
> > > >> > >
> > > >> > > Please share your thoughts on this proposal along with any concerns
> > > >> > > or suggestions.
> > > >> > >
> > > >> > > Regards
> > > >> > > --
> > > >> > > Girish Sharma
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > > --
> > > > Girish Sharma
> > >
> > > --
> > > Girish Sharma
> >
>
> --
> Girish Sharma
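
The memory concern raised earlier in this thread (an app producing to 100 partitions through one Pulsar client per partition) can be made concrete with a little arithmetic. What follows is only an illustrative sketch: the 1024 MB budget and the 64 MB oversubscribed per-client limit are hypothetical numbers, not values from the proposal, and the helper functions are made up for this example and are not part of any Pulsar API.

```python
def strict_share_mb(total_mb: float, num_clients: int) -> float:
    """Equal split: each client may buffer at most total/num_clients."""
    return total_mb / num_clients


def worst_case_mb(per_client_limit_mb: float, throttled_clients: int) -> float:
    """Memory held if every throttled client fills its pending buffer."""
    return per_client_limit_mb * throttled_clients


def max_safe_throttled(total_mb: float, per_client_limit_mb: float) -> int:
    """Number of clients that can be fully backed up within the budget."""
    return int(total_mb // per_client_limit_mb)


TOTAL_MB = 1024         # hypothetical memory budget for pending messages
NUM_CLIENTS = 100       # one client per partition, as in the thread's example
OVERSUBSCRIBED_MB = 64  # hypothetical per-client limit under oversubscription

if __name__ == "__main__":
    share = strict_share_mb(TOTAL_MB, NUM_CLIENTS)
    worst = worst_case_mb(OVERSUBSCRIBED_MB, NUM_CLIENTS)
    safe = max_safe_throttled(TOTAL_MB, OVERSUBSCRIBED_MB)
    print(f"strict split: {share:.2f} MB per client")
    print(f"oversubscribed worst case: {worst:.0f} MB against a {TOTAL_MB} MB budget")
    print(f"clients that can be fully throttled safely: {safe}")
```

Under these assumed numbers, the strict split leaves each client a small buffer, while the oversubscribed setup stays within budget only as long as a limited number of partitions are throttled at the same time.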