Hello Lari,
Thanks for bringing this to my attention. I went through the links, but
does this sharing of the same TCP/IP connection also happen across
partitions (assuming both partitions of the topic are on the same broker)?
i.e., would producer 127.0.0.1 for partition
`persistent://tenant/ns/topic0-partition0` and producer 127.0.0.1 for
partition `persistent://tenant/ns/topic0-partition1` share the same TCP/IP
connection, assuming both are on broker-0?
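To make my question concrete, my mental model of the client's connection
pooling is roughly the sketch below. This is a simplification I'm assuming,
not actual Pulsar client code; `ConnectionPool` and `lookup_broker` are
hypothetical stand-ins. The idea is that the pool is keyed by broker
address, so lookups for two partitions that resolve to the same broker
return the same connection:

```python
# Hypothetical sketch of per-broker connection pooling, NOT actual
# Pulsar client code. It illustrates why two partitions owned by the
# same broker could end up sharing one TCP/IP connection.

class ConnectionPool:
    def __init__(self):
        self._connections = {}  # broker address -> connection object

    def get_connection(self, broker_addr):
        # Reuse an existing connection to this broker if one is present.
        if broker_addr not in self._connections:
            # Stand-in for opening a real TCP connection.
            self._connections[broker_addr] = object()
        return self._connections[broker_addr]


def lookup_broker(topic_partition):
    # Stand-in for the topic lookup: in this scenario both
    # partitions resolve to broker-0.
    return "broker-0:6650"


pool = ConnectionPool()
conn_p0 = pool.get_connection(
    lookup_broker("persistent://tenant/ns/topic0-partition0"))
conn_p1 = pool.get_connection(
    lookup_broker("persistent://tenant/ns/topic0-partition1"))
assert conn_p0 is conn_p1  # same broker => same shared connection
```

If I recall correctly, the Java client also exposes a
`connectionsPerBroker` setting that controls how many connections are
opened per broker, which is why I'd expect sharing by default.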

In general, the major use case behind this PIP for me and my organization
is supporting produce spikes. We do not want to allocate the absolute
maximum throughput for a topic when that capacity would not even be
utilized 99.99% of the time. For a topic that stays constantly at 100MBps
and goes to 150MBps only once in a blue moon, it's unwise to allocate
150MBps worth of resources 100% of the time. The polling-based rate
limiter is also not a good fit here, as it would allow unchecked overuse
of hardware, degrading the system.
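To illustrate the kind of behavior we're after, here is a sketch of a
generic token-bucket limiter (not Pulsar's implementation; the class and
parameter names are mine). A bucket sized above the steady rate absorbs a
short spike, while sustained traffic stays capped at the configured rate:

```python
import time


class TokenBucket:
    """Generic token-bucket rate limiter sketch, not Pulsar's implementation.

    rate:  sustained throughput allowed (units/sec)
    burst: bucket capacity, i.e. how large a short spike can be absorbed
    """

    def __init__(self, rate, burst, now=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.now = now
        self.last = now()

    def try_acquire(self, amount):
        current = self.now()
        # Refill tokens accrued since the last call, capped at burst size.
        self.tokens = min(self.burst,
                          self.tokens + (current - self.last) * self.rate)
        self.last = current
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        # Caller should apply backpressure (e.g. pause reads) rather
        # than let traffic through unchecked.
        return False


# Deterministic demo with a fake clock: rate 100/s, burst capacity 150.
clock = [0.0]
tb = TokenBucket(rate=100, burst=150, now=lambda: clock[0])
assert tb.try_acquire(150)       # a 150-unit spike is absorbed immediately
assert not tb.try_acquire(1)     # bucket empty: further traffic is rejected
clock[0] = 1.0
assert tb.try_acquire(100)       # after 1s, 100 tokens have refilled
```

The point of the sketch is the contrast with a polling-based limiter:
refilling continuously and rejecting precisely at the boundary means the
occasional spike is tolerated up to the burst size, but sustained overuse
is never allowed through between polling intervals.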

@Asif, I have been sick these last 10 days, but will be updating the PIP
with the discussed changes early next week.

Regards

On Fri, Nov 3, 2023 at 3:25 PM Lari Hotari <lhot...@apache.org> wrote:

> Hi Girish,
>
> In order to address your problem described in the PIP document [1], it
> might be necessary to make improvements in how rate limiters apply
> backpressure in Pulsar.
>
> Pulsar uses mainly TCP/IP connection level controls for achieving
> backpressure. The challenge is that Pulsar can share a single TCP/IP
> connection across multiple producers and consumers. Because of this, there
> could be multiple producers, consumers, and rate limiters operating on
> the same connection on the broker, and they can make conflicting
> decisions, which results in undesired behavior.
>
> Regarding the shared TCP/IP connection backpressure issue, Apache Flink
> had a somewhat similar problem before Flink 1.5. It is described in the
> "inflicting backpressure" section of this blog post from 2019:
>
> https://flink.apache.org/2019/06/05/flink-network-stack.html#inflicting-backpressure-1
> Flink solved the issue of multiplexing multiple streams of data on a
> single TCP/IP connection in Flink 1.5 by introducing its own flow control
> mechanism.
>
> The backpressure and rate limiting challenges have been discussed a few
> times in Pulsar community meetings over the past years. There was also a
> generic backpressure thread on the dev mailing list [2] in Sep 2022.
> However, we haven't really documented Pulsar's backpressure design, how
> rate limiters are part of the overall solution, and how we could improve
> it. I think it might be time to do so, since there's now a requirement to
> improve rate limiting. I guess that's also the main motivation for PIP-310.
>
> -Lari
>
> 1 - https://github.com/apache/pulsar/pull/21399/files
> 2 - https://lists.apache.org/thread/03w6x9zsgx11mqcp5m4k4n27cyqmp271
>
> On 2023/10/19 12:51:14 Girish Sharma wrote:
> > Hi,
> > Currently, there are only two kinds of publish rate limiters:
> > polling-based and precise. Users have the option to use either one of
> > them as the topic publish rate limiter, but the resource group rate
> > limiter only uses the polling-based one.
> >
> > There are challenges with both rate limiters, along with the fact that
> > we can't use the precise rate limiter at the resource group level.
> >
> > Thus, in order to support custom rate limiters, I've created the PIP-310
> >
> > This is the discussion thread. Please go through the PIP and provide your
> > inputs.
> >
> > Link - https://github.com/apache/pulsar/pull/21399
> >
> > Regards
> > --
> > Girish Sharma
> >
>


-- 
Girish Sharma
