Hello Lari,

Thanks for bringing this to my attention. I went through the links, but does this sharing of the same TCP/IP connection also happen across partitions (assuming both partitions of the topic are on the same broker)? i.e. would a producer from 127.0.0.1 for partition `persistent://tenant/ns/topic0-partition0` and a producer from 127.0.0.1 for partition `persistent://tenant/ns/topic0-partition1` share the same TCP/IP connection, assuming both partitions are on broker-0?
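For concreteness, here is a minimal sketch of the scenario I am asking about (Java client; the service URL and topic names are placeholders, and I am assuming the client default of one pooled connection per broker):

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class PartitionConnectionSharing {
    public static void main(String[] args) throws Exception {
        // Single client instance; by default the client pools connections per broker,
        // which is where the connection-sharing question comes from.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://127.0.0.1:6650")
                .build();

        // Two producers, one per partition of the same topic. If both partitions
        // are owned by broker-0, do these end up on the same TCP/IP connection?
        Producer<byte[]> p0 = client.newProducer()
                .topic("persistent://tenant/ns/topic0-partition0")
                .create();
        Producer<byte[]> p1 = client.newProducer()
                .topic("persistent://tenant/ns/topic0-partition1")
                .create();

        p0.send("to partition 0".getBytes());
        p1.send("to partition 1".getBytes());

        p0.close();
        p1.close();
        client.close();
    }
}
```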
In general, the major use case behind this PIP for me and my organization is supporting produce spikes. We do not want to allocate the absolute maximum throughput for a topic when it would not be utilized 99.99% of the time. For a topic that stays at 100MBps constantly and goes to 150MBps only once in a blue moon, it is unwise to allocate 150MBps worth of resources 100% of the time. The polling-based rate limiter is also not a good fit here, as it allows unchecked overuse of hardware, degrading the system. (A rough token-bucket sketch of the kind of bursting behavior I have in mind is appended below the quoted thread.)

@Asif, I have been sick these last 10 days, but will be updating the PIP with the discussed changes early next week.

Regards

On Fri, Nov 3, 2023 at 3:25 PM Lari Hotari <lhot...@apache.org> wrote:

> Hi Girish,
>
> In order to address your problem described in the PIP document [1], it
> might be necessary to improve how rate limiters apply backpressure in
> Pulsar.
>
> Pulsar mainly uses TCP/IP connection-level controls to achieve
> backpressure. The challenge is that Pulsar can share a single TCP/IP
> connection across multiple producers and consumers. Because of this,
> multiple producers, consumers, and rate limiters can operate on the same
> connection on the broker and make conflicting decisions, which results
> in undesired behavior.
>
> Regarding the shared TCP/IP connection backpressure issue, Apache Flink
> had a somewhat similar problem before Flink 1.5. It is described in the
> "Inflicting Backpressure" section of this blog post from 2019:
> https://flink.apache.org/2019/06/05/flink-network-stack.html#inflicting-backpressure-1
> Flink solved the issue of multiplexing multiple streams of data on a
> single TCP/IP connection in Flink 1.5 by introducing its own flow
> control mechanism.
>
> The backpressure and rate limiting challenges have been discussed a few
> times in Pulsar community meetings over the past years. There was also a
> generic backpressure thread on the dev mailing list [2] in Sep 2022.
> However, we haven't really documented Pulsar's backpressure design, how
> rate limiters are part of the overall solution, or how we could improve
> it. I think it might be time to do so, since there is a requirement to
> improve rate limiting. I guess that is also the main motivation for
> PIP-310.
>
> -Lari
>
> 1 - https://github.com/apache/pulsar/pull/21399/files
> 2 - https://lists.apache.org/thread/03w6x9zsgx11mqcp5m4k4n27cyqmp271
>
> On 2023/10/19 12:51:14 Girish Sharma wrote:
> > Hi,
> >
> > Currently, there are only two kinds of publish rate limiters: polling-based
> > and precise. Users have the option to use either of them as the topic
> > publish rate limiter, but the resource group rate limiter only uses the
> > polling one.
> >
> > There are challenges with both rate limiters, and with the fact that we
> > can't use the precise rate limiter at the resource group level.
> >
> > Thus, in order to support custom rate limiters, I've created PIP-310.
> >
> > This is the discussion thread. Please go through the PIP and provide
> > your inputs.
> >
> > Link - https://github.com/apache/pulsar/pull/21399
> >
> > Regards
> > --
> > Girish Sharma
>

-- 
Girish Sharma
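As mentioned above, here is a rough token-bucket sketch of the bursting behavior I have in mind. This is not Pulsar's actual rate limiter; the class and field names are made up for illustration. The idea is that refill happens at the sustained rate, while the bucket capacity provides headroom so short spikes above the sustained rate are absorbed without permanently over-allocating throughput:

```java
/**
 * Illustrative token bucket (a sketch, not Pulsar's rate limiter). Tokens refill
 * at sustainedBytesPerSec, but the bucket holds up to burstBytes, so brief spikes
 * above the sustained rate are absorbed while long-term throughput stays capped.
 */
public class BurstTokenBucket {
    private final long sustainedBytesPerSec;
    private final long burstBytes;
    private long tokens;
    private long lastRefillNanos;

    public BurstTokenBucket(long sustainedBytesPerSec, long burstBytes) {
        this.sustainedBytesPerSec = sustainedBytesPerSec;
        this.burstBytes = burstBytes;
        this.tokens = burstBytes;
        this.lastRefillNanos = System.nanoTime();
    }

    /** Returns true if msgSizeBytes may be published now; false means throttle. */
    public synchronized boolean tryAcquire(long msgSizeBytes) {
        refill();
        if (tokens >= msgSizeBytes) {
            tokens -= msgSizeBytes;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.nanoTime();
        // Accrue tokens at the sustained rate, capped at the burst capacity.
        long refillTokens = (now - lastRefillNanos) * sustainedBytesPerSec / 1_000_000_000L;
        if (refillTokens > 0) {
            tokens = Math.min(burstBytes, tokens + refillTokens);
            lastRefillNanos = now;
        }
    }

    public static void main(String[] args) {
        // Sustain ~100 MB/s, but keep ~50 MB of accumulated headroom for short spikes.
        BurstTokenBucket limiter = new BurstTokenBucket(100L * 1024 * 1024, 50L * 1024 * 1024);
        System.out.println(limiter.tryAcquire(1024 * 1024)); // true: within the burst allowance
    }
}
```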