Replies inline

On Fri, 3 Nov 2023 at 20:48, Girish Sharma <scrapmachi...@gmail.com> wrote:
> > Could you please elaborate more on these details? Here are some
> > questions:
> >
> > 1. What do you mean that it is too strict?
> >    - Should the rate limiting allow bursting over the limit for some
> >      time?
>
> That's one of the major use cases, yes.

One possibility would be to improve the existing rate limiter to allow
bursting. I think that Pulsar's out-of-the-box rate limiter should cover
99% of the use cases, so that users don't have to implement their own
rate limiter algorithm. The problems you are describing seem to be common
to many Pulsar use cases, and therefore I think they should be handled
directly in Pulsar. Optimally, there would be a single solution that
abstracts the rate limiting so that it does the right thing based on
declarative configuration. I would prefer that over a pluggable solution
for rate limiter implementations.

What would help is going deeper into the design of the rate limiter
itself, without limiting ourselves to the existing rate limiter
implementation in Pulsar.

In textbooks, there are algorithms such as "leaky bucket" [1] and "token
bucket" [2]. Both have several variations, and in some ways they are very
similar algorithms viewed from different points of view. A rate limiting
algorithm would probably be easier to conceptualize and understand if the
implementation referenced the common algorithm names and implementation
choices found in textbooks.

It seems that a "token bucket" type of algorithm can be used to implement
rate limiting with bursting. In the token bucket algorithm, the size of
the token bucket defines how large a burst is allowed. The design could
also combine 2 rate limiters with different algorithms and/or
configuration parameters to achieve a desired behavior, for example a
rate limiter with bursting and a fixed maximum rate.
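
To make the token bucket idea a bit more concrete, here is a minimal Java
sketch. It is not the Pulsar implementation and not a proposed API: the
class names TokenBucket and ChainedLimiter, their methods, and the numbers
in the note below are all hypothetical. Tokens are added lazily from the
elapsed time whenever a caller asks for permits, so the sketch needs no
background timer. The second class shows one possible way of combining two
buckets so that a burst is allowed while its peak rate is still capped.

```java
/** Hypothetical sketch, not Pulsar code: a token bucket refilled lazily from elapsed time. */
final class TokenBucket {
    private final long capacity;       // maximum burst size, in tokens
    private final long ratePerSecond;  // steady refill rate, in tokens per second
    private double tokens;             // tokens currently available
    private long lastRefillNanos;      // timestamp of the last refill

    TokenBucket(long capacity, long ratePerSecond, long nowNanos) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity;        // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Adds the tokens earned since the last call; no scheduler is involved. */
    private void refill(long nowNanos) {
        long elapsedNanos = nowNanos - lastRefillNanos;
        if (elapsedNanos > 0) {        // ignore a clock that did not move forward
            double earned = (double) elapsedNanos * ratePerSecond / 1_000_000_000.0;
            tokens = Math.min(capacity, tokens + earned);
            lastRefillNanos = nowNanos;
        }
    }

    /** Returns true if the permits would be granted right now, without consuming them. */
    synchronized boolean canAcquire(long permits, long nowNanos) {
        refill(nowNanos);
        return permits <= tokens;
    }

    /** Consumes the permits if enough tokens are available; never blocks. */
    synchronized boolean tryAcquire(long permits, long nowNanos) {
        refill(nowNanos);
        if (permits > tokens) {
            return false;
        }
        tokens -= permits;
        return true;
    }
}

/** Hypothetical sketch: grants permits only when every chained bucket can supply them. */
final class ChainedLimiter {
    private final TokenBucket[] buckets;

    ChainedLimiter(TokenBucket... buckets) {
        this.buckets = buckets;
    }

    synchronized boolean tryAcquire(long permits, long nowNanos) {
        // First pass: check all buckets so that a rejection consumes nothing.
        for (TokenBucket bucket : buckets) {
            if (!bucket.canAcquire(permits, nowNanos)) {
                return false;
            }
        }
        // Second pass: consume from every bucket.
        for (TokenBucket bucket : buckets) {
            bucket.tryAcquire(permits, nowNanos);
        }
        return true;
    }
}
```

With these made-up classes, new ChainedLimiter(new TokenBucket(20, 10, now),
new TokenBucket(5, 50, now)) would sustain 10 permits per second with bursts
of up to 20, while the second bucket never lets more than 5 permits through
at once and refills that allowance at 50 per second. The sketch assumes the
buckets in a chain are not shared with other callers.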

By default, the token bucket algorithm doesn't enforce a maximum rate for
bursts, but that could be achieved by chaining 2 rate limiters if that is
really needed.

The current Pulsar rate limiter could be implemented in a cleaner way,
which would also be more efficient. Instead of having a scheduler call a
method once per second, I think that the rate limiter could be implemented
in a reactive way, where the algorithm works without a scheduler.

I wonder if there are others who would be interested in getting down into
such implementation details?

[1] https://en.wikipedia.org/wiki/Leaky_bucket
[2] https://en.wikipedia.org/wiki/Token_bucket

> > 2. What type of data loss are you experiencing?
>
> Messages produced by the producers which eventually get timed out due to
> rate limiting.

Are you able to slow down producing on the client side? If that is
possible, there could be ways to improve client-side back pressure in the
Pulsar client. Currently, the client doesn't expose this information until
sending blocks or fails with an exception (ProducerQueueIsFullError).
Optimally, the client should slow down the rate of producing to the rate
at which it can actually send to the broker.

Just curious: have you considered turning off producing timeouts on the
client side completely, or making them longer? Would that address the data
loss problem? Or is your event/message source "hot", so that you cannot
stop or slow it down and it will just keep on flowing at a certain rate?

> I think the core implementation of how the broker fails fast at the time
> of rate limiting (whether it is by pausing netty channel or a new permits
> based model) does not change the actual issue I am targeting. Multiplexing
> has some impact on it - but yet again only limited, and can easily be fixed
> by the client by increasing the connections per broker. Even after assuming
> both these things are somehow "fixed", the fact remains that an absolutely
> strict rate limiter will lead to the above mentioned data loss for burst
> going above the limit and that a poller based rate limiter doesn't really
> rate limit anything as it allows all produce in the first interval of the
> next second.

Yes, it makes sense to have bursting configuration parameters in the rate
limiter. As mentioned above, I think we could improve the existing rate
limiter in Pulsar to cover 99% of the use cases by making it stable and by
including the bursting configuration options. Is there additional
functionality you feel the rate limiter needs beyond bursting support?

One way to work around the multiplexing problem would be to add a
client-side option for producers and consumers that makes the client use a
separate TCP/IP connection that is not shared and doesn't come from the
connection pool. Preventing connection multiplexing seems to be the only
way to make the current rate limiting deterministic and stable without
adding explicit flow control for producers to the Pulsar binary protocol.

Are there other community members with input on the design and
implementation of an improved rate limiter? I’m eager to continue this
conversation and work together towards a robust solution.

-Lari