Re: Lack of retries on TooManyRequests

Joe F Fri, 06 Aug 2021 17:34:58 -0700

Suppose you have about a million topics and your Pulsar cluster goes down
(say, ZK is down). Many millions of producers and consumers are now
anxiously waiting for the cluster to come back. That is a fun experience for
the first broker that comes up. Don't ask me how I know; see the blame for
ServerCnx.java#L429
<https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L429>.
The broker limit was added to get through a cold restart.
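
For illustration, a broker-side cap on in-flight lookups could look roughly
like the sketch below. This is a minimal sketch and not the actual
ServerCnx code: the class LookupLimiter, its exception type, and the
maxConcurrentLookups parameter are hypothetical names chosen for this
example. A request that cannot get a permit is rejected immediately instead
of queueing behind the slow metadata reads.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch of a broker-side lookup limit (not the real ServerCnx code).
class LookupLimiter {
    // Stand-in for the TooManyRequests error a broker would send to the client.
    static class TooManyRequestsException extends RuntimeException {
        TooManyRequestsException(String message) {
            super(message);
        }
    }

    private final Semaphore permits;

    LookupLimiter(int maxConcurrentLookups) {
        this.permits = new Semaphore(maxConcurrentLookups);
    }

    // Runs the (possibly slow) lookup only if a permit is free; otherwise fails fast.
    <T> CompletableFuture<T> lookup(Supplier<CompletableFuture<T>> slowLookup) {
        if (!permits.tryAcquire()) {
            CompletableFuture<T> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(
                    new TooManyRequestsException("too many concurrent lookups"));
            return rejected;
        }
        CompletableFuture<T> result;
        try {
            result = slowLookup.get();
        } catch (RuntimeException e) {
            permits.release(); // the lookup never started, so give the permit back
            throw e;
        }
        // Release the permit once the lookup finishes, successfully or not.
        return result.whenComplete((value, error) -> permits.release());
    }
}
```

The non-blocking tryAcquire is the design choice that matters here: during a
cold restart the broker sheds the excess load right away rather than letting
requests pile up.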


-j


On Fri, Aug 6, 2021 at 12:29 PM Ivan Kelly <iv...@apache.org> wrote:

> Inline
>
> > In that scenario,
> > should we block or fail fast and let the application decide which is what
> > we do today? Also, should we distinguish between the two scenarios, i.e.
> > broker sends the error vs client internally throws the error?
> While I agree that the client limit is different to the broker limit,
> how likely are we to hit the client limit? 50k lookups is a lot. How
> many topics/partitions will a single client be talking to?
>
> Broker level limiting is a funny one. What we've seen is that
> TooManyRequests will only trigger if the server has to go to ZooKeeper
> to look up the topic. Otherwise, if the broker has cached the
> assignment, you'll never hit TooManyRequests as the handler is pretty
> much synchronous from this point. What is more likely to happen is
> that the request will timeout as it is queued in the TCP queue while
> waiting for other lookups to be processed. So TooManyRequests and
> request timeout are basically equivalent in the bad case.
>
> In terms of what the client should do, it should probably be
> configurable. In most cases, the default will be to block. The client
> isn't going to go "oh well, pulsar down, time to go home". Most
> likely, if we error, the process will crash, restart, and try the same
> thing again.
>
> -Ivan
>
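
As a rough illustration of the "configurable, default to block" idea above,
a client-side wrapper could retry a lookup with capped exponential backoff
when it is rejected. This is a minimal sketch under stated assumptions, not
the Pulsar client's API: LookupRetry, its exception type, and the
maxAttempts, initialBackoffMs, and maxBackoffMs parameters are hypothetical.
Setting maxAttempts to 1 gives fail-fast behaviour; a very large value
approximates blocking. The random jitter is an addition of this sketch, meant
to keep millions of clients from retrying in lockstep after a cold restart.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical client-side retry helper (not part of the Pulsar client API).
class LookupRetry {
    // Stand-in for a TooManyRequests error surfaced to the client.
    static class TooManyRequestsException extends Exception {
        TooManyRequestsException(String message) {
            super(message);
        }
    }

    // Retries the lookup on TooManyRequestsException, sleeping between attempts.
    // Any other exception is propagated to the caller unchanged.
    static <T> T call(Callable<T> lookup, int maxAttempts,
                      long initialBackoffMs, long maxBackoffMs) throws Exception {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return lookup.call();
            } catch (TooManyRequestsException e) {
                if (attempt >= maxAttempts) {
                    throw e; // retries exhausted: let the application decide
                }
                // Sleep a random time in [0, backoffMs] so clients spread out.
                Thread.sleep(ThreadLocalRandom.current().nextLong(backoffMs + 1));
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }
}
```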
