TooManyRequest was introduced to handle backpressure for ZooKeeper by
throttling the large number of concurrent topic loads during a broker cold
restart. That is why Pulsar has lookup throttling on both the client and
the server side: it slows down lookups, because a lookup ultimately
triggers topic loading on the server side. So when a client sees
TooManyRequest errors, it should retry the operation, and it will
eventually reconnect to the broker. TooManyRequest cannot harm the broker,
because the broker already has a safeguard to reject a flood of requests.

I am not sure what problem PR #6584
<https://github.com/apache/pulsar/pull/6584> tries to solve, but it should
not solve it by making TooManyRequest non-retriable. TooManyRequest is a
retriable error and the client should retry. It should definitely not
close the producer/consumer because of this error; otherwise it can bring
down the entire application that depends on the availability of the Pulsar
client entities.

Pulsar lookup is an operation similar to others such as connect, publish,
and subscribe. So I don't think it needs special treatment with a separate
timeout config, and we can avoid the complexity introduced in PR #11627
<https://github.com/apache/pulsar/pull/11627/>, which caches and depends
on the previously seen exception for lookup retry. In any case, removing
TooManyRequest from the non-retriable error list will simplify the client
behavior.

Thanks,
Rajan
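
To illustrate the retry behavior argued for above, here is a minimal
sketch of treating a throttling error as retriable with exponential
backoff. It is not the Pulsar client code: the class
TooManyRequestsException, the method retryOnTooManyRequests, and all
parameters are hypothetical names chosen for this example only.

```java
import java.util.concurrent.Callable;

public class LookupRetrySketch {

    /** Hypothetical stand-in for the broker's throttling error; not a Pulsar class. */
    static class TooManyRequestsException extends Exception {
        TooManyRequestsException(String message) {
            super(message);
        }
    }

    /**
     * Runs the operation and retries it, with exponential backoff, for as long as
     * it fails with TooManyRequestsException. Any other exception propagates
     * immediately. After maxAttempts failed attempts the last throttling error
     * is rethrown to the caller.
     */
    static <T> T retryOnTooManyRequests(Callable<T> operation,
                                        int maxAttempts,
                                        long initialBackoffMs,
                                        long maxBackoffMs) throws Exception {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (TooManyRequestsException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up, but leave the decision to the caller
                }
                Thread.sleep(backoffMs);
                // Double the wait each time, capped at maxBackoffMs.
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }
}
```

The point of the sketch is that the throttled operation is simply tried
again after a growing delay, and nothing about the producer or consumer
is torn down while the broker is shedding load.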

On Mon, Aug 9, 2021 at 12:54 AM Ivan Kelly <iv...@apache.org> wrote:

> > Suppose you have about a million topics and your Pulsar cluster goes down
> > (say, ZK down). ..many millions of producers and consumers are now
> > anxiously awaiting the cluster to comeback. .. fun experience for the
> > first broker that comes up. Don't ask me how I know, ref blame
> > ServerCnx.java#L429
> > <https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L429>.
> > The broker limit was added to get through a cold restart.
>
> Ok. Makes sense. The scenarios we've been seeing issues with have had
> modest numbers of topics.
>
> -Ivan
>
