The TooManyRequest error was originally introduced to handle backpressure on ZooKeeper by throttling the large number of concurrent topic loads that happen during a broker cold restart. For that reason, Pulsar throttles lookups on both the client and the server side, because a lookup ultimately triggers topic loading on the server side. So when a client sees a TooManyRequest error, it should retry the operation, and it will eventually reconnect to the broker. Retrying on TooManyRequest cannot harm the broker, because the broker already has a safeguard that rejects a flood of requests.

I am not sure what problem PR https://github.com/apache/pulsar/pull/6584 tries to solve, but it should not solve it by making TooManyRequest non-retriable. TooManyRequest is a retriable error and the client should retry. Also, the client should definitely not close the producer/consumer because of this error; otherwise it can bring down the entire application, which depends on the availability of the Pulsar client entities.

Pulsar lookup is an operation similar to others such as connect, publish, and subscribe. So I don't think it needs special treatment with a separate timeout config, and we can avoid the complexity introduced in PR #11627, which caches and depends on the previously seen exception for lookup retry. In any case, removing TooManyRequest from the non-retriable error list will simplify the client behavior and let us avoid the complexity of PR #11627 <https://github.com/apache/pulsar/pull/11627/>.

Thanks,
Rajan
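To make the "retriable" behavior concrete, here is a minimal sketch of retrying a throttled lookup with exponential backoff. It is not the Pulsar client's actual code: `LookupRetrySketch`, `retryOnThrottle`, the nested `TooManyRequestsException`, and the backoff values are all hypothetical names and numbers chosen for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

class LookupRetrySketch {
    /** Hypothetical stand-in for the client's TooManyRequest error. */
    static class TooManyRequestsException extends Exception {
        TooManyRequestsException(String message) {
            super(message);
        }
    }

    /**
     * Runs op, retrying only when it is throttled. The delay doubles after
     * each throttled attempt, capped at maxBackoffMs. Any other exception
     * propagates immediately; the throttling error propagates only once
     * maxAttempts is exhausted.
     */
    static <T> T retryOnThrottle(Callable<T> op, int maxAttempts,
                                 long initialBackoffMs, long maxBackoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (TooManyRequestsException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up, but leave the caller's producer/consumer alone
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated lookup: throttled twice, then succeeds.
        String broker = retryOnThrottle(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new TooManyRequestsException("lookup throttled");
            }
            return "broker-1";
        }, 5, 10, 100);
        System.out.println(broker + " after " + calls.get() + " attempts");
        // prints "broker-1 after 3 attempts"
    }
}
```

The point of the sketch is that the throttling error is handled entirely inside the retry loop, so the entity that issued the lookup is never closed because of it.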
On Mon, Aug 9, 2021 at 12:54 AM Ivan Kelly <iv...@apache.org> wrote:

> > Suppose you have about a million topics and your Pulsar cluster goes down
> > (say, ZK down). ..many millions of producers and consumers are now
> > anxiously awaiting the cluster to comeback. .. fun experience for the first
> > broker that comes up. Don't ask me how I know, ref blame
> > ServerCnx.java#L429
> > <https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L429>.
> > The broker limit was added to get through a cold restart.
>
> Ok. Makes sense. The scenarios we've been seeing issues with have had
> modest numbers of topics.
>
> -Ivan