Re: Lack of retries on TooManyRequests

2021-08-11 Thread Ivan Kelly
Thanks Rajan, will reply on the PR.
https://github.com/apache/pulsar/pull/11627/
On Wed, Aug 11, 2021 at 10:06 AM Rajan Dhabalia wrote:
> The history behind introducing the TooManyRequests error is to handle backpressure for ZooKeeper by throttling a large number of concurrent topics loading dur

Re: Lack of retries on TooManyRequests

2021-08-11 Thread Rajan Dhabalia
The history behind introducing the TooManyRequests error is to handle backpressure for ZooKeeper by throttling a large number of concurrent topics loading during a broker cold restart. Therefore, Pulsar has lookup throttling on both the client and server side, which slows down lookup because lookup ultimately

Re: Lack of retries on TooManyRequests

2021-08-09 Thread Ivan Kelly
> Suppose you have about a million topics and your Pulsar cluster goes down (say, ZK down). ... many millions of producers and consumers are now anxiously awaiting the cluster to come back. ... fun experience for the first broker that comes up. Don't ask me how I know, ref blame ServerCnx.ja

Re: Lack of retries on TooManyRequests

2021-08-06 Thread Joe F
Suppose you have about a million topics and your Pulsar cluster goes down (say, ZK down). ... many millions of producers and consumers are now anxiously awaiting the cluster to come back. ... fun experience for the first broker that comes up. Don't ask me how I know, ref blame ServerCnx.java#L429

Re: Lack of retries on TooManyRequests

2021-08-06 Thread Ivan Kelly
Inline
> In that scenario, should we block or fail fast and let the application decide, which is what we do today? Also, should we distinguish between the two scenarios, i.e. broker sends the error vs client internally throws the error?
While I agree that the client limit is different to the

Re: Lack of retries on TooManyRequests

2021-08-06 Thread Jerry Peng
Currently, there are two ways to get TooManyRequests errors in the client.
1) The client enforces a maximum number of pending lookups:
https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ClientCnx.java#L733
The max number can be set when cre
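To make the client-side case concrete, here is a minimal Python sketch of a cap on pending lookups that fails fast rather than queueing. It is not the Pulsar client code: `PendingLookupLimiter`, `begin_lookup`, `end_lookup`, and `TooManyRequestsError` are hypothetical names chosen for illustration.

```python
import threading


class TooManyRequestsError(Exception):
    """Hypothetical stand-in for the client's TooManyRequests error."""


class PendingLookupLimiter:
    """Reject new lookups once max_pending lookups are already in flight."""

    def __init__(self, max_pending):
        self._slots = threading.BoundedSemaphore(max_pending)

    def begin_lookup(self):
        # Non-blocking acquire: fail immediately instead of waiting for a slot.
        if not self._slots.acquire(blocking=False):
            raise TooManyRequestsError("too many pending lookups")

    def end_lookup(self):
        # Free the slot once the lookup has completed or failed.
        self._slots.release()
```

With a limit of 2, a third concurrent `begin_lookup()` raises until one of the earlier lookups calls `end_lookup()`. Whether the caller should then retry or surface the error is exactly the question this thread is discussing.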

Re: Lack of retries on TooManyRequests

2021-08-06 Thread Ivan Kelly
Another oddity I've seen is with partition metadata requests.
https://github.com/apache/pulsar/blob/ddb5fb0e062c2fe0967efce2a443a31f9cd12c07/pulsar-client/src/main/java/org/apache/pulsar/client/impl/PulsarClientImpl.java#L886
It takes a backoff, and it does retries, but the opTimeoutMs variable
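For reference, a generic retry loop with exponential backoff bounded by an overall operation timeout might look like the Python sketch below. It assumes the timeout caps the total time spent retrying, which is not necessarily what the linked Pulsar code does; `retry_with_backoff`, its parameters, and `TooManyRequestsError` are hypothetical names.

```python
import time


class TooManyRequestsError(Exception):
    """Hypothetical stand-in for a TooManyRequests response."""


def retry_with_backoff(op, op_timeout_s, initial_delay_s=0.1, max_delay_s=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call op() until it succeeds or op_timeout_s has elapsed in total."""
    deadline = clock() + op_timeout_s
    delay = initial_delay_s
    while True:
        try:
            return op()
        except TooManyRequestsError:
            remaining = deadline - clock()
            if remaining <= 0:
                # Overall budget exhausted: surface the last error.
                raise
            # Never sleep past the deadline.
            sleep(min(delay, remaining))
            # Double the delay for the next attempt, up to a ceiling.
            delay = min(delay * 2, max_delay_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; a production version would likely also add jitter so that many clients do not retry in lockstep.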

Lack of retries on TooManyRequests

2021-08-05 Thread Ivan Kelly
Hi folks, I'm currently digging into a customer issue we've seen where the retry logic isn't working well. Basically, they have a ton of producers and a load of partitions, and when they all connect at the same time, they bury the brokers for a few minutes. I'm looking at this from the consumer po