Thanks Rajan, will reply on the PR.
https://github.com/apache/pulsar/pull/11627/
On Wed, Aug 11, 2021 at 10:06 AM Rajan Dhabalia wrote:
The history behind introducing the TooManyRequests error is to handle
backpressure for ZooKeeper by throttling a large number of concurrent
topic loads during a broker cold restart. Therefore, Pulsar has lookup
throttling at both the client and the server side that slows down
lookups, because a lookup ultimately ends up hitting ZooKeeper to load
the topic metadata.
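For reference, the server-side part of that throttle is controlled from
broker.conf; if I'm remembering the names right (the values below are
just the defaults I recall, so please double-check against
ServiceConfiguration):

    # broker.conf (names and defaults from memory, double-check before relying on them)
    maxConcurrentLookupRequest=50000
    maxConcurrentTopicLoadRequest=5000

The client-side counterparts are set on the ClientBuilder when the
client is created (more on that below).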
Suppose you have about a million topics and your Pulsar cluster goes
down (say, ZK is down). Many millions of producers and consumers are now
anxiously waiting for the cluster to come back, which makes for a fun
experience for the first broker that comes up. Don't ask me how I know
(ref: git blame ServerCnx.java#L429).
Replying inline:
> In that scenario, should we block, or fail fast and let the
> application decide (which is what we do today)? Also, should we
> distinguish between the two scenarios, i.e. the broker sends the error
> vs. the client internally throws the error?
While I agree that the client limit is different to the broker-side
one, both currently surface to the application as the same
TooManyRequests error.
Currently, there are two ways to get the TooManyRequests error in the
client:
1) The client enforces a maximum number of pending lookups:
https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ClientCnx.java#L733
The max number can be set when creating the client (see the builder
sketch below).
2) The broker enforces its own limits on concurrent lookups and topic
loads and replies with TooManyRequests when those are exceeded.
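Something like this when building the client, if I'm remembering the
API right (method names from memory; the service URL and the values are
just placeholders):

    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.api.PulsarClientException;

    public class LookupLimitsExample {
        public static void main(String[] args) throws PulsarClientException {
            // Sketch only: client-side lookup throttling knobs.
            // maxLookupRequests caps how many lookup/partition-metadata requests
            // may be pending per connection; maxConcurrentLookupRequests caps how
            // many are allowed in flight at once.
            PulsarClient client = PulsarClient.builder()
                    .serviceUrl("pulsar://localhost:6650") // placeholder URL
                    .maxConcurrentLookupRequests(5000)
                    .maxLookupRequests(50000)
                    .build();

            client.close();
        }
    }

Once the pending-lookup limit is hit, the client fails the call with a
TooManyRequests error rather than queueing more work.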
Another strangeness I've seen is with partition metadata requests:
https://github.com/apache/pulsar/blob/ddb5fb0e062c2fe0967efce2a443a31f9cd12c07/pulsar-client/src/main/java/org/apache/pulsar/client/impl/PulsarClientImpl.java#L886
It takes a backoff and it does retries, but the opTimeoutMs variable ...
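To make the shape of that code easier to talk about, here is a
stand-alone paraphrase of the pattern (this is not the Pulsar code; the
class, the fake attemptOnce() and all the numbers are made up for
illustration). In this shape, the shared remaining-time budget is only
decremented by the backoff delay when an attempt is rescheduled:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicLong;

    // Stand-alone sketch of a backoff-with-time-budget retry loop.
    public class BackoffBudgetSketch {
        private static final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        private static final AtomicInteger attempts = new AtomicInteger();

        // Fake operation: fail the first three attempts, then succeed.
        static CompletableFuture<String> attemptOnce() {
            CompletableFuture<String> f = new CompletableFuture<>();
            if (attempts.incrementAndGet() <= 3) {
                f.completeExceptionally(new RuntimeException("TooManyRequests (simulated)"));
            } else {
                f.complete("metadata");
            }
            return f;
        }

        static void getWithRetry(long backoffMs, AtomicLong remainingMs,
                                 CompletableFuture<String> result) {
            attemptOnce().thenAccept(result::complete).exceptionally(e -> {
                long nextDelay = Math.min(backoffMs, remainingMs.get());
                if (nextDelay <= 0) {
                    result.completeExceptionally(e); // budget exhausted: give up
                    return null;
                }
                scheduler.schedule(() -> {
                    remainingMs.addAndGet(-nextDelay);      // only the delay is charged
                    getWithRetry(backoffMs * 2, remainingMs, result); // grow the backoff
                }, nextDelay, TimeUnit.MILLISECONDS);
                return null;
            });
        }

        public static void main(String[] args) throws Exception {
            CompletableFuture<String> result = new CompletableFuture<>();
            getWithRetry(100, new AtomicLong(30_000), result); // ~30s overall budget
            System.out.println(result.get());
            scheduler.shutdown();
        }
    }

Note that in this sketch the time an individual attempt spends before
failing is never charged against the budget; only the scheduled delays
are.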
Hi folks,
I'm currently digging into a customer issue we've seen where the retry
logic isn't working well. Basically, they have a ton of producers and
a load of partitions, and when they all connect at the same time, they
bury the brokers for a few minutes. I'm looking at this from the
consumer point of view.