[I] [Bug] Pulsar-function BK read-retry-loop with no backoff causes CPU exhaustion [pulsar]

via GitHub Thu, 28 Aug 2025 01:08:21 -0700


vroyer opened a new issue, #24677:
URL: https://github.com/apache/pulsar/issues/24677


   ### Search before reporting
   
   - [x] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Read release policy
   
   - [x] I understand that [unsupported 
versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions)
 don't get bug fixes. I will attempt to reproduce the issue on a supported 
version of Pulsar client and Pulsar broker.
   
   
   ### User environment
   
   We are running a Helm-deployed Pulsar cluster on an on-prem Kubernetes 
cluster (lunastreaming 4.1.3.18, 
https://github.com/datastax/pulsar/tree/ls31_4.18 ) with 3 bookies and 3 brokers.
   
   
   
   
   ### Issue Description
   
   When a bookie is unavailable because of a k8s hardware issue (1 out of 3 
bookies, with quorum=2), the pulsar-function tries to read some metadata from 
BookKeeper, and the unavailable bookie causes a very CPU-expensive read-retry 
loop. As a result, the pulsar-function liveness health check fails and k8s 
kills the pulsar-function pod. Meanwhile, some Pulsar connector pods cannot 
start properly.
   
   ### Error messages
   
   ```text
   
   ```
   
   ### Reproducing the issue
   
   This read-retry loop with no backoff for an unavailable bookie seems to be a 
BookKeeper issue (BK version 4.16.7), because there is not even a BK read 
backoff setting to mitigate this kind of situation. Retrying immediately is 
useless when a read from a bookie fails with this error: "Cannot resolve 
bookieId glpdlskub016:3181, bookie does not exist or it is not running"
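
To illustrate the kind of backoff that is missing from the read-retry loop described above, here is a minimal sketch of a capped exponential backoff with jitter. It is not BookKeeper or Pulsar code: the class `BackoffRetry`, its methods, and all parameter names are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for illustration only; not part of BookKeeper or Pulsar. */
final class BackoffRetry {

    private BackoffRetry() {}

    /** Upper bound of the delay before retry `attempt` (0-based): base * 2^attempt, capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        // Double step by step and stop once the cap is reached, so the value stays bounded.
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Calls `op` up to maxAttempts times, sleeping a random time in [0, delay] between failures. */
    static <T> T callWithBackoff(Callable<T> op, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1 || baseMillis < 0 || maxMillis < 0) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 and delays must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and propagate.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts - 1) {
                    break; // no sleep after the final failed attempt
                }
                long cap = delayMillis(attempt, baseMillis, maxMillis);
                // Jitter spreads retries out so that many readers do not retry in lockstep.
                Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
            }
        }
        throw last;
    }
}
```

With this sketch, a caller would wrap the failing read, for example `BackoffRetry.callWithBackoff(() -> readOnce(), 8, 100, 5_000)`, where `readOnce` is a placeholder for the actual read. The thread then sleeps between attempts instead of spinning on a bookie that cannot be resolved.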
   
   ### Additional information
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!

