BewareMyPower commented on issue #24697: URL: https://github.com/apache/pulsar/issues/24697#issuecomment-3292412656
I got the heap dump privately. Unfortunately, it was captured after many ledgers had been deleted due to retention, so many subscriptions shared the same read position and it's hard to get useful info from it. P.S. I had wanted to check the complicated read path of `ledger.asyncReadEntries(op)` from the heap dump.

Anyway, https://github.com/apache/pulsar/pull/24741 should help work around this issue, but I'll do more analysis later. After that PR, I will sort out the lifetime of `OpReadEntry`.

The issue is clearly that the `ReadEntriesCtx` has been recycled twice. But it's unlikely to be an issue within the managed-ledger module:
- `pendingReadOps` is 0 and `waitingReadOp` is 0, so an `OpReadEntry` has been passed to `ManagedLedgerImpl#asyncReadEntries` and eventually to `EntryCache#asyncReadEntry`
- `havePendingRead` is true, which means the dispatcher has started a read operation but no completion callback has finished, such as `internalReadEntriesFailed`, which sets it to false immediately.

The exception thrown by `recycle()` means `internalReadEntriesComplete` was called but `havePendingRead` was not set to false:

```java
public synchronized void internalReadEntriesComplete(final List<Entry> entries, Object obj) {
    ReadEntriesCtx readEntriesCtx = (ReadEntriesCtx) obj;
    Consumer readConsumer = readEntriesCtx.getConsumer();
    long epoch = readEntriesCtx.getEpoch();
    readEntriesCtx.recycle();
    havePendingRead = false; // it didn't reach here
    // ... (rest of the method omitted)
}
```

It means `internalReadEntriesComplete` was called only once and then `recycle()` failed. The most likely reason is that `obj` refers to an already recycled `ReadEntriesCtx`, but the root cause is still unknown.
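To illustrate the failure mode described above, here is a minimal sketch of how a second `recycle()` can leave the pending-read flag stuck at true. `PooledCtx`, `DispatcherSketch` and `DoubleRecycleSketch` are hypothetical stand-ins, not Pulsar classes, and the sketch assumes that the pool rejects a second recycle with an exception; it does not try to reproduce whatever recycled the context first.

```java
// Hypothetical stand-in for a pooled context object; NOT Pulsar's ReadEntriesCtx.
final class PooledCtx {
    private boolean recycled = false;

    void recycle() {
        // Assumption: the pool guards against a second recycle by throwing.
        if (recycled) {
            throw new IllegalStateException("recycled already");
        }
        recycled = true;
    }
}

// Hypothetical stand-in for the dispatcher; it only models the flag ordering.
final class DispatcherSketch {
    boolean havePendingRead = false;

    synchronized void startRead() {
        havePendingRead = true;
    }

    synchronized void readEntriesComplete(PooledCtx ctx) {
        ctx.recycle();           // throws if ctx was already recycled elsewhere
        havePendingRead = false; // never reached when recycle() throws
    }
}

class DoubleRecycleSketch {
    // Runs one read/complete cycle and returns the final value of the flag.
    static boolean flagAfterCompletion(boolean recycledBeforehand) {
        DispatcherSketch dispatcher = new DispatcherSketch();
        PooledCtx ctx = new PooledCtx();
        if (recycledBeforehand) {
            ctx.recycle(); // some other path recycled it first (cause unknown)
        }
        dispatcher.startRead();
        try {
            dispatcher.readEntriesComplete(ctx);
        } catch (IllegalStateException e) {
            // In the sketch, this plays the role of the executor catching
            // and logging the exception from the failed task.
        }
        return dispatcher.havePendingRead;
    }

    public static void main(String[] args) {
        System.out.println("single recycle -> havePendingRead = " + flagAfterCompletion(false));
        System.out.println("double recycle -> havePendingRead = " + flagAfterCompletion(true));
    }
}
```

In the sketch, the flag is cleared after a normal completion and stays true when the context was already recycled, because the exception exits the method before the assignment. That matches the observed state where a read appears to be pending forever.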
I noticed that the problematic topic's executor has many failed tasks:

<img width="1088" height="231" alt="Image" src="https://github.com/user-attachments/assets/74b2bcc9-0bfd-4f5e-9fe7-c1258fa7e20c" />

<img width="702" height="340" alt="Image" src="https://github.com/user-attachments/assets/e0475e0f-99e8-4dde-88a7-135cad6ce61a" />

There should be many `Error while running task:` error logs from `broker-topics-<xxx>-16-0` earlier. @dragonls, could you check the historical logs again?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
