BewareMyPower commented on issue #24697:
URL: https://github.com/apache/pulsar/issues/24697#issuecomment-3292412656

   I got the heap dump privately. Unfortunately, the heap dump was taken after
many ledgers had been deleted due to retention, so many subscriptions' read
positions were identical, which makes it hard to get useful information from
the heap dump.
   
   P.S. I had originally wanted to use the heap dump to inspect the complicated
read path of `ledger.asyncReadEntries(op)`.
   
   Anyway, https://github.com/apache/pulsar/pull/24741 should help work around
this issue, but I'll do more analysis later.
   
   After that PR, I will sort out the lifetime of `OpReadEntry`. The issue is
clearly that the `ReadEntriesCtx` was recycled twice, but it is unlikely to be
an issue within the managed-ledger module:
   - `pendingReadOps` is 0 and `waitingReadOp` is 0, so an `OpReadEntry` was
eventually passed to `ManagedLedgerImpl#asyncReadEntries` and
`EntryCache#asyncReadEntry`.
   - `havePendingRead` is true, which means the dispatcher started a read
operation but never received a completion such as `internalReadEntriesFailed`,
which would have set it to false immediately.
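
   A minimal, hypothetical model of the reasoning in these bullets, assuming the dispatcher skips a new read while `havePendingRead` is true and that both completion paths reset the flag. The class `ReadGuardModel` and its method bodies are illustrative only, not Pulsar's actual dispatcher code. It shows why a `true` flag with nothing pending or waiting points to a read that was started but whose completion never reset the flag:

   ```java
   // Hypothetical, simplified model of a dispatcher's pending-read guard.
   // Not Pulsar's implementation; names and bodies are illustrative only.
   class ReadGuardModel {
       private boolean havePendingRead = false;
       private int issuedReads = 0; // reads actually sent to the (imaginary) ledger

       // Assumption: a new read is skipped while a previous one is still pending.
       synchronized boolean readMoreEntries() {
           if (havePendingRead) {
               return false; // skipped: a read is already in flight
           }
           havePendingRead = true;
           issuedReads++;
           return true;
       }

       // Both completion paths must reset the flag, otherwise no further reads happen.
       synchronized void onReadComplete() {
           havePendingRead = false;
       }

       synchronized void onReadFailed() {
           havePendingRead = false;
       }

       synchronized boolean hasPendingRead() {
           return havePendingRead;
       }

       synchronized int issuedReads() {
           return issuedReads;
       }

       public static void main(String[] args) {
           ReadGuardModel dispatcher = new ReadGuardModel();
           System.out.println(dispatcher.readMoreEntries()); // true: read issued
           System.out.println(dispatcher.readMoreEntries()); // false: still pending
           dispatcher.onReadComplete();
           System.out.println(dispatcher.readMoreEntries()); // true: flag was reset
       }
   }
   ```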
   
   The exception thrown by `recycle()` means `internalReadEntriesComplete` was
called but `havePendingRead` was never set to false:
   
   ```java
       public synchronized void internalReadEntriesComplete(final List<Entry> entries, Object obj) {
           ReadEntriesCtx readEntriesCtx = (ReadEntriesCtx) obj;
           Consumer readConsumer = readEntriesCtx.getConsumer();
           long epoch = readEntriesCtx.getEpoch();
           readEntriesCtx.recycle(); // the exception was thrown here
           havePendingRead = false; // it didn't reach here
           // ... (rest of the method omitted)
   ```
   
   This means `internalReadEntriesComplete` was called only once, and
`recycle()` failed during that call.
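
   The ordering in the snippet is what leaves the flag stuck. Below is a hypothetical reproduction that assumes only that recycling an already-recycled context throws `IllegalStateException`. `RecycleOrderingDemo` and its nested `Ctx` are illustrative stand-ins, not Pulsar classes:

   ```java
   // Hypothetical model, not Pulsar code: if recycle() throws,
   // the flag reset placed after it is never executed.
   class RecycleOrderingDemo {
       // Minimal stand-in for a pooled context object (assumption: recycling
       // an already-recycled object throws IllegalStateException).
       static final class Ctx {
           private boolean recycled = false;

           void recycle() {
               if (recycled) {
                   throw new IllegalStateException("ctx recycled twice");
               }
               recycled = true;
           }
       }

       boolean havePendingRead = true; // a read is in flight

       // Same ordering as the quoted snippet: recycle first, reset the flag second.
       synchronized void internalReadEntriesComplete(Object obj) {
           Ctx ctx = (Ctx) obj;
           ctx.recycle();           // throws if ctx was already recycled
           havePendingRead = false; // skipped when recycle() throws
       }

       public static void main(String[] args) {
           RecycleOrderingDemo dispatcher = new RecycleOrderingDemo();
           Ctx ctx = new Ctx();
           ctx.recycle(); // recycled elsewhere earlier (the unknown cause)
           try {
               dispatcher.internalReadEntriesComplete(ctx);
           } catch (IllegalStateException e) {
               System.out.println("recycle failed: " + e.getMessage());
           }
           // prints "havePendingRead = true": the flag is stuck
           System.out.println("havePendingRead = " + dispatcher.havePendingRead);
       }
   }
   ```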
   
   The most likely explanation is that `obj` refers to a `ReadEntriesCtx` that
had already been recycled, but the reason for that is still unknown.
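
   While the cause is unknown, one common debugging technique for double-recycle bugs is to remember where an object was first recycled, so that the second failure carries both call sites. A minimal sketch; `TrackedCtx`, `recycle` and `reuse` are hypothetical and not part of Pulsar or Netty:

   ```java
   // Hypothetical debugging aid: remember where an object was first recycled
   // so that a second recycle() reports both call sites.
   class TrackedCtx {
       private Throwable firstRecycleSite; // null while the object is live

       synchronized void recycle() {
           if (firstRecycleSite != null) {
               // The cause's stack trace points at the first (earlier) recycle call.
               throw new IllegalStateException("recycled twice", firstRecycleSite);
           }
           firstRecycleSite = new Throwable("first recycle site");
       }

       // Would be called when a pool hands the object out again.
       synchronized void reuse() {
           firstRecycleSite = null;
       }

       public static void main(String[] args) {
           TrackedCtx ctx = new TrackedCtx();
           ctx.recycle();
           try {
               ctx.recycle();
           } catch (IllegalStateException e) {
               e.printStackTrace(); // shows the second recycle, caused by the first
           }
       }
   }
   ```

   This only detects a second recycle that happens before the instance is handed out again (`reuse()` here).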
   
   I noticed that many tasks have failed on the problematic topic's executor:
   
   <img width="1088" height="231" alt="Image" src="https://github.com/user-attachments/assets/74b2bcc9-0bfd-4f5e-9fe7-c1258fa7e20c" />
   
   <img width="702" height="340" alt="Image" src="https://github.com/user-attachments/assets/e0475e0f-99e8-4dde-88a7-135cad6ce61a" />
   
   There should have been many `Error while running task:` error logs from
`broker-topics-<xxx>-16-0` earlier. @dragonls, could you check the historical
logs again?
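
   A hedged illustration of how such failures can go unnoticed: assuming the topic executor catches and logs task exceptions (which is what an `Error while running task:` message suggests), later tasks keep running while whatever the failed task had not yet done is simply lost. `FailureCountingExecutor` is hypothetical and is not Pulsar's or BookKeeper's executor; the log prefix only reuses the text mentioned above:

   ```java
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.atomic.AtomicInteger;

   // Hypothetical wrapper: a single-threaded executor that logs and counts
   // task failures instead of letting them disappear.
   class FailureCountingExecutor {
       private final ExecutorService delegate = Executors.newSingleThreadExecutor();
       private final AtomicInteger failedTasks = new AtomicInteger();

       void execute(Runnable task) {
           delegate.execute(() -> {
               try {
                   task.run();
               } catch (Throwable t) {
                   // The worker thread survives; the rest of this task's work is lost.
                   failedTasks.incrementAndGet();
                   System.err.println("Error while running task: " + t);
               }
           });
       }

       int failedTasks() {
           return failedTasks.get();
       }

       void shutdownAndWait() throws InterruptedException {
           delegate.shutdown();
           delegate.awaitTermination(10, TimeUnit.SECONDS);
       }

       public static void main(String[] args) throws InterruptedException {
           FailureCountingExecutor executor = new FailureCountingExecutor();
           executor.execute(() -> { throw new IllegalStateException("boom"); });
           executor.execute(() -> System.out.println("later tasks still run"));
           executor.shutdownAndWait();
           System.out.println("failed tasks: " + executor.failedTasks()); // failed tasks: 1
       }
   }
   ```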
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
