ankitsultana commented on issue #12390: URL: https://github.com/apache/pinot/issues/12390#issuecomment-1945473584
Took a deep dive today and found the root cause. Here's the sequence of events that triggers this:

* We have a consuming segment on a server. At commit time, it receives a DISCARD from the controller.
* There is a long GC pause.
* Instead of receiving a CONSUMING to ONLINE transition, the segment receives an OFFLINE to ONLINE transition.
* Since the table data manager (TDM) already has a Segment Data Manager (SDM) for the consuming segment, we end up skipping the `addSegment` call and mark the transition as succeeded anyway: [RealtimeTableDataManager.java#L387](https://github.com/apache/pinot/blob/38d86b0a6432e9a7249f1692ace36b6e34171b0a/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java#L387)

So the `RealtimeSegmentDataManager` is never destroyed and the `_partitionGroupConsumerSemaphore` is never released. As more and more segments for this partition are committed on other servers, we receive many OFFLINE to CONSUMING transitions, which pile up waiting on the semaphore. When there's another big GC pause and, say, Helix reconnects, the `HelixTaskExecutor` is shut down, which interrupts all the pending semaphore acquire calls and produces the original error message above (note `onBecomeConsumingFromOffline` in the stack trace).

@Jackie-Jiang: this is an additional case where `RealtimeTableDataManager#addSegment` can be called. Do you have any recommendations on how to handle this?
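To make the leak concrete, here is a minimal, self-contained sketch of the failure mode. The class and field names (`SegmentDataManager`, `_partitionGroupConsumerSemaphore`, the skip-if-present check in the OFFLINE to ONLINE handler) mirror Pinot's, but the implementation is hypothetical and heavily simplified — it only models the permit accounting, not the real segment lifecycle:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of the semaphore leak; NOT Pinot's actual implementation.
public class SemaphoreLeakSketch {
  static class SegmentDataManager {
    final Semaphore _partitionGroupConsumerSemaphore;

    SegmentDataManager(Semaphore semaphore) {
      _partitionGroupConsumerSemaphore = semaphore;
    }

    void destroy() {
      // The permit is returned only when the consuming SDM is destroyed.
      _partitionGroupConsumerSemaphore.release();
    }
  }

  final Map<String, SegmentDataManager> _segmentDataManagers = new ConcurrentHashMap<>();
  // One permit per partition group: only one consuming segment at a time.
  final Semaphore _semaphore = new Semaphore(1);

  void startConsuming(String segmentName) throws InterruptedException {
    _semaphore.acquire(); // permit taken by the consuming segment
    _segmentDataManagers.put(segmentName, new SegmentDataManager(_semaphore));
  }

  // OFFLINE -> ONLINE transition: if an SDM already exists for the segment,
  // addSegment is skipped and the transition is marked succeeded, so the
  // consuming SDM is never destroyed and the permit is never released.
  void onOfflineToOnline(String segmentName) {
    if (_segmentDataManagers.containsKey(segmentName)) {
      return; // skipped: "succeeded", but the semaphore is still held
    }
    // ... would otherwise download the committed segment and destroy the old SDM
  }

  public static void main(String[] args) throws InterruptedException {
    SemaphoreLeakSketch tdm = new SemaphoreLeakSketch();
    tdm.startConsuming("mytable__0__1__20240101T0000Z");
    tdm.onOfflineToOnline("mytable__0__1__20240101T0000Z");
    // The permit was never released, so every subsequent
    // OFFLINE -> CONSUMING transition for this partition blocks here.
    System.out.println("available permits: " + tdm._semaphore.availablePermits());
  }
}
```

Running the sketch prints `available permits: 0`: the next consuming segment for this partition would block in `acquire()` indefinitely, and shutting down the `HelixTaskExecutor` would interrupt that blocked call, matching the stack trace above.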
