[
https://issues.apache.org/jira/browse/AMQ-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Christopher L. Shannon reassigned AMQ-9747:
-------------------------------------------
Assignee: Christopher L. Shannon
> Kahadb checkpoint runner thread dies without catching exception
> ---------------------------------------------------------------
>
> Key: AMQ-9747
> URL: https://issues.apache.org/jira/browse/AMQ-9747
> Project: ActiveMQ Classic
> Issue Type: Bug
> Components: KahaDB
> Affects Versions: 6.1.4, 6.1.6, 5.18.7, 6.1.7
> Reporter: ritesh adval
> Assignee: Christopher L. Shannon
> Priority: Major
> Attachments: image-2025-07-21-11-14-20-510.png
>
>
> This bug affects all versions of ActiveMQ; I have added a few, starting with
> 5.18.4. It is a critical bug: the checkpoint runner thread is SILENTLY
> KILLED (the exception is swallowed), so the KahaDB journal files KEEP
> GROWING until you restart ActiveMQ. The bug is in the handling of the IO
> exception (see the screenshot below): the catch block calls
> brokerService.handleIOException().
> !image-2025-07-21-11-14-20-510.png!
> If you look at the default IO exception handler, which is the one used, it
> throws SuppressReplyException at
> [https://github.com/apache/activemq/blob/main/activemq-broker/src/main/java/org/apache/activemq/util/DefaultIOExceptionHandler.java#L165]
> and if stopStartConnectors is true (which is the case when you use
> LeaseLockerIOExceptionHandler) it also throws SuppressReplyException at
> [https://github.com/apache/activemq/blob/main/activemq-broker/src/main/java/org/apache/activemq/util/DefaultIOExceptionHandler.java#L155]
> Because of this, the checkpoint runner thread shown in the screenshot above
> silently dies even though the broker is still running.
> The checkpoint runner should not die like this. We had a situation where we
> were using EFS as the storage for KahaDB, and a blip in the connection to
> EFS caused an IO exception to be thrown from page.flush in MessageDatabase
> (we were using LeaseLockerIOExceptionHandler). That triggered the
> DefaultIOExceptionHandler logic to stop and start the connectors and throw
> SuppressReplyException as mentioned above, which silently killed the
> checkpoint runner while the broker was still running.
>
> We hit this in production. Our fix was to extend
> LeaseLockerIOExceptionHandler, catch the exception thrown from the
> handle(IOException ex) method, log it at warn level, and not propagate it up
> to the checkpoint runner thread. This is only a temporary fix, though. I am
> not even sure whether the checkpoint runner needs to use
> DefaultIOExceptionHandler, but it should not die silently.
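
The failure mode and the workaround described in the report can be illustrated with a small, self-contained sketch. It does not use any ActiveMQ classes: IoHandler, ThrowingHandler, LoggingHandler and runCheckpoints are hypothetical stand-ins, and IllegalStateException stands in for the unchecked SuppressReplyException. It only shows how an unchecked exception thrown by a handler ends a periodic worker loop, and how a wrapper that catches and logs it keeps the loop alive.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class CheckpointSketch {

    /** Hypothetical stand-in for the broker's IO exception handler. */
    interface IoHandler {
        void handle(IOException e);
    }

    /** Stand-in for a handler that rethrows an unchecked exception. */
    static class ThrowingHandler implements IoHandler {
        @Override
        public void handle(IOException e) {
            // IllegalStateException plays the role of SuppressReplyException here.
            throw new IllegalStateException("suppress reply: " + e.getMessage(), e);
        }
    }

    /** Workaround from the report: catch what the delegate throws and log it. */
    static class LoggingHandler implements IoHandler {
        private final IoHandler delegate;

        LoggingHandler(IoHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void handle(IOException e) {
            try {
                delegate.handle(e);
            } catch (RuntimeException thrown) {
                System.err.println("WARN: IO exception handler threw: " + thrown);
            }
        }
    }

    /**
     * Runs up to {@code rounds} simulated checkpoints on a worker thread.
     * The first round fails with an IOException that is passed to the handler.
     * Returns how many rounds finished before the thread ended.
     */
    static int runCheckpoints(IoHandler handler, int rounds) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        Thread worker = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                try {
                    if (i == 0) {
                        throw new IOException("simulated storage blip");
                    }
                } catch (IOException ioe) {
                    // An unchecked exception thrown here escapes the loop
                    // and terminates the worker thread.
                    handler.handle(ioe);
                }
                completed.incrementAndGet();
            }
        }, "checkpoint-sketch");
        // Suppress the default stack trace so the demo output stays short.
        worker.setUncaughtExceptionHandler((t, ex) -> { });
        worker.start();
        worker.join();
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runCheckpoints(new ThrowingHandler(), 3));
        System.out.println(runCheckpoints(new LoggingHandler(new ThrowingHandler()), 3));
    }
}
```

With the throwing handler the worker ends during the first round, so no rounds complete; with the logging wrapper all three rounds complete. Swallowing the exception is the same stopgap the reporter describes, not a proposed fix for the broker itself.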