[ 
https://issues.apache.org/jira/browse/AMQ-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher L. Shannon updated AMQ-9747:
----------------------------------------
    Fix Version/s: 6.2.0
                   5.19.1
                   6.1.8

> Kahadb checkpoint runner thread dies without catching exception
> ---------------------------------------------------------------
>
>                 Key: AMQ-9747
>                 URL: https://issues.apache.org/jira/browse/AMQ-9747
>             Project: ActiveMQ Classic
>          Issue Type: Bug
>          Components: KahaDB
>    Affects Versions: 6.1.4, 6.1.6, 5.18.7, 6.1.7
>            Reporter: ritesh adval
>            Assignee: Christopher L. Shannon
>            Priority: Major
>             Fix For: 6.2.0, 5.19.1, 6.1.8
>
>         Attachments: image-2025-07-21-11-14-20-510.png
>
>
> This bug affects all versions of ActiveMQ; I have added a few, starting with
> 5.18.4. This is a critical bug: it causes the checkpoint runner thread to be
> killed silently (the exception is swallowed), so the KahaDB journal files
> keep growing until ActiveMQ is restarted. The bug is in the handling of the
> IO exception (see the screenshot below): the catch block calls
> brokerService.handleIOException().
> !image-2025-07-21-11-14-20-510.png!
> If you look at the default IO exception handler, which is the one used, it
> throws SuppressReplyException at
> [https://github.com/apache/activemq/blob/main/activemq-broker/src/main/java/org/apache/activemq/util/DefaultIOExceptionHandler.java#L165]
> and, if stopStartConnectors is true (which is the case when you use
> LeaseLockerIOExceptionHandler), it also throws SuppressReplyException at
> [https://github.com/apache/activemq/blob/main/activemq-broker/src/main/java/org/apache/activemq/util/DefaultIOExceptionHandler.java#L155].
> Because of this, the checkpoint runner thread shown in the screenshot above
> dies silently, even though the broker is still running.
> The checkpoint runner should not die like this. We had a situation where we
> were using EFS as the storage for KahaDB. Due to a blip in the connection to
> EFS, an IO exception was thrown in page.flush in MessageDatabase (we were
> using LeaseLockerIOExceptionHandler). That caused the
> DefaultIOExceptionHandler logic to stop and start the connectors and throw
> SuppressReplyException, as mentioned above, which silently killed the
> checkpoint runner while the broker was still running.
>  
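
The failure mode described above can be reproduced in miniature with plain
JDK threads. This is only a sketch, not ActiveMQ code: HandlerException,
checkpoint() and runLoop() are hypothetical stand-ins for the exception thrown
by the IO exception handler and for the periodic checkpoint loop, and the
no-op uncaught-exception handler is an assumption used to mimic the thread
ending without any log output.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SilentRunnerDeath {
    // Hypothetical stand-in for the unchecked exception thrown by the handler.
    static class HandlerException extends RuntimeException {
        HandlerException(String msg) { super(msg); }
    }

    static final AtomicInteger checkpoints = new AtomicInteger();

    // Hypothetical stand-in for one checkpoint pass; it fails on the third call.
    static void checkpoint() {
        if (checkpoints.incrementAndGet() == 3) {
            throw new HandlerException("simulated storage blip");
        }
    }

    // Runs the loop on a worker thread and returns how many calls were made
    // before the thread ended.
    static int runLoop(int iterations) {
        checkpoints.set(0);
        Thread runner = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                checkpoint(); // an unchecked exception here ends the thread
            }
        }, "checkpoint-runner-demo");
        // Assumption: swallow the exception so that nothing is printed.
        runner.setUncaughtExceptionHandler((t, e) -> { });
        runner.start();
        try {
            runner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return checkpoints.get();
    }

    public static void main(String[] args) {
        int made = runLoop(10);
        // The main thread keeps running even though the worker has ended.
        System.out.println("loop stopped after call " + made + " of 10");
    }
}
```

The main thread carries on after the worker has gone, which mirrors the
broker staying up while checkpoints are no longer performed.
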
> We hit this in production. Our fix was to extend
> LeaseLockerIOExceptionHandler, catch the exception thrown from the
> handle(IOException ex) method, and log it at WARN level instead of
> propagating it up to the checkpoint runner thread. This is only a temporary
> fix, though. I am not even sure whether CheckpointRunner needs to use
> DefaultIOExceptionHandler. Either way, it should not die silently.
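
The shape of the workaround described above might look roughly like the
following. This is a self-contained sketch using only the JDK: ThrowingHandler
is a hypothetical stand-in for the real handler class (it is not the ActiveMQ
API), GuardedHandler and runPasses() are illustrative names, and
IllegalStateException stands in for SuppressReplyException.

```java
import java.io.IOException;
import java.util.logging.Logger;

class GuardedHandlerSketch {
    // Hypothetical stand-in for a handler whose handle() throws an unchecked
    // exception after reacting to the IO error.
    static class ThrowingHandler {
        public void handle(IOException cause) {
            throw new IllegalStateException("stop/start connectors triggered", cause);
        }
    }

    // Sketch of the workaround: subclass the handler, catch whatever handle()
    // throws, and log it as a warning instead of letting it propagate.
    static class GuardedHandler extends ThrowingHandler {
        private static final Logger LOG = Logger.getLogger("GuardedHandler");
        int suppressed = 0;

        @Override
        public void handle(IOException cause) {
            try {
                super.handle(cause);
            } catch (RuntimeException e) {
                suppressed++;
                LOG.warning("IO exception handler threw, not propagating: " + e);
            }
        }
    }

    // Simulates a periodic loop in which every pass hits an IOException and
    // hands it to the handler; returns the number of passes that completed.
    static int runPasses(ThrowingHandler handler, int passes) {
        int completed = 0;
        for (int i = 0; i < passes; i++) {
            try {
                throw new IOException("simulated flush failure");
            } catch (IOException e) {
                handler.handle(e);
            }
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        GuardedHandler guarded = new GuardedHandler();
        System.out.println("passes completed: " + runPasses(guarded, 5));
        System.out.println("exceptions suppressed: " + guarded.suppressed);
    }
}
```

With the guarded handler the loop finishes all of its passes; with the plain
ThrowingHandler the first pass would end it. Swallowing the exception also
hides it from any caller that relies on it, which may be one reason the
reporter describes this as a temporary fix.
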



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
