[ 
https://issues.apache.org/jira/browse/ARTEMIS-5861?focusedWorklogId=1002750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-1002750
 ]

ASF GitHub Bot logged work on ARTEMIS-5861:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Jan/26 22:57
            Start Date: 30/Jan/26 22:57
    Worklog Time Spent: 10m 
      Work Description: jbertram commented on PR #6202:
URL: https://github.com/apache/artemis/pull/6202#issuecomment-3826236633

   > Gary's idea seems reasonable...
   
   Agreed. I implemented his suggestion.
   
   > ...I also don't actually know that we really want this set to 0 all the 
time in the test suite?
   
   I don't know if we do or not. There's not much detail on 
https://issues.apache.org/jira/browse/ARTEMIS-2428 where this change originated.
   
   > Until yesterday the related bit previously waited for as long as needed 
during the entire test suite...
   
   The problem, as outlined on the Jira, is that the call to 
`awaitUninterruptibly()` can apparently hang forever so a timeout is needed for 
these calls. Rather than create and document a new parameter I simply re-used 
the existing, but undocumented, `shutdownTimeout` parameter. 
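   
   To illustrate the difference a bounded wait makes, here is a minimal sketch using only JDK classes. It is not the code from the PR: the class name, the helper, and the `CountDownLatch` standing in for a channel group's close future are all assumptions made for the example. In Netty itself, the analogous call is the `awaitUninterruptibly(long, TimeUnit)` overload on the future returned by `channelGroup.close()`, which returns `false` when the timeout elapses first.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in: the latch plays the role of a close future
// that may never complete.
public class BoundedCloseWait {

    /**
     * Waits for the latch while ignoring interrupts, but gives up once
     * timeoutMillis has elapsed. Returns true only if the latch reached
     * zero before the deadline.
     */
    static boolean awaitUninterruptibly(CountDownLatch closed, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        boolean interrupted = false;
        try {
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    // Deadline passed: report whether close finished anyway.
                    return closed.getCount() == 0;
                }
                try {
                    return closed.await(remaining, TimeUnit.NANOSECONDS);
                } catch (InterruptedException e) {
                    // Remember the interrupt and keep waiting until the deadline.
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                // Restore the interrupt flag for the caller.
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        // A close that never completes: an unbounded await() would hang here.
        CountDownLatch neverCloses = new CountDownLatch(1);
        boolean done = awaitUninterruptibly(neverCloses, 200);
        System.out.println("closed in time: " + done);
    }
}
```

   With a timeout, the caller gets control back and can decide what to do with whatever has not closed, instead of blocking shutdown indefinitely.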
   
   I certainly could create a new parameter specifically for closing the Netty 
`ChannelGroup` instances. I could name it something like 
`channelGroupShutdownTimeout`, but then that would introduce a naming asymmetry 
with `shutdownTimeout` which is specifically aimed at the Netty 
`EventLoopGroup` instance. Since `shutdownTimeout` was undocumented I could 
potentially just rename it to `eventLoopGroupShutdownTimeout` and then document 
both new parameters, hoping that nobody was actually using `shutdownTimeout`, 
or I could deprecate `shutdownTimeout` and let it live alongside the new 
parameter. I'd probably need to do the same with `quietPeriod` as well.
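   
   For reference, the deprecate-and-keep option could look roughly like the sketch below. It is purely illustrative: the class, the `resolveTimeout` helper, the plain string map, and the default value are hypothetical and are not Artemis's actual configuration code. Only the parameter names `shutdownTimeout` and `eventLoopGroupShutdownTimeout` come from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: prefer the new parameter name and fall back to the
// deprecated one so existing configurations keep working.
public class TimeoutParams {

    static final String DEPRECATED_NAME = "shutdownTimeout";
    static final String NEW_NAME = "eventLoopGroupShutdownTimeout";
    // Placeholder default for the example, not the broker's real default.
    static final long ASSUMED_DEFAULT_MILLIS = 3000;

    static long resolveTimeout(Map<String, String> params) {
        String value = params.get(NEW_NAME);
        if (value == null) {
            value = params.get(DEPRECATED_NAME);
            if (value != null) {
                // Warn so users can migrate to the new name.
                System.err.println("'" + DEPRECATED_NAME + "' is deprecated; use '" + NEW_NAME + "'");
            }
        }
        return value == null ? ASSUMED_DEFAULT_MILLIS : Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new HashMap<>();
        legacy.put(DEPRECATED_NAME, "0");
        System.out.println(resolveTimeout(legacy)); // falls back to the old name

        Map<String, String> both = new HashMap<>(legacy);
        both.put(NEW_NAME, "500");
        System.out.println(resolveTimeout(both)); // new name wins
    }
}
```

   The same pattern would apply to `quietPeriod` if it were renamed as well.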
   
   Ultimately we just need a timeout here so these calls can't hang 
indefinitely. Using `shutdownTimeout` seems the simplest path forward to me.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 1002750)
    Time Spent: 2h 10m  (was: 2h)

> Netty acceptor not shutting down
> --------------------------------
>
>                 Key: ARTEMIS-5861
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-5861
>             Project: Artemis
>          Issue Type: Bug
>    Affects Versions: 2.44.0
>            Reporter: Justin Bertram
>            Assignee: Justin Bertram
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Thread dump analysis reveals that the broker hangs indefinitely when trying 
> to close Netty channel groups in a Netty acceptor, e.g.:
> {noformat}
>   State: WAITING (on object monitor)
>   Stack trace:
>     at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:290)
>       - locked <0x00000000dbd095a8> (a io.netty.channel.group.DefaultChannelGroupFuture)
>     at io.netty.channel.group.DefaultChannelGroupFuture.awaitUninterruptibly(DefaultChannelGroupFuture.java:178)
>     at org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor.asyncStop(NettyAcceptor.java:793){noformat}
>   
> The code at {{NettyAcceptor.java:793}} calls 
> {{channelGroup.close().awaitUninterruptibly()}} without a timeout parameter, 
> causing an indefinite hang when channels fail to close properly. This problem is 
> very rare and there is no good reproducer.
> The broker should complete shutdown within a reasonable timeout period, 
> forcefully closing any remaining connections if necessary.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
