[ https://issues.apache.org/jira/browse/ARTEMIS-5861?focusedWorklogId=1002750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-1002750 ]
ASF GitHub Bot logged work on ARTEMIS-5861:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 30/Jan/26 22:57
Start Date: 30/Jan/26 22:57
Worklog Time Spent: 10m
Work Description: jbertram commented on PR #6202:
URL: https://github.com/apache/artemis/pull/6202#issuecomment-3826236633
> Gary's idea seems reasonable...
Agreed. I implemented his suggestion.
> ...I also don't actually know that we really want this set to 0 all the time in the test suite?
I don't know if we do or not. There's not much detail on
https://issues.apache.org/jira/browse/ARTEMIS-2428 where this change originated.
> Until yesterday the related bit previously waited for as long as needed during the entire test suite...
The problem, as outlined on the Jira, is that the call to
`awaitUninterruptibly()` can apparently hang forever, so a timeout is needed for
these calls. Rather than create and document a new parameter, I simply reused
the existing, but undocumented, `shutdownTimeout` parameter.
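Netty's `io.netty.util.concurrent.Future` (which `ChannelGroupFuture` extends) also offers timed variants, `awaitUninterruptibly(long timeoutMillis)` and `awaitUninterruptibly(long, TimeUnit)`. These return `true` only if the future completed within the limit. The sketch below mimics those semantics so it runs without Netty on the classpath. A JDK `CompletableFuture` stands in for the `ChannelGroupFuture`, and `BoundedClose` is a hypothetical name. This is an illustration of a bounded wait, not the code from PR #6202.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedClose {

    /**
     * Waits at most timeoutMillis for closeFuture to complete. Interrupts are
     * ignored while waiting (like Netty's awaitUninterruptibly), but the thread's
     * interrupt flag is restored before returning.
     *
     * @return true if the future completed (successfully or not) within the limit
     */
    static boolean awaitUninterruptibly(CompletableFuture<Void> closeFuture, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        boolean interrupted = false;
        try {
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return closeFuture.isDone();
                }
                try {
                    closeFuture.get(remaining, TimeUnit.NANOSECONDS);
                    return true;
                } catch (InterruptedException e) {
                    interrupted = true; // remember the interrupt and keep waiting until the deadline
                } catch (ExecutionException | CancellationException e) {
                    return true; // completed, albeit exceptionally or by cancellation
                } catch (TimeoutException e) {
                    return false; // gave up: the close did not finish in time
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore the caller's interrupt status
            }
        }
    }
}
```

A `false` result lets shutdown proceed instead of blocking forever. The timeout value itself would come from configuration, such as the reused `shutdownTimeout`.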
I certainly could create a new parameter specifically for closing the Netty
`ChannelGroup` instances. I could name it something like
`channelGroupShutdownTimeout`, but then that would introduce a naming asymmetry
with `shutdownTimeout` which is specifically aimed at the Netty
`EventLoopGroup` instance. Since `shutdownTimeout` was undocumented, I could
potentially just rename it to `eventLoopGroupShutdownTimeout` and then document
both new parameters, hoping that nobody was actually using `shutdownTimeout`,
or I could deprecate `shutdownTimeout` and let it live alongside the new
parameter. I'd probably need to do the same with `quietPeriod` as well.
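The deprecation option amounts to resolving a timeout from a new key while still honoring the old one. Here is a minimal sketch that assumes acceptor parameters arrive as a `Map<String, Object>`. `ShutdownTimeoutParams` and `resolveMillis` are hypothetical helpers, not Artemis APIs, and `eventLoopGroupShutdownTimeout` is only the name floated above.

```java
import java.util.Map;

final class ShutdownTimeoutParams {

    /**
     * Resolves a timeout in milliseconds: the new key wins, otherwise the
     * deprecated key is honored, otherwise the supplied default applies.
     */
    static long resolveMillis(Map<String, Object> params, String newKey, String deprecatedKey, long defaultMillis) {
        Object value = params.get(newKey);
        if (value == null && deprecatedKey != null) {
            value = params.get(deprecatedKey); // deprecated alias kept for existing configurations
        }
        if (value == null) {
            return defaultMillis;
        }
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        return Long.parseLong(value.toString().trim()); // e.g. a value parsed from an acceptor URL
    }
}
```

For example, `resolveMillis(params, "eventLoopGroupShutdownTimeout", "shutdownTimeout", defaultMillis)` would keep old configurations working. The same pattern would apply to `quietPeriod`. The cost is the extra parameters to document, which is the overhead the comment argues against.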
Ultimately we just need a timeout here so these calls can't hang
indefinitely. Using `shutdownTimeout` seems the simplest path forward to me.
Issue Time Tracking
-------------------
Worklog Id: (was: 1002750)
Time Spent: 2h 10m (was: 2h)
> Netty acceptor not shutting down
> --------------------------------
>
> Key: ARTEMIS-5861
> URL: https://issues.apache.org/jira/browse/ARTEMIS-5861
> Project: Artemis
> Issue Type: Bug
> Affects Versions: 2.44.0
> Reporter: Justin Bertram
> Assignee: Justin Bertram
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> Thread dump analysis reveals that the broker hangs indefinitely when trying
> to close Netty channel groups in a Netty acceptor, e.g.:
> {noformat}
> State: WAITING (on object monitor)
> Stack trace:
> at
> io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:290)
> - locked <0x00000000dbd095a8> (a
> io.netty.channel.group.DefaultChannelGroupFuture)
> at
> io.netty.channel.group.DefaultChannelGroupFuture.awaitUninterruptibly(DefaultChannelGroupFuture.java:178)
> at
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor.asyncStop(NettyAcceptor.java:793){noformat}
>
> The code at {{NettyAcceptor.java:793}} calls
> {{channelGroup.close().awaitUninterruptibly()}} without a timeout parameter,
> causing an indefinite hang when channels fail to close properly. This problem is
> very rare and there is no good reproducer.
> The broker should complete shutdown within a reasonable timeout period,
> forcefully closing any remaining connections if necessary.
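The expected behavior is a bounded graceful close, followed by forced closure of any stragglers. It can be sketched generically as below. Everything here is hypothetical: `ForcefulShutdown`, `Connection`, `closeGracefully()` and `forceClose()` are illustrative stand-ins, not Artemis or Netty APIs. `CompletableFuture` replaces Netty's futures so the sketch runs on the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ForcefulShutdown {

    /** Hypothetical stand-in for a client connection held by an acceptor. */
    interface Connection {
        CompletableFuture<Void> closeGracefully();
        void forceClose();
    }

    /** Closes all connections, waiting at most timeoutMillis; returns those that had to be forced. */
    static List<Connection> shutdown(List<Connection> connections, long timeoutMillis) {
        List<CompletableFuture<Void>> closes = new ArrayList<>();
        for (Connection c : connections) {
            closes.add(c.closeGracefully());
        }
        CompletableFuture<Void> all = CompletableFuture.allOf(closes.toArray(new CompletableFuture<?>[0]));
        try {
            all.get(timeoutMillis, TimeUnit.MILLISECONDS); // bounded wait instead of an unbounded one
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt, but still finish shutting down
        } catch (ExecutionException | TimeoutException e) {
            // some closes failed or are still pending; stragglers are handled below
        }
        List<Connection> forced = new ArrayList<>();
        for (int i = 0; i < connections.size(); i++) {
            if (!closes.get(i).isDone()) {
                connections.get(i).forceClose();
                forced.add(connections.get(i));
            }
        }
        return forced;
    }
}
```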
--
This message was sent by Atlassian Jira
(v8.20.10#820010)