This is an automated email from the ASF dual-hosted git repository.
He-Pin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/pekko.git
The following commit(s) were added to refs/heads/main by this push:
new 4d17e08c5b test: widen cluster shutdownAll await for aeron-udp drain on JDK 25 nightly (#3017)
4d17e08c5b is described below
commit 4d17e08c5b1bbc750ec7c66cf2475211ad076efe
Author: He-Pin(kerr) <[email protected]>
AuthorDate: Sun May 31 21:21:16 2026 +0800
test: widen cluster shutdownAll await for aeron-udp drain on JDK 25 nightly (#3017)
Motivation:
MixedProtocolClusterSpec "join a cluster with a node using the pekko
protocol (udp)" still fails on virtualized runs after #2997 reordered
shutdownAll to stop joining nodes first:
    [WARN] CoordinatedShutdown(pekko://MixedProtocolClusterSpec) Coordinated shutdown phase [actor-system-terminate] timed out after 30000 milliseconds
    java.lang.RuntimeException: Failed to stop [MixedProtocolClusterSpec] within [1 minute]
    ... StreamSupervisor ... remote-6-0-unnamed ActorGraphInterpreter
The "within [1 minute]" in the failure is the outer await: a 30s base
dilated by pekko.test.timefactor=2, so this lane runs at tf=2 (the JDK 25
nightly runs at tf=4, giving 120s, and passes). The actor-system-terminate
phase only calls system.finalTerminate() and recovers on its own
(non-dilated) phase timeout while termination keeps draining in the
background (CoordinatedShutdown.scala:264-269), so the inner phase WARN is
non-binding noise; ClusterTestUtil.shutdownAll's dilated await on
whenTerminated is the real deadline. The aeron-udp transport is the
slowest to drain (embedded media driver plus stacked Aeron liveness
timeouts), so 60s was simply too tight at tf=2.
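The deadline arithmetic above can be checked with a few lines of Scala. This is a sketch under one stated assumption: the hypothetical `dilated` helper models TestKit dilation as plain multiplication by `pekko.test.timefactor`; it is not Pekko's implementation.

```scala
import scala.concurrent.duration._

object DilationSketch {
  val OldBase: FiniteDuration = 30.seconds // shutdownAll base before this change
  val NewBase: FiniteDuration = 60.seconds // shutdownAll base after this change

  // Hypothetical stand-in for TestKit dilation: base * timefactor, rounded to
  // the nearest nanosecond. This is not Pekko's actual implementation.
  def dilated(base: FiniteDuration, timeFactor: Double): FiniteDuration =
    Duration.fromNanos((base.toNanos * timeFactor).round)

  def main(args: Array[String]): Unit =
    for {
      base <- List(OldBase, NewBase)
      tf   <- List(2.0, 4.0)
    } println(s"base=$base tf=$tf -> ${dilated(base, tf).toCoarsest}")
}
```

Under that assumption the old 30s base gives the 1-minute deadline seen in the failure at tf=2 and 2 minutes at tf=4, while the new 60s base gives a tf=2 lane the same 2 minutes.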
Modification:
- ClusterTestUtil.shutdownAll: raise the outer await base from 30s to 60s
  (the binding, timefactor-dilated deadline), so a tf=2 lane gets ~120s,
  the same headroom the tf=4 nightly already passes with. Document why
  this await, not the inner phase, governs pass/fail.
- MixedProtocolClusterSpec baseConfig: raise the (non-dilated, non-binding)
  actor-system-terminate phase timeout from 30s to 60s to suppress the
  spurious WARN on the slow path and align it with the new await base.
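The split between the non-binding phase timeout and the binding await can be modelled as a small decision function. This is a simplified sketch under stated assumptions: `ShutdownModel`, `Outcome`, and the 90-second drain time are hypothetical, dilation is assumed to be plain multiplication by the time factor, and the model considers nothing but the two timeouts.

```scala
import scala.concurrent.duration._

object ShutdownModel {
  // Hypothetical result type: did the inner phase log its WARN, and did the
  // outer await on whenTerminated (the binding deadline) succeed?
  final case class Outcome(phaseWarn: Boolean, passed: Boolean)

  // Assumption: dilation is base * timefactor.
  def dilated(base: FiniteDuration, timeFactor: Double): FiniteDuration =
    Duration.fromNanos((base.toNanos * timeFactor).round)

  def outcome(
      drain: FiniteDuration,        // wall-clock time termination actually needs
      phaseTimeout: FiniteDuration, // actor-system-terminate timeout, NOT dilated
      awaitBase: FiniteDuration,    // shutdownAll base, dilated by the time factor
      timeFactor: Double): Outcome =
    Outcome(
      // Non-binding: exceeding the phase timeout only logs a WARN.
      phaseWarn = drain > phaseTimeout,
      // Binding: the test fails only if the dilated await elapses first.
      passed = drain <= dilated(awaitBase, timeFactor))

  // Hypothetical slow aeron-udp drain on a tf=2 lane.
  val slowDrain: FiniteDuration = 90.seconds
  val before: Outcome = outcome(slowDrain, 30.seconds, 30.seconds, timeFactor = 2.0)
  val after: Outcome  = outcome(slowDrain, 60.seconds, 60.seconds, timeFactor = 2.0)

  def main(args: Array[String]): Unit = {
    println(s"before: $before")
    println(s"after:  $after")
  }
}
```

In this model a 90s drain misses the old 60s dilated await and fits within the new 120s one. The WARN still fires in both cases because 90s exceeds the 60s phase timeout; it does not affect the result.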
Result:
aeron-udp cluster systems get enough wall-clock to terminate cleanly on
lower-timefactor virtualized lanes without the shutdown-phase abort.
Healthy shutdowns still complete in well under a second, so local and
normal CI runs are unaffected. Test-only change; no production behaviour
or binary-compatibility impact.
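The claim that healthy shutdowns are unaffected rests on how a blocking await behaves: it returns as soon as the awaited future completes, so a larger timeout only bounds how long a stuck shutdown is tolerated. A minimal standard-library sketch, where `healthyTermination` is a hypothetical stand-in for a system's `whenTerminated` that completes after about 50 ms:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FastPathSketch {
  val OldDeadline: FiniteDuration = 60.seconds  // 30s base at tf=2
  val NewDeadline: FiniteDuration = 120.seconds // 60s base at tf=2

  // Hypothetical stand-in for a healthy system's whenTerminated (~50 ms).
  def healthyTermination(): Future[Unit] = Future(Thread.sleep(50))

  // Blocks until termination completes or the deadline passes, and returns
  // the time actually spent waiting.
  def timedAwait(deadline: FiniteDuration): FiniteDuration = {
    val start = System.nanoTime()
    Await.result(healthyTermination(), deadline)
    (System.nanoTime() - start).nanos
  }

  def main(args: Array[String]): Unit = {
    println(s"deadline $OldDeadline: waited ${timedAwait(OldDeadline).toMillis} ms")
    println(s"deadline $NewDeadline: waited ${timedAwait(NewDeadline).toMillis} ms")
  }
}
```

Both calls wait roughly the same ~50 ms; the deadline value never enters the happy path.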
Tests:
- sbt "cluster/Test/compile" - success (cluster test-classes compiled)
- scalafmt 3.10.7 on both changed files - no reformatting needed
- git diff --check - clean
- aeron-udp shutdown timing is timefactor/environment dependent and does
not reproduce on local runs (shutdown completes <1s); change is a
timeout widening verified by compile + format.
References:
nightly-builds.yml MixedProtocolClusterSpec (udp) shutdown timeout;
follow-up to #2997
---
.../test/scala/org/apache/pekko/cluster/ClusterTestKit.scala | 11 +++++++++--
.../org/apache/pekko/cluster/MixedProtocolClusterSpec.scala | 6 +++++-
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/cluster/src/test/scala/org/apache/pekko/cluster/ClusterTestKit.scala b/cluster/src/test/scala/org/apache/pekko/cluster/ClusterTestKit.scala
index 1166c866fa..34c8ce7db2 100644
--- a/cluster/src/test/scala/org/apache/pekko/cluster/ClusterTestKit.scala
+++ b/cluster/src/test/scala/org/apache/pekko/cluster/ClusterTestKit.scala
@@ -111,9 +111,16 @@ trait ClusterTestKit extends TestKitBase {
/** Shuts down all registered [[ActorSystem]]s */
// Shut down joining nodes before the first seed node so cluster leave and remoting
// termination can complete while the seed is still available.
- // The timeout is dilated by TestKit; keep a larger base for virtualized JDK 25 nightly runs.
+ //
+ // This outer await is the binding deadline: the `actor-system-terminate` CoordinatedShutdown
+ // phase only fires `system.finalTerminate()` and recovers if its own (non-dilated) phase
+ // timeout elapses, so termination keeps draining in the background and this await on
+ // `whenTerminated` is what actually decides pass/fail. The base is dilated by `pekko.test.timefactor`
+ // (TestKit), so 60s yields ~120s on a timeFactor=2 lane and ~240s on the timeFactor=4 JDK 25 nightly.
+ // The aeron-udp transport is the slowest to drain (embedded media driver + stacked Aeron liveness
+ // timeouts), so keep this base generous; healthy shutdowns still complete in well under a second.
def shutdownAll(): Unit =
- actorSystems.reverse.foreach(sys => shutdown(sys, 30.seconds, verifySystemShutdown = true))
+ actorSystems.reverse.foreach(sys => shutdown(sys, 60.seconds, verifySystemShutdown = true))
/**
* Force the passed [[ActorSystem]] to quit the cluster and shutdown.
diff --git a/cluster/src/test/scala/org/apache/pekko/cluster/MixedProtocolClusterSpec.scala b/cluster/src/test/scala/org/apache/pekko/cluster/MixedProtocolClusterSpec.scala
index cf7d6b8dd6..31da7e0f9f 100644
--- a/cluster/src/test/scala/org/apache/pekko/cluster/MixedProtocolClusterSpec.scala
+++ b/cluster/src/test/scala/org/apache/pekko/cluster/MixedProtocolClusterSpec.scala
@@ -37,7 +37,11 @@ object MixedProtocolClusterSpec {
pekko.remote.accept-protocol-names = ["pekko", "akka"]
pekko.remote.enforce-strict-config-prefix-check-on-join = on
- pekko.coordinated-shutdown.phases.actor-system-terminate.timeout = 30 s
+ # Inner CoordinatedShutdown phase timeout. This is NOT dilated by pekko.test.timefactor, so it is
+ # intentionally non-binding: on a timeout it only logs a WARN and recovers while finalTerminate keeps
+ # draining the (slow) aeron-udp streams in the background. The real deadline is ClusterTestUtil.shutdownAll's
+ # dilated await on whenTerminated. Kept at 60s (aligned with that await's base) to avoid spurious WARN noise.
+ pekko.coordinated-shutdown.phases.actor-system-terminate.timeout = 60 s
pekko.cluster.downing-provider-class = "org.apache.pekko.cluster.sbr.SplitBrainResolverProvider"
pekko.cluster.split-brain-resolver.active-strategy = keep-majority
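The diff leaves the shape of shutdownAll unchanged: systems are shut down in reverse registration order, each bounded by a dilated await on termination. A standard-library model of that loop follows. `FakeSystem`, `demo`, and the seed-first registration order are assumptions made for illustration; the real method calls `shutdown(sys, 60.seconds, verifySystemShutdown = true)` rather than `Await` directly, and also handles cluster leave.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object ShutdownAllSketch {
  // Hypothetical stand-in for a registered ActorSystem.
  final case class FakeSystem(name: String, whenTerminated: Future[Unit])

  // Assumption: dilation is base * timefactor.
  def dilated(base: FiniteDuration, timeFactor: Double): FiniteDuration =
    Duration.fromNanos((base.toNanos * timeFactor).round)

  // Shuts systems down in reverse registration order and returns that order.
  // Await.ready throws a TimeoutException if a system misses its dilated deadline.
  def shutdownAll(systems: List[FakeSystem], base: FiniteDuration, timeFactor: Double): List[String] = {
    val order = ListBuffer.empty[String]
    systems.reverse.foreach { sys =>
      order += sys.name
      Await.ready(sys.whenTerminated, dilated(base, timeFactor))
    }
    order.toList
  }

  // Assumption: the first seed node is registered first, joining nodes after it.
  def demo(): List[String] = {
    val done = Future.successful(())
    val systems = List(FakeSystem("seed", done), FakeSystem("joining-1", done), FakeSystem("joining-2", done))
    shutdownAll(systems, 60.seconds, timeFactor = 2.0)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

Reversing the list is what lets joining nodes leave while the seed node is still up; the seed is awaited last.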