This is an automated email from the ASF dual-hosted git repository.

He-Pin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/pekko.git


The following commit(s) were added to refs/heads/main by this push:
     new 4d17e08c5b test: widen cluster shutdownAll await for aeron-udp drain on JDK 25 nightly (#3017)
4d17e08c5b is described below

commit 4d17e08c5b1bbc750ec7c66cf2475211ad076efe
Author: He-Pin(kerr) <[email protected]>
AuthorDate: Sun May 31 21:21:16 2026 +0800

    test: widen cluster shutdownAll await for aeron-udp drain on JDK 25 nightly (#3017)
    
    Motivation:
    MixedProtocolClusterSpec "join a cluster with a node using the pekko
    protocol (udp)" still fails on virtualized runs after #2997 reordered
    shutdownAll to stop joining nodes first:
    
      [WARN] CoordinatedShutdown(pekko://MixedProtocolClusterSpec) Coordinated
      shutdown phase [actor-system-terminate] timed out after 30000 milliseconds
      java.lang.RuntimeException: Failed to stop [MixedProtocolClusterSpec]
      within [1 minute]
      ... StreamSupervisor ... remote-6-0-unnamed ActorGraphInterpreter
    
    The "within [1 minute]" outer await is the 30s base dilated by
    pekko.test.timefactor=2, i.e. this lane runs at tf=2 (the JDK 25 nightly
    runs at tf=4, giving 120s, and passes). The actor-system-terminate phase only
    calls system.finalTerminate() and recovers on its own (non-dilated) phase
    timeout while termination keeps draining in the background
    (CoordinatedShutdown.scala:264-269), so the inner phase WARN is
    non-binding noise -- ClusterTestUtil.shutdownAll's dilated await on
    whenTerminated is the real deadline. The aeron-udp transport is the
    slowest to drain (embedded media driver + stacked Aeron liveness
    timeouts), so 60s was simply too tight at tf=2.
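The arithmetic behind "within [1 minute]" can be sketched as below. `dilate` is a hypothetical stand-in for TestKit's own dilation by `pekko.test.timefactor`; the assumption is that dilation is a plain multiplication of the base by the factor.

```scala
import scala.concurrent.duration._

object DilationSketch {
  // Hypothetical helper (not the TestKit API): scales a base timeout by a
  // time factor, the way pekko.test.timefactor dilates test deadlines.
  def dilate(base: FiniteDuration, timeFactor: Double): FiniteDuration =
    (base.toNanos * timeFactor).toLong.nanos

  def main(args: Array[String]): Unit = {
    println(dilate(30.seconds, 2.0).toSeconds) // 60: the old "within [1 minute]" at tf=2
    println(dilate(30.seconds, 4.0).toSeconds) // 120: the old base on the tf=4 nightly
  }
}
```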
    
    Modification:
    - ClusterTestUtil.shutdownAll: raise the outer await base from 30s to 60s
      (the binding, timefactor-dilated deadline), so a tf=2 lane gets ~120s --
      the same headroom the tf=4 nightly already passes with. Document why this
      await, not the inner phase, governs pass/fail.
    - MixedProtocolClusterSpec baseConfig: raise the (non-dilated, non-binding)
      actor-system-terminate phase timeout 30s -> 60s to suppress the spurious
      WARN on the slow path and align it with the new await base.
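The split between the non-binding phase timeout and the binding await can be modelled as below. This is a hypothetical sketch, not Pekko code: `ShutdownOutcome`, `DeadlineModel` and the 90s drain time are illustrative assumptions. The model treats the phase timeout purely as a WARN trigger and the dilated await as the only pass/fail deadline, as described above.

```scala
import scala.concurrent.duration._

// Hypothetical model of the two deadlines; not Pekko code.
final case class ShutdownOutcome(phaseWarns: Boolean, testFails: Boolean)

object DeadlineModel {
  // drain:        assumed wall-clock time the aeron-udp systems need to terminate
  // phaseTimeout: actor-system-terminate timeout (not dilated; only triggers a WARN)
  // awaitBase:    shutdownAll's await base, dilated by the time factor (binding)
  def outcome(
      drain: FiniteDuration,
      phaseTimeout: FiniteDuration,
      awaitBase: FiniteDuration,
      timeFactor: Double): ShutdownOutcome = {
    val deadline = (awaitBase.toNanos * timeFactor).toLong.nanos
    ShutdownOutcome(phaseWarns = drain > phaseTimeout, testFails = drain > deadline)
  }

  def main(args: Array[String]): Unit = {
    val drain = 90.seconds // illustrative slow drain, not a measured value
    println(outcome(drain, 30.seconds, 30.seconds, 2.0)) // before: WARN and failure
    println(outcome(drain, 60.seconds, 60.seconds, 2.0)) // after: WARN, but passes
  }
}
```

With the assumed 90s drain, the old settings both WARN and fail, while the new ones still WARN but pass, because only the dilated await decides the outcome.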
    
    Result:
    aeron-udp cluster systems get enough wall-clock to terminate cleanly on
    lower-timefactor virtualized lanes without the shutdown-phase abort.
    Healthy shutdowns still complete in well under a second, so local and
    normal CI runs are unaffected. Test-only change; no production behaviour
    or binary-compatibility impact.
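Widening the await does not slow healthy runs because `Await.result` returns as soon as the awaited future completes; the timeout is only an upper bound. A minimal sketch using only the Scala standard library, where a `Promise` completed by a background thread stands in for `whenTerminated` (an assumption, not a real ActorSystem):

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object AwaitUpperBound {
  // Returns how long Await.result blocked when the future completes after `delay`.
  // `timeout` is only an upper bound: Await returns as soon as the value is ready.
  def blockedFor(delay: FiniteDuration, timeout: FiniteDuration): FiniteDuration = {
    val terminated = Promise[Unit]()
    // Stand-in for a healthy whenTerminated, completed from a background thread.
    val worker = new Thread(() => {
      Thread.sleep(delay.toMillis)
      terminated.success(())
      ()
    })
    val start = System.nanoTime()
    worker.start()
    Await.result(terminated.future, timeout)
    (System.nanoTime() - start).nanos
  }

  def main(args: Array[String]): Unit = {
    // A ~100 ms shutdown blocks for roughly 100 ms under either base, not 60s or 120s.
    println(blockedFor(100.millis, 60.seconds).toMillis)
    println(blockedFor(100.millis, 120.seconds).toMillis)
  }
}
```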
    
    Tests:
    - sbt "cluster/Test/compile" - success (cluster test-classes compiled)
    - scalafmt 3.10.7 on both changed files - no reformatting needed
    - git diff --check - clean
    - aeron-udp shutdown timing is timefactor/environment dependent and does
      not reproduce on local runs (shutdown completes <1s); change is a
      timeout widening verified by compile + format.
    
    References:
    nightly-builds.yml MixedProtocolClusterSpec (udp) shutdown timeout; 
follow-up to #2997
---
 .../test/scala/org/apache/pekko/cluster/ClusterTestKit.scala  | 11 +++++++++--
 .../org/apache/pekko/cluster/MixedProtocolClusterSpec.scala   |  6 +++++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/cluster/src/test/scala/org/apache/pekko/cluster/ClusterTestKit.scala b/cluster/src/test/scala/org/apache/pekko/cluster/ClusterTestKit.scala
index 1166c866fa..34c8ce7db2 100644
--- a/cluster/src/test/scala/org/apache/pekko/cluster/ClusterTestKit.scala
+++ b/cluster/src/test/scala/org/apache/pekko/cluster/ClusterTestKit.scala
@@ -111,9 +111,16 @@ trait ClusterTestKit extends TestKitBase {
     /** Shuts down all registered [[ActorSystem]]s */
     // Shut down joining nodes before the first seed node so cluster leave and remoting
     // termination can complete while the seed is still available.
-    // The timeout is dilated by TestKit; keep a larger base for virtualized JDK 25 nightly runs.
+    //
+    // This outer await is the binding deadline: the `actor-system-terminate` CoordinatedShutdown
+    // phase only fires `system.finalTerminate()` and recovers if its own (non-dilated) phase
+    // timeout elapses, so termination keeps draining in the background and this await on
+    // `whenTerminated` is what actually decides pass/fail. The base is dilated by `pekko.test.timefactor`
+    // (TestKit), so 60s yields ~120s on a timeFactor=2 lane and ~240s on the timeFactor=4 JDK 25 nightly.
+    // The aeron-udp transport is the slowest to drain (embedded media driver + stacked Aeron liveness
+    // timeouts), so keep this base generous; healthy shutdowns still complete in well under a second.
     def shutdownAll(): Unit =
-      actorSystems.reverse.foreach(sys => shutdown(sys, 30.seconds, verifySystemShutdown = true))
+      actorSystems.reverse.foreach(sys => shutdown(sys, 60.seconds, verifySystemShutdown = true))
 
     /**
      * Force the passed [[ActorSystem]] to quit the cluster and shutdown.
diff --git a/cluster/src/test/scala/org/apache/pekko/cluster/MixedProtocolClusterSpec.scala b/cluster/src/test/scala/org/apache/pekko/cluster/MixedProtocolClusterSpec.scala
index cf7d6b8dd6..31da7e0f9f 100644
--- a/cluster/src/test/scala/org/apache/pekko/cluster/MixedProtocolClusterSpec.scala
+++ b/cluster/src/test/scala/org/apache/pekko/cluster/MixedProtocolClusterSpec.scala
@@ -37,7 +37,11 @@ object MixedProtocolClusterSpec {
       pekko.remote.accept-protocol-names = ["pekko", "akka"]
       pekko.remote.enforce-strict-config-prefix-check-on-join = on
 
-      pekko.coordinated-shutdown.phases.actor-system-terminate.timeout = 30 s
+      # Inner CoordinatedShutdown phase timeout. This is NOT dilated by pekko.test.timefactor, so it is
+      # intentionally non-binding: on a timeout it only logs a WARN and recovers while finalTerminate keeps
+      # draining the (slow) aeron-udp streams in the background. The real deadline is ClusterTestUtil.shutdownAll's
+      # dilated await on whenTerminated. Kept at 60s (aligned with that await's base) to avoid spurious WARN noise.
+      pekko.coordinated-shutdown.phases.actor-system-terminate.timeout = 60 s
 
       pekko.cluster.downing-provider-class = "org.apache.pekko.cluster.sbr.SplitBrainResolverProvider"
       pekko.cluster.split-brain-resolver.active-strategy = keep-majority
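The shutdown order and the overall worst case of `shutdownAll` can be illustrated with plain collections. The node names are hypothetical and the tf=2 factor is an assumption; the point is that `foreach` awaits each system in turn, so the dilated per-system deadline applies sequentially rather than once for the whole cluster.

```scala
import scala.concurrent.duration._

object ShutdownOrderSketch {
  // Hypothetical registration order: the first seed node is registered first.
  val registered: List[String] = List("seed-1", "joining-2", "joining-3")

  // Assumed tf=2 lane: the 60s base dilates to 120s per system.
  val perSystem: FiniteDuration = 60.seconds * 2L

  // foreach awaits each system in turn, so the deadlines add up in the worst case.
  val worstCase: FiniteDuration = perSystem * registered.size.toLong

  def main(args: Array[String]): Unit = {
    println(registered.reverse) // List(joining-3, joining-2, seed-1)
    println(worstCase)          // 360 seconds
  }
}
```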

