Copilot commented on code in PR #4691:
URL: https://github.com/apache/cassandra/pull/4691#discussion_r2998997253


##########
src/java/org/apache/cassandra/utils/JVMStabilityInspector.java:
##########
@@ -124,7 +124,12 @@ public static void inspectThrowable(Throwable t, 
Consumer<Throwable> fn, boolean
 
             logger.error("OutOfMemory error letting the JVM handle the 
error:", t);
 
-            StorageService.instance.removeShutdownHook();
+            // Only remove the shutdown hook for fatal heap OOMs where the JVM 
will exit
+            // and the drain hook could hang due to lack of heap memory. For 
non-fatal OOMs
+            // (e.g. direct buffer, native thread), keep the hook for graceful 
shutdown.
+            // See CASSANDRA-16939.
+            if (isHeapSpaceOom((OutOfMemoryError) t))
+                StorageService.instance.removeShutdownHook();
 
             forceHeapSpaceOomMaybe((OutOfMemoryError) t);
 

Review Comment:
   The shutdown-hook removal is now gated on the *original* OOM message, but 
`forceHeapSpaceOomMaybe(...)` will intentionally exhaust the Java heap for any 
non-heap OOM (e.g. "Direct buffer memory"). That means we can still end up in a 
real heap OOM/VM exit path while keeping the drain shutdown hook installed, 
reintroducing the hang this code is trying to avoid. Consider either (a) also 
removing the hook when you are about to force a heap OOM, or (b) skipping 
`forceHeapSpaceOomMaybe` for the OOM types where we want to preserve graceful 
shutdown, so the behavior is consistent.
   ```suggestion
               {
                   StorageService.instance.removeShutdownHook();
                   forceHeapSpaceOomMaybe((OutOfMemoryError) t);
               }
   ```



##########
test/unit/org/apache/cassandra/utils/JVMStabilityInspectorTest.java:
##########
@@ -239,4 +240,54 @@ public void fileHandleTest()
             JVMStabilityInspector.replaceKiller(originalKiller);
         }
     }
+
+    @Test
+    public void testShutdownHookNotRemovedForNonFatalOom() throws Exception
+    {
+        testShutdownHookRemovalOnOom("Direct buffer memory", false);
+    }
+
+    @Test
+    public void testShutdownHookRemovedForFatalOom() throws Exception
+    {
+        for (String message : JVMStabilityInspector.FORCE_HEAP_OOM_IGNORE_SET)
+            testShutdownHookRemovalOnOom(message, true);
+    }
+
+    private void testShutdownHookRemovalOnOom(String oomMessage, boolean 
expectHookRemoved) throws Exception
+    {
+        Thread testHook = new Thread(() -> {}, "TestDrainHook"); // 
checkstyle: permit this instantiation
+        Runtime.getRuntime().addShutdownHook(testHook);
+
+        Field drainField = 
StorageService.class.getDeclaredField("drainOnShutdown");
+        drainField.setAccessible(true);
+        Thread originalHook = (Thread) drainField.get(StorageService.instance);
+
+        try
+        {
+            drainField.set(StorageService.instance, testHook);
+
+            Assertions.assertThatThrownBy(() -> 
JVMStabilityInspector.inspectThrowable(new OutOfMemoryError(oomMessage)))
+                      .isInstanceOf(OutOfMemoryError.class);
+
+            if (expectHookRemoved)
+                assertFalse("Shutdown hook should be removed for fatal OOM (" 
+ oomMessage + ')',
+                            Runtime.getRuntime().removeShutdownHook(testHook));
+            else
+                assertTrue("Shutdown hook should not be removed for non-fatal 
OOM (" + oomMessage + ')',
+                           Runtime.getRuntime().removeShutdownHook(testHook));
+        }
+        finally
+        {
+            drainField.set(StorageService.instance, originalHook);

Review Comment:
   This test triggers `StorageService.instance.removeShutdownHook()` for the 
fatal OOM cases, and that method unconditionally calls 
`PathUtils.clearOnExitThreads()` (removing global delete-on-exit shutdown 
hooks). That can interfere with other unit tests that rely on 
`PathUtils.deleteRecursiveOnExit(...)`/`File.deleteOnExit()` cleanup later in 
the same JVM. Consider snapshotting/restoring PathUtils' on-exit hook list via 
reflection in the `finally`, or refactoring the production code to make the 
shutdown-hook removal observable without clearing unrelated JVM shutdown hooks.



##########
src/java/org/apache/cassandra/utils/JVMStabilityInspector.java:
##########
@@ -186,7 +191,18 @@ else if (t instanceof UnrecoverableIllegalStateException)
             inspectThrowable(t.getCause(), fn, isUncaughtException);
     }
 
-    private static final Set<String> FORCE_HEAP_OOM_IGNORE_SET = 
ImmutableSet.of("Java heap space", "GC Overhead limit exceeded");
+    @VisibleForTesting
+    static final Set<String> FORCE_HEAP_OOM_IGNORE_SET = ImmutableSet.of("Java 
heap space", "GC Overhead limit exceeded");
+
+    /**
+     * Returns true if the given OOM is a fatal heap-related error (heap space 
exhaustion or GC overhead),
+     * as opposed to a non-fatal OOM like direct buffer or native thread 
exhaustion.
+     */
+    @VisibleForTesting
+    static boolean isHeapSpaceOom(OutOfMemoryError oom)
+    {
+        return FORCE_HEAP_OOM_IGNORE_SET.contains(oom.getMessage());
+    }

Review Comment:
   `isHeapSpaceOom(...)` (and `forceHeapSpaceOomMaybe`) rely on an exact match 
of `OutOfMemoryError.getMessage()`. This is brittle across JVMs/versions and 
even casing; e.g. HotSpot commonly uses "GC overhead limit exceeded" (lowercase 
'overhead'), while the set currently contains "GC Overhead limit exceeded". 
Consider normalizing (case-insensitive compare, or `contains`/prefix match) and 
guarding against JVM-specific message variations so fatal heap OOMs are 
consistently detected.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to