Hi all,

On CI, some pulsar-metadata tests fail frequently with JVM crashes.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f10670d5113, pid=3902, tid=3941
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.8.1+1 (17.0.8.1+1) (build 17.0.8.1+1)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.8.1+1 (17.0.8.1+1, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xad5113]  PhaseIdealLoop::build_loop_late_post_work(Node*, bool)+0xe3

This is most likely the JVM bug JDK-8314024 [1], which is fixed in Java
17.0.10 (planned release date Jan 26, 2024) and in Java 21.0.1, which was
released about 2 weeks ago.
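To confirm which CI agents already run a JVM containing the fix, the version thresholds above can be checked at runtime with `Runtime.version()`. This is a minimal sketch that only knows about the two fixed versions named above (17.0.10 and 21.0.1); treating feature releases newer than 21 as fixed is an assumption, and the class name `Jdk8314024Check` is hypothetical.

```java
// Hypothetical helper: reports whether a JVM version is at or above the
// versions said to contain the JDK-8314024 fix (17.0.10 and 21.0.1).
public final class Jdk8314024Check {

    private Jdk8314024Check() {
    }

    /**
     * Returns true for 17.0.10+ on the 17 line and 21.0.1+ on the 21 line.
     * Feature releases newer than 21 are ASSUMED to contain the fix; all
     * other versions (including 18 to 20) conservatively return false.
     */
    public static boolean hasFix(Runtime.Version v) {
        int feature = v.feature();
        int update = v.update(); // e.g. 8 for "17.0.8.1+1"
        if (feature == 17) {
            return update >= 10;
        }
        if (feature == 21) {
            return update >= 1;
        }
        return feature > 21; // assumption, not confirmed by the bug report
    }

    public static void main(String[] args) {
        Runtime.Version current = Runtime.version();
        System.out.println("Running on " + current
                + ", JDK-8314024 fix present: " + hasFix(current));
    }
}
```

For example, the Temurin 17.0.8.1+1 build from the crash log above would report `false`, while 17.0.10 or 21.0.1 would report `true`.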

One possible workaround is to run Pulsar on Java 21.0.1. All tests pass on
the master branch with Java 21, so it is likely that 3.0.x or 3.1.x are
also directly compatible with Java 21 at runtime. The master branch also
contains support for developing and running tests with Java 21, which
required multiple library updates.

This is the Pulsar issue: https://github.com/apache/pulsar/issues/19307
Sample crash reports:
https://gist.github.com/lhotari/53b72683ad4f339dfbcfd8b9b97062b9
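When a JVM crashes like this, HotSpot writes a fatal error log named `hs_err_pid<pid>.log`, by default in the process's working directory. To count how many CI crashes hit this particular problematic frame, a small scanner along these lines could be run over collected crash reports. It is a sketch only: `CrashLogScanner` and its methods are hypothetical and not part of Pulsar's build, and matching on the frame name alone cannot tell JDK-8314024 apart from other bugs with a similar stack trace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical CI helper: finds hs_err_pid*.log files whose contents mention
// the problematic C2 frame from the crash log above.
public final class CrashLogScanner {

    static final String PROBLEMATIC_FRAME = "PhaseIdealLoop::build_loop_late_post_work";

    private CrashLogScanner() {
    }

    /** Returns true if the crash report text mentions the problematic frame. */
    public static boolean mentionsProblematicFrame(String reportText) {
        return reportText.contains(PROBLEMATIC_FRAME);
    }

    /** Lists hs_err_pid*.log files directly inside dir that mention the frame. */
    public static List<Path> findMatchingReports(Path dir) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "hs_err_pid*.log")) {
            for (Path report : stream) {
                // ISO-8859-1 maps every byte, so stray non-UTF-8 bytes in a
                // crash report cannot cause a decoding exception.
                String text = new String(Files.readAllBytes(report), StandardCharsets.ISO_8859_1);
                if (mentionsProblematicFrame(text)) {
                    matches.add(report);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path report : findMatchingReports(dir)) {
            System.out.println(report);
        }
    }
}
```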

An earlier JVM bug with a similar stack trace in its crash report,
JDK-8285835 [2], was fixed in 17.0.7. Since the crashes continue on
17.0.8.1, that fix did not resolve our issue, and we now expect
JDK-8314024 [1] to be the fix for the issue we are facing.

Regards,

-Lari

1 - https://bugs.openjdk.org/browse/JDK-8314024
2 - https://bugs.openjdk.org/browse/JDK-8285835
