https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239894
--- Comment #6 from Greg Lewis <gle...@freebsd.org> ---

Hi Konstantin,

I think my explanation hasn't been clear enough, so let me try to include a few more links and some diagrams.

Here is a diagram of what the Java thread stack looks like, from https://github.com/battleblow/openjdk-jdk11u/blob/bsd-port/src/hotspot/os/bsd/os_bsd.cpp#L4262:

      Low memory addresses
       +------------------------+
       |                        |\  Java thread created by VM does not have
       |   pthread guard page   | - pthread guard, attached Java thread usually
       |                        |/  has 1 pthread guard page.
    P1 +------------------------+ Thread::stack_base() - Thread::stack_size()
       |                        |\
       |  HotSpot Guard Pages   | - red, yellow and reserved pages
       |                        |/
       +------------------------+ JavaThread::stack_reserved_zone_base()
       |                        |\
       |      Normal Stack      | -
       |                        |/
    P2 +------------------------+ Thread::stack_base()

When the JVM creates the HotSpot guard pages, the kernel, depending on the security.bsd.stack_guard_page setting, creates some extra guard pages that extend into the normal stack region. As a result, the SIGSEGV has a fault address in the normal stack region.

There are two initial problems with this.

The first is that StackOverflowError is defined as being "Thrown when a stack overflow occurs because an application recurses too deeply." (see https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/StackOverflowError.html). However, there are other reasons a SIGSEGV could occur in the normal stack region (e.g. a buffer overflow). The JVM uses the guard pages to tell that the SIGSEGV is clearly caused by a stack overflow rather than by some other possible cause. You can observe this in the JVM source itself.
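To make that concrete, here is a rough Python model of that kind of check. It is not HotSpot code; the page size, zone sizes, addresses and the assumption that the kernel-added pages sit directly above JavaThread::stack_reserved_zone_base() are all illustrative assumptions. It shows why a fault in a kernel-added page is classified as "normal stack" rather than as a stack overflow.

```python
from enum import Enum

# Illustrative sizes only (assumption): real zone sizes come from HotSpot
# flags and differ by platform and build.
PAGE = 4096
RED_PAGES, YELLOW_PAGES, RESERVED_PAGES = 1, 2, 1


class Zone(Enum):
    OUTSIDE = "outside this thread's stack"
    RED = "red"
    YELLOW = "yellow"
    RESERVED = "reserved"
    NORMAL = "normal stack"


def classify(addr: int, stack_end: int, stack_base: int) -> Zone:
    """Map a fault address to a zone, from low to high: red, yellow, reserved, normal.

    stack_end is P1 (Thread::stack_base() - Thread::stack_size()) and
    stack_base is P2 in the diagram; the stack grows down towards P1.
    """
    if not stack_end <= addr < stack_base:
        return Zone.OUTSIDE
    red_top = stack_end + RED_PAGES * PAGE
    yellow_top = red_top + YELLOW_PAGES * PAGE
    reserved_top = yellow_top + RESERVED_PAGES * PAGE  # stack_reserved_zone_base()
    if addr < red_top:
        return Zone.RED
    if addr < yellow_top:
        return Zone.YELLOW
    if addr < reserved_top:
        return Zone.RESERVED
    return Zone.NORMAL


def is_stack_overflow(zone: Zone) -> bool:
    # Only faults inside the HotSpot guard zone are treated as a stack overflow.
    return zone in (Zone.RED, Zone.YELLOW, Zone.RESERVED)


# Hypothetical 1 MiB thread stack.
stack_end = 0x7000_0000
stack_base = stack_end + 1024 * 1024
reserved_top = stack_end + (RED_PAGES + YELLOW_PAGES + RESERVED_PAGES) * PAGE

# A fault in the JVM's own reserved page is recognised as an overflow...
print(classify(reserved_top - 8, stack_end, stack_base).name)  # prints RESERVED
# ...but a fault in a kernel-added guard page just above it is not.
print(classify(reserved_top + 8, stack_end, stack_base).name)  # prints NORMAL
```

In this model the kernel-added pages are invisible to `classify`, which is the crux of the problem: the JVM only knows about the zones it set up itself.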
See https://github.com/battleblow/openjdk-jdk11u/blob/bsd-port/src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp#L510, where it checks whether the fault address is in the guard zone: first in the reserved and yellow zones, which it tries to handle gracefully, and then in the red zone, which it handles less gracefully. The code on Linux is very similar (see https://github.com/battleblow/openjdk-jdk11u/blob/bsd-port/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L356). Note also that the code that follows provides different handling when the fault address does not fall within the guard pages.

The second is that, as that code shows, when a stack overflow does occur the JVM will often unprotect portions of the guard zone it has set up, e.g. at https://github.com/battleblow/openjdk-jdk11u/blob/bsd-port/src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp#L525. This is because StackOverflowError is something a Java program can catch and ignore if it so chooses. The reserved pages provide an area the JVM can unprotect so that a critical code section can complete. That way, a Java program which catches StackOverflowError and continues execution is not left in a broken state, for example deadlocked because the fault occurred in the middle of changing a lock's state.

The pages created by the security.bsd.stack_guard_page setting interfere with this. For starters, the fault is not in the reserved zone but in the normal stack, so unprotecting the reserved zone won't help. Also, it was the kernel that protected those pages, so the JVM can't unprotect them. This means the critical section can't complete, which can leave data structures in an inconsistent state, including a deadlock like the one above. JEP 270 (https://openjdk.java.net/jeps/270) goes into a lot more detail about this and the motivation for introducing reserved pages.

There are some other problems here as well.
For example, the JVM can't reliably determine which pages the kernel may have protected, since the sysctl can be changed at runtime while libthr can cache thread stacks. These cases are less likely but still problematic.

Hopefully that clarifies things. I'd also like to draw your attention to Kurt's comment that this doesn't just affect the JVM but the interaction with libthr in general, which is something to consider for any proposed fix. I'm also curious how Linux (and other OSes) went about fixing the Stack Clash vulnerability, and whether there is an approach there that would not cause application issues like this.
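The stack-caching problem can be illustrated with a toy Python model. It is hypothetical and is not libthr's actual code; it assumes that the kernel guard is fixed when a stack is first mapped, so a stack reused from the cache keeps the guard size that was in effect back then rather than the current value of security.bsd.stack_guard_page.

```python
from dataclasses import dataclass, field


# Hypothetical model (not libthr code): a stack records the kernel guard
# size that was in effect when it was first mapped.
@dataclass
class Stack:
    kernel_guard_pages: int


@dataclass
class StackCache:
    free: list = field(default_factory=list)

    def allocate(self, current_sysctl: int) -> Stack:
        # Reuse a cached stack if one is available; only a fresh mapping
        # picks up the current sysctl value.
        if self.free:
            return self.free.pop()
        return Stack(kernel_guard_pages=current_sysctl)

    def release(self, stack: Stack) -> None:
        # A thread exited; keep its stack for later reuse.
        self.free.append(stack)


cache = StackCache()
sysctl = 1                   # hypothetical initial security.bsd.stack_guard_page
t1 = cache.allocate(sysctl)
cache.release(t1)            # thread exits, stack goes back to the cache

sysctl = 0                   # the sysctl is changed at runtime
t2 = cache.allocate(sysctl)  # a new thread reuses the cached stack

# A JVM that reads the sysctl now would guess 0 kernel guard pages,
# but the reused stack still carries 1.
guess = sysctl
print(guess, t2.kernel_guard_pages)  # prints: 0 1
```

Reading the sysctl at thread start is therefore not enough in this model; the JVM would need to know how each individual stack was created.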