On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley <a...@openjdk.org> wrote:

>> A bug in GCC causes shared libraries linked with -ffast-math to disable 
>> denormal arithmetic. This breaks Java's floating-point semantics.
>> 
>> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522
>> 
>> One solution is to save and restore the floating-point control word around 
>> System.loadLibrary(). This isn't perfect, because some shared library might 
>> load another shared library at runtime, but it's a lot better than what we 
>> do now. 
>> 
>> However, this fix is not complete. `dlopen()` is called from many places in 
>> the JDK. I guess the best thing to do is find and wrap them all. I'd like to 
>> hear people's opinions.
>
> Andrew Haley has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8295159: DSO created with -ffast-math breaks Java floating-point arithmetic

I agree with David. Unconditionally doing a check on every call seems to be 
overkill, since it's a mostly theoretical problem at this point, and in general 
I think we should be able to assume that foreign code respects the ABI.

There are other things that can go wrong as well, such as foreign code 
installing a signal handler, which can break implicit null checks. Foreign 
code returning with corrupted register state, which then leads to further 
corruption, is also a possibility. That is, there seem to be many more things 
that can go wrong once we expect native code to violate the ABI.

Even though the check can be pretty fast, we've seen that people watch the 
performance in this area closely, and care about every nanosecond spent here. 
On my own box, the `panama_blank` benchmark takes just 3.4ns, so, depending on 
the machine, the relative overhead of a check could be larger. A flag was also 
recently added to speed up native calls, namely `-XX:+UseSystemMemoryBarrier`, 
which could make the relative overhead of a check larger still.

All in all, I think `-Xcheck:jni` is a better place to test this kind of stuff, 
and we should encourage people to run tests with `-Xcheck:jni` before deploying 
to production.

But, at the same time, loading libraries is a known problematic situation, and 
there the performance matters far less. Always checking and restoring the FPU 
control state at that point, and perhaps emitting a warning message to spur 
people on to fix the issue in the long term, seems like a good solution to me.

-------------

PR: https://git.openjdk.org/jdk/pull/10661
