On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov <vliva...@openjdk.org> wrote:

> Build and use SLEEF library as a backend implementation for Vector API 
> trigonometric functions on macosx-aarch64 platform.
> 
> It improves raw throughput and eliminates GC overhead of non-intrinsified 
> Vector API operations.
> 
> PR includes build changes and libsleef sources relocation from 
> `src/jdk.incubator.vector/linux/native/` to 
> `src/jdk.incubator.vector/share/native/`.
> 
> Once libsleef library is present, existing code in 
> `stubGenerator_aarch64.cpp` successfully links at JVM startup. 
> 
> Testing: hs-tier1 - hs-tier4, microbenchmarks

Microbenchmark results on Apple M1 Pro:

Benchmark             |               Throughput                |               Allocation rate               |
                      |      Before           After             |        Before             After             |
======================|=========================================|=============================================|
Float128Vector.ACOS   |   3.856 ± 0.013    1.941 ± 0.008  us/op |  6076.461 ±  20.067   0.007 ± 0.001  MB/sec |
Float128Vector.ASIN   |   3.813 ± 0.014    1.512 ± 0.017  us/op |  6145.040 ±  22.824   0.007 ± 0.001  MB/sec |
Float128Vector.ATAN   |   7.124 ± 0.040    2.220 ± 0.003  us/op |  3289.059 ±  18.539   0.007 ± 0.001  MB/sec |
Float128Vector.ATAN2  |  16.983 ± 1.031    3.412 ± 0.038  us/op |  2075.808 ± 127.179   0.007 ± 0.001  MB/sec |
Float128Vector.CBRT   |   6.431 ± 0.014    4.075 ± 0.011  us/op |  3643.789 ±   7.933   0.007 ± 0.001  MB/sec |
Float128Vector.COS    |   8.269 ± 0.094    5.614 ± 0.026  us/op |  2833.915 ±  32.041   0.007 ± 0.001  MB/sec |
Float128Vector.COSH   |   5.779 ± 0.020    3.072 ± 0.010  us/op |  4054.800 ±  14.028   0.007 ± 0.001  MB/sec |
Float128Vector.EXP    |   5.456 ± 0.006    0.936 ± 0.004  us/op |  4294.853 ±   5.025   0.007 ± 0.001  MB/sec |
Float128Vector.EXPM1  |   6.888 ± 0.059    2.972 ± 0.010  us/op |  3402.363 ±  28.694   0.007 ± 0.001  MB/sec |
Float128Vector.HYPOT  |   6.369 ± 0.013    2.213 ± 0.008  us/op |  5519.051 ±  11.103   0.007 ± 0.001  MB/sec |
Float128Vector.LOG    |   8.469 ± 0.574    1.729 ± 0.004  us/op |  2775.039 ± 157.629   0.007 ± 0.001  MB/sec |
Float128Vector.LOG10  |  15.235 ± 1.039    1.830 ± 0.006  us/op |  1544.009 ± 107.436   0.007 ± 0.001  MB/sec |
Float128Vector.LOG1P  |   8.823 ± 0.040    1.745 ± 0.014  us/op |  2655.757 ±  11.964   0.007 ± 0.001  MB/sec |
Float128Vector.POW    |  27.511 ± 0.918    7.467 ± 0.033  us/op |  1278.693 ±  42.538   0.007 ± 0.001  MB/sec |
Float128Vector.SIN    |   7.846 ± 0.063    5.822 ± 0.015  us/op |  2986.480 ±  24.025   0.007 ± 0.001  MB/sec |
Float128Vector.SINH   |   5.747 ± 0.033    3.206 ± 0.034  us/op |  4077.645 ±  23.305   0.007 ± 0.001  MB/sec |
Float128Vector.TAN    |  22.337 ± 0.533    6.114 ± 0.016  us/op |  1049.469 ±  24.969   0.007 ± 0.001  MB/sec |

Double128Vector.ACOS  |   5.789 ± 0.107    4.635 ± 0.013  us/op |  8097.069 ± 146.593   0.007 ± 0.001  MB/sec |
Double128Vector.ASIN  |   5.655 ± 0.011    3.858 ± 0.017  us/op |  8287.521 ±  16.023   0.007 ± 0.001  MB/sec |
Double128Vector.ATAN  |  10.082 ± 0.046    6.016 ± 0.016  us/op |  4648.068 ±  21.401   0.007 ± 0.001  MB/sec |
Double128Vector.ATAN2 |  17.286 ± 0.113    8.148 ± 0.015  us/op |  4067.019 ±  26.586   0.007 ± 0.001  MB/sec |
Double128Vector.CBRT  |   9.779 ± 0.048    8.861 ± 0.045  us/op |  4792.419 ±  23.381   0.007 ± 0.001  MB/sec |
Double128Vector.COS   |   9.071 ± 0.107    6.948 ± 0.027  us/op |  5166.999 ±  59.377   0.007 ± 0.001  MB/sec |
Double128Vector.COSH  |   8.234 ± 0.030    6.403 ± 0.025  us/op |  5692.144 ±  20.625   0.007 ± 0.001  MB/sec |
Double128Vector.EXP   |   7.506 ± 0.012    3.073 ± 0.013  us/op |  6243.783 ±  10.382   0.007 ± 0.001  MB/sec |
Double128Vector.EXPM1 |   9.122 ± 0.036    6.122 ± 0.036  us/op |  5137.721 ±  20.350   0.007 ± 0.001  MB/sec |
Double128Vector.HYPOT |  13.445 ± 0.248    4.596 ± 0.035  us/op |  5229.977 ±  96.222   0.007 ± 0.001  MB/sec |
Double128Vector.LOG   |  10.396 ± 0.042    4.629 ± 0.081  us/op |  4507.928 ±  18.101   0.007 ± 0.001  MB/sec |
Double128Vector.LOG10 |  13.923 ± 0.046    4.889 ± 0.021  us/op |  3365.944 ±  11.078   0.007 ± 0.001  MB/sec |
Double128Vector.LOG1P |  12.336 ± 0.045    5.010 ± 0.027  us/op |  3799.204 ±  13.816   0.007 ± 0.001  MB/sec |
Double128Vector.POW   |  28.852 ± 0.043   15.270 ± 0.081  us/op |  2436.503 ±   3.647   0.007 ± 0.001  MB/sec |
Double128Vector.SIN   |   8.821 ± 0.018    6.309 ± 0.037  us/op |  5313.077 ±  11.056   0.007 ± 0.001  MB/sec |
Double128Vector.SINH  |   8.289 ± 0.037    6.566 ± 0.029  us/op |  5654.264 ±  25.538   0.007 ± 0.001  MB/sec |
Double128Vector.TAN   |  25.535 ± 0.636    9.788 ± 0.036  us/op |  1836.177 ±  44.430   0.007 ± 0.001  MB/sec |
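
For readers who want relative numbers, a minimal sketch of turning the table into speedup factors follows. It assumes the us/op figures are mean time per operation (lower is better), which is inferred from the unit rather than stated in the table. The `results` dictionary and the `speedup` helper are illustrative names, and only a handful of rows are copied in.

```python
# Before/after scores in us/op, copied from a few rows of the table.
# Assumption: us/op is time per operation, so a smaller value is faster.
results = {
    "Float128Vector.EXP": (5.456, 0.936),
    "Float128Vector.LOG10": (15.235, 1.830),
    "Float128Vector.SIN": (7.846, 5.822),
    "Double128Vector.CBRT": (9.779, 8.861),
    "Double128Vector.POW": (28.852, 15.270),
}


def speedup(before_us: float, after_us: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_us / after_us


speedups = {name: speedup(before, after) for name, (before, after) in results.items()}

# Print the selected benchmarks, largest speedup first.
for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {factor:5.2f}x")
```

The same ratio can be applied to any other row. The error margins are ignored here, so the factors are point estimates only.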

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2762959907
