The Vector API binary operation "`FIRST_NONZERO`" represents the lanewise vector operation "`a != 0 ? a : b`", which can be implemented with existing APIs such as "`compare + blend`". The current implementation is more complex than necessary, especially for floating-point vectors. Its main idea is:
1) mask = a.compare(0, ne);
2) b = b.blend(0, mask);
3) result = a | b;

For the floating-point types, this also requires reinterpreting the vectors between the floating-point type and the corresponding integral type, since the final "`OR`" operation is only valid for bitwise integral types.

A simpler implementation is:

1) mask = a.compare(0, eq);
2) result = a.blend(b, mask);

This saves the final "`OR`" operation and the related reinterpretation between the FP and integral types.

Here is the performance data of the "`FIRST_NONZERO`" benchmarks on an ARM NEON system (see [1] for the byte vector benchmark details):

Benchmark                            (size)   Mode  Cnt     Before      After   Units
ByteMaxVector.FIRST_NONZERO            1024  thrpt   15  12107.422  18385.157  ops/ms
ByteMaxVector.FIRST_NONZEROMasked      1024  thrpt   15   9765.282  14739.775  ops/ms
DoubleMaxVector.FIRST_NONZERO          1024  thrpt   15   1798.545   2331.214  ops/ms
DoubleMaxVector.FIRST_NONZEROMasked    1024  thrpt   15   1211.838   1810.644  ops/ms
FloatMaxVector.FIRST_NONZERO           1024  thrpt   15   3491.924   4377.167  ops/ms
FloatMaxVector.FIRST_NONZEROMasked     1024  thrpt   15   2307.085   3606.576  ops/ms
IntMaxVector.FIRST_NONZERO             1024  thrpt   15   3602.727   5610.258  ops/ms
IntMaxVector.FIRST_NONZEROMasked       1024  thrpt   15   2726.843   4210.741  ops/ms
LongMaxVector.FIRST_NONZERO            1024  thrpt   15   1819.886   2974.655  ops/ms
LongMaxVector.FIRST_NONZEROMasked      1024  thrpt   15   1337.737   2315.094  ops/ms
ShortMaxVector.FIRST_NONZERO           1024  thrpt   15   6603.642   9586.320  ops/ms
ShortMaxVector.FIRST_NONZEROMasked     1024  thrpt   15   5222.006   7991.443  ops/ms

A similar improvement is also observed on x86 systems.
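To make the difference between the two formulations concrete, here is a minimal scalar sketch that models each vector lane as a plain Java array element; it does not call `jdk.incubator.vector`, and the class and method names (`FirstNonzeroSketch`, `oldInt`, `newInt`, `oldFloat`, `newFloat`) are hypothetical. For the float variants it assumes the zero test is done on the raw bit pattern, so `-0.0f` counts as nonzero; that choice belongs to this sketch and is not taken from the patch.

```java
import java.util.Arrays;

// Scalar, lane-by-lane model of the two FIRST_NONZERO formulations.
// Array elements stand in for vector lanes; jdk.incubator.vector is NOT used.
class FirstNonzeroSketch {

    // Old form: mask = (a != 0); b = b.blend(0, mask); result = a | b.
    static int[] oldInt(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            boolean nonZero = a[i] != 0;    // compare(NE, 0)
            int bLane = nonZero ? 0 : b[i]; // b.blend(0, mask)
            r[i] = a[i] | bLane;            // final OR
        }
        return r;
    }

    // New form: mask = (a == 0); result = a.blend(b, mask).
    static int[] newInt(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            boolean isZero = a[i] == 0;     // compare(EQ, 0)
            r[i] = isZero ? b[i] : a[i];    // a.blend(b, mask)
        }
        return r;
    }

    // Old form for float lanes: OR is only defined on integral bits, so both
    // inputs are reinterpreted as int and the result is reinterpreted back.
    // ASSUMPTION (this sketch only): the zero test is done on the raw bits,
    // so -0.0f (bits 0x80000000) counts as nonzero.
    static float[] oldFloat(float[] a, float[] b) {
        float[] r = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            int aBits = Float.floatToRawIntBits(a[i]);
            int bBits = Float.floatToRawIntBits(b[i]);
            bBits = (aBits != 0) ? 0 : bBits;           // b.blend(0, mask)
            r[i] = Float.intBitsToFloat(aBits | bBits); // OR, then reinterpret back
        }
        return r;
    }

    // New form for float lanes: one blend, no OR and no reinterpretation of
    // the result (same raw-bits zero-test ASSUMPTION as above).
    static float[] newFloat(float[] a, float[] b) {
        float[] r = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            boolean isZero = Float.floatToRawIntBits(a[i]) == 0;
            r[i] = isZero ? b[i] : a[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {0, 5, 0, -3};
        int[] b = {7, 9, -1, 4};
        System.out.println(Arrays.toString(newInt(a, b)));             // [7, 5, -1, -3]
        System.out.println(Arrays.equals(oldInt(a, b), newInt(a, b))); // true

        float[] fa = {0.0f, 1.5f, -0.0f, Float.NaN};
        float[] fb = {2.0f, 3.0f, 4.0f, 5.0f};
        System.out.println(Arrays.toString(newFloat(fa, fb)));                 // [2.0, 1.5, -0.0, NaN]
        System.out.println(Arrays.equals(oldFloat(fa, fb), newFloat(fa, fb))); // true
    }
}
```

Per lane, the new form needs only the compare and the blend, while the old form additionally needs the OR and, for float lanes, the reinterpretations to and from `int`; that is the work the simpler implementation avoids.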
[1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ByteMaxVector.java#L266

-------------

Commit messages:
 - 8291118: [vectorapi] Optimize the implementation of lanewise FIRST_NONZERO

Changes: https://git.openjdk.org/jdk/pull/9683/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9683&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8291118
  Stats: 86 lines in 7 files changed: 9 ins; 38 del; 39 mod
  Patch: https://git.openjdk.org/jdk/pull/9683.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9683/head:pull/9683

PR: https://git.openjdk.org/jdk/pull/9683