Re: RFR: 8381560: AArch64: Optimize String.equals intrinsic

Ehsan Behrangi Sat, 06 Jun 2026 11:53:52 -0700

On Sat, 6 Jun 2026 09:50:42 GMT, Andrew Haley <[email protected]> wrote:


>> This change improves the AArch64 implementation of String.equals by 
>> introducing SIMD-based fast paths using SVE and NEON.
>> 
>> SVE implementation:
>> - Uses predicated loads and comparisons for short lengths (len < VL)
>> - Uses a full predicated loop for longer inputs
>> - Handles the tail via an overlapped compare at (base + len - VL)
>> 
>> NEON implementation:
>> - Uses an 8-byte pre-read to simplify tail handling and eliminate 4/2/1-byte 
>> scalar branches
>> - Processes 16-byte chunks using LDP pair loads
>> - Uses CMP/CCMP to collapse comparisons into a single branch on mismatch
>> 
>> These changes reduce branch pressure and improve throughput for both short 
>> and long strings.
>> 
>> Correctness:
>> - The implementation preserves existing semantics and matches behavior for 
>> all lengths
>> 
>> Testing:
>> - Updated and extended intrinsic tests to cover boundary conditions and 
>> mismatch positions
>> 
>> Benchmark:
>> Across evaluated macrobenchmarks (DaCapo and Renaissance), most workloads 
>> spend <0.5% of CPU time in String.equals. DaCapo biojava is a notable 
>> exception (~8–9%). In biojava, most String.equals calls are on very short 
>> strings (1–2 bytes), where SVE shows ~1% end-to-end improvement, while NEON 
>> is largely neutral or shows a small regression (~1%).
>> 
>> Measured using JMH on AArch64 (Arm Neoverse V2 CPU). Values are relative (%) 
>> vs baseline. Negative values indicate regressions. Mismatch results are 
>> reported across first(DF), middle(DM), and last(DL) difference positions.
>> 
>> SVE results:
>> 
>> Length | L1_EQ  L1_DF  L1_DM  L1_DL | U16_EQ U16_DF U16_DM U16_DL | Avg 
>> -------+----------------------------+-----------------------------+------
>> 0      | 19.63                      | 20.05                      | 19.84
>> 1      | 16.59  17.81  16.57  18.34 | 16.02   0.71   0.42   1.39 | 10.98
>> 2      | 16.44   1.32   0.30  -0.16 | 15.90  -5.17  -4.55  -1.09 |  2.87
>> 3      | 26.58   1.60   1.43  27.07 | 30.34  -8.86  -7.06  14.08 | 10.65
>> 7      | 41.47  -2.94  -3.37  39.82 | 24.02  -8.82  -6.27  20.48 | 13.05
>> 8      | 19.08  -1.16  -3.50  -0.90 | 22.49  -9.75  17.50  13.13 |  7.11
>> 9      | 20.17  -4.12  -5.17  19.03 |  9.25  -2.24  21.35   3.39 |  7.71
>> 15     | 19.48  -3.83  -4.50  19.01 | 29.26 -10.06  11.76  17.07 |  9.77
>> 16     | 19.04  -3.15  16.41  16.85 | 38.37 -11.12  13.18  27.70 | 14.66
>> 17     |  8.95  -2.40   5.68   6.38 | 16.32  -1.61   7.49  11.44 |  6.53
>> 31     | 28.87  -0.01  19.79  23.37 | 41.43  -7.57  23.85  35.89 | 20.70
>> 32     | 32.58...
>
> src/hotspot/cpu/aarch64/aarch64.ad line 16035:
> 
>> 16033:     iRegP_R3   str2,      // str2 (kill)
>> 16034:     iRegI_R4   cnt,       // int length (kill)
>> 16035:     iRegI_R0   result,    // boolean
> 
> From what I can see here these don't need to be fixed registers.

The fixed scalar operands are inherited from the existing string_equalsL rule. 
Are you suggesting that they should be relaxed for the SVE variant?
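
For readers following the thread, the "overlapped compare at (base + len - VL)" 
tail handling in the quoted description can be illustrated in plain Java. This 
is only a sketch of the general technique, not the intrinsic's code: the class 
and method names are hypothetical, and a fixed 8-byte chunk stands in for the 
SVE vector length (VL).

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration; names are not taken from the PR.
final class OverlappedEquals {
    // Reads 8 bytes from a byte[] as one long (unaligned plain reads are allowed).
    private static final VarHandle LONG_VIEW =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Assumption: 8 bytes stands in for the vector length (VL).
    static final int CHUNK = 8;

    static boolean equalsOverlapped(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        int len = a.length;
        if (len < CHUNK) {
            // Short input: a byte loop stands in for the predicated short path.
            for (int i = 0; i < len; i++) {
                if (a[i] != b[i]) {
                    return false;
                }
            }
            return true;
        }
        // Main loop: compare full chunks only.
        for (int i = 0; i + CHUNK <= len; i += CHUNK) {
            if ((long) LONG_VIEW.get(a, i) != (long) LONG_VIEW.get(b, i)) {
                return false;
            }
        }
        // Tail: one chunk ending exactly at len. It may overlap bytes that were
        // already compared, which is harmless and avoids 4/2/1-byte branches.
        return (long) LONG_VIEW.get(a, len - CHUNK) == (long) LONG_VIEW.get(b, len - CHUNK);
    }
}
```

Re-comparing a few bytes in the tail trades a small amount of redundant work 
for a single final check instead of a chain of size-specific branches.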

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31400#discussion_r3368043884
