On Mon, 2 Sep 2024 14:03:55 GMT, Shaojin Wen <s...@openjdk.org> wrote:

>> Use fast path for ascii characters 1 to 127 to improve the performance of 
>> writing Utf8Entry to BufferWriter.
>
> Shaojin Wen has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 21 additional 
> commits since the last revision:
> 
>  - Update src/java.base/share/classes/java/lang/StringCoding.java
>    
>    Co-authored-by: ExE Boss <3889017+exe-b...@users.noreply.github.com>
>  - vectorized countGreaterThanZero
>  - add comments
>  - optimization for none-ascii latin1
>  - Revert "vectorized countGreaterThanZero"
>    
>    This reverts commit 88a77722c8f5401ac28572509d6a08b3e88e8e40.
>  - vectorized countGreaterThanZero
>  - copyright
>  - use JLA if length < 256
>  - fix utf_len error
>  - code style
>  - ... and 11 more: https://git.openjdk.org/jdk/compare/66682133...2a36b443

src/java.base/share/classes/java/lang/StringCoding.java line 55:

> 53:         int i = off;
> 54:         for (; i < limit; i += 8) {
> 55:             long v = UNSAFE.getLong(ba, i + ARRAY_BYTE_BASE_OFFSET);

Since `ba` is a `byte[]`, `UNSAFE.getLong` can read bytes outside the array: the loop runs while `i < limit`, so the last iteration may load up to 7 bytes past `limit`.

Also, the address is not guaranteed to be `long`-aligned: depending on the 
array header size, it is misaligned when `(off % 8) != 0` or 
`(off % 8) != 4` (see [JDK‑8139457] and [JDK‑8314882]).

[JDK‑8139457]: https://bugs.openjdk.org/browse/JDK-8139457
[JDK‑8314882]: https://bugs.openjdk.org/browse/JDK-8314882

Suggestion:

        for (int end = limit - 7; i < end; i += 8) {
            long v = UNSAFE.getLongUnaligned(ba, i + ARRAY_BYTE_BASE_OFFSET);
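For illustration, here is a minimal standalone sketch of the same loop shape. It is not the code in the PR: `BoundedWordScan` and `isAscii` are hypothetical names, and it swaps the internal `UNSAFE` for a public byte-array view `VarHandle`, whose plain `get` is bounds-checked and tolerates unaligned indices. The wide loop stops while a full 8-byte word still fits, and a scalar loop handles the remaining 0 to 7 bytes.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class BoundedWordScan {
    // Views a byte[] as longs. Plain get() is bounds-checked and works at
    // unaligned indices (only the atomic access modes require alignment).
    private static final VarHandle LONG_VIEW =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // High bit of each of the 8 bytes; the byte order does not matter for this mask.
    private static final long HIGH_BITS = 0x8080808080808080L;

    /** Hypothetical helper: true if every byte in ba[off, off + len) is in 0..127. */
    static boolean isAscii(byte[] ba, int off, int len) {
        int limit = off + len;
        int i = off;
        // Stop while a whole word still fits, i.e. while i + 7 < limit.
        for (int end = limit - 7; i < end; i += 8) {
            long v = (long) LONG_VIEW.get(ba, i);
            if ((v & HIGH_BITS) != 0) {
                return false;
            }
        }
        // Scalar tail for the remaining 0..7 bytes.
        for (; i < limit; i++) {
            if (ba[i] < 0) {
                return false;
            }
        }
        return true;
    }
}
```

With `off = 3` and `len = 9`, for example, the wide loop reads one word at index 3 and the tail loop checks the single byte at index 11, so nothing at or beyond `limit` is touched.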

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20772#discussion_r1740188542
