On Thu, 9 Nov 2023 04:16:25 GMT, Roger Riggs <rri...@openjdk.org> wrote:
>> Strings, after construction, are immutable but may be constructed from >> mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the >> arrays during construction that might invalidate internal invariants for the >> correct behavior of operations on the resulting strings. In particular, a >> number of operations have optimizations for operations on pairs of latin1 >> strings and pairs of non-latin1 strings, while operations between latin1 and >> non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to >> indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled >> from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least >> one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or >> another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as >> non-latin1 with the added verification that a non-latin1 character is >> present at the same index. >> If that character is found to be latin1, then the input array has been >> modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an >> existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; >> that scan determines whether the latin1 or the non-latin1 representation is >> returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic >> implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be >> guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with three additional > commits since the last revision: > > - Refactored extractCodePoints to avoid multiple resizes if the array was > modified > - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` > It performs better than the single byte array access by avoiding the > bounds check. > - Misc updates for review comments, javadoc cleanup > Extra checking on maximum string lengths when calling toBytes(). Can you include PPC64, please? diff --git a/src/hotspot/cpu/ppc/ppc.ad b/src/hotspot/cpu/ppc/ppc.ad index 89ce51e997e..102701e4969 100644 --- a/src/hotspot/cpu/ppc/ppc.ad +++ b/src/hotspot/cpu/ppc/ppc.ad @@ -12727,16 +12727,8 @@ instruct string_compress(rarg1RegP src, rarg2RegP dst, iRegIsrc len, iRegIdst re ins_cost(300); format %{ "String Compress $src,$dst,$len -> $result \t// KILL $tmp1, $tmp2, $tmp3, $tmp4, $tmp5" %} ins_encode %{ - Label Lskip, Ldone; - __ li($result$$Register, 0); - __ string_compress_16($src$$Register, $dst$$Register, $len$$Register, $tmp1$$Register, - $tmp2$$Register, $tmp3$$Register, $tmp4$$Register, $tmp5$$Register, Ldone); - __ rldicl_($tmp1$$Register, $len$$Register, 0, 64-3); // Remaining characters. - __ beq(CCR0, Lskip); - __ string_compress($src$$Register, $dst$$Register, $tmp1$$Register, $tmp2$$Register, Ldone); - __ bind(Lskip); - __ mr($result$$Register, $len$$Register); - __ bind(Ldone); + __ encode_iso_array($src$$Register, $dst$$Register, $len$$Register, $tmp1$$Register, $tmp2$$Register, + $tmp3$$Register, $tmp4$$Register, $tmp5$$Register, $result$$Register, false); %} ins_pipe(pipe_class_default); %} @offamitkumar: I guess s390 also needs an adaptation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1806526012