On Mon, 30 Sep 2024 17:48:13 GMT, Roman Kennke <rken...@openjdk.org> wrote:
>> Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is >> slated to become *on by default* and then slated to go away. That means that >> array base offets <= 16 bytes will become the default. The generated code >> will be something like: >> >> >> if (haystack_len <= 8) { >> // Copy 8 bytes onto stack >> } else if (haystack_len <= 16) { >> // Copy 16 bytes onto stack >> } else { >> // Copy 32 bytes onto stack >> } >> >> >> So that is 2 branches in this prologue code instead of originally 1. >> >> However, I just noticed that what I proposed is not enough. Consider what >> happens when haystack_len is 17. This would take the last case and copy 32 >> bytes. But we only have 17+8=25 bytes that we can guarantee to be available >> for copying. If this happens to be the array at the very beginning of the >> heap (very rare/unlikely), this would segfault. >> >> I think I need to mull over it some more to come up with a correct fix. > > I changed the header<16 version to be a small loop: > https://github.com/rkennke/jdk/commit/bcba264ea5c15581647933db1163ca1dae39b6c5 > > The idea is the same as before, except it's made as a small loop with a > maximum of 4 iterations (backward-branches), and it copies 8 bytes at a time, > such that 1. it may copy up to 7 bytes that precede the array and 2. doesn't > run over the end of the array (which would potentially crash). > > I am not sure if using XMM_TMP1 and XMM_TMP2 there is ok, or if it would > encode better to use one of the regular registers.? > > Also, this new implementation could simply replace the old one, instead of > being an alternative. I am not sure if if would make any difference > performance-wise. @rkennke The small loop looks to me that it will run over the end of the array. Say the haystack_len is 7, the index below would be 0 after the shrq instruction, and the movq(XMM_TMP1, Address(haystack, index, Address::times_8)) in the loop will read 8 bytes i.e. one byte past the end of the array: // num_words (zero-based) = (haystack_len - 1) / 8; __ movq(index, haystack_len); __ subq(index, 1); __ shrq(index, LogBytesPerWord); __ bind(L_loop); __ movq(XMM_TMP1, Address(haystack, index, Address::times_8)); __ movq(Address(rsp, index, Address::times_8), XMM_TMP1); __ subq(index, 1); __ jcc(Assembler::positive, L_loop); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1785269849