On Mon, 30 Sep 2024 17:48:13 GMT, Roman Kennke <rken...@openjdk.org> wrote:

>> Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is 
>> slated to become *on by default* and then slated to go away. That means that 
>> array base offets <= 16 bytes will become the default. The generated code 
>> will be something like:
>> 
>> 
>> if (haystack_len <= 8) {
>>   // Copy 8 bytes onto stack
>> } else if (haystack_len <= 16) {
>>   // Copy 16 bytes onto stack
>> } else {
>>   // Copy 32 bytes onto stack
>> }
>> 
>> 
>> So that is 2 branches in this prologue code instead of originally 1.
>> 
>> However, I just noticed that what I proposed is not enough. Consider what 
>> happens when haystack_len is 17. This would take the last case and copy 32 
>> bytes. But we only have 17+8=25 bytes that we can guarantee to be available 
>> for copying. If this happens to be the array at the very beginning of the 
>> heap (very rare/unlikely), this would segfault.
>> 
>> I think I need to mull over it some more to come up with a correct fix.
>
> I changed the header<16 version to be a small loop: 
> https://github.com/rkennke/jdk/commit/bcba264ea5c15581647933db1163ca1dae39b6c5
> 
> The idea is the same as before, except it's made as a small loop with a 
> maximum of 4 iterations (backward-branches), and it copies 8 bytes at a time, 
> such that 1. it may copy up to 7 bytes that precede the array and 2. doesn't 
> run over the end of the array (which would potentially crash).
> 
> I am not sure if using XMM_TMP1 and XMM_TMP2 there is ok, or if it would 
> encode better to use one of the regular registers.?
> 
> Also, this new implementation could simply replace the old one, instead of 
> being an alternative. I am not sure if if would make any difference 
> performance-wise.

@rkennke The small loop looks to me that it will run over the end of the array.
Say the haystack_len is 7, the index below would be 0 after the shrq 
instruction, and the  movq(XMM_TMP1, Address(haystack, index, 
Address::times_8)) in the loop will read 8 bytes i.e. one byte past the end of 
the array:
       // num_words (zero-based) = (haystack_len - 1) / 8;
      __ movq(index, haystack_len);
      __ subq(index, 1);
      __ shrq(index, LogBytesPerWord);

      __ bind(L_loop);
      __ movq(XMM_TMP1, Address(haystack, index, Address::times_8));
      __ movq(Address(rsp, index, Address::times_8), XMM_TMP1);
      __ subq(index, 1);
      __ jcc(Assembler::positive, L_loop);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1785269849

Reply via email to