On Fri, 27 Sep 2024 16:23:15 GMT, Roman Kennke <rken...@openjdk.org> wrote:

>> I believe the code in the patch is good enough as-is, especially if 
>> `UseCompactObjectHeaders` is slated to go away.  The existing `if` will 
>> prevent the < 16 byte header code from being emitted, which is the desired 
>> behavior - i.e., if the header size is >= 16, there will be no code emitted 
>> to the intrinsic for that block.  So there will not be an additional branch 
>> for the code when it is executed.
>> 
>> I'm good with a comment tying `UseCompactObjectHeaders` to the condition.  
>> The comment can be removed when the flag is removed.  "Ship it" :-)
>
> Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is 
> slated to become *on by default* and then slated to go away. That means that 
> array base offets <= 16 bytes will become the default. The generated code 
> will be something like:
> 
> 
> if (haystack_len <= 8) {
>   // Copy 8 bytes onto stack
> } else if (haystack_len <= 16) {
>   // Copy 16 bytes onto stack
> } else {
>   // Copy 32 bytes onto stack
> }
> 
> 
> So that is 2 branches in this prologue code instead of originally 1.
> 
> However, I just noticed that what I proposed is not enough. Consider what 
> happens when haystack_len is 17. This would take the last case and copy 32 
> bytes. But we only have 17+8=25 bytes that we can guarantee to be available 
> for copying. If this happens to be the array at the very beginning of the 
> heap (very rare/unlikely), this would segfault.
> 
> I think I need to mull over it some more to come up with a correct fix.

I changed the header<16 version to be a small loop: 
https://github.com/rkennke/jdk/commit/bcba264ea5c15581647933db1163ca1dae39b6c5

The idea is the same as before, except it's made as a small loop with a maximum 
of 4 iterations (backward-branches), and it copies 8 bytes at a time, such that 
1. it may copy up to 7 bytes that precede the array and 2. doesn't run over the 
end of the array (which would potentially crash).

I am not sure if using XMM_TMP1 and XMM_TMP2 there is ok, or if it would encode 
better to use one of the regular registers.?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1781535745

Reply via email to