On Tue, 15 Oct 2024 10:47:55 GMT, Roman Kennke <rken...@openjdk.org> wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) 
>> #20603 and #20605, plus the Tiny Class-Pointers parts that have been 
>> previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All 
>> changes in this PR are protected by this flag. The purpose of the flag is to 
>> provide a fallback, in case that users unexpectedly observe problems with 
>> the new implementation. The intention is that this flag will remain 
>> experimental and opt-in for at least one release, then make it on-by-default 
>> and diagnostic (?), and eventually deprecate and obsolete it. However, there 
>> are a few unknowns in that plan, specifically, we may want to further 
>> improve compact headers to 4 bytes, we are planning to enhance the Klass* 
>> encoding to support virtually unlimited number of Klasses, at which point we 
>> could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In 
>> order to be able to do this, we are add some changes to GC forwarding (see 
>> below) to protect the relevant (upper 22) bits of the mark-word. Significant 
>> parts of this PR deal with loading the compressed Klass* from the mark-word. 
>> This PR also changes some code paths (mostly in GCs) to be more careful when 
>> accessing Klass* (or mark-word or size) to be able to fetch it from the 
>> forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now 
>> uses a bit to indicate 'self-forwarding'. This is needed to preserve the 
>> crucial Klass* bits in the header. This also allows to get rid of 
>> preserved-header machinery in SerialGC and G1 (Parallel GC abuses 
>> preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We 
>> have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, 
>> we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the 
>> GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field 
>> layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, 
>> it is not possible to read an archive that has been written with an opposite 
>> setting of UseCompactObjectHeaders. Some build machinery is added so that 
>> _co...
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fix aarch64.ad

Finished reviewing `src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp`, 
line by line and comparing old snippets that got merged into the new function: 
looks good to me, every (new) case handled 

Only have some minor comments about comments.

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 414:

> 412:   // to the valid haystack bytes on the stack.
> 413:   {
> 414:     const Register haystack = rbx;

Keep `rax` as index for clarity? Although it is really used as a temp..


const Register index = rax;
const Register haystack = rbx;
copy_to_stack(haystack, haystack_len, false, index , XMM_TMP1, _masm);

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1568:

> 1566:   assert((COPIED_HAYSTACK_STACK_SIZE == 64), "Must be 64!");
> 1567: 
> 1568:   // Copy incoming haystack onto stack

Old comment was slightly more precise. Move here. i.e.
`// Copy incoming haystack onto stack (haystack <= 32 bytes)`

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1634:

> 1632: 
> 1633: 
> 1634: // Copy the small (< 32 byte) haystack to the stack.  Allows for vector 
> reads without page fault

Just to be pedantic, its `(<=32)` - this function also handles 32bytes case.

-  line 401: 

 __ cmpq(haystack_len, 0x20);
 __ ja(L_bigSwitchTop);

- though line 293 (`highly_optimized_short_cases`) only seems to route16-byte 
cases here: 
```__ cmpq(haystack_len_p, isU ? 8 : 16);```

src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1659:

> 1657:   Label L_moreThan8, L_moreThan16, L_moreThan24, L_adjustHaystack;
> 1658: 
> 1659:   assert(arrayOopDesc::base_offset_in_bytes(isU ? T_CHAR : T_BYTE) >= 8,

If we had to also optimize for header-size 16, it might be possible to remove 
one jump here. Looks correct for either size.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2370735887
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802041876
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802044880
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802088545
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802073195

Reply via email to