On Tue, 15 Oct 2024 10:47:55 GMT, Roman Kennke <rken...@openjdk.org> wrote:
>> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) >> #20603 and #20605, plus the Tiny Class-Pointers parts that have been >> previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All >> changes in this PR are protected by this flag. The purpose of the flag is to >> provide a fallback, in case that users unexpectedly observe problems with >> the new implementation. The intention is that this flag will remain >> experimental and opt-in for at least one release, then make it on-by-default >> and diagnostic (?), and eventually deprecate and obsolete it. However, there >> are a few unknowns in that plan, specifically, we may want to further >> improve compact headers to 4 bytes, we are planning to enhance the Klass* >> encoding to support virtually unlimited number of Klasses, at which point we >> could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In >> order to be able to do this, we are add some changes to GC forwarding (see >> below) to protect the relevant (upper 22) bits of the mark-word. Significant >> parts of this PR deal with loading the compressed Klass* from the mark-word. >> This PR also changes some code paths (mostly in GCs) to be more careful when >> accessing Klass* (or mark-word or size) to be able to fetch it from the >> forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now >> uses a bit to indicate 'self-forwarding'. This is needed to preserve the >> crucial Klass* bits in the header. This also allows to get rid of >> preserved-header machinery in SerialGC and G1 (Parallel GC abuses >> preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We >> have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, >> we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the >> GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field >> layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, >> it is not possible to read an archive that has been written with an opposite >> setting of UseCompactObjectHeaders. Some build machinery is added so that >> _co... > > Roman Kennke has updated the pull request incrementally with one additional > commit since the last revision: > > Fix aarch64.ad Finished reviewing `src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp`, line by line and comparing old snippets that got merged into the new function: looks good to me, every (new) case handled Only have some minor comments about comments. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 414: > 412: // to the valid haystack bytes on the stack. > 413: { > 414: const Register haystack = rbx; Keep `rax` as index for clarity? Although it is really used as a temp.. const Register index = rax; const Register haystack = rbx; copy_to_stack(haystack, haystack_len, false, index , XMM_TMP1, _masm); src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1568: > 1566: assert((COPIED_HAYSTACK_STACK_SIZE == 64), "Must be 64!"); > 1567: > 1568: // Copy incoming haystack onto stack Old comment was slightly more precise. Move here. i.e. `// Copy incoming haystack onto stack (haystack <= 32 bytes)` src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1634: > 1632: > 1633: > 1634: // Copy the small (< 32 byte) haystack to the stack. Allows for vector > reads without page fault Just to be pedantic, its `(<=32)` - this function also handles 32bytes case. - line 401: __ cmpq(haystack_len, 0x20); __ ja(L_bigSwitchTop); - though line 293 (`highly_optimized_short_cases`) only seems to route16-byte cases here: ```__ cmpq(haystack_len_p, isU ? 8 : 16);``` src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1659: > 1657: Label L_moreThan8, L_moreThan16, L_moreThan24, L_adjustHaystack; > 1658: > 1659: assert(arrayOopDesc::base_offset_in_bytes(isU ? T_CHAR : T_BYTE) >= 8, If we had to also optimize for header-size 16, it might be possible to remove one jump here. Looks correct for either size. ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2370735887 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802041876 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802044880 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802088545 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802073195