On Thu, 11 Jul 2024 23:17:10 GMT, Vladimir Ivanov <vliva...@openjdk.org> wrote:

>> This patch expands the use of a hash table for secondary superclasses
>> to the interpreter, C1, and runtime. It also adds a C2 implementation
>> of hashed lookup in cases where the superclass isn't known at compile
>> time.
>> 
>> HotSpot shared runtime
>> ----------------------
>> 
>> Building hashed secondary tables is now unconditional. It takes very
>> little time, and now that the shared runtime always has the tables, it
>> might as well take advantage of them. The shared code is easier to
>> follow now, I think.
>> 
>> There might be a performance issue with x86-64 in that we build
>> HotSpot for a default x86-64 target that does not support popcount.
>> This means that HotSpot C++ runtime on x86 always uses a software
>> emulation for popcount, even though the vast majority of machines made
>> for the past 20 years can do popcount in a single instruction. It
>> wouldn't be terribly hard to do something about that.
>> 
>> Having said that, the software popcount is really not bad.
>> 
>> x86
>> ---
>> 
>> x86 is rather tricky, because we still support
>> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as
>> well as 32- and 64-bit ports. There's some further complication in
>> that only `RCX` can be used as a shift count, so there's some register
>> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp
>> rather gnarly, with multiple levels of conditionals at compile time
>> and runtime.
>> 
>> AArch64
>> -------
>> 
>> AArch64 is considerably more straightforward. We always have a
>> popcount instruction and (thankfully) no 32-bit code to worry about.
>> 
>> Generally
>> ---------
>> 
>> I would dearly love simply to rip out the "old" secondary supers cache
>> support, but I've left it in just in case someone has a performance
>> regression.
>> 
>> The versions of `MacroAssembler::lookup_secondary_supers_table` that
>> work with variable superclasses don't take a fixed set of temp
>> registers, and neither do they call out to to a slow path subroutine.
>> Instead, the slow patch is expanded inline.
>> 
>> I don't think this is necessarily bad. Apart from the very rare cases
>> where C2 can't determine the superclass to search for at compile time,
>> this code is only used for generating stubs, and it seemed to me
>> ridiculous to have stubs calling other stubs.
>> 
>> I've followed the guidance from @iwanowww not to obsess too much about
>> the performance of C1-compiled secondary supers lookups, and to prefer
>> simplicity over absolute performance. Nonetheless, this i...
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1433:
> 
>> 1431: 
>> 1432:   // Don't check secondary_super_cache
>> 1433:   if (super_check_offset.is_register()
> 
> Do you see any effects from this particular change?
> 
> It adds a runtime check on the fast path for all subtype checks (irrespective 
> of whether it checks primary or secondary super). Moreover, the very same 
> check is performed after primary super slot is checked.
> 
> Unless `_secondary_super_cache` field is removed, unconditionally checking 
> the slot at `super_check_offset` is benign.

BTW `MacroAssembler::check_klass_subtype_fast_path` deserves a cleanup: 
`super_check_offset` can be safely turned into `Register` thus eliminating the 
code guarded by `super_check_offset.is_register() == false`.

> src/hotspot/share/oops/instanceKlass.cpp line 1410:
> 
>> 1408:     return nullptr;
>> 1409:   } else if (num_extra_slots == 0) {
>> 1410:     if (num_extra_slots == 0 && interfaces->length() <= 1) {
> 
> Since `secondary_supers` are hashed unconditionally now,  is 
> `interfaces->length() <= 1` check still needed?

Also, `num_extra_slots == 0` check is redundant.

> src/hotspot/share/oops/klass.cpp line 284:
> 
>> 282: // which doesn't zero out the memory before calling the constructor.
>> 283: Klass::Klass(KlassKind kind) : _kind(kind),
>> 284:                                _bitmap(SECONDARY_SUPERS_BITMAP_FULL),
> 
> I like the idea, but what are the benefits of initializing `_bitmap` 
> separately from `_secondary_supers`?

Another observation while browsing the code: `_secondary_supers_bitmap` would 
be a better name. (Same considerations apply to `_hash_slot`.)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674815196
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674798719
PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674828164

Reply via email to