On Thu, 11 Jul 2024 23:17:10 GMT, Vladimir Ivanov <vliva...@openjdk.org> wrote:
>> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to to a slow path subroutine. >> Instead, the slow patch is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1433: > >> 1431: >> 1432: // Don't check secondary_super_cache >> 1433: if (super_check_offset.is_register() > > Do you see any effects from this particular change? > > It adds a runtime check on the fast path for all subtype checks (irrespective > of whether it checks primary or secondary super). Moreover, the very same > check is performed after primary super slot is checked. > > Unless `_secondary_super_cache` field is removed, unconditionally checking > the slot at `super_check_offset` is benign. BTW `MacroAssembler::check_klass_subtype_fast_path` deserves a cleanup: `super_check_offset` can be safely turned into `Register` thus eliminating the code guarded by `super_check_offset.is_register() == false`. > src/hotspot/share/oops/instanceKlass.cpp line 1410: > >> 1408: return nullptr; >> 1409: } else if (num_extra_slots == 0) { >> 1410: if (num_extra_slots == 0 && interfaces->length() <= 1) { > > Since `secondary_supers` are hashed unconditionally now, is > `interfaces->length() <= 1` check still needed? Also, `num_extra_slots == 0` check is redundant. > src/hotspot/share/oops/klass.cpp line 284: > >> 282: // which doesn't zero out the memory before calling the constructor. >> 283: Klass::Klass(KlassKind kind) : _kind(kind), >> 284: _bitmap(SECONDARY_SUPERS_BITMAP_FULL), > > I like the idea, but what are the benefits of initializing `_bitmap` > separately from `_secondary_supers`? Another observation while browsing the code: `_secondary_supers_bitmap` would be a better name. (Same considerations apply to `_hash_slot`.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674815196 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674798719 PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1674828164