On Mon, 9 Jun 2025 21:42:03 GMT, Ioi Lam <ik...@openjdk.org> wrote:

>> Radim Vansa has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Revert removing FieldInfoReader::next_uint()
>
> src/hotspot/share/utilities/packedTable.cpp line 83:
> 
>> 81:   assert((element & ~((uint64_t) _key_mask | ((uint64_t) _value_mask << _value_shift))) == 0, "read too much");
>> 82:   return element;
>> 83: }
> 
> Since `_element_bytes` can be smaller than 8, memcpy will not work on big 
> endian.
> 
> Why are you trying to optimize this? This PR already cuts down the iterations 
> from O(n) to O(log(n)). You are already doing a lot at each iteration -- 
> decoding the name_index and signature_index from the unsigned5 stream, 
> looking up the symbols from the constant pool, etc.  So a few bit operations 
> in read_element isn't going to make any substantial difference:
> 
> 
> uint64_t element = 0;
> for (int i = 0; i < _element_bytes; i++) {
>    element <<= 8;
>    element |= _table[offset++]; // Need to rewrite fill() accordingly.
> }

The idea comes from this comment: 
https://github.com/openjdk/jdk/pull/24847#discussion_r2106110163

> you can load 1..8 bytes in a single (misaligned) memory operation, loading 
> garbage into unused bytes, and then using shift or mask to clear the garbage. 
> That may be faster than asking C++ to do a bunch of branchy logic and byte 
> assembly on every access.

I like that idea, though the plethora of platforms that the JDK supports (and 
that one cannot easily test on) makes it difficult to express portably. Let's 
rely on the compiler to recognize the pattern in the for loop and do the right 
thing on each platform, then...
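
For the record, roughly what I have in mind (a sketch only; the helper names, 
signatures and the fill-side counterpart below are illustrative, not the exact 
code in this PR):

#include <cstddef>
#include <cstdint>

// Sketch: assemble up to 8 bytes in a fixed order (most significant used byte
// first), so the value read is independent of host endianness. Compilers
// commonly turn this loop into a single (possibly misaligned) load plus a
// byte swap where the target allows it.
static uint64_t read_element(const uint8_t* table, size_t offset, int element_bytes) {
  uint64_t element = 0;
  for (int i = 0; i < element_bytes; i++) {
    element = (element << 8) | table[offset + i];
  }
  return element;
}

// Matching writer for the fill() side: stores the least significant byte last,
// so read_element() above reconstructs exactly the value written here.
static void write_element(uint8_t* table, size_t offset, int element_bytes, uint64_t element) {
  for (int i = element_bytes - 1; i >= 0; i--) {
    table[offset + i] = (uint8_t)(element & 0xff);
    element >>= 8;
  }
}

As long as both sides agree on the byte order, the host's endianness never 
enters the picture, and on common targets the loop should typically collapse 
into a single load (plus a byte swap where needed).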

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2137178186
