Re: RFR: 8352075: Perf regression accessing fields [v8]

John R Rose Sun, 25 May 2025 00:22:42 -0700

On Sat, 24 May 2025 20:44:06 GMT, John R Rose <jr...@openjdk.org> wrote:


>> Radim Vansa has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Add search table validation
>
> src/hotspot/share/oops/fieldInfo.cpp line 252:
> 
>> 250: int FieldInfoReader::search_table_lookup(const Array<u1> *search_table, 
>> const Symbol *name, const Symbol *signature, ConstantPool *cp, int 
>> java_fields) {
>> 251:   UNSIGNED5::Reader<const u1*, int> r2(_r.array());
>> 252:   int low = 0, high = java_fields - 1;
> 
> This is probably correct, but I recommend a few changes:
> A. add an assert `low <= high` at the beginning (defend against surprise 
> `java_fields` number).
> B. repeat the assert after each new calculation of high or low.
> C. declare high and low and mid as `uint32_t` to prove overflow (UB) 
> impossible without the need to reason about integral range

I looked over the code base for other occurrences of binary search, and I see 
your code is not an outlier.

The tendency is to use int for high/mid/low, and there are cases where UB would 
be possible in the absence of extra knowledge about the possible ranges of the 
ints.  `InstanceKlass::quick_search` is especially scary, until one realizes 
that method lists are probably on the order of 2^16.  `vmSymbols::find_sid` is 
the same story; the number of symbols is much less than 2^30..

`GrowableArray::find_sorted` also uses int indexes, but uses a `uint` cast at 
just the right place when computing mid, to avoid UB, so can handle arrays 
sized near 2^31.

With those precedents, I'm not going to pick on this code here.  The 
recommended changes A/B/C are optional.

But I am going to think about what it would take to reduce the number of 
occurences of binary switch, and build a template that can re-implement at 
least some of them (probably including yours) with bullet-proof, unit-tested 
code.

Binary search over 3/4/5-byte packed records is easy, which is what you are 
doing here.  The use of shortened integers is clever.  I think that can be made 
a template also, and then used for more than one index, as I've observed before.

One good way to think about these variable-sized packed records is to treat 
them as physical `uint64_t` values, with high order zero bytes removed.  If you 
have a word or so of slop area at the bottom of the packed array, you can load 
1..8 bytes in a single (misaligned) memory operation, loading garbage into 
unused bytes, and then using shift or mask to clear the garbage.  That may be 
faster than asking C++ to do a bunch of branchy logic and byte assembly on 
every access.  Loading an N-byte word is simply `w=*(uint64_t*)(p+N-8)>>(N*8)`. 
 Your packed structs have two fields x,y, but they can be reimagined as 
bitfields `x=w&((1<<xlen)-1)`, `y=w>>xlen`.  This would save half a byte per 
entry in your tables, on average, since you can often share a byte between the 
two fields.  And the decode logic would be simpler than the closure-based stuff 
you have now.

These thoughts are just another observation about the design space you are 
working in, not a request for changes, although I do think they would simplify 
your code (and make it more efficient).  This sort of careful design makes the 
most sense when packaged in a separate class or template,  that can be 
carefully reviewed and unit-tested in isolation.  If we were to refactor some 
old searchable data structures onto a more modern footing, I think we would go 
in such directions.

An interesting property of your jump tables is that they can several kinds of 
indexes.  You can sort them according to any key in the field-info.  The 
entries are uniformly index/position (x/y) pairs.  For example, if you sort 
numerically (either x or y), you get a jump table for random-access into the 
field-info, given the field id number.  Same data, different sort order, allows 
acces under different search keys.  Again, this observation is not very 
actionable now, but it suggeste (to me) that we have a viable mechanism here 
for several random access problems in the VM.

Kudos to you for suggesting the idea of truncated ints combined with binary 
search!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2106110163

Re: RFR: 8352075: Perf regression accessing fields [v8]

Reply via email to