On Sun, 25 May 2025 07:18:59 GMT, John R Rose <jr...@openjdk.org> wrote:
>> src/hotspot/share/oops/fieldInfo.cpp line 252:
>>
>>> 250: int FieldInfoReader::search_table_lookup(const Array<u1> *search_table, const Symbol *name, const Symbol *signature, ConstantPool *cp, int java_fields) {
>>> 251:   UNSIGNED5::Reader<const u1*, int> r2(_r.array());
>>> 252:   int low = 0, high = java_fields - 1;
>>
>> This is probably correct, but I recommend a few changes:
>> A. add an assert `low <= high` at the beginning (defend against a surprise `java_fields` number).
>> B. repeat the assert after each new calculation of high or low.
>> C. declare high, low, and mid as `uint32_t` to prove overflow (UB) impossible, without the need to reason about integral ranges.

> I looked over the code base for other occurrences of binary search, and I see your code is not an outlier.
>
> The tendency is to use int for high/mid/low, and there are cases where UB would be possible in the absence of extra knowledge about the possible ranges of the ints. `InstanceKlass::quick_search` is especially scary, until one realizes that method lists are probably on the order of 2^16. `vmSymbols::find_sid` is the same story; the number of symbols is much less than 2^30.
>
> `GrowableArray::find_sorted` also uses int indexes, but uses a `uint` cast at just the right place when computing mid, to avoid UB, so it can handle arrays sized near 2^31.
>
> With those precedents, I'm not going to pick on this code here. The recommended changes A/B/C are optional.
>
> But I am going to think about what it would take to reduce the number of occurrences of binary search, and build a template that can re-implement at least some of them (probably including yours) with bullet-proof, unit-tested code.
>
> Binary search over 3/4/5-byte packed records is easy, which is what you are doing here. The use of shortened integers is clever. I think that can be made a template also, and then used for more than one index, as I've observed before.
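[Editor's note: a standalone sketch, not the HotSpot code under review, of the overflow-safe index handling discussed above. Using `uint32_t` for low/mid/high (recommendation C) and computing mid as `low + (high - low) / 2` makes signed overflow impossible even for arrays near 2^31; the flip side, shown below, is that unsigned indices need a guard against wraparound when `high` would go below zero.]

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Binary search with uint32_t indices so low + high can never overflow
// a signed int (which would be UB). Returns the index of key, or -1.
static int search_u32(const std::vector<int>& a, int key) {
  if (a.empty()) return -1;
  uint32_t low = 0, high = (uint32_t)a.size() - 1;
  assert(low <= high);                       // recommendation A
  while (low <= high) {
    uint32_t mid = low + (high - low) / 2;   // no overflow, even near 2^31
    if (a[mid] == key) {
      return (int)mid;
    } else if (a[mid] < key) {
      low = mid + 1;                         // recommendation B: could re-assert here
    } else {
      if (mid == 0) break;                   // guard: high = (uint32_t)0 - 1 would wrap
      high = mid - 1;
    }
  }
  return -1;
}
```

The `mid == 0` guard is the price of unsigned indices: with signed ints the loop exits naturally when `high` goes to -1, but with `uint32_t` it would wrap to 2^32-1 and loop forever.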
> One good way to think about these variable-sized packed records is to treat them as physical `uint64_t` values, with high-order zero bytes removed. If you have a word or so of slop area at the bottom of the packed array, you can load 1..8 bytes in a single (misaligned) memory operation, loading garbage into unused bytes, and then use shift or mask to clear the garbage. That may be faster than asking C++ to do a bunch of branchy logic and byte assembly on every access. Loading an N-byte word (little-endian) is simply `w = *(uint64_t*)(p+N-8) >> ((8-N)*8)`. Your packed structs have two fields x,y, but they can be reimagined as bitfields `x = w & ((1<<xlen)-1)`, `y = w >> xlen`. This would save half a byte per entry in your tables, on average, since you can often share a byte between the two fields. And the decode logic would be simpler than the closure-based stuff you have now.
>
> These thoughts are just another observation about the design space you are working in, not a request for changes, although I do think they would simplify your code (and make it more efficient). This sort of careful design makes the most sense when packaged in a separate class or template, that can be carefully reviewed and unit-tested in isolation. If we were to refactor some old searchable...

I like the idea of mapping each element in the table as raw bits, though handling access near the end of the array would be a bit inconvenient (or we would have to allocate a few extra bytes). I've changed the algorithm to use unsigned integers; in fact I find it a bit annoying that most of the indices used throughout the related code are signed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2109021670
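[Editor's note: a minimal sketch of the single-load decode described above, not code from the PR. It uses the mask variant (load at `p`, mask off high garbage bytes) rather than the shift variant, and uses `memcpy` in place of the misaligned `*(uint64_t*)` cast, which is UB in portable C++; compilers turn the 8-byte `memcpy` into one load. It assumes a little-endian host and at least 8 readable bytes of slop after each record.]

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Decode an n-byte little-endian packed record at p with a single 8-byte
// load, then mask off the (8 - n) garbage bytes. Requires that at least
// 8 bytes starting at p are readable (slop area after the packed array).
static uint64_t load_n(const uint8_t* p, unsigned n) {
  assert(n >= 1 && n <= 8);
  uint64_t w;
  std::memcpy(&w, p, 8);                           // one (possibly misaligned) load
  if (n < 8) w &= (UINT64_C(1) << (n * 8)) - 1;    // clear the garbage bytes
  return w;
}

// The two record fields reimagined as bitfields: x in the low xlen bits,
// y in the bits above it — so the fields can share a byte.
static uint64_t field_x(uint64_t w, unsigned xlen) { return w & ((UINT64_C(1) << xlen) - 1); }
static uint64_t field_y(uint64_t w, unsigned xlen) { return w >> xlen; }
```

A 4-byte record `34 12 78 56` decodes to `w = 0x56781234`; with `xlen = 12`, `x = 0x234` and `y = 0x56781`, illustrating how one physical byte (`0x12`) contributes bits to both fields.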