Re: RFR: 8352075: Perf regression accessing fields [v15]

Radim Vansa Wed, 04 Jun 2025 05:51:04 -0700

On Fri, 30 May 2025 21:14:53 GMT, John R Rose <jr...@openjdk.org> wrote:


>> Radim Vansa has refreshed the contents of this pull request, and previous 
>> commits have been removed. Incremental views are not available.
>
>> I like the idea of mapping each element in the table as raw bits, though 
>> handling of access to the end of the array would be a bit inconvenient (or 
>> we would have to allocate a few extra bytes).
> 
> The code snippet I shared above shows a better way:  You load a full 8 (or 4) 
> bytes where the END (not the START) of the word lines up with the LAST (not 
> FIRST) byte.  Then you will never run past the end of the array!  So, fine, 
> but what about the start of the array?  Well, it's inside an `Array<u1>` 
> object, which has a length header, which is guaranteed to be safe to load 
> (under a cast or bytewise or whatever).  Problem solved.  The only thing to 
> avoid is to load an 8-byte word when the packed word size is 1..5 bytes; then 
> you load a 4-byte word.  You can load both components at once, and then use a 
> configurable shift (from one machine word) to separate them.  This is why I 
> say it saves a half-byte on average.
> 
> These tweaky ideas have three effects:  They probably make the code a little 
> simpler (or at least no worse), they reduce the number of memory operations 
> to query a packed array, and they probably use fewer ALU instructions 
> overall.  They are certainly worth considering for the general-purpose 
> "searchable packed array" I am envisioning; they are optional for this 
> particular bug, viewed in isolation.
> 
>> I've changed the algorithm to use unsigned integers; in fact I find a bit 
>> annoying that most of the indices used throughout the related code are 
>> signed.
> 
> Yes, it annoys me also.  It's playing with fire (or walking the firepit).
> 
>> I've also added a test generating class with a different number of fields, 
>> though running it through the full range of fields (0-65535, though in 
>> practice the upper bound is rather 26k) would be excessive; even now it 
>> takes more than a minute on my machine. Also, I realize that varying the 
>> number of fields does not result in full coverage of possible stream sizes; 
>> per-field records have probably rather uniform lengths.
> 
> Yeah, a gtest on the binary search would cover most of those issues, faster 
> and cleaner.  Then loading many gigantic classfiles will be unnecessary.  
> Just a few classfiles at several scales, probably, and thorough gtest-level 
> unit testing, gets a better result in less time.  As I said above, I'm 
> willing to put off some of the refactoring, given that it should cover other, 
> prior occurrences of binary search (so it's got a larger scope than this bug).
> 
>> @rose00 OK, so I have refactored out the PackedTable that now h...

@rose00 Hi, would you be OK with the current implementation?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24847#issuecomment-2939915008

Re: RFR: 8352075: Perf regression accessing fields [v15]

Reply via email to