Re: RFR: 8352075: Perf regression accessing fields [v5]

Radim Vansa Mon, 19 May 2025 00:14:45 -0700

On Fri, 16 May 2025 23:51:39 GMT, John R Rose <jr...@openjdk.org> wrote:


>> Radim Vansa has updated the pull request incrementally with five additional 
>> commits since the last revision:
>> 
>>  - Revert change in array.hpp
>>  - Revert changes in VerifyRawIndexesTest
>>  - Improve FieldInfo::print
>>  - Load constant for SORTED_FIELD_TABLE_THRESHOLD
>>  - Replace JumpTable with SortedFieldTable
>
> This change adds smaller "fast-forward" table to accelerate random-access 
> queries within a larger stream.
> 
> I am not against this change, but I think it should be refactored (either now 
> or later) into a library that can be used with other streamy data.  I'm 
> thinking of debug-info, dependency lists, line number tables, reloc-info, or 
> maybe PC descs as future clients of such a thing.
> 
> It could be a query accelerator (index), for any kind of "streamy data" that 
> can get long, and needs occasional random access.
> 
> It is good to switch to binary search for big tables, while preserving the 
> speed of linear search for small ones.  A similar technique can also be seen 
> in `PcDescContainer::find_pc_desc_internal`.  Both binary-search algorithms 
> are complicated and difficult to validate.   If we had a suitable library, we 
> could tune it and test it carefully, and use it to simplify field streams, PC 
> descs, and whatever else we need to work with.
> 
> BTW, it's sometimes useful to add a front-side cache for such searches.  This 
> is a bookmark of the last query point.  This helps with queries which are 
> correlated; you don't re-search from scratch on every step.  This is the 
> function of `last_pc_desc` in the PC desc query; it can accelerate the binary 
> search.
> 
> The fast-forward table here uses an ad hoc var-int scheme which is delicate 
> and possibly buggy.
> 
> One way to do without the specialized table is to inject synchronization 
> markers into the streamy data itself.  Note that the null byte is never used 
> to encode UNSIGNED5 data.  Therefore, if you have a stream of UNSIGNED5 
> tokens, you can add boundaries encoded null bytes.   In the present 
> application, you could perform binary search on the raw stream pointer, with 
> each probe first re-synchronizing to the first null byte, and then decoding 
> with a short sprint of linear access.  The placement of null bytes is a 
> tuning tactic:  More null bytes add footprint but decrease the length of the 
> short sprints.
> 
> I'm not asking for any changes at this point because I have not read the PR 
> carefully enough.  And I don't intend to hold up reviews, but I do want us to 
> put refactoring on our roadmap.

@rose00 While refactoring might be useful, this PR is trying to address a 
performance regression (in fact by improving the scenario significantly), and 
I'd prefer that to land in 25 rather than to delay just for the sake of better 
'reusability'. And TBH I don't consider binary search algorithm too difficult 
to validate.

While a simple cache might be useful, I don't have a good example to validate 
its usefulness. And turning a read-only scenario into mutating scenario can 
turn out badly for multithreaded code, as in 
https://bugs.openjdk.org/browse/JDK-8180450 .

While you suggest 0 byte as synchronization markers, I've tried to synchronize 
the stream in #24713 and experiments have shown that while it improves things, 
it was not sufficient in my case. That's why I've abandoned that approach.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24847#issuecomment-2889870320

Re: RFR: 8352075: Perf regression accessing fields [v5]

Reply via email to