On Tue, 10 Jun 2025 14:30:06 GMT, Johan Sjölen <jsjo...@openjdk.org> wrote:

>> But then you'd need create an array of `Pair` in `create_search_table` and 
>> copy the data into that. You wouldn't need a Supplier at all, just pass the 
>> array and its length to the `fill` method. If you are worried about the 
>> virtual calls, I could do that.
>> 
>> Alternatively, we could replace virtual calls with templated `fill` method. 
>> In that case the Comparator should be a template method as well, since that 
>> one is even more perf-critical?
>
> I think you misunderstand me, as what I'm proposing wouldn't requre creating 
> an array in `create_search_table`.
> 
> I'm saying that you do this:
> 
> ```c++
> // Note: we require the supplier to provide the elements in the final order 
> as we can't easily sort
> // within this method - qsort() accepts only pure function as comparator.
> void PackedTableBuilder::fill(u1* table, size_t table_length, Supplier 
> &supplier) const {
>   Pair<uint32_t, uint32_t> kvs[4];
> 
>   size_t offset = 0;
>   size_t len_read = 0;
>   while (len = supplier.next(kvs, 4)) {
>     // len tells you how many of the full capacity of kvs was used.
>   }
> }
> 
> 
> Now each call of  `Supplier::next` gives you up to 4 elements, quartering the 
> amount of virtual calls necessary.
> 
>>Alternatively, we could replace virtual calls with templated fill method. In 
>>that case the Comparator should be a template method as well, since that one 
>>is even more perf-critical?
> 
> Sure, you can try that out.
> 
> To be clear, I am only asking: Are virtual calls expensive enough here that 
> not having them, or amortizing their cost, something that makes the 
> performance better?

The earlier version of this PR was not abstracting out PackedTable* so there 
were no virtual calls. In both the CCC.java and undisclosed customer reproducer 
I don't see a significant difference in performance - these involve whole JVM 
startup, so the efficiency of code with linear complexity is probably under the 
radar. If we want to optimize the hack out of it - yes, there would be space 
for that, maybe at the cost of maintainability and/or reusability.
TLDR: I don't have a (micro)benchmark that would prove one thing is better than 
the other.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2138174164

Reply via email to