On Sun, 30 Nov 2025 21:51:55 GMT, Vladimir Yaroslavskiy <[email protected]> 
wrote:

>> Vladimir Yaroslavskiy has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   JDK-8266431: Dual-Pivot Quicksort improvements
>>   
>>   * Added @java.io.Serial
>>   * Added information about the best data types for thresholds of sorting
>>   * Added comments about native implementation based on AVX512 instructions
>
> Hi all,
> 
> Please find my latest changes: I improved mixed insertion sort
> (and therefore the whole sorting), added description of sorting logic
> and created code template to reduce the duplication. Also I
> added a generation script.
> 
> @DougLea Doug Lea
> @jatin-bhateja, @jbhateja Jatin Bhateja
> @tarsa Piotr Tarsa
> 
> Please look at the latest sources.

@VladimirIaroslavski: the organization (code partitioning) of the template 
looks good overall. iiuc the template + generator together are more than twice 
shorter than the original full java code, so the gains are definitely there. 
one small extra suggestion (not well thought out so you can ignore it): maybe 
it would be beneficial to apply such templating to test code too, if that makes 
sense?

i have one suggestion to test out, i.e. speeding up counting in case the 
store-to-load forwarding (or memory disambiguation in general) in the cpu is 
not good enough. the idea is described here: 
https://fastcompression.blogspot.com/2014/09/counting-bytes-fast-little-trick-from.html
 and in short it's about replacing one counter array with multiple counter 
arrays that are modified one after another. the problem only occurs when 
frequently modifying same counters (i.e. same memory locations) back-to-back, 
e.g. if you're making histogram of data which has runs of the same value or run 
of a very few different values (modern cpus perform multiple operations at 
once, so they can try to count multiple values in parallel). in short: it's all 
explained in the article. some extra theory as to what is the mechanism in the 
cpu that's relevant to the counting optimisation is 
https://en.wikipedia.org/wiki/Load-Hit-Store and 
https://en.wikipedia.org/wiki/Memory_disambiguation#Store_to_load_forwardin
 g but it's probably enough to just test out the idea without going too deep 
into theory. using multiple counter arrays of course adds some constant 
overhead (for allocation at the beginning and merging counters at the end of 
counting), so it would be beneficial for big enough arrays only.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27411#issuecomment-3593881394

Reply via email to