On Sun, 30 Nov 2025 21:51:55 GMT, Vladimir Yaroslavskiy <[email protected]> wrote:
>> Vladimir Yaroslavskiy has updated the pull request incrementally with one >> additional commit since the last revision: >> >> JDK-8266431: Dual-Pivot Quicksort improvements >> >> * Added @java.io.Serial >> * Added information about the best data types for thresholds of sorting >> * Added comments about native implementation based on AVX512 instructions > > Hi all, > > Please find my latest changes: I improved mixed insertion sort > (and therefore the whole sorting), added description of sorting logic > and created code template to reduce the duplication. Also I > added a generation script. > > @DougLea Doug Lea > @jatin-bhateja, @jbhateja Jatin Bhateja > @tarsa Piotr Tarsa > > Please look at the latest sources. @VladimirIaroslavski: the organization (code partitioning) of the template looks good overall. iiuc the template + generator together are more than twice shorter than the original full java code, so the gains are definitely there. one small extra suggestion (not well thought out so you can ignore it): maybe it would be beneficial to apply such templating to test code too, if that makes sense? i have one suggestion to test out, i.e. speeding up counting in case the store-to-load forwarding (or memory disambiguation in general) in the cpu is not good enough. the idea is described here: https://fastcompression.blogspot.com/2014/09/counting-bytes-fast-little-trick-from.html and in short it's about replacing one counter array with multiple counter arrays that are modified one after another. the problem only occurs when frequently modifying same counters (i.e. same memory locations) back-to-back, e.g. if you're making histogram of data which has runs of the same value or run of a very few different values (modern cpus perform multiple operations at once, so they can try to count multiple values in parallel). in short: it's all explained in the article. some extra theory as to what is the mechanism in the cpu that's relevant to the counting optimisation is https://en.wikipedia.org/wiki/Load-Hit-Store and https://en.wikipedia.org/wiki/Memory_disambiguation#Store_to_load_forwardin g but it's probably enough to just test out the idea without going too deep into theory. using multiple counter arrays of course adds some constant overhead (for allocation at the beginning and merging counters at the end of counting), so it would be beneficial for big enough arrays only. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27411#issuecomment-3593881394
