*one year later*

On Mon, Jan 20, 2025 at 7:02 PM Pádraig Brady <[email protected]> wrote:
> Ideally we wouldn't have significantly different performance
> based solely on chunk size at least.
I've found a nice and fast hash judgement function that works for an arbitrary chunk size, so I picked up the patchset that had previously fallen through the cracks and rested there for a while.

I followed Daniel Lemire's idea and treated the u32|u64 hash as the fractional part of a number uniformly distributed in the [0, 1) range. So the hash judgement function can be as simple as ($hash <= $threshold), and that's it. We just have to approximate $threshold as UINT_MAX / chunk_size to get the probability right.

The code turned out to be fast enough to trigger the issue described in Intel Mitigations for Jump Conditional Code Erratum [JCC]. Code layout changes triggered the described Performance Effects and caused a ~35% slowdown in my benchmarks. On my Skylake CPU, processing a 128 MiB file took 280 ms instead of 205 ms when instructions were delivered by the legacy decode pipeline (MITE) instead of the Decoded ICache (DSB). Luckily, -mbranches-within-32B-boundaries is already implemented in GNU as and clang, so there is no need for manual code layout to get that 35% back.

The patch is taking its final shape: it's roughly 1 kLOC plus some tests and documentation. I hope to finish it and submit it for the first round of review in the next few weeks.

[JCC] https://www.intel.com/content/www/us/en/content-details/841076/intel-mitigations-for-jump-conditional-code-erratum.html

-- 
WBRBW, Leonid Evdokimov, https://darkk.net.ru
tel:+79816800702
PGP: 6691 DE6B 4CCD C1C1 76A0 0D4A E1F2 A980 7F50 FAB2
