*one year later*

On Mon, Jan 20, 2025 at 7:02 PM Pádraig Brady <[email protected]> wrote:
> Ideally we wouldn't have significantly different performance
> based solely on chunk size at least.

I've found a nice and fast hash judgement function that works for
arbitrary chunk sizes, so I picked up the patchset that had previously
fallen through the cracks and sat there for a while.

I followed Daniel Lemire's idea and treated the u32|u64 hash as the
fractional part of a number uniformly distributed in the [0, 1) range.
The hash judgement function then becomes as simple as
($hash <= $threshold). We just have to approximate the $threshold as
UINT_MAX / chunk_size to get the probability right.
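
A minimal sketch of that check, assuming a 32-bit hash. The names
cut_threshold and is_cut_point are made up for illustration and are not
taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not from the patch: pick a threshold such that a
   uniformly distributed 32-bit hash satisfies hash <= threshold with
   probability of roughly 1 / avg_chunk_size.  */
static uint32_t
cut_threshold (uint32_t avg_chunk_size)
{
  assert (avg_chunk_size != 0);   /* a zero chunk size makes no sense */
  return UINT32_MAX / avg_chunk_size;
}

/* The judgement itself: a single unsigned comparison, no modulo or mask,
   so it works for any chunk size, not only powers of two.  */
static bool
is_cut_point (uint32_t hash, uint32_t threshold)
{
  return hash <= threshold;
}
```

With avg_chunk_size = 4096 the threshold is 1048575, so 2^20 of the 2^32
possible hash values pass the test and the cut probability is 1/4096.
For sizes that are not powers of two the integer division only
approximates the target probability.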

The code turned out to be fast enough to trigger the issue described in
Intel's Mitigations for Jump Conditional Code Erratum [JCC]. Code layout
changes triggered the Performance Effects described there and made my
benchmarks about 35% slower: processing a 128 MiB file went from 205 ms
to 280 ms when MITE was used instead of the DSB on my Skylake CPU.

Luckily, -mbranches-within-32B-boundaries is already implemented in GNU
as and clang, so there is no need to do manual layout to get those 35%
back.

The patch is getting into its final shape: roughly 1 kLOC plus some
tests and docs. I hope to finish it and submit it for the first round
of review in the next few weeks.

[JCC] 
https://www.intel.com/content/www/us/en/content-details/841076/intel-mitigations-for-jump-conditional-code-erratum.html

--
WBRBW, Leonid Evdokimov, https://darkk.net.ru tel:+79816800702
PGP: 6691 DE6B 4CCD C1C1 76A0 0D4A E1F2 A980 7F50 FAB2
