On Sat, Aug 24, 2024, at 14:10, Dean Rasheed wrote:
> On Sat, 24 Aug 2024 at 08:26, Joel Jacobson <j...@compiler.org> wrote:
>>
>> On Sat, Aug 24, 2024, at 01:35, Joel Jacobson wrote:
>> > On Sat, Aug 24, 2024, at 00:00, Joel Jacobson wrote:
>> >> Since statistical tools that rely on normal distributions can't be used,
>> >> let's look at the individual measurements for (var1ndigits=3,
>> >> var2ndigits=3)
>> >> since that seems to be the biggest slowdown on both CPUs,
>> >> and see if our level of surprise is affected.
>> >
>> > Here is a more traditional benchmark,
>> > which seems to also indicate (var1ndigits=3, var2ndigits=3) is a bit
>> > slower:
>>
>> I tested just adding back div_var_int64, and it seems to help.
>>
>
> Thanks for testing.
>
> There does appear to be quite a lot of variability between platforms
> over whether or not div_var_int64() is a win for 3 and 4 digit
> divisors. Since this patch is primarily about improving div_var()'s
> long division algorithm, it's probably best for it to not touch that,
> so I've put div_var_int64() back in for now. We could possibly
> investigate whether it can be improved separately.
>
> Looking at your other test results, they seem to confirm my previous
> observation that exact mode is faster than approximate mode for
> var2ndigits <= 12 or so, so I've added code to do that.
>
> I also expanded on the comments for the quotient-correction code a bit.

Nice. LGTM.

I've successfully tested the new patch again on both Intel and AMD.

I've marked it as Ready for Committer.

Regards,
Joel