On Mon, Jul 29, 2024, at 16:42, Joel Jacobson wrote:
> New results with less noise below.
>
> Pardon the exceeding of 80 chars line width,
> but felt important to include commit hash and relative delta.
>
>
>     ndigits    |    rate    |  change   |   accum   | commit  |                      summary
> ---------------+------------+-----------+-----------+---------+----------------------------------------------------

I've reviewed the benchmark results, and it looks like v3-0001 made some cases a bit slower:

 (32,32)       |  1.786e+06 | -13.27 %  | -11.26 %  | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
 (32,64)       |  1.119e+06 | -16.72 %  | -20.45 %  | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
 (32,128)      |  7.242e+05 | -13.55 %  | -9.24 %   | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
 (64,64)       |  5.515e+05 | -22.34 %  | -24.47 %  | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
 (64,128)      |  3.204e+05 | -14.83 %  | -12.44 %  | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co
 (128,128)     |  1.750e+05 | -16.01 %  | -15.24 %  | v3-0001 | Extend mul_var_short() to 5 and 6-digit inputs. Co

Thanks to v3-0002, they are all still significantly faster when both patches have been applied,
but I wonder if it is expected or not, that v3-0001 temporarily made them a bit slower?

Same cases with v3-0002 applied:

 (32,32)       |  3.408e+06 | +90.80 %  | +69.32 %  | v3-0002 | Optimise numeric multiplication using base-NBASE^2
 (32,64)       |  2.356e+06 | +110.63 % | +67.56 %  | v3-0002 | Optimise numeric multiplication using base-NBASE^2
 (32,128)      |  1.393e+06 | +92.39 %  | +74.61 %  | v3-0002 | Optimise numeric multiplication using base-NBASE^2
 (64,64)       |  1.432e+06 | +159.69 % | +96.14 %  | v3-0002 | Optimise numeric multiplication using base-NBASE^2
 (128,128)     |  5.567e+05 | +218.07 % | +169.60 % | v3-0002 | Optimise numeric multiplication using base-NBASE^2

/Joel
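
For reference, a minimal Python sketch of how the "change" column for v3-0002 can be cross-checked against the rates quoted above. It assumes (the message does not say so explicitly) that "change" is the rate relative to the immediately preceding commit, here v3-0001. The helper name relative_change is made up for illustration; the numbers are copied from the two tables, and small differences are expected because the rates are rounded to four significant digits.

```python
# Rates copied from the tables in this message (units as reported by the benchmark).
rate_v3_0001 = {
    "(32,32)": 1.786e6,
    "(32,64)": 1.119e6,
    "(32,128)": 7.242e5,
    "(64,64)": 5.515e5,
    "(128,128)": 1.750e5,
}
rate_v3_0002 = {
    "(32,32)": 3.408e6,
    "(32,64)": 2.356e6,
    "(32,128)": 1.393e6,
    "(64,64)": 1.432e6,
    "(128,128)": 5.567e5,
}
# "change" column as reported for v3-0002, in percent.
reported_change = {
    "(32,32)": 90.80,
    "(32,64)": 110.63,
    "(32,128)": 92.39,
    "(64,64)": 159.69,
    "(128,128)": 218.07,
}


def relative_change(new_rate, old_rate):
    """Percentage change of new_rate relative to old_rate (hypothetical helper)."""
    return (new_rate - old_rate) / old_rate * 100.0


for case, new_rate in rate_v3_0002.items():
    computed = relative_change(new_rate, rate_v3_0001[case])
    print(f"{case:>10}  computed {computed:+8.2f} %  reported {reported_change[case]:+8.2f} %")
```

Under that assumption the recomputed values agree with the reported "change" column to within about 0.1 percentage points, which suggests the slowdown from v3-0001 alone and the speedup from v3-0002 are measured against different reference points (the preceding commit for "change", an earlier baseline for "accum").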