On Tue, Jul 9, 2024, at 14:01, Dean Rasheed wrote: > Before considering the other patches to optimise for larger inputs, I > think it's worth optimising the existing mul_var() code as much as > possible. > > One thing I noticed while testing the earlier patches on this thread > was that they were significantly faster if they used unsigned integers > rather than signed integers. I think the reason is that operations > like "x / 10000" and "x % 10000" use fewer CPU instructions (on every > platform, according to godbolt.org) if x is unsigned. > > In addition, this reduces the number of times the digit array needs to > be renormalised, which seems to be the biggest factor. > > Another small optimisation that seems to be just about worthwhile is > to pull the first digit of var1 out of the main loop, so that its > contributions can be set directly in dig[], rather than being added to > it. This allows palloc() to be used to allocate dig[], rather than > palloc0(), and only requires the upper part of dig[] to be initialised > to zeros, rather than all of it.
Nice, really smart! > Together, these seem to give a decent speed-up: > > NBASE digits | HEAD rate | patch rate > --------------+---------------+--------------- > 5 | 5.8797105e+06 | 6.047134e+06 > 12 | 4.140017e+06 | 4.3429845e+06 > 25 | 2.5711072e+06 | 2.7530615e+06 > 50 | 1.0367389e+06 | 1.3370771e+06 > 100 | 367924.1 | 462437.38 > 250 | 77231.32 | 104593.95 > 2500 | 881.48694 | 1197.4739 > 15000 | 25.064987 | 32.78391 > > The largest gains are above around 50 NBASE digits, where the time > spent renormalising dig[] becomes significant. I added some more ndigits test cases: /* * Intel Core i9-14900K */ NBASE digits | HEAD rate | patch rate | relative difference --------------+----------------+----------------+--------------------- 1 | 4.7846890e+07 | 4.7846890e+07 | 0.00% 2 | 4.9504950e+07 | 4.7393365e+07 | -4.27% 3 | 4.0816327e+07 | 4.0983607e+07 | 0.41% 4 | 4.1152263e+07 | 3.9370079e+07 | -4.33% 5 | 2.2573363e+07 | 2.1978022e+07 | -2.64% 6 | 2.1739130e+07 | 1.9646365e+07 | -9.63% 7 | 1.6393443e+07 | 1.6339869e+07 | -0.33% 8 | 1.6863406e+07 | 1.6778523e+07 | -0.50% 9 | 1.5105740e+07 | 1.6420361e+07 | 8.70% 10 | 1.3315579e+07 | 1.5527950e+07 | 16.61% 11 | 1.2360939e+07 | 1.4124294e+07 | 14.27% 12 | 1.1764706e+07 | 1.2836970e+07 | 9.11% 13 | 1.0060362e+07 | 1.1820331e+07 | 17.49% 14 | 9.0909091e+06 | 1.0000000e+07 | 10.00% 15 | 7.6923077e+06 | 8.0000000e+06 | 4.00% 16 | 9.0909091e+06 | 9.4339623e+06 | 3.77% 17 | 7.2992701e+06 | 9.0909091e+06 | 24.55% 18 | 7.0921986e+06 | 7.8125000e+06 | 10.16% 19 | 6.5789474e+06 | 6.6666667e+06 | 1.33% 20 | 6.2500000e+06 | 6.5789474e+06 | 5.26% 21 | 5.8479532e+06 | 6.1728395e+06 | 5.56% 22 | 5.5555556e+06 | 5.9880240e+06 | 7.78% 24 | 5.2631579e+06 | 5.8823529e+06 | 11.76% 25 | 5.2083333e+06 | 5.5555556e+06 | 6.67% 26 | 4.7619048e+06 | 5.2631579e+06 | 10.53% 27 | 4.5045045e+06 | 5.2083333e+06 | 15.63% 28 | 4.4247788e+06 | 4.7619048e+06 | 7.62% 29 | 4.1666667e+06 | 4.5454545e+06 | 9.09% 30 | 4.0000000e+06 | 4.3478261e+06 | 8.70% 31 | 3.4482759e+06 | 4.0000000e+06 | 16.00% 32 | 3.9840637e+06 | 4.2016807e+06 | 5.46% 50 | 2.0964361e+06 | 2.6595745e+06 | 26.86% 100 | 666666.67 | 869565.22 | 30.43% 250 | 142653.35 | 171526.59 | 20.24% 2500 | 1642.04 | 2197.80 | 33.85% 15000 | 41.67 | 52.63 | 26.32% (36 rows) /* * AMD Ryzen 9 7950X3D */ NBASE digits | HEAD rate | patch rate | relative difference --------------+----------------+----------------+--------------------- 1 | 3.6900369e+07 | 3.8022814e+07 | 3.04% 2 | 3.4602076e+07 | 3.5714286e+07 | 3.21% 3 | 2.8011204e+07 | 2.7777778e+07 | -0.83% 4 | 2.7932961e+07 | 2.8328612e+07 | 1.42% 5 | 1.6420361e+07 | 1.7123288e+07 | 4.28% 6 | 1.4705882e+07 | 1.5313936e+07 | 4.13% 7 | 1.3192612e+07 | 1.3888889e+07 | 5.28% 8 | 1.2121212e+07 | 1.2919897e+07 | 6.59% 9 | 1.1235955e+07 | 1.2135922e+07 | 8.01% 10 | 1.0000000e+07 | 1.1312217e+07 | 13.12% 11 | 9.0909091e+06 | 1.0000000e+07 | 10.00% 12 | 8.1967213e+06 | 8.4033613e+06 | 2.52% 13 | 7.2463768e+06 | 7.7519380e+06 | 6.98% 14 | 6.7567568e+06 | 7.1428571e+06 | 5.71% 15 | 5.5555556e+06 | 5.8823529e+06 | 5.88% 16 | 6.3291139e+06 | 5.7803468e+06 | -8.67% 17 | 5.8823529e+06 | 5.9880240e+06 | 1.80% 18 | 5.5555556e+06 | 5.7142857e+06 | 2.86% 19 | 5.2356021e+06 | 5.6179775e+06 | 7.30% 20 | 4.9019608e+06 | 5.1020408e+06 | 4.08% 21 | 4.5454545e+06 | 4.8543689e+06 | 6.80% 22 | 4.1841004e+06 | 4.5871560e+06 | 9.63% 24 | 4.4642857e+06 | 4.4052863e+06 | -1.32% 25 | 4.1666667e+06 | 4.2194093e+06 | 1.27% 26 | 4.0000000e+06 | 3.9525692e+06 | -1.19% 27 | 3.8461538e+06 | 3.8022814e+06 | -1.14% 28 | 3.9062500e+06 | 3.8759690e+06 | -0.78% 29 | 3.7878788e+06 | 3.8022814e+06 | 0.38% 30 | 3.3898305e+06 | 3.7174721e+06 | 9.67% 31 | 2.7472527e+06 | 2.8571429e+06 | 4.00% 32 | 3.0395137e+06 | 3.1446541e+06 | 3.46% 50 | 1.7094017e+06 | 2.0576132e+06 | 20.37% 100 | 518134.72 | 609756.10 | 17.68% 250 | 108577.63 | 136612.02 | 25.82% 2500 | 1264.22 | 1610.31 | 27.38% 15000 | 34.48 | 43.48 | 26.09% (36 rows) /* * Apple M3 Max */ NBASE digits | HEAD rate | patch rate | relative difference --------------+----------------+----------------+--------------------- 1 | 4.9504950e+07 | 4.9504950e+07 | 0.00% 2 | 6.1349693e+07 | 5.9171598e+07 | -3.55% 3 | 5.2631579e+07 | 5.2083333e+07 | -1.04% 4 | 4.5248869e+07 | 4.5248869e+07 | 0.00% 5 | 2.1978022e+07 | 2.2727273e+07 | 3.41% 6 | 1.9920319e+07 | 2.0876827e+07 | 4.80% 7 | 1.7182131e+07 | 1.8018018e+07 | 4.86% 8 | 1.5822785e+07 | 1.6051364e+07 | 1.44% 9 | 1.3368984e+07 | 1.3333333e+07 | -0.27% 10 | 1.1709602e+07 | 1.1627907e+07 | -0.70% 11 | 1.0020040e+07 | 1.0526316e+07 | 5.05% 12 | 9.0909091e+06 | 9.0909091e+06 | 0.00% 13 | 8.2644628e+06 | 8.2644628e+06 | 0.00% 14 | 7.6923077e+06 | 7.6335878e+06 | -0.76% 15 | 7.1428571e+06 | 7.0921986e+06 | -0.71% 16 | 6.6225166e+06 | 6.5789474e+06 | -0.66% 17 | 5.9880240e+06 | 6.2111801e+06 | 3.73% 18 | 5.7803468e+06 | 5.5865922e+06 | -3.35% 19 | 5.2631579e+06 | 5.2356021e+06 | -0.52% 20 | 4.6296296e+06 | 4.8543689e+06 | 4.85% 21 | 4.4444444e+06 | 4.3859649e+06 | -1.32% 22 | 4.2016807e+06 | 4.0485830e+06 | -3.64% 24 | 3.7453184e+06 | 3.5714286e+06 | -4.64% 25 | 3.4843206e+06 | 3.4013605e+06 | -2.38% 26 | 3.2786885e+06 | 3.2786885e+06 | 0.00% 27 | 3.0674847e+06 | 3.1055901e+06 | 1.24% 28 | 2.8818444e+06 | 2.9069767e+06 | 0.87% 29 | 2.7322404e+06 | 2.7700831e+06 | 1.39% 30 | 2.5839793e+06 | 2.6246719e+06 | 1.57% 31 | 2.5062657e+06 | 2.4630542e+06 | -1.72% 32 | 4.5871560e+06 | 4.6082949e+06 | 0.46% 50 | 1.7513135e+06 | 1.9880716e+06 | 13.52% 100 | 714285.71 | 833333.33 | 16.67% 250 | 124223.60 | 149925.04 | 20.69% 2500 | 1440.92 | 1760.56 | 22.18% 15000 | 39.53 | 48.08 | 21.63% (36 rows) Regards, Joel