On 7/8/20 12:34 PM, Torbjörn Granlund wrote:
> Any number which does not happen to be B-smooth for, say B < 2^30, will show easily measurable performance difference of 5x to 40x IIRC.
Ah, I had tried the example in the manual, (2^31 - 1) * (2^61 - 1). Even though it isn't B-smooth for B < 2^30, the performance difference was only 2x on my machine. I just now tried 2^127 - 1 and saw a similar performance difference, but 2^127 - 3 had a 15x difference so it's a better example.
I installed the attached to try to document this better.
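The three inputs mentioned above can be timed with a short script along the following lines. This is only a sketch: the variable names are made up for illustration, the decimal expansions of 2^127 - 1 and 2^127 - 3 are written out by hand rather than taken from this thread, and it assumes bash and a GNU 'factor' that accepts numbers above 2^64. It times whichever algorithm the installed binary picks, so it does not by itself reproduce the two-word versus GMP comparison, which needs two differently built binaries.

```shell
# Inputs discussed in this thread, in decimal so that bc is not required.
# (Variable names are hypothetical.)
n_manual=4951760154835678088235319297            # (2^31 - 1) * (2^61 - 1)
n_m127=170141183460469231731687303715884105727   # 2^127 - 1
n_hard=170141183460469231731687303715884105725   # 2^127 - 3

for n in "$n_manual" "$n_m127" "$n_hard"; do
  # Same invocation style as the manual: bash's 'time' keyword
  # reports real/user/sys for the factor run.
  bash -c "time factor $n"
done
```

For the first input, the factorization line should match the one shown in the manual; the timings will of course vary by machine.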
> I have a patch which makes the non-GMP code some 2x - 3x faster. It's been maturing for several years now, so I suppose I should really finish it. (It got tangled with code which improves the GMP case by letting it fall into the non-GMP code as numbers get smaller. That sounds simple but is quite messy for various reasons. It is also not clear how much complexity we could defend for this command of limited utility.)
Yes, 'factor' is just a minor utility needed for POSIX compliance. Although it'd be nice to get that 2x-3x improvement whenever you have the time, it's not urgent. Thanks for your guidance on the GMP issue.
From ba1489d763b66dd1fcec08ecb4cba5917745f6bf Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Wed, 8 Jul 2020 18:58:18 -0700
Subject: [PATCH] factor: explain why non-GMP code (Bug#42269)

* doc/coreutils.texi (factor invocation):
* src/factor.c: Explain why the two-word algorithm is useful.
---
 doc/coreutils.texi | 24 ++++++++++++++----------
 src/factor.c       |  5 +++++
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 6ec1e6c31..656b8bc79 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -18368,14 +18368,17 @@ Print the program version on standard output, then exit without further
 processing.
 @end table
 
-Factoring the product of the eighth and ninth Mersenne primes
-takes about 4 milliseconds of CPU time on an Intel Xeon Silver 4116.
+If the number to be factored is small (less than @math{2^{127}} on
+typical machines), @command{factor} uses a faster algorithm.
+For example, on a circa-2017 Intel Xeon Silver 4116, factoring the
+product of the eighth and ninth Mersenne primes (approximately
+@math{2^{92}}) takes about 4 ms of CPU time:
 
 @example
-M8=$(echo 2^31-1|bc)
-M9=$(echo 2^61-1|bc)
-n=$(echo "$M8 * $M9" | bc)
-bash -c "time factor $n"
+$ M8=$(echo 2^31-1 | bc)
+$ M9=$(echo 2^61-1 | bc)
+$ n=$(echo "$M8 * $M9" | bc)
+$ bash -c "time factor $n"
 4951760154835678088235319297: 2147483647 2305843009213693951
 
 real 0m0.004s
@@ -18383,11 +18386,12 @@ user 0m0.004s
 sys 0m0.000s
 @end example
 
-Similarly, factoring the eighth Fermat number @math{2^{256}+1} takes
-about 14 seconds on the same machine.
+For larger numbers, @command{factor} uses a slower algorithm. On the
+same platform, factoring the eighth Fermat number @math{2^{256} + 1}
+takes about 14 seconds, and the slower algorithm would have taken
+about 750 ms to factor @math{2^{127} - 3} instead of the 50 ms needed by
+the faster algorithm.
 
-The single-precision code uses an algorithm
-designed for factoring smaller numbers.
 Factoring large numbers is, in general, hard. The Pollard-Brent rho
 algorithm used by @command{factor} is particularly effective for
 numbers with relatively small factors. If you wish to factor large
diff --git a/src/factor.c b/src/factor.c
index c1c35a562..1b1607f16 100644
--- a/src/factor.c
+++ b/src/factor.c
@@ -53,6 +53,11 @@
     trick of multiplying all n-residues by the word base, allowing cheap Hensel
     reductions mod n.
 
+    The GMP code uses an algorithm that can be considerably slower;
+    for example, on a circa-2017 Intel Xeon Silver 4116, factoring
+    2^{127}-3 takes about 50 ms with the two-word algorithm but would
+    take about 750 ms with the GMP code.
+
   Improvements:
 
     * Use modular inverses also for exact division in the Lucas code, and
-- 
2.17.1