bug#42269: Remove non-GMP code from coreutils factor.c

Paul Eggert Wed, 08 Jul 2020 19:09:24 -0700

On 7/8/20 12:34 PM, Torbjörn Granlund wrote:

Any number which does not happen to be B-smooth for, say B < 2^30, will
show easily measurable performance difference of 5x to 40x IIRC.

Ah, I had tried the example in the manual, (2^31 - 1) * (2^61 - 1). Even thoughit isn't B-smooth for B < 2^30, the performance difference was only 2x on mymachine. I just now tried 2^127 - 1 and saw a similar performance difference,but 2^127 - 3 had a 15x difference so it's a better example.


I installed the attached to try to document this better.

I have a patch which makes the non-GMP code some 2x - 3x faster.  It's
been maturing for several years now, so I suppose I should really finish
it.  (It got tangled with code which improves the GMP case by letting it
fall into the non-GMP code as numbers get smaller.  That sounds simple
but is quite messy for various reasons.  It is also not clear how much
complexity we could defend for this command of limited utility.)

Yes, 'factor' is just a minor utility needed for POSIX compliance. Although it'dbe nice to get that 2x-3x improvement whenever you have the time, it's noturgent. Thanks for your guidance on the GMP issue.

>From ba1489d763b66dd1fcec08ecb4cba5917745f6bf Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Wed, 8 Jul 2020 18:58:18 -0700
Subject: [PATCH] factor: explain why non-GMP code (Bug#42269)

* doc/coreutils.texi (factor invocation):
* src/factor.c: Explain why the two-word algorithm is useful.
---
 doc/coreutils.texi | 24 ++++++++++++++----------
 src/factor.c       |  5 +++++
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 6ec1e6c31..656b8bc79 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -18368,14 +18368,17 @@ Print the program version on standard output, then exit without further
 processing.
 @end table
 
-Factoring the product of the eighth and ninth Mersenne primes
-takes about 4 milliseconds of CPU time on an Intel Xeon Silver 4116.
+If the number to be factored is small (less than @math{2^{127}} on
+typical machines), @command{factor} uses a faster algorithm.
+For example, on a circa-2017 Intel Xeon Silver 4116, factoring the
+product of the eighth and ninth Mersenne primes (approximately
+@math{2^{92}}) takes about 4 ms of CPU time:
 
 @example
-M8=$(echo 2^31-1|bc)
-M9=$(echo 2^61-1|bc)
-n=$(echo "$M8 * $M9" | bc)
-bash -c "time factor $n"
+$ M8=$(echo 2^31-1 | bc)
+$ M9=$(echo 2^61-1 | bc)
+$ n=$(echo "$M8 * $M9" | bc)
+$ bash -c "time factor $n"
 4951760154835678088235319297: 2147483647 2305843009213693951
 
 real	0m0.004s
@@ -18383,11 +18386,12 @@ user	0m0.004s
 sys	0m0.000s
 @end example
 
-Similarly, factoring the eighth Fermat number @math{2^{256}+1} takes
-about 14 seconds on the same machine.
+For larger numbers, @command{factor} uses a slower algorithm.  On the
+same platform, factoring the eighth Fermat number @math{2^{256} + 1}
+takes about 14 seconds, and the slower algorithm would have taken
+about 750 ms to factor @math{2^{127} - 3} instead of the 50 ms needed by
+the faster algorithm.
 
-The single-precision code uses an algorithm
-designed for factoring smaller numbers.
 Factoring large numbers is, in general, hard.  The Pollard-Brent rho
 algorithm used by @command{factor} is particularly effective for
 numbers with relatively small factors.  If you wish to factor large
diff --git a/src/factor.c b/src/factor.c
index c1c35a562..1b1607f16 100644
--- a/src/factor.c
+++ b/src/factor.c
@@ -53,6 +53,11 @@
     trick of multiplying all n-residues by the word base, allowing cheap Hensel
     reductions mod n.
 
+    The GMP code uses an algorithm that can be considerably slower;
+    for example, on a circa-2017 Intel Xeon Silver 4116, factoring
+    2^{127}-3 takes about 50 ms with the two-word algorithm but would
+    take about 750 ms with the GMP code.
+
   Improvements:
 
     * Use modular inverses also for exact division in the Lucas code, and
-- 
2.17.1

bug#42269: Remove non-GMP code from coreutils factor.c

Reply via email to