bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

Torbjorn Granlund Tue, 04 Sep 2012 06:29:20 -0700

The very old factoring code cut from an now obsolete version GMP does
not pass proper arguments to the mpz_probab_prime_p function.  It ask
for 3 Miller-Rabin tests only, which is not sufficient.


I am afraid the original poor code was wrritten by me, where this
particular problem was introduced with 3.0's demos/factorize.c.

A Miller-Rabin test will detect composites with at least a probability
of 3/4.  For a uniform random composite, the probability will actually
by much higher.

Or put another way, of the N-3 possible Miller-Rabin tests for checking
the composite N, there is no number N for which more than (N-3)/4 of the
tests will fail to detect the number as a composite.  For most numbers N
the number of "false witnesses" will be much, much lower.

Problem numbers are of the for N=pq, p,q prime and (p-1)/(q-1) = s,
where s is a small integer.  (There are other problem forms too,
incvolving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.

It is easy to find numbers of that form that causes coreutils factor to
fail:

465658903
2242724851
6635692801
17709149503
17754345703
20889169003
42743470771
54890944111
72047131003
85862644003
98275842811
114654168091
117225546301
...

There are 9008992 composites of the form with s=2 below 2^64.  With 3
Miller-Rabin test, one would expect about 9008992/4^64 = 140766 to be
invalidly recognised as primes in that range.

Here is a simple patch:

diff
Description: Binary data

I and Niels Möller have written a suggested replacement for coreutils
factor.c.  It fixes a number of issues with the current code:

(1) Much faster trial division code (> 10x) based on a small table of
    prime inverses.  Still, the new code doesn't perform lots of trial
    dividing.

(2) Pollard rho code using Montgomery representation for numbers < 2^64.
    (We consider extending this to 128 bits.)  Not dependent on GMP.

(3) Lucas prime proving code instead of probablitic Miller-Rabin primes
    testing.

(4) SQUFOF code, which might be included depending on performance
    issues.

(5) Replacement GMP code (#if HAVE_GMP) that also includes Lucas proving
    code.

The new code is faster than the current code:

Old:
  seq `pexpr 2^64-1000` `pexpr 2^64-1` | time factor >/dev/null
  524.57 user

New:
  seq `pexpr 2^64-1000` `pexpr 2^64-1` | time ./factor >/dev/null
  0.05 user

For smaller number ranges, the improvements are currently much more
modest, as little as 2x in some cases.

The code should be ready within a few weeks.

-- 
Torbjörn

bug#12350: Composites identified as primes in factor.c (when HAVE_GMP)

Reply via email to