[sage-devel] factorization database

Jonathan Bober Fri, 15 Jun 2007 23:50:04 -0700

Here is a amusing, but real, snippet from a running sage session (don't
try this at home):


[EMAIL PROTECTED]:~/math$ sage
----------------------------------------------------------------------
| SAGE Version 2.6, Release Date: 2007-06-02                         |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
Loading SAGE library. Current Mercurial branch is: bober
sage: rsa200 =
27997833911221327870829467638722601621070446786955428537560009929326128400107609345671052955360856061822351910951365788637105954482006576775098580557613579098734950144178863178946295187237869221823983
sage: time factor(rsa200)
CPU times: user 0.68 s, sys: 0.00 s, total: 0.68 s
Wall time: 0.69
3532461934402770121272604978198464368671197400197625023649303468776121253679423200058547956528088349
 * 
7925869954478333033347085841480059687737975857364219960734330341455767872818152135381409304740185467


So, anyway, I recently was doing something where I thought it would be
nice to have an easily usable database of known factorizations for
interesting numbers, and that it would be especially nice if it was
usable with SAGE. So I searched google for "factorization database" and,
amusingly, got

http://www.sagemath.org:9002/sage_trac/ticket/191

as the third hit. So I wasn't the first with this idea. Anyway, I have
the beginnings of something implemented, but I thought I would ask for
some suggestions before going further.

What I have been implementing/planning thus far is mostly as follows:

-I have a global object factor_table, which is an instance of

sage.databases.factor_table.FactorizationDatabase

which is a class that inherits from sage.databases.db.Database (so uses
ZODB for the underlying implementation), and has the following
properties:

-For all integer n, factor_table[n] is of type
sage.rings.integer.Integer and is such that n % factor_table[n] == 0

-Something like this...
        -(factor_table[n] == -1) ==> (is_pseudoprime(n) == True)
        -(factor_table[n] == 1) <==> (is_prime(n) == True)*
        -(factor_table[n] == n) ==> (Nothing)

[*There will be more regarding this later for anyone who reads all of
this email.]

Some questions:
(1) What should the minimum size be for an entry in the factor table? I
currently have it as 10^40, but I think that this might be too small.
(But I have a relatively fast computer.) Some timings for me:

sage: q = next_prime(10^20)
sage: p = next_prime(10^21)
sage: time factor(p*q)
CPU times: user 0.33 s, sys: 0.02 s, total: 0.35 s
Wall time: 0.36
100000000000000000039 * 1000000000000000000117
sage: q = next_prime(10^25)
sage: p = next_prime(10^24)
sage: time factor(p*q)
CPU times: user 2.40 s, sys: 0.05 s, total: 2.45 s
Wall time: 2.45
1000000000000000000000007 * 10000000000000000000000013
sage: q = next_prime(10^29)
sage: p = next_prime(10^28)
sage: time factor(p*q)
CPU times: user 16.25 s, sys: 0.10 s, total: 16.35 s
Wall time: 16.48
10000000000000000000000000331 * 100000000000000000000000000319


(2) I thought for a while about whether or not factor_table[n] should
always return a prime number (or 1). I decided against this because I
would like to support "partial factorizations", so I can have something
like the following:

sage: factor_table.get_info(A)
[...] has 5 known prime factors, 2 unfactored composite factors, and at
least 9 distinct prime factors.

Any reason why this might be a bad idea?

(3) Instead of just storing one prime factor in factor_table[n], the
full (known) factorization could be stored. This would make things
simpler, but might increase storage space. But then again, it might save
storage space somehow - I'm not sure. The factor table would certainly
have less entries, but they would be bigger entries. (Also, less entries
means less information, but someone is unlikely to find anything useful
in a random entry that just happens to be there because it is a factor
of something "special".) Any thoughts?

(4) I've changed factor() to search the factor table before trying to
factor normally. This makes the factor table easiest to use. This makes
the factor table easiest to use, but, given that the factor table would
probably be an optional package (since it will probably be quite a large
download), it might be better to just have a special function
factor_table.factor(). The following code is what I have put in
factor(), somewhere in the middle:

    if n > factor_table.minimum_entry and factor_table[n] != n:
        p = factor_table[n]
        if p == 1:
            return factorization.Factorization([(n,1)], unit)
        elif p == -1:
            if proof and is_prime(n):
                return factorization.Factorization([(n,1)], unit)
            else:
                return factorization.Factorization([(n,1)], unit)
        else:
            return factor(p,proof=proof) * factor(n/p,proof=proof)

I don't see any problem with having it there, except for a marginal
slowdown in the general case, but it can fail nicely when the
factorization database is not installed by having factor_table[n] == n
always.

For quick partial factorizations, I currently have factor_table.factor()
as factoring only using the factor table, and possibly returning
composite factors.

(5) Regarding (*), above: I would like to be able to have some entries
in the factor table marked as proved primes. The main problem with this
is that there should be a way to verify the table, and verification
should involve proving that all the claimed primes are indeed primes,
but this could take a really long time.

One option, which I don't like, would be to simply prove a number is
prime exactly once, and then be very careful about making changes to the
table.

Another thing that might be a possibility in the future is storing a
"certificate" for any number claimed to be proved prime. The ECPP
algorithm can give a prime certificate which can (I think) be checked in
significantly less time than it takes to find the certificate. Then the
certificate could be checked, which might not take too long (I really am
not sure about this). This would require new functionality in SAGE,
which might be difficult to implement (it looks like PARI does not have
ECPP).

For now, I just ignore this and only check for pseudo-primes, and don't
mark anything as proved prime.

--

OK, I think that that is it for now. Any other thoughts would be good.
It seems that the major work to be done here is to get various factor
tables into usable formats - compared to that, implementation issues
might be easy.

Congratulations to you if you have read this far. This email was very
long.

-Jon


--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://sage.scipy.org/sage/ and http://modular.math.washington.edu/sage/
-~----------~----~----~----~------~----~------~--~---

[sage-devel] factorization database

Reply via email to