Because I have worked with prime counting and Sage, my input belongs
in this discussion.

For the record, there are three families of prime counting algorithms:
combinatorial, analytic, and table-based.

The combinatorial family includes Legendre's formula as well as the
Meissel-Lehmer-Lagarias-Miller-Odlyzko algorithm (which has been
further improved by Deleglise, Rivat, Gourdon, and Oliveira e Silva).
The combinatorial family proceeds from a binary tree based on a
recurrence relation for the partial sieve function, which counts the
numbers <= x with no prime factor less than a given bound.  LMO (1985)
was the first prime counting algorithm that provably ran in Õ(x^(2/3))
time; its successors have made log(x) and constant-factor improvements
and decreased the communication requirements for parallel algorithms.
LMO has a time complexity of Õ(x^(2/3)) and a space complexity of
Õ(x^(1/3)).  Both Sage and Mathematica use the combinatorial family,
although Sage does not yet achieve the efficiency of LMO.
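To make the partial sieve function concrete, here is a minimal sketch
(my own illustration, not Sage's or Mathematica's implementation) of
the simplest member of the combinatorial family, Legendre's formula:
pi(x) = phi(x, a) + a - 1 with a = pi(sqrt(x)), where phi(x, a) counts
the integers <= x with no prime factor among the first a primes.

```python
import math
from functools import lru_cache

def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes <= n."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if flags[i]]

def legendre_pi(x):
    """pi(x) by Legendre's formula -- far slower than LMO, but it
    exhibits the partial-sieve recurrence the whole family rests on."""
    if x < 2:
        return 0
    ps = primes_up_to(math.isqrt(x))
    a = len(ps)

    @lru_cache(maxsize=None)
    def phi(y, b):
        # Recurrence: phi(y, b) = phi(y, b-1) - phi(y // p_b, b-1),
        # splitting the survivors of the first b-1 primes by
        # divisibility by the b-th prime.
        if b == 0:
            return y
        return phi(y, b - 1) - phi(y // ps[b - 1], b - 1)

    return phi(x, a) + a - 1
```

LMO and its successors prune this recursion tree (the "binary tree"
above) so that only Õ(x^(2/3)) nodes are ever evaluated.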

The analytic method of Lagarias and Odlyzko (1987) involves a
smoothing kernel and numerical integration of functions related to
log(zeta(s)) "which require O(x^(3/5)) time and O(x^ε) space for any ε
> 0 in one version and O(x^(1/2 + ε)) time and O(x^(1/4 + ε)) space in
another version".  The analytic method also requires finding the prime
powers in an interval.  Although the analytic method has a superior
asymptotic complexity to the combinatorial method, the constant
implied by Landau's notation is large, and the analytic method has not
yet been used for a record-sized prime counting computation (although
it would not be surprising for this to happen, since beyond some point
the analytic method is the most efficient).  In 2004 Galway wrote his
PhD thesis on the analytic method, introducing a new kernel function
and developing two new methods for enumerating primes in intervals.
Galway provides a modification of the Atkin-Bernstein sieve that has
space and time overheads of O(x^(1/3)) (compared to the sieve of
Eratosthenes, which has time and space overheads of O(x^(1/2))).
Oliveira e Silva has graciously
released two implementations of the segmented sieve of Eratosthenes
(used for his record breaking Goldbach conjecture project and
optimized for primes near 10^18) under the GPL, which use novel
optimizations.  The Atkin-Bernstein sieve has an asymptotic
computational complexity superior to that of the sieve of Eratosthenes
(by a logarithmic factor); however, the version provided by its
creators is advertised to work only up to 10^15, so it requires more
coding effort to incorporate into a programming project.
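As a toy illustration of the segmentation idea behind such sieves (my
own sketch, not Oliveira e Silva's optimized code): to sieve a window
far from the origin, precompute only the primes up to the square root
of the window's upper end, then cross off their multiples one
fixed-size segment at a time, keeping memory bounded by the segment
size rather than the position.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes <= n."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if flags[i]]

def count_primes_in_interval(lo, hi, segment_size=32768):
    """Count primes in [lo, hi) with a segmented sieve of Eratosthenes."""
    if hi <= 2:
        return 0
    base_primes = primes_up_to(math.isqrt(hi - 1))
    count = 0
    for start in range(lo, hi, segment_size):
        end = min(start + segment_size, hi)
        block = bytearray([1]) * (end - start)
        for p in base_primes:
            # First multiple of p in [start, end) that is a proper
            # composite (never p itself).
            first = max(p * p, -(-start // p) * p)
            for m in range(first, end, p):
                block[m - start] = 0
        count += sum(1 for i in range(end - start)
                     if start + i >= 2 and block[i])
    return count
```

Real implementations add wheel factorization and tune the segment to
the processor cache, which is where the novel optimizations mentioned
above come in.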

The table-based method was introduced by Booker on the Nth Prime
Page.  It applies a space-time tradeoff to the sieve of Eratosthenes.
Specifically, a precomputed table (using the sieve of Eratosthenes) of
spaced values of pi(x) is combined with a method of counting primes in
an interval.  If a table of size Õ(x^(1/2)) is used, then only
Õ(x^(1/2)) time is needed to sieve the interval between the nearest
table entry and x.  If more space is available for the table, and
Galway's dissected sieve is used, then a table of size up to
Õ(x^(2/3)) may be used to achieve a time complexity of Õ(x^(1/3)).
Kulsha has compiled the largest tables of the prime counting function
for his study of the fluctuations of the prime counting function.  On
18 Nov 2008, Kulsha posted to the Nodak number theory archive,
suggesting that the commonly provided R-based explicit formula is
missing an o(x) term.  Kulsha's prime tables include previously
unpublished tables of Nicely and Oliveira e Silva that were released
to the public only after I expressed interest in them.  It is
unfortunate that the table-based method is not more widely known; if
it were, computational number theorists could produce dense tables of
pi(x) whenever they sieve large intervals.  The currently available
tables have spacings that are less than sqrt(x).
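The whole method fits in a few lines.  Here is a minimal sketch (the
spacing, limit, and names are toy choices of my own): store pi at
evenly spaced points, then answer pi(x) by one table lookup plus a
short local sieve of the remaining gap.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes <= n."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if flags[i]]

SPACING = 1000
LIMIT = 100_000  # built once, e.g. as a byproduct of some other sieving

def build_table():
    """TABLE[k] = pi(k * SPACING), precomputed by one full sieve."""
    ps = primes_up_to(LIMIT)
    table, i = [], 0
    for k in range(LIMIT // SPACING + 1):
        while i < len(ps) and ps[i] <= k * SPACING:
            i += 1
        table.append(i)
    return table

TABLE = build_table()

def table_pi(x):
    """pi(x) = TABLE[x // SPACING] + #(primes in (k*SPACING, x])."""
    k = x // SPACING
    base = TABLE[k]
    lo, hi = k * SPACING + 1, x + 1   # sieve the half-open gap [lo, hi)
    if lo >= hi:
        return base
    block = bytearray([1]) * (hi - lo)
    for p in primes_up_to(math.isqrt(x)):
        first = max(p * p, -(-lo // p) * p)
        for m in range(first, hi, p):
            block[m - lo] = 0
    return base + sum(1 for j in range(hi - lo)
                      if lo + j >= 2 and block[j])
```

With a table of size Õ(x^(1/2)) the gap never exceeds Õ(x^(1/2)), so
each query costs Õ(x^(1/2)) sieving, exactly the tradeoff described
above.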

None of the three prime counting methods is necessarily "better" than
the others.  If you wanted to break the record for the largest value
of pi(x) computed (using months of computational time) you would
probably use the combinatorial method, or a graduate student in complex
analysis and computational number theory could extend Galway's work
and make an efficient implementation of the analytic method for record
sized values of pi(x).  If you wanted to make a compact algorithm to
ship in a CAS distribution (where every MB is being fought for), the
combinatorial method may be appropriate.  If you were studying the
distribution of primes and wanted to be able to query arbitrary values
of pi(x) as fast as possible, the table-based method would be ideal.

In Summer 2009, when I implemented a pi(x) function competitive with
Mathematica's that uses the table-based method (#7103), some members
of the Sage community told me that it would not be appropriate to
include a table in Sage for a single function that is used by only a
fraction of users.  For some users, downloads are prohibitively
expensive, and the combinatorial method would be preferred to
downloading a large table.  This is certainly true, but for those who
wish to study the
distribution of primes, the table-based method is highly superior for
values within the range of sieving.  William has shown with PSAGE that
a single monolithic CAS distribution for all members of the
mathematics community may not be the best solution.  An infrastructure
that would allow users interested in pi(x) to include spaced pi(x)
tables in their download of Sage would be useful.  Another option is
providing a web service that allows querying a very large database of
spaced pi(x) values.  For other types of primes, such as twin primes,
there is no known combinatorial or analytic method, so the table-based
method is the only way to improve beyond enumerating each such prime
individually.

The word "useless" has been thrown around in this discussion thread.
A cynic might have described compiling tables of primes in books in
the time of Gauss as "useless".  It was by studying "useless" tables
of primes that Gauss conjectured the prime number theorem at the age
of 15.  It was following Gauss's legacy of studying the distribution of
the primes that led Riemann to establish much of complex analysis.  If
Gauss had to pay thousands of dollars to have access to the current
state of the art mathematical knowledge (rather than be able to get it
for free at the library), we might not have much of the scientific and
mathematical knowledge we take for granted today.  It is very naive
and shortsighted to discount the benefit of studying theoretical
mathematics and producing faster implementations of pi(x) for the
general public, especially when exponential speedups over the pi(x)
currently in Mathematica are possible.  There may be a modern-day
Gauss who could give us great
insight into mathematics if they were just allowed to quickly query
values of pi(x) (a very small example of what is possible with a CAS).

Perhaps part of the reason for the cynicism I have perceived regarding
making the table-based method available with respect to Sage is that
it is seen as intellectually inferior to the more complicated
combinatorial and analytic methods.  On the contrary, that the fastest
way to calculate the prime counting function involves using an
enormous lookup table has profound implications for the primes.  Zagier
has described the primes as a balance between order and chaos.  One of
the best heuristic arguments for the Riemann hypothesis (often called
the greatest problem in mathematics) is Denjoy's probabilistic
interpretation (Edwards 1974 p. 268), which says that if in a certain
sense certain properties of the integers (being multiplicatively even
or odd) behave randomly like a coin flip (by obeying the de Moivre-
Laplace theorem), then the Riemann hypothesis is true (this has been
the core of a number of failed attempts to prove RH).  I see a strong
connection between pi(x) being a random function that can best be
calculated with a table of data and Chaitin's observation that
mathematical truth is random: that a lookup table of random
incompressible data is the best way to store mathematical truth.

Optimizations to the table-based method can draw on complex analysis
and modern algorithms; it need not be simply a combination of querying
a table and sieving.  In my undergraduate
number theory course taught by William Stein I explored ways to
compress tables of values of pi(x) in my final project.  I started
from the explicit formula (Oliveira e Silva graciously provided an
implementation of the explicit formula).  I suggested that it is
possible to improve the accuracy of the explicit formula not just by
increasing the number of nontrivial zeros used, but also by using a
Monte Carlo method.  By evaluating the explicit formula at many points
in an interval around some x, and also calculating the fluctuations of
the prime counting function in that interval (with local sieving), we
can obtain a more accurate approximation of the prime counting
function than the analytic approximation.  I am curious about what is
possible using the Odlyzko–Schönhage algorithm.  Implementing sparse
caching for pi(x) is also nontrivial.
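The smooth baseline that such corrections start from is cheap to
compute.  As a sketch (my own code, not the project's implementation),
here is Riemann's approximation R(x) = sum_{n>=1} mu(n)/n * li(x^(1/n));
the nontrivial-zero terms of the explicit formula then supply the
fluctuations around it.

```python
import math

def li(x, terms=120):
    """Logarithmic integral via the series
    li(x) = gamma + ln(ln x) + sum_{k>=1} (ln x)^k / (k * k!),  x > 1."""
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    lx = math.log(x)
    total = gamma + math.log(lx)
    term = 1.0
    for k in range(1, terms + 1):
        term *= lx / k          # term is now (ln x)^k / k!
        total += term / k
    return total

def mobius_up_to(n):
    """mu(0..n) by sieving: flip sign at each prime, zero at squares."""
    mu = [1] * (n + 1)
    composite = [False] * (n + 1)
    for p in range(2, n + 1):
        if not composite[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    composite[m] = True
                mu[m] = -mu[m]
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0
    return mu

def riemann_R(x, nmax=25):
    """Truncated R(x); terms with x^(1/n) near 1 contribute negligibly."""
    mu = mobius_up_to(nmax)
    total = 0.0
    for n in range(1, nmax + 1):
        root = x ** (1.0 / n)
        if mu[n] != 0 and root > 1.5:
            total += mu[n] / n * li(root)
    return total
```

The Monte Carlo idea above would compare this smooth term against
locally sieved counts in an interval around x to estimate the missing
oscillatory contribution.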

One of the reasons making the fastest possible pi(x) available is
important is because of its relationship to the Riemann hypothesis.  A
proof of the Riemann hypothesis would not only provide immense insight
into areas of mathematics as diverse as arithmetic and complex
analysis, it would also provide great insight into the relationship
between order and randomness in mathematics, and possibly even shed
light on the role of randomness in the collapse of the wave function
in quantum mechanics.  If an individual wants to calculate values of
pi(x) as fast as possible, then the table-based method is the best
option.  The world deserves for dense tables of pi(x) to be made
officially available as part of a table-based pi(x) implementation in
a freely available CAS.  Maybe someone using it to study the primes
would be inspired to create a certificate that verifies a value of
pi(x).

Sincerely,
Kevin Stueve

Nth Prime Page
http://primes.utm.edu/nthprime/algorithm.php

Fluctuations of the Prime Counting Function (with tables)
http://www.primefan.ru/stuff/primes/table.html

Writeup for my Summer 2009 project under William Stein
http://wstein.org/projects/stueve.pdf

A bibliography of resources I made during my Summer 2009 project
http://sage.math.washington.edu/home/kstueve/prime_pi_bibliography.html

The Trac Ticket that includes my table-based pi(x) program (with
contributions from Oliveira e Silva, Ohana, and Leonhardy, to name a
few)
http://trac.sagemath.org/sage_trac/ticket/7013

Oliveira e Silva's Segmented Sieve of Eratosthenes
http://www.ieeta.pt/~tos/software/prime_sieve.html

Kulsha proposing a corrected explicit formula
http://listserv.nodak.edu/cgi-bin/wa.exe?A2=ind0811&L=NMBRTHRY&P=R401&I=-3

My number theory final project for Math 414 taught by William Stein
(Monte-Carlo Approximation of the Prime Counting Function)
sage.math.washington.edu/edu/2010/414/projects/stueve.pdf

Oliveira e Silva's implementation of the explicit formula
http://trac.sagemath.org/sage_trac/ticket/8135
