On 7/27/07, Pablo De Napoli <[EMAIL PROTECTED]> wrote: > > I've tested it but seems to be buggy:: it works up to 156 > > ./part 156 > p(156)= > 73232243759 > > but for 157 gives a floating point exception error > (and a gdb tracing says it is in the line 176 of > the source code > > r2 = r0 % r1; > > in function g
I've attached a slightly modified version of part.c that seems to work (it doesn't crash and gives correct answers in all the cases I tested). I just changed the code in g slighlty so if r1 is 0, then r2 is also set to 0. I compile it using gcc part.c -O3 -g -lmpfr -lm -o part -I/home/was/sage/local/include -L/home/was/sage/local/lib/ -lgmp where /home/was/sage is the path to my SAGE install. Interestingly, when I time part.c it is not much better than CVS pari (see below). Bill Hart wrote: >This, with the other ideas I gave above, will definitely let us beat >Mathematica convincingly, as a simple back of the envelope calculation >shows. It will have the added benefit of allowing us to beat Magma at >something. Magma is already terrible at computing NumberOfPartitions, though not as bad as GAP (which is even worse): [EMAIL PROTECTED]:~$ magma Magma V2.13-5 Fri Jul 27 2007 10:54:07 on ubuntu [Seed = 1720324423] Type ? for help. Type <Ctrl>-D to quit. > time k := NumberOfPartitions(10^5); Time: 3.070 With the latest CVS pari: ? gettime; n=numbpart(10^5); gettime/1000.0 %1 = 0.01200000000000000000000000000 ? gettime; n=numbpart(10^6); gettime/1000.0 %2 = 0.2080000000000000000000000000 ? gettime; n=numbpart(10^7); gettime/1000.0 %3 = 6.973000000000000000000000000 ? gettime; n=numbpart(10^8); gettime/1000.0 With part.c: I use time ./part 100000 > out and record system + user time: 10^6 0.04 seconds 10^7 4.584 seconds 10^8 47.483 seconds So, to clarify or summarize, Bill is there one thing that we could change in part.c that would speed it up? I realized you sort of answered this before, but the sequence of previous emails yesterday was rather long and may have got confused. Thanks, william > The code theres seems indeed to be GPL-ed, it has a copyright notice that > says: > > "(C) 2002 Ralf Stephan ([EMAIL PROTECTED]). > * See part.pdf on the same path for a summary of the math. > * Distributed under GPL (see gnu.org). Version 2002-12." > > The pdf is: > > http://www.ark.in-berlin.de/part.pdf > > Pablo > > On /26/07, Bill Hart <[EMAIL PROTECTED]> wrote: > > > > Alternatively you could just use the implementation here: > > > > http://www.ark.in-berlin.de/part.c > > > > which seems to only rely on mpfr and GMP. However, since the Pari > > version was based on it, I suspect it may also need correction. > > > > He does seem to adjust the precision as he goes, but I've no idea how > > far above or below the required precision this is. However there is no > > need to use a precision of 55 from a certain point on. That probably > > forces it to use two doubles instead of one in the floating point > > computations, which I think would be unnecesary. You can look on > > mathworld for how well the Rademacher series converges. > > > > It would probably be quicker to do a fresh implementation from scratch > > given the application in mind. The code there may not be GPL'd also. > > > > To check it works, one should at least look at the congruences modulo > > 5, 7, 11 and 13. The author of the mpfr version does seem to check > > them mod 11 but for very small n and only for a very few values of n. > > > > If the gcds prove to be a bottleneck, I have some code that is roughly > > twice as fast as GMP for computing these. Certainly the code in the > > mpfr version is suboptimal and needs some serious optimisation when > > the terms in the dedekind sum fit into a single limb, which on a 64 > > bit machine is probably always, when n is of a reasonable size. > > > > Another suggestion is that for sufficiently small values of h and or > > k, it may be quicker to compute the dedekind sum directly rather than > > use the gcd formula. > > > > A lookup table would also be useful for the tail end of the gcd/ > > dedekind sum computation. > > > > I'd be shocked if the computation for n = 10^9 couldn't be done in > > under 10s on sage.math. > > > > Bill. > > > > -- William Stein Associate Professor of Mathematics University of Washington http://www.williamstein.org --~--~---------~--~----~------------~-------~--~----~ To post to this group, send email to sage-devel@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sage-devel URLs: http://sage.scipy.org/sage/ and http://modular.math.washington.edu/sage/ -~----------~----~----~----~------~----~------~--~---
/* Implementation of the unrestricted partition function using mpfr (see * mpfr.org). (C) 2002 Ralf Stephan ([EMAIL PROTECTED]). * See part.pdf on the same path for a summary of the math. * Distributed under GPL (see gnu.org). Version 2002-12. * * The unrestricted partition function counts the nonnegative integer * solutions to a+2b+3c+...=n. Ref.: sequence A000041 in the OEIS. * * 2002-12: - included psi optimizations by Karim Belabas (orig. in pari). * - use Apostol's gcd-like Dedekind sum algorithm. * - major code cleanup. * * TODO: remove restriction unsigned long to n? */ #include <stdio.h> #include <stdlib.h> #include <gmp.h> #include <mpfr.h> #define USE_COSPI #define USE_DEBUG_CODE #define TEST_CODE #ifdef TEST_CODE #define USE_DEBUG_CODE #endif /* TEST_CODE */ static mpfr_t pi; // maybe you have computed this already... static int psi (mpfr_t res, unsigned long q, mpfr_t a, mpfr_t d, int prec, mp_rnd_t r) { mpfr_t ea, invea, cha, sha; mpfr_init2 (ea, prec); mpfr_exp (ea, a, r); mpfr_init2 (invea, prec); mpfr_ui_div (invea, 1, ea, r); mpfr_init2 (cha, prec); mpfr_add (cha, ea, invea, r); mpfr_div_2ui (cha, cha, 1, r); mpfr_init2 (sha, prec); mpfr_sub (sha, ea, invea, r); mpfr_div_2ui (sha, sha, 1, r); mpfr_mul (ea, a, cha, r); mpfr_sub (ea, ea, sha, r); mpfr_sqrt_ui (res, q, r); mpfr_mul (res, res, d, r); mpfr_mul (res, res, ea, r); mpfr_clear (ea); mpfr_clear (invea); mpfr_clear (cha); mpfr_clear (sha); } static int g (mpfr_t res, unsigned long q, unsigned long h, mp_rnd_t r) /* * Compute g(h,q)=q*s(h,q) where s() is the classical Dedekind sum. * * Ref.: Apostol, T.M. in "Modular Functions and Dirichlet Series * in Number Theory", NY, Springer 1997, pp.52 and 61-69. * (according to Wolfram's MathWorld, please someone verify) * * The only mpfr_t var is res. */ { unsigned long k; mpz_t gt, gtt; if (q==1 || q==2) { mpfr_set_ui (res, 0, r); return 0; } if (h==1) { mpfr_set_ui (res, q-1, r); mpfr_mul_ui (res, res, q-2, r); mpfr_div_ui (res, res, 12, r); return 0; } if (h==2 && q>5 && (q%2)) { mpfr_set_ui (res, q-1, r); mpfr_mul_ui (res, res, q-5, r); mpfr_div_ui (res, res, 24, r); return 0; } k = (q%h); if (k==1) { mpfr_set_ui (res, h, r); mpfr_mul_ui (res, res, h, r); mpfr_ui_sub (res, q-1, res, r); mpfr_mul_ui (res, res, q-1, r); mpfr_div_ui (res, res, h, r); mpfr_div_ui (res, res, 12, r); return 0; } if (k==2) { mpfr_set_ui (res, h, r); mpfr_mul_ui (res, res, h, r); mpfr_add_ui (res, res, 1, r); mpfr_div_2ui (res, res, 1, r); mpfr_ui_sub (res, q, res, r); mpfr_mul_ui (res, res, q-2, r); mpfr_div_ui (res, res, h, r); mpfr_div_ui (res, res, 12, r); return 0; } if (k==h-1) { mpz_init_set_ui (gt, h); mpz_mul_ui (gt, gt, h); mpz_sub_ui (gt, gt, 6*h); mpz_add_ui (gt, gt, 2); mpz_mul_ui (gt, gt, q); mpz_init_set_ui (gtt, q); mpz_mul_ui (gtt, gtt, q); mpz_add (gt, gt, gtt); mpz_set_ui (gtt, h); mpz_mul_ui (gtt, gtt, h); mpz_add (gt, gt, gtt); mpz_add_ui (gt, gt, 1); mpfr_set_z (res, gt, r); mpfr_div_ui (res, res, h, r); mpfr_div_ui (res, res, 12, r); return 0; } { int j=0; unsigned long r0, r1, r2; mpq_t qsum, qt, qtt; mpq_init (qsum); mpq_init (qt); mpq_init (qtt); mpz_init (gt); mpz_init (gtt); r0 = q; r1 = h; r2 = r0 % r1; while (r1) { mpz_set_ui (gt, r0); mpz_mul_ui (gt, gt, r0); mpz_set_ui (gtt, r1); mpz_mul_ui (gtt, gtt, r1); mpz_add (gt, gt, gtt); mpz_add_ui (gt, gt, 1); mpq_set_z (qt, gt); mpz_set_ui (gt, r0); mpz_mul_ui (gt, gt, r1); mpq_set_z (qtt, gt); mpq_div (qt, qt, qtt); mpq_canonicalize (qt); if (!(j%2)) mpq_add (qsum, qsum, qt); else mpq_sub (qsum, qsum, qt); mpq_canonicalize (qsum); r0 = r1; r1 = r2; if(r1) r2 = r0 % r1; else r2 = 0; ++j; } mpq_set_ui (qt, 12, 1); mpq_canonicalize (qt); mpq_div (qsum, qsum, qt); mpq_canonicalize (qsum); if (j%2) { mpq_set_ui (qt, 1, 4); mpq_canonicalize (qt); mpq_sub (qsum, qsum, qt); mpq_canonicalize (qsum); } mpfr_set_z (res, mpq_numref (qsum), r); mpfr_div_z (res, res, mpq_denref (qsum), r); mpfr_mul_ui (res, res, q, r); mpq_clear (qsum); mpq_clear (qt); mpq_clear (qtt); } mpz_clear (gt); mpz_clear (gtt); } static unsigned long gcd (unsigned long x, unsigned long y) { unsigned long r0, r1, r2; if (x>y) { r0=x; r1=y; } else { r1=x; r0=y; } while (1) { r2 = r0 % r1; if (r2 == 0) return r1; r0 = r1; r1 = r2; } return 0; } #ifdef USE_COSPI static cospi (mpfr_t res, mpfr_t x, int prec, mp_rnd_t r) // fast cos(pi*x) { mpfr_t t, tt, half, fourth; mpfr_init2 (t, prec); mpfr_init2 (tt, prec); mpfr_init2 (half, prec); mpfr_init2 (fourth, prec); mpfr_set_ui (half, 1, r); mpfr_div_2ui (half, half, 1, r); mpfr_div_2ui (fourth, half, 1, r); mpfr_div_2ui (tt, x, 1, r); mpfr_floor (tt, tt); mpfr_mul_2ui (tt, tt, 1, r); mpfr_sub (t, x, tt, r); if (mpfr_cmp_ui (t, 1) > 0) mpfr_sub_ui (t, t, 2, r); mpfr_abs (tt, t, r); if (mpfr_cmp (tt, half) > 0) { mpfr_ui_sub (t, 1, tt, r); mpfr_abs (tt, t, r); if (mpfr_cmp (tt, fourth) > 0) { if (mpfr_sgn (t) > 0) mpfr_sub (t, half, t, r); else mpfr_add (t, t, half, r); mpfr_mul (t, t, pi, r); mpfr_sin (t, t, r); } else { mpfr_mul (t, t, pi, r); mpfr_cos (t, t, r); } mpfr_neg (res, t, r); } else { mpfr_abs (tt, t, r); if (mpfr_cmp (tt, fourth) > 0) { if (mpfr_sgn (t) > 0) mpfr_sub (t, half, t, r); else mpfr_add (t, t, half, r); mpfr_mul (t, t, pi, r); mpfr_sin (res, t, r); } else { mpfr_mul (t, t, pi, r); mpfr_cos (res, t, r); } } mpfr_clear (half); mpfr_clear (fourth); mpfr_clear (t); mpfr_clear (tt); } #endif /* USE_COSPI */ static int L (mpfr_t res, unsigned long n, unsigned long q, int prec, mp_rnd_t r) { mpfr_t t, tt; unsigned long h; if (q==1) { mpfr_set_ui (res, 1, r); return 0; } mpfr_init2 (res, prec); mpfr_set_ui (res, 0, r); mpfr_init2 (t, prec); mpfr_init2 (tt, prec); for (h=1; h<q; ++h) { if (gcd (q, h) > 1) continue; mpfr_set_ui (t, h<<1, r); mpfr_mul_ui (t, t, n, r); g (tt, q, h, r); mpfr_sub (t, tt, t, r); mpfr_div_ui (t, t, q, r); #ifdef USE_COSPI cospi (t, t, prec, r); /* Faster than mpfr_cos in gmp-4.1 */ #else mpfr_mul (t, t, pi, r); mpfr_cos (t, t, r); #endif mpfr_add (res, res, t, r); } mpfr_clear (t); mpfr_clear (tt); } static unsigned long log2pn (unsigned long n) { mpfr_t t, tt, Pi; mp_rnd_t r = GMP_RNDN; double d; mpfr_init (t); mpfr_init (tt); mpfr_init (Pi); mpfr_const_pi (Pi, r); mpfr_set_ui (t, n, r); mpfr_mul_ui (t, t, 2, r); mpfr_div_ui (t, t, 3, r); mpfr_sqrt (t, t, r); mpfr_mul (t, t, Pi, r); mpfr_exp (t, t, r); mpfr_div_ui (t, t, 4, r); mpfr_div_ui (t, t, n, r); mpfr_set_ui (tt, 3, r); mpfr_sqrt (tt, tt, r); mpfr_div (t, t, tt, r); mpfr_log10 (t, t, r); mpfr_mul_ui (t, t, 10, r); mpfr_div_ui (t, t, 3, r); /* bit more than log2(p(n)) */ d = mpfr_get_d (t, r); mpfr_clear (t); mpfr_clear (tt); mpfr_clear (Pi); return (unsigned long) d; } #ifdef USE_DEBUG_CODE static int debug; #endif /* USE_DEBUG_CODE */ static int partition (mpz_t res, unsigned long n) { mpfr_t t, tt, sum, nearzero; mpfr_t a, b, sqrtb, c, d; mp_rnd_t r = GMP_RNDN; mp_exp_t e; char *cp; unsigned long q, max, po; mpfr_init (nearzero); mpfr_set_str (nearzero, "1e-10", 10, r); po = log2pn (n) + 20; if (po < 55) po = 55; #ifdef USE_DEBUG_CODE if (debug & 2) { printf ("precision = %d bits\n", po); fflush (stdout); } #endif /* USE_DEBUG_CODE */ mpfr_init2 (pi, po); mpfr_const_pi (pi, r); mpfr_init2 (b, po); mpfr_set_ui (b, 1, r); mpfr_div_ui (b, b, 24, r); mpfr_ui_sub (b, n, b, r); mpfr_init2 (sqrtb, po); mpfr_sqrt (sqrtb, b, r); mpfr_init2 (c, po); mpfr_set_ui (c, 2, r); mpfr_div_ui (c, c, 3, r); mpfr_sqrt (c, c, r); mpfr_mul (c, c, pi, r); mpfr_mul (c, c, sqrtb, r); mpfr_init2 (d, po); mpfr_set_ui (d, 8, r); mpfr_sqrt (d, d, r); mpfr_mul (d, d, b, r); mpfr_mul (d, d, sqrtb, r); mpfr_mul (d, d, pi, r); mpfr_ui_div (d, 1, d, r); mpfr_init2 (a, po); mpfr_init2 (t, po); mpfr_init2 (tt,po); mpfr_init2 (sum, po); mpfr_set_ui (sum, 0, r); mpfr_set_ui (t, n, r); mpfr_sqrt (t, t, r); mpfr_mul_ui (t, t, 24, r); mpfr_div_ui (t, t, 100, r); max = mpfr_get_d (t, r) + 5; #ifdef USE_DEBUG_CODE if (debug & 2) printf ("Computing %ld steps...\n", max); #endif /* USE_DEBUG_CODE */ for (q=1; q<=max; ++q) { int prec = po/q+35; if (prec<55) prec = 55; L (t, n, q, prec, r); mpfr_abs (tt, t, r); if (mpfr_cmp (tt, nearzero) <= 0) { mpfr_set_ui (t, 0, r); mpfr_set_ui (tt, 0, r); } else { mpfr_set (a, c, r); mpfr_div_ui (a, a, q, r); psi (tt, q, a, d, prec, r); } #ifdef USE_DEBUG_CODE if (debug & 4) { printf (" "); mpfr_out_str (stdout, 10, 10, t, r); printf (" * "); mpfr_out_str (stdout, 10, 10, tt, r); printf (" = "); } #endif /* USE_DEBUG_CODE */ mpfr_mul (t, t, tt, r); mpfr_add (sum, sum, t, r); #ifdef USE_DEBUG_CODE if (debug & 4) { mpfr_out_str (stdout, 10, 10, t, r); printf ("\n"); printf ("---%ld: ", q); cp = mpfr_get_str ((char*)0, &e, 10, 0, sum, GMP_RNDN); cp[5]=0; printf ("%s...%c%c%c.%c%c%c%c---\n", cp, cp[e-3], cp[e-2], cp[e-1], cp[e], cp[e+1], cp[e+2], cp[e+3]); fflush (stdout); } #endif /* USE_DEBUG_CODE */ } mpfr_round (tt, sum); e = mpfr_get_z_exp (res, tt); if (e<0) mpz_fdiv_q_2exp (res, res, -e); else mpz_mul_2exp (res, res, e); mpfr_clear (t); mpfr_clear (tt); mpfr_clear (sum); mpfr_clear (a); mpfr_clear (b); mpfr_clear (sqrtb); mpfr_clear (c); mpfr_clear (d); mpfr_clear (nearzero); } #ifdef TEST_CODE static int testit () { mpz_t z, zz; long k, m, n; mpz_init (z); mpz_init (zz); printf ("Testing...\n"); printf ("Computing p(n) with n = 11m+6, m>=10000:\n"); for (m=100000; m<100020; ++m) { n = 11*m + 6; printf ("p(%ld) ", n); partition (z, n); if (mpz_divisible_ui_p (z, 11)) printf ("is divisible by 11: OK\n"); else return 1; } mpz_set_ui (zz, 0); n = 100000; printf ("Computing p(%ld) using Euler formula: p(%ld)=\n", n, n); for (k=1; ; ++k) { m = k * (3*k + 1)/2; if (n <= m) break; partition (z, n-m); //printf("%d:",n-m);mpz_out_str (stdout, 10, z);printf("\n"); (k & 1)? mpz_add (zz, zz, z) : mpz_sub (zz, zz, z); printf ("%cp(%ld)", (k & 1)? '+':'-', n-m); fflush (stdout); m = k * (3*k - 1)/2; if (n <= m) break; partition (z, n-m); //printf("%d:",n-m);mpz_out_str (stdout, 10, z);printf("\n"); (k & 1)? mpz_add (zz, zz, z) : mpz_sub (zz, zz, z); printf ("%cp(%ld)", (k & 1)? '+':'-', n-m); fflush (stdout); } printf ("\nComputing p(%ld) using Rademacher formula only.\n", n); partition (z, n); if (mpz_cmp (z, zz)) return 1; printf ("Both are identical.\n"); return 0; } int main (int argc, char *argv[]) { mpz_t z; unsigned long n=0; if (argc>1) n = atol (argv[1]); debug = 0; if (argc>2) debug = atol (argv[2]); if ((debug & 1) && testit()) printf ("Test failed! Please contact the maintainer.\n"); if (!n) return 0; mpz_init (z); partition (z, n); printf ("p(%ld)=\n", n); mpz_out_str (stdout, 10, z); printf ("\n"); } #endif /* TEST_CODE */