On 7/27/07, Pablo De Napoli <[EMAIL PROTECTED]> wrote:
>
> I've tested it but seems to be buggy:: it works up to 156
>
>  ./part 156
> p(156)=
> 73232243759
>
> but for 157 gives a floating point exception error
> (and a gdb tracing says it is in the line 176 of
> the source code
>
> r2 = r0 % r1;
>
> in function g

I've attached a slightly modified version of part.c that seems to work
(it doesn't
crash and gives correct answers in all the cases I tested).  I just changed the
code in g slighlty so if r1 is 0, then r2 is also set to 0.  I compile
it using

 gcc part.c -O3 -g -lmpfr -lm -o part -I/home/was/sage/local/include
-L/home/was/sage/local/lib/ -lgmp

where /home/was/sage is the path to my SAGE install.

Interestingly, when I time part.c it is not much better than CVS pari
(see below).

Bill Hart wrote:
>This, with the other ideas I gave above, will definitely let us beat
>Mathematica convincingly, as a simple back of the envelope calculation
>shows. It will have the added benefit of allowing us to beat Magma at
>something.

Magma is already terrible at computing NumberOfPartitions, though
not as bad as GAP (which is even worse):

[EMAIL PROTECTED]:~$ magma
Magma V2.13-5     Fri Jul 27 2007 10:54:07 on ubuntu   [Seed = 1720324423]
Type ? for help.  Type <Ctrl>-D to quit.
> time k := NumberOfPartitions(10^5);
Time: 3.070

With the latest CVS pari:
? gettime; n=numbpart(10^5); gettime/1000.0
%1 = 0.01200000000000000000000000000
? gettime; n=numbpart(10^6); gettime/1000.0
%2 = 0.2080000000000000000000000000
? gettime; n=numbpart(10^7); gettime/1000.0
%3 = 6.973000000000000000000000000
? gettime; n=numbpart(10^8); gettime/1000.0


With part.c:

I use time ./part 100000 > out and record system + user time:
10^6    0.04 seconds
10^7    4.584 seconds
10^8   47.483 seconds

So, to clarify or summarize, Bill is there one thing that we could
change in part.c that would
speed it up?  I realized you sort of answered this before, but the
sequence of previous
emails yesterday was rather long and may have got confused.

Thanks,
  william


> The code theres seems indeed to be GPL-ed, it has a  copyright notice that 
> says:
>
> "(C) 2002 Ralf Stephan ([EMAIL PROTECTED]).
>  * See part.pdf on the same path for a summary of the math.
>  * Distributed under GPL (see gnu.org).  Version 2002-12."
>
> The pdf is:
>
> http://www.ark.in-berlin.de/part.pdf
>
> Pablo
>
> On /26/07, Bill Hart <[EMAIL PROTECTED]> wrote:
> >
> > Alternatively you could just use the implementation here:
> >
> > http://www.ark.in-berlin.de/part.c
> >
> > which seems to only rely on mpfr and GMP. However, since the Pari
> > version was based on it, I suspect it may also need correction.
> >
> > He does seem to adjust the precision as he goes, but I've no idea how
> > far above or below the required precision this is. However there is no
> > need to use a precision of 55 from a certain point on. That probably
> > forces it to use two doubles instead of one in the floating point
> > computations, which I think would be unnecesary. You can look on
> > mathworld for how well the Rademacher series converges.
> >
> > It would probably be quicker to do a fresh implementation from scratch
> > given the application in mind. The code there may not be GPL'd also.
> >
> > To check it works, one should at least look at the congruences modulo
> > 5, 7, 11 and 13. The author of the mpfr version does seem to check
> > them mod 11 but for very small n and only for a very few values of n.
> >
> > If the gcds prove to be a bottleneck, I have some code that is roughly
> > twice as fast as GMP for computing these. Certainly the code in the
> > mpfr version is suboptimal and needs some serious optimisation when
> > the terms in the dedekind sum fit into a single limb, which on a 64
> > bit machine is probably always, when n is of a reasonable size.
> >
> > Another suggestion is that for sufficiently small values of h and or
> > k, it may be quicker to compute the dedekind sum directly rather than
> > use the gcd formula.
> >
> > A lookup table would also be useful for the tail end of the gcd/
> > dedekind sum computation.
> >
> > I'd be shocked if the computation for n = 10^9 couldn't be done in
> > under 10s on sage.math.
> >
> > Bill.
>
> >
>


-- 
William Stein
Associate Professor of Mathematics
University of Washington
http://www.williamstein.org

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://sage.scipy.org/sage/ and http://modular.math.washington.edu/sage/
-~----------~----~----~----~------~----~------~--~---

/* Implementation of the unrestricted partition function using mpfr (see
 * mpfr.org). (C) 2002 Ralf Stephan ([EMAIL PROTECTED]). 
 * See part.pdf on the same path for a summary of the math.
 * Distributed under GPL (see gnu.org).  Version 2002-12.
 * 
 * The unrestricted partition function counts the nonnegative integer
 * solutions to a+2b+3c+...=n. Ref.: sequence A000041 in the OEIS.
 * 
 * 2002-12: - included psi optimizations by Karim Belabas (orig. in pari).
 *          - use Apostol's gcd-like Dedekind sum algorithm.
 *          - major code cleanup.
 *
 * TODO: remove restriction unsigned long to n?
 */

#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>
#include <mpfr.h>

#define USE_COSPI
#define USE_DEBUG_CODE
#define TEST_CODE

#ifdef  TEST_CODE
#define USE_DEBUG_CODE
#endif /* TEST_CODE */

static mpfr_t pi;       // maybe you have computed this already...

static int psi (mpfr_t res,	
		unsigned long q, 
		mpfr_t a,
		mpfr_t d,
		int prec, 
		mp_rnd_t r)
{
	mpfr_t ea, invea, cha, sha;

	mpfr_init2 (ea, prec);
	mpfr_exp (ea, a, r);
	mpfr_init2 (invea, prec);
	mpfr_ui_div (invea, 1, ea, r);
	mpfr_init2 (cha, prec);
	mpfr_add (cha, ea, invea, r);
	mpfr_div_2ui (cha, cha, 1, r);
	mpfr_init2 (sha, prec);
	mpfr_sub (sha, ea, invea, r);
	mpfr_div_2ui (sha, sha, 1, r);
	mpfr_mul (ea, a, cha, r);
	mpfr_sub (ea, ea, sha, r);
	mpfr_sqrt_ui (res, q, r);
	mpfr_mul (res, res, d, r);
	mpfr_mul (res, res, ea, r);
	
	mpfr_clear (ea);
	mpfr_clear (invea);
	mpfr_clear (cha);
	mpfr_clear (sha);
}

static int g (mpfr_t res, 
		unsigned long q,
		unsigned long h,
		mp_rnd_t r)
/*
 * Compute g(h,q)=q*s(h,q) where s() is the classical Dedekind sum.
 * 
 * Ref.: Apostol, T.M. in "Modular Functions and Dirichlet Series 
 *       in Number Theory", NY, Springer 1997, pp.52 and 61-69.
 *       (according to Wolfram's MathWorld, please someone verify)
 *
 * The only mpfr_t var is res.
 */
{
	unsigned long k;
	mpz_t gt, gtt;

	if (q==1 || q==2)
	{
		mpfr_set_ui (res, 0, r);
		return 0;
	}

	if (h==1)
	{
		mpfr_set_ui (res, q-1, r);
		mpfr_mul_ui (res, res, q-2, r);
		mpfr_div_ui (res, res, 12, r);
		return 0;
	}

	if (h==2 && q>5 && (q%2))
	{
		mpfr_set_ui (res, q-1, r);
		mpfr_mul_ui (res, res, q-5, r);
		mpfr_div_ui (res, res, 24, r);
		return 0;
	}

	k = (q%h);
	if (k==1)
	{
		mpfr_set_ui (res, h, r);
		mpfr_mul_ui (res, res, h, r);
		mpfr_ui_sub (res, q-1, res, r);
		mpfr_mul_ui (res, res, q-1, r);
		mpfr_div_ui (res, res, h, r);
		mpfr_div_ui (res, res, 12, r);
		return 0;
	}
	if (k==2)
	{
		mpfr_set_ui (res, h, r);
		mpfr_mul_ui (res, res, h, r);
		mpfr_add_ui (res, res, 1, r);
		mpfr_div_2ui (res, res, 1, r);
		mpfr_ui_sub (res, q, res, r);
		mpfr_mul_ui (res, res, q-2, r);
		mpfr_div_ui (res, res, h, r);
		mpfr_div_ui (res, res, 12, r);
		return 0;
	}
	if (k==h-1)
	{
		mpz_init_set_ui (gt, h);
		mpz_mul_ui (gt, gt, h);
		mpz_sub_ui (gt, gt, 6*h);
		mpz_add_ui (gt, gt, 2);
		mpz_mul_ui (gt, gt, q);
		mpz_init_set_ui (gtt, q);
		mpz_mul_ui (gtt, gtt, q);
		mpz_add (gt, gt, gtt);
		mpz_set_ui (gtt, h);
		mpz_mul_ui (gtt, gtt, h);
		mpz_add (gt, gt, gtt);
		mpz_add_ui (gt, gt, 1);
		mpfr_set_z (res, gt, r);
		mpfr_div_ui (res, res, h, r);
		mpfr_div_ui (res, res, 12, r);
		return 0;
	}

	{
		int j=0;
		unsigned long r0, r1, r2;
		mpq_t qsum, qt, qtt;

		mpq_init (qsum);
		mpq_init (qt);
		mpq_init (qtt);
		mpz_init (gt);
		mpz_init (gtt);
		r0 = q;	r1 = h; r2 = r0 % r1;
		while (r1)
		{
			mpz_set_ui (gt, r0);
			mpz_mul_ui (gt, gt, r0);
			mpz_set_ui (gtt, r1);
			mpz_mul_ui (gtt, gtt, r1);
			mpz_add (gt, gt, gtt);
			mpz_add_ui (gt, gt, 1);
			mpq_set_z (qt, gt);
			mpz_set_ui (gt, r0);
			mpz_mul_ui (gt, gt, r1);
			mpq_set_z (qtt, gt);
			mpq_div (qt, qt, qtt);
			mpq_canonicalize (qt);
			if (!(j%2))
				mpq_add (qsum, qsum, qt);
			else
				mpq_sub (qsum, qsum, qt);
			mpq_canonicalize (qsum);
			r0 = r1;
			r1 = r2;
                        if(r1)
			  r2 = r0 % r1;
                        else
                          r2 = 0;
			++j;
		}
		mpq_set_ui (qt, 12, 1);
		mpq_canonicalize (qt);
		mpq_div (qsum, qsum, qt);
		mpq_canonicalize (qsum);
		if (j%2)
		{
			mpq_set_ui (qt, 1, 4);
			mpq_canonicalize (qt);
			mpq_sub (qsum, qsum, qt);
			mpq_canonicalize (qsum);
		}
		mpfr_set_z (res, mpq_numref (qsum), r);
		mpfr_div_z (res, res, mpq_denref (qsum), r);
		mpfr_mul_ui (res, res, q, r);
		mpq_clear (qsum);
		mpq_clear (qt);
		mpq_clear (qtt);
	}

	mpz_clear (gt);
	mpz_clear (gtt);
}

static unsigned long gcd 
(unsigned long x, unsigned long y)
{
    unsigned long r0, r1, r2;
  
	if (x>y) { r0=x; r1=y; }
    else     { r1=x; r0=y; }
  
	while (1) 
	{
  		r2 = r0 % r1;
  		if (r2 == 0) return r1;
  		r0 = r1; r1 = r2;
  	}
    return 0;
}

#ifdef USE_COSPI

static cospi (mpfr_t res, 
		mpfr_t x, 
		int prec, 
		mp_rnd_t r)  // fast cos(pi*x)
{
	mpfr_t t, tt, half, fourth;
	
	mpfr_init2 (t, prec);
	mpfr_init2 (tt, prec);
	mpfr_init2 (half, prec);
	mpfr_init2 (fourth, prec);

	mpfr_set_ui (half, 1, r);
	mpfr_div_2ui (half, half, 1, r);
	mpfr_div_2ui (fourth, half, 1, r);
	mpfr_div_2ui (tt, x, 1, r);
	mpfr_floor (tt, tt);
	mpfr_mul_2ui (tt, tt, 1, r);
	mpfr_sub (t, x, tt, r);
	if (mpfr_cmp_ui (t, 1) > 0)
		mpfr_sub_ui (t, t, 2, r);
	mpfr_abs (tt, t, r);
	if (mpfr_cmp (tt, half) > 0)
	{
		mpfr_ui_sub (t, 1, tt, r);
		mpfr_abs (tt, t, r);
		if (mpfr_cmp (tt, fourth) > 0)
		{
			if (mpfr_sgn (t) > 0)
				mpfr_sub (t, half, t, r);
			else
				mpfr_add (t, t, half, r);
			mpfr_mul (t, t, pi, r);
			mpfr_sin (t, t, r);
		}
		else
		{
			mpfr_mul (t, t, pi, r);
			mpfr_cos (t, t, r);	
		}
		mpfr_neg (res, t, r);
	}
	else
	{
		mpfr_abs (tt, t, r);
		if (mpfr_cmp (tt, fourth) > 0)
		{
			if (mpfr_sgn (t) > 0)
				mpfr_sub (t, half, t, r);
			else
				mpfr_add (t, t, half, r);
			mpfr_mul (t, t, pi, r);
			mpfr_sin (res, t, r);
		}
		else
		{
			mpfr_mul (t, t, pi, r);
			mpfr_cos (res, t, r);
		}
	}

	mpfr_clear (half);
	mpfr_clear (fourth);
	mpfr_clear (t);
	mpfr_clear (tt);
}

#endif /* USE_COSPI */

static int L (mpfr_t res,
		unsigned long n,
		unsigned long q,
		int prec,
		mp_rnd_t r)
{
	mpfr_t t, tt;
	unsigned long h;
	
	if (q==1)
	{
		mpfr_set_ui (res, 1, r);
		return 0;
	}

	mpfr_init2 (res, prec);
	mpfr_set_ui (res, 0, r);
	mpfr_init2 (t, prec);
	mpfr_init2 (tt, prec);

	for (h=1; h<q; ++h)
	{
	        if (gcd (q, h) > 1)
			continue;
		mpfr_set_ui (t, h<<1, r);
		mpfr_mul_ui (t, t, n, r);
		g (tt, q, h, r);
		mpfr_sub (t, tt, t, r);
		mpfr_div_ui (t, t, q, r);
#ifdef USE_COSPI
		cospi (t, t, prec, r);  /* Faster than mpfr_cos in gmp-4.1 */
#else
		mpfr_mul (t, t, pi, r);
		mpfr_cos (t, t, r);
#endif
		mpfr_add (res, res, t, r);
	}

	mpfr_clear (t);
	mpfr_clear (tt);
}

static unsigned long log2pn (unsigned long n)
{
	mpfr_t t, tt, Pi;
	mp_rnd_t r = GMP_RNDN;
	double d;
	
	mpfr_init (t);
	mpfr_init (tt);
	mpfr_init (Pi);
	mpfr_const_pi (Pi, r);

	mpfr_set_ui (t, n, r);
	mpfr_mul_ui (t, t, 2, r);
	mpfr_div_ui (t, t, 3, r);
	mpfr_sqrt (t, t, r);
	mpfr_mul (t, t, Pi, r);
	mpfr_exp (t, t, r);
	mpfr_div_ui (t, t, 4, r);
	mpfr_div_ui (t, t, n, r);
	mpfr_set_ui (tt, 3, r);
	mpfr_sqrt (tt, tt, r);
	mpfr_div (t, t, tt, r);
	mpfr_log10 (t, t, r);
	mpfr_mul_ui (t, t, 10, r);
	mpfr_div_ui (t, t, 3, r);  /* bit more than log2(p(n)) */
	d = mpfr_get_d (t, r);
	
	mpfr_clear (t);
	mpfr_clear (tt);
	mpfr_clear (Pi);

	return (unsigned long) d;
}

#ifdef USE_DEBUG_CODE
static int debug;
#endif /* USE_DEBUG_CODE */

static int partition 
(mpz_t res, unsigned long n)
{
	mpfr_t t, tt, sum, nearzero;
	mpfr_t a, b, sqrtb, c, d;
	mp_rnd_t r = GMP_RNDN;
	mp_exp_t e;
	char *cp;
	unsigned long q, max, po;

	mpfr_init (nearzero);
	mpfr_set_str (nearzero, "1e-10", 10, r);
	po = log2pn (n) + 20;
	if (po < 55) po = 55;

#ifdef USE_DEBUG_CODE
	if (debug & 2) 
	{ 
		printf ("precision = %d bits\n", po); 
		fflush (stdout); 
	}
#endif /* USE_DEBUG_CODE */
	
	mpfr_init2 (pi, po);
	mpfr_const_pi (pi, r);
	mpfr_init2 (b, po);
	mpfr_set_ui (b, 1, r);
	mpfr_div_ui (b, b, 24, r);
	mpfr_ui_sub (b, n, b, r);
	mpfr_init2 (sqrtb, po);
	mpfr_sqrt (sqrtb, b, r);
	mpfr_init2 (c, po);
	mpfr_set_ui (c, 2, r);
	mpfr_div_ui (c, c, 3, r);
	mpfr_sqrt (c, c, r);
	mpfr_mul (c, c, pi, r);
	mpfr_mul (c, c, sqrtb, r);
	mpfr_init2 (d, po);
	mpfr_set_ui (d, 8, r);
	mpfr_sqrt (d, d, r);
	mpfr_mul (d, d, b, r);
	mpfr_mul (d, d, sqrtb, r);
	mpfr_mul (d, d, pi, r);
	mpfr_ui_div (d, 1, d, r);
	mpfr_init2 (a, po);
	
	mpfr_init2 (t, po);
	mpfr_init2 (tt,po);
	mpfr_init2 (sum, po);
	mpfr_set_ui (sum, 0, r);

	mpfr_set_ui (t, n, r);
	mpfr_sqrt (t, t, r);
	mpfr_mul_ui (t, t, 24, r);
	mpfr_div_ui (t, t, 100, r);
	max = mpfr_get_d (t, r) + 5;
#ifdef USE_DEBUG_CODE
	if (debug & 2)
		printf ("Computing %ld steps...\n", max);
#endif /* USE_DEBUG_CODE */

	for (q=1; q<=max; ++q)
	{
		int prec = po/q+35;
		if (prec<55) prec = 55;

		L (t, n, q, prec, r);
		mpfr_abs (tt, t, r);
		if (mpfr_cmp (tt, nearzero) <= 0)
		{
			mpfr_set_ui (t, 0, r);
			mpfr_set_ui (tt, 0, r);
		}
		else
		{
			mpfr_set (a, c, r);
			mpfr_div_ui (a, a, q, r);
			psi (tt, q, a, d, prec, r);
		}
		
#ifdef USE_DEBUG_CODE
		if (debug & 4) 
		{
			printf ("      ");
			mpfr_out_str (stdout, 10, 10, t, r); 
			printf (" * ");
			mpfr_out_str (stdout, 10, 10, tt, r); 
			printf (" = ");
		}
#endif /* USE_DEBUG_CODE */
		
		mpfr_mul (t, t, tt, r);
		mpfr_add (sum, sum, t, r);

#ifdef USE_DEBUG_CODE
		if (debug & 4) 
		{
		    mpfr_out_str (stdout, 10, 10, t, r); printf ("\n");
			printf ("---%ld: ", q);
			cp = mpfr_get_str ((char*)0, &e, 10, 0, sum, GMP_RNDN); 
			cp[5]=0;
			printf ("%s...%c%c%c.%c%c%c%c---\n", 
					cp, cp[e-3], cp[e-2], cp[e-1], cp[e], cp[e+1], cp[e+2], cp[e+3]);
			fflush (stdout);
		}
#endif /* USE_DEBUG_CODE */
	}

	mpfr_round (tt, sum);
	e = mpfr_get_z_exp (res, tt);
	if (e<0)
		mpz_fdiv_q_2exp (res, res, -e);
	else
		mpz_mul_2exp (res, res, e);

	mpfr_clear (t);
	mpfr_clear (tt);
	mpfr_clear (sum);
	mpfr_clear (a);
	mpfr_clear (b);
	mpfr_clear (sqrtb);
	mpfr_clear (c);
	mpfr_clear (d);
	mpfr_clear (nearzero);
}

#ifdef TEST_CODE

static int testit ()
{
	mpz_t z, zz;
	long k, m, n;
	
	mpz_init (z);
	mpz_init (zz);

	printf ("Testing...\n");
	printf ("Computing p(n) with n = 11m+6, m>=10000:\n");
	for (m=100000; m<100020; ++m)
	{
		n = 11*m + 6;
		printf ("p(%ld) ", n);
		partition (z, n);
		if (mpz_divisible_ui_p (z, 11))
			printf ("is divisible by 11: OK\n");
		else
			return 1;
	}

	mpz_set_ui (zz, 0);
	n = 100000;
	printf ("Computing p(%ld) using Euler formula: p(%ld)=\n", n, n);
	for (k=1; ; ++k)
	{
		m = k * (3*k + 1)/2;
		if (n <= m) break;
		partition (z, n-m);
//printf("%d:",n-m);mpz_out_str (stdout, 10, z);printf("\n");
		(k & 1)? mpz_add (zz, zz, z) : mpz_sub (zz, zz, z);
		printf ("%cp(%ld)", (k & 1)? '+':'-', n-m);
		fflush (stdout);
		m = k * (3*k - 1)/2;
		if (n <= m) break;
		partition (z, n-m);
//printf("%d:",n-m);mpz_out_str (stdout, 10, z);printf("\n");
		(k & 1)? mpz_add (zz, zz, z) : mpz_sub (zz, zz, z);
		printf ("%cp(%ld)", (k & 1)? '+':'-', n-m);
		fflush (stdout);
	}
	printf ("\nComputing p(%ld) using Rademacher formula only.\n", n);
	partition (z, n);
	if (mpz_cmp (z, zz))
		return 1;
	printf ("Both are identical.\n");
	
	return 0;
}

int main (int argc, char *argv[])
{
	mpz_t z;
	unsigned long n=0;

	if (argc>1) n = atol (argv[1]);
	debug = 0;
	if (argc>2) debug = atol (argv[2]);

	if ((debug & 1) && testit())
		printf ("Test failed! Please contact the maintainer.\n");
	if (!n) return 0;

	mpz_init (z);
	partition (z, n);
	printf ("p(%ld)=\n", n);
	mpz_out_str (stdout, 10, z);
	printf ("\n");
}

#endif /* TEST_CODE */

Reply via email to