Here's some more info:

It's not a caching issue. I've tried adjusting the expected cache size
in FLINT from very small to very large. It makes little difference.
It's quite amazing that these Intel machines appear completely
oblivious to how big their caches are.

The total time taken for the large integer multiplication in FLINT is
2.130s of the 2.220s, so it isn't overheads in the rest of the
algorithm.

GMP takes 4.210s for the same integer multiplication, so it isn't that
the FLINT FFT is slow on Intel machines.

There's only one conclusion possible. The Schoenhage/Nussbaumer FFT
David has written in zn_poly for multiplication of polys over Z/pZ is
truly much better on Intel than the Kronecker Segmentation/Schoenhage-
Strassen FFT method used in FLINT.
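
For anyone following along who hasn't seen it, the Kronecker segmentation idea is easy to sketch: pack the coefficients of each polynomial into one large integer with enough headroom per coefficient slot, do a single big integer multiplication, then read the product coefficients back out of the slots and reduce mod p. Here's a minimal Python sketch of the technique (illustration only, not FLINT's actual code, which packs more cleverly and uses its own FFT for the big multiply):

```python
def ks_mul(f, g, p):
    """Multiply polynomials f, g (coefficient lists, low degree first) mod p
    via Kronecker substitution: evaluate at x = 2^b and multiply integers."""
    # Each coefficient of the integer product is a sum of at most
    # min(len(f), len(g)) terms, each at most (p - 1)^2, so giving every
    # coefficient b bits of room guarantees no carries between slots.
    bound = min(len(f), len(g)) * (p - 1) ** 2
    b = bound.bit_length()
    F = sum(c << (i * b) for i, c in enumerate(f))
    G = sum(c << (i * b) for i, c in enumerate(g))
    H = F * G                      # the one big integer multiplication
    mask = (1 << b) - 1
    return [((H >> (i * b)) & mask) % p
            for i in range(len(f) + len(g) - 1)]

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2, which mod 5 is 3 + 3x^2
print(ks_mul([1, 2], [3, 4], 5))   # [3, 0, 3]
```

The point is that all the real work lands in `F * G`, which is why the speed of this method is tied directly to the speed of the underlying large integer multiplication.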

Bill.

On 14 Jan, 21:50, Bill Hart <goodwillh...@googlemail.com> wrote:
> Properly tuned, for a length 10^6 multiplication with 40 bit modulus,
> zn_poly takes 1.59s and FLINT 2.22s on sage.math. So zn_poly is quite
> a bit faster on Intel machines than FLINT. No idea why that is. But it
> is very impressive!!
>
> Bill.
>
> On 14 Jan, 20:37, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > I did some more timings and I've been bitten by the timing
> > irregularities on my Opteron again.
>
> > For length 1,000,000 and 40 bits FLINT takes about 1.57s and zn_poly
> > 1.46s. I tried using the --use-flint option in zn_poly, but presumably
> > this multiplication is outside the KS range and so the timing doesn't
> > improve, i.e. it isn't using FLINT's large integer multiplication for
> > this, but the Schoenhage/Nussbaumer FFT in zn_poly.
>
> > This timing difference is about what I'd expect. So I'm back to square
> > one figuring out why there's such a great difference on Intel machines.
> > I suppose I could try tuning FLINT on sage.math and see if the tuning
> > values are much different from the Opteron tunings which are used by
> > default.
>
> > Bill.
>
> > On 14 Jan, 17:47, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > On the 2.4GHz Opteron zn_poly reports that it takes 1.46s for this
> > > multiplication. That is certainly faster than FLINT. So it looks like
> > > zn_poly really is better for longer length polynomials now.
>
> > > It's not clear what the improvement was, but it looks like better
> > > tuning code, from the zn_poly CHANGELOG.
>
> > > I think this will be interesting information for David.
>
> > > Bill.
>
> > > On 14 Jan, 17:22, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > On a 2.4GHz Opteron FLINT takes 1.92s for the multiplication. So at
> > > > least there is no serious speed regression in FLINT.
>
> > > > I now need to time zn_poly and see if it does better than FLINT on the
> > > > Opteron.
>
> > > > Bill.
>
> > > > On 14 Jan, 17:09, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > I did the timings from FLINT directly and indeed it takes 2.5s for a
> > > > > 40 bit modulus and length 10^6. This agrees with what comes out of
> > > > > Martin's version of Sage.
>
> > > > > So, only 3 possibilities remain:
>
> > > > > 1) Serious speed regression in FLINT
> > > > > 2) David's improved tuning for zn_poly *really* makes a difference
> > > > > 3) David's approach is much better on Intel hardware
>
> > > > > I am suspecting it is number 3, but I'll do a timing on an Opteron to
> > > > > confirm this.
>
> > > > > Bill.
>
> > > > > On 14 Jan, 16:44, Bill Hart <goodwillh...@googlemail.com> wrote:
>
> > > > > > I'm trying to inspect the gmp in your installation to see if
> > > > > > maybe the Core 2 patches aren't installed or something, but I'm
> > > > > > having trouble opening the spkg named gmp-4.2.2.p1.fake. Is that
> > > > > > where the gmp is? I thought these were just tar files or bzip2
> > > > > > files?
>
> > > > > > Bill.
>
> > > > > > On 14 Jan, 16:29, Martin Albrecht <m...@informatik.uni-bremen.de>
> > > > > > wrote:
>
> > > > > > > On Wednesday 14 January 2009, Bill Hart wrote:
>
> > > > > > > > I checked back through the correspondence I had and the
> > > > > > > > timing I am thinking of is for a 40 bit modulus at length
> > > > > > > > 1,000,000. David and I compared at the time and zn_poly
> > > > > > > > version 0.4.1 took 2.06s compared to FLINT at 2.37s, at
> > > > > > > > that stage, on the old sage.math.
>
> > > > > > > > The zn_poly changelog shows that David implemented the
> > > > > > > > Schoenhage/Nussbaumer FFT in zn_poly shortly after that
> > > > > > > > point. After zn_poly 0.7.0 David reported that the same
> > > > > > > > multiplication took zn_poly 4.3 billion cycles on
> > > > > > > > sage.math, i.e. 2.39s, which is actually slower than the
> > > > > > > > current FLINT.
>
> > > > > > > > There's no indication in the changelog of any changes to
> > > > > > > > the multiplication code (though the tuning code was
> > > > > > > > revamped) since that time.
>
> > > > > > > > Can I ask what architecture you got the above timings on, Martin?
>
> > > > > > > That's my Core2Duo 2.33GHz MacBook Pro. You can try it on sage.math:
>
> > > > > > > sage: def f(p,n):
> > > > > > > ....:     P = PolynomialRing(GF(next_prime(p)),'x')
> > > > > > > ....:     f = P.random_element(n)
> > > > > > > ....:     g = P.random_element(n)
> > > > > > > ....:     t0 = cputime()
> > > > > > > ....:     r0 = f*g
> > > > > > > ....:     t0 = cputime(t0)
> > > > > > > ....:     t1 = cputime()
> > > > > > > ....:     r1 = f._mul_zn_poly(g)
> > > > > > > ....:     t1 = cputime(t1)
> > > > > > > ....:     assert(r0 == r1)
> > > > > > > ....:     return p,n,t0,t1
> > > > > > > ....:
> > > > > > > sage: for i in range(21):
> > > > > > > ....:     f(2**47,2**i)
> > > > > > > ....:
>
> > > > > > > (140737488355328, 1, 0.0, 0.0)
> > > > > > > (140737488355328, 2, 0.0, 0.0)
> > > > > > > (140737488355328, 4, 0.0, 0.0)
> > > > > > > (140737488355328, 8, 0.0, 0.0)
> > > > > > > (140737488355328, 16, 0.0, 0.0)
> > > > > > > (140737488355328, 32, 0.0, 0.0)
> > > > > > > (140737488355328, 64, 0.0, 0.0)
> > > > > > > (140737488355328, 128, 0.0, 0.0)
> > > > > > > (140737488355328, 256, 0.0, 0.0)
> > > > > > > (140737488355328, 512, 0.0, 0.0)
> > > > > > > (140737488355328, 1024, 0.0, 0.0)
> > > > > > > (140737488355328, 2048, 0.0, 0.0)
> > > > > > > (140737488355328, 4096, 0.010000000000000009, 0.0)
> > > > > > > (140737488355328, 8192, 0.0099999999999997868, 0.010000000000000231)
> > > > > > > (140737488355328, 16384, 0.020000000000000018, 0.020000000000000018)
> > > > > > > (140737488355328, 32768, 0.049999999999999822, 0.050000000000000266)
> > > > > > > (140737488355328, 65536, 0.099999999999999645, 0.099999999999999645)
> > > > > > > (140737488355328, 131072, 0.28000000000000114, 0.19999999999999929)
> > > > > > > (140737488355328, 262144, 0.63000000000000256, 0.44000000000000128)
> > > > > > > (140737488355328, 524288, 1.5300000000000011, 1.0600000000000023)
> > > > > > > (140737488355328, 1048576, 3.0999999999999943, 2.1500000000000057)
>
> > > > > > > use my sage install (/home/malb/sage-3.2.2) for that.
>
> > > > > > > > It'll be interesting for both David and me if his
> > > > > > > > Schoenhage/Nussbaumer FFT is that much better than the
> > > > > > > > FLINT large integer FFT on certain architectures.
>
> > > > > > > Cheers,
> > > > > > > Martin
>
> > > > > > > --
> > > > > > > name: Martin Albrecht
> > > > > > > _pgp:http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
> > > > > > > _otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
> > > > > > > _www:http://www.informatik.uni-bremen.de/~malb
> > > > > > > _jab: martinralbre...@jabber.ccc.de
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to sage-devel-unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org