Thanks, all, for the very interesting comments and links to publications
and CASes.

I've implemented the algorithm using flint2's fmpz (multiprecision)
integer type for coefficients and, at this stage, 62-bit integers for
exponents only. (However, it should be trivial to lift this
restriction.)
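
For what it's worth, the exponent packing amounts to something like the
following hypothetical layout (three variables in 20-bit fields here; the
real field widths would of course be a parameter, and this is not the
actual flint2 representation). With this representation, multiplying
monomials is a single word addition and comparing them in the monomial
order is a single word comparison:

    #include <stdint.h>

    /* Hypothetical layout: three exponents packed into one word, each
       in a FIELD_BITS-bit field. */
    #define FIELD_BITS 20

    static inline uint64_t exp_pack(uint64_t e0, uint64_t e1, uint64_t e2)
    {
        return (e2 << (2 * FIELD_BITS)) | (e1 << FIELD_BITS) | e0;
    }

    /* Monomial multiplication: packed exponent vectors simply add,
       provided no individual field overflows. */
    static inline uint64_t exp_mul(uint64_t a, uint64_t b)
    {
        return a + b;
    }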

If I make a couple of simplifications, namely assume that the output
coefficients fit into two limbs, that none of the polynomials has length
> 2^32 - 1, and so on, I get pretty good times, certainly better than
those reported in Francesco's paper. I also don't know whether any of
the benchmarks assume the coefficients are unsigned; that assumption
would reduce the times further.
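
Under those assumptions the inner accumulation is essentially the
following (a sketch only, using the umul_ppmm and add_ssaaaa macros from
flint2's longlong.h, and assuming unsigned coefficients):

    #include <gmp.h>
    #include "longlong.h" /* flint2's umul_ppmm and add_ssaaaa macros */

    /* Accumulate the product a*b into the two-limb sum (hi, lo),
       assuming the final output coefficient fits in two unsigned
       limbs. */
    static inline void acc2_addmul(mp_limb_t * hi, mp_limb_t * lo,
                                   mp_limb_t a, mp_limb_t b)
    {
        mp_limb_t p1, p0;

        umul_ppmm(p1, p0, a, b);                /* double-limb product */
        add_ssaaaa(*hi, *lo, *hi, *lo, p1, p0); /* two-limb addition   */
    }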

If I am a bit more general with the assumptions, allowing multiprecision
coefficients of any size and polynomials of any length up to 2^64 - 1,
then it takes a little longer than reported in the various papers.
My comparison is on sage.math, which is a Core 2 Penryn running at 2.66
GHz (I adjust for the faster clock speed). Performance is also good on
a K10.

I might yet implement a heap speedup I thought of last year, which
didn't actually speed anything up at the time. This year, now that I
have taken Roman's suggestion of not trying to store too much
information in the heap, I seem to be getting good times. (Last year I
couldn't get under 133% of the sdmp time.) The speedup might work this
time around!
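
Concretely, the heap entries now look roughly like this (hypothetical
names and layout, not Roman's sdmp structures): only the packed
exponent, which serves as the heap key, plus the indices of the two
input terms; the coefficient product is formed on extraction rather
than carried around in the heap.

    #include <stdint.h>

    /* A lean heap entry for the product term a[i]*b[j]: just the
       packed exponent (the key) and the two indices. The 32-bit
       indices rely on the assumption that no input has length
       > 2^32 - 1. */
    typedef struct
    {
        uint64_t exp; /* packed exponent of a[i]*b[j] */
        uint32_t i;   /* index into the first input   */
        uint32_t j;   /* index into the second input  */
    } heap_entry_t;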

I want to publicly thank Roman Pearce for sending me a copy of his
reference implementation. It was a great help. I didn't use any of the
goto tricks you used to help gcc along; I believe you probably get
better performance in sdmp on account of your very careful
implementation. But my heap implementation already looks too much like
yours! I seem to recall you saying that sdmp accumulates in 3 limbs
rather than 2, and when I do that I suspect I'm losing quite a few
cycles.
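
For comparison, three-limb accumulation can be done along the following
lines (a sketch with hand-rolled carry propagation; sdmp no doubt does
this more cleverly):

    #include <gmp.h>
    #include "longlong.h" /* umul_ppmm, as in the earlier sketch */

    /* Add the double-limb product a*b into the three-limb sum
       (d2, d1, d0). The third limb delays overflow handling, at the
       cost of extra carry bookkeeping on every product. */
    static inline void acc3_addmul(mp_limb_t * d2, mp_limb_t * d1,
                                   mp_limb_t * d0,
                                   mp_limb_t a, mp_limb_t b)
    {
        mp_limb_t p1, p0, t, c;

        umul_ppmm(p1, p0, a, b);
        *d0 += p0;
        c = (*d0 < p0);  /* carry out of the low limb    */
        t = p1 + c;
        c = (t < p1);    /* carry from p1 + c            */
        *d1 += t;
        c += (*d1 < t);  /* carry out of the middle limb */
        *d2 += c;
    }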

I think I might not implement Richard Fateman's trick, but instead just
duplicate the heap code to handle the case where the exponents don't
fit in a single limb. My implementation should not slow down in the
single-limb case if I do this, and in some sense the cases where the
exponents don't fit in a single limb are not the interesting ones for
my work. I'd be better off putting time into a multivariate Kronecker
segmentation implementation.
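
Roughly, the idea there is: pick a base D larger than any exponent that
can appear in the product, pack the whole exponent vector into a single
integer in base D, multiply the resulting univariate polynomials, and
unpack. A minimal sketch (hypothetical helpers, not flint2 code):

    #include <stdint.h>

    /* Pack the exponent vector (e[0], ..., e[nvars-1]) into a single
       integer in base D, where D exceeds every exponent that can occur
       in the product (so no carries between "digits"). */
    static uint64_t kronecker_pack(const uint64_t * e, int nvars,
                                   uint64_t D)
    {
        uint64_t s = 0;
        for (int k = nvars - 1; k >= 0; k--)
            s = s * D + e[k];
        return s;
    }

    /* Recover the exponent vector from a packed exponent by repeated
       division by the base D. */
    static void kronecker_unpack(uint64_t s, uint64_t * e, int nvars,
                                 uint64_t D)
    {
        for (int k = 0; k < nvars; k++)
        {
            e[k] = s % D;
            s /= D;
        }
    }

The point of choosing D large enough for the product is that packed
exponents then add without carries between digits, so the univariate
result unpacks correctly.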

Bill.

On May 14, 7:35 am, parisse <bernard.pari...@ujf-grenoble.fr> wrote:
> On May 13, 19:03, Roman Pearce <rpear...@gmail.com> wrote:
>
> > On May 13, 2:45 am, parisse <bernard.pari...@ujf-grenoble.fr> wrote:
>
> > > In my own experience, coding with a univariate polynomial is not
> > > efficient, especially if the polynomial is sparse.
>
> > There must be some kind of inefficiency. If you use word operations
> > for all monomial operations then it should be fast.
> > See http://portal.acm.org/citation.cfm?id=281657
>
> Sorry, I confused matters. I meant to say that multiplying using big
> integers was inefficient. I am of course using univariate
> multiplication, packing the monomials into an integer (when they fit).
> Looking at Biscani's paper, it seems he gets good timings by
> preallocating an array for the resulting monomials in the "semi-dense"
> case (by semi-dense I mean dense with respect to total degree).
