On the other hand, I can only replicate the very sparse benchmark if I assume the result will fit in 2 limbs, allocate all the output mpz's in advance, and so on; with those shortcuts I can basically match it. My generic, no-assumptions code takes about 3s, and I don't think I can improve that much, if at all.
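For concreteness, the preallocation shortcut amounts to something like the following sketch. This is not my actual flint2 code, the function name is made up, and it is only valid while the 2-limb assumption holds:

    #include <gmp.h>
    #include <stdlib.h>

    /* Sketch only: reserve 2 limbs per output coefficient up front so
       that no mpz reallocation happens inside the multiplication loop.
       Assumes every output coefficient really does fit in 2 limbs. */
    mpz_t *alloc_output_coeffs(size_t len)
    {
        mpz_t *c = malloc(len * sizeof(mpz_t));
        for (size_t i = 0; i < len; i++)
            mpz_init2(c[i], 2 * GMP_NUMB_BITS);  /* room for 2 limbs */
        return c;
    }

(The matching mpz_clear/free loop is omitted.)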
Bill.

On 14 May, 17:09, Bill Hart <goodwillh...@googlemail.com> wrote:
> With a bit of fiddling I can get the Fateman benchmark down to 53.5s on sage.math (2.66 GHz Core2/Penryn), making no assumptions about the size of the output coefficients. I've checked that at least the output poly has the right length and coefficients of the right size.
>
> Adjusting for the clock speed, that seems to match what is reported for sdmp in Roman and Peter's paper.
>
> Time to clean up my code and commit. I didn't implement the heap trick I mentioned.
>
> Bill.
>
> On 14 May, 14:28, Bill Hart <goodwillh...@googlemail.com> wrote:
> > Thanks all for the very interesting comments and links to publications and CASes.
> >
> > I've implemented the algorithm using flint2's fmpz (multiprecision) integer type for coefficients, and at this stage for 62-bit integers for exponents only. (However, it should be trivial to lift this restriction.)
> >
> > If I make a couple of simplifications, namely assume that the output fits into two limbs and that none of the polynomials has length > 2^32 - 1, etc., I get pretty good times, certainly better than reported in Francesco's paper. I also don't know whether any of the benchmarks assume the coefficients are unsigned. That would further reduce the times.
> >
> > If I am a bit more general with the assumptions, allowing coefficients of any multiprecision size and polys of any length up to 2^64 - 1, then it takes a little longer than reported in the various papers. My comparison is on sage.math, which is a Core2 Penryn running at 2.66 GHz (I adjust for the faster clock speed). Performance is also good on a K10.
> >
> > I might yet implement a heap speedup I thought of last year, but which didn't actually speed anything up at that time. This year, now that I have taken Roman's suggestion of not trying to store too much information in the heap, I seem to be getting good times. (Last year I couldn't get under 133% of the sdmp time.) The speedup might work this time around!
> >
> > I want to publicly thank Roman Pearce for sending me a copy of his reference implementation. It was a great help. I didn't use any of the goto tricks you did to help gcc along. I believe you probably have better performance in sdmp on account of your very careful implementation. But my heap implementation already looks too much like yours! I seem to recall you saying you accumulate in 3 limbs rather than 2 in sdmp, and when I do that I'm not convinced I'm not losing quite a few cycles.
> >
> > I think I might not implement Richard Fateman's trick, but instead just duplicate the heap code to deal with the case where the exponents don't fit in a single limb. My implementation should not slow down in the other case if I do this, and in some sense the cases where the exponents don't fit in a single limb are not the interesting ones for my work. I'd be better off putting time into a multivariate Kronecker segmentation implementation.
> >
> > Bill.
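As an aside, for anyone following the thread: the basic shape of the heap multiplication discussed above is sketched below. This is the textbook version, not my flint2 code or sdmp; coefficients are single words, exponents are pre-packed into one word each, there is no chaining of heap entries and no multi-limb accumulation, and every name is made up.

    /* Minimal sketch of heap-based sparse polynomial multiplication.
       Exponents are pre-packed one per word, so a monomial product is
       just an integer addition.  Real code (flint2, sdmp) accumulates
       in 2-3 limbs and keeps the heap entries smaller than this. */
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct { uint64_t exp; size_t i, j; } heap_elem;

    /* sift a new element up a binary min-heap ordered by exp */
    static void heap_push(heap_elem *h, size_t *n, heap_elem e)
    {
        size_t k = (*n)++;
        while (k > 0 && h[(k - 1) / 2].exp > e.exp)
        {
            h[k] = h[(k - 1) / 2];
            k = (k - 1) / 2;
        }
        h[k] = e;
    }

    /* remove and return the element with the smallest exp */
    static heap_elem heap_pop(heap_elem *h, size_t *n)
    {
        heap_elem top = h[0], last = h[--(*n)];
        size_t k = 0;
        for (;;)
        {
            size_t c = 2*k + 1;
            if (c >= *n) break;
            if (c + 1 < *n && h[c + 1].exp < h[c].exp) c++;
            if (h[c].exp >= last.exp) break;
            h[k] = h[c];
            k = c;
        }
        h[k] = last;
        return top;
    }

    /* c{exp,coeff} must have room for alen*blen terms; returns the
       number of terms written, in increasing exponent order */
    size_t poly_mul_heap(uint64_t *cexp, uint64_t *ccoeff,
        const uint64_t *aexp, const uint64_t *acoeff, size_t alen,
        const uint64_t *bexp, const uint64_t *bcoeff, size_t blen)
    {
        heap_elem *h = malloc(alen * sizeof(heap_elem));
        size_t n = 0, clen = 0;

        /* at most one pending product per term of a: seed a[i]*b[0] */
        for (size_t i = 0; i < alen; i++)
            heap_push(h, &n, (heap_elem){ aexp[i] + bexp[0], i, 0 });

        while (n > 0)
        {
            uint64_t e = h[0].exp, acc = 0;

            /* merge every product that lands on this exponent */
            while (n > 0 && h[0].exp == e)
            {
                heap_elem t = heap_pop(h, &n);
                acc += acoeff[t.i] * bcoeff[t.j];  /* 1-word accumulator */
                if (t.j + 1 < blen)                /* successor a[i]*b[j+1] */
                    heap_push(h, &n, (heap_elem){
                        aexp[t.i] + bexp[t.j + 1], t.i, t.j + 1 });
            }

            if (acc != 0)
            {
                cexp[clen] = e;
                ccoeff[clen] = acc;
                clen++;
            }
        }

        free(h);
        return clen;
    }

The point of the heap is that products come out in sorted order, so like terms are merged as they are produced instead of sorting alen*blen products afterwards, and the heap never holds more than one pending product per term of the first input.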
> > On May 14, 7:35 am, parisse <bernard.pari...@ujf-grenoble.fr> wrote:
> > > On 13 May, 19:03, Roman Pearce <rpear...@gmail.com> wrote:
> > > > On May 13, 2:45 am, parisse <bernard.pari...@ujf-grenoble.fr> wrote:
> > > > > In my own experience, coding with a univariate polynomial is not efficient, especially if the polynomial is sparse.
> > > >
> > > > There must be some kind of inefficiency. If you use word operations for all monomial operations then it should be fast. See http://portal.acm.org/citation.cfm?id=281657
> > >
> > > Sorry, I confused things. I meant that multiplying using big integers was inefficient. I am of course using univariate multiplication by packing monomials into an integer (if they fit). Looking at Biscani's paper, it seems he gets good timings by preallocating an array for the resulting monomials in the "semi-dense" case (by semi-dense I mean dense with respect to total degree).
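P.S. For anyone who hasn't seen the packing trick parisse mentions, here it is in miniature. A sketch only: three variables and 16-bit fields are arbitrary choices, and the field widths must leave enough headroom that no exponent field overflows during the multiplication.

    #include <stdint.h>

    #define EXP_BITS 16  /* assumed field width; must leave headroom */

    /* pack the exponent vector of x1^e1 * x2^e2 * x3^e3 into one
       64-bit word, first variable in the highest bits, so comparing
       packed words compares monomials lexicographically */
    static uint64_t monomial_pack(unsigned e1, unsigned e2, unsigned e3)
    {
        return ((uint64_t) e1 << (2 * EXP_BITS))
             | ((uint64_t) e2 << EXP_BITS)
             |  (uint64_t) e3;
    }

    /* a monomial product is now a single word addition; valid only
       while every exponent field stays below 2^EXP_BITS */
    static uint64_t monomial_mul(uint64_t a, uint64_t b)
    {
        return a + b;
    }

This is also why the 62-bit exponent restriction above is mostly harmless: once the packed fields fit in one word, every monomial comparison and multiplication in the heap code reduces to a single word operation.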