Now that you have given the precise conditions, I will check if they are equivalent to whats in the smp code. They do look like what I was aiming for. As for caching the last entry, that code is still in there in a thread-safe way. This optimisation had no effect on the timings when the exponent vectors fit into one word, but it was +5% on multi-word exponents, so I left it in.
On Tuesday, September 26, 2017 at 10:37:33 AM UTC+2, Bill Hart wrote: > > Roman Pearce said he would put up some code to explain this. (The > conditions Daniel uses are quite complex in comparison.) I therefore think > that it's ok to quote the following from correspondence with Roman (since I > put up my blog): > > [The idea is based on ] "Ellis Horowitz, A Sorting Algorithm for > Polynomial Multiplication > Journal of the ACM, Volume 22 Issue 4, Oct. 1975, Pages 450-462 > http://dl.acm.org/citation.cfm?doid=321906.321908 > > Before inserting f_i * g_j into the heap, check if there exists > f_k * g_j in the heap where k < i, and if so, do not insert the > product. We simply check if f_{i-1} * g_j is the next unmerged > product for f_{i-1} whether it is in the heap or not. Then you > have to put the dropped products back in the heap later. After > extracting f_i * g_j from the heap, check if f_{i+1} has a term > in the heap, and if not, insert f_{i+1} * g_j." > > This is what I had remembered from what Roman explained at ISSAC. But > Daniel wasn't able to figure out something along those lines that improved > timings. Clearly it does though, since Roman uses it to great effect. And > thanks very much to Roman for suggesting this very important improvement! > > I haven't yet looked in detail at Daniel's conditions. It will be > interesting to compare. You can see then in fmpz_mpoly/mul_heap_threaded.c > in the Flint repository [1]. To my eye, they don't look that complicated. I > think it is the lines 187 and following. Perhaps they will end up being > equivalent. > > [1] > https://github.com/wbhart/flint2/blob/trunk/fmpz_mpoly/mul_heap_threaded.c > > On Tuesday, 26 September 2017 07:52:27 UTC+2, parisse wrote: >> >> >> >> Le lundi 25 septembre 2017 18:56:26 UTC+2, Bill Hart a écrit : >>> >>> Do you use anything special for memory management? Mickael Gastineau >>> recommended jemalloc, which we haven't tried yet. I assume you expected to >>> see better times for the threaded benchmarks with giac? >>> >> >> I'm not using anything specific for memory management. Thread-efficient >> memory management would certainly help improve performances, because more >> than half of the time is done in conversions inside giac, and conversions >> to mpz types are not parallelized because it would be slower to wait for >> locks. I was asking about the compiler because I looked at my code and saw >> that some optimizations are disabled for clang. >> Can you give some precisions about what you call delayed insertion in the >> heap? I can delay insertion of the initial product monomials f[i]*g[0] but >> that does not improve much the timings, at least with 1 thread. >> > -- You received this message because you are subscribed to the Google Groups "sage-devel" group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at https://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.