Now that you have given the precise conditions, I will check if they are 
equivalent to whats in the smp code. They do look like what I was aiming 
for.
As for caching the last entry, that code is still in there in a thread-safe 
way. This optimisation had no effect on the timings when the exponent 
vectors fit into one word, but it was +5% on multi-word exponents, so I 
left it in.

On Tuesday, September 26, 2017 at 10:37:33 AM UTC+2, Bill Hart wrote:
>
> Roman Pearce said he would put up some code to explain this. (The 
> conditions Daniel uses are quite complex in comparison.) I therefore think 
> that it's ok to quote the following from correspondence with Roman (since I 
> put up my blog):
>
> [The idea is based on ] "Ellis Horowitz, A Sorting Algorithm for 
> Polynomial Multiplication
> Journal of the ACM, Volume 22 Issue 4, Oct. 1975, Pages 450-462
> http://dl.acm.org/citation.cfm?doid=321906.321908
>
> Before inserting f_i * g_j into the heap, check if there exists
> f_k * g_j in the heap where k < i, and if so, do not insert the
> product.  We simply check if f_{i-1} * g_j is the next unmerged
> product for f_{i-1} whether it is in the heap or not.  Then you
> have to put the dropped products back in the heap later.  After
> extracting f_i * g_j from the heap, check if f_{i+1} has a term
> in the heap, and if not, insert f_{i+1} * g_j."
>
> This is what I had remembered from what Roman explained at ISSAC. But 
> Daniel wasn't able to figure out something along those lines that improved 
> timings. Clearly it does though, since Roman uses it to great effect. And 
> thanks very much to Roman for suggesting this very important improvement!
>
> I haven't yet looked in detail at Daniel's conditions. It will be 
> interesting to compare. You can see then in fmpz_mpoly/mul_heap_threaded.c 
> in the Flint repository [1]. To my eye, they don't look that complicated. I 
> think it is the lines 187 and following. Perhaps they will end up being 
> equivalent.
>
> [1] 
> https://github.com/wbhart/flint2/blob/trunk/fmpz_mpoly/mul_heap_threaded.c
>  
> On Tuesday, 26 September 2017 07:52:27 UTC+2, parisse wrote:
>>
>>
>>
>> Le lundi 25 septembre 2017 18:56:26 UTC+2, Bill Hart a écrit :
>>>
>>> Do you use anything special for memory management? Mickael Gastineau 
>>> recommended jemalloc, which we haven't tried yet. I assume you expected to 
>>> see better times for the threaded benchmarks with giac? 
>>>
>>
>> I'm not using anything specific for memory management. Thread-efficient 
>> memory management would certainly help improve performances, because more 
>> than half of the time is done in conversions inside giac, and conversions 
>> to mpz types are not parallelized because it would be slower to wait for 
>> locks. I was asking about the compiler because I looked at my code and saw 
>> that some optimizations are disabled for clang. 
>> Can you give some precisions about what you call delayed insertion in the 
>> heap? I can delay insertion of the initial product monomials f[i]*g[0] but 
>> that does not improve much the timings, at least with 1 thread. 
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-devel+unsubscr...@googlegroups.com.
To post to this group, send email to sage-devel@googlegroups.com.
Visit this group at https://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.

Reply via email to