Richard,

So, I’m noticing that get_reassociation_width() knows how many operands 
(ops_num) are in the expression being considered for parallel reassociation, 
but this count is not passed to the target hook. My testing suggests it would 
be useful to have. Suppose you determine the maximum width that still gives 
additional speedup for a large number of terms and return that as the width 
from the target hook. Then get_reassociation_width() is more aggressive than 
you would like for small expressions of perhaps 4-16 terms, and it produces 
code that is slower than optimal. For example, in many cases you want to keep 
using a width of 1 until you reach about 16 terms. My testing shows this to be 
the case for power8, power9, and power10 processors. 
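To make the idea concrete, here is a minimal sketch of what a target could do 
if the hook were handed ops_num. The function name, the constants, and the 
single-argument signature are all hypothetical and only illustrate the 
proposal. The opcode and machine mode that the existing hook receives are 
omitted for brevity. The threshold of 16 comes from the observation above, and 
the maximum width of 4 is an assumed placeholder, not a measured value.

```c
/* HYPOTHETICAL sketch only: this is not GCC's actual hook signature.
   It assumes the reassociation_width hook gains an ops_num argument. */

/* Assumed tuning constants: stay serial until roughly 16 terms, then use
   the widest width found to still give a speedup (placeholder value). */
#define EXAMPLE_MIN_OPS_FOR_PARALLEL 16
#define EXAMPLE_MAX_WIDTH 4

/* Return the reassociation width for an expression with ops_num operands. */
static int
example_reassociation_width (unsigned int ops_num)
{
  if (ops_num < EXAMPLE_MIN_OPS_FOR_PARALLEL)
    return 1;                 /* small expressions: keep one serial chain */
  return EXAMPLE_MAX_WIDTH;   /* large expressions: go parallel */
}
```

With this, a 4-term or 15-term sum stays at width 1, while a 16-term or 
64-term sum gets the full width. Without ops_num, the hook can only return a 
single width for every expression size.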

So, I’m wondering how it would be received if I posted a patch that adds 
ops_num to the reassociation_width target hook (and, of course, updates all 
uses of that target hook)?

Thanks!
   Aaron


Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
 
