Richard,

In get_reassociation_width() we know how many ops (ops_num) are in the expression being considered for parallel reassociation, but this is not passed to the target hook. My testing suggests it would be useful to have there. If you determine the maximum width that gives additional speedup for a large number of terms, and then return that as the width from the target hook, get_reassociation_width() is more aggressive than you would like for small expressions of roughly 4-16 terms, and it produces code that is slower than optimal. For example, in many cases you want to keep using a width of 1 until you get to 16 terms or so. My testing shows this to be the case for power8, power9, and power10 processors.
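To make the idea concrete, here is a minimal standalone sketch of what a hook that also receives ops_num could do. It is not GCC code: the function name example_reassociation_width, the int stand-in for machine_mode, and the LARGE_EXPR_WIDTH value of 4 are assumptions for illustration only. The threshold of 16 is just the rough figure mentioned above, not a tuned number.

```c
/* Hypothetical threshold: below this many operands, stay serial.
   16 is the approximate figure from the discussion, not a tuned value.  */
#define SMALL_EXPR_OPS 16

/* Hypothetical width for large expressions; chosen only for illustration.  */
#define LARGE_EXPR_WIDTH 4

/* Sketch of a reassociation-width hook that also sees the operand count.
   opc and mode are accepted but unused here; a real target would likely
   vary the answer by operation and mode as well.  */
static int
example_reassociation_width (unsigned int opc, int mode, int ops_num)
{
  (void) opc;
  (void) mode;

  /* Small expressions: parallel reassociation does not pay off,
     so report a width of 1.  */
  if (ops_num < SMALL_EXPR_OPS)
    return 1;

  /* Large expressions: report the width found best for long chains.  */
  return LARGE_EXPR_WIDTH;
}
```

With these assumed values, an 8-term expression would get a width of 1 and a 32-term expression a width of 4, which is the kind of distinction the current two-argument hook cannot make.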
So I’m wondering how it would be received if I posted a patch that adds ops_num as a parameter to the reassociation_width target hook (and of course fixes all uses of that target hook)?

Thanks!
   Aaron

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain