On Thu, Jan 11, 2018 at 03:35:37PM +0000, Wilco Dijkstra wrote:
> Segher Boessenkool wrote:
>
> > Of course I see that ldp is useful.  I don't think that this particular
> > way of forcing more pairs is a good idea.  Needs testing / benchmarking /
> > instrumentation, and we haven't seen any of that.
>
> I wouldn't propose a patch if it caused slowdowns. In fact I am seeing
> speedups, particularly benchmarks which benefit from shrinkwrapping
> (eg. povray). Int is flat, and there is an overall gain on fp.
>
> > Forcing pairs before separate shrink-wrapping reduces the effectiveness
> > of the latter by a lot.
>
> That doesn't appear to be the case - or at least any reduction in
> effectiveness is more than mitigated by the lower number of memory
> accesses and I-cache misses. To do better still you'd need to compute
> an optimal set of pairs, and that is quite difficult in the current
> infrastructure. I could dynamically create pairs just in the backend
> but that won't be optimal either.

Right, I certainly believe forming more pairs before sws (as you do)
improves the code -- but I think forming the pairs only after sws will
work even better.  But yes, that is more work to implement, and the
benefit (if any) is unknown.  I hoped I could convince you to try ;-)


Segher