On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> > On 11/5/18 7:21 AM, Jan Hubicka wrote:
> > >>
> > >> Did you mean "the nearest common dominator"?
> > >
> > > If the nearest common dominator appears in the loop while all uses are
> > > out of loops, this will result in suboptimal xor placement.
> > > In this case you want to split edges out of the loop.
> > >
> > > In general this is what the LCM framework will do for you if the problem
> > > is modelled in a similar way as in mode_switching. At function entry the
> > > mode is "no zero register needed" and all conversions need mode "zero
> > > register needed". Mode switching should then make the correct placement
> > > decisions (reaching the minimal number of executions of xor).
> > >
> > > Jeff, what is your opinion on the approach taken by the patch?
> > > It seems like a special case of a more general issue, but I do not see
> > > a very elegant way to solve it, at least in the GCC 9 horizon, so if
> > > the placement is correct we can probably go either with a new pass or
> > > make this part of mode switching (which is run by the x86 backend anyway).
> > So I haven't followed this discussion at all, but did touch on this
> > issue a month or two ago with a target patch that was trying to avoid
> > the partial stalls.
> >
> > My assumption is that we're trying to find one or more places to
> > initialize the upper half of an AVX register so as to avoid partial
> > register stalls at existing sites that set the upper half.
> >
> > This sounds like a classic PRE/LCM style problem (of which mode
> > switching is just another variant). A common-dominator approach is
> > closer to a classic GCSE and is going to result in more initializations
> > at sub-optimal points than a PRE/LCM style.
>
> Yes, it is a usual code placement problem. It is a special case because the
> zero register is not modified by the conversion (we just need to have
> zero somewhere). So basically we do not have kills of the zero except
> for the entry block.
>
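Jan's framing (the zero is never killed, so once an xor has executed the register stays usable) can be sketched as a single-expression lazy code motion problem. The Python below is a minimal model under stated assumptions, not GCC's implementation: the CFG is a hypothetical dict of successor lists, every block is transparent, and `uses` is a hypothetical set of blocks containing a conversion that needs the zeroed register. It applies the textbook anticipability, availability, EARLIEST and LATER equations and reports the edges that receive a new xor, the uses that keep a local xor, and the uses whose xor becomes redundant.

```python
ENTRY = "<entry>"  # virtual predecessor of the entry block (assumed unused as a block name)


def lcm_place(succ, entry, uses):
    """Lazy code motion for one never-killed value (the zeroed register).

    succ: dict mapping each block to its successor list; blocks with no
    successors are exits. uses: blocks containing a conversion that needs
    the zero. Every block is transparent: nothing kills the zero once set.
    """
    blocks = list(succ)
    pred = {b: [] for b in blocks}
    for b in blocks:
        for s in succ[b]:
            pred[s].append(b)
    edges = [(ENTRY, entry)] + [(b, s) for b in blocks for s in succ[b]]

    # Anticipability: backward, greatest fixpoint, false at exits.
    antin = {b: True for b in blocks}
    antout = {b: True for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            out = bool(succ[b]) and all(antin[s] for s in succ[b])
            inn = b in uses or out
            if (out, inn) != (antout[b], antin[b]):
                antout[b], antin[b], changed = out, inn, True

    # Availability: forward, greatest fixpoint, false on function entry.
    avout = {b: True for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            avin = b != entry and all(avout[p] for p in pred[b])
            out = b in uses or avin
            if out != avout[b]:
                avout[b], changed = out, True

    def earliest(p, s):
        if p == ENTRY:
            return antin[s]
        # TRANSP is always true, so only "not anticipated out of p" remains.
        return antin[s] and not avout[p] and not antout[p]

    # LATER / LATERIN: forward over edges, greatest fixpoint.
    later = {e: earliest(*e) if e[0] == ENTRY else True for e in edges}
    laterin = {b: True for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            val = all(later[e] for e in edges if e[1] == b)
            if val != laterin[b]:
                laterin[b], changed = val, True
        for p, s in edges:
            if p != ENTRY:
                val = earliest(p, s) or (laterin[p] and p not in uses)
                if val != later[(p, s)]:
                    later[(p, s)], changed = val, True

    insert = [e for e in edges if later[e] and not laterin[e[1]]]
    keep = {b for b in uses if laterin[b]}        # block keeps its own xor
    delete = {b for b in uses if not laterin[b]}  # zero already available: reuse
    return insert, keep, delete


# Hypothetical CFGs: a loop H <-> B with exits H -> X1 and B -> X2, and a diamond.
loop = {"E": ["H"], "H": ["B", "X1"], "B": ["H", "X2"], "X1": ["R"], "X2": ["R"], "R": []}
ins, keep, delete = lcm_place(loop, "E", {"X1", "X2", "R"})
print(ins, sorted(keep), sorted(delete))  # [] ['X1', 'X2'] ['R']
diamond = {"E": ["A", "C"], "A": ["J"], "C": ["J"], "J": []}
ins, keep, delete = lcm_place(diamond, "E", {"A", "J"})
print(ins, sorted(keep), sorted(delete))  # [('C', 'J')] ['A'] ['J']
```

On the two-exit loop the sketch leaves the xors in X1 and X2 and never places one inside the loop, and it drops the xor in R because the zero is already available there. On the diamond it inserts on the edge C -> J, the classic partially redundant case.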
Do you have a testcase to show that the nearest common dominator being in the loop, while all uses are out of loops, leads to suboptimal xor placement?

-- 
H.J.
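Below is a minimal CFG-level sketch of the situation in question. It assumes a hypothetical loop with two exits; the block names E, H, B, X1, X2 and R are made up, and this is not a GCC testcase. The conversions sit in X1 and X2, both outside the loop, yet their nearest common dominator is the loop header H. An xor placed there would execute every time H runs, once per iteration, instead of once on the way out. A source-level shape that could produce such a CFG is a loop with two early exits that each perform an int-to-float conversion. The sketch uses the Cooper-Harvey-Kennedy iterative dominator algorithm and the standard natural-loop construction from a back edge.

```python
def reverse_postorder(succ, entry):
    seen, post = set(), []

    def dfs(b):
        seen.add(b)
        for s in succ[b]:
            if s not in seen:
                dfs(s)
        post.append(b)

    dfs(entry)
    return post[::-1]


def intersect(idom, index, a, b):
    """Walk both blocks up the dominator tree until they meet."""
    while a != b:
        while index[a] > index[b]:
            a = idom[a]
        while index[b] > index[a]:
            b = idom[b]
    return a


def immediate_dominators(succ, entry):
    """Cooper-Harvey-Kennedy iterative algorithm; returns (idom, rpo_index)."""
    rpo = reverse_postorder(succ, entry)
    index = {b: i for i, b in enumerate(rpo)}
    pred = {b: [] for b in rpo}
    for b in rpo:
        for s in succ[b]:
            pred[s].append(b)
    idom = {entry: entry}
    changed = True
    while changed:
        changed = False
        for b in rpo[1:]:
            done = [p for p in pred[b] if p in idom]  # preds already processed
            new = done[0]
            for p in done[1:]:
                new = intersect(idom, index, p, new)
            if idom.get(b) != new:
                idom[b] = new
                changed = True
    return idom, index


def nearest_common_dominator(idom, index, blocks):
    ncd = blocks[0]
    for b in blocks[1:]:
        ncd = intersect(idom, index, ncd, b)
    return ncd


def dominates(idom, a, b):
    while b != a:
        if idom[b] == b:  # reached the entry block without meeting a
            return False
        b = idom[b]
    return True


def natural_loop(succ, tail, header):
    """Blocks of the natural loop for back edge tail -> header."""
    pred = {b: [] for b in succ}
    for b in succ:
        for s in succ[b]:
            pred[s].append(b)
    stack = [] if tail == header else [tail]
    body = {header, *stack}
    while stack:
        for p in pred[stack.pop()]:
            if p not in body:
                body.add(p)
                stack.append(p)
    return body


# Hypothetical CFG: E -> H (header) -> B (latch) -> H, exits H -> X1 and B -> X2.
cfg = {"E": ["H"], "H": ["B", "X1"], "B": ["H", "X2"], "X1": ["R"], "X2": ["R"], "R": []}
idom, index = immediate_dominators(cfg, "E")
back = [(t, h) for t in cfg for h in cfg[t] if dominates(idom, h, t)]
loop = natural_loop(cfg, *back[0])
ncd = nearest_common_dominator(idom, index, ["X1", "X2"])  # blocks with conversions
print(back, ncd, ncd in loop)  # [('B', 'H')] H True
```

Splitting the exit edges H -> X1 and B -> X2, or placing the xor in X1 and X2 themselves, keeps it out of the loop. That is the placement Jan describes.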