> On 11/28/18 12:48 PM, H.J. Lu wrote:
> > On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka <hubi...@ucw.cz> wrote:
> >>
> >>> On 11/5/18 7:21 AM, Jan Hubicka wrote:
> >>>>>
> >>>>> Did you mean "the nearest common dominator"?
> >>>>
> >>>> If the nearest common dominator appears in the loop while all uses are
> >>>> out of loops, this will result in suboptimal xor placement.
> >>>> In this case you want to split edges out of the loop.
> >>>>
> >>>> In general this is what the LCM framework will do for you if the problem
> >>>> is modelled in a similar way as in mode_switching. At function entry the
> >>>> mode is "no zero register needed" and all conversions need mode "zero
> >>>> register needed". Mode switching should then do the correct placement
> >>>> decisions (reaching minimal number of executions of xor).
> >>>>
> >>>> Jeff, what is your opinion on the approach taken by the patch?
> >>>> It seems like a special case of a more general issue, but I do not see
> >>>> a very elegant way to solve it at least in the GCC 9 horizon, so if
> >>>> the placement is correct we can probably go either with a new pass or
> >>>> making this part of mode switching (which is anyway run by x86 backend)
> >>> So I haven't followed this discussion at all, but did touch on this
> >>> issue with some patch a month or two ago with a target patch that was
> >>> trying to avoid the partial stalls.
> >>>
> >>> My assumption is that we're trying to find one or more places to
> >>> initialize the upper half of an avx register so as to avoid partial
> >>> register stall at existing sites that set the upper half.
> >>>
> >>> This sounds like a classic PRE/LCM style problem (of which mode
> >>> switching is just another variant). A common-dominator approach is
> >>> closer to a classic GCSE and is going to result in more initializations
> >>> at sub-optimal points than a PRE/LCM style.
> >>
> >> yes, it is the usual code placement problem. It is a special case because
> >> the zero register is not modified by the conversion (we just need to have
> >> zero somewhere). So basically we do not have kills to the zero except
> >> for entry block.
> >>
> >
> > Do you have a testcase to show that the nearest common dominator
> > in the loop, while all uses are out of loops, leads to suboptimal xor
> > placement?
> I don't have a testcase, but it's all but certain nearest common
> dominator is going to be a suboptimal placement. That's going to create
> paths where you're going to emit the xor when it's not used.
>
> The whole point of the LCM algorithms is they are optimal in terms of
> expression evaluations.

I think the testcase should be something like

test()
{
  while (true)
    {
      if (cond1)
        {
          do_one_conversion;
          return;
        }
      if (cond2)
        {
          do_other_conversion;
          return;
        }
    }
}

Honza
>
> jeff
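
A compilable sketch of that testcase, for illustration only. Everything not in the pseudocode above is an assumption: the conversions are taken to be int-to-float casts, cond1/cond2 are read from a hypothetical event array, and the two counters are hypothetical bookkeeping that just counts how often the xor would execute under each placement. The block testing cond1 is the nearest common dominator of both conversion blocks and sits inside the loop, so a xor placed there runs on every iteration; a xor placed on the edges leaving the loop runs once.

```c
/* Hypothetical counters, not part of any GCC pass: they model how many
   times the zeroing xor would execute under each placement.  */
int xor_in_loop;   /* xor placed in the nearest common dominator */
int xor_on_exits;  /* xor placed on the edges that leave the loop */

/* ev is a hypothetical event stream: 0 = keep looping,
   1 = cond1 holds, 2 = cond2 holds.  It must contain a 1 or a 2.  */
float
test (const int *ev, int value)
{
  for (;;)
    {
      /* This block tests cond1 and dominates both conversions, so it is
         their nearest common dominator.  It is inside the loop.  */
      xor_in_loop++;
      int e = *ev++;

      if (e == 1)                       /* cond1 */
        {
          xor_on_exits++;               /* exit edge: executed once */
          return (float) value;         /* do_one_conversion */
        }
      if (e == 2)                       /* cond2 */
        {
          xor_on_exits++;               /* exit edge: executed once */
          return (float) (value * 2);   /* do_other_conversion */
        }
    }
}
```

With an event stream that loops three times before cond1 holds, the common-dominator counter ends at 4 while the exit-edge counter ends at 1, which is the gap between the two placements discussed in this thread.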