Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

Jan Hubicka Wed, 28 Nov 2018 12:21:23 -0800

> On 11/28/18 12:48 PM, H.J. Lu wrote:
> > On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka <hubi...@ucw.cz> wrote:
> >>
> >>> On 11/5/18 7:21 AM, Jan Hubicka wrote:
> >>>>>
> >>>>> Did you mean "the nearest common dominator"?
> >>>>
> >>>> If the nearest common dominator appears in the loop while all uses are
> >>>> out of loops, this will result in suboptimal xor placement.
> >>>> In this case you want to split edges out of the loop.
> >>>>
> >>>> In general this is what the LCM framework will do for you if the problem
> >>>> is modelled siimlar way as in mode_swtiching.  At entry function mode is
> >>>> "no zero register needed" and all conversions need mode "zero register
> >>>> needed".  Mode switching should then do the correct placement decisions
> >>>> (reaching minimal number of executions of xor).
> >>>>
> >>>> Jeff, whan is your optinion on the approach taken by the patch?
> >>>> It seems like a special case of more general issue, but I do not see
> >>>> very elegant way to solve it at least in the GCC 9 horisont, so if
> >>>> the placement is correct we can probalby go either with new pass or
> >>>> making this part of mode swithcing (which is anyway run by x86 backend)
> >>> So I haven't followed this discussion at all, but did touch on this
> >>> issue with some patch a month or two ago with a target patch that was
> >>> trying to avoid the partial stalls.
> >>>
> >>> My assumption is that we're trying to find one or more places to
> >>> initialize the upper half of an avx register so as to avoid partial
> >>> register stall at existing sites that set the upper half.
> >>>
> >>> This sounds like a classic PRE/LCM style problem (of which mode
> >>> switching is just another variant).   A common-dominator approach is
> >>> closer to a classic GCSE and is going to result is more initializations
> >>> at sub-optimal points than a PRE/LCM style.
> >>
> >> yes, it is usual code placement problem. It is special case because the
> >> zero register is not modified by the conversion (just we need to have
> >> zero somewhere).  So basically we do not have kills to the zero except
> >> for entry block.
> >>
> > 
> > Do you have  testcase to show thatf the nearest common dominator
> > in the loop, while all uses areout of loops, leads to suboptimal xor
> > placement?
> I don't have a testcase, but it's all but certain nearest common
> dominator is going to be a suboptimal placement.  That's going to create
> paths where you're going to emit the xor when it's not used.
> 
> The whole point of the LCM algorithms is they are optimal in terms of
> expression evaluations.


i think testcase should be something like

test()
{
  while (true)
  {
     if (cond1)
       {
         do_one_conversion;
         return;
       }
     if (cond2)
       {
         do_other_conversion;
         return;
       }
  }
}

Honza
> 
> jeff
> > 
>

Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

Reply via email to