https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154

--- Comment #46 from rguenther at suse dot de <rguenther at suse dot de> ---
On 13.04.2023 at 18:54, jakub at gcc dot gnu.org
<gcc-bugzi...@gcc.gnu.org> wrote:
> 
> --- Comment #45 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> So, would
> void
> foo (float *f, float d, float e)
> {
>  if (e >= 2.0f && e <= 4.0f)
>    ;
>  else
>    __builtin_unreachable ();
>  for (int i = 0; i < 1024; i++)
>    {
>      float a = f[i];
>      f[i] = (a < 0.0f ? 1.0f : 1.0f - a * d) * (a < e ? 1.0f : 0.0f);
>    }
> }
> be a better reduction on what's going on?
> From the frange/threading POV, when e is in the [2.0f, 4.0f] range, if
> a < 0.0f, we know that a < e is also true, so there is no point in testing
> that at runtime.
> So I think what threadfull1 does is right and desirable if the final code
> actually performs those comparisons and uses conditional jumps.
> The only thing is that it is harmful for vectorization and maybe for
> predicated code.
> Therefore, for scalar code, at least without massive ARM-style conditional
> execution, the above is better emitted as
>  if (a < 0.0f)
>    tmp = 1.0f;
>  else
>    {
>      tmp = (1.0f - a * d) * (a < e ? 1.0f : 0.0f);
>    }
> or even
>  if (a < 0.0f)
>    tmp = 1.0f;
>  else if (a < e)
>    tmp = 1.0f - a * d;
>  else
>    tmp = 0.0f;
>  f[i] = tmp;
> Thus, could we effectively try to undo it at ifcvt time on loops for
> vectorization only, or during vectorization or something similar?
> As ifcvt then turns the IMHO desirable
>  if (a_16 >= 0.0)
>    goto <bb 5>; [59.00%]
>  else
>    goto <bb 11>; [41.00%]
> 
>  <bb 11> [local count: 435831803]:
>  goto <bb 7>; [100.00%]
> 
>  <bb 5> [local count: 627172605]:
>  _7 = a_16 * d_17(D);
>  iftmp.0_18 = 1.0e+0 - _7;
>  if (e_13(D) > a_16)
>    goto <bb 12>; [20.00%]
>  else
>    goto <bb 6>; [80.00%]
> 
>  <bb 12> [local count: 125434523]:
>  goto <bb 7>; [100.00%]
> 
>  <bb 6> [local count: 501738082]:
> 
>  <bb 7> [local count: 1063004410]:
>  # prephitmp_26 = PHI <iftmp.0_18(12), 0.0(6), 1.0e+0(11)>
> (ok, the 2 empty forwarders are unlikely to be useful) into:
>  _7 = a_16 * d_17(D);
>  iftmp.0_18 = 1.0e+0 - _7;
>  _21 = a_16 >= 0.0;
>  _10 = e_13(D) > a_16;
>  _9 = _10 & _21;
>  _27 = e_13(D) <= a_16;
>  _28 = _21 & _27;
>  _ifc__43 = _9 ? iftmp.0_18 : 0.0;
>  _ifc__44 = _28 ? 0.0 : _ifc__43;
>  _45 = a_16 < 0.0;
>  prephitmp_26 = _45 ? 1.0e+0 : _ifc__44;
> Now, perhaps if ifcvt used ranger, it could figure out that a_16 < 0.0 implies
> e_13(D) > a_16 and do something smarter with it.
> Or maybe just try to do smarter ifcvt just based on the original CFG.
> The pre-ifcvt code was
>  a_16 < 0.0f ? 1.0f : a_16 < e_13 ? 1.0f - a_16 * d_17 : 0.0f
> so when ifcvt puts everything together, make it
>  _7 = a_16 * d_17(D);
>  iftmp.0_18 = 1.0e+0 - _7;
>  _27 = e_13(D) > a_16;
>  _28 = a_16 < 0.0;
>  _ifc__43 = _27 ? iftmp.0_18 : 0.0f;
>  prephitmp_26 = _28 ? 1.0f : _ifc__43;
> ?

Improving what ifcvt produces for multi-argument PHIs is certainly desirable.
I’m not sure whether undoing the threading is possible in general.
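
For reference, the four forms discussed in the quoted comment can be written
out as plain C and compared. This is only a sketch: the function names are
made up for illustration, the select chains are hand transcriptions of the
GIMPLE quoted above rather than compiler output, and the comparison runs over
a small grid of exactly representable values with e restricted to [2.0f, 4.0f]
(no NaNs or infinities).

```c
#include <assert.h>

/* Form 1: the expression from the reduced testcase. */
static float form_source(float a, float d, float e)
{
    return (a < 0.0f ? 1.0f : 1.0f - a * d) * (a < e ? 1.0f : 0.0f);
}

/* Form 2: the branchy scalar form after threading.  It matches form 1
   only because e >= 2.0f, so a < 0.0f implies a < e. */
static float form_threaded(float a, float d, float e)
{
    assert(e >= 2.0f && e <= 4.0f);   /* stands in for __builtin_unreachable */
    if (a < 0.0f)
        return 1.0f;
    else if (a < e)
        return 1.0f - a * d;
    else
        return 0.0f;
}

/* Form 3: three selects, transcribed from the ifcvt output quoted above. */
static float form_ifcvt_current(float a, float d, float e)
{
    float t = 1.0f - a * d;           /* iftmp.0_18, computed unconditionally */
    int ge0 = a >= 0.0f;              /* _21 */
    int p9 = (e > a) & ge0;           /* _9 */
    int p28 = ge0 & (e <= a);         /* _28 */
    float s43 = p9 ? t : 0.0f;        /* _ifc__43 */
    float s44 = p28 ? 0.0f : s43;     /* _ifc__44 */
    return (a < 0.0f) ? 1.0f : s44;   /* prephitmp_26 */
}

/* Form 4: two selects, the shape suggested at the end of the comment. */
static float form_ifcvt_proposed(float a, float d, float e)
{
    float t = 1.0f - a * d;
    float s = (e > a) ? t : 0.0f;
    return (a < 0.0f) ? 1.0f : s;
}

/* Counts grid points where any form disagrees with the threaded one.
   a is a multiple of 0.25 and d is 0.5, so every intermediate is exact. */
static int count_mismatches(void)
{
    const float d = 0.5f;
    int bad = 0;
    for (int ei = 0; ei <= 8; ei++) {
        float e = 2.0f + 0.25f * (float)ei;       /* 2.0 .. 4.0 */
        for (int ai = -32; ai <= 32; ai++) {
            float a = 0.25f * (float)ai;          /* -8.0 .. 8.0 */
            float r = form_threaded(a, d, e);
            if (form_source(a, d, e) != r) bad++;
            if (form_ifcvt_current(a, d, e) != r) bad++;
            if (form_ifcvt_proposed(a, d, e) != r) bad++;
        }
    }
    return bad;
}
```

Calling count_mismatches() from a small driver should report no disagreement
on this grid. Forms 3 and 4 both evaluate 1.0f - a * d unconditionally, which
is the shape the vectorizer wants; form 4 simply needs one compare and one
select fewer per element.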
