14 regression] jump threading de-optimizes nested floating point comparisons

tnfchris at gcc dot gnu.org via Gcc-bugs Fri, 07 Jul 2023 11:10:53 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154


--- Comment #59 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
I've sent two patches upstream this morning to fix the remaining ifcvt issues:

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623848.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623849.html

This brings us within 5% of GCC-12, but not all the way there,  the reason is
that since GCC-13 PRE behaves differently.

In GCC-12 after PRE we'd have the following CFG:

  <bb 15> [local count: 623751662]:
  _16 = distbb_79 * iftmp.1_100;
  iftmp.8_80 = 1.0e+0 - _16;
  _160 = chrg_init_75 * iftmp.8_80;

  <bb 16> [local count: 1057206200]:
  # iftmp.8_39 = PHI <iftmp.8_80(15), 1.0e+0(14)>
  # prephitmp_161 = PHI <_160(15), chrg_init_75(14)>
  if (distbb_79 < iftmp.0_96)
    goto <bb 17>; [50.00%]
  else
    goto <bb 18>; [50.00%]

  <bb 17> [local count: 528603100]:
  _164 = ABS_EXPR <prephitmp_161>;
  _166 = -_164;

  <bb 18> [local count: 1057206200]:
  # iftmp.9_40 = PHI <1.0e+0(17), 0.0(16)>
  # prephitmp_163 = PHI <prephitmp_161(17), 0.0(16)>
  # prephitmp_167 = PHI <_166(17), 0.0(16)>
  if (iftmp.2_38 != 0)
    goto <bb 20>; [50.00%]
  else
    goto <bb 19>; [50.00%]

  <bb 19> [local count: 528603100]:

  <bb 20> [local count: 1057206200]:
  # iftmp.10_41 = PHI <prephitmp_167(18), prephitmp_163(19)>

That is to say, in both branches we always do the multiply, gimple-isel then
correctly turns this into a COND_MUL based on the mask.

Since GCC-13 PRE now does some extra optimizations:

  <bb 15> [local count: 1057206200]:
  # l_107 = PHI <l_84(21), 0(14)>
  _13 = lpos_x[l_107];
  x_72 = _13 - p_atom$x_81;
  powmult_73 = x_72 * x_72;
  distbb_74 = powmult_73 - radij_58;
  if (distbb_74 >= 0.0)
    goto <bb 17>; [59.00%]
  else
    goto <bb 16>; [41.00%]

  <bb 16> [local count: 433454538]:
  _165 = ABS_EXPR <chrg_init_70>;
  _168 = -_165;
  goto <bb 19>; [100.00%]

  <bb 17> [local count: 623751662]:
  _14 = distbb_74 * iftmp.1_101;
  iftmp.8_76 = 1.0e+0 - _14;
  if (distbb_74 < iftmp.0_97)
    goto <bb 18>; [20.00%]
  else
    goto <bb 19>; [80.00%]

  <bb 18> [local count: 124750334]:
  _162 = chrg_init_70 * iftmp.8_76;
  _164 = ABS_EXPR <_162>;
  _167 = -_164;

  <bb 19> [local count: 1057206200]:
  # iftmp.9_38 = PHI <1.0e+0(18), 0.0(17), 1.0e+0(16)>
  # iftmp.8_102 = PHI <iftmp.8_76(18), iftmp.8_76(17), 1.0e+0(16)>
  # prephitmp_163 = PHI <_162(18), 0.0(17), chrg_init_70(16)>
  # prephitmp_169 = PHI <_167(18), 0.0(17), _168(16)>
  if (iftmp.2_36 != 0)
    goto <bb 21>; [50.00%]
  else
    goto <bb 20>; [50.00%]

That is to say, the multiplication is now compleletely skipped in one branch,
this should be better for scalar code, but for vector we have to do the
multiplication anyway.

after ifcvt we end up with:

  _162 = chrg_init_70 * iftmp.8_76;
  _164 = ABS_EXPR <_162>;
  _167 = -_164;
  _ifc__166 = distbb_74 < iftmp.0_97 ? _167 : 0.0;
  prephitmp_169 = distbb_74 >= 0.0 ? _ifc__166 : _168;

instead of

  _160 = chrg_init_75 * iftmp.8_80;
  prephitmp_161 = distbb_79 < 0.0 ? chrg_init_75 : _160;
  _164 = ABS_EXPR <prephitmp_161>;
  _166 = -_164;
  prephitmp_167 = distbb_79 < iftmp.0_96 ? _166 : 0.0;

previously we'd make COND_MUL and COND_NEG and so don't need a VCOND in the
end,
now we select after the multiplication, so we only have a COND_NEG followed by
a VCOND.

This is obviously worse, but I have no idea how to recover it.  Any ideas?

[Bug tree-optimization/109154] [13/14 regression] jump threading de-optimizes nested floating point comparisons

Reply via email to