https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #46 from rguenther at suse dot de <rguenther at suse dot de> ---
On 13.04.2023 at 18:54, jakub at gcc dot gnu.org <gcc-bugzi...@gcc.gnu.org> wrote:
>
> --- Comment #45 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> So, would
> void
> foo (float *f, float d, float e)
> {
>   if (e >= 2.0f && e <= 4.0f)
>     ;
>   else
>     __builtin_unreachable ();
>   for (int i = 0; i < 1024; i++)
>     {
>       float a = f[i];
>       f[i] = (a < 0.0f ? 1.0f : 1.0f - a * d) * (a < e ? 1.0f : 0.0f);
>     }
> }
> be a better reduction of what's going on?
> From the frange/threading POV, when e is in the [2.0f, 4.0f] range, if a < 0.0f, we
> know that a < e is also true, so there is no point in testing that at runtime.
> So I think what threadfull1 does is right and desirable if the final code
> actually performs those comparisons and uses conditional jumps.
> The only thing is that it is harmful for vectorization and maybe for predicated
> code.
> Therefore, for scalar code at least without massive ARM style conditional
> execution, the above is better emitted as
>   if (a < 0.0f)
>     tmp = 1.0f;
>   else
>     {
>       tmp = (1.0f - a * d) * (a < e ? 1.0f : 0.0f);
>     }
> or even
>   if (a < 0.0f)
>     tmp = 1.0f;
>   else if (a < e)
>     tmp = 1.0f - a * d;
>   else
>     tmp = 0.0f;
>   f[i] = tmp;
> Thus, could we effectively try to undo it at ifcvt time on loops for
> vectorization only, or during vectorization or something similar?
> As ifcvt then turns the IMHO desirable
>   if (a_16 >= 0.0)
>     goto <bb 5>; [59.00%]
>   else
>     goto <bb 11>; [41.00%]
>
>   <bb 11> [local count: 435831803]:
>   goto <bb 7>; [100.00%]
>
>   <bb 5> [local count: 627172605]:
>   _7 = a_16 * d_17(D);
>   iftmp.0_18 = 1.0e+0 - _7;
>   if (e_13(D) > a_16)
>     goto <bb 12>; [20.00%]
>   else
>     goto <bb 6>; [80.00%]
>
>   <bb 12> [local count: 125434523]:
>   goto <bb 7>; [100.00%]
>
>   <bb 6> [local count: 501738082]:
>
>   <bb 7> [local count: 1063004410]:
>   # prephitmp_26 = PHI <iftmp.0_18(12), 0.0(6), 1.0e+0(11)>
> (ok, the 2 empty forwarders are unlikely useful) into:
>   _7 = a_16 * d_17(D);
>   iftmp.0_18 = 1.0e+0 - _7;
>   _21 = a_16 >= 0.0;
>   _10 = e_13(D) > a_16;
>   _9 = _10 & _21;
>   _27 = e_13(D) <= a_16;
>   _28 = _21 & _27;
>   _ifc__43 = _9 ? iftmp.0_18 : 0.0;
>   _ifc__44 = _28 ? 0.0 : _ifc__43;
>   _45 = a_16 < 0.0;
>   prephitmp_26 = _45 ? 1.0e+0 : _ifc__44;
> Now, perhaps if ifcvt used ranger, it could figure out that a_16 < 0.0 implies
> e_13(D) > a_16 and do something smarter with it.
> Or maybe just try to do smarter ifcvt just based on the original CFG.
> The pre-ifcvt code was
>   a_16 < 0.0f ? 1.0f : a_16 < e_13 ? 1.0f - a_16 * d_17 : 0.0f
> so when ifcvt puts everything together, make it
>   _7 = a_16 * d_17(D);
>   iftmp.0_18 = 1.0e+0 - _7;
>   _27 = e_13(D) > a_16;
>   _28 = a_16 < 0.0;
>   _ifc__43 = _27 ? iftmp.0_18 : 0.0f;
>   prephitmp_26 = _28 ? 1.0f : _ifc__43;
> ?

Certainly improving what ifcvt produces for multiarg phis is desirable. I’m not sure if undoing the threading is generally possible.