[Bug tree-optimization/109088] GCC does not always vectorize conditional reduction

rguenth at gcc dot gnu.org via Gcc-bugs Wed, 27 Sep 2023 00:15:47 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rdapp at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #8)
> It's because of the order of the operations we are doing:
> 
> For code as follows:
> 
> result += mask ? a[i] + x : 0;
> 
> GCC:
> result_ssa_1 = PHI <result_ssa_2, 0>
> ...
> STMT 1. tmp = a[i] + x;
> STMT 2. tmp2 = tmp + result_ssa_1;
> STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;
> 
> Here we can see that both STMT 2 and STMT 3 use 'result_ssa_1', so
> we end up with 2 uses of the PHI result and fail to vectorize.
> 
> Whereas LLVM:
> 
> result_ssa_1 = PHI <result_ssa_2, 0>
> ...
> IR 1. tmp = a[i] + x;
> IR 2. tmp2 = mask ? tmp : 0;
> IR 3. result_ssa_2 = tmp2 + result_ssa_1.

For floating point these are not equivalent: adding zero isn't a no-op
(for example, -0.0 + 0.0 yields +0.0 under the default rounding mode
when signed zeros are honored).
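
To make the difference concrete, here is a minimal C sketch of the two
statement orders from the quoted comment. The function names and the
`init` parameter are hypothetical and added only for illustration; in
the quoted IL the PHI starts at 0, whereas this sketch deliberately
starts the accumulator at -0.0 to expose the signed-zero case. It
assumes default floating-point flags (no -ffast-math or
-fno-signed-zeros) and round-to-nearest.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Select last (the GCC-style order): when the mask is false the
   accumulator is passed through untouched. */
float reduce_select_last(const float *a, const int *mask, size_t n,
                         float x, float init)
{
    float result = init;
    for (size_t i = 0; i < n; i++) {
        float tmp = a[i] + x;
        float tmp2 = tmp + result;
        result = mask[i] ? tmp2 : result;
    }
    return result;
}

/* Select first (the LLVM-style order): a masked-off element still adds
   +0.0 to the accumulator. */
float reduce_select_first(const float *a, const int *mask, size_t n,
                          float x, float init)
{
    float result = init;
    for (size_t i = 0; i < n; i++) {
        float tmp = a[i] + x;
        float tmp2 = mask[i] ? tmp : 0.0f;
        result = tmp2 + result;
    }
    return result;
}

/* Illustrative check: with every element masked off and a -0.0 start,
   only the select-last form keeps the sign bit. */
void check_forms_differ(void)
{
    const float a[1] = { 1.0f };
    const int mask[1] = { 0 };
    assert(signbit(reduce_select_last(a, mask, 1, 2.0f, -0.0f)));
    assert(!signbit(reduce_select_first(a, mask, 1, 2.0f, -0.0f)));
}
```

When every element is selected, both functions perform the same
additions in the same order, so they only diverge on masked-off
iterations.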

> LLVM only has 1 use.
> 
> Is it reasonable to swap the order in match.pd ?

if-conversion could be taught to swap this (it's if-conversion that
creates the IL for conditional reductions) when valid.  IIRC Robin Dapp
also has a patch to make if-conversion emit .COND_ADD instead, which
should make this vectorize even better.

