On Fri, 6 Oct 2023, Robin Dapp wrote: > > So if you think you got everything correct the patch is OK as-is, > > I just wasn't sure - maybe the neutral_element change deserves > > a comment as to how MINUS_EXPR is handled. > > Heh, I never think I got everything correct ;) > > Added this now: > > static bool > fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn) > { > + /* We support MINUS_EXPR by negating the operand. This also preserves an > + initial -0.0 since -0.0 - 0.0 (neutral op for MINUS_EXPR) == -0.0 + > + (-0.0) = -0.0. */ > > What I still found is that aarch64 ICEs at the assertion you added > with -frounding-math. Therefore I changed it to: > > - gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out)); > + if (HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out)) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "cannot vectorize fold-left reduction > because" > + " signed zeros cannot be preserved.\n"); > + return false; > + } > > No code changes apart from that. Will leave it until Monday and push then > barring any objections.
Hmm, the function is called at transform time so this shouldn't help avoiding the ICE. I expected we refuse to vectorize _any_ reduction when sign dependent rounding is in effect? OTOH maybe sign-dependent rounding is OK but only when we use a unconditional fold-left (so a loop mask from fully masking is OK but not an original COND_ADD?). Still the check should be done in vectorizable_reduction, not only during transform (there the assert is proper, if we can distinguish the loop mask vs. the COND_ADD here, otherwise just remove it). Richard. > Thanks for the pointers. > > Regards > Robin > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)