On Mon, 9 Oct 2023, Robin Dapp wrote:

> > Hmm, the function is called at transform time so this shouldn't help
> > avoiding the ICE.  I expected we refuse to vectorize _any_ reduction
> > when sign-dependent rounding is in effect?  OTOH maybe sign-dependent
> > rounding is OK but only when we use an unconditional fold-left
> > (so a loop mask from fully masking is OK but not an original COND_ADD?).
>
> So we currently only disable the use of partial vectors
>
>       else if (reduction_type == FOLD_LEFT_REDUCTION
>                && reduc_fn == IFN_LAST
aarch64 probably chokes because reduc_fn is not IFN_LAST.

>                && FLOAT_TYPE_P (vectype_in)
>                && HONOR_SIGNED_ZEROS (vectype_in)

so with your change we'd support signed zeros correctly.

>                && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                              "can't operate on partial vectors because"
>                              " signed zeros cannot be preserved.\n");
>           LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>
> which is inside a LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P block.
>
> For the fully masked case we continue (and then fail the assertion
> on aarch64 at transform time).
>
> I didn't get why that case is OK, though?  We still merge the initial
> definition with the identity/neutral op (i.e. possibly -0.0) based on
> the loop mask.  Is that different to partial masking?

I think the main point of my earlier change is that without native
support for a fold-left reduction (like on x86) we get

  ops = mask ? ops : neutral;
  acc += ops[0];
  acc += ops[1];
  ...

so we wouldn't use a COND_ADD but add neutral elements for masked
elements.  That's OK for signed zeros after your change (great) but
not OK for sign-dependent rounding (because we can't decide on the
sign of the neutral zero then).

For the case of using an internal function, thus direct target support,
it should be OK to have sign-dependent rounding if we can use the
masked fold-left reduction op.  As we do

      /* On the first iteration the input is simply the scalar phi
         result, and for subsequent iterations it is the output of
         the preceding operation.  */
      if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
        {
          if (mask && len && mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
            new_stmt = gimple_build_call_internal (mask_reduc_fn, 5,
                                                   reduc_var, def0,
                                                   mask, len, bias);
          else if (mask && mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS)
            new_stmt = gimple_build_call_internal (mask_reduc_fn, 3,
                                                   reduc_var, def0, mask);
          else
            new_stmt = gimple_build_call_internal (reduc_fn, 2,
                                                   reduc_var, def0);

the last case should be able to assert that
!HONOR_SIGN_DEPENDENT_ROUNDING (and likewise the reduc_fn == IFN_LAST
case).

The quoted condition above should change to drop the HONOR_SIGNED_ZEROS
check, and the reduc_fn == IFN_LAST check should change as well, maybe
to internal_fn_mask_index (reduc_fn) == -1?  (Rough sketches of both
below.)

Richard.
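
P.S.: For reference, what I mean by not being able to decide on the
sign of the neutral zero: the additive identity for IEEE floating-point
addition depends on the rounding mode (an exact-zero sum of
opposite-signed zeros is +0 except when rounding towards -inf, where it
is -0), so under sign-dependent rounding there is no single zero we
could merge in for the masked lanes.  A quick standalone illustration,
not GCC code (compile with something like gcc -O0 -frounding-math
tst.c -lm):

  #include <fenv.h>
  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    volatile double pz = 0.0, nz = -0.0;

    /* Round-to-nearest: +0 + -0 is +0, so -0.0 is the neutral element
       for addition (it leaves both +0.0 and -0.0 unchanged).  */
    fesetround (FE_TONEAREST);
    printf ("to-nearest: sign of +0 + -0: %g\n", copysign (1.0, pz + nz));

    /* Round-downward: the same sum is -0, so +0.0 would have to be the
       neutral element instead.  The right choice is only known at
       run time.  */
    fesetround (FE_DOWNWARD);
    printf ("downward:   sign of +0 + -0: %g\n", copysign (1.0, pz + nz));
    return 0;
  }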
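
To make the assert suggestion concrete, roughly what I have in mind for
the final case in vectorize_fold_left_reduction -- untested, assuming
the input vector type is available there as vectype_in, and modulo
whether it needs to be restricted to the masked/COND case:

          else
            {
              /* Hypothetical: by the time we emit a plain fold-left
                 reduction any masked lanes were already replaced by a
                 neutral zero, which cannot be chosen correctly when the
                 rounding mode is only known at run time, so analysis
                 must not have let such a case through.  */
              gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in));
              new_stmt = gimple_build_call_internal (reduc_fn, 2,
                                                     reduc_var, def0);
            }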
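
And a rough sketch of the changed vectorizable_reduction condition
quoted at the top (equally untested; whether internal_fn_mask_index is
the right test here is exactly the open question above, and the dump
message would want adjusting since signed zeros are no longer the
reason):

      else if (reduction_type == FOLD_LEFT_REDUCTION
               /* No masked fold-left variant we could use ...?  */
               && internal_fn_mask_index (reduc_fn) == -1
               && FLOAT_TYPE_P (vectype_in)
               && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "can't operate on partial vectors because"
                             " sign-dependent rounding cannot be"
                             " preserved.\n");
          LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
        }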