https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531
--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by Tamar Christina <tnfch...@gcc.gnu.org>: https://gcc.gnu.org/g:af792f0226e479b165a49de5e8f9e1d16a4b26c0 commit r15-2191-gaf792f0226e479b165a49de5e8f9e1d16a4b26c0 Author: Tamar Christina <tamar.christ...@arm.com> Date: Mon Jul 22 10:26:14 2024 +0100 middle-end: Implement conditonal store vectorizer pattern [PR115531] This adds a conditional store optimization for the vectorizer as a pattern. The vectorizer already supports modifying memory accesses because of the pattern based gather/scatter recognition. Doing it in the vectorizer allows us to still keep the ability to vectorize such loops for architectures that don't have MASK_STORE support, whereas doing this in ifcvt makes us commit to MASK_STORE. Concretely for this loop: void foo1 (char *restrict a, int *restrict b, int *restrict c, int n, int stride) { if (stride <= 1) return; for (int i = 0; i < n; i++) { int res = c[i]; int t = b[i+stride]; if (a[i] != 0) res = t; c[i] = res; } } today we generate: .L3: ld1b z29.s, p7/z, [x0, x5] ld1w z31.s, p7/z, [x2, x5, lsl 2] ld1w z30.s, p7/z, [x1, x5, lsl 2] cmpne p15.b, p6/z, z29.b, #0 sel z30.s, p15, z30.s, z31.s st1w z30.s, p7, [x2, x5, lsl 2] add x5, x5, x4 whilelo p7.s, w5, w3 b.any .L3 which in gimple is: vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67); vect_t_20.12_74 = .MASK_LOAD (vectp.10_72, 32B, loop_mask_67); vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67); mask__34.16_79 = vect__9.15_77 != { 0, ... }; vect_res_11.17_80 = VEC_COND_EXPR <mask__34.16_79, vect_t_20.12_74, vect_res_18.9_68>; .MASK_STORE (vectp_c.18_81, 32B, loop_mask_67, vect_res_11.17_80); A MASK_STORE is already conditional, so there's no need to perform the load of the old values and the VEC_COND_EXPR. This patch makes it so we generate: vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67); vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67); mask__34.16_79 = vect__9.15_77 != { 0, ... }; .MASK_STORE (vectp_c.18_81, 32B, mask__34.16_79, vect_res_18.9_68); which generates: .L3: ld1b z30.s, p7/z, [x0, x5] ld1w z31.s, p7/z, [x1, x5, lsl 2] cmpne p7.b, p7/z, z30.b, #0 st1w z31.s, p7, [x2, x5, lsl 2] add x5, x5, x4 whilelo p7.s, w5, w3 b.any .L3 gcc/ChangeLog: PR tree-optimization/115531 * tree-vect-patterns.cc (vect_cond_store_pattern_same_ref): New. (vect_recog_cond_store_pattern): New. (vect_vect_recog_func_ptrs): Use it. * target.def (conditional_operation_is_expensive): New. * doc/tm.texi: Regenerate. * doc/tm.texi.in: Document it. * targhooks.cc (default_conditional_operation_is_expensive): New. * targhooks.h (default_conditional_operation_is_expensive): New.