https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629

Richard Sandiford <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #8 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
Perhaps I'm missing the point, but I think one of the issues here is that we
(still) don't model that MASK_LOAD sets inactive elements to zero.  Inactive
elements are currently undefined instead.  (I think Robin mentioned that
assuming zero is problematic for RVV, so we might need an explicit MASK_LOAD
argument for inactive elements, like for COND_ADD etc.)

So quoting the IL in comment 4:

  # loop_mask_63 = PHI <next_mask_95(10), max_mask_94(20)>
  vect__4.10_64 = .MASK_LOAD (vectp_a.8_53, 32B, loop_mask_63);
  mask__31.11_66 = vect__4.10_64 != { 0, ... };
  mask__56.12_67 = ~mask__31.11_66;
  vec_mask_and_70 = mask__56.12_67 & loop_mask_63;
  vect__7.15_71 = .MASK_LOAD (vectp_c.13_68, 32B, vec_mask_and_70);
  mask__22.16_73 = vect__7.15_71 == { 0, ... };
  mask__34.17_75 = vec_mask_and_70 & mask__22.16_73;

I think this and...

  vect_iftmp.20_78 = .MASK_LOAD (vectp_d.18_76, 32B, mask__34.17_75);
  vect__61.21_79 = vect__4.10_64 | vect__7.15_71;
  mask__35.22_81 = vect__61.21_79 != { 0, ... };
  vec_mask_and_84 = mask__35.22_81 & loop_mask_63;

...this have to be kept until we model inactive elements.

  vect_iftmp.25_85 = .MASK_LOAD (vectp_b.23_82, 32B, vec_mask_and_84);
  _86 = mask__34.17_75 & loop_mask_63;

This one is really curious though :)  Why does the code think that the loop
mask is needed here?  Does the code think the mask is needed for correctness,
or is the scalar_cond_masked_set optimisation misfiring?

  vect_iftmp.26_87 = VEC_COND_EXPR <_86, vect_iftmp.20_78, vect_iftmp.25_85>;
  .MASK_STORE (vectp_res.27_88, 32B, loop_mask_63, vect_iftmp.26_87);
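
For readers following along, here is a toy lane-by-lane model of the masks in the quoted IL. It is only a sketch, not GCC code: `mask_load`, its `else_value` parameter, the helper names and the sample data are all made up for illustration, and a fixed junk value stands in for "undefined". Under those assumptions it suggests two things: `mask__35.22_81` only stays inside `loop_mask_63` when inactive lanes are known to be zero, while `mask__34.17_75` is already a subset of `loop_mask_63` whatever the inactive lanes hold, so the extra AND in `_86` looks redundant as pure Boolean algebra.

```python
# Toy model of the quoted IL, one Python list element per vector lane.
# Everything here is illustrative; none of these names exist in GCC.

def mask_load(mem, mask, else_value):
    """Model .MASK_LOAD with a hypothetical explicit value for inactive lanes."""
    return [v if active else else_value for v, active in zip(mem, mask)]

def masks(a, c, loop_mask, else_value):
    """Recompute mask__34 and mask__35 as in the IL, before the final ANDs."""
    vect4 = mask_load(a, loop_mask, else_value)            # vect__4.10_64
    mask31 = [x != 0 for x in vect4]                       # mask__31.11_66
    mask56 = [not m for m in mask31]                       # mask__56.12_67
    and70 = [p and q for p, q in zip(mask56, loop_mask)]   # vec_mask_and_70
    vect7 = mask_load(c, and70, else_value)                # vect__7.15_71
    mask22 = [x == 0 for x in vect7]                       # mask__22.16_73
    mask34 = [p and q for p, q in zip(and70, mask22)]      # mask__34.17_75
    vect61 = [x | y for x, y in zip(vect4, vect7)]         # vect__61.21_79
    mask35 = [x != 0 for x in vect61]                      # mask__35.22_81
    return mask34, mask35

def is_subset(mask, loop_mask):
    """True if every set lane of mask is also set in loop_mask."""
    return all(l or not m for m, l in zip(mask, loop_mask))

# Made-up data: a final, partial iteration with three inactive lanes.
A = [0, 3, 0, 5, 0, 0, 7, 0]
C = [0, 0, 4, 0, 9, 0, 0, 2]
LOOP_MASK = [True] * 5 + [False] * 3

def check(else_value):
    """Return (mask34 within loop mask, mask35 within loop mask)."""
    mask34, mask35 = masks(A, C, LOOP_MASK, else_value)
    return is_subset(mask34, LOOP_MASK), is_subset(mask35, LOOP_MASK)

if __name__ == "__main__":
    print("inactive lanes zero:", check(0))
    print("inactive lanes junk:", check(0x5A5A))  # arbitrary stand-in for undefined
```

In this model the second element of the result says whether `& loop_mask_63` could be dropped from `vec_mask_and_84`; the first says the same for `_86`. It says nothing about why the vectoriser emits the AND in `_86`, which is the open question above.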