https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355
--- Comment #13 from GCC Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by Kewen Lin <li...@gcc.gnu.org>: https://gcc.gnu.org/g:52c112800d9f44457c4832309a48c00945811313 commit r15-1504-g52c112800d9f44457c4832309a48c00945811313 Author: Kewen Lin <li...@linux.ibm.com> Date: Thu Jun 20 20:23:56 2024 -0500 rs6000: Fix wrong RTL patterns for vector merge high/low word on LE Commit r12-4496 changes some define_expands and define_insns for vector merge high/low word, which are altivec_vmrg[hl]w, vsx_xxmrg[hl]w_<VSX_W:mode>. These defines are mainly for built-in function vec_merge{h,l}, __builtin_vsx_xxmrghw, __builtin_vsx_xxmrghw_4si and some internal gen function needs. These functions should consider endianness, taking vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges the first halves (in element order) of two vectors", it does note it's in element order. So it's mapped into vmrghw on BE while vmrglw on LE respectively. Although the mapped insns are different, as the discussion in PR106069, the RTL pattern should be still the same, it is conformed before commit r12-4496, define_expand altivec_vmrghw got expanded into: (vec_select:VSX_W (vec_concat:<VS_double> (match_operand:VSX_W 1 "register_operand" "wa,v") (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 0) (const_int 4) (const_int 1) (const_int 5)])))] on both BE and LE then. But commit r12-4496 changed it to expand into: (vec_select:VSX_W (vec_concat:<VS_double> (match_operand:VSX_W 1 "register_operand" "wa,v") (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 0) (const_int 4) (const_int 1) (const_int 5)])))] on BE, and (vec_select:VSX_W (vec_concat:<VS_double> (match_operand:VSX_W 1 "register_operand" "wa,v") (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])))] on LE, although the mapped insn are still vmrghw on BE and vmrglw on LE, the associated RTL pattern is completely wrong and inconsistent with the mapped insn. If optimization passes leave this pattern alone, even if its pattern doesn't represent its mapped insn, it's still fine, that's why simple testing on bif doesn't expose this issue. But once some optimization pass such as combine does some changes basing on this wrong pattern, because the pattern doesn't match the semantics that the expanded insn is intended to represent, it would cause the unexpected result. So this patch is to fix the wrong RTL pattern, ensure the associated RTL patterns become the same as before which can have the same semantic as their mapped insns. With the proposed patch, the expanders like altivec_vmrghw expands into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le depending on endianness, "direct" can easily show which insn would be generated, _be and _le are mainly for the different RTL patterns as endianness. Co-authored-by: Xionghu Luo <xionghu...@tencent.com> PR target/106069 PR target/115355 gcc/ChangeLog: * config/rs6000/altivec.md (altivec_vmrghw_direct_<VSX_W:mode>): Rename to ... (altivec_vmrghw_direct_<VSX_W:mode>_be): ... this. Add the condition BYTES_BIG_ENDIAN. (altivec_vmrghw_direct_<VSX_W:mode>_le): New define_insn. (altivec_vmrglw_direct_<VSX_W:mode>): Rename to ... (altivec_vmrglw_direct_<VSX_W:mode>_be): ... this. Add the condition BYTES_BIG_ENDIAN. (altivec_vmrglw_direct_<VSX_W:mode>_le): New define_insn. (altivec_vmrghw): Adjust by calling gen_altivec_vmrghw_direct_v4si_be for BE and gen_altivec_vmrglw_direct_v4si_le for LE. (altivec_vmrglw): Adjust by calling gen_altivec_vmrglw_direct_v4si_be for BE and gen_altivec_vmrghw_direct_v4si_le for LE. (vec_widen_umult_hi_v8hi): Adjust the call to gen_altivec_vmrghw_direct_v4si by gen_altivec_vmrghw for BE and by gen_altivec_vmrglw for LE. (vec_widen_smult_hi_v8hi): Likewise. (vec_widen_umult_lo_v8hi): Adjust the call to gen_altivec_vmrglw_direct_v4si by gen_altivec_vmrglw for BE and by gen_altivec_vmrghw for LE (vec_widen_smult_lo_v8hi): Likewise. * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace CODE_FOR_altivec_vmrghw_direct_v4si by CODE_FOR_altivec_vmrghw_direct_v4si_be for BE and CODE_FOR_altivec_vmrghw_direct_v4si_le for LE. And replace CODE_FOR_altivec_vmrglw_direct_v4si by CODE_FOR_altivec_vmrglw_direct_v4si_be for BE and CODE_FOR_altivec_vmrglw_direct_v4si_le for LE. * config/rs6000/vsx.md (vsx_xxmrghw_<VSX_W:mode>): Adjust by calling gen_altivec_vmrghw_direct_v4si_be for BE and gen_altivec_vmrglw_direct_v4si_le for LE. (vsx_xxmrglw_<VSX_W:mode>): Adjust by calling gen_altivec_vmrglw_direct_v4si_be for BE and gen_altivec_vmrghw_direct_v4si_le for LE. gcc/testsuite/ChangeLog: * g++.target/powerpc/pr106069.C: New test. * gcc.target/powerpc/pr115355.c: New test.