Hi Xionghu,

Thanks for the updated version of the patch; some comments are inlined below.
On 2022/8/11 14:15, Xionghu Luo wrote: > > > On 2022/8/11 01:07, Segher Boessenkool wrote: >> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote: >>> On 2022/8/9 11:01, Kewen.Lin wrote: >>>> I have some concern on those changed "altivec_*_direct", IMHO the suffix >>>> "_direct" is normally to indicate the define_insn is mapped to the >>>> corresponding hw insn directly. With this change, for example, >>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks >>>> misleading. Maybe we can add the corresponding _direct_le and _direct_be >>>> versions, both are mapped into the same insn but have different RTL >>>> patterns. Looking forward to Segher's and David's suggestions. >>> >>> Thanks! Do you mean same RTL patterns with different hw insn? >> >> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb >> instruction, never a vmrglb instead. Misleading names are an expensive >> problem. >> >> > > Thanks. Then on LE platforms, if a user calls altivec_vmrghw, it will be > expanded to the RTL (vec_select (vec_concat (R0 R1 (0 4 1 5))), and > finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw". > For BE it is just straightforward, which seems clearer :-), OK for master? > > > [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS > [PR106069] > > v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match > the actual output ASM vmrglb. Likewise for all similar xxx_direct_le > patterns. > v2: Split the direct pattern into be and le with the same RTL but a different insn. > > The native RTL expression for vec_mrghw should be the same for BE and LE, as > it is register- and endian-independent. So both BE and LE need to > generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw > with vec_select and vec_concat. 
> > (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI > (subreg:V4SI (reg:V16QI 139) 0) > (subreg:V4SI (reg:V16QI 140) 0)) > [const_int 0 4 1 5])) > > Then the combine pass can do the nested vec_select optimization > in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE: > > 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5]) > 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);} > > => > > 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel) > 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);} > > The endianness check is then needed only once, at final ASM generation. > The resulting ASM is also better, as the nested vec_select is simplified to a simple scalar > load. > > Regression tested, passing on Power8{LE,BE}{32,64} and Power{9,10}LE{64} > Linux (thanks to Kewen). > > gcc/ChangeLog: > > PR target/106069 > * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove. > (altivec_vmrghb_direct_be): New pattern for BE. > (altivec_vmrglb_direct_le): New pattern for LE. > (altivec_vmrghh_direct): Remove. > (altivec_vmrghh_direct_be): New pattern for BE. > (altivec_vmrglh_direct_le): New pattern for LE. > (altivec_vmrghw_direct_<mode>): Remove. > (altivec_vmrghw_direct_<mode>_be): New pattern for BE. > (altivec_vmrglw_direct_<mode>_le): New pattern for LE. > (altivec_vmrglb_direct): Remove. > (altivec_vmrglb_direct_be): New pattern for BE. > (altivec_vmrghb_direct_le): New pattern for LE. > (altivec_vmrglh_direct): Remove. > (altivec_vmrglh_direct_be): New pattern for BE. > (altivec_vmrghh_direct_le): New pattern for LE. > (altivec_vmrglw_direct_<mode>): Remove. > (altivec_vmrglw_direct_<mode>_be): New pattern for BE. > (altivec_vmrghw_direct_<mode>_le): New pattern for LE. > * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): > Adjust. > * config/rs6000/vsx.md: Likewise. > > gcc/testsuite/ChangeLog: > > PR target/106069 > * g++.target/powerpc/pr106069.C: New test. 
> > Signed-off-by: Xionghu Luo <xionghu...@tencent.com> > --- > gcc/config/rs6000/altivec.md | 223 ++++++++++++++------ > gcc/config/rs6000/rs6000.cc | 36 ++-- > gcc/config/rs6000/vsx.md | 24 +-- > gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++ > 4 files changed, 305 insertions(+), 98 deletions(-) > create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index 2c4940f2e21..78245f470e9 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb" > (use (match_operand:V16QI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct > - : gen_altivec_vmrglb_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT > (17), > + GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19), > + GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21), > + GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23)); > + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) I think you can just call gen_altivec_vmrghb_direct_be and gen_altivec_vmrghb_direct_le separately here. Similar for some other define_expands. 
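For instance (just a sketch, assuming the *_direct_be/*_direct_le pattern names from this v3 stay as they are), the altivec_vmrghb expand body could simply be:

```
  if (BYTES_BIG_ENDIAN)
    emit_insn (gen_altivec_vmrghb_direct_be (operands[0], operands[1],
					     operands[2]));
  else
    emit_insn (gen_altivec_vmrglb_direct_le (operands[0], operands[1],
					     operands[2]));
  DONE;
```

That avoids open-coding the vec_concat/vec_select RTL in the expander: both define_insns already carry exactly that RTL pattern, so the gen_* routines produce the same RTX as the hand-built gen_rtx_VEC_SELECT sequence.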
> > -(define_insn "altivec_vmrghb_direct" > +(define_insn "altivec_vmrghb_direct_be" > [(set (match_operand:V16QI 0 "register_operand" "=v") > (vec_select:V16QI > (vec_concat:V32QI > @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct" > (const_int 5) (const_int 21) > (const_int 6) (const_int 22) > (const_int 7) (const_int 23)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrghb %0,%1,%2" > [(set_attr "type" "vecperm")]) > Could you move the following altivec_vmrghb_direct_le here? Then readers can easily check the difference between be and le for the same altivec_vmrghb_direct. The same comment applies to some other similar cases. > +(define_insn "altivec_vmrglb_direct_le" > + [(set (match_operand:V16QI 0 "register_operand" "=v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 1 "register_operand" "v") > + (match_operand:V16QI 2 "register_operand" "v")) > + (parallel [(const_int 0) (const_int 16) > + (const_int 1) (const_int 17) > + (const_int 2) (const_int 18) > + (const_int 3) (const_int 19) > + (const_int 4) (const_int 20) > + (const_int 5) (const_int 21) > + (const_int 6) (const_int 22) > + (const_int 7) (const_int 23)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrglb %0,%2,%1" > + [(set_attr "type" "vecperm")]) Could you update this pattern to emit assembly "vmrglb %0,%1,%2" instead of "vmrglb %0,%2,%1"? I checked the previous md before the culprit commit 0910c516a3d72af048; it emitted "vmrglb %0,%1,%2" for altivec_vmrglb_direct. The same comment applies to some other similar cases. > + > (define_expand "altivec_vmrghh" > [(use (match_operand:V8HI 0 "register_operand")) > (use (match_operand:V8HI 1 "register_operand")) > (use (match_operand:V8HI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghh_direct > - : gen_altivec_vmrglh_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9), > + GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11)); > + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); > + > + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrghh_direct" > +(define_insn "altivec_vmrghh_direct_be" > [(set (match_operand:V8HI 0 "register_operand" "=v") > - (vec_select:V8HI > + (vec_select:V8HI > (vec_concat:V16HI > (match_operand:V8HI 1 "register_operand" "v") > (match_operand:V8HI 2 "register_operand" "v")) > @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct" > (const_int 1) (const_int 9) > (const_int 2) (const_int 10) > (const_int 3) (const_int 11)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrghh %0,%1,%2" > [(set_attr "type" "vecperm")]) > > +(define_insn "altivec_vmrglh_direct_le" > + [(set (match_operand:V8HI 0 "register_operand" "=v") > + (vec_select:V8HI > + (vec_concat:V16HI > + (match_operand:V8HI 1 "register_operand" "v") > + (match_operand:V8HI 2 "register_operand" "v")) > + (parallel [(const_int 0) (const_int 8) > + (const_int 1) (const_int 9) > + (const_int 2) (const_int 10) > + (const_int 3) (const_int 11)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrglh %0,%2,%1" > + [(set_attr "type" "vecperm")]) > + > (define_expand "altivec_vmrghw" > [(use (match_operand:V4SI 0 "register_operand")) > (use (match_operand:V4SI 1 "register_operand")) > (use (match_operand:V4SI 2 "register_operand"))] > "VECTOR_MEM_ALTIVEC_P (V4SImode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si > - : gen_altivec_vmrglw_direct_v4si; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT > (5)); > + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrghw_direct_<mode>" > +(define_insn "altivec_vmrghw_direct_<mode>_be" > [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > (vec_select:VSX_W > (vec_concat:<VS_double> > @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>" > (match_operand:VSX_W 2 "register_operand" "wa,v")) > (parallel [(const_int 0) (const_int 4) > (const_int 1) (const_int 5)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "@ > + xxmrghw %x0,%x1,%x2 > + vmrghw %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrglw_direct_<mode>_le" > + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > + (vec_select:VSX_W > + (vec_concat:<VS_double> > + (match_operand:VSX_W 1 "register_operand" "wa,v") > + (match_operand:VSX_W 2 "register_operand" "wa,v")) > + (parallel [(const_int 0) (const_int 4) > + (const_int 1) (const_int 5)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "@ > - xxmrghw %x0,%x1,%x2 > - vmrghw %0,%1,%2" > + xxmrglw %x0,%x2,%x1 > + vmrglw %0,%2,%1" > [(set_attr "type" "vecperm")]) > > (define_insn "*altivec_vmrghsf" > @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb" > (use (match_operand:V16QI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglb_direct > - : gen_altivec_vmrghb_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT > (25), > + GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27), > + GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29), > + GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31)); > + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrglb_direct" > +(define_insn "altivec_vmrglb_direct_be" > [(set (match_operand:V16QI 0 "register_operand" "=v") > (vec_select:V16QI > (vec_concat:V32QI > @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct" > (const_int 13) (const_int 29) > (const_int 14) (const_int 30) > (const_int 15) (const_int 31)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrglb %0,%1,%2" > [(set_attr "type" "vecperm")]) > > +(define_insn "altivec_vmrghb_direct_le" > + [(set (match_operand:V16QI 0 "register_operand" "=v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 1 "register_operand" "v") > + (match_operand:V16QI 2 "register_operand" "v")) > + (parallel [(const_int 8) (const_int 24) > + (const_int 9) (const_int 25) > + (const_int 10) (const_int 26) > + (const_int 11) (const_int 27) > + (const_int 12) (const_int 28) > + (const_int 13) (const_int 29) > + (const_int 14) (const_int 30) > + (const_int 15) (const_int 31)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrghb %0,%2,%1" > + [(set_attr "type" "vecperm")]) > + > (define_expand "altivec_vmrglh" > [(use (match_operand:V8HI 0 "register_operand")) > (use (match_operand:V8HI 1 "register_operand")) > (use (match_operand:V8HI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = 
BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct > - : gen_altivec_vmrghh_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT > (13), > + GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15)); > + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrglh_direct" > +(define_insn "altivec_vmrglh_direct_be" > [(set (match_operand:V8HI 0 "register_operand" "=v") > (vec_select:V8HI > (vec_concat:V16HI > @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct" > (const_int 5) (const_int 13) > (const_int 6) (const_int 14) > (const_int 7) (const_int 15)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrglh %0,%1,%2" > [(set_attr "type" "vecperm")]) > > +(define_insn "altivec_vmrghh_direct_le" > + [(set (match_operand:V8HI 0 "register_operand" "=v") > + (vec_select:V8HI > + (vec_concat:V16HI > + (match_operand:V8HI 1 "register_operand" "v") > + (match_operand:V8HI 2 "register_operand" "v")) > + (parallel [(const_int 4) (const_int 12) > + (const_int 5) (const_int 13) > + (const_int 6) (const_int 14) > + (const_int 7) (const_int 15)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrghh %0,%2,%1" > + [(set_attr "type" "vecperm")]) > + > (define_expand "altivec_vmrglw" > [(use (match_operand:V4SI 0 "register_operand")) > (use (match_operand:V4SI 1 "register_operand")) > (use (match_operand:V4SI 2 "register_operand"))] > "VECTOR_MEM_ALTIVEC_P (V4SImode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si > - : gen_altivec_vmrghw_direct_v4si; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT > (7)); > + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrglw_direct_<mode>" > +(define_insn "altivec_vmrglw_direct_<mode>_be" > [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > (vec_select:VSX_W > (vec_concat:<VS_double> > @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>" > (match_operand:VSX_W 2 "register_operand" "wa,v")) > (parallel [(const_int 2) (const_int 6) > (const_int 3) (const_int 7)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "@ > + xxmrglw %x0,%x1,%x2 > + vmrglw %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrghw_direct_<mode>_le" > + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > + (vec_select:VSX_W > + (vec_concat:<VS_double> > + (match_operand:VSX_W 1 "register_operand" "wa,v") > + (match_operand:VSX_W 2 "register_operand" "wa,v")) > + (parallel [(const_int 2) (const_int 6) > + (const_int 3) (const_int 7)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "@ > - xxmrglw %x0,%x1,%x2 > - vmrglw %0,%1,%2" > + xxmrghw %x0,%x2,%x1 > + vmrghw %0,%2,%1" > [(set_attr "type" "vecperm")]) > > (define_insn "*altivec_vmrglsf" > @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi" > { > emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmuloub (ve, operands[1], 
operands[2])); > emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); Note that if you change the assembly "vmrghh %0,%2,%1" to "vmrghh %0,%1,%2", you need to change this to: emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); The same comment applies to some other similar cases. > } > DONE; > }) > @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi" > { > emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi" > { > emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi" > { > emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > + emit_insn 
(gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi" > { > emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi" > { > emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi" > { > emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, 
vo)); > } > else > { > emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi" > { > emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index df491bee2ea..97da7706f63 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, > rtx op1, > {OPTION_MASK_ALTIVEC, > CODE_FOR_altivec_vpkuwum_direct, > {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct > - : CODE_FOR_altivec_vmrglb_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be, > {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, Before the culprit commit 0910c516a3d72af04, we have: { OPTION_MASK_ALTIVEC, (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct : CODE_FOR_altivec_vmrglb_direct), { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, I think we should use: { OPTION_MASK_ALTIVEC, (BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrghb_direct_be : CODE_FOR_altivec_vmrglb_direct_le), { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, here instead. Similar comment for those related below. > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct > - : CODE_FOR_altivec_vmrglh_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be, > {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si > - : CODE_FOR_altivec_vmrglw_direct_v4si, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be, > {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct > - : CODE_FOR_altivec_vmrghb_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be, > {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct > - : CODE_FOR_altivec_vmrghh_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be, > {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si > - : CODE_FOR_altivec_vmrghw_direct_v4si, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be, > {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, > {OPTION_MASK_P8_VECTOR, > BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct > @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, > rtx op1, > > /* For little-endian, the two input operands must be swapped > (or swapped back) to ensure proper right-to-left numbering > - from 0 to 2N-1. */ > - if (swapped ^ !BYTES_BIG_ENDIAN > - && icode != CODE_FOR_vsx_xxpermdi_v16qi) > + from 0 to 2N-1. Excludes the vmrg[lh][bhw] and xxpermdi ops. 
*/ > + if (swapped ^ !BYTES_BIG_ENDIAN) > + if (!(icode == CODE_FOR_altivec_vmrghb_direct_be > + || icode == CODE_FOR_altivec_vmrglb_direct_be > + || icode == CODE_FOR_altivec_vmrghh_direct_be > + || icode == CODE_FOR_altivec_vmrglh_direct_be > + || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be > + || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be > + || icode == CODE_FOR_vsx_xxpermdi_v16qi)) > std::swap (op0, op1); IIUC, we don't need this part of the change once we fix the operand order in the assembly for those LE "direct"s. BR, Kewen > if (imode != V16QImode) > { > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index e226a93bbe5..2ae1bce131d 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -4688,12 +4688,12 @@ (define_expand "vsx_xxmrghw_<mode>" > (const_int 1) (const_int 5)])))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode> > - : gen_altivec_vmrglw_direct_<mode>; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn ( > + gen_altivec_vmrghw_direct_v4si_be (operands[0], operands[1], > operands[2])); > + else > + emit_insn ( > + gen_altivec_vmrglw_direct_v4si_le (operands[0], operands[1], > operands[2])); > DONE; > } > [(set_attr "type" "vecperm")]) > @@ -4708,12 +4708,12 @@ (define_expand "vsx_xxmrglw_<mode>" > (const_int 3) (const_int 7)])))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_<mode> > - : gen_altivec_vmrghw_direct_<mode>; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn ( > + gen_altivec_vmrglw_direct_v4si_be (operands[0], operands[1], > operands[2])); > + else > + emit_insn ( > + gen_altivec_vmrghw_direct_v4si_le (operands[0], operands[1], > operands[2])); > DONE; > } > [(set_attr "type" "vecperm")]) > diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C > b/gcc/testsuite/g++.target/powerpc/pr106069.C > new file mode 100644 > index 00000000000..2cde9b821e3 > --- /dev/null > +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C > @@ -0,0 +1,120 @@ > +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ > +/* { dg-require-effective-target vmx_hw } */ > +/* { dg-do run } */ > + > +extern "C" void * > +memcpy (void *, const void *, unsigned long); > +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; > + > +union > +{ > + native_simd_type V; > + int R[4]; > +} store_le_vec; > + > +struct S > +{ > + S () = default; > + S (unsigned B0) > + { > + native_simd_type val{B0}; > + m_simd = val; > + } > + void store_le (unsigned int out[]) > + { > + store_le_vec.V = m_simd; > + unsigned int x0 = store_le_vec.R[0]; > + memcpy (out, &x0, 4); > + } > + S rotl (unsigned int r) > + { > + native_simd_type rot{r}; > + return __builtin_vec_rl (m_simd, rot); > + } > + void operator+= (S other) > + { > + m_simd = __builtin_vec_add (m_simd, other.m_simd); > + } > + void operator^= (S other) > + { > + m_simd = __builtin_vec_xor (m_simd, other.m_simd); > + } > + static void transpose (S &B0, S B1, S B2, S B3) > + { > + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); > + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); > + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); > + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); 
> + B0 = __builtin_vec_mergeh (T0, T1); > + B3 = __builtin_vec_mergel (T2, T3); > + } > + S (native_simd_type x) : m_simd (x) {} > + native_simd_type m_simd; > +}; > + > +void > +foo (unsigned int output[], unsigned state[]) > +{ > + S R00 = state[0]; > + S R01 = state[0]; > + S R02 = state[2]; > + S R03 = state[0]; > + S R05 = state[5]; > + S R06 = state[6]; > + S R07 = state[7]; > + S R08 = state[8]; > + S R09 = state[9]; > + S R10 = state[10]; > + S R11 = state[11]; > + S R12 = state[12]; > + S R13 = state[13]; > + S R14 = state[4]; > + S R15 = state[15]; > + for (int r = 0; r != 10; ++r) > + { > + R09 += R13; > + R11 += R15; > + R05 ^= R09; > + R06 ^= R10; > + R07 ^= R11; > + R07 = R07.rotl (7); > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 ^= R01; > + R13 ^= R02; > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 = R12.rotl (8); > + R13 = R13.rotl (8); > + R10 += R15; > + R11 += R12; > + R08 += R13; > + R09 += R14; > + R05 ^= R10; > + R06 ^= R11; > + R07 ^= R08; > + R05 = R05.rotl (7); > + R06 = R06.rotl (7); > + R07 = R07.rotl (7); > + } > + R00 += state[0]; > + S::transpose (R00, R01, R02, R03); > + R00.store_le (output); > +} > + > +unsigned int res[1]; > +unsigned main_state[]{1634760805, 60878, 2036477234, 6, > + 0, 825562964, 1471091955, 1346092787, > + 506976774, 4197066702, 518848283, 118491664, > + 0, 0, 0, 0}; > +int > +main () > +{ > + foo (res, main_state); > + if (res[0] != 0x41fcef98) > + __builtin_abort (); > +}