Hi,

Since the arm-linux toolchain build has been fixed, I have noticed
additional failures on armeb:

gcc.target/arm/crypto-vsha1cq_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1cq_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1h_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1h_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1mq_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1mq_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
gcc.target/arm/crypto-vsha1pq_u32.c scan-assembler-times vdup.32\\tq[0-9]+, r[0-9]+ 4
gcc.target/arm/crypto-vsha1pq_u32.c scan-assembler-times vmov.32\\tr[0-9]+, d[0-9]+\\[[0-9]+\\]+ 3
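(For anyone reproducing this by hand: these scan-assembler-times directives are
Tcl-escaped regular expressions counted against the generated assembly.  A
quick sketch of what the first pattern does and does not match, outside the
DejaGnu harness — the register numbers here are hypothetical, not from the
actual test output:)

```python
import re

# One level of Tcl escaping removed: "\\t" in the directive is a literal
# tab in the regex DejaGnu applies (the unescaped '.' is an any-character
# wildcard in the original directive, kept as-is here).
pattern = re.compile("vdup.32\tq[0-9]+, r[0-9]+")

# A core-to-NEON duplicate counts toward the expected total...
assert pattern.search("\tvdup.32\tq8, r0")
# ...but a q-to-q operand pair does not match the r-register operand.
assert not pattern.search("\tvdup.32\tq8, q9")
```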
I don't see them mentioned in this thread though? Can you check?

Thanks,

Christophe

On Thu, Jul 15, 2021 at 3:07 PM Jonathan Wright <jonathan.wri...@arm.com> wrote:

> Ah, yes - those test results should have only been changed for little
> endian.
>
> I've submitted a patch to the list restoring the original expected
> results for big endian.
>
> Thanks,
> Jonathan
> ------------------------------
> *From:* Christophe Lyon <christophe.lyon....@gmail.com>
> *Sent:* 15 July 2021 10:09
> *To:* Richard Sandiford <richard.sandif...@arm.com>; Jonathan Wright
> <jonathan.wri...@arm.com>; gcc-patches@gcc.gnu.org
> <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> *Subject:* Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification
>
> On Mon, Jul 12, 2021 at 5:31 PM Richard Sandiford via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
> Jonathan Wright <jonathan.wri...@arm.com> writes:
> > Hi,
> >
> > Version 2 of this patch adds more code generation tests to show the
> > benefit of this RTL simplification, as well as adding a new helper
> > function 'rtx_vec_series_p' to reduce code duplication.
> >
> > Patch tested as version 1 - ok for master?
>
> Sorry for the slow reply.
>
> > Regression tested and bootstrapped on aarch64-none-linux-gnu,
> > x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
> > aarch64_be-none-linux-gnu - no issues.
>
> I've also tested this on powerpc64le-unknown-linux-gnu, no issues again.
>
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
> >  					- 1,
> >  					0));
> >        break;
> > +    case VEC_SELECT:
> > +      {
> > +	rtx trueop0 = XEXP (x, 0);
> > +	mode = GET_MODE (trueop0);
> > +	rtx trueop1 = XEXP (x, 1);
> > +	int nunits;
> > +	/* If we select a low-part subreg, return that.  */
> > +	if (GET_MODE_NUNITS (mode).is_constant (&nunits)
> > +	    && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
> > +	  {
> > +	    int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
> > +
> > +	    if (rtx_vec_series_p (trueop1, offset))
> > +	      {
> > +		rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
> > +		if (new_rtx != NULL_RTX)
> > +		  return new_rtx;
> > +	      }
> > +	  }
> > +      }
>
> Since this occurs three times, I think it would be worth having
> a new predicate:
>
> /* Return true if, for all OP of mode OP_MODE:
>
>      (vec_select:RESULT_MODE OP SEL)
>
>    is equivalent to the lowpart RESULT_MODE of OP.  */
>
> bool
> vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel)
>
> containing the GET_MODE_NUNITS (…).is_constant, can_change_mode_class
> and rtx_vec_series_p tests.
>
> I think the function belongs in rtlanal.[hc], even though
> subreg_lowpart_p is in emit-rtl.c.
>
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -1884,15 +1884,16 @@
> >  )
> >
> >  (define_insn "*zero_extend<SHORT:mode><GPI:mode>2_aarch64"
> > -  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
> > -        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m")))]
> > +  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
> > +        (zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" "r,m,m,w")))]
> >    ""
> >    "@
> >     and\t%<GPI:w>0, %<GPI:w>1, <SHORT:short_mask>
> >     ldr<SHORT:size>\t%w0, %1
> > -   ldr\t%<SHORT:size>0, %1"
> > -  [(set_attr "type" "logic_imm,load_4,f_loads")
> > -   (set_attr "arch" "*,*,fp")]
> > +   ldr\t%<SHORT:size>0, %1
> > +   umov\t%w0, %1.<SHORT:size>[0]"
> > +  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
> > +   (set_attr "arch" "*,*,fp,fp")]
>
> FTR (just to show I thought about it): I don't know whether the umov
> can really be considered an fp operation rather than a simd operation,
> but since we don't support fp without simd, this is already a
> distinction without a difference.  So the pattern is IMO OK as-is.
>
> > diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> > index 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35 100644
> > --- a/gcc/config/arm/vfp.md
> > +++ b/gcc/config/arm/vfp.md
> > @@ -224,7 +224,7 @@
> >  ;; problems because small constants get converted into adds.
> >  (define_insn "*arm_movsi_vfp"
> >    [(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m ,*t,r,*t,*t, *Uv")
> > -	(match_operand:SI 1 "general_operand"	   "rk, I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
> > +	(match_operand:SI 1 "general_operand"	   "rk, I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
> >    "TARGET_ARM && TARGET_HARD_FLOAT
> >     && (   s_register_operand (operands[0], SImode)
> >         || s_register_operand (operands[1], SImode))"
>
> I'll assume that an Arm maintainer would have spoken up by now if
> they didn't want this for some reason.
>
> > diff --git a/gcc/rtl.c b/gcc/rtl.c
> > index aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa 100644
> > --- a/gcc/rtl.c
> > +++ b/gcc/rtl.c
> > @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_rtvec vec)
> >      }
> >  }
> >
> > +/* Return true if element-selection indices in VEC are in series.  */
> > +
> > +bool
> > +rtx_vec_series_p (const_rtx vec, int start)
>
> I think rtvec_series_p would be better, for consistency with
> rtvec_all_equal_p.  Also, let's generalise it to:
>
> /* Return true if VEC contains a linear series of integers
>    { START, START+1, START+2, ... }.  */
>
> bool
> rtvec_series_p (rtvec vec, int start)
> {
> }
>
> > +{
> > +  for (int i = 0; i < XVECLEN (vec, 0); i++)
> > +    {
> > +      if (i + start != INTVAL (XVECEXP (vec, 0, i)))
> > +	return false;
> > +    }
> > +  return true;
>
> With the general definition I think this should be:
>
>   for (int i = 0; i < GET_NUM_ELEM (vec); i++)
>     {
>       rtx x = RTVEC_ELT (vec, i);
>       if (!CONST_INT_P (x) || INTVAL (x) != i + start)
>         return false;
>     }
>
> Then pass XVEC (sel, 0) to the function, instead of just sel.
>
> OK with those changes, thanks.

> Hi,
>
> Some of the updated tests fail on aarch64_be:
> gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
> gcc.target/aarch64/sve/extract_1.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
> gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 2
> gcc.target/aarch64/sve/extract_2.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 2
> gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 5
> gcc.target/aarch64/sve/extract_3.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 5
> gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tw[0-9]+, s[0-9]\\n 6
> gcc.target/aarch64/sve/extract_4.c scan-assembler-times \\tfmov\\tx[0-9]+, d[0-9]\\n 6
>
> Can you check?
>
> Thanks,
>
> Christophe

> Richard