Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> writes:
> On 26/01/18 13:31, Richard Sandiford wrote:
>> sve/extract_[12].c were relying on the target-independent optimisation
>> that removes a redundant vec_select, so that we don't end up with
>> things like:
>>
>>     dup     v0.4s, v0.4s[0]
>>     ...use s0...
>>
>> But that optimisation rightly doesn't trigger for big-endian targets,
>> because GCC expects lane 0 to be in the high part of the register
>> rather than the low part.
>>
>> SVE breaks this assumption -- see the comment at the head of
>> aarch64-sve.md for details -- so the optimisation is valid for
>> both endiannesses.  Long term, we probably need some kind of target
>> hook to make GCC aware of this.
>>
>> But there's another problem with the current extract pattern: it doesn't
>> tell the register allocator how cheap an extraction of lane 0 is with
>> tied registers.  It seems better to split the lane 0 case out into
>> its own pattern and use tied operands for the FPR<-SIMD case,
>> so that using different registers has the cost of an extra reload.
>> I think we want this for both endiannesses, regardless of the hook
>> described above.
>>
>> Also, the gen_lowpart in this pattern fails for aarch64_be due to
>> TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG
>> instead.  We're only creating this rtl in order to print it, so there's
>> no need for anything fancier.
>>
>> Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> 2018-01-26  Richard Sandiford  <richard.sandif...@linaro.org>
>>
>> gcc/
>> 	* config/aarch64/aarch64-sve.md (*vec_extract<mode><Vel>_0): New
>> 	pattern.
>> 	(*vec_extract<mode><Vel>_v128): Require a nonzero lane number.
>> 	Use gen_rtx_REG rather than gen_lowpart.
>>
>> Index: gcc/config/aarch64/aarch64-sve.md
>> ===================================================================
>> --- gcc/config/aarch64/aarch64-sve.md	2018-01-13 18:01:51.232735405 +0000
>> +++ gcc/config/aarch64/aarch64-sve.md	2018-01-26 13:26:50.176756711 +0000
>> @@ -484,18 +484,52 @@ (define_expand "vec_extract<mode><Vel>"
>>    }
>>  )
>>
>> +;; Extract element zero.  This is a special case because we want to force
>> +;; the registers to be the same for the second alternative, and then
>> +;; split the instruction into nothing after RA.
>> +(define_insn_and_split "*vec_extract<mode><Vel>_0"
>> +  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
>> +	(vec_select:<VEL>
>> +	  (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
>> +	  (parallel [(const_int 0)])))]
>> +  "TARGET_SVE"
>> +  {
>> +    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
>> +    switch (which_alternative)
>> +      {
>> +	case 0:
>> +	  return "umov\\t%<vwcore>0, %1.<Vetype>[0]";
>> +	case 1:
>> +	  return "#";
>> +	case 2:
>> +	  return "st1\\t{%1.<Vetype>}[0], %0";
>> +	default:
>> +	  gcc_unreachable ();
>> +      }
>> +  }
>> +  "&& reload_completed
>> +   && REG_P (operands[1])
>> +   && REGNO (operands[0]) == REGNO (operands[1])"
>
> Is it guaranteed that operands[0] will be a REG rtx here?
> The aarch64_simd_nonimmediate_operand predicate and Utv constraint might allow
> a MEM, which would cause an error in an RTL-checking build.
> If I'm not mistaken GCC may attempt the split even when matching alternative 2
Bah, good catch.  The REG_P was checking the wrong operand.
Fixed version below.

Richard


2018-01-26  Richard Sandiford  <richard.sandif...@linaro.org>

gcc/
	* config/aarch64/aarch64-sve.md (*vec_extract<mode><Vel>_0): New
	pattern.
	(*vec_extract<mode><Vel>_v128): Require a nonzero lane number.

Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-01-26 15:14:36.016144657 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2018-01-26 15:14:36.171138212 +0000
@@ -484,18 +484,52 @@ (define_expand "vec_extract<mode><Vel>"
   }
 )
 
+;; Extract element zero.  This is a special case because we want to force
+;; the registers to be the same for the second alternative, and then
+;; split the instruction into nothing after RA.
+(define_insn_and_split "*vec_extract<mode><Vel>_0"
+  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
+	(vec_select:<VEL>
+	  (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
+	  (parallel [(const_int 0)])))]
+  "TARGET_SVE"
+  {
+    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
+    switch (which_alternative)
+      {
+	case 0:
+	  return "umov\\t%<vwcore>0, %1.<Vetype>[0]";
+	case 1:
+	  return "#";
+	case 2:
+	  return "st1\\t{%1.<Vetype>}[0], %0";
+	default:
+	  gcc_unreachable ();
+      }
+  }
+  "&& reload_completed
+   && REG_P (operands[0])
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(const_int 0)]
+  {
+    emit_note (NOTE_INSN_DELETED);
+    DONE;
+  }
+  [(set_attr "type" "neon_to_gp_q, untyped, neon_store1_one_lane_q")]
+)
+
 ;; Extract an element from the Advanced SIMD portion of the register.
 ;; We don't just reuse the aarch64-simd.md pattern because we don't
-;; want any chnage in lane number on big-endian targets.
+;; want any change in lane number on big-endian targets.
 (define_insn "*vec_extract<mode><Vel>_v128"
   [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
 	(vec_select:<VEL>
 	  (match_operand:SVE_ALL 1 "register_operand" "w, w, w")
 	  (parallel [(match_operand:SI 2 "const_int_operand")])))]
   "TARGET_SVE
-   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 15)"
+   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 1, 15)"
   {
-    operands[1] = gen_lowpart (<V128>mode, operands[1]);
+    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
     switch (which_alternative)
       {
       case 0:
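
FWIW, to make the cost story concrete: this is only a sketch based on the
templates in the pattern above, with made-up register numbers rather than
output from an actual build, but for a 32-bit element the three
alternatives of *vec_extract<mode><Vel>_0 come out roughly as:

	umov	w0, v0.s[0]		// alternative 0: lane 0 -> GPR
					// alternative 1: lane 0 -> the FPR that
					// already holds the vector; the insn is
					// split away to nothing after RA
	st1	{v0.s}[0], [x1]		// alternative 2: lane 0 stored to memory

So the only FPR case that still costs an instruction is asking for lane 0
in a register other than the one that holds the vector, and that cost shows
up as the reload the RA has to insert to satisfy the tied-operand ("0")
constraint.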