On Fri, Jan 26, 2018 at 03:15:58PM +0000, Richard Sandiford wrote:
> Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> writes:
> > On 26/01/18 13:31, Richard Sandiford wrote:
> >> sve/extract_[12].c were relying on the target-independent optimisation
> >> that removes a redundant vec_select, so that we don't end up with
> >> things like:
> >>
> >>     dup v0.4s, v0.4s[0]
> >>     ...use s0...
> >>
> >> But that optimisation rightly doesn't trigger for big-endian targets,
> >> because GCC expects lane 0 to be in the high part of the register
> >> rather than the low part.
> >>
> >> SVE breaks this assumption -- see the comment at the head of
> >> aarch64-sve.md for details -- so the optimisation is valid for
> >> both endiannesses.  Long term, we probably need some kind of target
> >> hook to make GCC aware of this.

This explanation is scary - it implies there might be more surprises
waiting for us.

> >>
> >> But there's another problem with the current extract pattern: it doesn't
> >> tell the register allocator how cheap an extraction of lane 0 is with
> >> tied registers.  It seems better to split the lane 0 case out into
> >> its own pattern and use tied operands for the FPR<-SIMD case,
> >> so that using different registers has the cost of an extra reload.
> >> I think we want this for both endiannesses, regardless of the hook
> >> described above.
> >>
> >> Also, the gen_lowpart in this pattern fails for aarch64_be due to
> >> TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG
> >> instead.  We're only creating this rtl in order to print it, so there's
> >> no need for anything fancier.
> >>
> >> Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

OK.

Thanks,
James