On Fri, Jan 26, 2018 at 03:15:58PM +0000, Richard Sandiford wrote:
> Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> writes:
> > On 26/01/18 13:31, Richard Sandiford wrote:
> >> sve/extract_[12].c were relying on the target-independent optimisation
> >> that removes a redundant vec_select, so that we don't end up with
> >> things like:
> >>
> >>     dup v0.4s, v0.4s[0]
> >>     ...use s0...
> >>
> >> But that optimisation rightly doesn't trigger for big-endian targets,
> >> because GCC expects lane 0 to be in the high part of the register
> >> rather than the low part.
> >>
> >> SVE breaks this assumption -- see the comment at the head of
> >> aarch64-sve.md for details -- so the optimisation is valid for
> >> both endiannesses.  Long term, we probably need some kind of target
> >> hook to make GCC aware of this.

This explanation is scary - it implies there might be more surprises
waiting for us.

> >>
> >> But there's another problem with the current extract pattern: it doesn't
> >> tell the register allocator how cheap an extraction of lane 0 is with
> >> tied registers.  It seems better to split the lane 0 case out into
> >> its own pattern and use tied operands for the FPR<-SIMD case,
> >> so that using different registers has the cost of an extra reload.
> >> I think we want this for both endiannesses, regardless of the hook
> >> described above.
> >>
> >> Also, the gen_lowpart in this pattern fails for aarch64_be due to
> >> TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG
> >> instead.  We're only creating this rtl in order to print it, so there's
> >> no need for anything fancier.
> >>
> >> Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

OK.

Thanks,
James
