[Bug target/106770] powerpc64le: Unnecessary xxpermdi before mfvsrd

segher at gcc dot gnu.org via Gcc-bugs Thu, 02 Mar 2023 05:26:49 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106770


--- Comment #11 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Jens Seifert from comment #6)
> The left part of VSX registers overlaps with floating point registers, that
> is why no register xxpermdi is required and mfvsrd can access all (left)
> parts of VSX registers directly.

The mfvsrd instruction was invented before ELFv2 (at the same time as mfvsrwz).
Everything in common use was big-endian then.  The insns to move GPR->VSR that
initially existed were mtvsrd and mtvsrw[az], all of which write to dword 0 of
the target VSR.

Dword 0 of vector regs is where 64-bit entities in vector regs are stored in
the ABIs, sure, and that corresponds to the FPRs in the ISA.  mtvsrdd and mtvsrws
were added in ISA 3.0 (p9), together with mfvsrld, to make little-endian work
better with little-endian ELFv2.
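
To make the dword numbering concrete, here is a minimal sketch in plain C.  It
is only a software model, not hardware or GCC code: the vsr_t type and the
model_* helper names are made up for illustration, and dword[0] stands for ISA
doubleword 0 (the half that overlaps the FPRs).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit VSR; dword[0] is ISA doubleword 0. */
typedef struct { uint64_t dword[2]; } vsr_t;

/* mtvsrd: GPR -> dword 0.  The ISA leaves dword 1 undefined; this model
   arbitrarily uses 0 for it. */
static vsr_t model_mtvsrd(uint64_t ra)
{
    vsr_t t = { { ra, 0 } };
    return t;
}

/* mtvsrdd (ISA 3.0): two GPR values -> dword 0 and dword 1. */
static vsr_t model_mtvsrdd(uint64_t ra, uint64_t rb)
{
    vsr_t t = { { ra, rb } };
    return t;
}

/* mfvsrd: dword 0 -> GPR. */
static uint64_t model_mfvsrd(vsr_t xs)
{
    return xs.dword[0];
}

/* mfvsrld (ISA 3.0): dword 1 -> GPR. */
static uint64_t model_mfvsrld(vsr_t xs)
{
    return xs.dword[1];
}

/* Small self-check of the model; returns 0 when all assertions hold. */
static int model_selfcheck(void)
{
    vsr_t v = model_mtvsrdd(0x1111, 0x2222);
    assert(model_mfvsrd(v) == 0x1111);   /* dword 0 */
    assert(model_mfvsrld(v) == 0x2222);  /* dword 1 */
    assert(model_mfvsrd(model_mtvsrd(42)) == 42);
    return 0;
}
```

In this model, getting at dword 1 without mfvsrld means first moving it into
dword 0 (for example with xxpermdi) and then using mfvsrd.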

> The xxpermdi x,y,y,3 indicates to me that gcc prefers right part of register
> which might also cause the xxpermdi at the beginning.

And with -mbig you get ,2 here.  It is accidental.
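
One way to see why the ,3 versus ,2 difference does not matter to a following
mfvsrd, again as a plain C model rather than real code (vsr_t and the model_*
names are hypothetical, and the DM decoding follows my reading of the ISA
description of xxpermdi):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit VSR; dword[0] is ISA doubleword 0. */
typedef struct { uint64_t dword[2]; } vsr_t;

/* xxpermdi XT,XA,XB,DM: result dword 0 comes from XA, selected by the
   high bit of DM; result dword 1 comes from XB, selected by the low bit. */
static vsr_t model_xxpermdi(vsr_t xa, vsr_t xb, unsigned dm)
{
    vsr_t t;
    t.dword[0] = xa.dword[(dm >> 1) & 1];
    t.dword[1] = xb.dword[dm & 1];
    return t;
}

/* mfvsrd: dword 0 -> GPR. */
static uint64_t model_mfvsrd(vsr_t xs)
{
    return xs.dword[0];
}

/* Self-check: DM=2 swaps the halves, DM=3 splats dword 1, and mfvsrd
   reads the same value after either.  Returns 0 when all hold. */
static int model_selfcheck(void)
{
    vsr_t y = { { 0xAAAA, 0xBBBB } };
    vsr_t swap  = model_xxpermdi(y, y, 2);
    vsr_t splat = model_xxpermdi(y, y, 3);
    assert(swap.dword[0] == 0xBBBB && swap.dword[1] == 0xAAAA);
    assert(splat.dword[0] == 0xBBBB && splat.dword[1] == 0xBBBB);
    assert(model_mfvsrd(swap) == model_mfvsrd(splat));
    return 0;
}
```

Both forms put the original dword 1 into dword 0; they differ only in what
ends up in dword 1, which mfvsrd never looks at.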

> At the end the mystery
> is why gcc adds 3 xxpermdi to the code.

As I said, this is constructed during expand, to make correct code.  That is all
that expand should do: make correct (and well-optimisable, "open structured",
easy to transform) code.  We should be able to optimise this to something
better in later passes that *are* supposed to make faster code.  Like the p8
swaps pass, which mostly zaps unnecessary pairs of swaps, or the swiss army
bazooka combine, or even many earlier passes if such an xxpermdi insn is truly
superfluous.  It usually is not: we are dealing with the full 128-bit VSRs
there, and there is no way of saying we do not care about part of the register
contents.  Making infra for that is big work.

We can make things easier by expressing things as 64 bit earlier.  We can (and
should) also investigate why the mfvsrd is not combined (as in, what the
instruction combiner pass does) with the xxpermdi.  There are many things not
quite perfect here.
