On 30/05/2023 07:26, Richard Biener wrote:
On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs <a...@codesourcery.com> wrote:
Hi all,
I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
do it because the GCC middle-end models DIVMOD's return value as
"complex int" type, and there are no vector equivalents of that type.
Therefore, this patch adds minimal support for "complex vector int"
modes. I have not attempted to provide any means to use these modes
from C, so they're really only useful for DIVMOD. The actual libfunc
implementation will pack the data into wider vector modes manually.
A knock-on effect of this is that I needed to increase the range of
"mode_unit_size" (several of the vector modes supported by amdgcn exceed
the previous 255-byte limit).
Since this change would add a large number of new, unused modes to many
architectures, I have elected to *not* enable them, by default, in
machmode.def (where the other complex modes are created). The new modes
are therefore inactive on all architectures but amdgcn, for now.
OK for mainline? (I've not done a full test yet, but I will.)
I think it makes more sense to map vector CSImode to vector SImode with
double the number of lanes. In fact, since divmod is a libgcc function
I wonder where your vector variant would reside and how GCC decides to
emit calls to it? That is, there's no way to OMP simd declare this function?
The divmod implementation lives in libgcc. It's not too difficult to
write using vector extensions and some asm tricks. I did try an OMP simd
declare implementation, but it didn't vectorize well, and that's a yak
I don't wish to shave right now.
In any case, the OMP simd declare will not help us here, directly,
because the DIVMOD transformation happens too late in the pass pipeline,
long after ifcvt and vect. My implementation (not yet posted) uses a
libfunc and the TARGET_EXPAND_DIVMOD_LIBFUNC hook in the standard way.
It just needs the complex vector modes to exist.
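For reference, the "complex int" return convention can be modelled in scalar GNU C roughly as below. This is only an illustrative sketch using the GNU complex-integer extension; the name divmod_model is hypothetical and is neither the libgcc entry point nor anything the hook requires.

```c
/* Scalar model of the DIVMOD result convention: the quotient goes in
   the "real" part and the remainder in the "imaginary" part.
   Relies on the GNU C complex integer extension.
   divmod_model is a hypothetical name used only for illustration.  */
static _Complex int
divmod_model (int a, int b)
{
  _Complex int res;
  __real__ res = a / b;   /* quotient */
  __imag__ res = a % b;   /* remainder */
  return res;
}
```

A vector version needs the same pair of results, once per lane, which is why a "complex vector int" mode is the missing piece.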
Using vectors of twice the length is also problematic. If I create a new
V128SImode that spans across two 64-lane vector registers then that will
probably have the desired effect ("real" quotient in v8, "imaginary"
remainder in v9), but if I use V64SImode to represent two V32SImode
vectors then that's a one-register mode, and I'll have to use a
permutation (a memory operation) to extract lanes 32-63 into lanes 0-31,
and if we ever want to implement instructions that operate on these
modes (as opposed to the odd/even add/sub complex patterns we have now)
then the masking will be all broken and we'd need to constantly
disassemble the double length vectors to operate on them.
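To illustrate the extraction cost with the double-length packing, here is a scaled-down sketch using generic GNU vector extensions: 4 lanes stand in for V64SImode and 2 lanes for V32SImode. The type and function names are made up for the example, and the loop only models the data movement; it says nothing about the instructions amdgcn would actually emit.

```c
/* Scaled-down model: a 4-lane "packed" vector holds two 2-lane results,
   quotients in lanes 0-1 and remainders in lanes 2-3.
   Names are hypothetical, for illustration only.  */
typedef int packed_v4si __attribute__ ((vector_size (16)));
typedef int half_v2si __attribute__ ((vector_size (8)));

/* Move the upper half of the packed vector down into lanes 0-1.
   This cross-lane move is the step that would need a permutation.  */
static half_v2si
extract_upper_half (packed_v4si packed)
{
  half_v2si out = { 0, 0 };
  for (int i = 0; i < 2; i++)
    out[i] = packed[i + 2];
  return out;
}
```

Every consumer of the remainder would have to perform this kind of move first, whereas with two separate registers the remainder is already in lanes 0 upward.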
The implementation I proposed is essentially a struct containing two
vectors placed in consecutive registers. This is the natural
representation for the architecture.
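In C terms, that layout looks roughly like the sketch below, again with generic GNU vector extensions and 4 lanes instead of 64. The struct and function names are hypothetical and this is not the libfunc from the (unposted) patch; it only shows the shape of the data.

```c
/* Model of the proposed representation: two whole vectors side by side,
   one holding quotients and one holding remainders.
   All names here are hypothetical, for illustration only.  */
typedef int v4si __attribute__ ((vector_size (16)));

struct vec_divmod_result
{
  v4si quot;   /* "real" part: per-lane quotients */
  v4si rem;    /* "imaginary" part: per-lane remainders */
};

/* Lane-wise division and modulo using GNU vector arithmetic.  */
static struct vec_divmod_result
vec_divmod_model (v4si a, v4si b)
{
  struct vec_divmod_result res;
  res.quot = a / b;
  res.rem = a % b;
  return res;
}
```

With this shape each half keeps the full lane count, so lane masks apply to the quotient and remainder vectors unchanged.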
Anyway, you don't like this patch and I see that AArch64 is picking
apart BLKmode to see if there's complex inside, so maybe I can make
something like that work here? AArch64 doesn't seem to use
TARGET_EXPAND_DIVMOD_LIBFUNC though, and I'm pretty sure the problem I
was trying to solve was in the way the expand pass handles the BLKmode
complex, outside the control of the backend hook (I'm still paging this
stuff back in, post vacation).
Thanks
Andrew