https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112824

--- Comment #8 from Chris Elrod <elrodc at gmail dot com> ---
> If it's designed the way you want it to be, another issue would be like, 
> should we lower 512-bit vector builtins/intrinsic to ymm/xmm when 
> -mprefer-vector-width=256, the answer is we'd rather not. 

To be clear, what I meant by

>  it would be great to respect
> `-mprefer-vector-width=512`, it should ideally also be able to respect
> vector builtins/intrinsics

is that when someone uses 512-bit vector builtins, codegen should produce
512-bit code regardless of the `-mprefer-vector-width` setting.
That is, as a developer, I would want 512-bit builtins to mean we get 512-bit
vector code generation.

>  If user explicitly use 512-bit vector type, builtins or intrinsics, gcc will 
> generate zmm no matter -mprefer-vector-width=.

This is what I would want, and I'd also want it to apply to movement of
`struct`s holding vector builtin objects, instead of the `ymm` usage as we see
here.

> And yes, there could be some mismatches between 512-bit intrinsic and 
> architecture tuning when you're using 512-bit intrinsic, and also rely on 
> compiler autogen to handle struct
> For such case, an explicit -mprefer-vector-width=512 is needed.

Note the template partial specialization

template <std::floating_point T, ptrdiff_t N> struct Vector<T,N>{
    static constexpr ptrdiff_t W =
        N >= VecWidth<T> ? VecWidth<T> : ptrdiff_t(std::bit_ceil(size_t(N)));
    static constexpr ptrdiff_t L = (N/W) + ((N%W)!=0);
    using V = Vec<W,T>;
    V data[L];
    static constexpr auto size()->ptrdiff_t{return N;}
};

Thus, the `Vector`s in this example are structs that explicitly contain arrays
of vector builtins. I would expect code handling these structs to be 512-bit
without needing a `-mprefer-vector-width=512` setting.
Given small `L`, I would also expect passing this struct as an argument by
value to a non-inlined function to be done in `zmm` registers when possible,
for example.
