https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112824
--- Comment #8 from Chris Elrod <elrodc at gmail dot com> ---
> If it's designed the way you want it to be, another issue would be like,
> should we lower 512-bit vector builtins/intrinsic to ymm/xmm when
> -mprefer-vector-width=256, the answer is we'd rather not.

To be clear, what I meant by

> it would be great to respect
> `-mprefer-vector-width=512`, it should ideally also be able to respect
> vector builtins/intrinsics

is that when someone uses 512-bit vector builtins, codegen should generate 512-bit code regardless of the `-mprefer-vector-width` setting.
That is, as a developer, I would want 512-bit builtins to mean we get 512-bit vector code generation.

> If user explicitly use 512-bit vector type, builtins or intrinsics, gcc will
> generate zmm no matter -mprefer-vector-width=.

This is what I would want, and I'd also want it to apply to movement of `struct`s holding vector builtin objects, instead of the `ymm` usage we see here.

> And yes, there could be some mismatches between 512-bit intrinsic and
> architecture tuning when you're using 512-bit intrinsic, and also rely on
> compiler autogen to handle struct
> For such case, an explicit -mprefer-vector-width=512 is needed.

Note the template partial specialization

  template <std::floating_point T, ptrdiff_t N> struct Vector<T,N>{
    static constexpr ptrdiff_t W = N >= VecWidth<T> ? VecWidth<T> :
                                   ptrdiff_t(std::bit_ceil(size_t(N)));
    static constexpr ptrdiff_t L = (N/W) + ((N%W)!=0);
    using V = Vec<W,T>;
    V data[L];
    static constexpr auto size()->ptrdiff_t{return N;}
  };

Thus, `Vector`s in this example may explicitly be structs containing arrays of vector builtins. I would expect these structs not to need a `-mprefer-vector-width=512` setting to get 512-bit code when handling them.
Given small `L`, I would also expect passing such a struct by value to a non-inlined function to be done in `zmm` registers when possible, for example.