https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048
--- Comment #15 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
So it seems that if at least one of the vector builtins involved in the expression is 512 bits, GCC needs to locally increase prefer-vector-width to 512? Or, more generally:

prefer-vector-width = max(prefer-vector-width, 8 * sizeof(operands)..., 8 * sizeof(return-value))

The reason to default to 256 bits is to avoid zmm register usage altogether (clock-down). But if the surrounding code already uses zmm registers, that motivation is moot.

Also, I think this shouldn't be considered auto-vectorization but rather pattern recognition (recognizing a __builtin_convertvector).