https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109302
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think the problem is that ccp1 decides to fold
_4 = 0;
...
VIEW_CONVERT_EXPR<__int128 unsigned[4]>(v)[_4] = _8;
into
v_18 = BIT_INSERT_EXPR <v_13, _8, 0 (128 bits)>;
while the function is still TARGET_AVX512F and so has V4TImode support.
Later on during IPA the function is multi-versioned and one version has smaller
ISA support than before. Nothing (e.g. generic vector lowering) lowers that
BIT_INSERT_EXPR afterwards, and expansion can't handle a BLKmode
BIT_INSERT_EXPR either.
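For illustration only (this is not the testcase from the PR, and the names
u128, v4ti, insert_then_read and self_check are made up here), a minimal sketch
of the kind of source that gives a store through an index that only becomes
constant after propagation, using GNU vector extensions. Whether ccp1 actually
turns it into BIT_INSERT_EXPR depends on the GCC version and target flags;
-fdump-tree-ccp1 can be used to check.

```c
/* Hypothetical sketch, assuming GCC on x86-64 with __int128 support.  */
typedef unsigned __int128 u128;
typedef u128 v4ti __attribute__ ((vector_size (64)));

/* Overwrite element 0 of a 4 x 128-bit vector through a variable index
   and return element LANE of the result.  */
u128
insert_then_read (u128 x, int lane)
{
  v4ti v = { 1, 2, 3, 4 };
  int idx = 0;          /* plays the role of _4 = 0 */
  v[idx] = x;           /* the index is constant once propagated */
  return v[lane & 3];
}

/* Returns 1 if only element 0 was changed by the insertion.  */
int
self_check (void)
{
  return insert_then_read (42, 0) == 42
         && insert_then_read (42, 1) == 2
         && insert_then_read (42, 2) == 3
         && insert_then_read (42, 3) == 4;
}
```

Semantically the store is an insertion of a 128-bit value at bit offset 0 of
the 512-bit vector, which is what the BIT_INSERT_EXPR above expresses.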
Now, do we really want to support this loophole for lowering ISA capabilities?
I mean, say
#include <x86intrin.h>
__attribute__((target_clones("arch=x86-64", "default"))) __m512i
foo (__m512i a, __m512i b, __m512i c, __mmask8 d)
{
  return _mm512_mask_ternarylogic_epi64 (a, d, b, c, 3);
}
when compiled with -O2 -mno-sse3 it is rejected with
error: inlining failed in call to ‘always_inline’ ‘_mm512_mask_ternarylogic_epi64’: target specific option mismatch
while with -O2 -mavx512f in this case it is caught during expansion:
error: ‘__builtin_ia32_pternlogq512_mask’ needs isa option -mavx512f
(that check is still needed in case one uses the builtins by hand), but
generally we could e.g. in gimple_fold etc. create something that we couldn't
handle properly later on.
If we want to support this mess, should we handle it in generic vector lowering
or expansion?