https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109302

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think the problem is that ccp1 decides to fold
  _4 = 0;
...
  VIEW_CONVERT_EXPR<__int128 unsigned[4]>(v)[_4] = _8;
into
  v_18 = BIT_INSERT_EXPR <v_13, _8, 0 (128 bits)>;
while the function is still TARGET_AVX512F and so has V4TImode support.
Later on, during IPA, the function is multi-versioned and one of the versions has
smaller ISA support than before. Nothing (e.g. generic vector lowering) lowers that
BIT_INSERT_EXPR afterwards, and expansion can't handle a BLKmode BIT_INSERT_EXPR
either.
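
As a rough illustration (this is not the testcase from the PR), a store to one
lane of a generic vector is the kind of source that shows up in GIMPLE as the
VIEW_CONVERT_EXPR array store quoted above. The names u128, v4ti and set_lane
are made up for this sketch, and it assumes GCC on x86-64, where unsigned
__int128 is available.

```c
/* Hypothetical sketch; u128, v4ti and set_lane are illustrative names,
   not taken from the PR.  Assumes GCC on x86-64.  */
typedef unsigned __int128 u128;

/* Four 128-bit lanes, 64 bytes in total.  This type only gets V4TImode
   when the enabled ISA has 512-bit vectors; otherwise it is BLKmode.  */
typedef u128 v4ti __attribute__ ((vector_size (64)));

/* Replace lane IDX of *P with X.  The subscripted store on the local
   vector is what GIMPLE represents as a VIEW_CONVERT_EXPR array store.
   The vector is passed by pointer to avoid passing a 64-byte vector
   by value.  */
void
set_lane (v4ti *p, unsigned idx, u128 x)
{
  v4ti v = *p;
  v[idx] = x;
  *p = v;
}
```

Compiling something like this with -O2 -fdump-tree-ccp1 and comparing the dump
with and without -mavx512f should show whether the store was folded to
BIT_INSERT_EXPR; the exact result depends on the GCC version and on whether the
index is known to be constant at that point.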

Now, do we really want to support this loophole for lowering ISA capabilities?

I mean, say
#include <x86intrin.h>

__attribute__((target_clones("arch=x86-64", "default"))) __m512i
foo (__m512i a, __m512i b, __m512i c, __mmask8 d)
{
  return _mm512_mask_ternarylogic_epi64 (a, d, b, c, 3);
}
when compiled with -O2 -mno-sse3 it is rejected with
error: inlining failed in call to ‘always_inline’
‘_mm512_mask_ternarylogic_epi64’: target specific option mismatch
while with -O2 -mavx512f in this case it is caught during expansion:
error: ‘__builtin_ia32_pternlogq512_mask’ needs isa option -mavx512f
(a check which is still needed in case one uses the builtins by hand), but in
general we could, e.g. in gimple_fold and similar places, produce something that
we can't handle properly later on.

If we want to support this mess, should we handle it in generic vector lowering
or expansion?
