https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754
--- Comment #7 from prathamesh3492 at gcc dot gnu.org ---
(In reply to Richard Biener from comment #5)
> It seems we have VECTOR_CST_NELTS_PER_PATTERN ({ 9.0e+0, 0.0, 0.0, 0.0 })
> 2 and VECTOR_CST_NPATTERNS == 1. And the selector { 1, 0, 1, 2 } has
> npatterns == 1 and nelts-per-pattern == 3.
>
> /* (1) If SEL is a suitable mask as determined by
> valid_mask_for_fold_vec_perm_cst_p, then:
> res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> res_nelts_per_pattern = max of nelts_per_pattern between
> ARG0, ARG1 and SEL.
> (2) If SEL is not a suitable mask, and TYPE is VLS then:
> res_npatterns = nelts in result vector.
> res_nelts_per_pattern = 1.
> This exception is made so that VLS ARG0, ARG1 and SEL work as before.
> */
> if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> {
> res_npatterns
> = std::max (VECTOR_CST_NPATTERNS (arg0),
> std::max (VECTOR_CST_NPATTERNS (arg1),
> sel.encoding ().npatterns ()));
>
> res_nelts_per_pattern
> = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
> std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
> sel.encoding ().nelts_per_pattern ()));
>
> res_nelts = res_npatterns * res_nelts_per_pattern;
>
> this seems to be a case that doesn't fit, so the fix needs to be to
> valid_mask_for_fold_vec_perm_cst_p which really looks a bit
> unwieldly.
valid_mask_for_fold_vec_perm_cst_p incorrectly returns true here,
which is being addressed by the PR111648 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631926.html
Even if the vectors had integral element type:
arg0 = arg1 = (v4si){ 9, 0, 0, 0 } // encoded as {9, 0, ...}
and sel = { 1, 0, 1, 2 } // encoded as {1, 0, 1, ...}
The pattern in sel {1, 0, 1, ...}
would choose elements from arg0, and
res would have incorrect encoding with step = -9:
res = { arg0[1], arg0[0], arg0[1], ... }
= { 0, 9, 0, ... }
And res[3] will be incorrectly computed as -9 instead of arg0[2], which is 0.
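
To make the arithmetic above concrete, here is a minimal sketch of how an
element outside the encoding gets extrapolated. It is not GCC's
implementation: encoded_vec and decode_elt are hypothetical names made up
for illustration, and the sketch assumes the usual rule that a pattern with
3 encoded elements continues as a linear series whose step is
(third element - second element).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a VECTOR_CST-style encoding (not GCC's type).
struct encoded_vec
{
  unsigned npatterns;
  unsigned nelts_per_pattern;
  std::vector<long> elts; // npatterns * nelts_per_pattern encoded elements
};

// Return element I of V, extrapolating when I lies past the encoded elements.
long
decode_elt (const encoded_vec &v, std::size_t i)
{
  assert (v.npatterns >= 1);
  assert (v.nelts_per_pattern >= 1 && v.nelts_per_pattern <= 3);
  assert (v.elts.size () == std::size_t (v.npatterns) * v.nelts_per_pattern);

  // Elements that are encoded explicitly are returned as-is.
  if (i < v.elts.size ())
    return v.elts[i];

  std::size_t pattern = i % v.npatterns; // which interleaved pattern
  std::size_t count = i / v.npatterns;   // index within that pattern
  long final_elt = v.elts[(v.nelts_per_pattern - 1) * v.npatterns + pattern];

  // With 1 or 2 elements per pattern, the last encoded element repeats.
  if (v.nelts_per_pattern < 3)
    return final_elt;

  // With 3 elements per pattern, the series continues with a constant step.
  long prev = v.elts[(v.nelts_per_pattern - 2) * v.npatterns + pattern];
  long step = final_elt - prev;
  return final_elt + static_cast<long> (count - 2) * step;
}
```

With arg0 encoded as {9, 0} (npatterns 1, nelts_per_pattern 2),
decode_elt gives arg0[2] == 0. With res encoded as {0, 9, 0}
(npatterns 1, nelts_per_pattern 3), the step is 0 - 9 = -9, so
decode_elt gives res[3] == -9, which is the mismatch described above.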
However, for floating element types, even if the encoding is correct,
I assume it will still ICE when trying to derive elements not present in
the encoding, since poly_int_cst can only deal with integral elements?
>
> An assert that res_nelts is power-of-two would be nice to add.
Sorry, I don't understand. res_nelts for VLA need not be a power of 2,
since res_nelts_per_pattern can be 3. The encoding for res is chosen
to be the max of npatterns and the max of nelts_per_pattern between arg0,
arg1, and sel.
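
As a rough illustration of that point, the sketch below redoes the
computation from the quoted snippet with plain integers. enc_shape,
result_shape and encoded_nelts are hypothetical helper names used only
here, not GCC functions; the input values are the ones from this PR.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical pair describing an encoding's shape (illustration only).
struct enc_shape
{
  unsigned npatterns;
  unsigned nelts_per_pattern;
};

// Mirror the quoted rule: take the max of each field over arg0, arg1 and sel.
enc_shape
result_shape (const enc_shape &arg0, const enc_shape &arg1,
              const enc_shape &sel)
{
  assert (arg0.npatterns >= 1 && arg1.npatterns >= 1 && sel.npatterns >= 1);
  enc_shape res;
  res.npatterns
    = std::max (arg0.npatterns, std::max (arg1.npatterns, sel.npatterns));
  res.nelts_per_pattern
    = std::max (arg0.nelts_per_pattern,
                std::max (arg1.nelts_per_pattern, sel.nelts_per_pattern));
  return res;
}

// Number of elements actually stored in the encoding.
unsigned
encoded_nelts (const enc_shape &s)
{
  return s.npatterns * s.nelts_per_pattern;
}
```

For arg0 = arg1 with shape {1, 2} and sel with shape {1, 3}, this yields
a result shape of {1, 3} and 1 * 3 = 3 encoded elements, which is not a
power of 2.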
Thanks,
Prathamesh