Hi Juzhe,
the general method seems sane and useful (it's not very complicated).
I was just distracted by
Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the
common expression:
{ 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... }
For this selector, we can use vmsltu + vmerge to optimize the codegen.
because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe
{ 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }.
Because of the ascending/monotonic? selector structure we can use vmerge
instead of vrgather.
+/* Recognize the patterns that we can use merge operation to shuffle the
+ vectors. The value of Each element (index i) in selector can only be
+ either i or nunits + i.
+
+ E.g.
+ v = VEC_PERM_EXPR (v0, v1, selector),
+ selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... }
Same.
+
+ We can transform such pattern into:
+
+ v = vcond_mask (v0, v1, mask),
+ mask = { 0, 1, 0, 1, 0, 1, ... }. */
+
+static bool
+shuffle_merge_patterns (struct expand_vec_perm_d *d)
+{
+ machine_mode vmode = d->vmode;
+ machine_mode sel_mode = related_int_vector_mode (vmode).require ();
+ int n_patterns = d->perm.encoding ().npatterns ();
+ poly_int64 vec_len = d->perm.length ();
+
+ for (int i = 0; i < n_patterns; ++i)
+ if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i))
+ return false;
+
+ for (int i = n_patterns; i < n_patterns * 2; i++)
+ if (!d->perm.series_p (i, n_patterns, i, n_patterns)
+ && !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
+ return false;
Maybe add a comment that we check that the pattern is actually monotonic
or however you prefet to call it?
I didn't go through all tests in detail but skimmed several. All in all
looks good to me.