On 6/14/23 09:00, Robin Dapp wrote:
Hi Juzhe,

the general method seems sane and useful (it's not very complicated).
I was just distracted by

Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the 
common expression:
{ 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }

For this selector, we can use vmsltu + vmerge to optimize the codegen.

because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe
{ 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }.

Because of the ascending/monotonic? selector structure we can use vmerge
instead of vrgather.

+/* Recognize the patterns that we can use merge operation to shuffle the
+   vectors. The value of Each element (index i) in selector can only be
+   either i or nunits + i.
+
+   E.g.
+   v = VEC_PERM_EXPR (v0, v1, selector),
+   selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }

Same.

+
+   We can transform such pattern into:
+
+   v = vcond_mask (v0, v1, mask),
+   mask = { 0, 1, 0, 1, 0, 1, ... }.  */
+
+static bool
+shuffle_merge_patterns (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  machine_mode sel_mode = related_int_vector_mode (vmode).require ();
+  int n_patterns = d->perm.encoding ().npatterns ();
+  poly_int64 vec_len = d->perm.length ();
+
+  for (int i = 0; i < n_patterns; ++i)
+    if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i))
+      return false;
+
+  for (int i = n_patterns; i < n_patterns * 2; i++)
+    if (!d->perm.series_p (i, n_patterns, i, n_patterns)
+       && !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
+      return false;

Maybe add a comment that we check that the pattern is actually monotonic
or however you prefet to call it?

I didn't go through all tests in detail but skimmed several.  All in all
looks good to me.
So I think that means we want a V2 for the comment updates. But I think we can go ahead and consider V2 pre-approved.

jeff

Reply via email to