On 6/16/23 04:29, juzhe.zh...@rivai.ai wrote:
From: Ju-Zhe Zhong <juzhe.zh...@rivai.ai>

This patch bootstrap pass on X86, ok for trunk ?

Accoding to comments from Richi, split the first patch to add ifn && optabs
of LEN_MASK_{LOAD,STORE} only, we don't apply them into vectorizer in this
patch. And also add BIAS argument for possible s390's future use.

The description of the patterns in doc are coming Robin.

After this patch is approved, will send the second patch to apply len_mask_*
patterns into vectorizer.

Target like ARM SVE in GCC has an elegant way to handle both loop control
and flow control simultaneously:

loop_control_mask = WHILE_ULT
flow_control_mask = comparison
control_mask = loop_control_mask & flow_control_mask;
MASK_LOAD (control_mask)
MASK_STORE (control_mask)

However, targets like RVV (RISC-V Vector) can not use this approach in
auto-vectorization since RVV use length in loop control.

This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
like RISC-V that uses length in loop control.
Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length
or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
Mask is the outcome of comparison.

LEN_MASK_ LOAD/STORE format is defined as follows:
1). LEN_MASK_LOAD (ptr, align, length, mask).
2). LEN_MASK_STORE (ptr, align, length, mask, vec).

Consider these 4 following cases:

VLA: Variable-length auto-vectorization
VLS: Specific-length auto-vectorization

Case 1 (VLS): -mrvv-vector-bits=128   IR (Does not use LEN_MASK_*):
Code:                                   v1 = MEM (...)
   for (int i = 0; i < 4; i++)           v2 = MEM (...)
     a[i] = b[i] + c[i];                 v3 = v1 + v2
                                         MEM[...] = v3

Case 2 (VLS): -mrvv-vector-bits=128   IR (LEN_MASK_* with length = VF, mask = 
comparison):
Code:                                   mask = comparison
   for (int i = 0; i < 4; i++)           v1 = LEN_MASK_LOAD (length = VF, mask)
     if (cond[i])                        v2 = LEN_MASK_LOAD (length = VF, mask)
       a[i] = b[i] + c[i];               v3 = v1 + v2
                                         LEN_MASK_STORE (length = VF, mask, v3)
Case 3 (VLA):
Code:                                   loop_len = SELECT_VL or MIN
   for (int i = 0; i < n; i++)           v1 = LEN_MASK_LOAD (length = loop_len, 
mask = {-1,-1,...})
       a[i] = b[i] + c[i];               v2 = LEN_MASK_LOAD (length = loop_len, 
mask = {-1,-1,...})
                                         v3 = v1 + v2
                                         LEN_MASK_STORE (length = loop_len, 
mask = {-1,-1,...}, v3)

Case 4 (VLA):
Code:                                   loop_len = SELECT_VL or MIN
   for (int i = 0; i < n; i++)           mask = comparison
       if (cond[i])                      v1 = LEN_MASK_LOAD (length = loop_len, 
mask)
       a[i] = b[i] + c[i];               v2 = LEN_MASK_LOAD (length = loop_len, 
mask)
                                         v3 = v1 + v2
                                         LEN_MASK_STORE (length = loop_len, 
mask, v3)

Co-authored-by: Robin Dapp <rdapp....@gmail.com>

gcc/ChangeLog:

         * doc/md.texi: Add len_mask{load,store}.
         * genopinit.cc (main): Ditto.
         (CMP_NAME): Ditto.
         * internal-fn.cc (len_maskload_direct): Ditto.
         (len_maskstore_direct): Ditto.
         (expand_call_mem_ref): Ditto.
         (expand_partial_load_optab_fn): Ditto.
         (expand_len_maskload_optab_fn): Ditto.
         (expand_partial_store_optab_fn): Ditto.
         (expand_len_maskstore_optab_fn): Ditto.
         (direct_len_maskload_optab_supported_p): Ditto.
         (direct_len_maskstore_optab_supported_p): Ditto.
         * internal-fn.def (LEN_MASK_LOAD): Ditto.
         (LEN_MASK_STORE): Ditto.
         * optabs.def (OPTAB_CD): Ditto.
You should probably mention the change to len_load in the ChangeLog entry for md.texi.



@@ -5136,6 +5136,57 @@ of @code{QI} elements.
This pattern is not allowed to @code{FAIL}. +@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
+@item @samp{len_maskload@var{m}@var{n}}
+Perform a masked load of (operand 2 + operand 4) elements from vector memory
+operand 1 into vector register operand 0, setting the other elements of
+operand 0 to undefined values.  This is a combination of len_load and maskload.
+Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2
+has whichever integer mode the target prefers.  A mask is specified in
+operand 3 which must be of type @var{n}.  The mask has lower precedence than
+the length and is itself subject to length masking,
+i.e. only mask indices < (operand 2 + operand 4) are used.
+Operand 4 conceptually has mode @code{QI}.s/vector memory/memory/

That first sentence is a bit difficult to parse. Perhaps break it into two sentences. Something like this?

Perform a masked load from the memory location pointed to be operand 1 into register operand 0. operand 2 + operand 4 elements are loaded from memory and other elements in operand 0 are set to undefined values.

Presumably the element length is encoded by the modes? Is this worth mentioning?


+
+This pattern is not allowed to @code{FAIL}.
If the pattern is not allowed to fail, then what code enforces the bias argument's restrictions? I don't see it in the generic expander code.


+
+@cindex @code{len_maskstore@var{m}@var{n}} instruction pattern
+@item @samp{len_maskstore@var{m}@var{n}}
+Perform a masked store of (operand 2 + operand 4) vector elements from vector 
register
+operand 1 into memory operand 0, leaving the other elements of operand 0 
unchanged.
+This is a combination of len_store and maskstore.
I'd think this could be adjusted in roughly the same manner as I suggested for loads.


diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index 0c1b6859ca0..6bd8858a1d9 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc

@@ -386,7 +387,8 @@ main (int argc, const char **argv)
      {
  #define CMP_NAME(N) !strncmp (p->name, (N), strlen ((N)))
        if (CMP_NAME("while_ult") || CMP_NAME ("len_load")
-         || CMP_NAME ("len_store"))
+         || CMP_NAME ("len_store")|| CMP_NAME ("len_maskload")
Whitespace nit.  Insert a space before the "||".


Looks reasonable to me. I think we're going to need a V2, but I don't see major conceptual issues, just minor details to work through.

jeff

Reply via email to