Re: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

juzhe.zh...@rivai.ai Thu, 15 Jun 2023 03:09:05 -0700

Hi, Richi. I have sent the first splitted patch (only add ifn and optabs) as 
you suggested.
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621874.html 
Could you take a look at it?
After this patch is approved, I will send the second patch (Support them into 
vectorizer) next.

Thanks!

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-06-15 17:52
To: Robin Dapp
CC: juzhe.zh...@rivai.ai; gcc-patches; richard.sandiford; krebbel; uweigand; 
linkw
Subject: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow 
control for length loop control
On Thu, 15 Jun 2023, Robin Dapp wrote:

> > Meh, PoP is now behind a paywall, trying to get through ... I wonder
> > if there's a nice online html documenting the s390 len_load/store
> > instructions to better understand the need for the bias.
> 
> https://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
> 
> Look for vector load with length (store).  The length operand specifies
> the highest bytes to load instead of the actual length.

Hmm.  It indeed cannot represent len == 0, so you are making sure
that never happens?  Because when it is actually zero you are
going to get -1 here?  At least I don't see the bias operand used at
all:

; Implement len_load/len_store optabs with vll/vstl.
(define_expand "len_load_v16qi"
  [(match_operand:V16QI 0 "register_operand")
   (match_operand:V16QI 1 "memory_operand")
   (match_operand:QI 2 "register_operand")
   (match_operand:QI 3 "vll_bias_operand")
  ]
  "TARGET_VX && TARGET_64BIT"
{
  rtx mem = adjust_address (operands[1], BLKmode, 0);

  rtx len = gen_reg_rtx (SImode);
  emit_move_insn (len, gen_rtx_ZERO_EXTEND (SImode, operands[2]));
  emit_insn (gen_vllv16qi (operands[0], len, mem));
  DONE;
})

the docs of len_load say

"
@cindex @code{len_load_@var{m}} instruction pattern
@item @samp{len_load_@var{m}}
Load (operand 2 - operand 3) elements from vector memory operand 1
into vector register operand 0, setting the other elements of
operand 0 to undefined values.  Operands 0 and 1 have mode @var{m},
which must be a vector mode.  Operand 2 has whichever integer mode the
target prefers.  Operand 3 conceptually has mode @code{QI}. 

Operand 2 can be a variable or a constant amount.  Operand 3 specifies a
constant bias: it is either a constant 0 or a constant -1.  The predicate 
on
operand 3 must only accept the bias values that the target actually 
supports.
GCC handles a bias of 0 more efficiently than a bias of -1.

If (operand 2 - operand 3) exceeds the number of elements in mode
@var{m}, the behavior is undefined.

If the target prefers the length to be measured in bytes rather than
elements, it should only implement this pattern for vectors of @code{QI}
elements."

the minus in 'operand 2 - operand 3' should be a plus if the
bias is really zero or -1.  I suppose

'If (operand 2 - operand 3) exceeds the number of elements in mode
@var{m}, the behavior is undefined.'

means that the vectorizer has to make sure the biased element
count never underflows?

That is, for a loop like

void foo (double *x, float *y, int n)
{
  for (int i = 0; i < n; ++i)
    y[i] = x[i];
}

you should get

   x1 = len_load (...);
   x2 = len_load (...);
   y = VEC_PACK_TRUNC_EXPR <x1, x2>
   len_store (..., y);

but then the x2 load can end up with a len of zero and thus
trap (since you will load either a full vector or the first
byte of it).  I see you do

  /* If the backend requires a bias of -1 for LEN_LOAD, we must not emit
     len_loads with a length of zero.  In order to avoid that we prohibit
     more than one loop length here.  */
  if (partial_load_bias == -1
      && LOOP_VINFO_LENS (loop_vinfo).length () > 1)
    return false;

that's quite conservative.  I think you can do better when the
loads are aligned, reading an extra byte when ignoring the bias
is OK and you at least know the very first element is used.
For stores you would need to emit compare&jump for all but
the first store of a group though ...

That said, I'm still not seeing where you actually apply the bias.

Richard.

Re: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

Reply via email to