On Mon, 29 Jul 2024, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Mon, 29 Jul 2024, Jakub Jelinek wrote:
> >> And, for the GET_MODE_INNER, I also meant it for Aarch64/RISC-V VL vectors,
> >> I think those should be considered as true by the hook, not false
> >> because maybe_ne.
> >
> > I don't think relevant modes will have size/precision mismatches
> > and maybe_ne should work here.  Richard?
> 
> Yeah, I think that's true for AArch64 at least (not sure about RVV).
> 
> One wrinkle is that VNx16BI (every bit of a predicate) is technically
> suitable for memcpy, even though it would be a bad choice performance-wise.
> But VNx8BI (every even bit of a predicate) wouldn't, since the odd bits
> are undefined on read.
> 
> Arguably, this means that VNx8BI has the wrong precision, but like you
> say, we don't (AFAIK) support bitsize != precision for vector modes.
> Instead, the information that there is only one meaningful bit per
> boolean is represented by having an inner mode of BI.  Both VNx16BI
> and VNx8BI have an inner mode of BI, which means that VNx8BI's
> precision is not equal to its nunits * its unit precision.
> 
> So I suppose:
> 
>   maybe_ne (GET_MODE_BITSIZE (mode),
>             GET_MODE_UNIT_PRECISION (mode) * GET_MODE_NUNITS (mode))
> 
> would capture this.

OK, I'll adjust like this.
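
For reference, a minimal standalone sketch of what that condition
computes.  This is not the actual patch: struct poly2 is a hypothetical
stand-in for GCC's poly_uint64 (value = c0 + c1 * x for a runtime
x >= 0), and the toy_* names and mode values are assumptions made for
illustration (16 predicate bits per 128-bit vector granule, with 16
one-bit units for VNx16BI and 8 for VNx8BI).

```c
#include <stdbool.h>

/* Hypothetical two-coefficient poly int: value = c0 + c1 * x, x >= 0.  */
struct poly2 { unsigned long c0, c1; };

/* Toy description of a vector mode; not GCC's real mode tables.  */
struct toy_mode
{
  const char *name;
  struct poly2 bitsize;          /* models GET_MODE_BITSIZE */
  unsigned long unit_precision;  /* models GET_MODE_UNIT_PRECISION */
  struct poly2 nunits;           /* models GET_MODE_NUNITS */
};

/* True if A and B differ for at least one x >= 0, which is the case
   exactly when some coefficient differs.  */
bool
poly2_maybe_ne (struct poly2 a, struct poly2 b)
{
  return a.c0 != b.c0 || a.c1 != b.c1;
}

/* Multiply a poly int by a compile-time constant.  */
struct poly2
poly2_scale (unsigned long k, struct poly2 a)
{
  struct poly2 r = { k * a.c0, k * a.c1 };
  return r;
}

/* The condition from the mail: the mode may have bits that are not
   covered by nunits * unit precision.  */
bool
toy_may_have_padding_bits (const struct toy_mode *m)
{
  return poly2_maybe_ne (m->bitsize,
                         poly2_scale (m->unit_precision, m->nunits));
}

/* Assumed values: every bit of the predicate is meaningful.  */
const struct toy_mode toy_vnx16bi = { "VNx16BI", { 16, 16 }, 1, { 16, 16 } };
/* Assumed values: only every even bit is meaningful.  */
const struct toy_mode toy_vnx8bi = { "VNx8BI", { 16, 16 }, 1, { 8, 8 } };
/* Hypothetical mode with a 2-bit scalar boolean as its inner mode.  */
const struct toy_mode toy_2bit_bool = { "toy2bit", { 16, 16 }, 2, { 8, 8 } };
```

Under these assumptions the check rejects only the VNx8BI-like mode;
the VNx16BI-like mode and the hypothetical 2-bit boolean mode have
bitsize equal to nunits * unit precision for every x.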

> Targets that want a vector bool mode with 2 meaningful bits per boolean
> are expected to define a 2-bit scalar boolean mode and use that as the
> inner mode.  So I think the condition above would (correctly) continue
> to allow those.

Hmm, but I think SVE mask registers could be used to transfer bits?
I tried the following

typedef svint64_t v4dfm __attribute__((vector_mask));

void __GIMPLE(ssa) foo(void *p)
{
  v4dfm _2;

__BB(2):
  _2 = __MEM <v4dfm> ((v4dfm *)p);
  __MEM <v4dfm> ((v4dfm *)p + 128) = _2;
  return;
}

and it produces

        ldr     p15, [x0]
        add     x0, x0, 128
        str     p15, [x0]

exactly the same code as when using svint8_t, whose mask type gets
signed-boolean:1 rather than signed-boolean:8.  So the fact that
mask-producing instructions give you undefined bits doesn't mean
that reg<->mem moves do the same, since the predicate registers
don't know what mode they operate in?

It might of course be prohibitive to copy memory like this
and there might not be GPR <-> predicate reg moves.

But technically ... for SVE predicates there aren't even any
types less than 8 bits in size (as there are for GCN and AVX512).

Richard.
