RE: [PATCH 1/3][RFC] Implement bool reduction vectorization

Richard Biener Wed, 15 Oct 2025 02:16:13 -0700

On Wed, 15 Oct 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Robin Dapp <[email protected]>
> > Sent: 15 October 2025 09:12
> > To: Richard Biener <[email protected]>; [email protected]
> > Cc: Tamar Christina <[email protected]>; [email protected]
> > Subject: Re: [PATCH 1/3][RFC] Implement bool reduction vectorization
> > 
> > > Currently we mess up here in two places.  One is pattern recognition
> > > which computes a mask-precision for a bool reduction PHI that's
> > > inconsistent with that of the latch definition.  This is solved by
> > > iterating the mask-precision computation.  The second is that the
> > > reduction epilogue generation and the code querying support for it
> > > aren't ready for mask inputs.  This should be fixed as well, mainly
> > > by doing all the epilogue processing on a data type again, for now
> > > at least.  We probably want reduc_mask_{and,ior,xor}_scal optabs
> > > so we can go the direct IFN path on masks if the target supports
> > > that.
> > 
> > Why not reuse the regular reduc_{and,...}_scal optabs allowing mask modes
> > similar to vec_extract?
> 
> Likely because the reduction operators need to be well defined to return a
> boolean value and not an inorder reduction as the existing reduc_{}_scal
> require.


SSE/AVX have regular integer vector modes as mask modes, so there's a lack
of information that the V4SImode is actually a -1/0 boolean where this
knowledge allows for more efficient code generation.
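
To illustrate, here is a minimal sketch in portable C, not taken from the
patch: once every lane is known to be 0 or -1, one bit per lane carries all
the information, so the three reductions become scalar tests on a small
bitmask.  The names lane_sign_bits and mask_reduc_{and,ior,xor} are
hypothetical; the sign-bit loop stands in for what an instruction such as
x86 movmskps does in one step, and __builtin_parity is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Collect one bit per lane from a mask whose lanes are all 0 or -1.
   Hypothetical helper; a scalar stand-in for a movmsk-style instruction.  */
static unsigned lane_sign_bits(const int32_t *mask, unsigned n)
{
    assert(n > 0 && n < 32);   /* result must fit in an unsigned */
    unsigned bits = 0;
    for (unsigned i = 0; i < n; i++)
        bits |= (unsigned)(mask[i] < 0) << i;
    return bits;
}

/* IOR reduction: true if any lane is set.  */
static bool mask_reduc_ior(const int32_t *mask, unsigned n)
{
    return lane_sign_bits(mask, n) != 0;
}

/* AND reduction: true if all n lanes are set.  */
static bool mask_reduc_and(const int32_t *mask, unsigned n)
{
    return lane_sign_bits(mask, n) == (1u << n) - 1u;
}

/* XOR reduction: true if an odd number of lanes are set.  */
static bool mask_reduc_xor(const int32_t *mask, unsigned n)
{
    return __builtin_parity(lane_sign_bits(mask, n)) != 0;
}
```

For the four-lane mask {-1, 0, -1, -1} the collected bits are 0xd, so the
IOR and XOR reductions are true and the AND reduction is false.  A generic
integer reduction on V4SImode cannot take this shortcut because it has to
assume arbitrary lane values.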

> So for e.g. Adv. SIMD where the VECTOR_BOOLEAN_TYPE is defined as an integer
> vector e.g. V4SI, the current reduction optabs would say we return SI and
> the result needs to be {-1, 0}.  But the knowledge that we want to reduce to
> a boolean helps us here since we can generate better code knowing that the
> actual result is only {1,0}.
> 
> For instance on SVE the reduction of VNx16BImode to BImode is not something
> we can do since masks aren't supposed to be treated as values, only used to
> compute values.

We'd need to deviate from the scalar result mode being the vector 
component mode (that's what my proposal does).

I'll send an updated patch ASAP.  I'm not exactly mirroring
vec_pack_sbool_trunc, which we seem to use only for the case where
the mode is ambiguous (as we've added this late IIRC), but instead
have reduc_sbool_{and,ior,xor}_scal take an additional CONST_INT
operand, but only when the mode is scalar integer.  Not sure how
well a "conditional extra operand" will play out in practice, but
at least I do not have to worry about creating a POLY_INT there for
the VLA case ... (and I didn't want to have two different optabs).
For the unambiguous cases the operand should be easy to ignore;
having it for all integer mode inputs might help unify the patterns.

We'll see.

Richard.


> Thanks,
> Tamar
> 
> > I guess the scalar mode might be controversial then but we already have that
> > when extracting from masks where RVV needs to support both, QImode and
> > BImode.
> > (Something I've been wanting to fix for a while...)
> > 
> > > [2/3] adds these optabs and [3/3] is a way to use them but with no
> > > actual target implementation (I stubbed a x86 one for one case to
> > > get the code miscompiled^Wexercised).
> > >
> > > I wonder if there's any feedback on the epilogue handling, esp. how
> > > SVE or RVV would be able to handle vector mask reduction to a
> > > scalar bool in the epilogue.  Esp. whether you think there is already
> > > sufficient functionality that I just didn't see.
> > 
> > We'd need to emulate them as we, generally, only have unary/binary
> > operations on masks.  What we do have is popcount on a mask with a scalar
> > destination, so "mask_popcount" = "popcount + extract_first".
> > 
> > This itself is of course not exposed as optab yet.  Likely wouldn't be
> > generally useful as it includes an extract?
> > 
> > Anyway, the reductions could look like:
> > 
> > reduc_xor (mask) = ("mask_popcount" (mask, other_mask, len, ...) & 1)
> > reduc_ior (mask) = ("mask_popcount" (...) != 0)
> > reduc_and (mask) = ("mask_popcount" (...) == len)
> > 
> > if I didn't mess up the bit-foo.  The comparisons we'd perform in the
> > scalar domain.
> 
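
The popcount-based emulation quoted above can be sketched in plain C as
follows.  This is an illustration under simplifying assumptions, not the RVV
or SVE implementation: the mask is modelled as an array of bool,
mask_popcount here is a hypothetical helper taking only the mask and the
active length len (the other_mask operand from the quoted formulas is left
out), and the comparisons happen in the scalar domain as described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for "mask_popcount": count the set elements among
   the first len elements of the mask and return the count as a scalar.  */
static size_t mask_popcount(const bool *mask, size_t len)
{
    assert(mask != NULL || len == 0);
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        count += mask[i];
    return count;
}

/* XOR reduction: the result is the parity of the number of set elements.  */
static bool reduc_xor(const bool *mask, size_t len)
{
    return (mask_popcount(mask, len) & 1) != 0;
}

/* IOR reduction: true if at least one element is set.  */
static bool reduc_ior(const bool *mask, size_t len)
{
    return mask_popcount(mask, len) != 0;
}

/* AND reduction: true if every active element is set.  */
static bool reduc_and(const bool *mask, size_t len)
{
    return mask_popcount(mask, len) == len;
}
```

With len == 0 the three functions return false for XOR and IOR and true for
AND, which matches the identity elements of the respective operations.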

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
