Re: [RFC] Proposal to support Packed Boolean Vector masks.

Richard Biener Wed, 10 Jul 2024 04:07:33 -0700

On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Tejas Belagod <tejas.bela...@arm.com> writes:
> > On 7/10/24 2:38 PM, Richard Biener wrote:
> >> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod <tejas.bela...@arm.com> 
> >> wrote:
> >>>
> >>> On 7/9/24 4:22 PM, Richard Biener wrote:
> >>>> On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod <tejas.bela...@arm.com> 
> >>>> wrote:
> >>>>>
> >>>>> On 7/8/24 4:45 PM, Richard Biener wrote:
> >>>>>> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod <tejas.bela...@arm.com> 
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Sorry to have dropped the ball on
> >>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
> >>>>>>> here I've tried to pick it up again and write up a strawman proposal for
> >>>>>>> elevating __attribute__((vector_mask)) to the FE from GIMPLE.
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Tejas.
> >>>>>>>
> >>>>>>> Motivation
> >>>>>>> ----------
> >>>>>>>
> >>>>>>> The idea of packed boolean vectors came about when we wanted to support
> >>>>>>> C/C++ operators on SVE ACLE types. The current vector boolean type that
> >>>>>>> ACLE specifies does not adequately disambiguate the vector lane sizes
> >>>>>>> from which its values were derived. Consider this simple, albeit
> >>>>>>> unrealistic, example:
> >>>>>>>
> >>>>>>>       bool foo (svint32_t a, svint32_t b)
> >>>>>>>       {
> >>>>>>>         svbool_t p = a > b;
> >>>>>>>
> >>>>>>>         // Here p[2] is not the same as a[2] > b[2].
> >>>>>>>         return p[2];
> >>>>>>>       }
> >>>>>>>
> >>>>>>> In the above example, because svbool_t has a fixed one-lane-per-byte
> >>>>>>> layout, p[i] does not return the bool value corresponding to
> >>>>>>> a[i] > b[i]. This necessitates a 'typed' vector boolean value that
> >>>>>>> unambiguously represents results of operations of the same type.
> >>>>>>>
> >>>>>>> __attribute__((vector_mask))
> >>>>>>> -----------------------------
> >>>>>>>
> >>>>>>> Note: If interested in historical discussions refer to:
> >>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
> >>>>>>>
> >>>>>>> We define a new attribute which, when applied to a base data vector
> >>>>>>> type, produces a new boolean vector type that represents the result of
> >>>>>>> operations on the corresponding base vector type. The syntax is as
> >>>>>>> follows.
> >>>>>>>
> >>>>>>>       typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
> >>>>>>>       typedef v8si v8sib __attribute__((vector_mask));
> >>>>>>>
> >>>>>>> Here the 'base' data vector type is v8si or a vector of 8 integers.
> >>>>>>>
> >>>>>>> Rules
> >>>>>>>
> >>>>>>> • The layout/size of the boolean vector type is implementation-defined
> >>>>>>> for its base data vector type.
> >>>>>>>
> >>>>>>> • Two boolean vector types whose base data vector types have the same
> >>>>>>> number of elements and lane-width have the same layout and size.
> >>>>>>>
> >>>>>>> • Consequently, two boolean vectors whose base data vector types have a
> >>>>>>> different number of elements or a different lane-size have different
> >>>>>>> layouts.
> >>>>>>>
> >>>>>>> This aligns with the GNU vector extensions, which generate integer
> >>>>>>> vectors as a result of comparisons - "The result of the comparison is a
> >>>>>>> vector of the same width and number of elements as the comparison
> >>>>>>> operands with a signed integral element type." according to
> >>>>>>>        https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
> >>>>>>
> >>>>>> Without having the time to re-review this all in detail I think the GNU
> >>>>>> vector extension does not expose the result of the comparison as the
> >>>>>> machine would produce it but instead a comparison "decays" to
> >>>>>> a conditional:
> >>>>>>
> >>>>>> typedef int v4si __attribute__((vector_size(16)));
> >>>>>>
> >>>>>> v4si a;
> >>>>>> v4si b;
> >>>>>>
> >>>>>> void foo()
> >>>>>> {
> >>>>>>      auto r = a < b;
> >>>>>> }
> >>>>>>
> >>>>>> produces, with C23:
> >>>>>>
> >>>>>>      vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ;
> >>>>>>
> >>>>>> In fact on x86_64 with AVX and AVX512 you have two different "machine
> >>>>>> produced" mask types and the above could either produce an AVX mask with
> >>>>>> 32-bit elements or an AVX512 mask with 1-bit elements.
> >>>>>>
> >>>>>> Not exposing "native" mask types requires the compiler to optimize
> >>>>>> subsequent uses and makes generic vectors difficult to combine with, for
> >>>>>> example, AVX512 intrinsics (where masks are just 'int').  Across an ABI
> >>>>>> boundary it's even more difficult to optimize mask transitions.
> >>>>>>
> >>>>>> But it at least allows portable code and it does not suffer from users
> >>>>>> trying to expose machine representations of masks as input to generic
> >>>>>> vector code, with all the problems of constant folding requiring not only
> >>>>>> self-consistent code within the compiler but also compatibility with
> >>>>>> user-produced constant masks.
> >>>>>>
> >>>>>> That said, I somewhat question the need to expose the target mask layout
> >>>>>> to users for GCC's generic vector extension.
> >>>>>>
> >>>>>
> >>>>> Thanks for your feedback.
> >>>>>
> >>>>> IIUC, I can imagine how having a GNU vector extension exposing the
> >>>>> target vector mask layout can pose a challenge - maybe making it a
> >>>>> generic GNU vector extension was too ambitious. I wonder if there's
> >>>>> value in pursuing these alternate paths?
> >>>>>
> >>>>> 1. Can implementing this extension in a 'generic' way i.e. possibly not
> >>>>> implement it with a target mask, but just a generic int vector, still
> >>>>> maintain the consistency of GNU predicate vectors within the compiler? I
> >>>>> know it may not seem very different from how boolean vectors are
> >>>>> currently implemented (as in your above example), but, having the
> >>>>> __attribute__((vector_mask)) as a 'property' of the object makes it
> >>>>> useful to optimize its uses to target predicates in subsequent stages of
> >>>>> the compiler.
> >>>>>
> >>>>> 2. Restricting __attribute__((vector_mask)) to apply only to target
> >>>>> intrinsic types? Eg.
> >>>>>
> >>>>> On SVE something like:
> >>>>> typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.
> >>>>>
> >>>>> On AVX, something like:
> >>>>> typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though
> >>>>> this would require more fine-grained defn of lane-size to mask-bits 
> >>>>> mapping.
> >>>>
> >>>> I think the target should be able to register builtin types already which
> >>>> intrinsics could use.  There is already the vector_mask attribute but only
> >>>> for GIMPLE and it has the same limitation of querying the target for the
> >>>> actual mode being used - for AVX vs AVX512 one might be able to
> >>>> combine this with a mode attribute.  Not sure if on arm you can parse
> >>>> __attribute__((mode("Vx4BI4"))) or what the modes are called.
> >>>>
> >>>> But when you are talking about intrinsics I'd really suggest leaving the
> >>>> type creation to the target rather than trying to do a typedef in a header.
> >>>>
> >>>
> >>> Yeah, thinking about this a bit more, makes sense to keep intrinsic type
> >>> creation in the target realm.
> >>>
> >>> Just to clarify if I understand your point about exposing masks' machine
> >>> representations, would representing vector_mask types using opaque
> >>> types/modes have the same challenges with compatibility with generic
> >>> vector constants as it essentially would be a parallel type system, and
> >>> would be unaffected by constant-folding etc due to their opacity? I ask
> >>> because opacity might give the representation the flexibility of
> >>> 'decaying' to a type based on the context it is used in.
> >>
> >> I also thought about using an opaque type but I wonder if it really suits
> >> here?
> >
> > Sorry, yes using opaque type was your idea from last year's thread - I
> > merely reiterated it here. :-)
> >
> >> Or would the target then need to decay a mask[i] into something
> >> that's later recognizable?
> >>
> >
> > I think that would depend on the usage, wouldn't it - it could lower
> > down to target insn(s) based on whether, e.g., it's used as a test
> > or read as a scalar value?
> >
> >
> >> So I guess the answer is you'd have to try.
> >
> > Thanks for your feedback so far - much appreciated. If it helps, I will
> > try to write up a prototype to test the idea - might help clear the mist
> > further.
>
> Just to note that one of the original motivations (that applies more
> to option 3 from last year's proposal) was to add support for general
> packed vector boolean types to the GNU vector extension, as a feature
> independent of the target's "native" format(s).  Clang already supports
> this via ext_vector_type and it seemed like there might be value in
> providing something similar for the GNU extensions.


But that's more for data, aka vector bool, not for what's produced by
targets from vector comparisons?  So yes, I suppose that's reasonable
but representation would then be fully defined by the extension
rather than by however the target computes the actual comparison
result vector.
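
To make that distinction concrete, here is a minimal sketch in plain GNU C of
a "data" mask whose layout is fixed by the code rather than by the target. The
names less_than, pack_mask and mask_lane are hypothetical helpers made up for
illustration, and the lane-i-to-bit-i packing is just one possible convention,
not something the proposal or any existing extension specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Generic GNU vector type: 4 lanes of 32-bit int. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* GNU vector comparison: each result lane is -1 (true) or 0 (false),
   independent of how the target natively represents masks. */
v4si less_than(v4si a, v4si b)
{
    return a < b;
}

/* Hypothetical helper: pack the lane mask into one bit per lane,
   with lane i stored in bit i (an assumed convention). */
uint8_t pack_mask(v4si m)
{
    uint8_t bits = 0;
    for (int i = 0; i < 4; i++)
        bits |= (uint8_t)((m[i] != 0) << i);
    return bits;
}

/* Hypothetical helper: read lane i back out of the packed mask. */
int mask_lane(uint8_t bits, int i)
{
    assert(i >= 0 && i < 4);
    return (bits >> i) & 1;
}
```

With this convention, mask_lane(pack_mask(a < b), i) agrees with a[i] < b[i]
on every target, whether the comparison itself was computed as an AVX-style
32-bit-per-lane mask or an AVX512/SVE-style predicate.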

Richard.

> Thanks,
> Richard
>
>
