Re: [RFC] Proposal to support Packed Boolean Vector masks.

Richard Biener Thu, 11 Jul 2024 23:16:45 -0700

On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod <tejas.bela...@arm.com> wrote:
>
> On 7/10/24 4:37 PM, Richard Biener wrote:
> > On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Tejas Belagod <tejas.bela...@arm.com> writes:
> >>> On 7/10/24 2:38 PM, Richard Biener wrote:
> >>>> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod <tejas.bela...@arm.com> 
> >>>> wrote:
> >>>>>
> >>>>> On 7/9/24 4:22 PM, Richard Biener wrote:
> >>>>>> On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod <tejas.bela...@arm.com> 
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> On 7/8/24 4:45 PM, Richard Biener wrote:
> >>>>>>>> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod 
> >>>>>>>> <tejas.bela...@arm.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> Sorry to have dropped the ball on
> >>>>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
> >>>>>>>>> here I've tried to pick it up again and write up a strawman 
> >>>>>>>>> proposal for
> >>>>>>>>> elevating __attribute__((vector_mask)) to the FE from GIMPLE.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Tejas.
> >>>>>>>>>
> >>>>>>>>> Motivation
> >>>>>>>>> ----------
> >>>>>>>>>
> >>>>>>>>> The idea of packed boolean vectors came about when we wanted to 
> >>>>>>>>> support
> >>>>>>>>> C/C++ operators on SVE ACLE types. The current vector boolean type 
> >>>>>>>>> that
> >>>>>>>>> ACLE specifies does not adequately disambiguate vector lane sizes 
> >>>>>>>>> which
> >>>>>>>>> they were derived off of. Consider this simple, albeit unrealistic, 
> >>>>>>>>> example:
> >>>>>>>>>
> >>>>>>>>>        bool foo (svint32_t a, svint32_t b)
> >>>>>>>>>        {
> >>>>>>>>>          svbool_t p = a > b;
> >>>>>>>>>
> >>>>>>>>>          // Here p[2] is not the same as a[2] > b[2].
> >>>>>>>>>          return p[2];
> >>>>>>>>>        }
> >>>>>>>>>
> >>>>>>>>> In the above example, because svbool_t has a fixed 1-lane-per-byte, 
> >>>>>>>>> p[i]
> >>>>>>>>> does not return the bool value corresponding to a[i] > b[i]. This
> >>>>>>>>> necessitates a 'typed' vector boolean value that unambiguously
> >>>>>>>>> represents results of operations
> >>>>>>>>> of the same type.
> >>>>>>>>>
> >>>>>>>>> __attribute__((vector_mask))
> >>>>>>>>> -----------------------------
> >>>>>>>>>
> >>>>>>>>> Note: If interested in historical discussions refer to:
> >>>>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
> >>>>>>>>>
> >>>>>>>>> We define this new attribute which when applied to a base data 
> >>>>>>>>> vector
> >>>>>>>>> produces a new boolean vector type that represents a boolean type 
> >>>>>>>>> that
> >>>>>>>>> is produced as a result of operations on the corresponding base 
> >>>>>>>>> vector
> >>>>>>>>> type. The following is the syntax.
> >>>>>>>>>
> >>>>>>>>>        typedef int v8si __attribute__((vector_size (8 * sizeof 
> >>>>>>>>> (int)));
> >>>>>>>>>        typedef v8si v8sib __attribute__((vector_mask));
> >>>>>>>>>
> >>>>>>>>> Here the 'base' data vector type is v8si or a vector of 8 integers.
> >>>>>>>>>
> >>>>>>>>> Rules
> >>>>>>>>>
> >>>>>>>>> • The layout/size of the boolean vector type is 
> >>>>>>>>> implementation-defined
> >>>>>>>>> for its base data vector type.
> >>>>>>>>>
> >>>>>>>>> • Two boolean vector types who's base data vector types have same 
> >>>>>>>>> number
> >>>>>>>>> of elements and lane-width have the same layout and size.
> >>>>>>>>>
> >>>>>>>>> • Consequently, two boolean vectors who's base data vector types 
> >>>>>>>>> have
> >>>>>>>>> different number of elements or different lane-size have different 
> >>>>>>>>> layouts.
> >>>>>>>>>
> >>>>>>>>> This aligns with gnu vector extensions that generate integer 
> >>>>>>>>> vectors as
> >>>>>>>>> a result of comparisons - "The result of the comparison is a vector 
> >>>>>>>>> of
> >>>>>>>>> the same width and number of elements as the comparison operands 
> >>>>>>>>> with a
> >>>>>>>>> signed integral element type." according to
> >>>>>>>>>         https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
> >>>>>>>>
> >>>>>>>> Without having the time to re-review this all in detail I think the 
> >>>>>>>> GNU
> >>>>>>>> vector extension does not expose the result of the comparison as the
> >>>>>>>> machine would produce it but instead a comparison "decays" to
> >>>>>>>> a conditional:
> >>>>>>>>
> >>>>>>>> typedef int v4si __attribute__((vector_size(16)));
> >>>>>>>>
> >>>>>>>> v4si a;
> >>>>>>>> v4si b;
> >>>>>>>>
> >>>>>>>> void foo()
> >>>>>>>> {
> >>>>>>>>       auto r = a < b;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> produces, with C23:
> >>>>>>>>
> >>>>>>>>       vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } 
> >>>>>>>> , { 0,
> >>>>>>>> 0, 0, 0 } > ;
> >>>>>>>>
> >>>>>>>> In fact on x86_64 with AVX and AVX512 you have two different "machine
> >>>>>>>> produced" mask types and the above could either produce a AVX mask 
> >>>>>>>> with
> >>>>>>>> 32bit elements or a AVX512 mask with 1bit elements.
> >>>>>>>>
> >>>>>>>> Not exposing "native" mask types requires the compiler optimizing 
> >>>>>>>> subsequent
> >>>>>>>> uses and makes generic vectors difficult to combine with for example 
> >>>>>>>> AVX512
> >>>>>>>> intrinsics (where masks are just 'int').  Across an ABI boundary 
> >>>>>>>> it's also
> >>>>>>>> even more difficult to optimize mask transitions.
> >>>>>>>>
> >>>>>>>> But it at least allows portable code and it does not suffer from 
> >>>>>>>> users trying to
> >>>>>>>> expose machine representations of masks as input to generic vector 
> >>>>>>>> code
> >>>>>>>> with all the problems of constant folding not only requiring 
> >>>>>>>> self-consistent
> >>>>>>>> code within the compiler but compatibility with user produced 
> >>>>>>>> constant masks.
> >>>>>>>>
> >>>>>>>> That said, I somewhat question the need to expose the target mask 
> >>>>>>>> layout
> >>>>>>>> to users for GCCs generic vector extension.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Thanks for your feedback.
> >>>>>>>
> >>>>>>> IIUC, I can imagine how having a GNU vector extension exposing the
> >>>>>>> target vector mask layout can pose a challenge - maybe making it a
> >>>>>>> generic GNU vector extension was too ambitious. I wonder if there's
> >>>>>>> value in pursuing these alternate paths?
> >>>>>>>
> >>>>>>> 1. Can implementing this extension in a 'generic' way i.e. possibly 
> >>>>>>> not
> >>>>>>> implement it with a target mask, but just a generic int vector, still
> >>>>>>> maintain the consistency of GNU predicate vectors within the 
> >>>>>>> compiler? I
> >>>>>>> know it may not seem very different from how boolean vectors are
> >>>>>>> currently implemented (as in your above example), but, having the
> >>>>>>> __attribute__((vector_mask)) as a 'property' of the object makes it
> >>>>>>> useful to optimize its uses to target predicates in subsequent stages 
> >>>>>>> of
> >>>>>>> the compiler.
> >>>>>>>
> >>>>>>> 2. Restricting __attribute__((vector_mask)) to apply only to target
> >>>>>>> intrinsic types? Eg.
> >>>>>>>
> >>>>>>> On SVE something like:
> >>>>>>> typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.
> >>>>>>>
> >>>>>>> On AVX, something like:
> >>>>>>> typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though
> >>>>>>> this would require more fine-grained defn of lane-size to mask-bits 
> >>>>>>> mapping.
> >>>>>>
> >>>>>> I think the target should be able to register builtin types already 
> >>>>>> which
> >>>>>> intrinsics could use.  There is already the vector_mask attribute but 
> >>>>>> only
> >>>>>> for GIMPLE and it has the same limitation of querying the target for 
> >>>>>> the
> >>>>>> actual mode being used - for AVX vs AVX512 one might be able to
> >>>>>> combine this with a mode attribute.  Not sure if on arm you can parse
> >>>>>> __attribute__((mode("Vx4BI4"))) or how the modes are called.
> >>>>>>
> >>>>>> But when you are talking about intrinsics I'd really suggest to leave 
> >>>>>> the
> >>>>>> type creation to the target rather than trying to do a typedef in a 
> >>>>>> header?
> >>>>>>
> >>>>>
> >>>>> Yeah, thinking about this a bit more, makes sense to keep intrinsic type
> >>>>> creation in the target realm.
> >>>>>
> >>>>> Just to clarify if I understand your point about exposing masks' machine
> >>>>> representations, would representing vector_mask types using opaque
> >>>>> types/modes have the same challenges with compatibility with generic
> >>>>> vector constants as it essentially would be a parallel type system, and
> >>>>> would be unaffected by constant-folding etc due to their opacity? I ask
> >>>>> because opacity might give the representation the flexibility of
> >>>>> 'decaying' to a type based on the context it is used in.
> >>>>
> >>>> I also thought about using an opaque type but I wonder if it really suits
> >>>> here?
> >>>
> >>> Sorry, yes using opaque type was your idea from last year's thread - I
> >>> merely reiterated it here. :-)
> >>>
> >>> Or would the target then need to decay a mask[i] into something
> >>>> that's later recognizable?
> >>>>
> >>>
> >>> I think that would depend on the usage, wouldn't it - it could lower
> >>> down to target insn(s) based on how whether, for eg, its used as a test
> >>> or read as a scalar value?
> >>>
> >>>
> >>>> So I guess the answer is you'd have to try.
> >>>
> >>> Thanks for your feedback so far - much appreciated. If it helps, I will
> >>> try to write up a prototype to test the idea - might help clear the mist
> >>> further.
> >>
> >> Just to note that one of the original motivations (that applies more
> >> to option 3 from last year's proposal) was to add support for general
> >> packed vector boolean types to the GNU vector extension, as a feature
> >> independent of the target's "native" format(s).  Clang already supports
> >> this via ext_vector_type and it seemed like there might be value in
> >> providing something similar for the GNU extensions.
> >
> > But that's more for data, aka vector bool, not for what's produced by
> > targets from vector comparisons?  So yes, I suppose that's reasonable
> > but representation would then be fully defined by the extension
> > rather than by however the target computes the actual comparison
> > result vector.
> >
>
> Sorry for the slow response.
>
> Thanks RichardS for your timely comment. Sorry, I might have gotten
> ambitious with the original vector bool proposal and went down the route
> of supporting 'native' formats with vector_mask, but scaling my
> ambitions back to a boolean vector of a certain representation that is
> independent of the target's native format and defined by the extension
> itself is a more realistic proposition.
>
> To reiterate option 3 from last year's proposal, currently we don't support
>
>   typedef bool vbool __attribute__((__vector_size__(64)));
>
> But if we did, could we support a more layout-friendly form i.e.
>
>    typedef bool vbool __attribute__((vector_size (s, n[, w])));
>
> where 's' is size in bytes, 'n' is the number of lanes and an optional
> 3rd parameter 'w' is the number of bits of the PBV that represents a
> lane of the target vector? 'w' would allow a target to force a certain
> layout of the PBV.


isn't one of s, n or w redundant?  That is, w == (s * 8) / n?  Or
would vector_size (8, 32, 1) put in 1 bit of "padding" per lane?
(but where?)

That said, how about

typedef unsigned _BitInt(1) vbool __attribute__((vector_size (8)));

instead?  Slight complication is that _BitInt isn't supported in C++,
but I suppose that could be fixed at least as extension?

As we currently reject

typedef _Bool vbool __attribute__((vector_size (8)));

we can also chose to accept that as the 1-bit case at least.

> I don't know if overloading vector_size is a good idea though...

Is there precedent in other compilers for supporting bit-precision
vector components in extensions to GCCs vector extension?

Richard.

> Thanks,
> Tejas.
>
>
> > Richard.
> >
> >> Thanks,
> >> Richard
> >>
> >>
>

Re: [RFC] Proposal to support Packed Boolean Vector masks.

Reply via email to