Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

Richard Biener via Gcc-patches Thu, 29 Jun 2023 06:28:50 -0700

On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod <tejas.bela...@arm.com> wrote:
>
>
>
>
>
> From: Richard Biener <richard.guent...@gmail.com>
> Date: Tuesday, June 27, 2023 at 12:58 PM
> To: Tejas Belagod <tejas.bela...@arm.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
>
> On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod <tejas.bela...@arm.com> wrote:
> >
> >
> >
> >
> >
> > From: Richard Biener <richard.guent...@gmail.com>
> > Date: Monday, June 26, 2023 at 2:23 PM
> > To: Tejas Belagod <tejas.bela...@arm.com>
> > Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
> > Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> >
> > On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi,
> > >
> > > Packed Boolean Vectors
> > > ----------------------
> > >
> > > I'd like to propose a feature addition to GNU Vector extensions to add 
> > > packed
> > > boolean vectors (PBV).  This has been discussed in the past here[1] and a 
> > > variant has
> > > been implemented in Clang recently[2].
> > >
> > > With predication features being added to vector architectures (SVE, MVE, 
> > > AVX),
> > > it is a useful feature to have to model predication on targets.  This 
> > > could
> > > find its use in intrinsics or just used as is as a GNU vector extension 
> > > being
> > > mapped to underlying target features.  For example, the packed boolean 
> > > vector
> > > could directly map to a predicate register on SVE.
> > >
> > > Also, this new packed boolean type GNU extension can be used with SVE ACLE
> > > intrinsics to replace a fixed-length svbool_t.
> > >
> > > Here are a few options to represent the packed boolean vector type.
> >
> > The GIMPLE frontend uses a new 'vector_mask' attribute:
> >
> > typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> > typedef v8si v8sib __attribute__((vector_mask));
> >
> > it get's you a vector type that's the appropriate (dependent on the
> > target) vector
> > mask type for the vector data type (v8si in this case).
> >
> >
> >
> > Thanks Richard.
> >
> > Having had a quick look at the implementation, it does seem to tick the 
> > boxes.
> >
> > I must admit I haven't dug deep, but if the target hook allows the mask to 
> > be
> >
> > defined in way that is target-friendly (and I don't know how much effort it 
> > will
> >
> > be to migrate the attribute to more front-ends), it should do the job 
> > nicely.
> >
> > Let me go back and dig a bit deeper and get back with questions if any.
>
>
> Let me add that the advantage of this is the compiler doesn't need
> to support weird explicitely laid out packed boolean vectors that do
> not match what the target supports and the user doesn't need to know
> what the target supports (and thus have an #ifdef maze around explicitely
> specified layouts).
>
> Sorry for the delayed response – I spent a day experimenting with vector_mask.
>
>
>
> Yeah, this is what option 4 in the RFC is trying to achieve – be portable 
> enough
>
> to avoid having to sprinkle the code with ifdefs.
>
>
> It does remove some flexibility though, for example with -mavx512f -mavx512vl
> you'll get AVX512 style masks for V4SImode data vectors but of course the
> target sill supports SSE2/AVX2 style masks as well, but those would not be
> available as "packed boolean vectors", though they are of course in fact
> equal to V4SImode data vectors with -1 or 0 values, so in this particular
> case it might not matter.
>
> That said, the vector_mask attribute will get you V4SImode vectors with
> signed boolean elements of 32 bits for V4SImode data vectors with
> SSE2/AVX2.
>
>
>
> This sounds very much like what the scenario would be with NEON vs SVE. 
> Coming to think
>
> of it, vector_mask resembles option 4 in the proposal with ‘n’ implied by the 
> ‘base’ vector type
>
> and a ‘w’ specified for the type.
>
>
>
> Given its current implementation, if vector_mask is exposed to the CFE, would 
> there be any
>
> major challenges wrt implementation or defining behaviour semantics? I played 
> around with a
>
> few examples from the testsuite and wrote some new ones. I mostly tried 
> operations that
>
> the new type would have to support (unary, binary bitwise, initializations 
> etc) – with a couple of exceptions
>
> most of the ops seem to be supported. I also triggered a couple of ICEs in 
> some tests involving
>
> implicit conversions to wider/narrower vector_mask types (will raise reports 
> for these). Correct me
>
> if I’m wrong here, but we’d probably have to support a couple of new ops if 
> vector_mask is exposed
>
> to the CFE – initialization and subscript operations?


Yes, either that or restrict how the mask vectors can be used, thus
properly diagnose improper
uses.  A question would be for example how to write common mask test
operations like
if (any (mask)) or if (all (mask)).  Likewise writing merge operations
- do those as

 a = a | (mask ? b : 0);

thus use ternary ?: for this?  For initialization regular vector
syntax should work:

mtype mask = (mtype){ -1, -1, 0, 0, ... };

there's the question of the signedness of the mask elements.  GCC
internally uses signed
bools with values -1 for true and 0 for false.

Richard.

>
>
>
>
>
> Thanks,
>
> Tejas.
>
>
>
>
>
> Richard.
>
> >
> >
> > Thanks,
> >
> > Tejas.
> >
> >
> >
> >
> >
> >
> >
> > > 1. __attribute__((vector_size (n))) where n represents bytes
> > >
> > >   typedef bool vbool __attribute__ ((vector_size (1)));
> > >
> > > In this approach, the shape of the boolean vector is unclear. IoW, it is 
> > > not
> > > clear if each bit in 'n' controls a byte or an element. On targets
> > > like SVE, it would be natural to have each bit control a byte of the 
> > > target
> > > vector (therefore resulting in an 'unpacked' layout of the PBV) and on 
> > > AVX, each
> > > bit would control one element/lane on the target vector(therefore 
> > > resulting in a
> > > 'packed' layout with all significant bits at the LSB).
> > >
> > > 2. __attribute__((vector_size (n))) where n represents num of lanes
> > >
> > >   typedef int v4si __attribute__ ((vector_size (4 * sizeof (int)));
> > >   typedef bool v4bi __attribute__ ((vector_size (sizeof v4si / sizeof 
> > > (v4si){0}[0])));
> > >
> > > Here the 'n' in the vector_size attribute represents the number of bits 
> > > that
> > > is needed to represent a vector quantity.  In this case, this packed 
> > > boolean
> > > vector can represent upto 'n' vector lanes. The size of the type is
> > > rounded up the nearest byte.  For example, the sizeof v4bi in the above
> > > example is 1.
> > >
> > > In this approach, because of the nature of the representation, the n bits 
> > > required
> > > to represent the n lanes of the vector are packed at the LSB. This does 
> > > not naturally
> > > align with the SVE approach of each bit representing a byte of the target 
> > > vector
> > > and PBV therefore having an 'unpacked' layout.
> > >
> > > More importantly, another drawback here is that the change in units for 
> > > vector_size
> > > might be confusing to programmers.  The units will have to be interpreted 
> > > based on the
> > > base type of the typedef. It does not offer any flexibility in terms of 
> > > the layout of
> > > the bool vector - it is fixed.
> > >
> > > 3. Combination of 1 and 2.
> > >
> > > Combining the best of 1 and 2, we can introduce extra parameters to 
> > > vector_size that will
> > > unambiguously represent the layout of the PBV. Consider
> > >
> > >   typedef bool vbool __attribute__((vector_size (s, n[, w])));
> > >
> > > where 's' is size in bytes, 'n' is the number of lanes and an optional 
> > > 3rd parameter 'w'
> > > is the number of bits of the PBV that represents a lane of the target 
> > > vector. 'w' would
> > > allow a target to force a certain layout of the PBV.
> > >
> > > The 2-parameter form of vector_size allows the target to have an
> > > implementation-defined layout of the PBV. The target is free to choose 
> > > the 'w'
> > > if it is not specified to mirror the target layout of predicate 
> > > registers. For
> > > eg. AVX would choose 'w' as 1 and SVE would choose s*8/n.
> > >
> > > As an example, to represent the result of a comparison on 2 int16x8_t, 
> > > we'd need
> > > 8 lanes of boolean which could be represented by
> > >
> > >   typedef bool v8b __attribute__ ((vector_size (2, 8)));
> > >
> > > SVE would implement v8b layout to make every 2nd bit significant i.e. w 
> > > == 2
> > >
> > > and AVX would choose a layout where all 8 consecutive bits packed at LSB 
> > > would
> > > be significant i.e. w == 1.
> > >
> > > This scheme would accomodate more than 1 target to effectively represent 
> > > vector
> > > bools that mirror the target properties.
> > >
> > > 4. A new attribite
> > >
> > > This is based on a suggestion from Richard S in [3]. The idea is to 
> > > introduce a new
> > > attribute to define the PBV and make it general enough to
> > >
> > > * represent all targets flexibly (SVE, AVX etc)
> > > * represent sub-byte length predicates
> > > * have no change in units of vector_size/no new vector_size signature
> > > * not have the number of bytes constrain representation
> > >
> > > If we call the new attribute 'bool_vec' (for lack of a better name), 
> > > consider
> > >
> > >   typedef bool vbool __attribute__((bool_vec (n[, w])))
> > >
> > > where 'n' represents number of lanes/elements and the optional 'w' is 
> > > bits-per-lane.
> > >
> > > If 'w' is not specified, it and bytes-per-predicate are 
> > > implementation-defined based on target.
> > > If 'w' is specified,  sizeof (vbool) will be ceil (n*w/8).
> > >
> > > 5. Behaviour of the packed vector boolean type.
> > >
> > > Taking the example of one of the options above, following is an 
> > > illustration of it's behavior
> > >
> > > * ABI
> > >
> > >   New ABI rules will need to be defined for this type - eg alignment, PCS,
> > >   mangling etc
> > >
> > > * Initialization:
> > >
> > >   Packed Boolean Vectors(PBV) can be initialized like so:
> > >
> > >     typedef bool v4bi __attribute__ ((vector_size (2, 4, 4)));
> > >     v4bi p = {false, true, false, false};
> > >
> > >   Each value in the initizlizer constant is of type bool. The lowest 
> > > numbered
> > >   element in the const array corresponds to the LSbit of p, element 1 is
> > >   assigned to bit 4 etc.
> > >
> > >   p is effectively a 2-byte bitmask with value 0x0010
> > >
> > >   With a different layout
> > >
> > >     typedef bool v4bi __attribute__ ((vector_size (2, 4, 1)));
> > >     v4bi p = {false, true, false, false};
> > >
> > >   p is effectively a 2-byte bitmask with value 0x0002
> > >
> > > * Operations:
> > >
> > >   Packed Boolean Vectors support the following operations:
> > >   . unary ~
> > >   . unary !
> > >   . binary&,|andˆ
> > >   . assignments &=, |= and ˆ=
> > >   . comparisons <, <=, ==, !=, >= and >
> > >   . Ternary operator ?:
> > >
> > >   Operations are defined as applied to the individual elements i.e the 
> > > bits
> > >   that are significant in the PBV. Whether the PBVs are treated as 
> > > bitmasks
> > >   or otherwise is implementation-defined.
> > >
> > >   Insignificant bits could affect results of comparisons or ternary 
> > > operators.
> > >   In such cases, it is implementation defined how the unused bits are 
> > > treated.
> > >
> > >   . Subscript operator []
> > >
> > >   For the subscript operator, the packed boolean vector acts like a array 
> > > of
> > >   elements - the first or the 0th indexed element being the LSbit of the 
> > > PBV.
> > >   Subscript operator yields a scalar boolean value.
> > >   For example:
> > >
> > >     typedef bool v8b __attribute__ ((vector_size (2, 8, 2)));
> > >
> > >     // Subscript operator result yields a boolean value.
> > >     // x[3] is the 7th LSbit and x[1] is the 3rd LSbit of x.
> > >     bool foo (v8b p, int n) { p[3] = true; return p[1]; }
> > >
> > >   Out of bounds access: OOB access can be determined at compile time 
> > > given the
> > >   strong typing of the PBVs.
> > >
> > >   PBV does not support address of operator(&) for elements of PBVs.
> > >
> > >   . Implicit conversion from integer vectors to PBVs
> > >
> > >   We would like to support the output of comparison operations to be 
> > > PBVs. This
> > >   requires us to define the implicit conversion from an integer vector to 
> > > PBV
> > >   as the result of vector comparisons are integer vectors.
> > >
> > >   To define this operation:
> > >
> > >     bool_vector = vector <cmpop> vector
> > >
> > >   There is no change in how vector <cmpop> vector behavior i.e. this 
> > > comparison
> > >   would still produce an int_vector type as it does now.
> > >
> > >     temp_int_vec = vector <cmpop> vector
> > >     bool_vec = temp_int_vec // Implicit conversion from int_vec to 
> > > bool_vec
> > >
> > >   The implicit conversion from int_vec to bool I'd define simply to be:
> > >
> > >     bool_vec[n] = (_Bool) int_vec[n]
> > >
> > >   where the C11 standard rules apply
> > >   6.3.1.2 Boolean type  When any scalar value is converted to _Bool, the 
> > > result
> > >   is 0 if the value compares equal to 0; otherwise, the result is 1.
> > >
> > >
> > > [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html
> > > [2] https://reviews.llvm.org/D88905
> > > [3] https://reviews.llvm.org/D81083
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Tejas.

Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

Reply via email to