On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod <tejas.bela...@arm.com> wrote: > > > > > > From: Richard Biener <richard.guent...@gmail.com> > Date: Tuesday, June 27, 2023 at 12:58 PM > To: Tejas Belagod <tejas.bela...@arm.com> > Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org> > Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors > > On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod <tejas.bela...@arm.com> wrote: > > > > > > > > > > > > From: Richard Biener <richard.guent...@gmail.com> > > Date: Monday, June 26, 2023 at 2:23 PM > > To: Tejas Belagod <tejas.bela...@arm.com> > > Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org> > > Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors > > > > On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > Hi, > > > > > > Packed Boolean Vectors > > > ---------------------- > > > > > > I'd like to propose a feature addition to GNU Vector extensions to add > > > packed > > > boolean vectors (PBV). This has been discussed in the past here[1] and a > > > variant has > > > been implemented in Clang recently[2]. > > > > > > With predication features being added to vector architectures (SVE, MVE, > > > AVX), > > > it is a useful feature to have to model predication on targets. This > > > could > > > find its use in intrinsics or just used as is as a GNU vector extension > > > being > > > mapped to underlying target features. For example, the packed boolean > > > vector > > > could directly map to a predicate register on SVE. > > > > > > Also, this new packed boolean type GNU extension can be used with SVE ACLE > > > intrinsics to replace a fixed-length svbool_t. > > > > > > Here are a few options to represent the packed boolean vector type. > > > > The GIMPLE frontend uses a new 'vector_mask' attribute: > > > > typedef int v8si __attribute__((vector_size(8*sizeof(int)))); > > typedef v8si v8sib __attribute__((vector_mask)); > > > > it get's you a vector type that's the appropriate (dependent on the > > target) vector > > mask type for the vector data type (v8si in this case). > > > > > > > > Thanks Richard. > > > > Having had a quick look at the implementation, it does seem to tick the > > boxes. > > > > I must admit I haven't dug deep, but if the target hook allows the mask to > > be > > > > defined in way that is target-friendly (and I don't know how much effort it > > will > > > > be to migrate the attribute to more front-ends), it should do the job > > nicely. > > > > Let me go back and dig a bit deeper and get back with questions if any. > > > Let me add that the advantage of this is the compiler doesn't need > to support weird explicitely laid out packed boolean vectors that do > not match what the target supports and the user doesn't need to know > what the target supports (and thus have an #ifdef maze around explicitely > specified layouts). > > Sorry for the delayed response – I spent a day experimenting with vector_mask. > > > > Yeah, this is what option 4 in the RFC is trying to achieve – be portable > enough > > to avoid having to sprinkle the code with ifdefs. > > > It does remove some flexibility though, for example with -mavx512f -mavx512vl > you'll get AVX512 style masks for V4SImode data vectors but of course the > target sill supports SSE2/AVX2 style masks as well, but those would not be > available as "packed boolean vectors", though they are of course in fact > equal to V4SImode data vectors with -1 or 0 values, so in this particular > case it might not matter. > > That said, the vector_mask attribute will get you V4SImode vectors with > signed boolean elements of 32 bits for V4SImode data vectors with > SSE2/AVX2. > > > > This sounds very much like what the scenario would be with NEON vs SVE. > Coming to think > > of it, vector_mask resembles option 4 in the proposal with ‘n’ implied by the > ‘base’ vector type > > and a ‘w’ specified for the type. > > > > Given its current implementation, if vector_mask is exposed to the CFE, would > there be any > > major challenges wrt implementation or defining behaviour semantics? I played > around with a > > few examples from the testsuite and wrote some new ones. I mostly tried > operations that > > the new type would have to support (unary, binary bitwise, initializations > etc) – with a couple of exceptions > > most of the ops seem to be supported. I also triggered a couple of ICEs in > some tests involving > > implicit conversions to wider/narrower vector_mask types (will raise reports > for these). Correct me > > if I’m wrong here, but we’d probably have to support a couple of new ops if > vector_mask is exposed > > to the CFE – initialization and subscript operations?
Yes, either that or restrict how the mask vectors can be used, thus properly diagnose improper uses. A question would be for example how to write common mask test operations like if (any (mask)) or if (all (mask)). Likewise writing merge operations - do those as a = a | (mask ? b : 0); thus use ternary ?: for this? For initialization regular vector syntax should work: mtype mask = (mtype){ -1, -1, 0, 0, ... }; there's the question of the signedness of the mask elements. GCC internally uses signed bools with values -1 for true and 0 for false. Richard. > > > > > > Thanks, > > Tejas. > > > > > > Richard. > > > > > > > Thanks, > > > > Tejas. > > > > > > > > > > > > > > > > > 1. __attribute__((vector_size (n))) where n represents bytes > > > > > > typedef bool vbool __attribute__ ((vector_size (1))); > > > > > > In this approach, the shape of the boolean vector is unclear. IoW, it is > > > not > > > clear if each bit in 'n' controls a byte or an element. On targets > > > like SVE, it would be natural to have each bit control a byte of the > > > target > > > vector (therefore resulting in an 'unpacked' layout of the PBV) and on > > > AVX, each > > > bit would control one element/lane on the target vector(therefore > > > resulting in a > > > 'packed' layout with all significant bits at the LSB). > > > > > > 2. __attribute__((vector_size (n))) where n represents num of lanes > > > > > > typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))); > > > typedef bool v4bi __attribute__ ((vector_size (sizeof v4si / sizeof > > > (v4si){0}[0]))); > > > > > > Here the 'n' in the vector_size attribute represents the number of bits > > > that > > > is needed to represent a vector quantity. In this case, this packed > > > boolean > > > vector can represent upto 'n' vector lanes. The size of the type is > > > rounded up the nearest byte. For example, the sizeof v4bi in the above > > > example is 1. > > > > > > In this approach, because of the nature of the representation, the n bits > > > required > > > to represent the n lanes of the vector are packed at the LSB. This does > > > not naturally > > > align with the SVE approach of each bit representing a byte of the target > > > vector > > > and PBV therefore having an 'unpacked' layout. > > > > > > More importantly, another drawback here is that the change in units for > > > vector_size > > > might be confusing to programmers. The units will have to be interpreted > > > based on the > > > base type of the typedef. It does not offer any flexibility in terms of > > > the layout of > > > the bool vector - it is fixed. > > > > > > 3. Combination of 1 and 2. > > > > > > Combining the best of 1 and 2, we can introduce extra parameters to > > > vector_size that will > > > unambiguously represent the layout of the PBV. Consider > > > > > > typedef bool vbool __attribute__((vector_size (s, n[, w]))); > > > > > > where 's' is size in bytes, 'n' is the number of lanes and an optional > > > 3rd parameter 'w' > > > is the number of bits of the PBV that represents a lane of the target > > > vector. 'w' would > > > allow a target to force a certain layout of the PBV. > > > > > > The 2-parameter form of vector_size allows the target to have an > > > implementation-defined layout of the PBV. The target is free to choose > > > the 'w' > > > if it is not specified to mirror the target layout of predicate > > > registers. For > > > eg. AVX would choose 'w' as 1 and SVE would choose s*8/n. > > > > > > As an example, to represent the result of a comparison on 2 int16x8_t, > > > we'd need > > > 8 lanes of boolean which could be represented by > > > > > > typedef bool v8b __attribute__ ((vector_size (2, 8))); > > > > > > SVE would implement v8b layout to make every 2nd bit significant i.e. w > > > == 2 > > > > > > and AVX would choose a layout where all 8 consecutive bits packed at LSB > > > would > > > be significant i.e. w == 1. > > > > > > This scheme would accomodate more than 1 target to effectively represent > > > vector > > > bools that mirror the target properties. > > > > > > 4. A new attribite > > > > > > This is based on a suggestion from Richard S in [3]. The idea is to > > > introduce a new > > > attribute to define the PBV and make it general enough to > > > > > > * represent all targets flexibly (SVE, AVX etc) > > > * represent sub-byte length predicates > > > * have no change in units of vector_size/no new vector_size signature > > > * not have the number of bytes constrain representation > > > > > > If we call the new attribute 'bool_vec' (for lack of a better name), > > > consider > > > > > > typedef bool vbool __attribute__((bool_vec (n[, w]))) > > > > > > where 'n' represents number of lanes/elements and the optional 'w' is > > > bits-per-lane. > > > > > > If 'w' is not specified, it and bytes-per-predicate are > > > implementation-defined based on target. > > > If 'w' is specified, sizeof (vbool) will be ceil (n*w/8). > > > > > > 5. Behaviour of the packed vector boolean type. > > > > > > Taking the example of one of the options above, following is an > > > illustration of it's behavior > > > > > > * ABI > > > > > > New ABI rules will need to be defined for this type - eg alignment, PCS, > > > mangling etc > > > > > > * Initialization: > > > > > > Packed Boolean Vectors(PBV) can be initialized like so: > > > > > > typedef bool v4bi __attribute__ ((vector_size (2, 4, 4))); > > > v4bi p = {false, true, false, false}; > > > > > > Each value in the initizlizer constant is of type bool. The lowest > > > numbered > > > element in the const array corresponds to the LSbit of p, element 1 is > > > assigned to bit 4 etc. > > > > > > p is effectively a 2-byte bitmask with value 0x0010 > > > > > > With a different layout > > > > > > typedef bool v4bi __attribute__ ((vector_size (2, 4, 1))); > > > v4bi p = {false, true, false, false}; > > > > > > p is effectively a 2-byte bitmask with value 0x0002 > > > > > > * Operations: > > > > > > Packed Boolean Vectors support the following operations: > > > . unary ~ > > > . unary ! > > > . binary&,|andˆ > > > . assignments &=, |= and ˆ= > > > . comparisons <, <=, ==, !=, >= and > > > > . Ternary operator ?: > > > > > > Operations are defined as applied to the individual elements i.e the > > > bits > > > that are significant in the PBV. Whether the PBVs are treated as > > > bitmasks > > > or otherwise is implementation-defined. > > > > > > Insignificant bits could affect results of comparisons or ternary > > > operators. > > > In such cases, it is implementation defined how the unused bits are > > > treated. > > > > > > . Subscript operator [] > > > > > > For the subscript operator, the packed boolean vector acts like a array > > > of > > > elements - the first or the 0th indexed element being the LSbit of the > > > PBV. > > > Subscript operator yields a scalar boolean value. > > > For example: > > > > > > typedef bool v8b __attribute__ ((vector_size (2, 8, 2))); > > > > > > // Subscript operator result yields a boolean value. > > > // x[3] is the 7th LSbit and x[1] is the 3rd LSbit of x. > > > bool foo (v8b p, int n) { p[3] = true; return p[1]; } > > > > > > Out of bounds access: OOB access can be determined at compile time > > > given the > > > strong typing of the PBVs. > > > > > > PBV does not support address of operator(&) for elements of PBVs. > > > > > > . Implicit conversion from integer vectors to PBVs > > > > > > We would like to support the output of comparison operations to be > > > PBVs. This > > > requires us to define the implicit conversion from an integer vector to > > > PBV > > > as the result of vector comparisons are integer vectors. > > > > > > To define this operation: > > > > > > bool_vector = vector <cmpop> vector > > > > > > There is no change in how vector <cmpop> vector behavior i.e. this > > > comparison > > > would still produce an int_vector type as it does now. > > > > > > temp_int_vec = vector <cmpop> vector > > > bool_vec = temp_int_vec // Implicit conversion from int_vec to > > > bool_vec > > > > > > The implicit conversion from int_vec to bool I'd define simply to be: > > > > > > bool_vec[n] = (_Bool) int_vec[n] > > > > > > where the C11 standard rules apply > > > 6.3.1.2 Boolean type When any scalar value is converted to _Bool, the > > > result > > > is 0 if the value compares equal to 0; otherwise, the result is 1. > > > > > > > > > [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html > > > [2] https://reviews.llvm.org/D88905 > > > [3] https://reviews.llvm.org/D81083 > > > > > > Thoughts? > > > > > > Thanks, > > > Tejas.