On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Tejas Belagod <tejas.bela...@arm.com> writes:
> > On 7/10/24 2:38 PM, Richard Biener wrote:
> >> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod <tejas.bela...@arm.com>
> >> wrote:
> >>>
> >>> On 7/9/24 4:22 PM, Richard Biener wrote:
> >>>> On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod <tejas.bela...@arm.com>
> >>>> wrote:
> >>>>>
> >>>>> On 7/8/24 4:45 PM, Richard Biener wrote:
> >>>>>> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod <tejas.bela...@arm.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Sorry to have dropped the ball on
> >>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
> >>>>>>> here I've tried to pick it up again and write up a strawman proposal
> >>>>>>> for elevating __attribute__((vector_mask)) to the FE from GIMPLE.
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Tejas.
> >>>>>>>
> >>>>>>> Motivation
> >>>>>>> ----------
> >>>>>>>
> >>>>>>> The idea of packed boolean vectors came about when we wanted to
> >>>>>>> support C/C++ operators on SVE ACLE types. The current vector boolean
> >>>>>>> type that ACLE specifies does not adequately disambiguate the vector
> >>>>>>> lane sizes from which it was derived. Consider this simple, albeit
> >>>>>>> unrealistic, example:
> >>>>>>>
> >>>>>>> bool foo (svint32_t a, svint32_t b)
> >>>>>>> {
> >>>>>>>   svbool_t p = a > b;
> >>>>>>>
> >>>>>>>   // Here p[2] is not the same as a[2] > b[2].
> >>>>>>>   return p[2];
> >>>>>>> }
> >>>>>>>
> >>>>>>> In the above example, because svbool_t has a fixed 1-lane-per-byte,
> >>>>>>> p[i] does not return the bool value corresponding to a[i] > b[i]. This
> >>>>>>> necessitates a 'typed' vector boolean value that unambiguously
> >>>>>>> represents results of operations of the same type.
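The indexing mismatch in the Motivation example can be illustrated with a
plain C model. This is only a sketch, assuming a 128-bit vector of four
32-bit lanes and a predicate that holds one bit per byte of the vector. It
does not use ACLE, and the names model_cmpgt, pred_bit and pred_lane32 are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not ACLE: a 128-bit vector viewed as 4 x int32_t,
   with a predicate holding one bit per byte of the vector (16 bits).  */
enum { LANES = 4, BYTES_PER_LANE = sizeof (int32_t) };

/* Compare lane by lane.  The result for lane i is stored at the bit that
   corresponds to the first byte of that lane.  */
static uint16_t
model_cmpgt (const int32_t a[LANES], const int32_t b[LANES])
{
  uint16_t pred = 0;
  for (int i = 0; i < LANES; i++)
    if (a[i] > b[i])
      pred |= (uint16_t) (1u << (i * BYTES_PER_LANE));
  return pred;
}

/* Untyped read: index the predicate by bit, ignoring the lane width.  */
static int
pred_bit (uint16_t pred, int bit)
{
  assert (bit >= 0 && bit < 16);
  return (pred >> bit) & 1;
}

/* Typed read: scale the lane index by the lane width first.  */
static int
pred_lane32 (uint16_t pred, int lane)
{
  assert (lane >= 0 && lane < LANES);
  return pred_bit (pred, lane * BYTES_PER_LANE);
}
```

In this model the untyped read pred_bit (p, 2) looks at a bit that belongs to
the middle of lane 0, while pred_lane32 (p, 2) reads the bit that actually
records a[2] > b[2]. A 'typed' boolean vector would carry the lane width so
that the second form is what p[2] means.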
> >>>>>>>
> >>>>>>> __attribute__((vector_mask))
> >>>>>>> -----------------------------
> >>>>>>>
> >>>>>>> Note: If interested in historical discussions refer to:
> >>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
> >>>>>>>
> >>>>>>> We define this new attribute which, when applied to a base data vector
> >>>>>>> type, produces a new boolean vector type that represents the boolean
> >>>>>>> result of operations on the corresponding base vector type. The
> >>>>>>> following is the syntax.
> >>>>>>>
> >>>>>>> typedef int v8si __attribute__((vector_size (8 * sizeof (int))));
> >>>>>>> typedef v8si v8sib __attribute__((vector_mask));
> >>>>>>>
> >>>>>>> Here the 'base' data vector type is v8si or a vector of 8 integers.
> >>>>>>>
> >>>>>>> Rules
> >>>>>>>
> >>>>>>> • The layout/size of the boolean vector type is implementation-defined
> >>>>>>> for its base data vector type.
> >>>>>>>
> >>>>>>> • Two boolean vector types whose base data vector types have the same
> >>>>>>> number of elements and lane-width have the same layout and size.
> >>>>>>>
> >>>>>>> • Consequently, two boolean vectors whose base data vector types have
> >>>>>>> a different number of elements or different lane-size have different
> >>>>>>> layouts.
> >>>>>>>
> >>>>>>> This aligns with GNU vector extensions that generate integer vectors
> >>>>>>> as a result of comparisons - "The result of the comparison is a vector
> >>>>>>> of the same width and number of elements as the comparison operands
> >>>>>>> with a signed integral element type." according to
> >>>>>>> https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
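For reference, the existing GNU vector extension behaviour quoted above can
be sketched as follows. This uses only the documented vector_size attribute
and vector comparison; the helper names cmpgt and lane_is_set are
hypothetical, and nothing here uses the proposed vector_mask attribute.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size (4 * sizeof (int))));

/* Lane-wise a > b.  Each result lane is -1 (all bits set) when the
   comparison holds for that lane and 0 otherwise.  */
static v4si
cmpgt (v4si a, v4si b)
{
  return a > b;
}

/* Read one lane of the comparison result as a C truth value.  */
static int
lane_is_set (v4si mask, int lane)
{
  assert (lane >= 0 && lane < 4);
  return mask[lane] != 0;
}
```

The result has the same size and lane count as the operands, so mask[i]
always corresponds to a[i] > b[i], at the cost of spending a full integer
lane on each boolean.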
> >>>>>>
> >>>>>> Without having the time to re-review this all in detail I think the GNU
> >>>>>> vector extension does not expose the result of the comparison as the
> >>>>>> machine would produce it but instead a comparison "decays" to
> >>>>>> a conditional:
> >>>>>>
> >>>>>> typedef int v4si __attribute__((vector_size(16)));
> >>>>>>
> >>>>>> v4si a;
> >>>>>> v4si b;
> >>>>>>
> >>>>>> void foo()
> >>>>>> {
> >>>>>>   auto r = a < b;
> >>>>>> }
> >>>>>>
> >>>>>> produces, with C23:
> >>>>>>
> >>>>>>   vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } ,
> >>>>>>                     { 0, 0, 0, 0 } > ;
> >>>>>>
> >>>>>> In fact on x86_64 with AVX and AVX512 you have two different "machine
> >>>>>> produced" mask types and the above could either produce an AVX mask
> >>>>>> with 32bit elements or an AVX512 mask with 1bit elements.
> >>>>>>
> >>>>>> Not exposing "native" mask types requires the compiler optimizing
> >>>>>> subsequent uses and makes generic vectors difficult to combine with,
> >>>>>> for example, AVX512 intrinsics (where masks are just 'int'). Across an
> >>>>>> ABI boundary it's also even more difficult to optimize mask
> >>>>>> transitions.
> >>>>>>
> >>>>>> But it at least allows portable code and it does not suffer from users
> >>>>>> trying to expose machine representations of masks as input to generic
> >>>>>> vector code with all the problems of constant folding not only
> >>>>>> requiring self-consistent code within the compiler but compatibility
> >>>>>> with user produced constant masks.
> >>>>>>
> >>>>>> That said, I somewhat question the need to expose the target mask
> >>>>>> layout to users for GCC's generic vector extension.
> >>>>>>
> >>>>>
> >>>>> Thanks for your feedback.
> >>>>>
> >>>>> IIUC, I can imagine how having a GNU vector extension exposing the
> >>>>> target vector mask layout can pose a challenge - maybe making it a
> >>>>> generic GNU vector extension was too ambitious.
> >>>>> I wonder if there's value in pursuing these alternate paths?
> >>>>>
> >>>>> 1. Can implementing this extension in a 'generic' way, i.e. possibly
> >>>>> not implementing it with a target mask but just a generic int vector,
> >>>>> still maintain the consistency of GNU predicate vectors within the
> >>>>> compiler? I know it may not seem very different from how boolean
> >>>>> vectors are currently implemented (as in your above example), but
> >>>>> having the __attribute__((vector_mask)) as a 'property' of the object
> >>>>> makes it useful to optimize its uses to target predicates in
> >>>>> subsequent stages of the compiler.
> >>>>>
> >>>>> 2. Restricting __attribute__((vector_mask)) to apply only to target
> >>>>> intrinsic types? E.g.
> >>>>>
> >>>>> On SVE something like:
> >>>>> typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.
> >>>>>
> >>>>> On AVX, something like:
> >>>>> typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though
> >>>>> // this would require a more fine-grained definition of the lane-size
> >>>>> // to mask-bits mapping.
> >>>>
> >>>> I think the target should be able to register builtin types already which
> >>>> intrinsics could use. There is already the vector_mask attribute but
> >>>> only for GIMPLE and it has the same limitation of querying the target
> >>>> for the actual mode being used - for AVX vs AVX512 one might be able to
> >>>> combine this with a mode attribute. Not sure if on arm you can parse
> >>>> __attribute__((mode("Vx4BI4"))) or how the modes are called.
> >>>>
> >>>> But when you are talking about intrinsics I'd really suggest to leave the
> >>>> type creation to the target rather than trying to do a typedef in a
> >>>> header?
> >>>>
> >>>
> >>> Yeah, thinking about this a bit more, makes sense to keep intrinsic type
> >>> creation in the target realm.
> >>>
> >>> Just to clarify if I understand your point about exposing masks' machine
> >>> representations, would representing vector_mask types using opaque
> >>> types/modes have the same challenges with compatibility with generic
> >>> vector constants as it essentially would be a parallel type system, and
> >>> would be unaffected by constant-folding etc due to their opacity? I ask
> >>> because opacity might give the representation the flexibility of
> >>> 'decaying' to a type based on the context it is used in.
> >>
> >> I also thought about using an opaque type but I wonder if it really suits
> >> here?
> >
> > Sorry, yes using opaque type was your idea from last year's thread - I
> > merely reiterated it here. :-)
> >
> >> Or would the target then need to decay a mask[i] into something
> >> that's later recognizable?
> >>
> >
> > I think that would depend on the usage, wouldn't it - it could lower
> > down to target insn(s) based on whether, e.g., it's used as a test
> > or read as a scalar value?
> >
> >
> >> So I guess the answer is you'd have to try.
> >
> > Thanks for your feedback so far - much appreciated. If it helps, I will
> > try to write up a prototype to test the idea - might help clear the mist
> > further.
>
> Just to note that one of the original motivations (that applies more
> to option 3 from last year's proposal) was to add support for general
> packed vector boolean types to the GNU vector extension, as a feature
> independent of the target's "native" format(s). Clang already supports
> this via ext_vector_type and it seemed like there might be value in
> providing something similar for the GNU extensions.

But that's more for data, aka vector bool, not for what's produced by
targets from vector comparisons? So yes, I suppose that's reasonable but
representation would then be fully defined by the extension rather than by
however the target computes the actual comparison result vector.

Richard.

> Thanks,
> Richard
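As a rough illustration of a packed "vector bool" whose representation is
defined by the extension rather than by the target, the sketch below packs a
generic comparison result into one bit per lane. The layout (lane i maps to
bit i) and the names pack_mask_v4si, pack_mask_v8hi and packed_lane are
assumptions made for this example only; they are not taken from GCC, from
Clang's ext_vector_type, or from the proposal.

```c
#include <assert.h>
#include <stdint.h>

typedef int v4si __attribute__((vector_size (16)));
typedef short v8hi __attribute__((vector_size (16)));

/* Assumed layout for this sketch: lane i of the comparison maps to bit i,
   regardless of how wide the compared elements were.  */
static uint8_t
pack_mask_v4si (v4si m)
{
  uint8_t bits = 0;
  for (int i = 0; i < 4; i++)
    bits |= (uint8_t) ((m[i] != 0) << i);
  return bits;
}

/* Same layout rule applied to a comparison of eight 16-bit lanes.  */
static uint8_t
pack_mask_v8hi (v8hi m)
{
  uint8_t bits = 0;
  for (int i = 0; i < 8; i++)
    bits |= (uint8_t) ((m[i] != 0) << i);
  return bits;
}

/* Lane access on the packed form is a plain bit test.  */
static int
packed_lane (uint8_t bits, int lane)
{
  assert (lane >= 0 && lane < 8);
  return (bits >> lane) & 1;
}
```

With a fixed layout like this, constant masks written by users and masks
folded by the compiler agree by definition; the cost is a conversion wherever
the target's own comparison result uses a different format.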