On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod <tejas.bela...@arm.com> wrote: > > On 7/10/24 4:37 PM, Richard Biener wrote: > > On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> > >> Tejas Belagod <tejas.bela...@arm.com> writes: > >>> On 7/10/24 2:38 PM, Richard Biener wrote: > >>>> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod <tejas.bela...@arm.com> > >>>> wrote: > >>>>> > >>>>> On 7/9/24 4:22 PM, Richard Biener wrote: > >>>>>> On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod <tejas.bela...@arm.com> > >>>>>> wrote: > >>>>>>> > >>>>>>> On 7/8/24 4:45 PM, Richard Biener wrote: > >>>>>>>> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod > >>>>>>>> <tejas.bela...@arm.com> wrote: > >>>>>>>>> > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> Sorry to have dropped the ball on > >>>>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but > >>>>>>>>> here I've tried to pick it up again and write up a strawman > >>>>>>>>> proposal for > >>>>>>>>> elevating __attribute__((vector_mask)) to the FE from GIMPLE. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Tejas. > >>>>>>>>> > >>>>>>>>> Motivation > >>>>>>>>> ---------- > >>>>>>>>> > >>>>>>>>> The idea of packed boolean vectors came about when we wanted to > >>>>>>>>> support > >>>>>>>>> C/C++ operators on SVE ACLE types. The current vector boolean type > >>>>>>>>> that > >>>>>>>>> ACLE specifies does not adequately disambiguate vector lane sizes > >>>>>>>>> which > >>>>>>>>> they were derived off of. Consider this simple, albeit unrealistic, > >>>>>>>>> example: > >>>>>>>>> > >>>>>>>>> bool foo (svint32_t a, svint32_t b) > >>>>>>>>> { > >>>>>>>>> svbool_t p = a > b; > >>>>>>>>> > >>>>>>>>> // Here p[2] is not the same as a[2] > b[2]. > >>>>>>>>> return p[2]; > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> In the above example, because svbool_t has a fixed 1-lane-per-byte, > >>>>>>>>> p[i] > >>>>>>>>> does not return the bool value corresponding to a[i] > b[i]. This > >>>>>>>>> necessitates a 'typed' vector boolean value that unambiguously > >>>>>>>>> represents results of operations > >>>>>>>>> of the same type. > >>>>>>>>> > >>>>>>>>> __attribute__((vector_mask)) > >>>>>>>>> ----------------------------- > >>>>>>>>> > >>>>>>>>> Note: If interested in historical discussions refer to: > >>>>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html > >>>>>>>>> > >>>>>>>>> We define this new attribute which when applied to a base data > >>>>>>>>> vector > >>>>>>>>> produces a new boolean vector type that represents a boolean type > >>>>>>>>> that > >>>>>>>>> is produced as a result of operations on the corresponding base > >>>>>>>>> vector > >>>>>>>>> type. The following is the syntax. > >>>>>>>>> > >>>>>>>>> typedef int v8si __attribute__((vector_size (8 * sizeof > >>>>>>>>> (int))); > >>>>>>>>> typedef v8si v8sib __attribute__((vector_mask)); > >>>>>>>>> > >>>>>>>>> Here the 'base' data vector type is v8si or a vector of 8 integers. > >>>>>>>>> > >>>>>>>>> Rules > >>>>>>>>> > >>>>>>>>> • The layout/size of the boolean vector type is > >>>>>>>>> implementation-defined > >>>>>>>>> for its base data vector type. > >>>>>>>>> > >>>>>>>>> • Two boolean vector types who's base data vector types have same > >>>>>>>>> number > >>>>>>>>> of elements and lane-width have the same layout and size. > >>>>>>>>> > >>>>>>>>> • Consequently, two boolean vectors who's base data vector types > >>>>>>>>> have > >>>>>>>>> different number of elements or different lane-size have different > >>>>>>>>> layouts. > >>>>>>>>> > >>>>>>>>> This aligns with gnu vector extensions that generate integer > >>>>>>>>> vectors as > >>>>>>>>> a result of comparisons - "The result of the comparison is a vector > >>>>>>>>> of > >>>>>>>>> the same width and number of elements as the comparison operands > >>>>>>>>> with a > >>>>>>>>> signed integral element type." according to > >>>>>>>>> https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. > >>>>>>>> > >>>>>>>> Without having the time to re-review this all in detail I think the > >>>>>>>> GNU > >>>>>>>> vector extension does not expose the result of the comparison as the > >>>>>>>> machine would produce it but instead a comparison "decays" to > >>>>>>>> a conditional: > >>>>>>>> > >>>>>>>> typedef int v4si __attribute__((vector_size(16))); > >>>>>>>> > >>>>>>>> v4si a; > >>>>>>>> v4si b; > >>>>>>>> > >>>>>>>> void foo() > >>>>>>>> { > >>>>>>>> auto r = a < b; > >>>>>>>> } > >>>>>>>> > >>>>>>>> produces, with C23: > >>>>>>>> > >>>>>>>> vector(4) int r = VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } > >>>>>>>> , { 0, > >>>>>>>> 0, 0, 0 } > ; > >>>>>>>> > >>>>>>>> In fact on x86_64 with AVX and AVX512 you have two different "machine > >>>>>>>> produced" mask types and the above could either produce a AVX mask > >>>>>>>> with > >>>>>>>> 32bit elements or a AVX512 mask with 1bit elements. > >>>>>>>> > >>>>>>>> Not exposing "native" mask types requires the compiler optimizing > >>>>>>>> subsequent > >>>>>>>> uses and makes generic vectors difficult to combine with for example > >>>>>>>> AVX512 > >>>>>>>> intrinsics (where masks are just 'int'). Across an ABI boundary > >>>>>>>> it's also > >>>>>>>> even more difficult to optimize mask transitions. > >>>>>>>> > >>>>>>>> But it at least allows portable code and it does not suffer from > >>>>>>>> users trying to > >>>>>>>> expose machine representations of masks as input to generic vector > >>>>>>>> code > >>>>>>>> with all the problems of constant folding not only requiring > >>>>>>>> self-consistent > >>>>>>>> code within the compiler but compatibility with user produced > >>>>>>>> constant masks. > >>>>>>>> > >>>>>>>> That said, I somewhat question the need to expose the target mask > >>>>>>>> layout > >>>>>>>> to users for GCCs generic vector extension. > >>>>>>>> > >>>>>>> > >>>>>>> Thanks for your feedback. > >>>>>>> > >>>>>>> IIUC, I can imagine how having a GNU vector extension exposing the > >>>>>>> target vector mask layout can pose a challenge - maybe making it a > >>>>>>> generic GNU vector extension was too ambitious. I wonder if there's > >>>>>>> value in pursuing these alternate paths? > >>>>>>> > >>>>>>> 1. Can implementing this extension in a 'generic' way i.e. possibly > >>>>>>> not > >>>>>>> implement it with a target mask, but just a generic int vector, still > >>>>>>> maintain the consistency of GNU predicate vectors within the > >>>>>>> compiler? I > >>>>>>> know it may not seem very different from how boolean vectors are > >>>>>>> currently implemented (as in your above example), but, having the > >>>>>>> __attribute__((vector_mask)) as a 'property' of the object makes it > >>>>>>> useful to optimize its uses to target predicates in subsequent stages > >>>>>>> of > >>>>>>> the compiler. > >>>>>>> > >>>>>>> 2. Restricting __attribute__((vector_mask)) to apply only to target > >>>>>>> intrinsic types? Eg. > >>>>>>> > >>>>>>> On SVE something like: > >>>>>>> typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK. > >>>>>>> > >>>>>>> On AVX, something like: > >>>>>>> typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though > >>>>>>> this would require more fine-grained defn of lane-size to mask-bits > >>>>>>> mapping. > >>>>>> > >>>>>> I think the target should be able to register builtin types already > >>>>>> which > >>>>>> intrinsics could use. There is already the vector_mask attribute but > >>>>>> only > >>>>>> for GIMPLE and it has the same limitation of querying the target for > >>>>>> the > >>>>>> actual mode being used - for AVX vs AVX512 one might be able to > >>>>>> combine this with a mode attribute. Not sure if on arm you can parse > >>>>>> __attribute__((mode("Vx4BI4"))) or how the modes are called. > >>>>>> > >>>>>> But when you are talking about intrinsics I'd really suggest to leave > >>>>>> the > >>>>>> type creation to the target rather than trying to do a typedef in a > >>>>>> header? > >>>>>> > >>>>> > >>>>> Yeah, thinking about this a bit more, makes sense to keep intrinsic type > >>>>> creation in the target realm. > >>>>> > >>>>> Just to clarify if I understand your point about exposing masks' machine > >>>>> representations, would representing vector_mask types using opaque > >>>>> types/modes have the same challenges with compatibility with generic > >>>>> vector constants as it essentially would be a parallel type system, and > >>>>> would be unaffected by constant-folding etc due to their opacity? I ask > >>>>> because opacity might give the representation the flexibility of > >>>>> 'decaying' to a type based on the context it is used in. > >>>> > >>>> I also thought about using an opaque type but I wonder if it really suits > >>>> here? > >>> > >>> Sorry, yes using opaque type was your idea from last year's thread - I > >>> merely reiterated it here. :-) > >>> > >>> Or would the target then need to decay a mask[i] into something > >>>> that's later recognizable? > >>>> > >>> > >>> I think that would depend on the usage, wouldn't it - it could lower > >>> down to target insn(s) based on how whether, for eg, its used as a test > >>> or read as a scalar value? > >>> > >>> > >>>> So I guess the answer is you'd have to try. > >>> > >>> Thanks for your feedback so far - much appreciated. If it helps, I will > >>> try to write up a prototype to test the idea - might help clear the mist > >>> further. > >> > >> Just to note that one of the original motivations (that applies more > >> to option 3 from last year's proposal) was to add support for general > >> packed vector boolean types to the GNU vector extension, as a feature > >> independent of the target's "native" format(s). Clang already supports > >> this via ext_vector_type and it seemed like there might be value in > >> providing something similar for the GNU extensions. > > > > But that's more for data, aka vector bool, not for what's produced by > > targets from vector comparisons? So yes, I suppose that's reasonable > > but representation would then be fully defined by the extension > > rather than by however the target computes the actual comparison > > result vector. > > > > Sorry for the slow response. > > Thanks RichardS for your timely comment. Sorry, I might have gotten > ambitious with the original vector bool proposal and went down the route > of supporting 'native' formats with vector_mask, but scaling my > ambitions back to a boolean vector of a certain representation that is > independent of the target's native format and defined by the extension > itself is a more realistic proposition. > > To reiterate option 3 from last year's proposal, currently we don't support > > typedef bool vbool __attribute__((__vector_size__(64))); > > But if we did, could we support a more layout-friendly form i.e. > > typedef bool vbool __attribute__((vector_size (s, n[, w]))); > > where 's' is size in bytes, 'n' is the number of lanes and an optional > 3rd parameter 'w' is the number of bits of the PBV that represents a > lane of the target vector? 'w' would allow a target to force a certain > layout of the PBV.
isn't one of s, n or w redundant? That is, w == (s * 8) / n? Or would vector_size (8, 32, 1) put in 1 bit of "padding" per lane? (but where?) That said, how about typedef unsigned _BitInt(1) vbool __attribute__((vector_size (8))); instead? Slight complication is that _BitInt isn't supported in C++, but I suppose that could be fixed at least as extension? As we currently reject typedef _Bool vbool __attribute__((vector_size (8))); we can also chose to accept that as the 1-bit case at least. > I don't know if overloading vector_size is a good idea though... Is there precedent in other compilers for supporting bit-precision vector components in extensions to GCCs vector extension? Richard. > Thanks, > Tejas. > > > > Richard. > > > >> Thanks, > >> Richard > >> > >> >