Re: [RFC] Proposal to support Packed Boolean Vector masks.

Richard Sandiford Wed, 17 Jul 2024 06:17:20 -0700

Richard Biener <richard.guent...@gmail.com> writes:
> On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod <tejas.bela...@arm.com> wrote:
>>
>> On 7/17/24 4:36 PM, Richard Biener wrote:
>> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod <tejas.bela...@arm.com> 
>> > wrote:
>> >>
>> >> On 7/15/24 6:05 PM, Richard Biener wrote:
>> >>> On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod <tejas.bela...@arm.com> 
>> >>> wrote:
>> >>>>
>> >>>> On 7/15/24 12:16 PM, Tejas Belagod wrote:
>> >>>>> On 7/12/24 6:40 PM, Richard Biener wrote:
>> >>>>>> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek <ja...@redhat.com> 
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
>> >>>>>>>> Padding is only an issue for very small vectors - the obvious 
>> >>>>>>>> choice is
>> >>>>>>>> to disallow vector types that would require any padding.  I can 
>> >>>>>>>> hardly
>> >>>>>>>> see where those are faster than using a vector of up to 4 char
>> >>>>>>>> elements.
>> >>>>>>>> Problematic are 1-bit elements with 4, 2 or one element vectors,
>> >>>>>>>> 2-bit elements
>> >>>>>>>> with 2 or one element vectors and 4-bit elements with 1 element
>> >>>>>>>> vectors.
>> >>>>>>>
>> >>>>>>> I'd really like to avoid having to support something like
>> >>>>>>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) *
>> >>>>>>> 16)))
>> >>>>>>> _BitInt(2) to say size of long long could be acceptable.
>> >>>>>>
>> >>>>>> I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
>> >>>>>> way to say
>> >>>>>> the element should have n (< 8) bits.
>> >>>>>>
>> >>>>>>>> I have no idea what the stance on supporting _BitInt in C++ is,
>> >>>>>>>> but most certainly diverging support (or even semantics) of the
>> >>>>>>>> vector extension in C vs. C++ is undesirable.
>> >>>>>>>
>> >>>>>>> I believe Clang supports it in C++ next to C, GCC doesn't and Jason
>> >>>>>>> didn't
>> >>>>>>> look favorably to _BitInt support in C++, so at least until something
>> >>>>>>> like
>> >>>>>>> that is standardized in C++ the answer is probably no.
>> >>>>>>
>> >>>>>> OK, I think that rules out _BitInt use here so while bool is then 
>> >>>>>> natural
>> >>>>>> for 1-bit elements for 2-bit and 4-bit elements we'd have to specify 
>> >>>>>> the
>> >>>>>> number of bits explicitly.  There is signed_bool_precision but like
>> >>>>>> vector_mask its use is restricted to the GIMPLE frontend because
>> >>>>>> interaction with the rest of the language isn't defined.
>> >>>>>>
>> >>>>>
>> >>>>> Thanks for all the suggestions - really insightful (to me) discussions.
>> >>>>>
>> >>>>> Yeah, BitInt seemed like it was best placed for this, but not having 
>> >>>>> C++
>> >>>>> support is definitely a blocker. But as you say, in the absence of
>> >>>>> BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
>> >>>>> way to specify non-1-bit widths could be overloading vector_size.
>> >>>>>
>> >>>>> Also, I think overloading GIMPLE's vector_mask takes us into the
>> >>>>> earlier-discussed territory of what it should actually mean - it 
>> >>>>> meaning
>> >>>>> the target truth type in GIMPLE and a generic vector extension in the 
>> >>>>> FE
>> >>>>> will probably confuse gcc developers more than users.
>> >>>>>
>> >>>>>> That said - we're mixing two things here.  The desire to have "proper"
>> >>>>>> svbool (fix: declare in the backend) and the desire to have "packed"
>> >>>>>> bit-precision vectors (for whatever actual reason) as part of the
>> >>>>>> GCC vector extension.
>> >>>>>>
>> >>>>>
>> >>>>> If we leave lane-disambiguation of svbool to the backend, the values I
>> >>>>> see in supporting 1, 2 and 4 bitsizes are 1) first step towards
>> >>>>> supporting BitInt(N) vectors possibly in the future 2) having a way for
>> >>>>> targets to define their intrinsics' bool vector types using GNU
>> >>>>> extensions 3) feature parity with Clang's ext_vector_type?
>> >>>>>
>> >>>>> I believe the primary motivation for Clang to support ext_vector_type
>> >>>>> was to have a way to define target intrinsics' vector bool type using
>> >>>>> vector extensions.
>> >>>>>
>> >>>>
>> >>>>
>> >>>> Interestingly, Clang seems to support
>> >>>>
>> >>>> typedef struct {
>> >>>>        _Bool i:1;
>> >>>> } STR;
>> >>>>
>> >>>> typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR)
>> >>>> * 4))) vec;
>> >>>>
>> >>>>
>> >>>> int foo (vec b) {
>> >>>>       return sizeof b;
>> >>>> }
>> >>>>
>> >>>> I can't find documentation about how it is implemented, but I suspect
>> >>>> the vector is constructed as an array STR[] i.e. possibly each
>> >>>> bit-element padded to byte boundary etc. Also, I can't seem to apply
>> >>>> many operations other than sizeof.
>> >>>>
>> >>>> I don't know if we've tried to support such cases in GNU in the past?
>> >>>
>> >>> Why should we do that?  It doesn't make much sense.
>> >>>
>> >>> single-bit vectors is what _BitInt was invented for.
>> >>
>> >> Forgive me if I'm misunderstanding - I'm trying to figure out how
>> >> _BitInts can be made to have single-bit generic vector semantics. For
>> >> example, if I want to initialize a _BitInt as a vector, I can't do:
>> >>
>> >>    _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1};
>> >>
>> >> as 'a' expects a scalar initialization.
>> >>
>> >> Or if I want to convert an int vector to a bit vector, I can't do
>> >>
>> >>     v4si_p = v4si_a > v4si_b;
>> >>     _BitInt (4) vbool = __builtin_convertvector (v4si_p, _BitInt (4));
>> >>
>> >> Also semantics of conditionals with _BitInt behave like scalars
>> >>
>> >>     // Here a and b are _BitInt (4), but they behave as scalars.
>> >>     _BitInt (4) p = a && b;
>> >>
>> >> Also, I can't do things like
>> >>
>> >>     typedef _BitInt (2) vbool __attribute__((vector_size(sizeof (_BitInt
>> >> (2)) * 4)));
>> >>
>> >> to force it to behave as a vector because _BitInt is disallowed here.
>> >>
>> >
>> > All I'm trying to say is that when people want to use vector<bool> as
>> > a large packed bitfield they can now use _BitInt instead.  Of course
>> > with a different (but portable) API.
>> >
>> > I don't see single-bit element vectors as something especially
>> > useful with a "vector API".  What's the use-case? (similar
>> > for the two and four bit elements, with or without padding)
>> >
>>
>> I'm trying to figure out if we had a portable (generic) way to represent
>> predicate vectors (e.g. BitInts) in the front end, and had rules (or a
>> vector API?) that cast from integer vectors acting as bools to BitInts,
>> would it be more efficient to lower to target predicate modes (VNx16BI
>> etc on targets that support n-bit mode predicates)? It could also
>> possibly interoperate with target intrinsics better than int bool vectors.
>
> No, we don't have an existing way to represent predicate vectors.  And no,
> I don't think there's good evidence of necessity for supporting one
> within the realm of GCC's generic vector extension.  But there's plenty
> of doubt a portable and performant way of doing this is possible.


We'd like to be able to support things like:

  svbool_t x, y, z;
  x &= y | ~z;
  y[0] = z[1];

etc.  And, for fixed-size variants of svbool_t, we'd like to support:

  fixed_svbool_t x = { 1, 0, 1, 0 }; // + implicit zeros
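
For comparison, here is a rough sketch of what the generic vector
extension already allows when the mask is an ordinary integer vector
(each lane 0 for false, -1 for true).  The names v4si, mask_ops and
lane_is_true are made up for illustration and are not part of any
proposal; the point is only that the operators and indexing above are
exactly what integer vectors get today, and what a packed boolean
vector would ideally inherit:

```cpp
#include <cassert>
#include <cstdint>

// Unpacked mask: four 32-bit lanes, each 0 (false) or -1 (true).
// Hypothetical stand-in for a packed boolean vector type.
typedef int32_t v4si __attribute__((vector_size(16)));

// Same statements as in the svbool_t wish list, applied to integer masks.
inline void mask_ops(v4si &x, v4si &y, const v4si &z)
{
  x &= y | ~z;   // lane-wise logical operators
  y[0] = z[1];   // per-lane indexing, both read and write
}

// Read one lane as a plain bool.
inline bool lane_is_true(v4si m, int i)
{
  assert(i >= 0 && i < 4);
  return m[i] != 0;
}
```

The cost of this form is 32 bits of storage per boolean, which is what
a packed representation is meant to avoid.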

The hope was that we could do that as a two-step process:

- add a generic way of representing packed boolean vectors
- inherit that generic support for the SVE ACLE types

It seemed unlikely that adding SVE ACLE support directly to the frontends
would be acceptable.  (E.g. direct target support in frontends was rejected
for Altivec IIRC.)

_BitInt doesn't seem like a good replacement since, like Tejas said,
it doesn't support vector-style initialisation and indexing, and it
isn't part of C++.  The last one is a killer for us, since so much
intrinsics code is written in C++ using abstraction layers.

Also, things like __builtin_shuffle and __builtin_convertvector should be
supported for vector booleans, but wouldn't (I guess) be natural
operations on _BitInt.
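
As a rough illustration of those two builtins on today's integer-vector
masks (the typedefs and function names below are invented for the
example, and nothing here uses a packed boolean type):

```cpp
#include <cstdint>

typedef int32_t v4si __attribute__((vector_size(16)));
typedef int16_t v4hi __attribute__((vector_size(8)));

// Vector comparison: each result lane is -1 (true) or 0 (false).
inline v4si less_than(v4si a, v4si b) { return a < b; }

// Reverse the lanes of a mask.
inline v4si reverse_mask(v4si m)
{
#ifdef __clang__
  // Clang has no __builtin_shuffle; use its constant-index builtin instead.
  return __builtin_shufflevector(m, m, 3, 2, 1, 0);
#else
  const v4si sel = { 3, 2, 1, 0 };
  return __builtin_shuffle(m, sel);
#endif
}

// Change the lane width of a mask from 32 bits to 16 bits.
inline v4hi narrow_mask(v4si m)
{
  return __builtin_convertvector(m, v4hi);
}
```

With a packed boolean vector type, the same calls would ideally work
with the mask as source or destination, which has no obvious
counterpart for a scalar _BitInt.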

std::experimental::simd does support indexing of mask types, which
suggests that there is some demand for it.

At the moment, the implementation of that for SVE has to convert to an
integer vector, index that, and convert back to a bool:

template <>
  struct __sve_mask_type<2>
  {
    ...
    typedef svuint16_t __sve_mask_vector_type
    __attribute__((arm_sve_vector_bits(__ARM_FEATURE_SVE_BITS)));
    ...
    inline static bool
    __sve_mask_get(type __active_mask, size_t __i)
    { return __sve_mask_vector_type(svdup_u16_z(__active_mask, 1))[__i] != 0;}
    ...
  };

It would be nice if it could just use:

    inline static bool
    __sve_mask_get(type __active_mask, size_t __i)
    { return __active_mask[__i * 2]; }

without the round trip through uint16_ts.
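
The "* 2" comes from the predicate layout: an SVE predicate has one bit
per vector byte, and for 16-bit elements only the low bit of each 2-bit
group is significant.  A small portable model of that indexing, assuming
a 128-bit vector (so 16 predicate bits); packed_pred, pred_bit and
lane_active are hypothetical names, not ACLE or libstdc++ API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of a 128-bit SVE predicate: one bit per vector byte, 16 bits total.
struct packed_pred { uint16_t bits; };

// Read a raw predicate bit, i.e. what __active_mask[bit] would return.
inline bool pred_bit(packed_pred p, std::size_t bit)
{
  assert(bit < 16);
  return ((p.bits >> bit) & 1u) != 0;
}

// Lane i of a vector whose elements are elt_bytes wide is governed by the
// lowest predicate bit of that element's group of elt_bytes bits.
inline bool lane_active(packed_pred p, std::size_t i, std::size_t elt_bytes)
{
  return pred_bit(p, i * elt_bytes);
}
```

For example, with bits 0 and 4 set, 16-bit lanes 0 and 2 are active and
lane 1 is not.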

Even better would be if __sve_mask_type<2> could use a 2-bits-per-element
GNU-style boolean vector, so that the compiler has a better view of what's
actually happening.  But for me, the main point was to design the extension
so that multi-bit elements could be added later, rather than being a
requirement from day 1.

Thanks,
Richard
