Jeff Law <l...@redhat.com> writes:
> On 10/5/19 5:29 AM, Richard Sandiford wrote:
>> 
>> Sure.  This message is going to go to the other extreme, sorry, but I'm
>> not sure which part will be the most convincing (if any).
> No worries.  Worst case going to the other extreme is I have to read it
> more than once after nodding off halfway through :-)

:-)

> Seriously though.  Thanks for the more expansive write-up.
>
>> I guess I should start by saying that SVE intrinsics have three types of
>> predication (zeroing, merging, and "don't care") that combine with the usual
>> type suffixes seen in arm_neon.h.  This gives 3,736 functions for baseline
>> SVE (SVE2 adds more).
> Ugh.  That'd be a lot of stuff to add into a header.  As you noted
> later, there's significant costs to doing so.
>
>> The vast majority of those functions can't be open-coded using the
>> core parts of C and C++, even with GNU extensions.  Some could perhaps
>> be coded using new library extensions, but that just shifts the question
>> from "how do we implement this feature in arm_sve.h?" to "how do we
>> implement this feature in the library extension?".
> True.  But one could ask if those extensions are something that we're
> likely to need beyond what you're doing now.  But I don't necessarily
> think that would override the desire not to have so much stuff in the
> header.
>
> I'm guessing that even though you can't describe what you need at the
> C/C++ level, you can describe at least some of what you want in
> gimple/rtl?  Otherwise I'm not sure what you get above and beyond asms.

Yeah, some of the simpler stuff, like building length-agnostic constants,
can be represented directly in gimple.  Some other things map directly
to internal functions like IFN_MASK_LOAD and IFN_MASK_STORE.  The vast
majority of the functions still need to be built-in functions during
gimple though, e.g. so that we don't "lose" a necessary or useful
predicate argument.
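
As a rough illustration of why the predicate has to stay attached to the
operation, here is a scalar model of merging and zeroing predication for
an element-wise add.  The names model_add_m and model_add_z are made up
for this sketch; they are not arm_sve.h functions, and real SVE vectors
are length-agnostic rather than fixed-size arrays.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Merging predication (hypothetical scalar model): inactive lanes take
   the value of the first input.  */
static void
model_add_m (size_t n, const bool *pg, const int32_t *a,
             const int32_t *b, int32_t *out)
{
  for (size_t i = 0; i < n; ++i)
    out[i] = pg[i] ? a[i] + b[i] : a[i];
}

/* Zeroing predication (hypothetical scalar model): inactive lanes are
   set to zero.  */
static void
model_add_z (size_t n, const bool *pg, const int32_t *a,
             const int32_t *b, int32_t *out)
{
  for (size_t i = 0; i < n; ++i)
    out[i] = pg[i] ? a[i] + b[i] : 0;
}

/* "Don't care" predication would allow any value in the inactive lanes,
   so either function above is a valid model of it.  */
```

If the operation were lowered to a plain gimple addition, the pg argument
would have nowhere to live, and the two forms above could no longer be
told apart.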

The situation's similar in rtl, with a lot of stuff mapping to unspecs.

>> So that leaves us using built-in functions for almost all of those 3,736
>> functions.  With the traditional approach, the target would need to
>> register the functions at start-up and then define the header file in
>> terms of them.
> Yup.  And there's a cost you missed -- those things tend to end up in
> the debugging output as well.  That's caused problems with system tools
> in the past.

Ah, yeah, that's true.  And for the C overloading I mentioned, the
traditional approach would still need to use _Generic-based macros for
some cases, which would also show up if macro debug output is enabled.
And the macros wouldn't be useful for debugging, since what they expand
to wouldn't be callable interactively.
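
For anyone who hasn't seen the pattern, a _Generic-based overload in C
looks roughly like the sketch below.  The my_abs names are hypothetical
and only stand in for the idea; they are not taken from arm_sve.h or
from any patch in this thread.

```c
#include <stdint.h>

/* Hypothetical type-specific implementations.  */
static inline int32_t
my_abs_s32 (int32_t x)
{
  return x < 0 ? -x : x;
}

static inline double
my_abs_f64 (double x)
{
  return x < 0 ? -x : x;
}

/* The "overloaded" name is a macro that selects an implementation from
   the type of its argument.  A debugger sees my_abs only as a macro, so
   it cannot be called interactively like a function.  */
#define my_abs(x) \
  _Generic ((x), int32_t: my_abs_s32, double: my_abs_f64) (x)
```

With macro debug output enabled, it is the my_abs definition above that
ends up in the debug info, alongside the two type-specific functions.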

>> There are two approaches to doing that.  One is to define the built-in
>> functions under their header file name so that they become active once
>> a prototype is seen.  But that's only appropriate for functions like
>> printf that have linkage.  The arm_sve.h functions don't have linkage
>> and there's a chance that non-SVE code could be using the same names for
>> something else (perhaps even with the same prototype, in the case of
>> things like
>> 
>>   uint64_t svcntb (void);
>> 
>> that don't mention SVE types).
>> 
>> The other alternative is to define builtins in the "__builtin_"
>> namespace and wrap them in inline wrappers, which I think is what
>> most intrinsics header files do.  E.g., from arm_neon.h:
>> 
>> __extension__ extern __inline float64x2_t
>> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> vabsq_f64 (float64x2_t __a)
>> {
>>   return __builtin_aarch64_absv2df (__a);
>> }
>> 
>> But that's 6 lines per function.  Counting a blank line between
>> each one, we'd end up with a header file of at least 26,151 lines.
>> (OK, so arm_neon.h is already longer than that.  But with SVE2 and with
>> other considerations mentioned below, arm_sve.h could easily run into
>> 6 figures this way.)
> Yea, that's kind of what I would expect for exposing this stuff.  But I
> didn't realize how many of these you were going to need.
>
> [ ... ]
>
>
>> 
>> It's very hard to maintain header files like that by hand without
>> introducing errors, such as forgetting to put the arguments safely
>> in the implementation namespace ("__a" rather than "a" etc.).  Kyrill
>> fixed some arm_neon.h instances of this the other week.  And using macros
>> to handle common patterns just makes the error messages worse, since the
>> macros then show up in error backtraces.
> Yup.  We've seen this across multiple targets.  Y'all aren't alone here.
>
>> 
>> An alternative to maintaining the header file by hand is to generate
>> it via a script.  Ideally the script would use the same metadata as
>> the compiler itself uses when registering the built-in functions.
>> But this means writing two pieces of code to process the metadata,
>> one to generate text for the inline wrappers and one to register the
>> built-ins.  And we still end up with the same very long header file.
> Yup.  It's not ideal.
>
>> 
>> A more fundamental problem with inline wrappers is that they make it
>> harder to honour the spec for constant arguments.  If an instruction
>> requires a constant operand, Arm has traditionally required the
>> associated intrinsic argument to be an integer constant expression
>> (in the C and C++ sense).  GCC has tended to fudge this and instead only
>> requires an integer constant at expand time, after inlining and constant
>> propagation have taken place.  But this means that all sorts of other
>> optimisations have happened too, and so what's constant at expand time
>> wasn't necessarily constant at the language level.
>> 
>> Admittedly some people like that behaviour :-), but it means that what's
>> acceptable depends on the vagaries of compiler optimisation.  It also means
>> that code is often not portable between GCC and clang, which implements
>> the spec more closely.
> I go back and forth on this kind of thing.  Sometimes I'd rather see us
> be more strict, other times such strictness seems to be getting in the
> way for no good reason, particularly when after inlining/optimization,
> etc we're going to get the form we need.

Yeah, for C especially it can be awkward, since users often have to fall
back to macros or #include tricks when parameterising based on a constant.
For C++, constant template arguments and constexpr should help a lot.
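
A minimal sketch of the kind of macro trick I mean for C is below.
SHIFT_LEFT_IMM is a made-up stand-in for an intrinsic that needs an
immediate operand, and DEFINE_SCALE, scale_by_4 and scale_by_8 are
likewise hypothetical.  The _Static_assert is only there to model the
"integer constant expression" requirement.

```c
#include <stdint.h>

/* Hypothetical stand-in for an intrinsic with an immediate operand.
   _Static_assert requires IMM to be an integer constant expression in
   the range [0, 31], independently of any optimisation.  */
#define SHIFT_LEFT_IMM(x, imm)                                         \
  ((void) sizeof (struct { _Static_assert ((imm) >= 0 && (imm) < 32,   \
                                           "imm out of range");        \
                           int unused; }),                             \
   (uint32_t) (x) << (imm))

/* A function parameter is never an integer constant expression in C,
   so each specialisation is generated by a macro rather than by
   passing the shift amount as an argument.  */
#define DEFINE_SCALE(name, imm)        \
  static inline uint32_t               \
  name (uint32_t x)                    \
  {                                    \
    return SHIFT_LEFT_IMM (x, imm);    \
  }

DEFINE_SCALE (scale_by_4, 2)
DEFINE_SCALE (scale_by_8, 3)
```

Writing a single helper that takes the shift amount as an int parameter
and forwards it to SHIFT_LEFT_IMM should be rejected at compile time,
whatever inlining or constant propagation might have done later.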

> I'm sure from a user standpoint the inconsistency is annoying as hell.
>
>> So the pragma approach seemed better for several reasons:
> [ ... ]
> I tend to agree, particularly due to the number of builtins you'd end up
> needing.  It's not so bad when the number is small, but at the scale
> you're working at, the old-fashioned way seems inferior.
>
> Let's go with it.  Again, thanks for the write-up.

Thanks!  Is the related enum patch:

   https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01523.html

also OK?  (I really should have posted them together as a proper series.)

Richard
