Re: [00/10][RFC] Splitting the C and C++ concept of "complete type"

Richard Biener Tue, 16 Oct 2018 00:53:06 -0700

On Mon, Oct 15, 2018 at 8:40 PM Uecker, Martin
<martin.uec...@med.uni-goettingen.de> wrote:
>
>
> Hi Richard,
>
> as Joseph pointed out, there are some related discussions
> on the WG14 reflector. How a about moving the discussion
> there?
>
> I find your approach very interesting and that it already
> comes with an implementation is of course very useful
>
> But I don't really understand the reasons why this is not based
> on (2). These types are not "sizeless" at all, their size
> just isn't known at compile time. So to me this seems to me
> a misnomer.
>
> In fact, to me these types *do* in fact seem very similar
> to VLAs as VLAs are also complete types which also do no
> have a known size at compile time.
>
> That arrays decay to pointers doesn't mean that we
> couldn't have similar vectors types which don't decay.
> This is hardly a fundamental problem.
>
> I also don't understand the problem about the array
> size. If I understand this correctly, the size is somehow
> known at run-time and implicitly passed along with the
> values. So these new types do not need to have a
> size expression (as in your proposal).
>
> Assignment, the possibility to return the type from
> functions, and something like __sizeless_structs would
> make sense for VLAs too.
>
> So creating a new category "variable-length types" for
> both VLAs and variably-length vector types seems do make
> much more sense to me. As I see it, this would be mainly
> a change in terminology and not so much of the underlying
> approach.


I agree - those types very much feel like VLAs.

I think there's also the existing vector extension of GCC to
factor in which introduces (non-VLA) vector types to the
language and allows arithmetic on them as well as
indexing and passing and returing them by values.
Variable-size vectors should play well in that context.

Richard.

>
> Best,
> Martin
>
> Am Montag, den 15.10.2018, 15:30 +0100 schrieb Richard Sandiford:
> > The C standard says:
> >
> >     At various points within a translation unit an object type may be
> >     "incomplete" (lacking sufficient information to determine the size of
> >     objects of that type) or "complete" (having sufficient information).
> >
> > For AArch64 SVE, we'd like to split this into two concepts:
> >
> >   * has the type been fully defined?
> >   * would fully-defining the type determine its size?
> >
> > This is because we'd like to be able to represent SVE vectors as C and C++
> > types.  Since SVE is a "vector-length agnostic" architecture, the size
> > of the vectors is determined by the runtime environment rather than the
> > programmer or compiler.  In that sense, defining an SVE vector type does
> > not determine its size.  It's nevertheless possible to use SVE vector types
> > in meaningful ways, such as having automatic vector variables and passing
> > vectors between functions.
> >
> > The main questions in the RFC are:
> >
> >   1) is splitting the definition like this OK in principle?
> >   2) are the specific rules described below OK?
> >   3) coding-wise, how should the split be represented in GCC?
> >
> > Terminology
> > -----------
> >
> > Going back to the second bullet above:
> >
> >   * would fully-defining the type determine its size?
> >
> > the rest of the RFC calls a type "sized" if fully defining it would
> > determine its size.  The type is "sizeless" otherwise.
> >
> > Contents
> > --------
> >
> > The RFC is organised as follows.  I've erred on the side of including
> > detail rather than leaving it out, but each section is meant to be
> > self-contained and skippable:
> >
> >   - An earlier RFC
> >   - Quick overview of SVE
> >   - Why we need SVE types in C and C++
> >   - How we ended up with this definition
> >   - The SVE types in more detail
> >   - Outline of the type system changes
> >   - Sizeless structures (and testing on non-SVE targets)
> >   - Other variable-length vector architectures
> >   - Edits to the C standard
> >     - Base changes
> >     - Updates for consistency
> >     - Sizeless structures
> >   - Edits to the C++ standard
> >   - GCC implementation questions
> >
> > I'll follow up with patches that implement the split.
> >
> >
> >
> > An earlier RFC
> > ==============
> >
> > For the record (in case this sounds familiar) I sent an RFC about the
> > sizeless type extension a while ago:
> >
> >     https://gcc.gnu.org/ml/gcc/2017-08/msg00012.html
> >
> > The rules haven't changed since then, but this version includes more
> > information and includes support for sizeless structures.
> >
> >
> > Quick overview of SVE
> > =====================
> >
> > SVE is a vector extension to AArch64.  A detailed description is
> > available here:
> >
> >     https://static.docs.arm.com/ddi0584/a/DDI0584A_a_SVE_supp_armv8A.pdf
> >
> > but the only feature that really matters for this RFC is that SVE has no
> > fixed or preferred vector length.  Implementations can instead choose
> > from a range of possible vector lengths, with 128 bits being the minimum
> > and 2048 bits being the maximum.  Priveleged software can further
> > constrain the vector length within the range offered by the implementation;
> > e.g. linux currently provides per-thread control of the vector length.
> >
> >
> > Why we need SVE types in C and C++
> > ==================================
> >
> > SVE was designed to be an easy target for autovectorising normal scalar
> > code.  There are also various language extensions that support explicit
> > data parallelism or that make explicit vector chunking easier to do in
> > an architecture-neutral way (e.g. C++ P0214).  This means that many users
> > won't need to do anything SVE-specific.
> >
> > Even so, there's always going to be a place for writing SVE-specific
> > optimisations, with full access to the underlying ISA.  As for other
> > vector architectures, we'd like users to be able to write such routines
> > in C and C++ rather than force them to go all the way to assembly.
> >
> > We'd also like C and C++ functions to be able to take SVE vector
> > parameters and return SVE vector results, which is particularly useful
> > when implementing things like vector math routines.  In this case in
> > particular, the types need to map directly to something that fits in
> > an SVE register, so that passing and returning vectors has minimal
> > overhead.
> >
> >
> > How we ended up with this definition
> > ====================================
> >
> > Requirements
> > ------------
> >
> > We need the SVE vector types to define and use SVE intrinsic functions
> > and to write SVE vector library routines.  The key requirements when
> > defining the types were:
> >
> >   * They must be available in both C and C++ (because we want to be able
> >     add SVE optimisations to C-only codebases).
> >
> >   * They must fit in an SVE vector register (so there can be no on-the-side
> >     information).
> >
> >   * It must be possible to define automatic variables with these types.
> >
> >   * It must be possible to pass and return objects of these types
> >     (since that's what intrinsics and vector library routines need to do).
> >
> >   * It must be possible to use the types in _Generic associations
> >     (so that _Generic can be used to provide tgmath.h-style overloads).
> >
> >   * It must be possible to use pointers or references to the types
> >     (for passing or returning by pointer or reference, and because not
> >     allowing references would be semantically difficult in C++).
> >
> > Ideally, there'd also be a way of grouping SVE vectors together into tuples,
> > since the ISA has instructions like LD2 that return multiple vectors.
> > It would be good if users could also define their own tuple types, on top
> > of the ones needed by the intrinsics, although that's more "nice to have".
> >
> > Possible approaches
> > -------------------
> >
> > The main complication is that the size of an SVE vector is not a
> > compile-time constant.  It seems that any approach to handling this
> > would fall into one of three categories:
> >
> >   (1) Limit the types in such a way that there is no concept of size.
> >
> >   (2) Define the size of the types to be variable.
> >
> >   (3) Define the size of the types to be constant, either with the
> >       constant being large enough for all possible vector lengths or
> >       with the types pointing to separate memory (as for C++ classes
> >       like std::string).
> >
> > Why (2) seemed like a bad idea
> > ------------------------------
> >
> > (2) seemed initially appealing since C already has the concept of
> > variable-length arrays.  However, variable-length built-in types
> > would work in a significantly different way.  Arrays often decay to
> > pointers (which of course are fixed-length types), whereas vector
> > types never would.  Unlike arrays, it should be possible to pass
> > variable-length vectors to functions, return them from functions,
> > and assign them by value.
> >
> > One particular difficulty is that the semantics of variable-length arrays
> > rely on having a point at which the array size is evaluated.  It would
> > be difficult to extend this approach to declarations of functions that
> > pass or return variable-length types.
> >
> > As well as the extension itself being relatively complex (especially
> > for C++), it might be difficult to define it in a way that interacts
> > naturally with other (unseen) extensions, even those that are aware of
> > variable-length arrays.  Also, AIUI, variable-length arrays were added
> > to an early draft of C++14, but were later removed as too controversial
> > and didn't make it into the final standard.  C++17 still requires sizeof
> > to be constant and C11 makes variable-length arrays optional.
> >
> > (2) therefore felt like a complicated dead-end.
> >
> > Why (3) seemed like a bad idea
> > ------------------------------
> >
> > (3) can be divided into two:
> >
> > (3a) The vector types have a constant size and are large enough for all
> >      possible vector lengths.
> >
> >     The main problem with this is that the maximum size of an SVE
> >     vector (2048 bits) is much larger than the minimum size (128 bits).
> >     Using a fixed size of 2048 bits would be extremely inefficient for
> >     smaller vector lengths, and of course the whole point of using
> >     vectors is to make things *more* efficient.
> >
> >     Also, we would need to define the types such that only the bytes
> >     associated with the actual vector length are significant.  This would
> >     make it possible to pass or return the types in registers and treat
> >     them as register values when copying.  This perhaps has some similarity
> >     with overaligned structures such as:
> >
> >       struct s { _Alignas(16) int i; };
> >
> >     except that the amount of padding would only be known at runtime.
> >
> >     There's also a significant conceptual problem: encoding a fixed size
> >     goes against a guiding principle of SVE, in which there is no preferred
> >     vector length.  There's nothing particularly magical about the current
> >     limit of 2048 bits and it would be better to avoid an ABI break if the
> >     maximum ever did increase in future.
> >
> > (3b) The vector types have a constant size and refer to separate storage
> >      (as for std::string etc.)
> >
> >     This would be difficult to do without C++-style constructor, destructor,
> >     copy and move semantics, so wouldn't work well in C.  And in C++ it 
> > would
> >     be less efficient than the proposed approach, since presumably an 
> > Allocator
> >     would be needed to allocate the separate storage.  It would also require
> >     a complicated ABI mapping to ensure that the vectors can still be passed
> >     and returned in registers.
> >
> > Chosen approach
> > ---------------
> >
> > We therefore took approach (1) and classified C and C++ types as "sized"
> > (having a measurable size when fully-defined) and "sizeless" (never
> > having a measurable size).  Sizeless types have no defined size,
> > alignment or layout at the language level, with those things becoming
> > purely an ABI-level detail.
> >
> > We then treated all sizeless types as permanently incomplete.
> > On its own, this would put them in a similar situation to "void"
> > (although they wouldn't be exactly the same, since there are some
> > specific rules for "void" that don't apply to incomplete types in
> > general).  We then relaxed specific rules until the types were actually
> > useful.
> >
> > Things in favour of (1)
> > -----------------------
> >
> > The reasons above were mostly negative, arriving at (1) by elimination.
> > A more positive justification of this approach is that it seems
> > to meet the requirements in the most efficient way possible.  The
> > vectors can use their natural (native) representation, and the type
> > system prevents uses that would make that representation problematic.
> >
> > Also, the approach of starting with very restricted types and then
> > specifically allowing certain things should be more future-proof
> > and interact better with other language extensions.  By default,
> > any language extension would treat the new types like other incomplete
> > types and choose conservatively-correct behaviour.  It would then be
> > possible to relax the language extension if this default behaviour
> > turns out to be too restrictive.
> >
> > (That said, treating the types as permanently incomplete still won't
> > avoid all clashes with other extensions.  For example, we need to
> > allow objects of automatic storage duration to have certain forms of
> > incomplete type, whereas an extension might implicitly assume that all
> > such objects must already have complete type.  The approach should still
> > avoid the worst effects though.)
> >
> >
> > The SVE types in more detail
> > ============================
> >
> > Arm has published an SVE "ACLE" that specifies the SVE types and intrinsic
> > functions in detail.  For reference this is available without registration
> > at:
> >
> >     https://static.docs.arm.com/100987/0000/acle_sve_100987_0000_00_en.pdf
> >
> > but I'll try to keep this self-contained.
> >
> > The ACLE defines a vector type sv<base>_t for each supported element type
> > <base>_t, so that the complete set is:
> >
> >     svint8_t      svint16_t     svint32_t     svint64_t
> >     svuint8_t     svuint16_t    svuint32_t    svuint64_t
> >                   svfloat16_t   svfloat32_t   svfloat64_t
> >
> > The types in each column have the same number of lanes and have twice
> > as many lanes as those in the column to the right.  Every vector has
> > the same number of bytes in total, with the number of bytes being
> > determined at runtime.
> >
> > The ACLE also defines a single predicate type:
> >
> >     svbool_t
> >
> > that has the same number of lanes as svint8_t and svuint8_t.
> >
> > All these types are opaque builtin types and are only expected to
> > be used with the associated ACLE intrinsics.  There are intrinsics for
> > creating vectors from scalars, loading from scalars, storing to scalars,
> > reinterpreting one type as another, etc.
> >
> > The idea is that the vector types would only be used for short-term
> > register-sized working data.  Longer-term data would typically be stored
> > out to arrays.
> >
> > For example, the vector function underlying:
> >
> >     #pragma omp declare simd
> >     double sin(double);
> >
> > would be:
> >
> >     svfloat64_t mangled_sin(svfloat64_t, svbool_t);
> >
> > (The svbool_t is because SVE functions should be predicated by default,
> > to avoid the need for a scalar epilogue loop.)
> >
> > The ACLE also defines x2, x3 and x4 tuple types for each vector type;
> > for example, svint8x3_t is a tuple of 3 svint8_ts.  The tuples are
> > structure-like types with fields v0, v1, v2 and v3, up to the number
> > required.
> >
> >
> > Outline of the type system changes
> > ==================================
> >
> > Going back to the summary at the start of the RFC, C classifies types as
> > "complete" (the size of objects can be calculated) or "incomplete" (the
> > size of objects can't be calculated).  There's very little you can do
> > with a type until it becomes complete.
> >
> > The approach we took was to treat all the SVE types as permanently
> > incomplete.  We then went through the standard relaxing specific
> > rules until the types were actually useful.
> >
> > The first step was to classify types as:
> >
> >   * "indefinite" (lacking sufficient information to create an object of
> >     that type) or "definite" (having sufficient information)
> >
> >   * "sized" (will have a known size when definite) or "sizeless" (will
> >     never have a known size)
> >
> >   * "incomplete" (lacking sufficient information to determine the size of
> >     objects of that type) or "complete" (having sufficient information)
> >
> > where the wording for the final bullet is unchanged from the standard.
> > Thus a "definite type" is one that has been fully-defined rather than
> > simply declared, and "complete" is now equivalent to "sized and definite".
> > All standard types are "sized" (even "void", although it's always
> > indefinite and incomplete).
> >
> > We then needed to make some rules use the distinction between "indefinite"
> > and "definite" rather than "incomplete" and "complete".  The specific
> > things we wanted to allow were:
> >
> >   * defining automatic variables with sizeless definite type
> >   * defining functions whose parameters have sizeless definite type
> >   * defining functions that return a sizeless definite type
> >   * using sizeless definite types in _Generic associations
> >   * dereferencing pointers to sizeless definite types
> >
> > Specific things we wanted to remain invalid -- by inheriting the rules from
> > incomplete types -- were:
> >
> >   * creating or accessing arrays that have sizeless element types
> >   * doing pointer arithmetic on pointers to sizeless types
> >   * using sizeof and _Alignof with a sizeless type (or object of sizeless 
> > type)
> >   * defining (sized) unions or structures with sizeless members
> >
> > It also seemed worth adding an extra restriction:
> >
> >   * variables with sizeless type must not have static or thread-local
> >     storage duration
> >
> > In practice it's impossible to define such variables with incomplete type,
> > but having an explicit rule means that things like:
> >
> >     extern svint8_t foo;  // An SVE vector of int8_t elements.
> >
> > are outright invalid rather than simply useless (because no other
> > translation unit could ever define foo).  Similarly, without an
> > explicit rule:
> >
> >     svint8_t foo;         // An SVE vector of int8_t elements.
> >
> > would be a valid tentative definition at the point it occurs and only
> > become invalid at the end of the translation unit, because svint8_t is
> > never completed.
> >
> > This restriction isn't critical but it gives better diagnostics.
> >
> >
> > Sizeless structures (and testing on non-SVE targets)
> > ====================================================
> >
> > We're planning to build all SVE intrinsic types directly into GCC
> > (patches already written).  SVE therefore doesn't strictly need a syntax
> > for creating new sizeless types in C and C++.  However, having a way of
> > creating new structure-like "sizeless" types would be useful for three
> > reasons:
> >
> >   - Functions could return arbitrary data by value.  The SVE ABI allows
> >     a function to return up to 8 vectors and 4 predicates in registers,
> >     which is far more flexible than the intrinsic types.
> >
> >   - We could use these sizeless structure types to test the functionality
> >     on all targets.
> >
> >   - A lot of the C++ frontend is concerned with classes, and having
> >     a way of creating sizeless classes would help make the C++ changes
> >     more consistent.
> >
> > The patches therefore add a new "__sizeless_struct" keyword to denote
> > structures that are sizeless rather than sized.  Unlike normal
> > structures, these structures can have members of sizeless type in
> > addition to members of sized type.  On the other hand, they have all
> > the same limitations as other sizeless types (described in earlier
> > sections).
> >
> > E.g., a sizeless structure definition might look like:
> >
> >     __sizeless_struct data {
> >       double *array;
> >       svuint64_t indices;  // An SVE vector of uint64_t elements.
> >       svbool_t active;     // An SVE predicate.
> >     };
> >
> > Adding a new keyword seemed better than using an attribute because it
> > means that the sized vs. sizeless distinction is fixed by the declaration.
> > E.g.:
> >
> >     struct data;                     // Is it sized or sizeless?
> >     extern struct data global_data;  // OK if sized, not if sizeless.
> >     struct __attribute__((sizeless)) data {
> >       double *array;
> >       svuint64_t indices;            // An SVE vector of uint64_t elements.
> >       svbool_t active;               // An SVE predicate.
> >     };
> >
> > would lead to the declaration of "global_data" sneaking through
> > despite being invalid when "data" is sizeless.
> >
> > The tests in the patches all use these __sizeless_structs; they contain
> > nothing SVE- or AArch64-specific.
> >
> >
> > Other variable-length vector architectures
> > ==========================================
> >
> > The proposed RISC-V vector extension also has variable-length vectors.
> > When this language change was discussed on the clang developers' list,
> > Bruce Hoult (from SiFive, but speaking personally) replied with:
> >
> >     http://lists.llvm.org/pipermail/cfe-dev/2018-May/057943.html
> >
> > That message covers some of the background about the vector extension.
> > On the language changes, Bruce said:
> >
> >     > However, even though the length is variable, the concept of a
> >     > "register-sized" C and C++ vector type makes just as much sense for 
> > SVE
> >     > as it does for other vector architectures.  Vector library functions
> >     > take such register-sized vectors as input and return them as results.
> >     > Intrinsic functions are also just as useful as they are for other 
> > vector
> >     > architectures, and they too take register-sized vectors as input and
> >     > return them as results.
> >
> >     Intrinsic functions are absolutely required, and are I think the main
> >     reason for such a low-level register-sized vector type to exist.
> >
> > [ Bruce went on to say:
> >
> >     I'm not sure whether user-written functions operating on register-sized
> >     vectors are useful enough to support. User-written functions would 
> > normally
> >     take and return a higher-level vector type, and would implement the 
> > desired
> >     functionality in terms of calls to other user-written functions 
> > (operating
> >     on the high level vector as a whole) and/or explicit loops iterating
> >     through the high level vector type using intrinsic functions on the
> >     register-sized vector type proposed here.
> >
> > But this use case is very important for SVE, since it will allow us
> > to implement vector math routines in a way that works with the OpenMP
> > "declare simd" construct.  There was also talk on gcc@ recently about
> > supporting this style of interface for RISC-V. ]
> >
> > [...]
> >
> >     > All these types are opaque builtin types and are only intended to be
> >     > used with the associated ACLE intrinsics.  There are intrinsics for
> >     > creating vectors from scalars, loading from scalars, storing to 
> > scalars,
> >     > reinterpreting one type as another, etc.
> >     >
> >     > The idea is that the vector types would only be used for short-term
> >     > register-sized working data.  Longer-term data would typically be 
> > stored
> >     > out to arrays.
> >
> >     I agree with this.
> >
> > [...]
> >
> >     > The approach we took was to treat all the SVE types as permanently
> >     > incomplete.
> >
> >     This seems reasonable.
> >
> > So it looks like this extension would be useful for at least one
> > architecture besides SVE.
> >
> >
> > Edits to the C standard
> > =======================
> >
> > This section specifies the behaviour for sizeless types as an edit to N1570.
> > There are three stages:
> >
> >   - base changes, which add enough support for built-in sizeless
> >     vector types
> >
> >   - updates for consistency, which change some of the wording without
> >     changing the meaning
> >
> >   - support for sizeless structures
> >
> > In each case, -strikethrough- indicates deleted text and *bold*
> > includes additional text.
> >
> >
> > Base changes
> > ------------
> >
> > These changes are enough to support sizeless built-in vector types.
> >
> >     6.2.5 Types
> >     -----------
> >
> >     1. The meaning of a value stored in an object or returned by a
> >     function is determined by the type of the expression used to access
> >     it. … Types are partitioned into object types (types that
> >     describe objects) and function types (types that describe
> >     functions).  -At various points within a translation unit an object
> >     type may be incomplete (lacking sufficient information to determine
> >     the size of objects of that type) or complete (having sufficient
> >     information).37)- *Object types are further partitioned into sized and
> >     sizeless; all basic and derived types defined in this standard are
> >     sized, but an implementation may provide additional sizeless types.*
> >
> >     1A. *At various points within a translation unit an object type may
> >     be indefinite (lacking sufficient information to construct an object
> >     of that type) or definite (having sufficient information).37) An
> >     object type is said to be complete if it is both sized and definite;
> >     all other object types are said to be incomplete.  Complete types
> >     have sufficient information to determine the size of an object of
> >     that type while incomplete types do not.*
> >
> >     1B. *Arrays, structures, unions and enumerated types are always
> >     sized, so for them the term incomplete is equivalent to (and used
> >     interchangeably with) the term indefinite.*
> >
> >     …
> >
> >     19. The void type comprises an empty set of values; it is -an
> >     incomplete- *a sized indefinite* object type that cannot be completed
> >     *(made definite)*.
> >
> >     …
> >
> >     37) A type may be -incomplete- *indefinite* or -complete- *definite*
> >     throughout an entire translation unit, or it may change states at
> >     different points within a translation unit.
> >
> >     …
> >
> >     6.3.2.1 Lvalues, arrays, and function designators
> >     -------------------------------------------------
> >
> >     1.  An lvalue is an expression (with an object type other than void)
> >     that potentially designates an object;64) … A modifiable lvalue is
> >     an lvalue that does not have array type, does not have an
> >     -incomplete- *indefinite* type, does not have a const-qualified
> >     type, …
> >
> >     2.  Except when it is the operand of the sizeof operator, the
> >     _Alignof operator, the unary & operator, the ++ operator, the --
> >     operator, or the left operand of the . operator or an assignment
> >     operator, an lvalue that does not have array type is converted to
> >     the value stored in the designated object (and is no longer an
> >     lvalue); this is called lvalue conversion. … If the lvalue has an
> >     -incomplete- *indefinite* type and does not have array type, the
> >     behavior is undefined. …
> >
> >     …
> >
> >     6.5.1.1 Generic selection
> >     -------------------------
> >
> >     …
> >
> >     Constraints
> >
> >     2. A generic selection shall have no more than one default generic
> >     association. The type name in a generic association shall specify a
> >     -complete- *definite* object type other than a variably modified
> >     type. …
> >
> >     …
> >
> >     6.5.2.2 Function calls
> >     ----------------------
> >
> >     Constraints
> >
> >     1. The expression that denotes the called function92) shall have
> >     type pointer to function returning void or returning a -complete-
> >     *definite* object type other than an array type.
> >
> >     …
> >
> >     Semantics
> >
> >     …
> >
> >     4. An argument may be an expression of any -complete- *definite* object
> >     type. …
> >
> >     …
> >
> >     6.5.2.5 Compound literals
> >     -------------------------
> >
> >     Constraints
> >
> >     1. The type name shall specify a -complete- *definite* object type or an
> >     array of unknown size, but not a variable length array type.
> >
> >     …
> >
> >     6.7 Declarations
> >     ----------------
> >
> >     Constraints
> >
> >     …
> >
> >     4A. *If an identifier for an object does not have automatic storage
> >     duration, its type must be sized rather than sizeless.*
> >
> >     Semantics
> >
> >     …
> >
> >     7. If an identifier for an object is declared with no linkage, the
> >     type for the object shall be -complete- *definite* by the end of its
> >     declarator, or by the end of its init-declarator if it has an
> >     initializer; in the case of function parameters (including in
> >     prototypes), it is the adjusted type (see 6.7.6.3) that is required
> >     to be -complete- *definite*.
> >
> >     …
> >
> >     6.7.6.3 Function declarators (including prototypes)
> >     ---------------------------------------------------
> >
> >     Constraints
> >
> >     …
> >
> >     4. After adjustment, the parameters in a parameter type list in a
> >     function declarator that is part of a definition of that function
> >     shall not have -incomplete- *indefinite* type.
> >
> >     …
> >
> >     6.7.9 Initialization
> >     --------------------
> >
> >     Constraints
> >
> >     …
> >
> >     3. The type of the entity to be initialized shall be an array of
> >     unknown size or a -complete- *definite* object type that is not a
> >     variable length array type.
> >
> >     …
> >
> >     6.9.1 Function definitions
> >     --------------------------
> >
> >     Constraints
> >
> >     …
> >
> >     3. The return type of a function shall be void or a -complete-
> >     *definite* object type other than array type.
> >
> >     …
> >
> >     Semantics
> >
> >     …
> >
> >     7. The declarator in a function definition specifies the name of the
> >     function being defined and the identifiers of its parameters. …
> >     [T]he type of each parameter is adjusted as described in
> >     6.7.6.3 for a parameter type list; the resulting type shall be a
> >     -complete- *definite* object type.
> >
> >     …
> >
> >     J.2 Undefined behavior
> >     ----------------------
> >
> >         …
> >       * A non-array lvalue with -an incomplete- *an indefinite* type is used
> >         in a context that requires the value of the designated object
> >         (6.3.2.1).
> >         …
> >       * An identifier for an object is declared with no linkage and the
> >         type of the object is -incomplete- *indefinite* after its
> >         declarator, or after its init-declarator if it has an
> >         initializer (6.7).
> >         …
> >       * An adjusted parameter type in a function definition is not a
> >         -complete- *definite* object type (6.9.1).
> >         …
> >
> > Updates for consistency
> > -----------------------
> >
> > These changes are a prerequisite for sizeless structures.  They have no
> > effect otherwise, but might be preferred anyway because they make the
> > terminology more consistent.  They apply on top of the previous edits.
> >
> >     6.2.5 Types
> >     -----------
> >
> >     …
> >
> >     22. An array type of unknown size is an -incomplete- *indefinite*
> >     type. It is -completed- *made definite*, for an identifier of that type,
> >     by specifying the size in a later declaration (with internal or
> >     external linkage). A structure or union type of unknown content (as
> >     described in 6.7.2.3) is an -incomplete- *indefinite* type. It is
> >     -completed- *made definite*, for all declarations of that type, by
> >     declaring the same structure or union tag with its defining content
> >     later in the same scope.
> >
> >     …
> >
> >     6.2.7 Compatible type and composite type
> >     ----------------------------------------
> >
> >     1. Two types have compatible type if their types are the same. …
> >     Moreover, two structure, union, or enumerated types declared in
> >     separate translation units are compatible if their tags and members
> >     satisfy the following requirements: If one is declared with a tag,
> >     the other shall be declared with the same tag. If both are
> >     -completed- *made definite* anywhere within their respective
> >     translation units, then the following additional requirements apply: …
> >
> >     …
> >
> >     6.7.2.1 Structure and union specifiers
> >     --------------------------------------
> >
> >     …
> >
> >     Semantics
> >
> >     …
> >
> >     8. The presence of a struct-declaration-list in a
> >     struct-or-union-specifier declares a new type, within a translation
> >     unit. The struct-declaration-list is a sequence of declarations for
> >     the members of the structure or union.  If the struct-declaration-list
> >     does not contain any named members, either directly or via an anonymous
> >     structure or anonymous union, the behavior is undefined.  The type is
> >     -incomplete- *indefinite* until immediately after the } that terminates
> >     the list, and -complete- *definite* thereafter.
> >
> >     …
> >
> >     6.7.2.2 Enumeration specifiers
> >     ------------------------------
> >
> >     …
> >
> >     Semantics
> >
> >     …
> >
> >     4. … The enumerated type is -incomplete- *indefinite* until
> >     immediately after the } that terminates the list of enumerator
> >     declarations, and -complete- *definite* thereafter.
> >
> >     …
> >
> >     6.7.2.3 Tags
> >     ------------
> >
> >     …
> >
> >     Semantics
> >
> >     4. All declarations of structure, union, or enumerated types that
> >     have the same scope and use the same tag declare the same
> >     type. Irrespective of whether there is a tag or what other
> >     declarations of the type are in the same translation unit, the type
> >     is -incomplete- *indefinite* 129) until immediately after the closing
> >     brace of the list defining the content, and -complete- *definite*
> >     thereafter.
> >
> >     …
> >
> >     8. If a type specifier of the form
> >
> >     struct-or-union identifier
> >
> >     occurs other than as part of one of the above forms, and no other
> >     declaration of the identifier as a tag is visible, then it declares
> >     an -incomplete- *indefinite* structure or union type, and declares the
> >     identifier as the tag of that type.131)
> >
> >     …
> >
> >     129) An -incomplete- *indefinite* type may only by used when -the
> >     size of an object- *the ability to create an object* of that type
> >     is not needed.  It is not needed, for example, when a typedef name
> >     is declared to be a specifier for a structure or union, or when a
> >     pointer to or a function returning a structure or union is being
> >     declared. (See -incomplete- *indefinite* types in 6.2.5.) The
> >     specification has to be -complete- *definite* before such a function
> >     is called or defined.
> >
> >     6.7.6.3 Function declarators (including prototypes)
> >     ---------------------------------------------------
> >
> >     …
> >
> >     Semantics
> >
> >     …
> >
> >     12.  If the function declarator is not part of a definition of that
> >     function, parameters may have -incomplete- *indefinite* type and may use
> >     the [*] notation in their sequences of declarator specifiers to
> >     specify variable length array types.
> >
> >     …
> >
> >     J.2 Undefined behavior
> >     ----------------------
> >
> >         …
> >       * When the -complete- *definite* type is needed, an -incomplete-
> >         *indefinite* structure or union type is not completed in the same
> >         scope by another declaration of the tag that defines the content
> >         (6.7.2.3).
> >         …
> >
> > Sizeless structures
> > -------------------
> >
> > These additional changes to N1570 add the concept of a sizeless structure.
> > Again they apply on top of the edits above:
> >
> >     6.2.3 Name spaces of identifiers
> >     --------------------------------
> >
> >     1. If more than one declaration of a particular identifier is
> >     visible at any point in a translation unit, the syntactic context
> >     disambiguates uses that refer to different entities. Thus, there
> >     are separate name spaces for various categories of identifiers, as
> >     follows:
> >
> >       …
> >
> >       * the tags of *sized* structures, *sizeless structures,* unions, and
> >       enumerations (disambiguated by following any32) of the keywords
> >       struct, *__sizeless_struct,* union, or enum);
> >
> >       …
> >
> >     6.2.5 Types
> >     -----------
> >
> >     1. … Types are partitioned into object types (types that describe
> >     objects) and function types (types that describe functions).
> >     Object types are further partitioned into sized and sizeless;
> >     -all basic and derived types defined in this standard are
> >     sized, but an implementation may provide additional sizeless types.-
> >     *the only sizeless types defined by this standard are 
> > __sizeless_structs,
> >     but an implementation may provide additional sizeless types.*
> >
> >     …
> >
> >     1B. Arrays, -structures,- unions and enumerated types are always
> >     sized, so for them the term incomplete is equivalent to (and used
> >     interchangeably with) the term indefinite.
> >
> >     …
> >
> >     20. Any number of derived types can be constructed from the object
> >     and function types, as follows: …
> >
> >       * A *sized* structure type describes a sequentially allocated
> >         nonempty set of sized member objects (and, in certain
> >         circumstances, an incomplete array), each of which has an
> >         optionally specified name and possibly distinct type.
> >
> >       * *A sizeless structure type describes a set of non-overlapping
> >         member objects whose types may be sizeless and whose relative
> >         positions are unspecified.  It is also unspecified whether the
> >         structure occupies a single contiguous piece of storage or
> >         whether it requires several disjoint pieces.*
> >
> >     …
> >
> >     *20A. The term structure type refers collectively to sized structure
> >     types and sizeless structure types.*
> >
> >     …
> >
> >     6.4.1 Keywords
> >     --------------
> >
> >     Syntax
> >
> >     1. *(Add __sizeless_struct to the list and update the copy in A.1.2)*
> >
> >     …
> >
> >     6.5.8 Relational operators
> >     --------------------------
> >
> >     …
> >
> >     Semantics
> >
> >     …
> >
> >     5. When two pointers are compared, the result depends on the
> >     relative locations in the address space of the objects pointed to.
> >     … If the objects pointed to are members of the same aggregate object,
> >     pointers to *sized* structure members declared later compare greater
> >     than pointers to members declared earlier in the structure, and
> >     pointers to array elements with larger subscript values compare
> >     greater than pointers to elements of the same array with lower
> >     subscript values. …
> >
> >     …
> >
> >     6.7.2.1 Structure and union specifiers
> >     --------------------------------------
> >
> >     Syntax
> >
> >     struct-or-union-specifier:
> >         struct-or-union identifieropt { struct-declaration-list }
> >         struct-or-union identifier
> >
> >     struct-or-union:
> >         struct
> >         *__sizeless_struct*
> >         union
> >
> >     …
> >
> >     3. A *sized* structure or union shall not contain a member with
> >     incomplete or function type …, except that the last member of a
> >     structure with more than one named member may have incomplete array
> >     type; such a structure (and any union containing, possibly
> >     recursively, a member that is such a structure) shall not be a
> >     member of a structure or an element of an array.  *Simlarly, a
> >     sizeless structure shall not contain a member with indefinite or
> >     function type; the exception for incomplete array types does not
> >     apply.*
> >
> >     …
> >
> >     Semantics
> >
> >     6. As discussed in 6.2.5, a *sized* structure is a type consisting
> >     of a sequence of members, whose storage is allocated in an ordered
> >     sequence; *a sizeless structure is a type consisting of
> >     non-overlapping members whose relative position is unspecified,*
> >     and a union is a type consisting of a sequence of members whose
> >     storage overlap.
> >
> >     7. Structure and union specifiers have the same form. The keywords
> >     struct, *__sizeless_struct* and union indicate that the type being
> >     specified is, respectively, a *sized* structure type, *a sizeless
> >     structure type,* or a union type.
> >
> >     …[8 is as above]…
> >
> >     9. A member of a structure or union may have any complete object
> >     type other than a variably modified type.123)  *A member of a sizeless
> >     structure may also have a sizeless definite type.*  In addition, a
> >     member *of a structure or union* may be declared to consist of a
> >     specified number of bits (including a sign bit, if any). Such a
> >     member is called a bit-field;124) its width is preceded by a colon.
> >
> >     …
> >
> >     15. Within a *sized* structure object, the non-bit-field members and
> >     the units in which bit-fields reside have addresses that increase in
> >     the order in which they are declared. A pointer to a *sized* structure
> >     object, suitably converted, points to its initial member (or if that
> >     member is a bit-field, then to the unit in which it resides), and
> >     vice versa. There may be unnamed padding within a *sized* structure
> >     object, but not at its beginning.
> >
> >     15A. *The representation of a sizeless structure object is
> >     unspecified.  It is possible to form pointers to the structure
> >     itself and to its individual members, but the relationship between
> >     their addresses is unspecified.  The structure may occupy a single
> >     piece of contiguous storage or it may occupy several disjoint
> >     pieces.*
> >
> >     …
> >
> >     18 As a special case, the last element of a *sized* structure with
> >     more than one named member may have an incomplete array type; this
> >     is called a flexible array member. …
> >
> >     …
> >
> >     6.7.2.3 Tags
> >     ------------
> >
> >     Constraints
> >
> >     …
> >
> >     2. Where two declarations that use the same tag declare the same
> >     type, they shall both use the same choice of struct, 
> > *__sizeless_struct,*
> >     union, or enum.
> >
> >     …
> >
> >
> > Edits to the C++ standard
> > =========================
> >
> > We have a similar set of changes to the C++ standard, but this RFC is
> > long enough already, so I've not included them here.  I also didn't find
> > them to be particularly useful when writing the C++ patches, since most
> > of the changes were obvious given a few basic rules.  Those rules are:
> >
> >   - type traits can be used with sizeless types (unlike incomplete types)
> >
> >   - sizeless structures cannot have base classes or be used as base classes
> >
> >   - sizeless structures cannot have virtual members
> >
> >   - pointers to member variables are invalid for sizeless structures
> >     (although taking the address of a member of a specific sizeless object
> >     is fine, as for C)
> >
> >   - sizeless types are not literal types
> >
> >   - sizeless types cannot be created by operator new (as for incomplete 
> > types)
> >
> >   - sizeless types cannot be deleted (so, unlike for incomplete types,
> >     this is an error rather than a warning)
> >
> >   - sizeless types cannot be thrown or caught (as for incomplete types)
> >
> >   - sizeless types cannot be used with typeid() (as for incomplete types)
> >
> >
> > GCC implementation questions
> > ============================
> >
> > The GCC patches are pretty simple in principle.  The language changes
> > involve going through the standard replacing "complete" with "definite"
> > and most of the GCC patches go through the frontend code making the
> > same kind of change.
> >
> > New type flag for sizeless types
> > --------------------------------
> >
> > The patches add a new flag TYPE_SIZELESS_P to represent the negative of:
> >
> >   * would fully-defining the type determine its size?
> >
> > from the summary above.  Negative names are usually a bad thing,
> > but the natural default is for the flag to be off.
> >
> > There are currently 17 bits free in tree_type_common, so the patches
> > steal one of those.  Is that OK?
> >
> > The effect on COMPLETE_TYPE_P
> > -----------------------------
> >
> > The current definition of COMPLETE_TYPE_P is:
> >
> >     /* Nonzero if this type is a complete type.  */
> >     #define COMPLETE_TYPE_P(NODE) (TYPE_SIZE (NODE) != NULL_TREE)
> >
> > Although the SVE types don't have a measurable size at the language
> > level, they still have a TYPE_SIZE and TYPE_SIZE_UNIT, with the sizes
> > using placeholders for the runtime vector size.  So after the split
> > described in the summary, TYPE_SIZE (NODE) != NULL_TREE means
> > "the type is fully defined" rather than "the type is complete".
> > With TYPE_SIZELESS_P, the definition of "complete type" would be:
> >
> >     #define COMPLETE_TYPE_P(NODE) \
> >       (TYPE_SIZE (NODE) != NULL_TREE && !TYPE_SIZELESS_P (NODE))
> >
> > i.e. the type is fully-defined, and fully-defining it determines
> > its size at the language level.
> >
> > Uses of COMPLETE_TYPE_P outside the frontends
> > ---------------------------------------------
> >
> > The main complication is that the concept of "complete type" is exposed
> > outside the frontends, with COMPLETE_TYPE_P being defined in tree.h.
> >
> > I tried to audit all uses outside the frontends and it looks like
> > they're all testing whether "the type is fully defined" and don't
> > care about the distinction between sized and sizeless.  This means
> > that the current definition (rather than the new definition)
> > should be correct in all cases.
> >
> > In some cases the tests are simple null checks, like:
> >
> >      /* Try to approach equal type sizes.  */
> >      if (!COMPLETE_TYPE_P (type_a)
> >          || !COMPLETE_TYPE_P (type_b)
> >          || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
> >          || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
> >        break;
> >
> > IMO it's more obvious to test TYPE_SIZE_UNIT directly for null here.
> > Having a wrapper doesn't add much.
> >
> > In places like:
> >
> >   if (!COMPLETE_TYPE_P (t))
> >     layout_type (t);
> >
> > and:
> >
> >   if (COMPLETE_TYPE_P (t) && TYPE_CANONICAL (t)
> >       && TYPE_MODE (t) != TYPE_MODE (TYPE_CANONICAL (t)))
> >     ...
> >
> > it's testing whether the type has been laid out already.
> >
> > So the patches do two things:
> >
> >   * Expand the definition of the current COMPLETE_TYPE_P macro outside
> >     the frontends if the macro is simply protecting against a null
> >     dereference.
> >
> >   * Make COMPLETE_TYPE_P local to the frontends and rename all uses
> >     outside the frontends.
> >
> > As far as the second point goes, I wasn't sure what new name to use
> > outside the front ends.  Possibilities include:
> >
> >   - DEFINITE_TYPE_P
> >   - USABLE_TYPE_P
> >   - VALID_VAR_TYPE_P
> >   - TYPE_LAID_OUT_P
> >   - TYPE_DEFINED_P
> >   - TYPE_FULLY_DEFINED_P
> >   - TYPE_READY_P
> >   ...other suggestions welcome...
> >
> > I went for DEFINITE_TYPE_P because that's what the SVE specification
> > uses, but something more neutral like TYPE_DEFINED_P might be better.
> >
> > Frontend changes
> > ----------------
> >
> > The frontend patches change COMPLETE_TYPE_P to DEFINITE_TYPE_P where
> > necessary.  I've tried where possible to accompany each individual
> > change with a test.
> >
> > This worked fairly naturally (IMO) for C, and most of the changes could
> > be tied directly to the language edits above.
> >
> > For C++ it was more difficult (not surprisingly).  There are a lot of
> > tests for COMPLETE_TYPE_P that are obviously testing whether a class
> > has been fully defined, and are more concerned with name lookup than
> > TYPE_SIZE.  The same goes for COMPLETE_OR_OPEN_TYPE_P and whether the
> > definition has been started.  So while the C changes were relatively
> > small and self-contained, the C++ changes replace many more uses of
> > COMPLETE_TYPE_P than they keep.  This makes me wonder whether it's a
> > good idea to keep COMPLETE_TYPE_P at all, or whether it would be better
> > to replace the remaining uses with something more explicit like:
> >
> >   TYPE_SIZE_KNOWN_P
> >   TYPE_SIZE_DEFINED_P
> >   TYPE_SIZE_MEASURABLE_P
> >   TYPE_SIZE_COMPLETE_P
> >   ...suggestions again welcome...
> >
> > Thanks,
> > Richard

Re: [00/10][RFC] Splitting the C and C++ concept of "complete type"

Reply via email to