Joseph Myers <jos...@codesourcery.com> writes:
> On Wed, 17 Oct 2018, Richard Sandiford wrote:
>
>> > But as shown in the related discussions, there are other possible features
>> > that might also involve non-VLA types whose size is not a compile-time
>> > constant.  And so it's necessary to work with the people interested in
>> > those features in order to clarify what the underlying concepts ought to
>> > look like to support different such features.
>>
>> Could you give pointers to the specific proposals/papers you mean?
>
> They're generally reflector discussions rather than written up as papers,
> exploring the space of problems and solutions in various areas (including
> bignums and runtime introspection of types).  I think the first message in
> those discussions is number 15529
> <http://www.open-std.org/jtc1/sc22/wg14/15529> and then relevant
> discussions continue for much of the next 200 messages or so.

OK, thanks.  I've read from there to the latest message at the time
of writing (15720).  There seemed to be various ideas:

- A new int128_t, which started the discussion off.

- Support for parameterised fixed-size integers like _Int(40), which
  seemed to be a C version of C++ template<int> and wouldn't need
  variable-length types.

- Bignums that extend as necessary.  On that I agree with what you
  said in <http://www.open-std.org/jtc1/sc22/wg14/15572>:

      A bignum type, in the sense of one that grows its storage if
      you store a too-big number in it (as opposed to fixed-width
      int<N> where you can specify an arbitrary integer constant
      expression for N), cannot meet other requirements for C integer
      types such as being directly represented in binary - it has to,
      effectively, be a fixed size but contain a pointer to allocated
      storage (and then there are considerations of how such a type
      should handle errors for allocation failure).

  and with what Hans Boehm said in
  <http://www.open-std.org/jtc1/sc22/wg14/15573>:

      2) Provide an integral type that is reasonably efficient for
      small integers, but gracefully overflows to something along the
      lines of (1).  A common way to do that in other languages is to
      represent e.g. 63-bit integers directly by adding a zero bit on
      the right.  On overflow, a more complex result is represented
      by e.g. a 64-bit aligned pointer with the low bit set to one.
      That way integer addition is just an add instruction followed
      by an overflow check in the normal case.  Probably a better way
      to do integer arithmetic in many, maybe even most, cases.
      Especially since such integers need to be usable as array
      elements, I don't see how to avoid memory allocation under the
      covers, along the slow path.

  This IIRC is how LLVM's APInt is implemented.  It doesn't need
  variable-length types, and although it would need some kind of
  memory management support for C, it doesn't need any language
  changes at all for C++.

  It's also similar to what GCC does with auto_vec<T, N> and LLVM
  does with SmallVector: the types have embedded room for common
  cases and fall back to separately-allocated storage if the
  contents get too big.  (There's a sketch of Hans's tagged
  representation after this list.)

  There was talk about having it as a true variable-length type in
  <http://www.open-std.org/jtc1/sc22/wg14/15577>:

      (2) is difficult because of the requirements for memory
      management and the necessity to deal with allocation failures.

      For avoiding integer overflow vulnerabilities, there is a
      variant of (2) which is not possible to implement in a library,
      where expressions are evaluated with a sufficient number of
      bits to obtain the mathematically correct result.  GNAT has
      implemented something in this direction (MINIMIZED and
      ELIMINATED):

      <https://gcc.gnu.org/onlinedocs/gnat_ugn/Management-of-Overflows-in-GNAT.html#Management-of-Overflows-in-GNAT>

      I think that for expressions which do not involve shifts by
      non-constants, it should be possible to determine the required
      storage at compile time, so it would avoid the memory
      allocation issue.  Unlike Ada, C doesn't have a power operator,
      so the storage requirements would grow with the size of the
      expression (still under the assumption that left shifts are
      excluded).

  But AIUI that was intended to be more special-purpose, for
  intermediate results while evaluating an expression.  It solves the
  memory allocation issue because the (stack) memory used for
  evaluating the expression could be recovered after evaluation is
  complete.

  This approach wouldn't work if it was extended to an assignable
  bignum object type.  E.g. prohibiting left shifts wouldn't then
  help, since:

      bignum x = ...;
      x <<= var;   // invalid

  would be equivalent to:

      bignum x = ...;
      for (int i = 0; i < var; ++i)
        x += x;    // valid

  Thus it would be easy to create what are effectively allocas of
  O(1 << var) bytes for some variable var.  And if the memory was
  always allocated on the stack, it would be hard to recover memory
  from discarded objects until the function returns.

  Hans went on to say:

      I personally think that, especially in light of various integer
      overflow vulnerabilities, (2) would be really nice to have.  I
      unfortunately haven't had time to follow the WG21 bignum
      discussion on this very closely.  But my impression is that
      they're aiming to enable (2).

  So it sounds like bignums are being solved on the C++ side at least
  without having to add true variable-length types.  FWIW, this
  corresponds to (3b) in the RFC, where a fixed-size type refers to
  separate storage where necessary.

- Type introspection for things like parsing format strings.

  It sounded like the type descriptors would be fixed-size types,
  a bit like a C version of std::type_info.
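Just to make the bignum option concrete, here's a minimal sketch of
the tagged representation Hans describes.  All of the names (bignum,
bignum_big, bignum_add, bignum_add_slow) are hypothetical, and it
assumes a 64-bit target and the usual GCC built-ins:

    #include <stddef.h>
    #include <stdint.h>

    /* The whole type is one fixed-size word: either a small value
       shifted left by one (low bit clear) or a 64-bit-aligned
       pointer to out-of-line storage (low bit set).  */
    typedef uintptr_t bignum;

    /* Hypothetical out-of-line representation for large values.  */
    struct bignum_big
    {
      size_t nwords;
      uint64_t words[];
    };

    static inline int
    bignum_is_small (bignum b)
    {
      return (b & 1) == 0;
    }

    static inline int64_t
    bignum_small_value (bignum b)
    {
      /* Drop the tag bit; an arithmetic shift recovers the sign.  */
      return (int64_t) b >> 1;
    }

    extern bignum bignum_add_slow (bignum, bignum);

    bignum
    bignum_add (bignum a, bignum b)
    {
      intptr_t sum;
      /* Both tag bits are zero, so adding the tagged words adds the
         underlying values, and the signed overflow check covers the
         63-bit payload: Hans's "add instruction followed by an
         overflow check".  */
      if (bignum_is_small (a)
          && bignum_is_small (b)
          && !__builtin_add_overflow ((intptr_t) a, (intptr_t) b, &sum))
        return (bignum) sum;
      /* Slow path: allocate or grow out-of-line storage.  */
      return bignum_add_slow (a, b);
    }

The point for this discussion is that sizeof (bignum) is a normal
compile-time constant, so nothing here needs variable-length types.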
So I didn't see anything there that was really related, or anything
that relied on sizeof being variable (which as I say seems to be a
very high hurdle for C++).  Also, I thought opinion was turning/had
turned against using what are effectively unbounded allocas, so it
would seem strange to spend a lot of effort providing a more
convenient wrapper for them.  (The SVE use isn't unbounded because
every sizeless object has size X*VL+Y for constant X and Y and
bounded VL.)

>> ...and here is that any size changes come only from changes in the
>> implementation-defined built-in sizeless types.  The user can't define
>
> But then I think you still need to define in the standard edits
> something of what the type-compatibility rules are.

I think it would look something like this (referring back to:

    Object types are further partitioned into sized and sizeless;
    all basic and derived types defined in this standard are sized,
    but an implementation may provide additional sizeless types.

in the RFC), not really in standardese yet:

    Each implementation-specific sizeless type may have a set of
    implementation-specific "configurations".  The configuration of
    such a type may change in implementation-defined ways at any
    given sequence point.

    The configuration of a sizeless structure is a tuple containing
    the configuration of each member.  Thus the configuration of a
    sizeless structure changes if and only if the configuration of
    one of its members changes.

    The configuration of an object of sizeless type T is the
    configuration of T at the point that the object is created.

And then, borrowing slightly from your 6.7.6.2#6 reference:

    If an object of sizeless type T is accessed when T has a
    different configuration from the object, the behavior is
    undefined.

Is that the kind of thing you mean?
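To give a feel for how those rules would play out for SVE (purely as
an illustration, with g as a placeholder function): the configuration
of a vector type like svint32_t is the current vector length, and on
Linux the vector length can be changed with the PR_SVE_SET_VL prctl:

    #include <arm_sve.h>
    #include <sys/prctl.h>

    extern void g (svint32_t);

    void
    f (svint32_t x)
    {
      /* "copy" is created with the configuration (vector length)
         that svint32_t has on entry.  */
      svint32_t copy = x;

      /* Changing the vector length changes the configuration of
         svint32_t...  (16 here means a 16-byte, i.e. 128-bit, VL.)  */
      prctl (PR_SVE_SET_VL, 16);

      /* ...so under the proposed wording, accessing "copy" now has
         undefined behavior: its type no longer has the configuration
         that the object was created with.  */
      g (copy);
    }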
>> > Can these types be passed to variadic functions and named in va_arg?
>> > Again, I don't see anything to say they can't.
>>
>> Yes, this is allowed (and covered by the tests FWIW).
>
> How does that work with not knowing the size even at runtime?
>
> At least, this seems like another place where there would be special
> type compatibility considerations that need to be applied between
> caller and callee.

Yeah, it requires the caller and callee to agree on what the type
represents (in SVE terms, to have the same vector length).
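E.g. something like this (an illustrative sketch using the SVE ACLE
types; first_vector, caller and use are made-up names):

    #include <arm_sve.h>
    #include <stdarg.h>

    /* Variadic callee that pulls a sizeless vector out of its
       argument list.  This is only well-defined if the caller and
       callee agree on the vector length.  */
    svint32_t
    first_vector (int count, ...)
    {
      va_list ap;
      va_start (ap, count);
      svint32_t v = va_arg (ap, svint32_t);
      va_end (ap);
      return v;
    }

    extern void use (svint32_t);

    void
    caller (void)
    {
      use (first_vector (1, svdup_n_s32 (0)));
    }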
>> Except for bit-fields *and sizeless structures*, objects are
>> composed of contiguous sequences of one or more bytes, the number,
>> order, and encoding of which are either explicitly specified or
>> implementation-defined.
>>
>> TBH the possibility of a discontiguous representation was an early
>> idea that we've never actually used so far, so if that's a problem,
>> we could probably drop it.  It just seemed to be a natural extension
>> of the principle that the layout is completely implementation-defined.
>
> If you have discontiguous representations, I don't see how "->" on
> structure pointers (or indeed unary "*") is supposed to work;

The idea was that there would be some indirection under the covers
where necessary.  E.g. the data pointed to by the pointer may have a
hidden field that gives the offset of, or a pointer to, the other
storage.  The specific reason for thinking that might be useful was
that it would allow the compiler to divide a frame into "VL data" and
"constant-sized data".  A sizeless structure could be in one but
refer to data in the other.  But as I say, we never actually used
that.

> disallowing discontiguous representations would seem to fit a lot
> more naturally with the C object model.

OK, I'll take out the discontiguous part.

Thanks,
Richard