Joseph Myers <jos...@codesourcery.com> writes:
> On Wed, 17 Oct 2018, Richard Sandiford wrote:
>
>> > But as shown in the related discussions, there are other possible features
>> > that might also involve non-VLA types whose size is not a compile-time
>> > constant.  And so it's necessary to work with the people interested in
>> > those features in order to clarify what the underlying concepts ought to
>> > look like to support different such features.
>>
>> Could you give pointers to the specific proposals/papers you mean?
>
> They're generally reflector discussions rather than written up as papers,
> exploring the space of problems and solutions in various areas (including
> bignums and runtime introspection of types).  I think the first message in
> those discussions is number 15529
> <http://www.open-std.org/jtc1/sc22/wg14/15529> and then relevant
> discussions continue for much of the next 200 messages or so.

OK, thanks.  I've read from there to the latest message at the time
of writing (15720).  There seemed to be various ideas:

- A new int128_t, which started the discussion off.

- Support for parameterised fixed-size integers like _Int(40), which
  seemed to be a C version of C++ template<int> and wouldn't need
  variable-length types.

- Bignums that extend as necessary.  On that I agree with what you
  said in <http://www.open-std.org/jtc1/sc22/wg14/15572>:

      A bignum type, in the sense of one that grows its storage if
      you store a too-big number in it (as opposed to fixed-width
      int<N> where you can specify an arbitrary integer constant
      expression for N), cannot meet other requirements for C integer
      types such as being directly represented in binary - it has to,
      effectively, be a fixed size but contain a pointer to allocated
      storage (and then there are considerations of how such a type
      should handle errors for allocation failure).

  and with what Hans Boehm said in
  <http://www.open-std.org/jtc1/sc22/wg14/15573>:

      2) Provide an integral type that is reasonably efficient for
      small integers, but gracefully overflows to something along the
      lines of (1).  A common way to do that in other languages is to
      represent e.g. 63-bit integers directly by adding a zero bit on
      the right.  On overflow, a more complex result is represented
      by e.g. a 64-bit aligned pointer with the low bit set to one.
      That way integer addition is just an add instruction followed
      by an overflow check in the normal case.  Probably a better way
      to do integer arithmetic in many, maybe even most, cases.
      Especially since such integers need to be usable as array
      elements, I don't see how to avoid memory allocation under the
      covers, along the slow path.

  This IIRC is how LLVM's APInt is implemented.  It doesn't need
  variable-length types, and although it would need some kind of
  memory management support for C, it doesn't need any language
  changes at all for C++.

  It's also similar to what GCC does with auto_vec<T, N> and LLVM
  does with SmallVector: the types have embedded room for common
  cases and fall back to separately-allocated storage if the
  contents get too big.  (There's a sketch of Hans's tagged
  representation after this list.)

  There was talk about having it as a true variable-length type in
  <http://www.open-std.org/jtc1/sc22/wg14/15577>:

      (2) is difficult because of the requirements for memory
      management and the necessity to deal with allocation failures.

      For avoiding integer overflow vulnerabilities, there is a
      variant of (2) which is not possible to implement in a library,
      where expressions are evaluated with a sufficient number of
      bits to obtain the mathematically correct result.  GNAT has
      implemented something in this direction (MINIMIZED and
      ELIMINATED):

      <https://gcc.gnu.org/onlinedocs/gnat_ugn/Management-of-Overflows-in-GNAT.html#Management-of-Overflows-in-GNAT>

      I think that for expressions which do not involve shifts by
      non-constants, it should be possible to determine the required
      storage at compile time, so it would avoid the memory
      allocation issue.  Unlike Ada, C doesn't have a power operator,
      so the storage requirements would grow with the size of the
      expression (still under the assumption that left shifts are
      excluded).

  But AIUI that was intended to be more special-purpose, for
  intermediate results while evaluating an expression.  It solves the
  memory allocation issue because the (stack) memory used for
  evaluating the expression could be recovered after evaluation is
  complete.

  This approach wouldn't work if it was extended to an assignable
  bignum object type.  E.g. prohibiting left shifts wouldn't then
  help, since:

      bignum x = ...;
      x <<= var;   // invalid

  would be equivalent to:

      bignum x = ...;
      for (int i = 0; i < var; ++i)
        x += x;    // valid

  Thus it would be easy to create what are effectively allocas of
  O(1 << var) bytes for some variable var.  And if the memory was
  always allocated on the stack, it would be hard to recover memory
  from discarded objects until the function returns.

  Hans went on to say:

      I personally think that, especially in light of various integer
      overflow vulnerabilities, (2) would be really nice to have.  I
      unfortunately haven't had time to follow the WG21 bignum
      discussion on this very closely.  But my impression is that
      they're aiming to enable (2).

  So it sounds like bignums are being solved on the C++ side at least
  without having to add true variable-length types.  FWIW, this
  corresponds to (3b) in the RFC, where a fixed-size type refers to
  separate storage where necessary.

- Type introspection for things like parsing format strings.

  It sounded like the type descriptors would be fixed-size types,
  a bit like a C version of std::type_info.
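Just to make the bignum option concrete, here's a minimal sketch of
the tagged representation Hans describes.  All of the names (bignum,
bignum_big, bignum_add, bignum_add_slow) are hypothetical, and it
assumes a 64-bit target and the usual GCC built-ins:

    #include <stddef.h>
    #include <stdint.h>

    /* The whole type is one fixed-size word: either a small value
       shifted left by one (low bit clear) or a 64-bit-aligned
       pointer to out-of-line storage (low bit set).  */
    typedef uintptr_t bignum;

    /* Hypothetical out-of-line representation for large values.  */
    struct bignum_big
    {
      size_t nwords;
      uint64_t words[];
    };

    static inline int
    bignum_is_small (bignum b)
    {
      return (b & 1) == 0;
    }

    static inline int64_t
    bignum_small_value (bignum b)
    {
      /* Drop the tag bit; an arithmetic shift recovers the sign.  */
      return (int64_t) b >> 1;
    }

    extern bignum bignum_add_slow (bignum, bignum);

    bignum
    bignum_add (bignum a, bignum b)
    {
      intptr_t sum;
      /* Both tag bits are zero, so adding the tagged words adds the
         underlying values, and the signed overflow check covers the
         63-bit payload: Hans's "add instruction followed by an
         overflow check".  */
      if (bignum_is_small (a)
          && bignum_is_small (b)
          && !__builtin_add_overflow ((intptr_t) a, (intptr_t) b, &sum))
        return (bignum) sum;
      /* Slow path: allocate or grow out-of-line storage.  */
      return bignum_add_slow (a, b);
    }

The point for this discussion is that sizeof (bignum) is a normal
compile-time constant, so nothing here needs variable-length types.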
So I didn't see anything there that was really related, or anything
that relied on sizeof being variable (which as I say seems to be a
very high hurdle for C++).  Also, I thought opinion was turning/had
turned against using what are effectively unbounded allocas, so it
would seem strange to spend a lot of effort providing a more
convenient wrapper for them.  (The SVE use isn't unbounded because
every sizeless object has size X*VL+Y for constant X and Y and
bounded VL.)

>> ...and here is that any size changes come only from changes in the
>> implementation-defined built-in sizeless types.  The user can't define
>
> But then I think you still need to define in the standard edits
> something of what the type-compatibility rules are.

I think it would look something like this (referring back to:

    Object types are further partitioned into sized and sizeless;
    all basic and derived types defined in this standard are sized,
    but an implementation may provide additional sizeless types.

in the RFC), not really in standardese yet:

    Each implementation-specific sizeless type may have a set of
    implementation-specific "configurations".  The configuration of
    such a type may change in implementation-defined ways at any
    given sequence point.

    The configuration of a sizeless structure is a tuple containing
    the configuration of each member.  Thus the configuration of a
    sizeless structure changes if and only if the configuration of
    one of its members changes.

    The configuration of an object of sizeless type T is the
    configuration of T at the point that the object is created.

And then, borrowing slightly from your 6.7.6.2#6 reference:

    If an object of sizeless type T is accessed when T has a
    different configuration from the object, the behavior is
    undefined.

Is that the kind of thing you mean?
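To give a feel for how those rules would play out for SVE (purely as
an illustration, with g as a placeholder function): the configuration
of a vector type like svint32_t is the current vector length, and on
Linux the vector length can be changed with the PR_SVE_SET_VL prctl:

    #include <arm_sve.h>
    #include <sys/prctl.h>

    extern void g (svint32_t);

    void
    f (svint32_t x)
    {
      /* "copy" is created with the configuration (vector length)
         that svint32_t has on entry.  */
      svint32_t copy = x;

      /* Changing the vector length changes the configuration of
         svint32_t...  (16 here means a 16-byte, i.e. 128-bit, VL.)  */
      prctl (PR_SVE_SET_VL, 16);

      /* ...so under the proposed wording, accessing "copy" now has
         undefined behavior: its type no longer has the configuration
         that the object was created with.  */
      g (copy);
    }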
>> > Can these types be passed to variadic functions and named in va_arg?
>> > Again, I don't see anything to say they can't.
>>
>> Yes, this is allowed (and covered by the tests FWIW).
>
> How does that work with not knowing the size even at runtime?
>
> At least, this seems like another place where there would be special
> type compatibility considerations that need to be applied between
> caller and callee.

Yeah, it requires the caller and callee to agree on what the type
represents (in SVE terms, to have the same vector length).
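E.g. something like this (an illustrative sketch using the SVE ACLE
types; first_vector, caller and use are made-up names):

    #include <arm_sve.h>
    #include <stdarg.h>

    /* Variadic callee that pulls a sizeless vector out of its
       argument list.  This is only well-defined if the caller and
       callee agree on the vector length.  */
    svint32_t
    first_vector (int count, ...)
    {
      va_list ap;
      va_start (ap, count);
      svint32_t v = va_arg (ap, svint32_t);
      va_end (ap);
      return v;
    }

    extern void use (svint32_t);

    void
    caller (void)
    {
      use (first_vector (1, svdup_n_s32 (0)));
    }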
>> Except for bit-fields *and sizeless structures*, objects are
>> composed of contiguous sequences of one or more bytes, the number,
>> order, and encoding of which are either explicitly specified or
>> implementation-defined.
>>
>> TBH the possibility of a discontiguous representation was an early
>> idea that we've never actually used so far, so if that's a problem,
>> we could probably drop it.  It just seemed to be a natural extension
>> of the principle that the layout is completely implementation-defined.
>
> If you have discontiguous representations, I don't see how "->" on
> structure pointers (or indeed unary "*") is supposed to work;

The idea was that there would be some indirection under the covers
where necessary.  E.g. the data pointed to by the pointer may have a
hidden field that gives the offset of, or a pointer to, the other
storage.  The specific reason for thinking that might be useful was
that it would allow the compiler to divide a frame into "VL data" and
"constant-sized data".  A sizeless structure could be in one but
refer to data in the other.  But as I say, we never actually used
that.

> disallowing discontiguous representations would seem to fit a lot
> more naturally with the C object model.

OK, I'll take out the discontiguous part.

Thanks,
Richard