rsmith added inline comments.
================ Comment at: clang/docs/ConstantInterpreter.rst:118 Memory management in the interpreter relies on 3 data structures: ``Block`` -object which store the data and associated inline metadata, ``Pointer`` objects -which refer to or into blocks, and ``Descriptor`` structures which describe -blocks and subobjects nested inside blocks. +object which store the data and associated inline metadata, ``Pointer`` +objects which refer to or into blocks, and ``Descriptor`` structures which ---------------- object -> objects ================ Comment at: clang/docs/ConstantInterpreter.rst:167 -Descriptor are generated at bytecode compilation time and contain information required to determine if a particular memory access is allowed in constexpr. Even though there is a single descriptor object, it encodes information for several kinds of objects: +Descriptor are generated at bytecode compilation time and contain information +required to determine if a particular memory access is allowed in constexpr. ---------------- Descriptor -> Descriptors ================ Comment at: clang/docs/ConstantInterpreter.rst:169-170 +required to determine if a particular memory access is allowed in constexpr. +Even though there is a single descriptor object, it encodes information for +several kinds of objects: ---------------- This sounds like this is talking about the memory layout of descriptors, but I think the text below is actually talking about the memory layout of blocks? ================ Comment at: clang/docs/ConstantInterpreter.rst:174 A block containing a primitive reserved storage only for the primitive. ---------------- reserved -> reserves How do you distinguish between initialized and in-lifetime-but-uninitialized primitives? Eg: ``` constexpr int f() { int a; return a; // error, a is in-lifetime but uninitialized } ``` ================ Comment at: clang/docs/ConstantInterpreter.rst:174 A block containing a primitive reserved storage only for the primitive. ---------------- rsmith wrote: > reserved -> reserves > > How do you distinguish between initialized and in-lifetime-but-uninitialized > primitives? Eg: > > ``` > constexpr int f() { > int a; > return a; // error, a is in-lifetime but uninitialized > } > ``` How do you represent the result of casting a pointer to an integer (which we permit only when constant-folding, but nonetheless we do permit)? ================ Comment at: clang/docs/ConstantInterpreter.rst:182 + initialised, while a value of ``(InitMap*)-1`` indicates that the object was + fully initialised. when all fields are initialised, the map is deallocated + and replaced with that token. ---------------- when -> When The overwhelmingly common case is the set of initialized elements is [0, 1, ..., K) for some K. Have you considered instead storing this value as a union of an `InitMap*` and an integer, using the bottom bit to indicate which case we're in? (We don't need to allocate the map at all except in weird cases where someone makes a hole in the array through a pseudo-destructor or allocates out-of-order with `construct_at` or similar.) How do you distinguish between in-lifetime-but-uninitialized elements and out-of-lifetime elements? For example: ``` using T = int; constexpr int f(bool b) { int arr[5]; arr[3].~T(); arr[0] = 1; // ok, uninitialized -> initialized if (!b) arr[3] = 1; // error, arr[3] is not in lifetime else std::construct_at(arr + 3, 0); // ok, not in lifetime -> in lifetime and initialized return arr[3]; } ``` Maybe we should use two bits per primitive in the `InitMap` case and store both "initialized" and "in-lifetime"? ================ Comment at: clang/docs/ConstantInterpreter.rst:194-195 +Records are laid out identically to arrays of composites: each field and base +class is preceded by an inline descriptor. The ``InlineDescriptor`` +has the following field: ---------------- field -> fields From the description below, it looks like `sizeof(InlineDescriptor)` is currently 16. That seems unnecessary: We could easily get this down to 8 bytes by bit-packing the offset and the flags. (Restricting ourselves to 2^59 bytes for each record seems unproblematic.) I suspect it might even be OK to pack this into 4 bytes; that'd still allow us to support objects whose representations are up to 128 MiB -- though perhaps that's getting a little too close to territory that programs might actually want to enter. ================ Comment at: clang/docs/ConstantInterpreter.rst:201 + * **IsInitialized**: flag indicating whether the field or element was + initialized. For non-primitive fields, this is only relevant for base classes. + * **IsBase**: flag indicating whether the record is a base class. In that case, ---------------- It's not completely clear to me what this would mean for a base class. Is the idea that this tracks whether base classes are in their period of construction / destruction, so that you can determine the dynamic type of the object? ================ Comment at: clang/docs/ConstantInterpreter.rst:204 + the offset can be used to identify the derived class. * **IsActive**: indicates if the field is the active field of a union. * **IsMutable**: indicates if the field is marked as mutable. ---------------- I assume this is actually more generally tracking whether the subobject is within its lifetime? ================ Comment at: clang/docs/ConstantInterpreter.rst:223-224 + * **TargetPointer**: represents a target address derived from a base address + through pointer arithmetic, such as ``((int *)0x100)[20]``. Null pointers are + target pointers with a zero offset. + * **TypeInfoPointer**: tracks information for the opaque type returned by ---------------- This seems problematic -- we need to distinguish between pointers with value zero and null pointers to support targets where they are different. Perhaps we could instead represent null pointers as a new tag in the tagged union, and reserve a zero `TargetPointer` for only those cases where a pointer is zero but not null? (See `APValue::isNullPointer()` and `LValue::IsNullPtr` for how we track that in the old constant evaluator.) ================ Comment at: clang/docs/ConstantInterpreter.rst:231-232 + +Besides the previously mentioned union, a number of other pointers have +their own type: + ---------------- I would say "pointers and pointer-like types", since at least member pointers are not actually a kind of pointer. ================ Comment at: clang/docs/ConstantInterpreter.rst:328 +banned in constexpr, some expressions on target offsets must be folded, +replicating the behavious of the ``offsetof`` builtin. Target pointers +are characterised by 3 offsets: a field offset, an array offset and a ---------------- behavious -> behaviour ================ Comment at: clang/docs/ConstantInterpreter.rst:329-334 +are characterised by 3 offsets: a field offset, an array offset and a +base offset, along with a descriptor specifying the type the pointer is +supposed to refer to. Array indexing ajusts the array offset, while the +field offset is adjusted when a pointer to a member is created. Casting +an integer to a pointer sets the value of the base offset. As a special +case, null pointers are target pointers with all offets set to 0. ---------------- Why do we need all three of these? If this is replacing the `APValue` forms with a null base and only an offset, a single value seems sufficient. ================ Comment at: clang/docs/ConstantInterpreter.rst:331 +base offset, along with a descriptor specifying the type the pointer is +supposed to refer to. Array indexing ajusts the array offset, while the +field offset is adjusted when a pointer to a member is created. Casting ---------------- ajusts -> adjusts Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D75726/new/ https://reviews.llvm.org/D75726 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits