nand added a comment.

Thanks for the comments! I tried to clarify what could be done in the future 
and what is already supported.



================
Comment at: clang/docs/ConstantInterpreter.rst:174
 
   A block containing a primitive reserved storage only for the primitive.
 
----------------
rsmith wrote:
> rsmith wrote:
> > reserved -> reserves
> > 
> > How do you distinguish between initialized and 
> > in-lifetime-but-uninitialized primitives? Eg:
> > 
> > ```
> > constexpr int f() {
> >   int a;
> >   return a; // error, a is in-lifetime but uninitialized
> > }
> > ```
> How do you represent the result of casting a pointer to an integer (which we 
> permit only when constant-folding, but nonetheless we do permit)?
Lifetime is handled by invalidating pointers, whereas descriptors and inline 
descriptors keep track of initialised bits. 

In this particular case, which is not yet supported since the interpreter 
presently expects all variables to be initialised, the byte code generator will 
have to emit an opcode that fetches the local and checks the initialised bit. 
Since currently all locals are assumed to be initialised, the opcode for 
fetching a local does not perform this check. For primitives, this bit will 
probably be part of the block.


================
Comment at: clang/docs/ConstantInterpreter.rst:174
 
   A block containing a primitive reserved storage only for the primitive.
 
----------------
nand wrote:
> rsmith wrote:
> > rsmith wrote:
> > > reserved -> reserves
> > > 
> > > How do you distinguish between initialized and 
> > > in-lifetime-but-uninitialized primitives? Eg:
> > > 
> > > ```
> > > constexpr int f() {
> > >   int a;
> > >   return a; // error, a is in-lifetime but uninitialized
> > > }
> > > ```
> > How do you represent the result of casting a pointer to an integer (which 
> > we permit only when constant-folding, but nonetheless we do permit)?
> Lifetime is handled by invalidating pointers, whereas descriptors and inline 
> descriptors keep track of initialised bits. 
> 
> In this particular case, which is not yet supported since the interpreter 
> presently expects all variables to be initialised, the byte code generator 
> will have to emit an opcode that fetches the local and checks the initialised 
> bit. Since currently all locals are assumed to be initialised, the opcode for 
> fetching a local does not perform this check. For primitives, this bit will 
> probably be part of the block.
This is still on the TODO list.

First of all, there should be a fast path which detects common pointer -> int 
-> pointer conversions and avoids all cast, creating opcodes to emit 
diagnostics.

For the generic case, codegen for pointer-to-int will fail with a note and 
compilation will be re-attempted, classifying pointer-wide integers and 
pointers as a primitive type capable of holding either a pointer or an integer.


================
Comment at: clang/docs/ConstantInterpreter.rst:182
+  initialised, while a value of ``(InitMap*)-1`` indicates that the object was 
+  fully initialised. when all fields are initialised, the map is deallocated 
+  and replaced with that token.
----------------
rsmith wrote:
> when -> When
> 
> The overwhelmingly common case is the set of initialized elements is [0, 1, 
> ..., K) for some K. Have you considered instead storing this value as a union 
> of an `InitMap*` and an integer, using the bottom bit to indicate which case 
> we're in? (We don't need to allocate the map at all except in weird cases 
> where someone makes a hole in the array through a pseudo-destructor or 
> allocates out-of-order with `construct_at` or similar.)
> 
> How do you distinguish between in-lifetime-but-uninitialized elements and 
> out-of-lifetime elements? For example:
> 
> ```
> using T = int;
> constexpr int f(bool b) {
>   int arr[5];
>   arr[3].~T();
>   arr[0] = 1; // ok, uninitialized -> initialized
>   if (!b)
>     arr[3] = 1; // error, arr[3] is not in lifetime
>   else
>     std::construct_at(arr + 3, 0); // ok, not in lifetime -> in lifetime and 
> initialized
>   return arr[3];
> }
> ```
> 
> Maybe we should use two bits per primitive in the `InitMap` case and store 
> both "initialized" and "in-lifetime"?
This is a great suggestion - I think the initmap could also be a rolling 
counter. 

I was not aware of the case that you have mentioned - destructors do pose a 
problem and another map will be required.


================
Comment at: clang/docs/ConstantInterpreter.rst:194-195
+Records are laid out identically to arrays of composites: each field and base 
+class is preceded by an inline descriptor. The ``InlineDescriptor`` 
+has the following field:
 
----------------
rsmith wrote:
> field -> fields
> 
> From the description below, it looks like `sizeof(InlineDescriptor)` is 
> currently 16. That seems unnecessary: We could easily get this down to 8 
> bytes by bit-packing the offset and the flags. (Restricting ourselves to 2^59 
> bytes for each record seems unproblematic.) I suspect it might even be OK to 
> pack this into 4 bytes; that'd still allow us to support objects whose 
> representations are up to 128 MiB -- though perhaps that's getting a little 
> too close to territory that programs might actually want to enter.
Descriptors could be compacted - for now, I have been focusing on feature 
completeness and tried to avoid performance optimisations.


================
Comment at: clang/docs/ConstantInterpreter.rst:201
+ * **IsInitialized**: flag indicating whether the field or element was 
+ initialized. For non-primitive fields, this is only relevant for base classes.
+ * **IsBase**: flag indicating whether the record is a base class. In that 
case, 
----------------
rsmith wrote:
> It's not completely clear to me what this would mean for a base class. Is the 
> idea that this tracks whether base classes are in their period of 
> construction / destruction, so that you can determine the dynamic type of the 
> object?
Yes, updated the docs.


================
Comment at: clang/docs/ConstantInterpreter.rst:204
+ the offset can be used to identify the derived class.
  * **IsActive**: indicates if the field is the active field of a union.
  * **IsMutable**: indicates if the field is marked as mutable.
----------------
rsmith wrote:
> I assume this is actually more generally tracking whether the subobject is 
> within its lifetime?
It's only for active union fields - maybe it can be merged with the initialised 
bit later on.


================
Comment at: clang/docs/ConstantInterpreter.rst:223-224
+ * **TargetPointer**: represents a target address derived from a base address 
+ through pointer arithmetic, such as ``((int *)0x100)[20]``. Null pointers are 
+ target pointers with a zero offset.
+ * **TypeInfoPointer**: tracks information for the opaque type returned by 
----------------
rsmith wrote:
> This seems problematic -- we need to distinguish between pointers with value 
> zero and null pointers to support targets where they are different. Perhaps 
> we could instead represent null pointers as a new tag in the tagged union, 
> and reserve a zero `TargetPointer` for only those cases where a pointer is 
> zero but not null? (See `APValue::isNullPointer()` and `LValue::IsNullPtr` 
> for how we track that in the old constant evaluator.)
Hmm, I was not aware of this - another item could be added to the union without 
too much problem. Do you have an example of such a target?


================
Comment at: clang/docs/ConstantInterpreter.rst:329-334
+are characterised by 3 offsets: a field offset, an array offset and a 
+base offset, along with a descriptor specifying the type the pointer is 
+supposed to refer to. Array indexing ajusts the array offset, while the 
+field offset is adjusted when a pointer to a member is created. Casting 
+an integer to a pointer sets the value of the base offset. As a special 
+case, null pointers are target pointers with all offets set to 0.
----------------
rsmith wrote:
> Why do we need all three of these? If this is replacing the `APValue` forms 
> with a null base and only an offset, a single value seems sufficient.
Required for compatibility with the atIndex() method, which is used by the 
opcode computing array offsets.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75726/new/

https://reviews.llvm.org/D75726



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
  • [PATCH] D75726: [... Richard Smith - zygoloid via Phabricator via cfe-commits
    • [PATCH] D757... Nandor Licker via Phabricator via cfe-commits
    • [PATCH] D757... Nandor Licker via Phabricator via cfe-commits
    • [PATCH] D757... Richard Smith - zygoloid via Phabricator via cfe-commits
    • [PATCH] D757... Nandor Licker via Phabricator via cfe-commits
    • [PATCH] D757... Nandor Licker via Phabricator via cfe-commits

Reply via email to