[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)

2024-10-11 Thread A. Wilcox via cfe-commits

https://github.com/awilfox created 
https://github.com/llvm/llvm-project/pull/111995

On big-endian systems, narrow casting will read the higher bits of the value.  
LazyOffsetPtr's `getAddressOfPointer` returns the address-of `Ptr` which was 
unconditionally a 64-bit value.  On 32-bit big endian systems, reading this 
value as a 32-bit pointer returns invalid data.

Fixes: bc73ef0031 ("PR60985: Fix merging of lambda closure types across 
modules.")
Closes: #111993

>From 3c9198421a44f32d00c2ebe4ee0bf5a6ec67e9ac Mon Sep 17 00:00:00 2001
From: "A. Wilcox" 
Date: Fri, 11 Oct 2024 07:54:43 -0500
Subject: [PATCH] [clang] LazyOffsetPtr: Use native pointer width

On big-endian systems, narrow casting will read the higher bits of the
value.  LazyOffsetPtr's `getAddressOfPointer` returns the address-of
`Ptr` which was unconditionally a 64-bit value.  On 32-bit big endian
systems, reading this value as a 32-bit pointer returns invalid data.

Fixes: bc73ef0031 ("PR60985: Fix merging of lambda closure types across 
modules.")
Closes: #111993
---
 clang/include/clang/AST/ExternalASTSource.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/clang/include/clang/AST/ExternalASTSource.h 
b/clang/include/clang/AST/ExternalASTSource.h
index 385c32edbae0fd..f5ce1a916fbee8 100644
--- a/clang/include/clang/AST/ExternalASTSource.h
+++ b/clang/include/clang/AST/ExternalASTSource.h
@@ -326,25 +326,25 @@ struct LazyOffsetPtr {
   ///
   /// If the low bit is clear, a pointer to the AST node. If the low
   /// bit is set, the upper 63 bits are the offset.
-  mutable uint64_t Ptr = 0;
+  mutable uintptr_t Ptr = 0;
 
 public:
   LazyOffsetPtr() = default;
-  explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {}
+  explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {}
 
-  explicit LazyOffsetPtr(uint64_t Offset) : Ptr((Offset << 1) | 0x01) {
-assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits");
+  explicit LazyOffsetPtr(uintptr_t Offset) : Ptr((Offset << 1) | 0x01) {
+assert((Offset << 1 >> 1) == Offset && "Offsets must fit in addressable 
bits");
 if (Offset == 0)
   Ptr = 0;
   }
 
   LazyOffsetPtr &operator=(T *Ptr) {
-this->Ptr = reinterpret_cast(Ptr);
+this->Ptr = reinterpret_cast(Ptr);
 return *this;
   }
 
-  LazyOffsetPtr &operator=(uint64_t Offset) {
-assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits");
+  LazyOffsetPtr &operator=(uintptr_t Offset) {
+assert((Offset << 1 >> 1) == Offset && "Offsets must fit in addressable 
bits");
 if (Offset == 0)
   Ptr = 0;
 else
@@ -375,7 +375,7 @@ struct LazyOffsetPtr {
 if (isOffset()) {
   assert(Source &&
  "Cannot deserialize a lazy pointer without an AST source");
-  Ptr = reinterpret_cast((Source->*Get)(OffsT(Ptr >> 1)));
+  Ptr = reinterpret_cast((Source->*Get)(OffsT(Ptr >> 1)));
 }
 return reinterpret_cast(Ptr);
   }

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)

2024-10-11 Thread A. Wilcox via cfe-commits

awilfox wrote:

This patch has been tested on 32-bit PowerPC, 64-bit PowerPC, and 64-bit x86_64 
(ensuring it doesn't change any behaviour on LE systems) - the entire Clang 
test suite passed on all three platforms.

https://github.com/llvm/llvm-project/pull/111995
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)

2024-10-11 Thread A. Wilcox via cfe-commits

awilfox wrote:

I feel like the messages being highlighted in the CI formatter need work 
anyway.  I was hoping the community would have a suggestion on better wording.  
However, I will be glad to reformat those lines in that manner if this wording 
should stay.

https://github.com/llvm/llvm-project/pull/111995
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Make LazyOffsetPtr more portable (PR #112927)

2024-10-18 Thread A. Wilcox via cfe-commits

awilfox wrote:

I'll run a full test on the ppc32 builder now and report back.  Thanks for this 
work.

https://github.com/llvm/llvm-project/pull/112927
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)

2024-10-16 Thread A. Wilcox via cfe-commits


@@ -326,25 +326,25 @@ struct LazyOffsetPtr {
   ///
   /// If the low bit is clear, a pointer to the AST node. If the low
   /// bit is set, the upper 63 bits are the offset.
-  mutable uint64_t Ptr = 0;
+  mutable uintptr_t Ptr = 0;
 
 public:
   LazyOffsetPtr() = default;
-  explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {}
+  explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {}
 
-  explicit LazyOffsetPtr(uint64_t Offset) : Ptr((Offset << 1) | 0x01) {
-assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits");
+  explicit LazyOffsetPtr(uintptr_t Offset) : Ptr((Offset << 1) | 0x01) {
+assert((Offset << 1 >> 1) == Offset && "Offsets must fit in addressable 
bits");
 if (Offset == 0)
   Ptr = 0;
   }
 
   LazyOffsetPtr &operator=(T *Ptr) {
-this->Ptr = reinterpret_cast(Ptr);
+this->Ptr = reinterpret_cast(Ptr);
 return *this;
   }
 
-  LazyOffsetPtr &operator=(uint64_t Offset) {
-assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits");
+  LazyOffsetPtr &operator=(uintptr_t Offset) {

awilfox wrote:

Okay.  But it is still invalid to `reinterpret_cast` a `uint64_t` to a pointer 
on 32-bit BE systems, and that would still leave that as a segfault.

[C++23 draft 
N4950](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.pdf), 
§7.6.1.10 number 5:
> A value of integral type or enumeration type can be explicitly converted to a 
> pointer. A pointer converted to an integer of sufficient size (if any such 
> exists on the implementation) and back to the same pointer type will have its 
> original value; mappings between pointers and integers are otherwise 
> implementation-defined.

uint64_t is not the sufficient size for a 32-bit BE system.  How should we 
proceed?

* I suppose a union could work, though it would probably be ugly.
* Alternatively, there could be two fields (a `uint64_t Offset` and a 
`uintptr_t Ptr`), but that would unconditionally add 8 bytes per 
`LazyOffsetPtr`.  I'm not completely familiar with how many live 
`LazyOffsetPtr` objects might be present in a single invocation of Clang and 
how that would affect memory usage.
* If there is a different way, I'd be happy to implement that as well.

https://github.com/llvm/llvm-project/pull/111995
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)

2024-10-16 Thread A. Wilcox via cfe-commits

awilfox wrote:

> Btw, if you submit a fix, it would be great if it could be backported to the 
> 18 and 19 branches.

I think the 18 branch may already be closed.  This is what we are shipping in 
Adélie for 18: 
[big-endian-32.patch](https://github.com/user-attachments/files/17404400/big-endian-32.patch).

https://github.com/llvm/llvm-project/pull/111995
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)

2024-10-16 Thread A. Wilcox via cfe-commits


@@ -326,25 +326,25 @@ struct LazyOffsetPtr {
   ///
   /// If the low bit is clear, a pointer to the AST node. If the low
   /// bit is set, the upper 63 bits are the offset.
-  mutable uint64_t Ptr = 0;
+  mutable uintptr_t Ptr = 0;
 
 public:
   LazyOffsetPtr() = default;
-  explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {}
+  explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {}
 
-  explicit LazyOffsetPtr(uint64_t Offset) : Ptr((Offset << 1) | 0x01) {
-assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits");
+  explicit LazyOffsetPtr(uintptr_t Offset) : Ptr((Offset << 1) | 0x01) {
+assert((Offset << 1 >> 1) == Offset && "Offsets must fit in addressable 
bits");
 if (Offset == 0)
   Ptr = 0;
   }
 
   LazyOffsetPtr &operator=(T *Ptr) {
-this->Ptr = reinterpret_cast(Ptr);
+this->Ptr = reinterpret_cast(Ptr);
 return *this;
   }
 
-  LazyOffsetPtr &operator=(uint64_t Offset) {
-assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits");
+  LazyOffsetPtr &operator=(uintptr_t Offset) {

awilfox wrote:

How can the higher 32 bits be meaningful on a system whose pointers are 32 bits 
in length?  Put another way: in what situation is a 64-bit offset to a 32-bit 
pointer valid?

The original message warns about 63 bits because the last bit is needed to tag 
the value as not-a-pointer (and as an offset).  That doesn't mean values passed 
here *have* to be 63 bits in length, just that the (naive) implementation that 
hardcoded 64-bit size for pointers required the offset to fit in N-1 = 63 bits.

https://github.com/llvm/llvm-project/pull/111995
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] Remove type-punning in LazyOffsetPtr. (PR #112806)

2024-10-17 Thread A. Wilcox via cfe-commits

awilfox wrote:

> This violates aliasing rules and doesn't work at all on big-endian 64-bit 
> systems where the pointer is stored in the second four bytes of the uint64_t.

Your sizes aren't correct in the description here.  This issue occurs on big 
endian 32-bit systems, and the pointer is stored in the lower bits, which are 
lost because of using address-of when turning the 64-bit number into a pointer. 
 i.e. storing 0x12345678 yields:

```
|---|
|  |  | 1234 | 5678 |
|---|
```

If you just cast down to a `uintptr_t` (or such), you get the correct value.  
But if you point to the *address of* this 64-bit value, and access a 32-bit 
value, you receive the `'` instead of `1234'5678`.

https://github.com/llvm/llvm-project/pull/112806
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits