[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)
https://github.com/awilfox created https://github.com/llvm/llvm-project/pull/111995 On big-endian systems, narrow casting will read the higher bits of the value. LazyOffsetPtr's `getAddressOfPointer` returns the address-of `Ptr` which was unconditionally a 64-bit value. On 32-bit big endian systems, reading this value as a 32-bit pointer returns invalid data. Fixes: bc73ef0031 ("PR60985: Fix merging of lambda closure types across modules.") Closes: #111993 >From 3c9198421a44f32d00c2ebe4ee0bf5a6ec67e9ac Mon Sep 17 00:00:00 2001 From: "A. Wilcox" Date: Fri, 11 Oct 2024 07:54:43 -0500 Subject: [PATCH] [clang] LazyOffsetPtr: Use native pointer width On big-endian systems, narrow casting will read the higher bits of the value. LazyOffsetPtr's `getAddressOfPointer` returns the address-of `Ptr` which was unconditionally a 64-bit value. On 32-bit big endian systems, reading this value as a 32-bit pointer returns invalid data. Fixes: bc73ef0031 ("PR60985: Fix merging of lambda closure types across modules.") Closes: #111993 --- clang/include/clang/AST/ExternalASTSource.h | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/clang/include/clang/AST/ExternalASTSource.h b/clang/include/clang/AST/ExternalASTSource.h index 385c32edbae0fd..f5ce1a916fbee8 100644 --- a/clang/include/clang/AST/ExternalASTSource.h +++ b/clang/include/clang/AST/ExternalASTSource.h @@ -326,25 +326,25 @@ struct LazyOffsetPtr { /// /// If the low bit is clear, a pointer to the AST node. If the low /// bit is set, the upper 63 bits are the offset. - mutable uint64_t Ptr = 0; + mutable uintptr_t Ptr = 0; public: LazyOffsetPtr() = default; - explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {} + explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {} - explicit LazyOffsetPtr(uint64_t Offset) : Ptr((Offset << 1) | 0x01) { -assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits"); + explicit LazyOffsetPtr(uintptr_t Offset) : Ptr((Offset << 1) | 0x01) { +assert((Offset << 1 >> 1) == Offset && "Offsets must fit in addressable bits"); if (Offset == 0) Ptr = 0; } LazyOffsetPtr &operator=(T *Ptr) { -this->Ptr = reinterpret_cast(Ptr); +this->Ptr = reinterpret_cast(Ptr); return *this; } - LazyOffsetPtr &operator=(uint64_t Offset) { -assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits"); + LazyOffsetPtr &operator=(uintptr_t Offset) { +assert((Offset << 1 >> 1) == Offset && "Offsets must fit in addressable bits"); if (Offset == 0) Ptr = 0; else @@ -375,7 +375,7 @@ struct LazyOffsetPtr { if (isOffset()) { assert(Source && "Cannot deserialize a lazy pointer without an AST source"); - Ptr = reinterpret_cast((Source->*Get)(OffsT(Ptr >> 1))); + Ptr = reinterpret_cast((Source->*Get)(OffsT(Ptr >> 1))); } return reinterpret_cast(Ptr); } ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)
awilfox wrote: This patch has been tested on 32-bit PowerPC, 64-bit PowerPC, and 64-bit x86_64 (ensuring it doesn't change any behaviour on LE systems) - the entire Clang test suite passed on all three platforms. https://github.com/llvm/llvm-project/pull/111995 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)
awilfox wrote: I feel like the messages being highlighted in the CI formatter need work anyway. I was hoping the community would have a suggestion on better wording. However, I will be glad to reformat those lines in that manner if this wording should stay. https://github.com/llvm/llvm-project/pull/111995 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Make LazyOffsetPtr more portable (PR #112927)
awilfox wrote: I'll run a full test on the ppc32 builder now and report back. Thanks for this work. https://github.com/llvm/llvm-project/pull/112927 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)
@@ -326,25 +326,25 @@ struct LazyOffsetPtr { /// /// If the low bit is clear, a pointer to the AST node. If the low /// bit is set, the upper 63 bits are the offset. - mutable uint64_t Ptr = 0; + mutable uintptr_t Ptr = 0; public: LazyOffsetPtr() = default; - explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {} + explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {} - explicit LazyOffsetPtr(uint64_t Offset) : Ptr((Offset << 1) | 0x01) { -assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits"); + explicit LazyOffsetPtr(uintptr_t Offset) : Ptr((Offset << 1) | 0x01) { +assert((Offset << 1 >> 1) == Offset && "Offsets must fit in addressable bits"); if (Offset == 0) Ptr = 0; } LazyOffsetPtr &operator=(T *Ptr) { -this->Ptr = reinterpret_cast(Ptr); +this->Ptr = reinterpret_cast(Ptr); return *this; } - LazyOffsetPtr &operator=(uint64_t Offset) { -assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits"); + LazyOffsetPtr &operator=(uintptr_t Offset) { awilfox wrote: Okay. But it is still invalid to `reinterpret_cast` a `uint64_t` to a pointer on 32-bit BE systems, and that would still leave that as a segfault. [C++23 draft N4950](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.pdf), §7.6.1.10 number 5: > A value of integral type or enumeration type can be explicitly converted to a > pointer. A pointer converted to an integer of sufficient size (if any such > exists on the implementation) and back to the same pointer type will have its > original value; mappings between pointers and integers are otherwise > implementation-defined. uint64_t is not the sufficient size for a 32-bit BE system. How should we proceed? * I suppose a union could work, though it would probably be ugly. * Alternatively, there could be two fields (a `uint64_t Offset` and a `uintptr_t Ptr`), but that would unconditionally add 8 bytes per `LazyOffsetPtr`. I'm not completely familiar with how many live `LazyOffsetPtr` objects might be present in a single invocation of Clang and how that would affect memory usage. * If there is a different way, I'd be happy to implement that as well. https://github.com/llvm/llvm-project/pull/111995 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)
awilfox wrote: > Btw, if you submit a fix, it would be great if it could be backported to the > 18 and 19 branches. I think the 18 branch may already be closed. This is what we are shipping in Adélie for 18: [big-endian-32.patch](https://github.com/user-attachments/files/17404400/big-endian-32.patch). https://github.com/llvm/llvm-project/pull/111995 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] LazyOffsetPtr: Use native pointer width (PR #111995)
@@ -326,25 +326,25 @@ struct LazyOffsetPtr { /// /// If the low bit is clear, a pointer to the AST node. If the low /// bit is set, the upper 63 bits are the offset. - mutable uint64_t Ptr = 0; + mutable uintptr_t Ptr = 0; public: LazyOffsetPtr() = default; - explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {} + explicit LazyOffsetPtr(T *Ptr) : Ptr(reinterpret_cast(Ptr)) {} - explicit LazyOffsetPtr(uint64_t Offset) : Ptr((Offset << 1) | 0x01) { -assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits"); + explicit LazyOffsetPtr(uintptr_t Offset) : Ptr((Offset << 1) | 0x01) { +assert((Offset << 1 >> 1) == Offset && "Offsets must fit in addressable bits"); if (Offset == 0) Ptr = 0; } LazyOffsetPtr &operator=(T *Ptr) { -this->Ptr = reinterpret_cast(Ptr); +this->Ptr = reinterpret_cast(Ptr); return *this; } - LazyOffsetPtr &operator=(uint64_t Offset) { -assert((Offset << 1 >> 1) == Offset && "Offsets must require < 63 bits"); + LazyOffsetPtr &operator=(uintptr_t Offset) { awilfox wrote: How can the higher 32 bits be meaningful on a system whose pointers are 32 bits in length? Put another way: in what situation is a 64-bit offset to a 32-bit pointer valid? The original message warns about 63 bits because the last bit is needed to tag the value as not-a-pointer (and as an offset). That doesn't mean values passed here *have* to be 63 bits in length, just that the (naive) implementation that hardcoded 64-bit size for pointers required the offset to fit in N-1 = 63 bits. https://github.com/llvm/llvm-project/pull/111995 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] Remove type-punning in LazyOffsetPtr. (PR #112806)
awilfox wrote: > This violates aliasing rules and doesn't work at all on big-endian 64-bit > systems where the pointer is stored in the second four bytes of the uint64_t. Your sizes aren't correct in the description here. This issue occurs on big endian 32-bit systems, and the pointer is stored in the lower bits, which are lost because of using address-of when turning the 64-bit number into a pointer. i.e. storing 0x12345678 yields: ``` |---| | | | 1234 | 5678 | |---| ``` If you just cast down to a `uintptr_t` (or such), you get the correct value. But if you point to the *address of* this 64-bit value, and access a 32-bit value, you receive the `'` instead of `1234'5678`. https://github.com/llvm/llvm-project/pull/112806 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits