[llvm-bugs] [Bug 92081] llvm-cxxfilt assumes that the host's representation for floating point types matches the target's

LLVM Bugs via llvm-bugs Tue, 14 May 2024 01:27:58 -0700

Issue	92081
Summary	llvm-cxxfilt assumes that the host's representation for floating point types matches the target's
Labels	tools:llvm-cxxfilt
Assignees
Reporter	bd1976bris

**Preamble:**
Consider the following code:

```
namespace cxx20 {
  template<auto> struct A {};
  void f(A<1.0l>) {}
};
```

For the above, Clang mangles `cxx20::f(cxx20::A<0x8p-3L>)` as `_ZN5cxx201fENS_1AILe3fff8000000000000000EEE`. Note that GCC mangles it as `_ZN5cxx201fENS_1AILe0000000000003fff8000000000000000EEE` - the leading zeros are apparently a benign difference.

The `1AILe3fff8000000000000000E` part (for `A < 1.0l >`) is mangled as:

```
1AI   L        e            3fff8000000000000000  E
¦     ¦        ¦            ¦                     ¦
name  literal  long double  hexadecimal string    literal-bookend
```

Here the hexadecimal string encodes the bytes of the value's representation on the target. Quoting from the Itanium ABI:

> Floating-point literals are encoded using a fixed-length lowercase hexadecimal string corresponding to the internal representation, high-order bytes first. For example: "Lf bf800000 E" is -1.0f on platforms conforming to IEEE 754.

Clang uses 20 hex characters (10 bytes) to encode a long double on most Itanium-ABI targets, including PS5, where long double is implemented as 80-bit extended precision.
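
To make the quoted encoding concrete, here is a minimal sketch that decodes the ABI's own `Lf bf800000 E` example. The helper names `hexNibble` and `decodeBinary32` are hypothetical and are not part of LLVM; the sketch assumes C++17 and a host whose `float` is IEEE 754 binary32.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>
#include <string_view>

// Value of one lowercase hex digit, or -1 if the character is not one.
// (Hypothetical helper, for illustration only.)
inline int hexNibble(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Decode the 8-digit payload of an "Lf <hex> E" literal.
// Assumption: the host float is IEEE 754 binary32.
inline std::optional<float> decodeBinary32(std::string_view hex) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit host float");
  if (hex.size() != 8) return std::nullopt;
  std::uint32_t bits = 0;
  for (char c : hex) {
    const int v = hexNibble(c);
    if (v < 0) return std::nullopt;
    // The string is high-order first, so shift earlier digits up.
    bits = (bits << 4) | static_cast<std::uint32_t>(v);
  }
  float value;
  std::memcpy(&value, &bits, sizeof value); // reinterpret the bit pattern
  return value;
}
```

Under those assumptions, `decodeBinary32("bf800000")` yields `-1.0f`, matching the ABI text. Accumulating into an integer rather than writing bytes into the float directly keeps the sketch independent of host byte order.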

**Problem:**
With host = Windows (where long double is an alias for double) and target = PS5 (where long double is 80-bit extended precision), ASAN reports a stack-buffer-overflow when running `llvm-cxxfilt.exe _ZN5cxx201fENS_1AILe3fff8000000000000000EEE`.

This occurs because the demangler code assumes that the representation of a floating point number on the target matches the representation on the host. See: https://github.com/llvm/llvm-project/blob/023cdfcc1a5bdef7f12bb6da9328f93b477c38b8/llvm/include/llvm/Demangle/ItaniumDemangle.h#L2558

However, Visual Studio on the Windows host implements long double as a synonym for double. Therefore, there isn't enough space to unpack the 10 bytes encoded by the 20 hex characters: the implementation overflows the 8 bytes available for a long double, which triggers the ASAN fault. Without ASAN, the number is decoded incorrectly.

Similar problems will affect other cross-compiler demangling scenarios where the floating point representation differs between the target and host.
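
The size mismatch can be expressed as a simple check. The following is a simplified illustration, not the code in `ItaniumDemangle.h`; `payloadBytes` and `fitsHostType` are hypothetical names, and C++17 is assumed.

```cpp
#include <cstddef>
#include <string_view>

// Number of bytes that a hex payload of this length expands to.
constexpr std::size_t payloadBytes(std::string_view hex) {
  return hex.size() / 2;
}

// True if the payload could be unpacked into a host object of type Float
// without writing past the end of its storage.
// (Hypothetical guard, for illustration only.)
template <class Float>
constexpr bool fitsHostType(std::string_view hex) {
  return hex.size() % 2 == 0 && payloadBytes(hex) <= sizeof(Float);
}
```

For the payload `3fff8000000000000000`, `payloadBytes` returns 10, so `fitsHostType<double>` is false; the same holds for `long double` on a host where it is an 8-byte type. Note that passing this check only rules out the overflow: a host type that is large enough may still use a different format from the target, so the decoded value could still be wrong.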

**Ideas for fixes:**
We could simply print the hexadecimal string from the mangled name. This appears to be what GNU implements: GNU c++filt demangles `_ZN5cxx201fENS_1AILe3fff8000000000000000EEE` as `cxx20::f(cxx20::A<(long double)[3fff8000000000000000]>)`.
Printing the mangled hexadecimal string would also remove the non-functional differences between the Windows and Linux output of llvm-cxxfilt for floating point literals, which are due to snprintf differences between platforms.
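
A minimal sketch of this first idea, assuming C++17: the payload is copied through untouched, so no host floating point type is involved. `formatFloatLiteralAsHex` is a hypothetical name, and the output format simply mirrors the GNU output quoted above.

```cpp
#include <string>
#include <string_view>

// Render a floating point literal as "(type)[hex]" without decoding it.
// (Hypothetical helper; the format mirrors the GNU output quoted above.)
std::string formatFloatLiteralAsHex(std::string_view typeName,
                                    std::string_view hex) {
  std::string out;
  out.reserve(typeName.size() + hex.size() + 4);
  out += '(';
  out.append(typeName);
  out += ")[";
  out.append(hex);
  out += ']';
  return out;
}
```

Calling `formatFloatLiteralAsHex("long double", "3fff8000000000000000")` produces `(long double)[3fff8000000000000000]`. Because nothing goes through `snprintf`, the result would be the same on every host.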

Alternatively, we could use a target/host-agnostic floating point decoder, e.g. ADT/APFloat, which could make some reasonable assumptions, e.g. IEEE 754 representation. We might also provide a way of specifying the target for llvm-cxxfilt.
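
To illustrate what a host-agnostic decoder involves, here is a hand-rolled sketch that uses only integer arithmetic plus `std::ldexp`. It is a stand-in for the idea and does not use APFloat. It assumes the 20-digit payload is in the x87 80-bit extended format (1 sign bit, 15 exponent bits with bias 16383, 64 significand bits with an explicit integer bit). The name `decodeX87Extended` is hypothetical, and the result is narrowed to `double`, so it may be rounded or overflow to infinity.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Decode a 20-digit, high-order-first hex payload assumed to be in the
// x87 80-bit extended format. Unusual encodings (pseudo-denormals,
// unnormals) get no special treatment in this sketch.
std::optional<double> decodeX87Extended(std::string_view hex) {
  if (hex.size() != 20) return std::nullopt;
  // fields[0]: sign + exponent (first 4 digits)
  // fields[1]: 64-bit significand (remaining 16 digits)
  std::uint64_t fields[2] = {0, 0};
  for (std::size_t i = 0; i < hex.size(); ++i) {
    const char c = hex[i];
    const int v = (c >= '0' && c <= '9')   ? c - '0'
                  : (c >= 'a' && c <= 'f') ? c - 'a' + 10
                                           : -1;
    if (v < 0) return std::nullopt;
    std::uint64_t &f = fields[i < 4 ? 0 : 1];
    f = (f << 4) | static_cast<std::uint64_t>(v);
  }
  const bool negative = (fields[0] & 0x8000u) != 0;
  const int exponent = static_cast<int>(fields[0] & 0x7fffu);
  const std::uint64_t significand = fields[1];

  double magnitude;
  if (exponent == 0x7fff) {
    // All-ones exponent: infinity if the fraction bits are zero, else NaN.
    magnitude = (significand << 1) == 0
                    ? std::numeric_limits<double>::infinity()
                    : std::numeric_limits<double>::quiet_NaN();
  } else {
    // value = significand * 2^(e - 16383 - 63); subnormals use e = 1.
    const int e = exponent == 0 ? 1 : exponent;
    magnitude = std::ldexp(static_cast<double>(significand), e - 16383 - 63);
  }
  return negative ? -magnitude : magnitude;
}
```

For the payload from this report, `3fff8000000000000000`, the exponent field equals the bias and the significand is 2^63, so the sketch returns 1.0, consistent with the `1.0l` in the source. A real fix would also need to know which format the target uses, which is where a way of specifying the target would come in.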
