[Bug c++/115049] New: Silent severe miscompilation around inline functions

manx-bugzilla at problemloesungsmaschine dot de via Gcc-bugs Sun, 12 May 2024 04:04:39 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115049


            Bug ID: 115049
           Summary: Silent severe miscompilation around inline functions
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: manx-bugzilla at problemloesungsmaschine dot de
  Target Milestone: ---

Created attachment 58182
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58182&action=edit
all mentioned attachments

Hello GCC team!


I am seeing severe miscompilation around inline functions with different
results across different translation units, ultimately leading to application
crashes.

I am running
```
$ x86_64-w64-mingw32-g++  --version
x86_64-w64-mingw32-g++.exe (Rev2, Built by MSYS2 project) 14.1.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
as shipped by MSYS2, and testing with the MINGW64 toolchain targeting amd64. I
can also reproduce with UCRT64 amd64, but so far not with MINGW32 targeting
x86. This is reproducible locally and on GitHub CI, so no system-specific
problem.

To reproduce, we need 2 translation units:
file1.cpp:
```

#include <random>

#include <cstdint>

void Trigger1();

int main( int /*argc*/ , char * /*argv*/ [] ) {
        Trigger1();
        return 0;
}

static void Trigger2() {
        std::ranlux48 prng{1};
        std::uint16_t Data1 = std::uniform_int_distribution<std::uint16_t>{}(prng);
        std::uint64_t Data2 = std::uniform_int_distribution<std::uint64_t>{}(prng);
        static_cast<void>(Data1);
        static_cast<void>(Data2);
}

using my_dummy_function = void (*)();

struct myDummy3 {
        my_dummy_function func{nullptr};
        myDummy3(my_dummy_function f)
                : func(f)
        {
        }
};

myDummy3 dummy3{
        &Trigger2
};

```
file2.cpp:
```

#include <algorithm>
#include <memory>
#include <random>
#include <vector>

#include <cstddef>
#include <cstdint>

#include <stdio.h>

template <typename T, std::size_t required_entropy_bits>
inline T my_random(std::ranlux48 & rng) {
        using unsigned_T = T;
        const unsigned int rng_bits = 48;
        unsigned_T result = 0;
        for (std::size_t entropy = 0; entropy < std::min(required_entropy_bits, sizeof(T) * 8); entropy += rng_bits) {
                if constexpr (rng_bits < (sizeof(T) * 8)) {
                        unsigned int shift_bits = rng_bits % (sizeof(T) * 8);
                        //fflush(stdout);  // <--- no crash
                        result = (result << shift_bits) ^ static_cast<unsigned_T>(rng());
                } else {
                        result = static_cast<unsigned_T>(rng());
                }
        }
        if constexpr (required_entropy_bits >= (sizeof(T) * 8)) {
                return static_cast<T>(result);
        } else {
                return static_cast<T>(result & ((static_cast<unsigned_T>(1) << required_entropy_bits) - static_cast<unsigned_T>(1)));
        }
}

class myDummy1 {
private:
        std::ranlux48 m_PRNG;
        std::uint16_t m_Trigger1a;
public:
        myDummy1(std::unique_ptr<std::mt19937> & rd)
                : m_PRNG(std::uniform_int_distribution<unsigned int>{}(*rd))
                , m_Trigger1a(m_PRNG())
        {
        }
};

void Trigger1();

void Trigger1() {
        {
                std::unique_ptr<std::mt19937> rd = std::make_unique<std::mt19937>(1);
                myDummy1 dummy1(rd);
        }
        {
                std::ranlux48 * prng = new std::ranlux48(1);
                const unsigned int trigger1b_size = 32;
                std::vector<std::uint8_t> trigger1b(trigger1b_size, 0);
                for (unsigned int i = 0; i < trigger1b_size; i++) {
                        trigger1b[i] = my_random<std::uint8_t, 8>(*prng);
                }
                delete prng;
        }
        {
                std::ranlux48 prng{1};
                printf("%llu\n", static_cast<unsigned long long>(my_random<std::uint64_t, 53>(prng))); fflush(stdout);
                printf("%llu\n", static_cast<unsigned long long>(my_random<std::uint64_t, 53>(prng))); fflush(stdout);
                printf("%llu\n", static_cast<unsigned long long>(my_random<std::uint64_t, 53>(prng))); fflush(stdout);  // <--- crash
                printf("%llu\n", static_cast<unsigned long long>(my_random<std::uint64_t, 53>(prng))); fflush(stdout);
        }
}

```
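As a reading aid (not part of the attached files): the loop in `my_random` decides how many times the engine's `operator()` is invoked per call. The sketch below copies only the loop bounds from file2.cpp and counts iterations at compile time; the helper name `my_random_calls` is made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: counts how many times my_random<T, required_entropy_bits>
// would call rng(), using the same loop bounds as file2.cpp.
template <typename T, std::size_t required_entropy_bits>
constexpr std::size_t my_random_calls() {
        const std::size_t rng_bits = 48;  // same constant as in my_random
        std::size_t calls = 0;
        for (std::size_t entropy = 0; entropy < std::min(required_entropy_bits, sizeof(T) * 8); entropy += rng_bits) {
                ++calls;
        }
        return calls;
}

// my_random<std::uint8_t, 8>: min(8, 8) = 8, so a single rng() call.
static_assert(my_random_calls<std::uint8_t, 8>() == 1);
// my_random<std::uint64_t, 53>: min(53, 64) = 53, so iterations at 0 and 48.
static_assert(my_random_calls<std::uint64_t, 53>() == 2);
```

So each `printf` line in the last block of Trigger1() goes through the engine's `operator()` twice, and the line marked "crash" corresponds to the fifth and sixth invocation on that freshly seeded `prng`.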
and the compilation script:
compile.sh:
```
#!/usr/bin/env bash

x86_64-w64-mingw32-g++ -std=c++20 -fexceptions -frtti -mthreads -O2 -Wall -Wextra -Wpedantic -DNOMINMAX -c file1.cpp -o file1.o
x86_64-w64-mingw32-g++ -std=c++20 -fexceptions -frtti -mthreads -O2 -Wall -Wextra -Wpedantic -DNOMINMAX -c file2.cpp -o file2.o
x86_64-w64-mingw32-g++ -std=c++20 -fexceptions -frtti -mthreads -O2 -Wall -Wextra -Wpedantic -DNOMINMAX -c file2-working.cpp -o file2-working.o

x86_64-w64-mingw32-g++ -std=c++20 -fexceptions -frtti -mthreads -O2 -Wall -Wextra -Wpedantic -DNOMINMAX file1.o file2.o -latomic -lm -o test-broken.exe
x86_64-w64-mingw32-g++ -std=c++20 -fexceptions -frtti -mthreads -O2 -Wall -Wextra -Wpedantic -DNOMINMAX file1.o file2-working.o -latomic -lm -o test-working.exe

cp file1.o file1-trimmed.o
objcopy --remove-section '.text$_ZNSt20discard_block_engineISt26subtract_with_carry_engineIyLy48ELy5ELy12EELy389ELy11EEclEv' file1-trimmed.o
objcopy --remove-section '.xdata$_ZNSt20discard_block_engineISt26subtract_with_carry_engineIyLy48ELy5ELy12EELy389ELy11EEclEv' file1-trimmed.o
objcopy --remove-section '.pdata$_ZNSt20discard_block_engineISt26subtract_with_carry_engineIyLy48ELy5ELy12EELy389ELy11EEclEv' file1-trimmed.o

x86_64-w64-mingw32-g++ -std=c++20 -fexceptions -frtti -mthreads -O2 -Wall -Wextra -Wpedantic -DNOMINMAX file1-trimmed.o file2.o -latomic -lm -o test-trimmed.exe

objdump -M intel --disassemble=_ZNSt20discard_block_engineISt26subtract_with_carry_engineIyLy48ELy5ELy12EELy389ELy11EEclEv file1.o > file1.asm
objdump -M intel --disassemble=_ZNSt20discard_block_engineISt26subtract_with_carry_engineIyLy48ELy5ELy12EELy389ELy11EEclEv file2.o > file2.asm
objdump -M intel --disassemble=_ZNSt20discard_block_engineISt26subtract_with_carry_engineIyLy48ELy5ELy12EELy389ELy11EEclEv file2-working.o > file2-working.asm

objdump -M intel --no-addresses --disassemble=_ZNSt20discard_block_engineISt26subtract_with_carry_engineIyLy48ELy5ELy12EELy389ELy11EEclEv test-broken.exe > test-broken.asm
objdump -M intel --no-addresses --disassemble=_ZNSt20discard_block_engineISt26subtract_with_carry_engineIyLy48ELy5ELy12EELy389ELy11EEclEv test-trimmed.exe > test-trimmed.asm
objdump -M intel --no-addresses --disassemble=_ZNSt20discard_block_engineISt26subtract_with_carry_engineIyLy48ELy5ELy12EELy389ELy11EEclEv test-working.exe > test-working.asm

objdump -D test-broken.exe > full-test-broken.asm
objdump -D test-trimmed.exe > full-test-trimmed.asm
objdump -D test-working.exe > full-test-working.asm

```

Now, when I run test-*.exe, I get:
```
$ ./test-broken.exe
3578273826077799
7151956549303778
Segmentation fault
```
```
$ ./test-trimmed.exe
3578273826077799
7151956549303778
8042221890545872
8274253793764351
```
```
$ ./test-working.exe
3578273826077799
7151956549303778
8042221890545872
8274253793764351
```

The crash happens in
`std::discard_block_engine<std::subtract_with_carry_engine<unsigned long long,
48ull, 5ull, 12ull>, 389ull, 11ull>::operator()() ()`.
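
For readers mapping that name back to the source: `std::ranlux48` is a standard typedef for exactly this `discard_block_engine` specialization, which is why code that only mentions `std::ranlux48` ends up instantiating this function. A minimal sketch to confirm the mapping follows; the helper names `ranlux48_10000th` and `check_ranlux48` are made up for illustration, and the expected 10000th value is the one the C++ standard specifies for a default-constructed `std::ranlux48`, not something taken from this report.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <type_traits>

// The standard defines these typedefs; on the MINGW64 target above,
// std::uint_fast64_t is unsigned long long (mangled as 'y').
static_assert(std::is_same_v<std::ranlux48_base,
        std::subtract_with_carry_engine<std::uint_fast64_t, 48, 5, 12>>);
static_assert(std::is_same_v<std::ranlux48,
        std::discard_block_engine<std::ranlux48_base, 389, 11>>);

// Hypothetical helper: returns the 10000th value of a default-constructed engine.
inline std::uint64_t ranlux48_10000th() {
        std::ranlux48 rng;   // default seed
        rng.discard(9999);   // skip the first 9999 values
        return rng();        // 10000th invocation
}

// Hypothetical self-check against the value required by the C++ standard.
inline void check_ranlux48() {
        assert(ranlux48_10000th() == 249142670248501ULL);
}
```

Calling `check_ranlux48()` from a single translation unit is only a sanity check of the engine in isolation; it does not exercise the two-translation-unit setup that triggers the crash.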

file1.asm, file2.asm, file2-working.asm, test-broken.asm, test-trimmed.asm,
test-working.asm, full-test-broken.asm, full-test-trimmed.asm,
full-test-working.asm, as well as the files quoted above are all attached.

The code in file1.cpp:Trigger2() is never called. It however causes a template
instantiation of
_ZNSt20discard_block_engineISt26subtract_with_carry_engineIyLy48ELy5ELy12EELy389ELy11EEclEv.
When I remove this particular function from the object file before linking (see
compile.sh, test-trimmed.exe), the linker will choose the version that got
generated in file2.o. The generated function in file2.o and file2-working.o is
identical. It is different from the version generated in file1.o but neither is
necessarily broken (it looks like the version in file2.o has a tail call to
std::subtract_with_carry_engine<unsigned long long, 48ull, 5ull,
12ull>::operator()() instead of inlining it).

So, when compiling file2.cpp, GCC has chosen a particular implementation of
_ZNSt20discard_block_engineISt26subtract_with_carry_engineIyLy48ELy5ELy12EELy389ELy11EEclEv
to attach the symbol to, *and* the calling code inside file2.cpp assumes that
it would be calling this particular copy, and only works in that case.

Uncommenting the line marked with "no crash" in file2.cpp (see
file2-working.cpp) makes the problem go away even though the generated function
itself does not change at all.

I have checked the test case for undefined behaviour with ubsan and asan using
GCC 14 on godbolt (https://godbolt.org/z/PMExccxbv), and it comes out clean.
Reading the source of std::ranlux48 in libstdc++ does not show any obvious
problem to me either.

If I had to guess: To me, this looks like GCC has maybe dragged some context of
a particular instantiation of a callee out into the caller, which would be an
invalid assumption/optimization unless guaranteed that this particular
instantiation gets called - which would be VERY BAD and could possibly break
all C++ code at random.

So far I can only trigger the problem with -O2 and only on MSYS2
MINGW64/UCRT64 amd64; -O1 and -O3 work fine for me in this particular case.

Note that I have not been able to reproduce the problem on Linux (yet).
However, given the nature of the symptom, I doubt that other platforms are
unaffected. GCC might simply make different optimization choices there due to
ABI differences.

We triggered this problem in real-world code in libopenmpt
(https://lib.openmpt.org/, https://github.com/OpenMPT/openmpt/) when our unit
tests (about 3000 individual tests) suddenly started crashing on MSYS2 (after
their GCC 14 update) in one particular test case. Narrowing it down to a
somewhat digestible test case took about 20 hours and 3 people.

As a countermeasure, we have for now had to force GCC 14 down to -O1 (and never
use -O2 or -O3) for our next release. It is not clear to us whether this
drastic work-around is even sufficient (we do have functions marked with
__attribute__((always_inline))). Is there a less drastic and maybe more
specific work-around available?


Cheers, and have a nice day debugging this! :)

Jörn
