https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83029
Bug ID: 83029
Summary: Memory leaks due to leaked thread local variable
Product: gcc
Version: 6.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: andreas.longva_fraunhofer at yahoo dot com
Target Milestone: ---

GCC version: 6.4.0, but I have also been able to reproduce this in 4.9.1 and 6.3.0.
System: Linux x86-64, Scientific Linux 7.4

I've attached two minimal test cases (preprocessed) which showcase the behavior I am about to describe:

main.cpp/main.ii:
- This is the minimal test case which is closest in spirit to our original, far more complex code.
- Compiler invocation: g++ -v -save-temps -std=c++11 -pthread -g -O0 main.cpp -Wall -Wextra

main2.cpp/main2.ii:
- This is a minimal test case intended to show that destructors are not being properly called in some scenarios.
- Compiler invocation: g++ -v -save-temps -o main2.out -std=c++11 -pthread -g -O0 main2.cpp -Wall -Wextra

Note that after discussing with redi and others on #gcc (freenode), it appears that the glibc version is relevant for reproducing the bug, so the environment is crucial for reproduction. glibc >= 2.18 is probably not affected; my understanding is that for glibc < 2.18, GCC uses a fallback implementation for cleaning up thread local variables.

Here's the valgrind output for the binary file produced by main.cpp:

$ valgrind --leak_check=full ./
==32488== Memcheck, a memory error detector
==32488== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==32488== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==32488== Command: ./main.out
==32488==
==32488==
==32488== HEAP SUMMARY:
==32488==     in use at exit: 4,120 bytes in 2 blocks
==32488==   total heap usage: 14 allocs, 12 frees, 81,552 bytes allocated
==32488==
==32488== 24 bytes in 1 blocks are definitely lost in loss record 1 of 2
==32488==    at 0x4C2A456: operator new(unsigned long, std::nothrow_t const&) (vg_replace_malloc.c:377)
==32488==    by 0x4EC2AA5: __cxa_thread_atexit (atexit_thread.cc:142)
==32488==    by 0x401208: __tls_init (main.cpp:6)
==32488==    by 0x4013E9: TLS wrapper function for vec[abi:cxx11] (in /home/alongva/test/main.out)
==32488==    by 0x40137D: Pool::Pool()::{lambda()#1}::operator()() const (main.cpp:18)
==32488==    by 0x402635: void std::_Bind_simple<Pool::Pool()::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) (functional:1391)
==32488==    by 0x4025D2: std::_Bind_simple<Pool::Pool()::{lambda()#1} ()>::operator()() (functional:1380)
==32488==    by 0x40256D: std::thread::_State_impl<std::_Bind_simple<Pool::Pool()::{lambda()#1} ()> >::_M_run() (thread:197)
==32488==    by 0x4EEF40E: execute_native_thread_routine (thread.cc:83)
==32488==    by 0x56D6E24: start_thread (in /usr/lib64/libpthread-2.17.so)
==32488==    by 0x59E334C: clone (in /usr/lib64/libc-2.17.so)
==32488==
==32488== 4,096 bytes in 1 blocks are definitely lost in loss record 2 of 2
==32488==    at 0x4C2A203: operator new(unsigned long) (vg_replace_malloc.c:334)
==32488==    by 0x4022A5: __gnu_cxx::new_allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::allocate(unsigned long, void const*) (new_allocator.h:104)
==32488==    by 0x40219E: std::allocator_traits<std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::allocate(std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >&, unsigned long) (alloc_traits.h:436)
==32488==    by 0x402017: std::_Vector_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_allocate(unsigned long) (stl_vector.h:170)
==32488==    by 0x401A5D: void std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_emplace_back_aux<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) (vector.tcc:412)
==32488==    by 0x401722: void std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::emplace_back<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) (vector.tcc:101)
==32488==    by 0x40155F: std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::push_back(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&) (stl_vector.h:933)
==32488==    by 0x40138F: Pool::Pool()::{lambda()#1}::operator()() const (main.cpp:18)
==32488==    by 0x402635: void std::_Bind_simple<Pool::Pool()::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) (functional:1391)
==32488==    by 0x4025D2: std::_Bind_simple<Pool::Pool()::{lambda()#1} ()>::operator()() (functional:1380)
==32488==    by 0x40256D: std::thread::_State_impl<std::_Bind_simple<Pool::Pool()::{lambda()#1} ()> >::_M_run() (thread:197)
==32488==    by 0x4EEF40E: execute_native_thread_routine (thread.cc:83)
==32488==
==32488== LEAK SUMMARY:
==32488==    definitely lost: 4,120 bytes in 2 blocks
==32488==    indirectly lost: 0 bytes in 0 blocks
==32488==      possibly lost: 0 bytes in 0 blocks
==32488==    still reachable: 0 bytes in 0 blocks
==32488==         suppressed: 0 bytes in 0 blocks
==32488==
==32488== For counts of detected and suppressed errors, rerun with: -v
==32488== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

And here is the output from the second binary (running valgrind on it would also show memory leaks):

$ ./main2.out
Constructing printer.
Constructing printer.
Printing something...
Destroying printer...
Printing something...

The output above suggests that two thread_local Printer instances are constructed, but only one is destroyed (unless std::puts is not reliable in this instance).

Note that the behavior is also not entirely deterministic: sometimes the binary runs without leaks, and in the first test case, reducing the loop bound from `i < 100` to, say, `i < 1` triggers the leak reported by valgrind only some of the time.
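
For readers without the attachments, here is a minimal sketch of the kind of check the second test case describes: each worker thread touches a thread_local object, and the number of constructor and destructor calls is compared after all threads have been joined. This is not the attached main2.cpp. `Printer` is named in the report, but its members as written here, `run_workers`, `Counts`, and the two counters are hypothetical names introduced for illustration, and the real test case may be structured differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#include <thread>
#include <vector>

namespace {

// Hypothetical counters, used to compare constructions and destructions.
std::atomic<int> g_constructed{0};
std::atomic<int> g_destroyed{0};

// Assumed stand-in for the Printer type mentioned in the report.
struct Printer {
    Printer()  { ++g_constructed; std::puts("Constructing printer."); }
    ~Printer() { ++g_destroyed;   std::puts("Destroying printer..."); }
    void print() const { std::puts("Printing something..."); }
};

// One instance per thread, constructed on first use in that thread and
// expected to be destroyed when that thread exits.
thread_local Printer printer;

}  // namespace

struct Counts {
    int constructed;
    int destroyed;
};

// Starts n_threads workers that each use the thread_local object once,
// joins them, and reports how many instances were built and torn down.
Counts run_workers(int n_threads) {
    assert(n_threads > 0);
    const int c0 = g_constructed.load();
    const int d0 = g_destroyed.load();

    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i) {
        workers.emplace_back([] { printer.print(); });
    }
    for (auto& t : workers) {
        t.join();  // thread_local destructors should have run by now
    }
    return {g_constructed.load() - c0, g_destroyed.load() - d0};
}
```

Calling `run_workers(2)` from `main` and comparing the two fields gives a check that does not depend on reading the std::puts output. On a correctly behaving toolchain both fields should equal the number of threads; the behavior described in this report would show up as `destroyed` being smaller than `constructed`. Build with the same flags as above (`-std=c++11 -pthread`).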