[Bug libstdc++/87071] New: libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

            Bug ID: 87071
           Summary: libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
           Product: gcc
           Version: 8.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: virtuousfox at gmail dot com
  Target Milestone: ---

Created attachment 44576
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44576&action=edit
Xorg.pid-1158.gdb.log

So I've updated the system on an old laptop after neglecting to do so for a long time and was met with an inoperable X server. After I sent a report to the Mesa developers, they insisted that this is an issue in libstdc++. The attached file contains a full gdb dump made by an automated script, with the relevant debuginfo installed.

Affected libstdc++ versions are 8.1.1 from https://build.opensuse.org/package/show/openSUSE:Factory/gcc8 and 8.2.1 from https://build.opensuse.org/package/show/devel:gcc/gcc8

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=hsa:nvptx-none
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,ada,go --enable-offload-targets=hsa,nvptx-none=/usr/nvptx-none, --without-cuda-driver --enable-checking=release --disable-werror --with-gxx-include-dir=/usr/include/c++/8 --enable-ssp --disable-libssp --disable-libvtv --disable-cet --disable-libcc1 --enable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-8 --without-system-libunwind --enable-multilib --with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
gcc version 8.2.1 20180817 [gcc-8-branch revision 263612] (SUSE Linux)

cpuinfo from affected machine:

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 104
model name : AMD Athlon(tm) 64 X2 Dual-Core Processor TK-57
stepping : 2
cpu MHz : 800.000
cache size : 256 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall lbrv
bugs : apic_c1e fxsave_leak sysret_ss_attrs null_seg swapgs_fence amd_e400 spectre_v1 spectre_v2
bogomips : 1596.01
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

processor : 1
…
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
…

However, the same versions run fine on a machine with a Radeon RX580 (radeonsi_dri) and

processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 1
model name : AMD FX(tm)-6100 Six-Core Processor
stepping : 2
microcode : 0x600063e
cpu MHz : 3780.977
cache size : 2048 KB
physical id : 0
siblings : 6
core id : 0
cpu cores : 3
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips : 7919.04
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb
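
The two `flags` lists in the cpuinfo dumps can be compared mechanically rather than by eye. What follows is only a minimal Python sketch: the flag strings are copied from the two dumps in this report, while the names `TK57_FLAGS`, `FX6100_FLAGS`, `flag_set` and `only_in` are made up for illustration.

```python
# Flag lists copied from the two /proc/cpuinfo dumps in this report.
TK57_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm "
    "3dnowext 3dnow rep_good nopl cpuid extd_apicid pni cx16 lahf_lm "
    "cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall lbrv"
)
FX6100_FLAGS = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp "
    "lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf "
    "pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx "
    "lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse "
    "3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext "
    "perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv "
    "svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists "
    "pausefilter pfthreshold"
)


def flag_set(line):
    """Split the value of a cpuinfo 'flags' line into a set of names."""
    return set(line.split())


def only_in(a, b):
    """Return, sorted, the flags that appear in `a` but not in `b`."""
    return sorted(flag_set(a) - flag_set(b))


if __name__ == "__main__":
    print("only on FX-6100:", " ".join(only_in(FX6100_FLAGS, TK57_FLAGS)))
    print("only on TK-57:  ", " ".join(only_in(TK57_FLAGS, FX6100_FLAGS)))
```

On these two dumps the FX-6100-only list includes `ssse3`, `sse4_1`, `sse4_2` and `avx`, so code that uses any of those extensions can run on the FX-6100 but not on the TK-57.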
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #1 from Sergey Kondakov ---
Created attachment 44577
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44577&action=edit
Asus_F3Ke.dmesg

Verbose dmesg from affected machine.
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #3 from Sergey Kondakov ---
(In reply to Jonathan Wakely from comment #2)
> (EE) Illegal instruction at address 0x72f2c8ea
>
> I don't see how this can possibly be a libstdc++ problem, since libstdc++
> doesn't produce any CPU instructions, illegal or not.
>
> As I already said to you, there's nothing we can do with this info.

And Mesa devs said that it is. So everyone points fingers at each other in a circle, and what am I supposed to do?

Program received signal SIGILL, Illegal instruction.
0x72f2c8ea in _GLOBAL__sub_I_lower_x86.cpp () at /usr/bin/../lib64/gcc/x86_64-suse-linux/8/../../../../include/c++/8/bits/char_traits.h:350
350 return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));
#0 0x72f2c8ea in _GLOBAL__sub_I_lower_x86.cpp () at /usr/bin/../lib64/gcc/x86_64-suse-linux/8/../../../../include/c++/8/bits/char_traits.h:350
InitializeLowerX86PassFlag = {_M_once = 0}
SwrJit::intrinsicMap2[abi:cxx11] = {std::map with 0 elements, std::map with 0 elements, std::map with 0 elements}
std::__ioinit = {static _S_refcount = 13, static _S_synced_with_stdio = true}
SwrJit::DOUBLE =
std::piecewise_construct =
(anonymous namespace)::ForceMCJITLinking =
SwrJit::intrinsicMap[abi:cxx11] = std::map with 0 elements
SwrJit::LowerX86::ID = 0 '\000'

Is include/c++/8/bits/char_traits.h not part of libstdc++? If your code is correct then whose isn't?
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #7 from Sergey Kondakov ---
Created attachment 44583
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44583&action=edit
Xorg.pid-1381.gdb.log with disas

(In reply to Alexander Monakov from comment #6)
> In your gdb script, please add
>
> x/i $pc
> disas
>
> after the backtrace command (probably 'bt full') and regenerate the log.
> Without that the log doesn't actually show the "illegal" instruction.
>

Now, THAT's some real advice, thanks! I've managed to get that log but…

> Also please show output of rpm -qf /usr/lib64/dri/r300_dri.so

This, of course, points to the Mesa-dri package BUT that gave me the idea not to rule out Mesa yet. And, indeed, it was the culprit all along, despite what its devs have said. Or, more accurately, it was clang/LLVM… probably, I haven't properly checked yet.

I've forked the distro package of Mesa (or, more precisely, its auto-build OBS scripts) to build it with LTO and decrease its monstrous size. But it doesn't compile with gcc that way because of a long-standing bug and broken autotools scripts + OBS can't handle the requirements of full LTO anyway. So I used clang & gold for building Mesa (and only it) with ThinLTO and it worked (and works, on newer PCs) fine.

I spent days figuring that out and hours fighting the package manager to selectively install the "old", default Mesa package-set without affecting anything else. And it worked, the crash is gone. Maybe it was a combination of factors, but I shouldn't have believed that Mesa is irrelevant, as I was told, just because it's far from being the first in the chain of jumps of the backtrace. But then again, I don't have much clue about how it works.

So, this issue can be closed UNLESS you think that the backtrace shouldn't have led to libstdc++ anyway and/or clang & Mesa couldn't have failed like that on their own. Normally, such suicidal code should not come out of a compiler with almost-default, non-aggressive settings, wherever it may be.

(In reply to Uroš Bizjak from comment #4)
> (In reply to Sergey Kondakov from comment #3)
>
> > If your code is correct then whose isn't ?
> Instructions are generated by the compiler. So, it is the compiler's fault,
> it probably emits a SSE instruction that your processor doesn't understand.
>
> That said, at least we need a runtime testcase that fails on your target. If
> this is not possible, then please at least decompile the library and show
> the instruction that fails. We also need preprocessed source and exact
> instructions, how to build the source, so the illegal insn will be generated.
>
> Also, please read https://gcc.gnu.org/bugs/

Oh, I've read that, all right. Full verbose (very, very verbose) build logs, self-tests, compilation scripts and built gcc/libstdc++ packages are in the OBS links in the original post, more precisely:

https://build.opensuse.org/package/live_build_log/devel:gcc/gcc8/openSUSE_Factory/x86_64
https://build.opensuse.org/package/binaries/devel:gcc/gcc8/openSUSE_Factory (requires OBS registration to show download links)
https://download.opensuse.org/repositories/devel:/gcc/openSUSE_Factory/ (does not require registration, but the page with its massive package listing halts the browser and requires a lot of RAM to view)

Except for https://build.opensuse.org/package/show/home:X0F:HSF/Mesa where my Mesa-dri:r300_dri.so is built.

But asking to decompile one of the core system libraries or to make an example faulty program on the spot is like asking to decompile the kernel or write a driver (actually, probably even worse): anyone capable of doing it in a day does not require anyone else's software-related assistance.

(In reply to Jonathan Wakely from comment #5)
> Right. As you can see in GDB, the libstdc++ code says:
>
> 350 return static_cast<char_type*>(__builtin_memcpy(__s1, __s2,
> __n));
>
> Do you see any processor instructions there? Anything that isn't valid on
> your CPU? No, because it's just C++ code.

I see a bunch of over-complicated gibberish starting with libstdc++, which was used as an argument by a widely-known, reputable and respected Mesa developer to look into libstdc++. Your reclusive, insular existence and passive-aggressive "deal with issues of our _irreplaceable_ software, requiring high-level low-abstraction-layer engineer-grade knowledge and experience, yourself" attitude in relation to one of the most obscure subject matters, on the other hand, begets only distrust and frustration. As a result, I spent almost no time investigating my suspicion, which was correct in the first place, spent a lot of effort investigating his claim, and couldn't do anything with yours because of how non-productive it was.

> (In reply to Uroš Bizjak from comment #4)
> > Also, please read https://gcc.gnu.org/bugs/
>
> I already said that before the bug was even filed.

No wonder googling "gcc bugzilla registration" brings up an upvoted years-old post that advises not to bother.
[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #9 from Sergey Kondakov ---
(In reply to Alexander Monakov from comment #8)
> You should have mentioned you were using a custom-compiled Mesa, not the
> distribution package (both here and in the original report to Mesa project).
>
> For some reason the disasm in the provided log is unusable (shows assembly
> of the outermost frame), but downloading your package shows that failing
> instruction is
>
>    928ea: c5 fa 6f 05 0e 09 c3 00 vmovdqu 0xc3090e(%rip),%xmm0
> # cc3200
>
> i.e. an AVX instruction, not supported on the CPU. Given that you were using
> Clang to compile the package, this is not a GCC issue.

You actually managed to get some info from a separate package? Amazing. I should have, but half of my system is customized in some way, by me or by others via OBS's community repositories, at this point + it's a rolling release distro. And my attention was completely drawn away from Mesa.

But here's the interesting part: a guy from openSUSE just figured out that the offending code was launched by the in-Mesa "SWR", Intel's AVX-based software renderer, which, for some reason, tried to do something even though it should not load unless explicitly requested or if direct rendering has failed. And it doesn't, if Mesa is built with gcc & linked with ld, even with it enabled! One thing doesn't build with gcc, the other fails with clang… there is no peace with Mesa.

Anyway, thanks for your advice, I was getting desperate with that weird issue.
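
For readers wondering how the eight bytes quoted from comment #8 identify an AVX instruction: in 64-bit code a leading `c5` byte is the two-byte VEX prefix, which is the encoding AVX instructions use. The Python sketch below decodes just this one instruction shape (two-byte VEX, RIP-relative memory operand). The helper `decode_vex2_rip_load` and its field names are hypothetical, written for illustration; it is not a general disassembler.

```python
# Address and bytes of the faulting instruction, as quoted in comment #8.
INSN_ADDR = 0x928EA
INSN = bytes.fromhex("c5 fa 6f 05 0e 09 c3 00")

# The 2-bit VEX.pp field stands in for a legacy SIMD prefix byte.
PP_PREFIX = {0b00: None, 0b01: 0x66, 0b10: 0xF3, 0b11: 0xF2}

# prefix (2) + opcode (1) + ModRM (1) + disp32 (4)
INSN_LEN = 8


def decode_vex2_rip_load(addr, code):
    """Decode a two-byte-VEX instruction with a RIP-relative operand.

    Hypothetical helper: it handles only the instruction shape seen in
    this report and raises ValueError for anything else.
    """
    if len(code) < INSN_LEN or code[0] != 0xC5:
        raise ValueError("not a two-byte VEX instruction of this shape")
    vex, opcode, modrm = code[1], code[2], code[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if (mod, rm) != (0b00, 0b101):
        raise ValueError("memory operand is not RIP-relative")
    disp = int.from_bytes(code[4:8], "little", signed=True)
    return {
        "vector_bits": 256 if vex & 0b100 else 128,  # VEX.L
        "prefix": PP_PREFIX[vex & 0b11],             # VEX.pp
        "opcode": opcode,                            # in the 0F opcode map
        "reg": reg | (0 if vex & 0x80 else 8),       # VEX.R is stored inverted
        "disp": disp,
        # RIP-relative addresses count from the end of the instruction.
        "target": addr + INSN_LEN + disp,
    }


if __name__ == "__main__":
    for key, value in decode_vex2_rip_load(INSN_ADDR, INSN).items():
        print(key, hex(value) if isinstance(value, int) else value)
```

The decoded fields (128-bit vector length, `F3` prefix, opcode `6f`, register 0) correspond to the `VEX.128.F3.0F 6F` form of `vmovdqu` with `%xmm0` as the destination, and the computed target matches the `cc3200` in the quoted disassembly. The TK-57's flags list has no `avx`, so that CPU raises an invalid-opcode fault on this encoding, and the process receives SIGILL.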