[Bug libstdc++/87071] New: libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57

2018-08-23 Thread virtuousfox at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

            Bug ID: 87071
           Summary: libstdc++ crashes during GPU driver initialization with
                    suspected attempt to execute unsupported instruction by
                    Athlon64 X2 TK-57
           Product: gcc
           Version: 8.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: virtuousfox at gmail dot com
  Target Milestone: ---

Created attachment 44576
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44576&action=edit
Xorg.pid-1158.gdb.log

So I've updated the system on an old laptop after neglecting to do so for a
long time and was met with an inoperable X server. After I sent a report to the
Mesa developers, they insisted that it is an issue in libstdc++. The attached
file contains a full gdb dump made by an automated script and filled in with the
relevant debuginfo.

Affected libstdc++ versions are 8.1.1 from
https://build.opensuse.org/package/show/openSUSE:Factory/gcc8 and 8.2.1 from
https://build.opensuse.org/package/show/devel:gcc/gcc8

Compiler configuration:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=hsa:nvptx-none
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info
--mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64
--enable-languages=c,c++,objc,fortran,obj-c++,ada,go
--enable-offload-targets=hsa,nvptx-none=/usr/nvptx-none, --without-cuda-driver
--enable-checking=release --disable-werror
--with-gxx-include-dir=/usr/include/c++/8 --enable-ssp --disable-libssp
--disable-libvtv --disable-cet --disable-libcc1 --enable-plugin
--with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux'
--with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit
--enable-libstdcxx-allocator=new --disable-libstdcxx-pch
--enable-version-specific-runtime-libs --with-gcc-major-version-only
--enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function
--program-suffix=-8 --without-system-libunwind --enable-multilib
--with-arch-32=x86-64 --with-tune=generic --build=x86_64-suse-linux
--host=x86_64-suse-linux
Thread model: posix
gcc version 8.2.1 20180817 [gcc-8-branch revision 263612] (SUSE Linux) 

cpuinfo from affected machine:
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 104
model name  : AMD Athlon(tm) 64 X2 Dual-Core Processor TK-57
stepping: 2
cpu MHz : 800.000
cache size  : 256 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm
3dnowext 3dnow rep_good nopl cpuid extd_apicid pni cx16 lahf_lm cmp_legacy svm
extapic cr8_legacy 3dnowprefetch vmmcall lbrv
bugs: apic_c1e fxsave_leak sysret_ss_attrs null_seg swapgs_fence
amd_e400 spectre_v1 spectre_v2
bogomips: 1596.01
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

processor   : 1
…
core id : 1
cpu cores   : 2
apicid  : 1
initial apicid  : 1
…


However, the same versions run fine on a machine with a Radeon RX580
(radeonsi_dri) and this CPU:
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 21
model   : 1
model name  : AMD FX(tm)-6100 Six-Core Processor
stepping: 2
microcode   : 0x600063e
cpu MHz : 3780.977
cache size  : 2048 KB
physical id : 0
siblings: 6
core id : 0
cpu cores   : 3
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf
pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm
cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate
ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean
flushbyasid decodeassists pausefilter pfthreshold
bugs: fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2
spec_store_bypass
bogomips: 7919.04
TLB size: 1536 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb
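
Note: the two "flags" lines above differ in a way that matters for a SIGILL.
The following is only a minimal sketch of how such a line can be checked for a
single feature; the function name has_cpu_flag is hypothetical and not part of
any tool mentioned in this report. It matches whole tokens so that a query for
"avx" does not match "avx2".

```cpp
#include <sstream>
#include <string>

// Returns true if `flag` appears as a whole whitespace-separated token in the
// value of a /proc/cpuinfo "flags" line (hypothetical helper for illustration).
bool has_cpu_flag(const std::string& flags, const std::string& flag) {
    std::istringstream in(flags);
    std::string token;
    while (in >> token) {
        if (token == flag) {
            return true;
        }
    }
    return false;
}
```

Fed the flags of the TK-57 above, a query for "avx" returns false; fed the
flags of the FX-6100, it returns true.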

[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57

2018-08-23 Thread virtuousfox at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #1 from Sergey Kondakov  ---
Created attachment 44577
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44577&action=edit
Asus_F3Ke.dmesg

Verbose dmesg from affected machine.

[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57

2018-08-23 Thread virtuousfox at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #3 from Sergey Kondakov  ---
(In reply to Jonathan Wakely from comment #2)
> (EE) Illegal instruction at address 0x72f2c8ea
> 
> I don't see how this can possibly be a libstdc++ problem, since libstdc++
> doesn't produce any CPU instructions, illegal or not.
> 
> As I already said to you, there's nothing we can do with this info.

And Mesa devs said that it is. So everyone points fingers at each other in a
circle and what am I supposed to do ?

Program received signal SIGILL, Illegal instruction.
0x72f2c8ea in _GLOBAL__sub_I_lower_x86.cpp () at
/usr/bin/../lib64/gcc/x86_64-suse-linux/8/../../../../include/c++/8/bits/char_traits.h:350
350 return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));
#0  0x72f2c8ea in _GLOBAL__sub_I_lower_x86.cpp () at
/usr/bin/../lib64/gcc/x86_64-suse-linux/8/../../../../include/c++/8/bits/char_traits.h:350
InitializeLowerX86PassFlag = {_M_once = 0}
SwrJit::intrinsicMap2[abi:cxx11] = {std::map with 0 elements, std::map
with 0 elements, std::map with 0 elements}
std::__ioinit = {static _S_refcount = 13, static _S_synced_with_stdio =
true}
SwrJit::DOUBLE = 
std::piecewise_construct = 
(anonymous namespace)::ForceMCJITLinking = 
SwrJit::intrinsicMap[abi:cxx11] = std::map with 0 elements
SwrJit::LowerX86::ID = 0 '\000'

Is include/c++/8/bits/char_traits.h not part of libstdc++ ?
If your code is correct then whose isn't ?
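
Note: the frame name _GLOBAL__sub_I_lower_x86.cpp follows the pattern that GCC
and Clang use for the generated function that runs one source file's
namespace-scope initializers. Such code runs when the binary or shared object
is loaded, before anything is called explicitly, and header code inlined into
it is compiled with that file's flags. A minimal sketch, with all names in the
demo namespace being hypothetical:

```cpp
#include <map>
#include <string>

namespace demo {

// Set by the dynamic initializer below; plain bool, so it is zero-initialized
// before any dynamic initialization happens.
bool initializer_ran = false;

std::map<std::string, int> make_table() {
    initializer_ran = true;  // side effect that proves this code has run
    return {{"example", 1}};
}

// Namespace-scope object with a non-constant initializer: the compiler emits
// a per-file initialization function that calls make_table() at load time.
const std::map<std::string, int> table = make_table();

}  // namespace demo
```

Checking demo::initializer_ran as the first statement of main shows that
make_table() has already run, even though nothing called it explicitly.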

[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57

2018-08-23 Thread virtuousfox at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #7 from Sergey Kondakov  ---
Created attachment 44583
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44583&action=edit
Xorg.pid-1381.gdb.log with disas

(In reply to Alexander Monakov from comment #6)
> In your gdb script, please add
> 
>   x/i $pc
>   disas
> 
> after the backtrace command (probably 'bt full') and regenerate the log.
> Without that the log doesn't actually show the "illegal" instruction.
> 

Now, THAT's some real advice, thanks ! I've managed to get that log but…

> Also please show output of rpm -qf /usr/lib64/dri/r300_dri.so

This, of course, points to the Mesa-dri package BUT that gave me an idea not to
rule out Mesa yet. And, indeed, it was the culprit all along, despite what
its devs have said. Or, more accurately, it was clang/LLVM… probably, I haven't
properly checked yet.

I've forked distro package (or, more precisely, its auto-build OBS scripts) of
Mesa to build it with LTO and decrease its monstrous size. But it doesn't
compile with gcc that way because of a long-standing bug and broken autotools
scripts + OBS can't handle requirements of full LTO anyway. So I used clang &
gold for building Mesa (and only it) with ThinLTO and it worked (and works, on
newer PCs) fine.

I spent days on figuring that out and hours on fighting package manager to
selectively install "old", default Mesa package-set without affecting anything
else. And it worked, crash is gone. Maybe it was combination of factors but I
shouldn't have believed Mesa is irrelevant, as I was told, just because it's
far from being the first in the chain of jumps of the backtrace. But then
again, I don't have much clue about how it works.

So, this issue can be closed UNLESS you think that the backtrace shouldn't have
led to libstdc++ anyway and/or clang & Mesa couldn't have failed like that on
their own. Normally, such suicidal code should not come out of a compiler with
almost-default non-aggressive settings, wherever it may be.

(In reply to Uroš Bizjak from comment #4)
> (In reply to Sergey Kondakov from comment #3)
> 
> > If your code is correct then whose isn't ?
> Instructions are generated by the compiler. So, it is the compiler's fault,
> it probably emits a SSE instruction that your processor doesn't understand.
> 
> That said, at least we need a runtime testcase that fails on your target. If
> this is not possible, then please at least decompile the library and show
> the instruction that fails. We also need preprocessed source and exact
> instructions, how to build the source, so the illegal insn will be generated.
> 
> Also, please read https://gcc.gnu.org/bugs/

Oh, I've read that, all right.

Full verbose (very, very verbose) build logs, self-tests, compilation scripts
and built gcc/libstdc++ packages are in the OBS links in the original post,
more precisely:
https://build.opensuse.org/package/live_build_log/devel:gcc/gcc8/openSUSE_Factory/x86_64
https://build.opensuse.org/package/binaries/devel:gcc/gcc8/openSUSE_Factory
(requires OBS registration to show download links)
https://download.opensuse.org/repositories/devel:/gcc/openSUSE_Factory/ (does
not require registration but page with massive package-listing halts browser
and requires a lot of RAM to view)
Except for https://build.opensuse.org/package/show/home:X0F:HSF/Mesa where my
Mesa-dri:r300_dri.so is built.

But asking to decompile one of the core system libraries or make an example,
faulty program on a spot is like asking to decompile kernel or write a driver
(actually, probably even worse): anyone capable of doing it in a day does not
require anyone else's software-related assistance.

(In reply to Jonathan Wakely from comment #5)
> Right. As you can see in GDB, the libstdc++ code says:
> 
> 350   return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));
> 
> Do you see any processor instructions there? Anything that isn't valid on
> your CPU? No, because it's just C++ code.

I see a bunch of over-complicated gibberish starting with libstdc++ which was
used as an argument by a widely-known, reputable and respected Mesa developer to
look into libstdc++. Your reclusive, insular existence and passive-aggressive
"deal with issues of our _irreplaceable_ software, requiring high-level
low-abstraction-layer engineer-grade knowledge and experience,
yourself"-attitude in relation to one of the most obscure subject matters, on
the other hand, begets only distrust and frustration.

As the result, I spent almost no time investigating my suspicion which was
correct in the first place, spent a lot of effort to investigate his claim and
couldn't do anything with yours because of how non-productive it was.

> (In reply to Uroš Bizjak from comment #4)
> > Also, please read https://gcc.gnu.org/bugs/
> 
> I already said that before the bug was even filed.

No wonder googling "gcc bugzilla registration" brings up an upvoted years-old
post that's advising not to bother.

[Bug libstdc++/87071] libstdc++ crashes during GPU driver initialization with suspected attempt to execute unsupported instruction by Athlon64 X2 TK-57

2018-08-27 Thread virtuousfox at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87071

--- Comment #9 from Sergey Kondakov  ---
(In reply to Alexander Monakov from comment #8)
> You should have mentioned you were using a custom-compiled Mesa, not the
> distribution package (both here and in the original report to Mesa project).
> 
> For some reason the disasm in the provided log is unusable (shows assembly
> of the outermost frame), but downloading your package shows that failing
> instruction is
> 
>    928ea:   c5 fa 6f 05 0e 09 c3 00   vmovdqu 0xc3090e(%rip),%xmm0   # cc3200
> 
> i.e. an AVX instruction, not supported on the CPU. Given that you were using
> Clang to compile the package, this is not a GCC issue.
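
Note: the "# cc3200" annotation in the quoted disassembly is the absolute
address of the RIP-relative operand, i.e. the address of the next instruction
plus the signed 32-bit displacement. The sketch below reproduces that
arithmetic; the function name is hypothetical, and it assumes the displacement
is the last four bytes of the instruction (true here, since no immediate
follows it). In 64-bit mode the leading 0xc5 byte is a two-byte VEX prefix,
which is what marks this encoding as AVX.

```cpp
#include <cassert>
#include <cstdint>

// Computes the address referenced by a RIP-relative memory operand.
// Assumption: disp32 occupies the final four bytes of the instruction.
std::uint64_t rip_relative_target(std::uint64_t insn_addr,
                                  const std::uint8_t* bytes,
                                  unsigned length) {
    assert(length >= 4);
    // Assemble the little-endian 32-bit displacement.
    std::uint32_t raw = 0;
    for (unsigned i = 0; i < 4; ++i) {
        raw |= static_cast<std::uint32_t>(bytes[length - 4 + i]) << (8 * i);
    }
    // The displacement is signed and is relative to the next instruction.
    const auto disp = static_cast<std::int32_t>(raw);
    return insn_addr + length + static_cast<std::int64_t>(disp);
}
```

For the eight bytes quoted above at address 0x928ea, the displacement is
0xc3090e and the next instruction starts at 0x928f2, which gives 0xcc3200.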

You actually managed to get some info from separate package ? Amazing.

I should have but half of my system is customized in some way, by me or by
others via OBS's community repositories, at this point + it's rolling release
distro. And my attention was completely drawn from Mesa. But here's the
interesting part: a guy from openSUSE just figured out that offending code was
launched by in-Mesa "SWR", Intel's AVX-based software renderer, which, for some
reason, tried to do something even though it should not load unless explicitly
requested or if direct rendering has failed. And it doesn't, if Mesa is built
with gcc & linked with ld, even with it enabled !

One thing doesn't build with gcc, the other fails with clang… there is no peace
with Mesa. Anyway, thanks for your advice, I was getting desperate with that
weird issue.