On Sat, 25 Jul 2020 at 09:11, Jeff Law <l...@redhat.com> wrote:

> On Fri, 2020-07-24 at 22:29 +0100, Richard W.M. Jones wrote:
> > Just upgraded a development machine to:
> >
> >   binutils-2.34.0-10.fc33.x86_64
> >   gcc-10.1.1-2.fc33.x86_64
> >   glibc-2.31.9000-21.fc33.x86_64
> >
> > and a very simple C compile (non-LTO) is now segfaulting:
> >
> > make[3]: Entering directory '/home/rjones/d/nbdkit/common/protocol'
> > /bin/sh ../../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c -o libprotocol_la-protostrings.lo `test -f 'protostrings.c' || echo './'`protostrings.c
> > libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c -fPIC -DPIC -o .libs/libprotocol_la-protostrings.o
> > libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -MT libprotocol_la-protostrings.lo -MD -MP -MF .deps/libprotocol_la-protostrings.Tpo -c protostrings.c -o libprotocol_la-protostrings.o >/dev/null 2>&1
> > mv -f .deps/libprotocol_la-protostrings.Tpo .deps/libprotocol_la-protostrings.Plo
> > /bin/sh ../../libtool --tag=CC --mode=link gcc -Wall -Wshadow -Wvla -Werror -O0 -g -Wp,-U_FORTIFY_SOURCE -O0 -g -Wp,-U_FORTIFY_SOURCE -o libprotocol.la libprotocol_la-protostrings.lo
> > libtool: link: ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o
> > ../../libtool: line 1734: 2572327 Segmentation fault (core dumped) ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o
> >
> > Core was generated by `ar cru .libs/libprotocol.a .libs/libprotocol_la-protostrings.o'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x0000000000000000 in ?? ()
> > binutils-2.34.0-10.fc33.x86_64
> > (gdb) bt
> > Missing separate debuginfos, use: dnf debuginfo-install
> > #0  0x0000000000000000 in ?? ()
> > #1  0x00007f15bd3e03d0 in make_relative_prefix_1.part ()
> >    from /lib64/libbfd-2.34.0.20200522.so
> > #2  0x00007f15bd3d22db in bfd_plugin_object_p.lto_priv ()
> >    from /lib64/libbfd-2.34.0.20200522.so
> > #3  0x00007f15bd3401ce in bfd_check_format_matches ()
> >    from /lib64/libbfd-2.34.0.20200522.so
> > #4  0x00007f15bd340e7a in _bfd_write_archive_contents ()
> >    from /lib64/libbfd-2.34.0.20200522.so
> > #5  0x00007f15bd348b2a in bfd_close () from /lib64/libbfd-2.34.0.20200522.so
> > #6  0x0000559ee83994b6 in write_archive ()
> > #7  0x0000559ee8396ac3 in main ()
> >
> > I can't find any BZ for this. Any ideas what it could be?
>
> After banging my head on the wall for a few hours, I think I see what's
> happening here.
>
> So at a high level ar makes a call to lrealpath. That naturally goes
> through the PLT. The PLT stub loads the value out of the GOT and jumps
> to it. The problem is the entry in the GOT is *zero* when it should be
> pointing to the resolver.
>
> Now lrealpath is provided by libiberty and a copy is in libbfd.so, and
> the GOT entry in libbfd.so looked right. But by the time the program
> has hit main, the GOT entry has been reset to zero. Naturally that's
> happening inside the dynamic linker (as expected, confirmed with a
> watchpoint). If you've ever had to debug ld.so, you'll know it's an
> insanely painful experience, but it proved fruitful.
>
> The key was finding out that we were not using the libbfd.so linker map
> to resolve lrealpath; instead we were using the linker map for the main
> program (ar in this case). So naturally it's time to look a bit more
> closely at the symbol table for ar.
>
> The main symbol table for ar doesn't mention lrealpath. But that's just
> a confusing byproduct of having two symbol tables. What matters to
> ld.so is the *dynamic* symbol table. And ar has lrealpath in its
> dynamic symbol table. And here's the kicker, it's an absolute symbol
> with the value 0:
>
>   0000000000000000 A lrealpath
>
> A symbol in the main program takes precedence over a symbol in a DSO.
> So the dynamic linker was actually doing the right thing given the
> input it was provided.
>
> Now why (*&@#$ does ar have lrealpath as an absolute symbol? It's got
> to be related to the fact that when we link ar we pull in another copy
> of libiberty. In fact, ar links against libiberty twice. Once via
> -liberty, then again against libiberty.a (and kind of a 3rd time
> indirectly via libbfd). But even so, that shouldn't be creating an
> absolute symbol. That's just weird.
>
> This smells like a linker bug to me. Not surprisingly, if I force the
> system to use ld.gold, then I don't see the bogus absolute symbol and
> the resultant ar works just fine.
>
> It's late and I'll dig further over the weekend, but right now this
> looks like a linker bug to me. I may turn off LTO globally or in the
> various instances of binutils -- I need to sleep on that.
>
> Jeff
> _______________________________________________
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
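
The telltale sign in Jeff's analysis is a dynamic symbol of type `A` (absolute) with the value 0. For anyone wanting to check whether another binary shows the same thing, here is a minimal sketch. It assumes the default three-column output that `nm -D` prints (value, type, name). The helper name `zero_absolute_symbols` is made up for illustration and is not part of binutils, and every sample line except the `lrealpath` one is hypothetical.

```python
def zero_absolute_symbols(nm_output):
    """Return names of absolute ('A') symbols whose value is zero.

    Expects lines shaped like '0000000000000000 A lrealpath'.
    """
    hits = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined symbols have only two columns (type, name); skip them.
        if len(fields) != 3:
            continue
        value, sym_type, name = fields
        try:
            is_zero = int(value, 16) == 0
        except ValueError:
            # Not a hexadecimal value column, so not a symbol line we know.
            continue
        if sym_type == "A" and is_zero:
            hits.append(name)
    return hits


# The first line is the one from the analysis above; the others are
# hypothetical filler to show what gets ignored.
sample = """\
0000000000000000 A lrealpath
                 U malloc
0000000000004010 T main
"""
print(zero_absolute_symbols(sample))  # prints ['lrealpath']
```

To try it on a real binary, pass it the text produced by `nm -D` on that file, for example captured with `subprocess.run(["nm", "-D", path], capture_output=True, text=True).stdout`, where `path` is whatever binary you suspect. An empty list does not prove the binary is fine; it only means this particular symptom is absent.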
Super big thanks for investigating this, Jeff! It suddenly tripped my rawhide build and I started panicking, because it's my first official package :D.

~Andy