On Mon, 28 Jul 2025 10:40:03 +0200 Florian Weimer <fwei...@redhat.com> wrote:
> * Dan Horák:
>
> > On Sun, 27 Jul 2025 21:34:12 +0200
> > Sandro Mani <manisan...@gmail.com> wrote:
> >
> >> Hi
> >>
> >> scotch is currently FTBFS on ppc64le (affects the current 7.0.7, the
> >> previous 7.0.6, as well as the new 7.0.8 release), failing with [1]
> >>
> >> gmake[2]: *** [src/libscotch/CMakeFiles/ptscotchf_h.dir/build.make:77:
> >> src/include/ptscotchf.h] Illegal instruction (core dumped)
> >>
> >> 7.0.6 previously successfully built with gcc-0:15.0.1-0.3.fc42.1.ppc64le
> >> and now fails with gcc-0:15.1.1-5.fc43.1.ppc64le, so this looks like a gcc
> >> regression.
> >>
> >> Being this on ppc64le and having no access to such a machine, how can I
> >> debug this?
> >
> > you have access to a system from
> > https://fedoraproject.org/wiki/Test_Machine_Resources_For_Package_Maintainers
> > and it can be reproduced there. But it needs the Power10 system (same
> > as current koji builders), not the Power9 (pre-DC-migration koji
> > builders, it builds there OK).
> >
> > ppc64le-redhat-linux-gnu-openmpi/src/libscotch/ptdummysizes is the
> > crashing binary ...
> >
> > and running it under gdb gives
> >
> > ...
> > Program received signal SIGILL, Illegal instruction.
> > 0x00007ffff774e404 in sbrk () from /lib64/glibc-hwcaps/power10/libc.so.6
> > (gdb) where
> > #0  0x00007ffff774e404 in sbrk () from /lib64/glibc-hwcaps/power10/libc.so.6
> > #1  0x00007ffff787b38c in ucm_fire_mmap_events_internal () from /lib64/libucm.so.0
> > #2  0x00007ffff787bd88 in ucm_mmap_test_events_nolock () from /lib64/libucm.so.0
> > #3  0x00007ffff78818b8 in ucm_mmap_install () from /lib64/libucm.so.0
> > #4  0x00007ffff7881b30 in ucm_mmap_init () from /lib64/libucm.so.0
> > #5  0x00007ffff7881c2c in ucm_library_init () from /lib64/libucm.so.0
> > #6  0x00007ffff7881cbc in ucm_set_global_opts () from /lib64/libucm.so.0
> > #7  0x00007ffff725745c in ucs_init_ucm_opts () from /lib64/libucs.so.0
> > #8  0x00007ffff7243fb0 in ucs_init () from /lib64/libucs.so.0
> > #9  0x00007ffff7f989bc in call_init (l=<optimized out>, argc=1, argv=0x7fffffffece8, env=0x7fffffffecf8) at dl-init.c:74
> > #10 _dl_init (main_map=0x7ffff7ff12f0, argc=1, argv=0x7fffffffece8, env=0x7fffffffecf8) at dl-init.c:121
> > #11 0x00007ffff7fc3eb8 in _dl_start_user () from /lib64/ld64.so.2
>
> The location of the crash:
>
> Dump of assembler code for function __GI___sbrk:
>    0x00007ffff774e400 <+0>:   d1 ff 21 f8   stdu    r1,-48(r1)
> => 0x00007ffff774e404 <+4>:   0e 00 10 06   .long 0x610000e
>    0x00007ffff774e408 <+8>:   00 00 60 3d   lis     r11,0
>    0x00007ffff774e40c <+12>:  ff 7f 6b 61   ori     r11,r11,32767
>    0x00007ffff774e410 <+16>:  c7 07 6b 79   sldi.   r11,r11,32
>    0x00007ffff774e414 <+20>:  87 f7 6b 65   oris    r11,r11,63367
>    0x00007ffff774e418 <+24>:  b8 9e 6b 61   ori     r11,r11,40632
>    0x00007ffff774e41c <+28>:  a6 03 69 7d   mtctr   r11
>    0x00007ffff774e420 <+32>:  20 04 80 4e   bctr
>    0x00007ffff774e424 <+36>:  40 00 01 f8   std     r0,64(r1)
>    0x00007ffff774e428 <+40>:  99 61 ff 4b   bl      0x7ffff77445c0 <__brk>
>
> This was patched by the ucx library.
>
> The original looks like this:
>
> 000000000014e400 <__sbrk>:
>   14e400:  d1 ff 21 f8   stdu    r1,-48(r1)
>   14e404:  0e 00 10 06   plbz    r9,961781   # 2390f9 <__libc_initial>
>   14e408:  f5 ac 20 89
>   14e40c:  78 1b 62 7c   mr      r2,r3
>   14e410:  00 00 09 2c   cmpwi   r9,0
>   14e414:  4c 00 82 40   bne     14e460 <__sbrk+0x60>
>   14e418:  00 00 23 2c   cmpdi   r3,0
>   14e41c:  b0 00 82 40   bne     14e4cc <__sbrk+0xcc>
>   14e420:  a6 02 08 7c   mflr    r0
>   14e424:  40 00 01 f8   std     r0,64(r1)
>   14e428:  99 61 ff 4b   bl      1445c0 <brk>
>
> So there was a 64-bit instruction bundle at the patched offset, and that
> may have been the reason why ucx failed to patch properly.

I agree

> I would very much prefer if there weren't any libraries like ucx in
> Fedora that patch glibc merely because you link against them. It's fine
> to do this for debugging tools, but as part of regular execution, it
> risks too much breakage.

thanks, Florian, for the insight

IMO this issue is also causing the openmpi build failure in the
mass-rebuild - https://koji.fedoraproject.org/koji/taskinfo?taskID=135241418


		Dan
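
For anyone who wants to check the "64-bit instruction bundle" observation mechanically, here is a rough Python sketch. It takes the instruction bytes from the two dumps quoted above, finds which 32-bit words differ, and reports any prefixed instruction that the changed range covers only in part. It assumes that a word whose primary opcode (top 6 bits) is 1 is the first half of an 8-byte prefixed instruction, as in Power ISA v3.1. The function names are made up for this illustration and have nothing to do with how ucx itself decides where to patch.

```python
import struct


def words(hex_bytes: str) -> list[int]:
    """Decode space-separated little-endian bytes into 32-bit words."""
    raw = bytes.fromhex(hex_bytes)
    return [w for (w,) in struct.iter_unpack("<I", raw)]


def is_prefix(word: int) -> bool:
    """True if the word has primary opcode 1 (assumed prefix marker)."""
    return (word >> 26) == 1


def changed_offsets(before: list[int], after: list[int]) -> list[int]:
    """Byte offsets of the words that differ between the two dumps."""
    return [i * 4 for i, (a, b) in enumerate(zip(before, after)) if a != b]


def split_prefixed(code: list[int], start: int, end: int) -> list[int]:
    """Offsets of prefixed instructions only half covered by [start, end)."""
    hits = []
    off = 0
    while off < len(code) * 4:
        size = 8 if is_prefix(code[off // 4]) else 4
        first_in = start <= off < end
        last_in = start <= off + size - 4 < end
        if size == 8 and first_in != last_in:
            hits.append(off)
        off += size
    return hits


# Bytes at sbrk+0 .. sbrk+40, copied from the objdump output (unpatched).
ORIGINAL = words(
    "d1 ff 21 f8  0e 00 10 06  f5 ac 20 89  78 1b 62 7c "
    "00 00 09 2c  4c 00 82 40  00 00 23 2c  b0 00 82 40 "
    "a6 02 08 7c  40 00 01 f8  99 61 ff 4b"
)

# Same range, copied from the gdb dump of the crashing process (patched).
PATCHED = words(
    "d1 ff 21 f8  0e 00 10 06  00 00 60 3d  ff 7f 6b 61 "
    "c7 07 6b 79  87 f7 6b 65  b8 9e 6b 61  a6 03 69 7d "
    "20 04 80 4e  40 00 01 f8  99 61 ff 4b"
)

if __name__ == "__main__":
    diff = changed_offsets(ORIGINAL, PATCHED)
    start, end = diff[0], diff[-1] + 4
    print("changed words at offsets:", diff)
    print("split prefixed instructions at:", split_prefixed(ORIGINAL, start, end))
```

With the bytes above, the changed words run from +8 through +32, and the plbz at +4 is reported as split: its prefix word is left in place while its suffix word is overwritten, which matches the SIGILL at sbrk+4.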