On Mon, 28 Jul 2025 10:40:03 +0200
Florian Weimer <fwei...@redhat.com> wrote:

> * Dan Horák:
> 
> > On Sun, 27 Jul 2025 21:34:12 +0200
> > Sandro Mani <manisan...@gmail.com> wrote:
> >
> >> Hi
> >> 
> >> scotch is currently FTBFS on ppc64le (affects the current 7.0.7, the 
> >> previous 7.0.6, as well as the new 7.0.8 release), failing with [1]
> >> 
> >> gmake[2]: *** [src/libscotch/CMakeFiles/ptscotchf_h.dir/build.make:77: src/include/ptscotchf.h] Illegal instruction (core dumped)
> >> 
> >> 7.0.6 previously successfully built with gcc-0:15.0.1-0.3.fc42.1.ppc64le 
> >> and now fails with gcc-0:15.1.1-5.fc43.1.ppc64le, so this looks like a gcc 
> >> regression.
> >> 
> >> Since this is on ppc64le and I have no access to such a machine, how can I
> >> debug this?
> >
> > You have access to a system via
> > https://fedoraproject.org/wiki/Test_Machine_Resources_For_Package_Maintainers
> > and the failure can be reproduced there. It needs a Power10 system (same
> > as the current koji builders), not a Power9 one (same as the
> > pre-DC-migration koji builders, where it builds OK).
> >
> > ppc64le-redhat-linux-gnu-openmpi/src/libscotch/ptdummysizes is the
> > crashing binary ...
> >
> > and running it under gdb gives
> >
> > ...
> > Program received signal SIGILL, Illegal instruction.
> > 0x00007ffff774e404 in sbrk () from /lib64/glibc-hwcaps/power10/libc.so.6
> > (gdb) where
> > #0  0x00007ffff774e404 in sbrk () from /lib64/glibc-hwcaps/power10/libc.so.6
> > #1  0x00007ffff787b38c in ucm_fire_mmap_events_internal () from /lib64/libucm.so.0
> > #2  0x00007ffff787bd88 in ucm_mmap_test_events_nolock () from /lib64/libucm.so.0
> > #3  0x00007ffff78818b8 in ucm_mmap_install () from /lib64/libucm.so.0
> > #4  0x00007ffff7881b30 in ucm_mmap_init () from /lib64/libucm.so.0
> > #5  0x00007ffff7881c2c in ucm_library_init () from /lib64/libucm.so.0
> > #6  0x00007ffff7881cbc in ucm_set_global_opts () from /lib64/libucm.so.0
> > #7  0x00007ffff725745c in ucs_init_ucm_opts () from /lib64/libucs.so.0
> > #8  0x00007ffff7243fb0 in ucs_init () from /lib64/libucs.so.0
> > #9  0x00007ffff7f989bc in call_init (l=<optimized out>, argc=1, argv=0x7fffffffece8, env=0x7fffffffecf8) at dl-init.c:74
> > #10 _dl_init (main_map=0x7ffff7ff12f0, argc=1, argv=0x7fffffffece8, env=0x7fffffffecf8) at dl-init.c:121
> > #11 0x00007ffff7fc3eb8 in _dl_start_user () from /lib64/ld64.so.2
> 
> The location of the crash:
> 
> Dump of assembler code for function __GI___sbrk:
>    0x00007ffff774e400 <+0>:     d1 ff 21 f8     stdu    r1,-48(r1)
> => 0x00007ffff774e404 <+4>:     0e 00 10 06     .long 0x610000e
>    0x00007ffff774e408 <+8>:     00 00 60 3d     lis     r11,0
>    0x00007ffff774e40c <+12>:    ff 7f 6b 61     ori     r11,r11,32767
>    0x00007ffff774e410 <+16>:    c7 07 6b 79     sldi.   r11,r11,32
>    0x00007ffff774e414 <+20>:    87 f7 6b 65     oris    r11,r11,63367
>    0x00007ffff774e418 <+24>:    b8 9e 6b 61     ori     r11,r11,40632
>    0x00007ffff774e41c <+28>:    a6 03 69 7d     mtctr   r11
>    0x00007ffff774e420 <+32>:    20 04 80 4e     bctr
>    0x00007ffff774e424 <+36>:    40 00 01 f8     std     r0,64(r1)
>    0x00007ffff774e428 <+40>:    99 61 ff 4b     bl      0x7ffff77445c0 <__brk>
> 
> This was patched by the ucx library.
> 
> The original looks like this:
> 
> 000000000014e400 <__sbrk>:
>   14e400:       d1 ff 21 f8     stdu    r1,-48(r1)
>   14e404:       0e 00 10 06     plbz    r9,961781       # 2390f9 <__libc_initial>
>   14e408:       f5 ac 20 89 
>   14e40c:       78 1b 62 7c     mr      r2,r3
>   14e410:       00 00 09 2c     cmpwi   r9,0
>   14e414:       4c 00 82 40     bne     14e460 <__sbrk+0x60>
>   14e418:       00 00 23 2c     cmpdi   r3,0
>   14e41c:       b0 00 82 40     bne     14e4cc <__sbrk+0xcc>
>   14e420:       a6 02 08 7c     mflr    r0
>   14e424:       40 00 01 f8     std     r0,64(r1)
>   14e428:       99 61 ff 4b     bl      1445c0 <brk>
> 
> So there was a 64-bit instruction bundle at the patched offset, and that
> may have been the reason why ucx failed to patch properly.

I agree.
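
To make the failure mode concrete, here is a minimal Python sketch. It is
not ucx's actual patching logic, and the helper names are made up for
illustration. It assumes the Power ISA 3.1 rule that a 32-bit word with
primary opcode 1 is the prefix of an 8-byte prefixed instruction, and it
assumes a plain linear sweep from the function start is valid (no data
embedded in the code). The bytes are the first five words of the original
__sbrk from the objdump output above. In the patched dump the replacement
sequence begins at +8, which this sketch reports as the suffix word of the
prefixed plbz rather than an instruction boundary.

```python
import struct

# Assumption: Power ISA 3.1 prefixed instructions start with a word whose
# primary opcode (top 6 bits) is 1; the following word is the suffix.
PREFIX_PRIMARY_OPCODE = 1


def is_prefix_word(word: int) -> bool:
    """Return True if this 32-bit word is the prefix of an 8-byte instruction."""
    return (word >> 26) & 0x3F == PREFIX_PRIMARY_OPCODE


def instruction_starts(code: bytes) -> set:
    """Linear sweep over little-endian code; return offsets where instructions begin."""
    starts = set()
    off = 0
    while off + 4 <= len(code):
        starts.add(off)
        (word,) = struct.unpack_from("<I", code, off)
        # A prefix word consumes its suffix word as well.
        off += 8 if is_prefix_word(word) else 4
    return starts


# First five words of the unpatched __sbrk, copied from the dump above.
ORIGINAL_SBRK = bytes.fromhex(
    "d1ff21f8"  # +0   stdu  r1,-48(r1)
    "0e001006"  # +4   plbz prefix word
    "f5ac2089"  # +8   plbz suffix word
    "781b627c"  # +12  mr    r2,r3
    "0000092c"  # +16  cmpwi r9,0
)

starts = instruction_starts(ORIGINAL_SBRK)
for offset in (4, 8, 12):
    kind = "instruction boundary" if offset in starts else "middle of a prefixed instruction"
    print(f"+{offset}: {kind}")
```

A patcher that overwrites code starting at +8 leaves the old prefix at +4
paired with a new, unrelated suffix, which matches the SIGILL at +4 in the
gdb session.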
 
> I would very much prefer if there weren't any libraries like ucx in
> Fedora that patch glibc merely because you link against them.  It's fine
> to do this for debugging tools, but as part of regular execution, it
> risks too much breakage.

Thanks for the insight, Florian.

IMO this issue is also causing the openmpi build failure in the
mass rebuild:
https://koji.fedoraproject.org/koji/taskinfo?taskID=135241418


                Dan