Samuel Thibault writes:

Hello,

> jann...@gnu.org, on Sat. 16 Nov. 2024 10:05:40 +0100, wrote:
>> So, I took another approach.  I minimized the tar archive, keeping only
>> gnulib and a simple main: bug.c that shows the same hanging behaviour
>> when called with one command line argument.  See attached.
>
> How do you compile it? I failed to get a hang after switching to #if 0

Heh, I tried on Debian/Hurd and it doesn't hang there.  So the problem
is in the Guix-built glibc.

> The specification here is only for warnings about formatting, so I
> don't see why it would entail a crash. Again, actual asm would be
> enlightening.

The attached gdb log clearly shows the problem:

--8<---------------cut here---------------start------------->8---
=> 0x000000000042efe8 <__error_internal+40>:    e8 13 10 bd ff          call   0x0
(gdb) si
0x0000000000000000 in ?? ()
=> 0x0000000000000000:  
Cannot access memory at address 0x0
--8<---------------cut here---------------end--------------->8---

which should correspond to the "call f8" in the disassembled error.o.d:

--8<---------------cut here---------------start------------->8---
00000000000000c6 <__error_internal>:
  c6:   55                      push   %rbp
  ..
  ed:   31 ff                   xor    %edi,%edi
  ef:   48 8d 75 cc             lea    -0x34(%rbp),%rsi
  f3:   e8 00 00 00 00          call   f8 <__error_internal+0x32>
--8<---------------cut here---------------end--------------->8---

and is compiled from:

--8<---------------cut here---------------start------------->8---
void
__error_internal (int status, int errnum, const char *message,
                  va_list args, unsigned int mode_flags)
{
#if defined _LIBC
  /* We do not want this call to be cut short by a thread
     cancellation.  Therefore disable cancellation for now.  */
  int state = PTHREAD_CANCEL_ENABLE;
  __pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &state);
#endif
--8<---------------cut here---------------end--------------->8---

The Debian-compiled error.o, however, seems to do a check first

--8<---------------cut here---------------start------------->8---
 13e:   48 83 3d 00 00 00 00    cmpq   $0x0,0x0(%rip)        # 146 <__error_internal+0x36>
 145:   00 
 146:   c7 45 c4 01 00 00 00    movl   $0x1,-0x3c(%rbp)
 14d:   74 0b                   je     15a <__error_internal+0x4a>
 14f:   48 8d 75 c4             lea    -0x3c(%rbp),%rsi
 153:   31 ff                   xor    %edi,%edi
 155:   e8 00 00 00 00          call   15a <__error_internal+0x4a>
--8<---------------cut here---------------end--------------->8---

and, when the compared value is zero, jumps past the call, so it never
does the call 0x0.  Hmm.

Okay, so Guix hasn't been using

<https://salsa.debian.org/glibc-team/glibc/-/blob/25a0a47767fe7dc5151eb36afaade17218728efe/debian/patches/hurd-i386/local-static_pthread_setcancelstate.diff>

which didn't seem to be a problem before, with 32bit.  Adding this
patch and using the resulting "error.o" fixes it.  Oh my.

Meanwhile, I found another hang in bash when it calls WAITPID.  Linking
bash with the three patched objects error.o, fmtmsg.o, and iopopen.o
makes no difference (as could be expected).  Of course, I should first
rebuild world with this patch and look again...but now I wonder if
there's another patch that could fix the waitpid hang.

I looked on salsa before and saw these

    git-AT_NO_AUTOMOUNT.diff
    git-context.diff
    git-fault-64bit.diff
    git-intr-msg-clobber.diff
    git-proc_getchildren_rusage.diff
    git-pthread_self.2.diff
    git-pthread_self.diff
    git-pthread_symbols.diff
    git-xattr.diff

but none of them were enabled/listed in the "series" files.  Doesn't
really look waitpid'y.

>> Anyway when using the POSIX variant:
>> 
>> --8<---------------cut here---------------start------------->8---
>> #if 1
>> #define GNULIB_VFPRINTF_POSIX 1 // fixes the static hang
>> #else
>> #define GNULIB_VFPRINTF_POSIX 0 // the gnulib setting: hangs
>
> One thing, however, is that your bug.c is bogus, it ends with a %
> without anything behind. Do you compile with warnings? It would warn
> about it.

Ah right, oops.  Added an "s", but it doesn't seem to matter in this case.

> Ok so it'd be the behavior of glibc that poses problem. Again, asm would
> tell us exactly what kind of operation gets wrong.

Indeed.  I extracted 'error.o' from Debian's libcrt.a; when using that
instead of Guix's error.o (that I cross-built), the bug is gone.

Greetings,
Janneke

Attachment: gdb-bug.log
Description: Binary data

Attachment: error.o.d
Description: Binary data

-- 
Janneke Nieuwenhuizen <jann...@gnu.org>  | GNU LilyPond https://LilyPond.org
Freelance IT https://www.JoyOfSource.com | Avatar® https://AvatarAcademy.com
