On Thu, Feb 27, 2025 at 09:07:53AM +0800, Zhaoming Luo wrote:
> On Thu, Feb 27, 2025 at 01:34:46AM +0100, Samuel Thibault wrote:
> > Zhaoming Luo, on Thu, 27 Feb 2025 08:23:16 +0800, wrote:
> > > On Tue, Feb 25, 2025 at 08:28:34PM +0100, Samuel Thibault wrote:
> > > > Zhaoming Luo, on Tue, 25 Feb 2025 21:14:14 +0800, wrote:
> > > > > The program './runtests.pl -g 546' stopped at [0] several times before
> > > > > the test is really running, so I think some preparations involved
> > > > > io_select_common. However, after the test is running, I set a breakpoint
> > > > > at [1] (it's like playing pingpong between two gdbs :-)). The test still
> > > > > stops at [0] several times, so I think it's quite hard to find which
> > > > > EINTR caused the failure.
> > > > 
> > > > Hard doesn't mean impossible, just not trivial ;)
> > > > 
> > > > Remember that you can e.g. put printfs inside pfinet, so you can see
> > > > them intermixed with the printfs from your program. In the end it's not
> > > > that inconvenient compared to gdb.
> > > 
> > > I don't think it works. First I tried to add a printf in
> > > io_select_common(), but I didn't see its output.
> > 
> > Did you fflush(stdout)? Did you run the translator as active translator?
> > (perhaps also try stderr instead)
> > 
> Ah thanks, running the translator as an active translator was needed.
> 
I think I found the issue \o/.

These are the places where I put printfs:

~/hurd/pfinet/glue-include/linux/sched.h:
```
isroot = current->isroot;     /* This is our context that needs switched.  */
next_wait = current->next_wait; /* This too, for multiple schedule calls.  */
current->next_wait = 0;
err = pthread_hurd_cond_timedwait_np(c, &global_lock, tsp);
+ fprintf (stderr, "In interruptible_sleep_on_timeout 117, err from "
+          "pthread_hurd_cond_timedwait_np: %d\n", err);
if (err == EINTR)
  current->signal = 1;        /* We got cancelled, mark it for later.  */
```

~/hurd/pfinet/io-ops.c:
```
/* Block until we time out, are woken or cancelled.  */
timedout = interruptible_sleep_on_timeout (user->sock->sk->sleep,
                                           tsp);
+ fprintf (stderr, "In io_select_common 292, signal_pending (current): %d\n",
+          signal_pending (current));
if (timedout)
  {
    pthread_mutex_unlock (&global_lock);
```

There is one more printf that is not important, so please ignore it.

Here's the log:
```
...
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 1073741828
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
In io_select_common 273, signal_pending (current): 0
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 1073741828
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 0
In io_select_common 292, signal_pending (current): 0
In io_select_common 273, signal_pending (current): 0
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 1073741828
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 0
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
lib533.c:107In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 0
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
In io_select_common 273, signal_pending (current): 0
 select() failed, with thiserrno 1073741828 (Interrupted system call)
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 0
...
```
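
For anyone reading the log: 1073741828 is EINTR on the Hurd. As far as I
understand glibc's Hurd errno headers, error codes are built as
(0x10 << 26) | n, and EINTR has n = 4. Below is a minimal sketch of that
arithmetic. The macro and function names are made up for illustration and
the encoding is my assumption about the headers, not copied from them.

```c
/* Assumed Hurd errno encoding: system bits 0x10 << 26, code in the low
   14 bits.  HURD_ERRNO_SKETCH is a hypothetical local name, not the real
   glibc macro.  */
#define HURD_ERRNO_SKETCH(n) ((0x10 << 26) | ((n) & 0x3fff))

/* Extract the small error code from an encoded errno value.  */
static int
hurd_errno_code (int value)
{
  return value & 0x3fff;
}

/* Encoded value for EINTR, assuming its code is 4.  */
static int
hurd_eintr_value (void)
{
  return HURD_ERRNO_SKETCH (4);
}
```

So every "err ... 1073741828" line in the log means the wait was
interrupted, and "err ... 0" means it returned normally.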

The error occurred when:
```
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 0
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
```

This means that pthread_hurd_cond_timedwait_np() succeeded but
current->signal was still 1. I assume current->signal is only set to 1
when pthread_hurd_cond_timedwait_np() returns EINTR. Maybe we should set
current->signal to 0 when pthread_hurd_cond_timedwait_np() returns 0? I'm
not sure whether that makes sense, but I have tried it, and the test
passes with this hack:

```
if (err == EINTR)
  current->signal = 1;        /* We got cancelled, mark it for later.  */
if (err == 0)
  current->signal = 0;
current->isroot = isroot;     /* Switch back to our context.  */
current->next_wait = next_wait;
```
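
To make the reasoning easier to check, here is a small standalone model of
the stale flag. It is only a sketch: struct task_model, after_wait() and
count_spurious_eintr() are hypothetical names, not pfinet code, and it
assumes nothing else clears the flag between two waits (real pfinet may
reset it elsewhere). It replays a sequence of wait results and counts how
often the flag is still set after a wait that returned 0.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of `current' that matters here.  */
struct task_model
{
  int signal;
};

/* Models the code right after the timed wait.  `err' plays the role of
   the return value of pthread_hurd_cond_timedwait_np().  */
static void
after_wait (struct task_model *t, int err, bool reset_on_success)
{
  if (err == EINTR)
    t->signal = 1;              /* Cancelled, mark it for later.  */
  if (reset_on_success && err == 0)
    t->signal = 0;              /* The hack: clear the stale flag.  */
}

/* Replays the wait results and counts the waits that returned 0 while
   the flag was still set, i.e. the cases where the caller would report
   EINTR although this wait was not interrupted.  */
static int
count_spurious_eintr (const int *errs, size_t n, bool reset_on_success)
{
  struct task_model t = { 0 };
  int spurious = 0;

  for (size_t i = 0; i < n; i++)
    {
      after_wait (&t, errs[i], reset_on_success);
      if (errs[i] == 0 && t.signal)
        spurious++;
    }
  return spurious;
}
```

With the sequence { EINTR, 0 }, the model counts one spurious EINTR
without the reset and none with it, which matches what I see in the log.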

Best,
Zhaoming

