On Thu, Feb 27, 2025 at 09:07:53AM +0800, Zhaoming Luo wrote:
> On Thu, Feb 27, 2025 at 01:34:46AM +0100, Samuel Thibault wrote:
> > Zhaoming Luo, on Thu 27 Feb 2025 08:23:16 +0800, wrote:
> > > On Tue, Feb 25, 2025 at 08:28:34PM +0100, Samuel Thibault wrote:
> > > > Zhaoming Luo, on Tue 25 Feb 2025 21:14:14 +0800, wrote:
> > > > > The program './runtests.pl -g 546' stopped at [0] several times before
> > > > > the test is really running, so I think some preparations involved
> > > > > io_select_common. However, after the test is running, I set a breakpoint
> > > > > at [1](it's like playing pingpong between two gdbs :-)). The test still
> > > > > stops at [0] several times, so I think it's quite hard to find which
> > > > > EINTR caused the failure.
> > > >
> > > > Hard doesn't mean impossible, just not trivial ;)
> > > >
> > > > Remember that you can e.g. put printfs inside pfinet, so you can see
> > > > them intermixed with the printfs from your program. In the end it's not
> > > > that inconvenient compared to gdb.
> > >
> > > I don't think it works, first I tried to add a printf in
> > > io_select_common() but I didn't see the content of printf in output.
> >
> > Did you fflush(stdout)? Did you run the translator as active translator?
> > (perhaps also try stderr instead)
> >
> Ah thanks, running the translator is needed.
> I think I found the issue \o/.

These are the places where I put printfs:

~/hurd/pfinet/glue-include/linux/sched.h:
```
  isroot = current->isroot;	/* This is our context that needs switched.  */
  next_wait = current->next_wait; /* This too, for multiple schedule calls.  */
  current->next_wait = 0;
  err = pthread_hurd_cond_timedwait_np(c, &global_lock, tsp);
+ fprintf (stderr, "In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: %d\n", err);
  if (err == EINTR)
    current->signal = 1;	/* We got cancelled, mark it for later.  */
```

~/hurd/pfinet/io-ops.c:
```
  /* Block until we time out, are woken or cancelled.  */
  timedout = interruptible_sleep_on_timeout (user->sock->sk->sleep, tsp);
+ fprintf (stderr, "In io_select_common 292, signal_pending (current): %d\n", signal_pending (current));
  if (timedout)
    {
      pthread_mutex_unlock (&global_lock);
```

There is one more printf that is not important, so please ignore it.

Here's the log:
```
...
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 1073741828
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
In io_select_common 273, signal_pending (current): 0
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 1073741828
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 0
In io_select_common 292, signal_pending (current): 0
In io_select_common 273, signal_pending (current): 0
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 1073741828
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 0
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
lib533.c:107In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 0
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
In io_select_common 273, signal_pending (current): 0
 select() failed, with thiserrno 1073741828 (Interrupted system call)
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 0
...
```

The error occurred at:
```
In interruptible_sleep_on_timeout 117, err from pthread_hurd_cond_timedwait_np: 0
In io_select_common 292, signal_pending (current): 1
RETURN EINTR!!!
```

This means that pthread_hurd_cond_timedwait_np() succeeded (returned 0) but
current->signal is still 1, so io_select_common() returns EINTR anyway. I
assume current->signal is only set to 1 when
pthread_hurd_cond_timedwait_np() returns EINTR, so the 1 seen here must be
left over from an earlier interrupted wait.

Maybe we should set current->signal to 0 when
pthread_hurd_cond_timedwait_np() returns 0? I'm not sure it makes sense,
but I have tried it and the test passes with this hack.

```
  if (err == EINTR)
    current->signal = 1;	/* We got cancelled, mark it for later.  */
  if (err == 0)
    current->signal = 0;
  current->isroot = isroot;	/* Switch back to our context.  */
  current->next_wait = next_wait;
```

Best,
Zhaoming