On Thu, Feb 27, 2025 at 08:25:37PM +0100, Samuel Thibault wrote: > Hello, > > Nice investigation! > > Zhaoming Luo, le jeu. 27 févr. 2025 10:37:36 +0800, a ecrit: > > It means that pthread_hurd_cond_timedwait_np() successes but > > current->signal is still 1. I assume current->signal will only be > > modified to 1 when pthread_hurd_cond_timedwait_np() returns EINTR. Maybe > > we should set current->signal to 0 when the return from > > pthread_hurd_cond_timedwait_np() is 0? > > Actually it seems like the right way indeed, with some more fix. > > Clearing it to 0 is done by prepare_current() which is done on RPC > entry, so that the linux code knows that we are not interrupted so > far. That pthread_hurd_cond_timedwait_np call is the only place inside > pfinet where pfinet sleeps, and thus where we care for seeing EINTR, > which we then need to remember and propagate through the linux stack > down to the RPC stub. > > While the pthread_hurd_cond_timedwait_np call is sleeping, the > current_contents structure is used for other RPCs, so current->signal > may be cleared & set while managing other RPCs. That's actually why > interruptible_sleep_on_timeout takes care of saving current->isroot and > current->next_wait to restore them after taking the global lock again. > Since current->signal may written by something else, we have to put > something in it, always. > > Also, I'm thinking that if we previously got interrupted, we shouldn't > ever try to wait again until exiting from the RPC: an interrupt means we > really want to try hard to finish the RPC. So I'd say we want: > > current->next_wait = 0; > > if (current->signal) > /* We already got interrupted previously, keep interrupting the > RPC. */ > err = EINTR; > else > { > /* This is the only place where we sleep within an RPC and release the > global > lock while serving it. */ > err = pthread_hurd_cond_timedwait_np(c, &global_lock, tsp); > > if (err == EINTR) > current->signal = 1; /* We got cancelled, mark it for Linux code > to bail out. */ > else > current->signal = 0; > } > > current->isroot = isroot; > > even if the linux code is not supposed to call schedule() again if > signal_pending() != 0, it's probably safer to avoid the question. > > Could you check that it works fine? > I have applied the code and make test. I could only see two failed tests now. Test 537 is a stress test and it got stuck and outputing a lot of 'Deallocating a bogus port 6 ...' on the terminal. Test 1541 has possibility to pass. I will try to investigate. I think you can commit the code.
Zhaoming