On Thu, Feb 27, 2025 at 08:25:37PM +0100, Samuel Thibault wrote:
> Hello,
> 
> Nice investigation!
> 
> Zhaoming Luo, le jeu. 27 févr. 2025 10:37:36 +0800, a ecrit:
> > It means that pthread_hurd_cond_timedwait_np() successes but
> > current->signal is still 1. I assume current->signal will only be
> > modified to 1 when pthread_hurd_cond_timedwait_np() returns EINTR. Maybe
> > we should set current->signal to 0 when the return from
> > pthread_hurd_cond_timedwait_np() is 0?
> 
> Actually it seems like the right way indeed, with some more fix.
> 
> Clearing it to 0 is done by prepare_current() which is done on RPC
> entry, so that the linux code knows that we are not interrupted so
> far. That pthread_hurd_cond_timedwait_np call is the only place inside
> pfinet where pfinet sleeps, and thus where we care for seeing EINTR,
> which we then need to remember and propagate through the linux stack
> down to the RPC stub.
> 
> While the pthread_hurd_cond_timedwait_np call is sleeping, the
> current_contents structure is used for other RPCs, so current->signal
> may be cleared & set while managing other RPCs. That's actually why
> interruptible_sleep_on_timeout takes care of saving current->isroot and
> current->next_wait to restore them after taking the global lock again.
> Since current->signal may written by something else, we have to put
> something in it, always.
> 
> Also, I'm thinking that if we previously got interrupted, we shouldn't
> ever try to wait again until exiting from the RPC: an interrupt means we
> really want to try hard to finish the RPC. So I'd say we want:
> 
>   current->next_wait = 0;
> 
>   if (current->signal)
>     /* We already got interrupted previously, keep interrupting the
>        RPC.  */
>     err = EINTR;
>   else
>     {
>       /* This is the only place where we sleep within an RPC and release the 
> global
>        lock while serving it.  */
>       err = pthread_hurd_cond_timedwait_np(c, &global_lock, tsp);
> 
>       if (err == EINTR)
>       current->signal = 1;        /* We got cancelled, mark it for Linux code 
> to bail out.  */
>       else
>       current->signal = 0;
>     }
> 
>   current->isroot = isroot;
> 
> even if the linux code is not supposed to call schedule() again if
> signal_pending() != 0, it's probably safer to avoid the question.
> 
> Could you check that it works fine?
> 
I have applied the code and make test. I could only see two failed tests
now. Test 537 is a stress test and it got stuck and outputing a lot of
'Deallocating a bogus port 6 ...' on the terminal. Test 1541 has
possibility to pass. I will try to investigate. I think you can commit
the code.

Zhaoming


Reply via email to