Re: Sparc64 rthreads Instablilty

Claudio Jeker Tue, 07 Nov 2023 03:09:58 -0800

On Tue, Oct 31, 2023 at 01:18:42PM +0100, Claudio Jeker wrote:
> On Mon, Oct 23, 2023 at 11:06:53PM +0000, Kurt Miller wrote:
> > I experimented with adding a nanosleep after pthread_create() to
> > see if that would resolve the segfault issue - it does, but it
> > also exposed a new failure mode on -current. Every so often
> > the test program would not exit now. Thinking it may be related
> > to the detached threads I reworked the test program to use attached
> > threads and coordinated shutdown of them with pthread_join(). These
> > changes did not affect the new issue - every so often the main thread
> > exits after pthread_join() has been called on the created threads and
> > the program occasionally gets stuck with a number of threads still being
> > reported to egdb/ps etc. These threads should all be gone now since
> > pthread_join() has been called on all of them.
> > 
> > Here is the updated version of the program using attached threads and
> > a nanosleep() work-around to the original problem along with ps and egdb
> > output showing an example of a stuck process after the main thread exited.
> > 
> > ======== startup.c ==================
> > #include <pthread.h>
> > #include <err.h>
> > #include <time.h>
> > 
> > typedef struct {
> >     long t_num;
> >     pthread_t t_pthread_id;
> > 
> >     /* sync startup/shutdown */
> >     long t_state;
> >     pthread_mutex_t t_mutex;
> >     pthread_cond_t  t_cond_var;
> > } thread_t;
> > 
> > #define NTHREADS 40
> > 
> > thread_t threads[NTHREADS];
> > 
> > void
> > init_threads() {
> >     long t_num;
> >     for (t_num=0; t_num < NTHREADS; t_num++) {
> >         threads[t_num].t_num = t_num;
> >         threads[t_num].t_state = 0;
> >         if (pthread_mutex_init(&threads[t_num].t_mutex, NULL) != 0)
> >             err(1, "pthread_mutex_init failed");
> > 
> >         if (pthread_cond_init(&threads[t_num].t_cond_var, NULL) != 0)
> >             err(1, "pthread_cond_init failed");
> >     }
> > }
> > 
> > void
> > complete_threads() {
> >     long t_num;
> >     for (t_num=0; t_num < NTHREADS; t_num++) {
> >         pthread_mutex_lock(&threads[t_num].t_mutex);
> >         threads[t_num].t_state = 2;
> >         pthread_cond_broadcast(&threads[t_num].t_cond_var);
> >         pthread_mutex_unlock(&threads[t_num].t_mutex);
> >     }
> >     for (t_num=0; t_num < NTHREADS; t_num++) {
> >         if (pthread_join(threads[t_num].t_pthread_id, NULL) != 0)
> >             err(1, "pthread_join failed");
> >     }
> > }
> > 
> > void *
> > thread_start(thread_t *thread) {
> >     pthread_mutex_lock(&thread->t_mutex);
> >     thread->t_state = 1;
> >     pthread_cond_broadcast(&thread->t_cond_var);
> >  
> >     while (thread->t_state != 2)
> >        pthread_cond_wait(&thread->t_cond_var, &thread->t_mutex); 
> > 
> >     pthread_mutex_unlock(&thread->t_mutex);
> > 
> >     return(NULL);
> > }
> > 
> > void
> > create_thread(thread_t *thread) {
> >     int ret = pthread_create(&thread->t_pthread_id, NULL, (void* 
> > (*)(void*)) thread_start, thread);
> >  
> >     struct timespec timeout = { 0, 1000*1000L };
> >     nanosleep(&timeout, NULL);
> > 
> >     /* wait for thread startup */
> >     pthread_mutex_lock(&thread->t_mutex);
> >     while (thread->t_state == 0) 
> >        pthread_cond_wait(&thread->t_cond_var, &thread->t_mutex);
> >     pthread_mutex_unlock(&thread->t_mutex);
> > }
> > 
> > int
> > main( int argc, char *argv[] )
> > {
> >     long t_num;
> > 
> >     init_threads();
> > 
> >     /* startup threads */
> >     for (t_num=0; t_num < NTHREADS; t_num++) {
> >         create_thread(&threads[t_num]);
> >     }
> > 
> >     complete_threads();
> > 
> >     return 0;
> > }
> 
> I tested your program on an m10-1 and it shows the same behaviour.
> The problem is that the main thread exits because of some cancelation
> point. At least that is how it looks like:
> 
> Here a ktrace (main proc is 405055):
> 
>  25127/543740  thread   CALL 
> futex(0xb9f26aca00,0x82<FUTEX_WAKE|FUTEX_PRIVATE_FLAG>,2147483647,0,0)
>  25127/543740  thread   RET   futex 0
>  25127/543740  thread   CALL
> futex(0xb9f26aca00,0x81<FUTEX_WAIT|FUTEX_PRIVATE_FLAG>,1,0,0)
>  25127/405055  thread   RET   nanosleep 0
>  25127/405055  thread   CALL  
> mmap(0,0x82000,0x3<PROT_READ|PROT_WRITE>,0x5002<MAP_PRIVATE|MAP_ANON|MAP_STACK>,-1,0)
>  25127/405055  thread   RET   mmap 795249991680/0xb92897e000
>  25127/405055  thread   CALL  mprotect(0xb92897e000,0x2000,0<PROT_NONE>)
>  25127/405055  thread   RET   mprotect 0
>  25127/405055  thread   CALL  __tfork(0xffffffcd56f172c0,24)
>  25127/405055  thread   STRU  struct __tfork { tcb=0xb93e026e00, 
> tid=0xb93e026e28, stack=0xb9289feda0 }
>  25127/405055  thread   RET   __tfork 119120/0x1d150
>  25127/119120  thread   RET   __tfork 0
>  25127/405055  thread   CALL  nanosleep(0xffffffcd56f173b0,0)
>  25127/405055  thread   STRU  struct timespec { 0.001000000 }
>  25127/119120  thread   CALL  
> futex(0xb9f26facd0,0x82<FUTEX_WAKE|FUTEX_PRIVATE_FLAG>,2147483647,0,0)
>  25127/119120  thread   RET   futex 0
>  25127/119120  thread   CALL  
> futex(0xb9f26facd0,0x81<FUTEX_WAIT|FUTEX_PRIVATE_FLAG>,1,0,0)
>  25127/405055  thread   RET   nanosleep 0
>  25127/405055  thread   CALL  kbind(0xffffffcd56f16dc8,48,0x9b2ed0bc88166e83)
>  25127/405055  thread   RET   kbind 0
>  25127/405055  thread   CALL  futex(0xba0cf25928,0x2<FUTEX_WAKE>,1,0,0)
>  25127/405055  thread   RET   futex 0
>  25127/405055  thread   CALL  __threxit(0xb940a04b68)
> 
> In gdb the _initial_thread has indeed quit and if I read the output
> correctly via a cancelation point:
> (gdb) print _initial_thread 
> $1 = {donesem = {lock = 0 '\000', waitcount = 0, value = 1, shared = 0}, 
>   flags = 1, flags_lock = 0 '\000', tib = 0xb940a04b40, retval = 0x1, 
>   fn = 0x0, arg = 0x0, name = "Original thread", '\000' <repeats 16 times>, 
>   stack = 0x0, threads = {le_next = 0x0, le_prev = 0xb93e027a98}, waiting = {
>     tqe_next = 0x0, tqe_prev = 0x0}, blocking_cond = 0x0, attr = {
>     stack_addr = 0x0, stack_size = 524288, guard_size = 8192, 
>     detach_state = 0, contention_scope = 2, sched_policy = 2, sched_param = {
>       sched_priority = 0}, sched_inherit = 4}, local_storage = 0x0, 
>   cleanup_fns = 0x0, delayed_cancel = 0}
> (gdb) print _initial_thread.tib
> $2 = (tib *) 0xb940a04b40
> (gdb) print *_initial_thread.tib
> $3 = {tib_dtv = 0x0, tib_thread = 0xba0cf25920 <_initial_thread>, 
>   tib_locale = 0x0, tib_errno = 0, tib_canceled = 0, tib_cancel_point = 1, 
>   tib_cantcancel = 2, tib_tid = 0, tib_thread_flags = 2, tib_atexit = 0x0}
> 
> Now I have no clue about cancelation points so no idea why it happens.
> But because of this the main thread exits before calling
> complete_threads() and therefor the pthread_join is never called.
> 
> Not sure how to debug this further.


Played a bit more around. Here is what I found so far.
So what happens is that in nanosleep() the code goes off the rails in
LEAVE_CANCEL_POINT(1);

        ENTER_CANCEL_POINT(1);
        ret = HIDDEN(nanosleep)(timeout, remainder);
        LEAVE_CANCEL_POINT(1);
        return (ret);

This code is:

        {
                struct tib *_tib = TIB_GET();
                int _cantcancel = (_tib)->tib_cantcancel;
                if (_cantcancel == 0) {
                        ...
                }
                ret = HIDDEN(nanosleep)(timeout, remainder);
                if (_cantcancel == 0) {
                        (_tib)->tib_cancel_point = 0;
                        if (1) {
                                __asm volatile("":::"memory");
                                if (__predict_false((_tib)->tib_canceled))
                                        _thread_canceled();
                        }
                }
        }
        return (ret);

After the nanosleep() call the _tib register value is altered.
The assembly uses %l0 to store _tib but on return from the syscall this
value is sometimes pointing to the value of &thread->t_pthread_id from the
previous pthread_create() call. Now because of this tib_canceled is not 0
and _thread_canceled() is called. I managed to create this condition
numerous times and the value is always &thread->t_pthread_id of the last
thread created.

I tried to figure out where %l0 could be set to &thread->t_pthread_id.
Now in create_thread() the value is not stored in %l0 (IIRC it just uses
%o0 and does not touch any local register for this value).
In pthread_create() the value is the moved to the stack and later loaded
into %l2 (IIRC). No other code uses this value. So there is no simple way
for %l0 to be set to this value.

Now my theory is that somehow a window was not properly spilled or filled
and so on return from nanosleep the kernel restores %l0 with a bad value.
Now this does not happen all the time, so most probably it requires some
extra condition (extra traps).

A lot of this points towards a register window error in__tfork() which
affects only the parent process.

-- 
:wq Claudio

Re: Sparc64 rthreads Instablilty

Reply via email to