On Wed, 16 Sep 2015, Zhu Jefferry wrote:
> The application is a multi-thread program, to use the pairs of mutex_lock and
> mutex_unlock to protect the shared data structure. The type of this mutex
> is PTHREAD_MUTEX_PI_RECURSIVE_NP. After running long time, to say several
> days, the mutex_lock data structure in user space looks like corrupt.
>
> thread 0 can do mutex_lock/unlock
> __lock = this thread | FUTEX_WAITERS
> __owner = 0, should be this thread

The kernel does not know about __owner.

> __counter keep increasing, although there is no recursive mutex_lock call.
>
> thread 1 will be stuck
>
> The primary debugging shows the content of __lock is wrong in first. After a
> call of Mutex_unlock, the value of __lock should not be this thread self.
> But we observed the value of __lock is still self after unlock. So, other
> threads will be stuck,

How did you observe that?

> This thread could lock due to recursive type and __counter keep increasing,
> although mutex_unlock return fails, due to the wrong value of __owner,
> but the application did not check the return value. So the thread 0 looks
> like fine. But thread 1 will be stuck forever.

Oh well. So thread 0 looks all fine, despite not checking return values.

Thanks,

	tglx