Bugzilla Automation <bugzi...@freebsd.org> has asked freebsd-python (Nobody) <pyt...@freebsd.org> for maintainer-feedback:
Bug 255445: lang/python 3.8/3.9 SIGSEGV core dumps in libthr TrueNAS
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255445
--- Description ---
Many TrueNAS (previously FreeNAS) users are seeing the main middlewared process (Python) dump core, starting with our 12.0 release.

Relevant OS information:

    12.2-RELEASE-p6 FreeBSD 12.2-RELEASE-p6 f2858df162b(HEAD) TRUENAS amd64

Python versions that experience the core dump:

    Python 3.8.7
    Python 3.9.4

When I initially researched this, I found a threading regression in Python 3.8 on FreeBSD and was able to resolve that particular problem by backporting these commits:

    https://github.com/python/cpython/commit/4d96b4635aeff1b8ad41d41422ce808ce0b971c8
    https://github.com/python/cpython/commit/9ad58acbe8b90b4d0f2d2e139e38bb5aa32b7fb6

I backported those commits because every core dump I've analyzed crashes in the same spot (or very close to it). For example, here are two backtraces showing a NULL pointer dereference:

    Core was generated by `python3.8: middlewared'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  cond_signal_common (cond=<optimized out>) at /truenas-releng/freenas/_BE/os/lib/libthr/thread/thr_cond.c:457
    warning: Source file is more recent than executable.
    457             mp = td->mutex_obj;
    [Current thread is 1 (LWP 100733)]
    (gdb) list
    452                     _sleepq_unlock(cvp);
    453                     return (0);
    454             }
    455
    456             td = _sleepq_first(sq);
    457             mp = td->mutex_obj;
    458             cvp->__has_user_waiters = _sleepq_remove(sq, td);
    459             if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
    460                     if (curthread->nwaiter_defer >= MAX_DEFER_WAITERS) {
    461                             _thr_wake_all(curthread->defer_waiters,
    (gdb) p *td
    Cannot access memory at address 0x0

And another one:

    Core was generated by `python3.8: middlewared'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  cond_signal_common (cond=<optimized out>) at /truenas-releng/freenas/_BE/os/lib/libthr/thread/thr_cond.c:459
    warning: Source file is more recent than executable.
    459             if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
    [Current thread is 1 (LWP 101105)]
    (gdb) list
    454             }
    455
    456             td = _sleepq_first(sq);
    457             mp = td->mutex_obj;
    458             cvp->__has_user_waiters = _sleepq_remove(sq, td);
    459             if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
    460                     if (curthread->nwaiter_defer >= MAX_DEFER_WAITERS) {
    461                             _thr_wake_all(curthread->defer_waiters,
    462                                 curthread->nwaiter_defer);
    463                             curthread->nwaiter_defer = 0;
    (gdb) p *mp
    Cannot access memory at address 0x0

I'm trying to write a program that stress-tests threading (repeatedly tearing down and recreating threads, etc.), but so far I haven't been able to trigger this particular problem. Affected users sometimes go more than a month without a crash.

I'm hoping someone more knowledgeable can at least point me in the right direction or help me figure this out. I have a VM with all the relevant core dumps available, so if anyone needs remote access to poke around, please let me know. You can reach me at caleb [at] ixsystems.com

_______________________________________________
freebsd-python@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-python
To unsubscribe, send any mail to "freebsd-python-unsubscr...@freebsd.org"
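A minimal sketch of the kind of threading stress test described in the report, offered as a starting point and not as a known reproducer for this crash. It assumes that CPython's GIL hand-off is built on pthread mutexes and condition variables (so on FreeBSD it presumably goes through libthr's pthread_cond_signal and cond_signal_common), and it lowers sys.setswitchinterval() so that hand-off happens as often as possible while short-lived threads are created, woken through a threading.Condition, and joined. The function names churn_round and stress, and all default parameter values, are hypothetical.

```python
import sys
import threading


def churn_round(workers: int) -> int:
    """Start `workers` threads that block on a Condition, wake them, join them.

    Returns how many workers observed the release, so callers can check
    that every thread ran to completion.
    """
    cond = threading.Condition()
    state = {"released": False, "woken": 0}

    def worker() -> None:
        with cond:
            # wait_for re-checks the predicate, so a notify that happens
            # before this thread starts waiting is not lost.
            cond.wait_for(lambda: state["released"])
            state["woken"] += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    with cond:
        state["released"] = True
        cond.notify_all()

    # Tear the threads down again before the next round recreates them.
    for t in threads:
        t.join()
    return state["woken"]


def stress(rounds: int = 200, workers: int = 8, switch_interval: float = 1e-6) -> int:
    """Run `rounds` create/signal/join cycles with a tiny GIL switch interval."""
    old_interval = sys.getswitchinterval()
    # Force very frequent GIL hand-offs between running threads.
    sys.setswitchinterval(switch_interval)
    try:
        return sum(churn_round(workers) for _ in range(rounds))
    finally:
        # Restore the interpreter's previous switch interval.
        sys.setswitchinterval(old_interval)


if __name__ == "__main__":
    print(stress())
```

Since the field crashes can take weeks to appear, a loop like this would likely need to run for a long time (and on the affected FreeBSD/libthr build) before a clean run says much either way.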