On 09.06.2015 11:36, Michael Stahl wrote: > On 09.06.2015 10:02, Noel Grandin wrote: >> On 2015-06-09 09:58 AM, Stephan Bergmann wrote: >>> On Windows, with master as of last night, "make check" happened to run into >>> a deadlock in soffice.bin as below. The >>> main thread is trying to re-acquire the SolarMutex (after a >>> SolarMutexReleaser) from within the event loop, while an >>> incoming URP thread (apparently holding the SolarMutex) does >>> vcl::Window::dispose which, on Windows, blocks on sending a >>> message into the event loop. > > this started happening for me in sc_unoapi on a --enable-dbgutil build > almost always since half a year or so; before that it was less likely > for whatever reason but still did happen. (the deadlock moved from > ~Window to Window::dispose due to VclPtr refactoring.) > > the deadlock does not appear easily fixable; the last slide of my > threading talk last year was about it actually.
now something i read in the MSDN documentation of GetMessage gave me an idea, and it turns out that... i think i understand one quite surprising aspect of the Win32 SAL implementation: the SolarMutex is not *really* a mutex, or rather, it is not implemented purely as a mutex on this platform. non-main threads block on a osl::Mutex mpSalYieldMutex to acquire the SolarMutex, but the main thread does a very peculiar dance in ImplSalYieldMutexAcquireWithWait(): > do > { > if ( pInst->mpSalYieldMutex->tryToAcquire() ) > bAcquire = TRUE; > else > { > pInst->mpSalWaitMutex->acquire(); > if ( pInst->mpSalYieldMutex->tryToAcquire() ) > { > bAcquire = TRUE; > pInst->mpSalWaitMutex->release(); > } > else > { > pInst->mnYieldWaitCount++; > pInst->mpSalWaitMutex->release(); > MSG aTmpMsg; > GetMessageW( &aTmpMsg, pInst->mhComWnd, > SAL_MSG_RELEASEWAITYIELD, SAL_MSG_RELEASEWAITYIELD ); > pInst->mnYieldWaitCount--; > if ( pInst->mnYieldWaitCount ) > PostMessageW( pInst->mhComWnd, SAL_MSG_RELEASEWAITYIELD, > 0, 0 ); > } > } > } > while ( !bAcquire ); if the mpSalYieldMutex is locked by another thread, the main thread does not block on the mpSalYieldMutex, but instead blocks in Win32 GetMessage(). the reason why it does that, is that GetMessage() will not only get you the message with the type you pass in as parameter; in fact the first thing GetMessage() does is to look if another thread has a pending SendMessage() to a window on the current thread, and if so immediately dispatch that message. this means that the main thread can in fact handle the SendMessage() calls from other threads to create/destroy Windows and DCs without deadlocking, if it waits in ImplSalYieldMutexAcquireWithWait(). to prevent deadlocks on Windows, as soon as other threads are running, the main thread *must* acquire SolarMutex only via ImplSalYieldMutexAcquireWithWait(). unfortunately the Application::AcquireSolarMutex() does not do this but blocks on locking the mpSalYieldMutex. this is used by SolarMutexReleaser, and thus SolarMutexReleaser causes deadlocks on Windows (whereas all the entry points from Win32 WndProcs already do the right thing). my hope is that the main thread only first-acquires the SolarMutex via VCL internal code or via SolarMutexReleaser, and not with the ordinary SolarMutexGuard that is used in UNO entry points. commit 482c52e91fe41a52e68827e9bf64a9736427d517 should fix this deadlock scenario on master, and thus there are no known unfixable deadlock issues that would prevent running "make check" on Windows tinderboxes. _______________________________________________ LibreOffice mailing list LibreOffice@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice