On 31.01.2017 10:09, Stefan wrote:
Hi,

I've been looking at the cause of a deadlock when running ra-test.exe
with -fs-type=fsx (trunk version).

The most important findings are summed up here atm [1].

The issue was discussed with brane and danielsh on IRC (thanks for your
time, once again).

As far as my current understanding of the problem goes: the deadlock is
caused by the fact that the apr_terminate() function registered in
svn_cmdline_init() via the atexit-call is called after the termination
of the threads which were created as part of the calls to
apr_thread_pool_push() in svn_fs_x__batch_fsync_run().

This means that APR's thread counter (thd_cnt) gets out of sync (since
the APR function thread_pool_func() never runs to completion for those
threads), and thread_pool_cleanup() then gets stuck waiting for the
already terminated threads to terminate.
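
To illustrate the counter mismatch described above, here is a minimal
sketch in plain C with pthreads. It is not APR code: every toy_* name is
hypothetical, the "killed" flag only simulates a thread that disappears
before doing its bookkeeping, and the cleanup loop is bounded so the
example terminates instead of hanging the way the real wait reportedly
does.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread pool; only the counter matters. */
typedef struct {
    pthread_mutex_t lock;
    int thd_cnt;            /* workers the pool believes are alive */
} toy_pool;

typedef struct {
    toy_pool *pool;
    int killed;             /* 1 = simulate forcible termination */
} toy_worker_arg;

/* Worker: a normal exit decrements the counter; a "killed" worker
 * returns before that bookkeeping, leaving the counter stale. */
static void *toy_worker(void *p)
{
    toy_worker_arg *arg = p;

    if (arg->killed)
        return NULL;

    pthread_mutex_lock(&arg->pool->lock);
    arg->pool->thd_cnt--;
    pthread_mutex_unlock(&arg->pool->lock);
    return NULL;
}

/* Cleanup: poll until the counter reaches zero. Unlike an unbounded
 * wait, this gives up after max_polls so the sketch cannot hang.
 * Returns 1 if the counter drained, 0 if it stayed stuck. */
static int toy_pool_cleanup(toy_pool *pool, int max_polls)
{
    int i;

    for (i = 0; i < max_polls; i++) {
        int cnt;

        pthread_mutex_lock(&pool->lock);
        cnt = pool->thd_cnt;
        pthread_mutex_unlock(&pool->lock);
        if (cnt == 0)
            return 1;
    }
    return 0;
}

/* Start one worker, wait until it is gone, then run the cleanup.
 * Returns the cleanup result, or -1 if the thread could not start. */
static int toy_run(int killed)
{
    toy_pool pool;
    toy_worker_arg arg;
    pthread_t thread;
    int rc, drained;

    rc = pthread_mutex_init(&pool.lock, NULL);
    assert(rc == 0);
    pool.thd_cnt = 1;       /* one worker is about to be created */
    arg.pool = &pool;
    arg.killed = killed;

    if (pthread_create(&thread, NULL, toy_worker, &arg) != 0) {
        pthread_mutex_destroy(&pool.lock);
        return -1;
    }
    pthread_join(thread, NULL);

    drained = toy_pool_cleanup(&pool, 5);
    pthread_mutex_destroy(&pool.lock);
    return drained;
}
```

Calling toy_run(0) models a worker that exits normally, so the cleanup
sees the counter reach zero; toy_run(1) models the suspected failure
case, where the counter never drops and an unbounded wait would spin
forever.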

To me it looks like svnserve's main-function already contains a
safeguard against a corresponding issue, and calls
apr_thread_pool_destroy(threads) (or was this a completely different
scenario?). This however does not cover the threads created from
svn_fs_x__batch_fsync_run().

Talking to danielsh and brane, it became apparent to me that the issue
might not be too obvious (in the end it might still be an issue with how
I build SVN, which then causes the atexit-registered apr_terminate()
function to be called too late). It's also not fully clear to me at
which exact point (with regard to registered atexit() handlers) the
threads of a process are terminated when the process itself terminates.
If atexit()-registered functions do indeed get called after the threads
are forcibly terminated (which is what it looks like to me atm), that
might contradict the C (89/99) standard - see [2], 7.20.4.2/7.20.4.3. On
the other hand, this thread on Stack Overflow [3] suggests it's simply
undefined (by the standard) which comes first.

As danielsh suggested, I'm planning to come up with a minimal repro app
based only on APR that demonstrates the problem, to make it more obvious
(and to double-check for myself) what the issue is about.

Regards,
Stefan

[1] http://www.luke1410.de:8090/browse/MAXSVN-94
[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
[3] https://stackoverflow.com/questions/39655868/what-does-the-posix-standard-say-about-thread-stacks-in-atexit-handlers-what
Hi Stefan,

I had a look at the code and found a possibly related problem.
If you are using DLLs, this might have affected you.

It would be nice if you could try r1781657 and see whether it
makes any difference in your case.

-- Stefan^2.
