We're seeing deadlocks in our multithreaded Subversion server when two distinct
processes call fcntl(F_SETLKW) on the db/txn-current-lock files of two fsfs
repositories and begin their transactions in opposite order:
Process 1                          Process 2
---------                          ---------
thread 1: begin txn in repos A     thread 1: begin txn in repos B
thread 2: begin txn in repos B     thread 2: begin txn in repos A
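For context, here is a rough syscall-level sketch of what each "begin txn" amounts to with respect to this lock: an exclusive fcntl() record lock on the repository's db/txn-current-lock. POSIX record locks are owned by the process, not the thread. So, on our reading, while thread 1 of process 1 holds the lock on A and thread 2 waits for B, the kernel sees process 1 as holding A and waiting for B. Process 2 is in the mirror situation, which closes the cycle. The helper `lock_whole_file` is hypothetical and is not the actual fsfs code path.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open PATH and take an exclusive (write) lock on the
 * whole file, blocking until it is granted or fcntl() fails (for example
 * with EDEADLK). Returns the open descriptor, or -1 with errno set. */
static int lock_whole_file(const char *path)
{
    struct flock fl;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */

    /* The lock belongs to the calling process, not the calling thread. */
    if (fcntl(fd, F_SETLKW, &fl) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    /* Closing any descriptor for this file in this process releases it. */
    return fd;
}
```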
During normal working hours we get more than one commit per second, peaking at
six per second, which is why we're hitting this.
Questions:
Should a fix for this go into libsvn_fs_fs, or should I handle it in my
application? I think libsvn_fs_fs is the appropriate place for the fix, even
though other users probably won't run into it.
I'm also thinking the code should retry up to 100 times, starting with a 1 ms
sleep and doubling it after each failure up to a maximum of 128 ms, similar to
WIN32_RETRY_LOOP.
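A minimal sketch of that retry policy follows. It assumes the failure surfaces as fcntl() returning -1 with errno set to EDEADLK, which POSIX permits F_SETLKW to report when it detects a deadlock. The names `lock_fd_with_retry`, `sleep_ms`, and the `LOCK_*` constants are hypothetical. The limits match the numbers proposed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define LOCK_MAX_RETRIES   100   /* hypothetical: retry at most 100 times */
#define LOCK_INITIAL_SLEEP 1L    /* first back-off, in milliseconds */
#define LOCK_MAX_SLEEP     128L  /* cap on a single back-off, in milliseconds */

/* Sleep for MS milliseconds, resuming if a signal interrupts the sleep. */
static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ; /* ts now holds the remaining time */
}

/* Hypothetical: exclusively lock the whole file behind FD with F_SETLKW,
 * backing off exponentially while the kernel reports EDEADLK (or EINTR).
 * Returns 0 on success, -1 with errno from the last fcntl() on failure. */
static int lock_fd_with_retry(int fd)
{
    struct flock fl;
    long delay = LOCK_INITIAL_SLEEP;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;  /* l_start = l_len = 0: the whole file */

    for (int retries = 0; ; retries++) {
        if (fcntl(fd, F_SETLKW, &fl) == 0)
            return 0;                    /* lock granted */
        if (errno != EDEADLK && errno != EINTR)
            return -1;                   /* not retryable, e.g. EBADF */
        if (retries >= LOCK_MAX_RETRIES)
            return -1;                   /* give up; errno is from fcntl */
        sleep_ms(delay);
        if (delay < LOCK_MAX_SLEEP)
            delay *= 2;                  /* 1, 2, 4, ..., 128 ms, then stays */
    }
}
```

With these limits, a lock that never becomes available sleeps 1 + 2 + ... + 64 = 127 ms over the first seven retries and 93 × 128 ms = 11,904 ms over the remaining ones. The caller therefore gives up after roughly 12 seconds of total sleep.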
Comments?
Blair