We're seeing deadlocks in our multithreaded Subversion server. They occur when two distinct processes call fcntl(F_SETLKW) on the db/txn-current-lock files of two FSFS repositories and the processes begin their transactions in opposite order:

Process 1                               Process 2
---------                               ---------
thread 1: begin txn in repos A          thread 1: begin txn in repos B
thread 2: begin txn in repos B          thread 2: begin txn in repos A

During normal working hours, we get over 1 commit per second, peaking at 6, which is why we're seeing this.

Questions:

Should a fix for this go into libsvn_fs_fs, or should I handle it in my application? I think libsvn_fs_fs is the appropriate place for the fix, even though other people probably won't run into this deadlock.

I'm also thinking the code should retry a maximum of 100 times, starting with a 1 ms sleep and doubling the sleep after each failure up to a maximum of 128 ms, the same way WIN32_RETRY_LOOP does.
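
To make the proposed retry policy concrete, here is a minimal sketch in plain POSIX C. It is not the libsvn_fs_fs code and does not use APR. The function names (next_sleep_ms, sleep_ms, lock_file_with_retry) and the constants are hypothetical, and it assumes the failure shows up as fcntl() returning EDEADLK, which is how POSIX reports a detected record-lock deadlock.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical constants matching the policy described above. */
enum { RETRY_MAX_ATTEMPTS = 100, RETRY_INITIAL_MS = 1, RETRY_MAX_MS = 128 };

/* Double the sleep interval, capped at RETRY_MAX_MS. */
static int next_sleep_ms(int current_ms)
{
  int doubled;

  assert(current_ms > 0);
  doubled = current_ms * 2;
  return doubled > RETRY_MAX_MS ? RETRY_MAX_MS : doubled;
}

/* Sleep for MS milliseconds (an early wakeup by a signal is ignored). */
static void sleep_ms(int ms)
{
  struct timespec ts;

  ts.tv_sec = ms / 1000;
  ts.tv_nsec = (long)(ms % 1000) * 1000000L;
  nanosleep(&ts, NULL);
}

/* Take an exclusive whole-file lock on FD, retrying with backoff when
   the kernel reports a deadlock.  Returns 0 on success, otherwise an
   errno value.  Assumption: the deadlock is reported as EDEADLK. */
static int lock_file_with_retry(int fd)
{
  struct flock fl;
  int sleep_for = RETRY_INITIAL_MS;
  int attempt;

  memset(&fl, 0, sizeof fl);
  fl.l_type = F_WRLCK;      /* exclusive lock */
  fl.l_whence = SEEK_SET;
  fl.l_start = 0;
  fl.l_len = 0;             /* 0 means "to end of file" */

  for (attempt = 0; attempt < RETRY_MAX_ATTEMPTS; attempt++)
    {
      if (fcntl(fd, F_SETLKW, &fl) == 0)
        return 0;
      if (errno == EINTR)
        continue;           /* interrupted by a signal; try again */
      if (errno != EDEADLK)
        return errno;       /* unrelated failure; do not retry */

      sleep_ms(sleep_for);
      sleep_for = next_sleep_ms(sleep_for);
    }
  return EDEADLK;           /* still deadlocking after all attempts */
}
```

With these numbers the sleeps run 1, 2, 4, ..., 64 ms and then stay at 128 ms, so a caller that fails all 100 attempts spends roughly 12 seconds sleeping before giving up.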

Comments?

Blair
