Ping? Just wondering if anybody else has seen this problem, or looked at Julian's patch.
-Hyrum On Tue, Jun 22, 2010 at 1:57 PM, Julian Foad <julian.f...@wandisco.com> wrote: > I am investigating the following DAV + FSFS server error message which > we at WANdisco have seen rarely, but in at least two real installations, > and am sharing it here in case anyone can shed light on it. It happens > during a DAV 'MERGE': > > Can't open file '[...]/db/transactions/props': No such file or directory > > FSFS is trying to open an invalid path; it should be > '[...]/db/transactions/<txn-id>/props'. From the source code, I can see > how this must happen at a low level: the txn-id path component will be > omitted if and only if something passes a NULL transaction-id string to > fs_fs.c:get_txn_proplist(). > > But why and from where? > > The error might be triggered by the WANdisco software sending commands > in the wrong sequence, or something like that, and so to that extent it > may not be interesting to the general community. In one case, detailed > below, it happened after a DELETE was sent, more than an hour into a > very long commit. In another case, it happened after a start-commit > hook failed. The server might be receiving a DAV MERGE request when it > isn't expecting one. Can anyone comment on that, given this limited > information? > > Here are some Apache access log entries from one occurrence, with paths > anonymized and the client Id string (always "SVN/1.6.6 (r40053) > neon/0.28.3") omitted for brevity: > [[[ > 127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800] > "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-" > 127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800] > "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 > 582 "-" > 127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800] > "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-" > 127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800] > "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-" > 127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800] > "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-" > 127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800] > "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-" > 127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800] > "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-" > 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800] > "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 > 569 "-" > 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800] > "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-" > ]]] > > and the corresponding error log: > [[[ > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1] > Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into > "/svn/RepoJ/branches/BranchD/PathJ". [500, #0] > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1] > There was a problem opening the transaction specified by this activity. [500, > #2] > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1] > Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or > directory [500, #2] > ]]] > > The error appears to match the second-last MERGE in the access log. I > don't have a copy of the body of the MERGE request in this case. > > The error refers to an activity Id that was the subject of the DELETE > that was logged 18 minutes earlier. A comment from one of our guys > looking at the logs: "We are seeing 22,600 changes for Activity A and no > merge at the end (just a DELETE). Then we see about 931 commands for > Activity B and a MERGE that fails for Activity A." > > Is this a case where the client side shouldn't be sending a MERGE at > this point? Even if that is the primary problem, I'm still interested > in Subversion's response to this MERGE request. > > I want to assert(txn-id != NULL) in the relevant FSFS functions, as in > the attached patch. I determined the comments in the attached patch > after manually passing NULL instead of the txn-id. However, that won't > get me far on its own. > > To try to reproduce the error, do you think I should try sending a > sequence of DAV commands similar to the sequence shown above? > > Any other ideas on what might be happening or how to diagnose this? > > - Julian > >