I have opened issue #3696,
<http://subversion.tigris.org/issues/show_bug.cgi?id=3696>, to track
this.

- Julian


On Fri, 2010-07-30 at 16:17 +0100, Julian Foad wrote:
> On Tue, 2010-06-22 at 13:57 +0100, Julian Foad wrote:
> > I am investigating the following DAV + FSFS server error message which
> > we at WANdisco have seen rarely, but in at least two real installations,
> > and am sharing it here in case anyone can shed light on it.  It happens
> > during a DAV 'MERGE':
> > 
> > Can't open file '[...]/db/transactions/props': No such file or directory
> > 
> > FSFS is trying to open an invalid path; it should be
> > '[...]/db/transactions/<txn-id>/props'.  From the source code, I can see
> > how this must happen at a low level: the txn-id path component will be
> > omitted if and only if something passes a NULL transaction-id string to
> > fs_fs.c:get_txn_proplist().
> > 
> > But why and from where?
> 
> A status update:
> 
> I've traced the calls to get_txn_proplist(), a few call levels up, and I
> believe one of these must have supplied txn_id = NULL.
> 
> Callees of get_txn_proplist(), with txn-id parameter identified by name
> and by param-list position:
> 
> get_txn_proplist(,,txn_id)
>   svn_fs_fs__change_txn_props(txn->id)  # DOESN'T CAUSE THE ERROR
>     txn_vtable.change_props(txn->id)
>     svn_fs_fs__change_txn_prop(txn->id)
>     commit_body()
>     svn_fs_fs__begin_txn(using svn_fs_fs__create_txn())
>     svn_fs_fs__begin_obliteration_txn(using ...)
>   svn_fs_fs__get_txn(txn_id)            # CAN CAUSE THE ERROR
>     svn_fs_fs__open_txn(,,name)
>       svn_fs_t.vtable.open_txn(,,name)
>         svn_fs_open_txn(,,name)
>           dav_svn__abort_txn(,txn_name)
>           prep_working()  # NO, txn_name != NULL
>           prep_private(comb->priv.root.txn_name)
>             DAV_RESOURCE_TYPE_PRIVATE
>           is_our_resource()  # NO, txn name != NULL
>           open_txn(,,txn_name)
>             dav_svn__checkout()
>             merge()
>               dav_svn__hooks_vsn.merge
>       fs_merge(,,,target_root->txn)
>         svn_fs_root_t.vtable.merge(,,,target_root->txn)
>           svn_fs_merge(,,target_root->txn)
>             # no callers?
>     svn_fs_fs__get_txn_ids(,,,txn_name)
>       svn_fs_fs__dag_txn_root(,,txn_id)
>         root_node<-open_path<-get_dag(,root->txn)
>       svn_fs_fs__dag_txn_base_root(,,txn_id)
>         merge_changes(,,txn->id)
>           svn_fs_fs__commit_txn(,,txn->id)
>             svn_fs_txn_t.vtable.commit(,,txn->id)
>               svn_fs_commit_txn(,,txn->id)
>           fs_merge()
>       svn_fs_fs__dag_clone_root(,,txn_id)
>         mutable_root_node(,root->txn)
>           make_path_mutable(root->txn)
>   commit_body(baton->txn->id)           # DOESN'T CAUSE THE ERROR
>     svn_fs_fs__commit(,,txn->id)
>       svn_fs_fs__commit_txn(,,txn->id)
>   svn_fs_fs__txn_proplist(,txn->id)     # CAN CAUSE THE ERROR
>     svn_fs_txn_t.vtable.get_proplist(,txn->id)
>       svn_fs_txn_proplist(,,txn->id)
>     commit_body()
>     svn_fs_fs__txn_prop(,txn->id)
>       svn_fs_txn_t.vtable.get_prop(,txn->id)
>         svn_fs_txn_prop(,txn->id)
>     svn_fs_fs__txn_root(,txn->id)
>       svn_fs_txn_t.vtable.root(,txn->id)
>         svn_fs_txn_root(,txn->id)
> 
> The ones annotated "CAN CAUSE THE ERROR" did result in the "Can't open
> 'db/transactions/props'" error when I forced txn_id=NULL at these places
> in a trivial test scenario, and the ones marked "DOESN'T", didn't.
> 
> >From the fact that a DAV MERGE was the operation under way, I would
> hazard a guess that dav_svn__hooks_vsn.merge() might be the call that's
> going wrong.  I haven't tried to trace that back to its caller (in
> mod_dav perhaps?).
> 
> I don't know if this line of attack has any mileage left in it.  I'm
> thinking of leaving it at that for now, unless anyone comes up with
> suggestions for how to proceed.  I don't feel capable of setting up a
> test system that would replicate the sequence of DAV messages that
> resulted in the error condition.
> 
> (The issue of the current assertion leaving a DOS attack possible is
> still in my sights - email thread "Re: svn commit: r980046".)
> 
> - Julian
> 
> 
> > The error might be triggered by the WANdisco software sending commands
> > in the wrong sequence, or something like that, and so to that extent it
> > may not be interesting to the general community.  In one case, detailed
> > below, it happened after a DELETE was sent, more than an hour into a
> > very long commit.  In another case, it happened after a start-commit
> > hook failed.  The server might be receiving a DAV MERGE request when it
> > isn't expecting one.  Can anyone comment on that, given this limited
> > information?
> > 
> > Here are some Apache access log entries from one occurrence, with paths
> > anonymized and the client Id string (always "SVN/1.6.6 (r40053)
> > neon/0.28.3") omitted for brevity:
> > [[[
> > 127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
> > "MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
> > 127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
> > "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 
> > 207 582 "-"
> > 127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
> > "MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
> > 127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
> > "MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
> > "DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
> > "MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
> > "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> > "PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 
> > 207 569 "-"
> > 127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
> > "MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
> > ]]]
> > 
> > and the corresponding error log:
> > [[[
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into 
> > "/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > There was a problem opening the transaction specified by this activity. 
> > [500, #2]
> > [Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
> > Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or 
> > directory [500, #2]
> > ]]]
> > 
> > The error appears to match the second-last MERGE in the access log.  I
> > don't have a copy of the body of the MERGE request in this case.
> > 
> > The error refers to an activity Id that was the subject of the DELETE
> > that was logged 18 minutes earlier.  A comment from one of our guys
> > looking at the logs: "We are seeing 22,600 changes for Activity A and no
> > merge at the end (just a DELETE). Then we see about 931 commands for
> > Activity B and a MERGE that fails for Activity A."
> > 
> > Is this a case where the client side shouldn't be sending a MERGE at
> > this point?  Even if that is the primary problem, I'm still interested
> > in Subversion's response to this MERGE request.
> > 
> > I want to assert(txn-id != NULL) in the relevant FSFS functions, as in
> > the attached patch.  I determined the comments in the attached patch
> > after manually passing NULL instead of the txn-id.  However, that won't
> > get me far on its own.
> > 
> > To try to reproduce the error, do you think I should try sending a
> > sequence of DAV commands similar to the sequence shown above?
> > 
> > Any other ideas on what might be happening or how to diagnose this?
> > 
> > - Julian
> > 
> 
> 


Reply via email to