I am investigating the following DAV + FSFS server error message which
we at WANdisco have seen rarely, but in at least two real installations,
and am sharing it here in case anyone can shed light on it.  It happens
during a DAV 'MERGE':

Can't open file '[...]/db/transactions/props': No such file or directory

FSFS is trying to open an invalid path; it should be
'[...]/db/transactions/<txn-id>/props'.  From the source code, I can see
how this must happen at a low level: the txn-id path component will be
omitted if and only if something passes a NULL transaction-id string to
fs_fs.c:get_txn_proplist().

But why and from where?

The error might be triggered by the WANdisco software sending commands
in the wrong sequence, or something like that, and so to that extent it
may not be interesting to the general community.  In one case, detailed
below, it happened after a DELETE was sent, more than an hour into a
very long commit.  In another case, it happened after a start-commit
hook failed.  The server might be receiving a DAV MERGE request when it
isn't expecting one.  Can anyone comment on that, given this limited
information?

Here are some Apache access log entries from one occurrence, with paths
anonymized and the client Id string (always "SVN/1.6.6 (r40053)
neon/0.28.3") omitted for brevity:
[[[
127.0.0.1 - UserK [16/Dec/2009:20:25:17 -0800]
"MKACTIVITY /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 201 312 "-"
127.0.0.1 - UserK [16/Dec/2009:23:20:45 -0800]
"PROPPATCH /svn/RepoJ/!svn/wrk/ActivityA/branches/BranchD/PathG HTTP/1.1" 207 
582 "-"
127.0.0.1 - UserM [17/Dec/2009:00:12:45 -0800]
"MERGE /svn/RepoJ/branches/BranchH/PathC HTTP/1.1" 200 953 "-"
127.0.0.1 - UserM [17/Dec/2009:00:12:59 -0800]
"MERGE /svn/RepoJ/branches/BranchP/PathC HTTP/1.1" 200 965 "-"
127.0.0.1 - UserK [17/Dec/2009:00:21:19 -0800]
"DELETE /svn/RepoJ/!svn/act/ActivityA HTTP/1.1" 204 - "-"
127.0.0.1 - UserK [17/Dec/2009:00:32:45 -0800]
"MKACTIVITY /svn/RepoJ/!svn/act/ActivityB HTTP/1.1" 201 312 "-"
127.0.0.1 - UserK [17/Dec/2009:00:39:58 -0800]
"MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
"PROPPATCH /svn/RepoJ/!svn/wrk/ActivityB/branches/BranchD/PathE HTTP/1.1" 207 
569 "-"
127.0.0.1 - UserK [17/Dec/2009:00:39:59 -0800]
"MERGE /svn/RepoJ/branches/BranchD/PathJ HTTP/1.1" 500 279 "-"
]]]

and the corresponding error log:
[[[
[Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
Could not MERGE resource "/svn/RepoJ/!svn/act/ActivityA" into 
"/svn/RepoJ/branches/BranchD/PathJ". [500, #0]
[Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
There was a problem opening the transaction specified by this activity. [500, 
#2]
[Thu Dec 17 00:39:58 2009] [error] [client 127.0.0.1]
Can't open file '/svn/repos/RepoJ/db/transactions/props': No such file or 
directory [500, #2]
]]]

The error appears to match the second-last MERGE in the access log.  I
don't have a copy of the body of the MERGE request in this case.

The error refers to an activity Id that was the subject of the DELETE
that was logged 18 minutes earlier.  A comment from one of our guys
looking at the logs: "We are seeing 22,600 changes for Activity A and no
merge at the end (just a DELETE). Then we see about 931 commands for
Activity B and a MERGE that fails for Activity A."

Is this a case where the client side shouldn't be sending a MERGE at
this point?  Even if that is the primary problem, I'm still interested
in Subversion's response to this MERGE request.

I want to assert(txn-id != NULL) in the relevant FSFS functions, as in
the attached patch.  I determined the comments in the attached patch
after manually passing NULL instead of the txn-id.  However, that won't
get me far on its own.

To try to reproduce the error, do you think I should try sending a
sequence of DAV commands similar to the sequence shown above?

Any other ideas on what might be happening or how to diagnose this?

- Julian

Add assertions and comments in FSFS where the txn-id must be non-null.
Errors have occasionally been seen that indicate it must have been null in
at least one of these places.

This is a patch in progress, to help with debugging, not necessarily to be
committed.

* subversion/libsvn_fs_fs/fs_fs.c
  (path_txn_dir, get_txn_proplist, svn_fs_fs__change_txn_props,
   svn_fs_fs__get_txn, commit_body, svn_fs_fs__txn_proplist): Add assertions
    and comments.
--This line, and those below, will be ignored--

Index: subversion/libsvn_fs_fs/fs_fs.c
===================================================================
--- subversion/libsvn_fs_fs/fs_fs.c	(revision 956813)
+++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
@@ -317,6 +317,7 @@ path_revprops(svn_fs_t *fs, svn_revnum_t
 static APR_INLINE const char *
 path_txn_dir(svn_fs_t *fs, const char *txn_id, apr_pool_t *pool)
 {
+  SVN_ERR_ASSERT(txn_id != NULL);
   return svn_dirent_join_many(pool, fs->path, PATH_TXNS_DIR,
                               apr_pstrcat(pool, txn_id, PATH_EXT_TXN, NULL),
                               NULL);
@@ -4584,6 +4585,12 @@ get_txn_proplist(apr_hash_t *proplist,
 {
   svn_stream_t *stream;
 
+  /* The following error has been seen several times in real life:
+   * "Can't open file '[...]/db/transactions/props': No such file or directory"
+   * The correct path would be: '[...]/db/transactions/<txn-id>/props'.
+   * The only way that can happen is if txn_id is NULL here. */
+  SVN_ERR_ASSERT(txn_id != NULL);
+
   /* Open the transaction properties file. */
   SVN_ERR(svn_stream_open_readonly(&stream, path_txn_props(fs, txn_id, pool),
                                    pool, pool));
@@ -4622,6 +4629,8 @@ svn_fs_fs__change_txn_props(svn_fs_txn_t
   int i;
   svn_error_t *err;
 
+  /* ### Passing txn-id NULL here -> incorrect results but not an error msg. */
+  SVN_ERR_ASSERT(txn->id != NULL);
   err = get_txn_proplist(txn_prop, txn->fs, txn->id, pool);
   /* Here - and here only - we need to deal with the possibility that the
      transaction property file doesn't yet exist.  The rest of the
@@ -4669,6 +4678,8 @@ svn_fs_fs__get_txn(transaction_t **txn_p
   txn = apr_pcalloc(pool, sizeof(*txn));
   txn->proplist = apr_hash_make(pool);
 
+  /* ### Passing txn-id NULL here -> "Can't open 'transactions/props'" */
+  SVN_ERR_ASSERT(txn_id != NULL);
   SVN_ERR(get_txn_proplist(txn->proplist, fs, txn_id, pool));
   root_id = svn_fs_fs__id_txn_create("0", "0", txn_id, pool);
 
@@ -6040,6 +6051,8 @@ commit_body(void *baton, apr_pool_t *poo
       apr_hash_t *proplist = apr_hash_make(pool);
       svn_sqlite__stmt_t *stmt;
 
+      /* ### Passing txn-id NULL here -> ??? (no effect seen yet) */
+      SVN_ERR_ASSERT(cb->txn->id != NULL);
       SVN_ERR(get_txn_proplist(proplist, cb->fs, cb->txn->id, pool));
 
       SVN_ERR(svn_sqlite__get_statement(&stmt, ffd->revprop_db,
@@ -7111,6 +7124,9 @@ svn_fs_fs__txn_proplist(apr_hash_t **tab
                         apr_pool_t *pool)
 {
   apr_hash_t *proplist = apr_hash_make(pool);
+
+  /* ### Passing txn-id NULL here -> "Can't open 'transactions/props'" */
+  SVN_ERR_ASSERT(txn->id != NULL);
   SVN_ERR(get_txn_proplist(proplist, txn->fs, txn->id, pool));
   *table_p = proplist;
 

Reply via email to