First of all thanks for asking these questions, my effort to answer them gave me even more confidence in the changes.
On Wed, Nov 21, 2012 at 1:36 AM, Bert Huijben <b...@qqmail.nl> wrote: > What would be the effect of *not* patching the client? When I first wrote the patch and tried to apply it to 1.6 I was getting SIGABRT's from some tests that exercised the error handling code that I later patched in commit_util.c. For example basic_tests.py 8: basic corruption detection. In producing the response to this email I removed the client library change since I'd forgotten exactly which test was failing, expecting it to fail and refresh my memory. I was surprised to find it did not fail. The command I used to remove the client side changes was: svn merge -c-1404736 ^/subversion/branches/1.6.x-rep_write_cleanup@1404736 I went back to the original patch I'd been using before I'd developed the client changes and was once again able to see the problem. When I produced the failure under gdb I saw that the unlock call in rep_write_cleanup was returning an error. This surprised me, since I didn't really expect there to be a problem unlocking the transaction ever. What was happening is that the iterpool was not cleaned up so that the rep_write_cleanup was not called until after the client lib called the abort_edit function on the editor it was driving, which calls svn_fs_abort_txn and ultimately deletes the transaction. So I resolved the problem by fixing the fact that we weren't cleaning up our iterpool, which is something that was resolved in 1.7 when the wc code was rewritten. The error stopped being thrown and my SIGABRT went away. However, the real reason I was getting SIGABRT was that I was passing err->apr_err without copying it. Ultimately, the error gets passed up the chain in APR until ultimately it gets passed outside of the pool destruction call that free'd the memory it was in. Daniel's review caught this error and I resolved it before the 1.6 patch was made. However, I still thought the client change was necessary so I still included it. So now to answer your question... The effect of not including this change now is leaving unnecessary data in memory in the case of an error while committing. If you're calling svn_client_commit from a long lived pool and you're hitting those error paths you have a memory leak. My original purpose for adding the change is essentially moot. Since the client always abort_edit if it got an error during the edit drive which will remove the transaction entry from memory (including the being_written) flag and the protorev file from disk. > What impact does this have on other users of the fs/repos apis? None. Other users delete the entire transaction if something fails along the way of building it. mod_dav_svn is unique in that the communications of individual pieces of the commit can be retried (though we don't do that with our client implementation). So with the new code either they destroy/cleanup the pool triggering the new rep_write_cleanup() call which wipes out the failed representation and removes the lock on the protorevfile. Or they do not and end up destroying the transaction themselves (probably with an abort_edit call) and then the rep_write_cleanup will return an error which goes nowhere since the error occurs in the cleanup/destroy call of a pool which returns a void. > In other words: Is this a breaking change that we shouldn't port back to 1.6 > clients and try to patch server only? No I don't think this change breaks anything for anyone. We could remove the client change now as it really doesn't have any impact. But it's a valid change for other reasons as mentioned above, so we might as well leave it. Even without the client library change you're still touching the client. The solution I'm using here is a fix to the fsfs layer. Which when using ra_local the client is the server. Moving the fix up to mod_dav_svn is much harder. Ultimately the problem here is that the fs is receiving the representation of a node via a window handler. The window handler only has two things you can tell it. 1) Here is some data. 2) There is no more data. As such there is no way for mod_dav_svn to communicate to the fs layer that the representation I sent you is incomplete. Additionally, the whole thing is driven through the svndiff parsing code, which has an option (error_on_early_close) to say error if the svndiff looks incomplete. If this option is true then the window handler is never called with NULL for the window (we've reached the end of the data). Setting this flag to FALSE would make that call happen but then fsfs would have no idea that we feed it incomplete data, thus adding an invalid representation to the protorev file. Long term the fix is probably refactoring some public interfaces. But that's not a very attractive option for backport. Nor am I in a huge hurry to do it since I think making these changes is going to cascade out into a lot of other code.