Hi,
we seem to be raising and discussing a few different issues in this thread. I've tried to summarize current status with some added observations. I propose to start up separate mail threads when discussing one of these specific issues. 1. When a very low Timeout is set in the apache server configuration, the server truncates response bodies in some situations: 1a. I see this happening with responses to GET requests, but it's not clear yet why this happens, and if we need to worry about it, as a 3 second timeout where the default is 300 secs is really low. Philip has reported this happening with a REPORT response (http://svn.haxx.se/dev/archive-2012-11/0097.shtml). 1b. If the server aborts a connection, ra_serf doesn't interpret this as an error. This results in many different types of error messages, like invalid checksums etc. A check has been added to serf trunk r1678 that will raise an error if the response body is shorter than expected. 2. ra_serf consumes way too much memory while doing large checkout's (issue #4196). This is caused (at least partially) by ra_serf not closing empty directories until the very end of the editor run, thereby keeping info objects in memory. Mike & Justin have tried to fix this in r1406366. In my opinion, the '! dir->ref_count in close_all_dirs' asserts that Philip keeps seeing (but I can't reproduce) are related to this specific issue, and have not been fixed by r1406366. 3. Ivan added a potential issue because ra_serf+serf are not handling events on each open connection regularly. This can result in timeouts, where the server closes a connection because it thinks the client isn't responding anymore. If this happens on one of the connections where the GET/PROPFIND requests are sent, I don't think this will break the update/checkout: the server will close the connection and serf will resend the requests on a new connection. However, if such a timeout occurs on the connection where ra_serf is reading a large REPORT response, the response will be truncated and svn will stop with an error. Two observations: 3a. When processing connections, serf will read as much data from one connection as the network allows it to. Only when it can't read more from the connection, it will go on with the next connection. Ivan has provided a patch to limit the amount of responses read per serf_context_run call per connection to 4. (http://svn.haxx.se/dev/archive-2012-11/0193.shtml) 3b. The same observation as 3a, but then applied to the connection where we receive the REPORT response: ra_serf will keep reading as much data as possible, and create new GET/PROPFIND requests as a result, without giving serf the time to send these new requests on other connections. I don't think this can result in svn stopping, but might result in stalls as we don't get enough requests out to fetch files. 4. Ivan further observed that the spillbuf mechanism, used to keep ra_serf reading from the REPORT connection while waiting for slow disc i/o, has two drawbacks: 4a. the amount of data stored in the spillbuf can get very large when processing the REPORT response is paused. When resuming, all data in the spillbuf is read and processed in one go. As this can be 1MB+ it can take a while, potentially timing out the REPORT response connection -> svn will stop. 4b. The GET and PROPFIND requests resulting from parsing that big chunk of data will all be sent on the same connection, where we have 4 available to send more requests in parallel. 5. In one of Philip error-reporting mails, there was mention of a "408 Request Time-out" response. (http://svn.haxx.se/dev/archive-2012-11/0076.shtml). No idea if the server sends this response because one of the above issues, or if this has a different root caus. What I do know is that serf does not expect this response as it's not related to an outstanding request. In that case, serf will return and APR_GENERAL error. I think 3+4+5 are relatively easy to solve, but probably difficult to reproduce to validate the implemented solutions. 1a and 2 require more analysis, I'll try to focus on these two. Lieven On Thu, Nov 1, 2012 at 1:29 PM, Philip Martin <phi...@codematters.co.uk> wrote: > I've been experimenting checking out subversion/tags from my local > mirror of the Subversion repository using a client and mod_dav_svn from > trunk. I'm getting several different errors seemingly at random, I > suspect they may be related to some common underlying problem. > > Sometimes I see a checksum error such as this: > > ../src/subversion/svn/checkout-cmd.c:168: (apr_err=200014) > ../src/subversion/libsvn_client/checkout.c:179: (apr_err=200014) > ../src/subversion/libsvn_client/update.c:579: (apr_err=200014) > ../src/subversion/libsvn_client/update.c:440: (apr_err=200014) > ../src/subversion/libsvn_wc/adm_crawler.c:858: (apr_err=200014) > ../src/subversion/libsvn_ra_serf/update.c:2515: (apr_err=200014) > ../src/subversion/libsvn_ra_serf/util.c:2023: (apr_err=200014) > ../src/subversion/libsvn_ra_serf/util.c:2004: (apr_err=200014) > ../src/subversion/libsvn_ra_serf/update.c:1065: (apr_err=200014) > ../src/subversion/libsvn_ra_serf/update.c:895: (apr_err=200014) > ../src/subversion/libsvn_wc/update_editor.c:3919: (apr_err=200014) > ../src/subversion/libsvn_subr/checksum.c:469: (apr_err=200014) > svn: E200014: Checksum mismatch for > '/home/pm/sw/subversion/obj/wc/0.25.0/subversion/libsvn_fs/tree.c': > expected: c38c3e8df15f36a45b8a626e4d82a25d > actual: 2532722e781ef9afe673c89fd467d5f0 > > The expected checksum is the checksum in the repository. It's a > different file each time this error occurs. > > Another error is an assert: > > Program received signal SIGABRT, Aborted. > 0x00007ffff6530425 in __GI_raise (sig=<optimised out>) > at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 > 64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. > (gdb) bt > #0 0x00007ffff6530425 in __GI_raise (sig=<optimised out>) > at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 > #1 0x00007ffff6533b8b in __GI_abort () at abort.c:91 > #2 0x00007ffff6d41b78 in svn_error_abort_on_malfunction (can_return=1, > file=0x7ffff5a97058 "../src/subversion/libsvn_ra_serf/update.c", line=673, > expr=0x7ffff5a970fe "! dir->ref_count") > at ../src/subversion/libsvn_subr/error.c:678 > #3 0x00007ffff6d41bcc in svn_error__malfunction (can_return=1, > file=0x7ffff5a97058 "../src/subversion/libsvn_ra_serf/update.c", line=673, > expr=0x7ffff5a970fe "! dir->ref_count") > at ../src/subversion/libsvn_subr/error.c:703 > #4 0x00007ffff5a85d50 in close_all_dirs (dir=0x7fffe885f0c8) > at ../src/subversion/libsvn_ra_serf/update.c:673 > #5 0x00007ffff5a85cda in close_all_dirs (dir=0x7fffe88d00c8) > at ../src/subversion/libsvn_ra_serf/update.c:669 > #6 0x00007ffff5a85cda in close_all_dirs (dir=0x7fffe89860c8) > at ../src/subversion/libsvn_ra_serf/update.c:669 > #7 0x00007ffff5a85cda in close_all_dirs (dir=0x7ffff7f2a0c8) > at ../src/subversion/libsvn_ra_serf/update.c:669 > #8 0x00007ffff5a8b01b in finish_report (report_baton=0x7ffff7fb8060, > pool=0x7ffff7fb5028) at ../src/subversion/libsvn_ra_serf/update.c:2665 > #9 0x00007ffff76945cc in svn_wc_crawl_revisions5 (wc_ctx=0x7ffff7f670e0, > local_abspath=0x7ffff7fb5168 "/home/pm/sw/subversion/obj/wc", > reporter=0x7ffff5ca07c0, report_baton=0x7ffff7fb8060, restore_files=1, > depth=svn_depth_unknown, honor_depth_exclude=1, > depth_compatibility_trick=0, use_commit_times=0, > cancel_func=0x416b1b <svn_cl__check_cancel>, cancel_baton=0x0, > notify_func=0x41c703 <notify>, notify_baton=0x7ffff7f673d0, > scratch_pool=0x7ffff7fb5028) > at ../src/subversion/libsvn_wc/adm_crawler.c:858 > > This appears to be in the end of the checkout code but the working copy > is far from complete. > > I have also seen this error: > > ../src/subversion/svn/checkout-cmd.c:168: (apr_err=175009) > ../src/subversion/libsvn_client/checkout.c:179: (apr_err=175009) > ../src/subversion/libsvn_client/update.c:579: (apr_err=175009) > ../src/subversion/libsvn_client/update.c:440: (apr_err=175009) > ../src/subversion/libsvn_wc/adm_crawler.c:858: (apr_err=175009) > ../src/subversion/libsvn_ra_serf/update.c:2515: (apr_err=175009) > ../src/subversion/libsvn_ra_serf/util.c:2028: (apr_err=175009) > ../src/subversion/libsvn_ra_serf/util.c:1850: (apr_err=175009) > svn: E175009: Premature EOF seen from server (http status=200) > ../src/subversion/libsvn_ra_serf/util.c:1850: (apr_err=70014) > svn: E070014: End of file found > > All of these look like some problem with the data transfer, serf losing > data or mod_dav_svn not sending it, but I suppose they could also be > caused by memory corruption. These errors generally occur when there > are thousands of files in the pristine store and tens of thousands of > files in the working copy. > > I've seen these error while checking out across my LAN and on localhost > so I don't think it can be network problem. I'm using serf 1.1.x@1674 > and subversion/mod_dav_svn from trunk (currently r1404529 but I was > seeing these errors a couple of weeks ago as well). I'm seeing these > errors on Debian/stable and Ubuntu/12.04. > > -- > Philip