> On Sat, 11 May 2024 at 03:00, Williams, James P. {Jim} (JSC-CD4)[KBR
> Wyle Services, LLC] via users <users@subversion.apache.org> wrote:

> You previously mentioned Subversion 1.14.1, is that on the server or on the
> client?

I'm using 1.14.1 on both the client and server.

> Still it would be interesting to compare just to rule out a problem within
> the repository. You can use svnserve directly or tunneled over SSH, see the
> Subversion book:

With svnserve 1.14.1, I see no problems. Checkouts complete every time. I'm not sure what to conclude about that. It's a different protocol, so it doesn't necessarily exonerate the client or the network.

> > #0 epoll_wait /usr/lib64/libc.so.6
> Waiting for a reply from the server ... ?

Yeah, that'd be my guess. When the hang occurs, I've got about 90% of the working copy checked out. I expect the client is waiting for more bytes to arrive on the socket.

> Do you see any activity on the server (CPU / disk) during this time?

The server is well-behaved throughout all of my tests. It shows no CPU spike or log messages hinting that it's noticed a problem.

> Memory allocation?

Yeah, both forms of core dumps I've seen have memory/pool allocation at the top of the stack. Maybe some odd reentrancy case is being tickled that's not often seen. It points at a likely secondary problem, a bug in the client.

> Parsing the XML message from the server?
> Can you catch/view the actual XML message sent from the server? I'm thinking
> if this is mangled in some strange way that is upsetting the XML parser.

We're not able to install tools like Wireshark, if that's what you're suggesting. I don't see a way to get to that XML other than doctoring SVN source.

> Again something with memory allocation - same here, can you see what the
> server is actually sending?

Same answer.

> I don't immediately see the call stacks above and the fact that it would fail
> more often if the WC is on an NFS drive. Possibly if the NFS drive is slower
> and this causes some kind of timeout? Can you create a ramdisk and have the
> WC there temporary and see if there is a difference?

I think NFS definitely slows things down, and that change in timing makes the hangs and crashes more likely.
Unfortunately, I don't have the access needed to create a ramdisk. I'm able to check out onto a local or NFS-mounted disk though. I would think the former is equivalent. No?

Thanks for the reply, Daniel.

Jim

From: Daniel Sahlberg <daniel.l.sahlb...@gmail.com>
Sent: Saturday, May 11, 2024 1:51 PM
To: Williams, James P. {Jim} (JSC-CD4)[KBR Wyle Services, LLC] <james.p.willi...@nasa.gov>
Cc: users@subversion.apache.org
Subject: Re: [EXTERNAL] Re: svn checkout Hangs/Crashes/Succeeds Over HTTP

Hi,

I've added a few comments/questions below.

Kind regards,
Daniel Sahlberg

On Sat, 11 May 2024 at 03:00, Williams, James P. {Jim} (JSC-CD4)[KBR Wyle Services, LLC] via users <users@subversion.apache.org> wrote:

> How did you upgrade your server from RHEL 6 to RHEL 8?

Because so much changed from RHEL 6 to 8, including Apache from 2.2.15 to 2.4.37, all the Apache modules, etc., I started from the skeleton configuration the operating system provides and made mostly the same customizations we had for RHEL 6, or modernized them where the docs said things changed. Mostly, that was tweaks to authentication (from LDAP to Kerberos), SSL, and the SVN endpoints. Browser access to all SVN and ViewVC pages seems to work fine.

You previously mentioned Subversion 1.14.1, is that on the server or on the client?

[...]

> And do the problems happen if you use svn:// rather than https:// ?

I thought svn:// worked only with svnserve, which we don't run. Are you suggesting I try to run it as a test, or that I consider abandoning Apache in favor of it? Yikes; that'd be painful.

I hear you on the HTTP integration. We have about 2000 repos and a few hundred developers. I've supported that server for at least 15 years, and it hasn't been too bad...until now.
I have personally only ever used Subversion over http/https (except for testing purposes) and I haven't had any of the problems described by Nico - I guess YMMV...

Still it would be interesting to compare just to rule out a problem within the repository. You can use svnserve directly or tunneled over SSH, see the Subversion book:
https://svnbook.red-bean.com/en/1.7/svn.serverconfig.svnserve.html#svn.serverconfig.svnserve.sshauth

On Fri, May 10, 2024 at 4:17 PM Williams, James P. {Jim} (JSC-CD4)[KBR Wyle Services, LLC] via users <users@subversion.apache.org> wrote:

> I'm upgrading an Apache HTTP server of our SVN repos on RedHat Enterprise
> Linux 8. Using Subversion 1.14.1, svn checkout of even a small, simple repo
> with about 150 files hangs about 90% of the time, crashes 5%, and succeeds
> 5%. Given enough time, the hangs eventually time out after checking out much
> of the repo. A debugger shows the following stack during the hang.
>
> #0 epoll_wait /usr/lib64/libc.so.6

Waiting for a reply from the server ... ? Do you see any activity on the server (CPU / disk) during this time?

> #1 impl_pollset_poll /usr/lib64/libapr-1.so.0
> #2 serf_context_run /usr/lib64/libserf-1.so.0
> #3 svn_ra_serf__context_run /usr/lib64/libsvn_ra_serf-1.so.0
> #4 finish_report /usr/lib64/libsvn_ra_serf-1.so.0
> #5 svn_wc_crawl_revisions5 /usr/lib64/libsvn_wc-1.so.0
> #6 update_internal.isra /usr/lib64/libsvn_client-1.so.0
> #7 svn_client__update_internal /usr/lib64/libsvn_client-1.so.0
> #8 svn_client__checkout_internal /usr/lib64/libsvn_client-1.so.0
> #9 svn_client_checkout3 /usr/lib64/libsvn_client-1.so.0
> #10 svn_cl__checkout
> #11 sub_main
> #12 main
>
> strace shows repeated calls to epoll_wait about 1 sec apart.
>
> When the checkout crashes, it's a SIGSEGV with this stack,
>
> #0 apr_pool_create_ex (libapr-1.so.0)

Memory allocation?
> #1 svn_pool_create_ex (libsvn_subr-1.so.0)
> #2 update_opened (libsvn_ra_serf-1.so.0)
> #3 expat_start (libsvn_ra_serf-1.so.0)

Parsing the XML message from the server? Can you catch/view the actual XML message sent from the server? I'm thinking if this is mangled in some strange way that is upsetting the XML parser.

> #4 expat_start_handler (libsvn_subr-1.so.0)
> #5 doContent (libexpat.so.1)
> #6 contentProcessor (libexpat.so.1)
> #7 XML_ParseBuffer (libexpat.so.1)
> #8 svn_xml_parse (libsvn_subr-1.so.0)
> #9 expat_response_handler (libsvn_ra_serf-1.so.0)
> #10 process_buffer.isra.9 (libsvn_ra_serf-1.so.0)
> #11 finish_report (libsvn_ra_serf-1.so.0)
> #12 svn_wc_crawl_revisions5 (libsvn_wc-1.so.0)
> #13 update_internal.isra.0 (libsvn_client-1.so.0)
> #14 svn_client__update_internal (libsvn_client-1.so.0)
> #15 svn_client__checkout_internal (libsvn_client-1.so.0)
> #16 svn_client_checkout3 (libsvn_client-1.so.0)
> #17 svn_cl__checkout (svn)
> #18 sub_main (svn)
> #19 main (svn)
> #20 __libc_start_main (libc.so.6)
> #21 _start (svn)
>
> or this one,
>
> #0 apr_allocator_alloc (libapr-1.so.0)
> #1 serf_bucket_mem_alloc (libserf-1.so.0)

Again something with memory allocation - same here, can you see what the server is actually sending?
> #2 serf_bucket_response_create (libserf-1.so.0)
> #3 serf__process_connection (libserf-1.so.0)
> #4 serf_event_trigger (libserf-1.so.0)
> #5 serf_context_run (libserf-1.so.0)
> #6 svn_ra_serf__context_run (libsvn_ra_serf-1.so.0)
> #7 finish_report (libsvn_ra_serf-1.so.0)
> #8 svn_wc_crawl_revisions5 (libsvn_wc-1.so.0)
> #9 update_internal.isra (libsvn_client-1.so.0)
> #10 svn_client__update_internal (libsvn_client-1.so.0)
> #11 svn_client__checkout_internal (libsvn_client-1.so.0)
> #12 svn_client_checkout3 (libsvn_client-1.so.0)
> #13 svn_cl__checkout (svn)
> #14 sub_main (svn)
> #15 main (svn)
>
> After a failure, I'm left with a half-checked out working copy with many
> locks. I can complete it with svn cleanup and another svn checkout, but
> that's not realistic for our CI/CD or general use. Server logs show no
> indication of a problem; the server appears healthy.
>
> I've tried a million things before submitting this bug report, read half a
> million posts and searches, but haven't been able to get past this. I'd sure
> appreciate any ideas you have on the way forward. Here's a bit more about my
> system.
>
> svn, version 1.14.1 (r1886195)
> * ra_svn : Module for accessing a repository using the svn network protocol.
>   - with Cyrus SASL authentication
>   - handles 'svn' scheme
> * ra_local : Module for accessing a repository on local disk.
>   - handles 'file' scheme
> * ra_serf : Module for accessing a repository via WebDAV protocol using serf.
>   - using serf 1.3.9 (compiled with 1.3.9)
>   - handles 'http' scheme
>   - handles 'https' scheme
>
> The following authentication credential caches are available:
>
> * Plaintext cache in /home/me/.subversion
> * Gnome Keyring
> * GPG-Agent
>
> svn 1.10.2 was failing the same way before we upgraded to 1.14.1 as a
> possible fix.
> Checking out to a local disk succeeds more often, but still hangs and
> crashes.
> Checking out to an NFS drive just makes it worse.

I don't immediately see the call stacks above and the fact that it would fail more often if the WC is on an NFS drive. Possibly if the NFS drive is slower and this causes some kind of timeout? Can you create a ramdisk and have the WC there temporary and see if there is a difference?

> And here's more about our Apache.
>
> Server version: Apache/2.4.37 (Red Hat Enterprise Linux)
> Server built: Aug 30 2023 11:01:53
> Server's Module Magic Number: 20120211:83
> Server loaded: APR 1.6.3, APR-UTIL 1.6.1
> Compiled using: APR 1.6.3, APR-UTIL 1.6.1
> Architecture: 64-bit
> Server MPM: worker
>   threaded: yes (fixed thread count)
>   forked: yes (variable process count)
> Server compiled with....
>  -D APR_HAS_SENDFILE
>  -D APR_HAS_MMAP
>  -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
>  -D APR_USE_SYSVSEM_SERIALIZE
>  -D APR_USE_PTHREAD_SERIALIZE
>  -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
>  -D APR_HAS_OTHER_CHILD
>  -D AP_HAVE_RELIABLE_PIPED_LOGS
>  -D DYNAMIC_MODULE_LIMIT=256
>  -D HTTPD_ROOT="/etc/httpd"
>  -D SUEXEC_BIN="/usr/sbin/suexec"
>  -D DEFAULT_PIDLOG="run/httpd.pid"
>  -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
>  -D DEFAULT_ERRORLOG="logs/error_log"
>  -D AP_TYPES_CONFIG_FILE="conf/mime.types"
>  -D SERVER_CONFIG_FILE="conf/httpd.conf"
>
> Access to this server from a browser with SVN and ViewVC pages seems to work.
> Authentication is over Kerberos with mod_auth_gssapi.
> Authorization uses AuthzSVNAccessFile and an access file.
> SSL is used with SSLCryptoDevice set to builtin, based on this.
> I've tried all three MPMs with no change, based on another post: prefork,
> worker, and event.
> We've had Apache running on RedHat 6 with these repos for many years.
>
> I'd be glad to provide additional details or run more tests. Thanks for any
> ideas you have, and for supporting this software.
>
> Jim
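
The thread compares hang/crash/success rates across setups (https vs. svnserve, local disk vs. NFS). One way to put numbers on such comparisons is a small harness that repeats the checkout and tallies the outcomes. The following is only a sketch under assumptions, not something anyone in the thread ran: the function names (`classify_run`, `tally`, `checkout_cmd`), the `wc-N` target directories, the run count, and the 300-second timeout are all hypothetical, and it assumes a POSIX system where a process killed by a signal (such as SIGSEGV) reports a negative return code.

```python
import subprocess
import sys
from collections import Counter


def classify_run(cmd, timeout_s):
    """Run cmd once and label the outcome: hang, crash, failure, or success."""
    try:
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Still running after timeout_s seconds; subprocess.run kills the child.
        return "hang"
    if proc.returncode < 0:
        # On POSIX a negative code means death by signal (e.g. -11 for SIGSEGV).
        return "crash"
    return "success" if proc.returncode == 0 else "failure"


def tally(make_cmd, runs, timeout_s):
    """Call make_cmd(i) for each run index and count the outcomes."""
    return Counter(classify_run(make_cmd(i), timeout_s) for i in range(runs))


def checkout_cmd(url, i):
    """Build an svn checkout command into a fresh, hypothetical wc-<i> directory."""
    return ["svn", "checkout", "--quiet", "--non-interactive", url, f"wc-{i}"]


def main(argv):
    # Assumed values: 20 runs, 300 s before a run is counted as a hang.
    counts = tally(lambda i: checkout_cmd(argv[1], i), runs=20, timeout_s=300)
    print(dict(counts))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

Run from a directory on local disk and again from one on NFS, with the repository URL as the only argument, the two tallies can be compared directly. Because each run checks out into a new directory, leftover locks from a failed run do not affect the next one.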