> On Sat, May 11, 2024 at 03:00, Williams, James P. {Jim} (JSC-CD4)[KBR 
> Wyle Services, LLC] via users <users@subversion.apache.org> wrote:
> You previously mentioned Subversion 1.14.1, is that on the server or on the 
> client?

I'm using 1.14.1 on both the client and server.

> Still it would be interesting to compare just to rule out a problem within 
> the repository. You can use svnserve directly or tunneled over SSH, see the 
> Subversion book:

With svnserve 1.14.1, I see no problems.  Checkouts complete every time.  I'm 
not sure what to conclude about that.  It's a different protocol, so it doesn't 
necessarily exonerate the client or the network.
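
To put numbers on the https:// versus svn:// comparison, the checkout could 
be repeated and the outcomes tallied.  The following is only a minimal sketch: 
the function names, run count, timeout, and example URLs are assumptions, not 
anything from our setup.  It relies on the POSIX convention that a negative 
return code from subprocess means the process died from a signal.

```python
import shutil
import subprocess
import tempfile
from collections import Counter
from pathlib import Path


def classify(returncode, timed_out):
    """Map one checkout attempt to 'hang', 'crash', 'fail', or 'ok'."""
    if timed_out:
        return "hang"
    if returncode < 0:  # POSIX: killed by a signal, e.g. -11 for SIGSEGV
        return "crash"
    return "ok" if returncode == 0 else "fail"


def run_checkouts(url, runs=20, timeout_s=300, parent=None):
    """Check out `url` `runs` times into fresh directories and tally outcomes.

    `parent` selects where the working copies are created, so the same
    function can be pointed at a local disk or an NFS mount.
    """
    tally = Counter()
    for _ in range(runs):
        workdir = Path(tempfile.mkdtemp(dir=parent))
        try:
            proc = subprocess.run(
                ["svn", "checkout", "--quiet", "--non-interactive",
                 url, str(workdir / "wc")],
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            tally[classify(proc.returncode, False)] += 1
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this.
            tally[classify(0, True)] += 1
        finally:
            shutil.rmtree(workdir, ignore_errors=True)
    return tally


# Example use (hypothetical URLs and mount point):
#   print(run_checkouts("https://svn.example.org/repos/demo"))
#   print(run_checkouts("svn://svn.example.org/demo", parent="/mnt/nfs/tmp"))
```

Running it once per URL scheme, and once per target file system, would give 
comparable hang/crash/success counts instead of impressions.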

> >       #0  epoll_wait                   /usr/lib64/libc.so.6
> Waiting for a reply from the server ... ?

Yeah, that'd be my guess.  When the hang occurs, I've got about 90% of the 
working copy checked out.  I expect the client is waiting for more bytes to 
arrive on the socket.

> Do you see any activity on the server (CPU / disk) during this time?

The server is well-behaved throughout all of my tests.  It shows no CPU spike 
or log messages hinting that it's noticed a problem.

> Memory allocation?

Yeah, both forms of core dumps I've seen have memory/pool allocation at the top 
of the stack.  Maybe some odd reentrancy case is being tickled that's not often 
seen.  It points at a likely secondary problem, a bug in the client.

> Parsing the XML message from the server?
> Can you catch/view the actual XML message sent from the server? I'm thinking 
> if this is mangled in some strange way that is upsetting the XML parser.

We're not able to install tools like wireshark, if that's what you're 
suggesting.  I don't see a way to get to that XML other than doctoring SVN 
source.
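
If the response body could be captured some other way (say, by logging on the 
server side), a well-formedness check would be easy.  This is only a sketch 
under that assumption: the function name and the report.xml file name are 
hypothetical.  It uses Python's expat binding, so it exercises the same kind 
of parser that appears in the crash stack.  A clean parse would not rule out a 
client bug; it would only rule out mangled XML.

```python
import xml.parsers.expat


def check_well_formed(data: bytes):
    """Return (True, '') if `data` parses as XML, else (False, description)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        reason = xml.parsers.expat.ErrorString(err.code)
        return False, f"line {err.lineno}, column {err.offset}: {reason}"
    return True, ""


# Example use (hypothetical capture file):
#   with open("report.xml", "rb") as fh:
#       print(check_well_formed(fh.read()))
```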

> Again something with memory allocation - same here, can you see what the 
> server is actually sending?

Same answer.

> I don't immediately see the connection between the call stacks above and the 
> fact that it fails more often if the WC is on an NFS drive. Possibly the NFS 
> drive is slower and this causes some kind of timeout? Can you create a 
> ramdisk, put the WC there temporarily, and see if there is a difference?

I think NFS definitely slows things down, and that change in timing makes the 
hangs and crashes more likely.  Unfortunately, I don't have the access needed 
to create a ramdisk.  I'm able to check out onto a local or NFS-mounted disk 
though.  I would think the former is equivalent.  No?

Thanks for the reply, Daniel.

Jim

From: Daniel Sahlberg <daniel.l.sahlb...@gmail.com>
Sent: Saturday, May 11, 2024 1:51 PM
To: Williams, James P. {Jim} (JSC-CD4)[KBR Wyle Services, LLC] 
<james.p.willi...@nasa.gov>
Cc: users@subversion.apache.org
Subject: Re: [EXTERNAL] Re: svn checkout Hangs/Crashes/Succeeds Over HTTP


Hi,

I've added a few comments/questions below.

Kind regards,
Daniel Sahlberg

On Sat, May 11, 2024 at 03:00, Williams, James P. {Jim} (JSC-CD4)[KBR Wyle 
Services, LLC] via users <users@subversion.apache.org> wrote:
> How did you upgrade your server from RHEL 6 to RHEL 8?

Because so much changed from RHEL 6 to 8, including Apache from 2.2.15 to 
2.4.37, all the Apache modules, etc., I started from the skeleton configuration 
the operating system provides and made mostly the same customizations we had 
for RHEL 6, or modernized them where the docs said things changed.  Mostly, 
that was tweaks to authentication (from LDAP to Kerberos), SSL, and the SVN 
endpoints.  Browser access to all SVN and ViewVC pages seems to work fine.

You previously mentioned Subversion 1.14.1, is that on the server or on the 
client?

[...]

> And do the problems happen if you use svn:// rather than https:// ?

I thought svn:// worked only with svnserve, which we don't run.  Are you 
suggesting I try to run it as a test, or that I consider abandoning Apache in 
favor of it?  Yikes; that'd be painful.

I hear you on the HTTP integration.  We have about 2000 repos and a few hundred 
developers.  I've supported that server for at least 15 years, and it hasn't 
been too bad...until now.

I have personally only ever used Subversion over http/https (except for testing 
purposes) and I haven't had any of the problems described by Nico - I guess 
YMMV...

Still it would be interesting to compare just to rule out a problem within the 
repository. You can use svnserve directly or tunneled over SSH, see the 
Subversion book:

https://svnbook.red-bean.com/en/1.7/svn.serverconfig.svnserve.html#svn.serverconfig.svnserve.sshauth


On Fri, May 10, 2024 at 4:17 PM Williams, James P. {Jim} (JSC-CD4)[KBR
Wyle Services, LLC] via users <users@subversion.apache.org> wrote:
>
> I'm upgrading an Apache HTTP server of our SVN repos on RedHat Enterprise 
> Linux 8.  Using Subversion 1.14.1, svn checkout of even a small, simple repo 
> with about 150 files hangs about 90% of the time, crashes 5%, and succeeds 
> 5%.  Given enough time, the hangs eventually time out after checking out much 
> of the repo.  A debugger shows the following stack during the hang.
>
>       #0  epoll_wait                   /usr/lib64/libc.so.6

Waiting for a reply from the server ... ?

Do you see any activity on the server (CPU / disk) during this time?

>       #1  impl_pollset_poll             /usr/lib64/libapr-1.so.0
>       #2  serf_context_run              /usr/lib64/libserf-1.so.0
>       #3  svn_ra_serf__context_run      /usr/lib64/libsvn_ra_serf-1.so.0
>       #4  finish_report                 /usr/lib64/libsvn_ra_serf-1.so.0
>       #5  svn_wc_crawl_revisions5       /usr/lib64/libsvn_wc-1.so.0
>       #6  update_internal.isra          /usr/lib64/libsvn_client-1.so.0
>       #7  svn_client__update_internal   /usr/lib64/libsvn_client-1.so.0
>       #8  svn_client__checkout_internal /usr/lib64/libsvn_client-1.so.0
>       #9  svn_client_checkout3          /usr/lib64/libsvn_client-1.so.0
>       #10 svn_cl__checkout
>       #11 sub_main
>       #12 main
>
> strace shows repeated calls to epoll_wait about 1 sec apart.
>
> When the checkout crashes, it's a SIGSEGV with this stack,
>
>       #0  apr_pool_create_ex            (libapr-1.so.0)

Memory allocation?

>       #1  svn_pool_create_ex            (libsvn_subr-1.so.0)
>       #2  update_opened                 (libsvn_ra_serf-1.so.0)
>       #3  expat_start                   (libsvn_ra_serf-1.so.0)

Parsing the XML message from the server?

Can you catch/view the actual XML message sent from the server? I'm thinking if 
this is mangled in some strange way that is upsetting the XML parser.

>       #4  expat_start_handler           (libsvn_subr-1.so.0)
>       #5  doContent                     (libexpat.so.1)
>       #6  contentProcessor              (libexpat.so.1)
>       #7  XML_ParseBuffer               (libexpat.so.1)
>       #8  svn_xml_parse                 (libsvn_subr-1.so.0)
>       #9  expat_response_handler        (libsvn_ra_serf-1.so.0)
>       #10 process_buffer.isra.9         (libsvn_ra_serf-1.so.0)
>       #11 finish_report                 (libsvn_ra_serf-1.so.0)
>       #12 svn_wc_crawl_revisions5       (libsvn_wc-1.so.0)
>       #13 update_internal.isra.0        (libsvn_client-1.so.0)
>       #14 svn_client__update_internal   (libsvn_client-1.so.0)
>       #15 svn_client__checkout_internal (libsvn_client-1.so.0)
>       #16 svn_client_checkout3          (libsvn_client-1.so.0)
>       #17 svn_cl__checkout              (svn)
>       #18 sub_main                      (svn)
>       #19 main                          (svn)
>       #20 __libc_start_main             (libc.so.6)
>       #21 _start                        (svn)
>
> or this one,
>
>       #0  apr_allocator_alloc           (libapr-1.so.0)
>       #1  serf_bucket_mem_alloc         (libserf-1.so.0)

Again something with memory allocation - same here, can you see what the server 
is actually sending?

>       #2  serf_bucket_response_create   (libserf-1.so.0)
>       #3  serf__process_connection      (libserf-1.so.0)
>       #4  serf_event_trigger            (libserf-1.so.0)
>       #5  serf_context_run              (libserf-1.so.0)
>       #6  svn_ra_serf__context_run      (libsvn_ra_serf-1.so.0)
>       #7  finish_report                 (libsvn_ra_serf-1.so.0)
>       #8  svn_wc_crawl_revisions5       (libsvn_wc-1.so.0)
>       #9  update_internal.isra          (libsvn_client-1.so.0)
>       #10 svn_client__update_internal   (libsvn_client-1.so.0)
>       #11 svn_client__checkout_internal (libsvn_client-1.so.0)
>       #12 svn_client_checkout3          (libsvn_client-1.so.0)
>       #13 svn_cl__checkout              (svn)
>       #14 sub_main                      (svn)
>       #15 main                          (svn)
>
> After a failure, I'm left with a half-checked out working copy with many 
> locks.  I can complete it with svn cleanup and another svn checkout, but 
> that's not realistic for our CI/CD or general use.  Server logs show no 
> indication of a problem; the server appears healthy.
>
> I've tried a million things before submitting this bug report, read half a 
> million posts and searches, but haven't been able to get past this.  I'd sure 
> appreciate any ideas you have on the way forward.  Here's a bit more about my 
> system.
>
> svn, version 1.14.1 (r1886195)
>
>      * ra_svn : Module for accessing a repository using the svn network protocol.
>       - with Cyrus SASL authentication
>       - handles 'svn' scheme
>      * ra_local : Module for accessing a repository on local disk.
>       - handles 'file' scheme
>      * ra_serf : Module for accessing a repository via WebDAV protocol using serf.
>       - using serf 1.3.9 (compiled with 1.3.9)
>       - handles 'http' scheme
>       - handles 'https' scheme
>
>      The following authentication credential caches are available:
>
>      * Plaintext cache in /home/me/.subversion
>      * Gnome Keyring
>      * GPG-Agent
>
> svn 1.10.2 was failing the same way before we upgraded to 1.14.1 as a 
> possible fix.
> Checking out to a local disk succeeds more often, but still hangs and 
> crashes.  Checking out to an NFS drive just makes it worse.

I don't immediately see the connection between the call stacks above and the 
fact that it fails more often if the WC is on an NFS drive. Possibly the NFS 
drive is slower and this causes some kind of timeout? Can you create a ramdisk, 
put the WC there temporarily, and see if there is a difference?

>
> And here's more about our Apache.
>
> Server version: Apache/2.4.37 (Red Hat Enterprise Linux)
> Server built:   Aug 30 2023 11:01:53
> Server's Module Magic Number: 20120211:83
> Server loaded:  APR 1.6.3, APR-UTIL 1.6.1
> Compiled using: APR 1.6.3, APR-UTIL 1.6.1
> Architecture:   64-bit
> Server MPM:     worker
>   threaded:     yes (fixed thread count)
>     forked:     yes (variable process count)
> Server compiled with....
> -D APR_HAS_SENDFILE
> -D APR_HAS_MMAP
> -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
> -D APR_USE_SYSVSEM_SERIALIZE
> -D APR_USE_PTHREAD_SERIALIZE
> -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
> -D APR_HAS_OTHER_CHILD
> -D AP_HAVE_RELIABLE_PIPED_LOGS
> -D DYNAMIC_MODULE_LIMIT=256
> -D HTTPD_ROOT="/etc/httpd"
> -D SUEXEC_BIN="/usr/sbin/suexec"
> -D DEFAULT_PIDLOG="run/httpd.pid"
> -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
> -D DEFAULT_ERRORLOG="logs/error_log"
> -D AP_TYPES_CONFIG_FILE="conf/mime.types"
> -D SERVER_CONFIG_FILE="conf/httpd.conf"
>
> Access to this server from a browser with SVN and ViewVC pages seems to work.
> Authentication is over Kerberos with mod_auth_gssapi.
> Authorization uses AuthzSVNAccessFile and an access file.
> SSL is used with SSLCryptoDevice set to builtin, based on this.
> I've tried all three MPMs with no change, based on another post:  prefork, 
> worker, and event.
> We've had Apache running on RedHat 6 with these repos for many years.
>
> I'd be glad to provide additional details or run more tests.  Thanks for any 
> ideas you have, and for supporting this software.
>
> Jim