On 30.12.2015 16:49, Evgeny Kotkov wrote:
Evgeny Kotkov <evgeny.kot...@visualsvn.com> writes:

Stefan Fuhrmann <stef...@apache.org> writes:

Thanks, it would be nice if could do something within Subversion because
httpd and apr patches may take a while to trickle down into releases.
Meanwhile, I posted a patch for mod_dav on the httpd dev list.

I took a look at what's happening, and I think the root cause of the problem
is that during a PROPFIND walk, we use the request pool for allocations that
happen per every walked entry.  This occurs when mod_dav_svn/repos.c:
do_walk() calls dav_propfind_walker() with dav_walk_resource.pool pointing
to the request pool.

The problem with unbounded memory usage during PROPFINDs in mod_dav is old
and quite fundamental.  Its first acknowledgment dates back to year 2003 [1],
and the corresponding Bugzilla issue has been open since 2009 [2, 3].

Subversion's mod_dav_svn module is built around mod_dav, and is affected as
well.  The propfind walker makes different FS calls that allocate a lot of
memory — especially if a cache miss happens.  A typical walk can cross the
mod_dav - mod_dav_svn boundary multiple times.  Since mod_dav was written
long ago and doesn't follow the same pool usage practices as Subversion,
some of the allocations happen in long-living pools, such as the request
pool.

Given your response to Daniel later in this thread,
I want to re-iterate the nature of the problem to
prevent decision making based on false mental models.

The "designed" memory usage of mod_dav as in "what
gets allocated via palloc" is O(total size of all
properties).  That is unbounded and if you attach
large properties to each node in a large folder,
you may end up with GB sized pools.  This is a real
problem and needs to fixed eventually.

The current situation, however, is that we have O(N^2)
allocation *overhead*, i.e. memory provided by the
APR pool allocator that will never be used.

So, it is not that anybody allocates large structures
in long-living pools (except for large properties).
Instead APR reuses blocks from large, short-lived
allocations for small long-lived allocations.
Preventing that already goes a long way mitigating
the problems with any reasonable setup.

So, the memory usage for a PROPFIND of depth 1 (svn ls -v) is unbounded.
Although the actual memory consumption can be mitigated by the cache hits,
since a cache hit either reduces or eliminates additional allocations, a real
server can easily consume gigabytes of RAM, when the data is not and possibly
can't fit in the cache.  The problem is serious, and can lead to various out
of memory conditions, such as constant swapping or server crashes.

I'm aware of two possible ways of solving the problem:

  (1) Fix mod_dav, adjust mod_dav_svn accordingly

  (2) Reimplement PROPFIND in mod_dav_svn

Doing (1) is hard, since it requires revving mod_dav's API and special care
for other modules using it.  As far as I know, those modules don't have tests
and not breaking their behavior could be a challenging task.

(1) sounds like the right thing to do in the longer
term: fix the problem at the root and make everyone
profit from it henceforth.  But I have no idea of what
that entails and if your assessment is that it can't
be done easily - for various reasons - then I'll
believe that.

I believe that currently we don't have the necessary resources (aka committers)
to do that.  Given that the problem exists since 2003, I also doubt that it's
going to solve itself in the nearby future.  And moreover, even if somehow
managed to do that, the fix would probably only be available in the newer
2.6.x httpd releases.

With that in mind, I propose we do (2).

Although I'm not entirely happy with reimplementing a part of the protocol, I
don't think that there are other realistic approaches to this serious problem.
It wouldn't be the first time we're doing something like this, since we already
handle POST ourselves, as well as provide the custom Subversion's HTTPv2
protocol implementation.

Assuming that your analysis for (1) is correct,
I think (2) is a viable option.  However, I would
very much like to see mod_dav fixed as well and
to have (2) only as a temporary workaround.
"Temporary" being a quite euphemistic term here ...

Apart from solving the memory usage issue, having the PROPFIND implementation
in mod_dav_svn would allow us to improve a couple of other aspects of how we
handle these requests, e.g.:

  - Shrink down the PROPFIND response size by avoiding duplicate xmlns:
    namespace declarations that currently exist.

  - Optimize how we retrieve properties from the FS layer, since we would
    no longer be doing mod_dav - mod_dav_svn - mod_dav transitions, and
    would have more flexibility.

I attached a proof-of-concept patch that does (2).  This patch solves the
memory usage problem in my experiments and passes HTTPv2 / HTTPv1 tests.
>
> Thoughts?

Does the patch already include the improvements you
suggested above?  Is it feasible to #ifdef the usage
of that code?  I think if we have an easy and clean
opt-in / opt-out, people could be more accepting of
that workaround.

-- Stefan^2.

Reply via email to