Stefan Fuhrmann <[email protected]> writes:
> The extra temporary space is not a concern: Your server would run out of
> disk space just one equally large revision earlier than it does today.
I wouldn't say it is not a concern at all. Consider, for example, a user
who cannot commit a 4 GB file simply because doing so now requires at least
8 GB of free disk space on the server. While it might sound like an edge
case, this could be important for some users.
> Shall I just enable the feature unconditionally?
I'm not sure about this. The feature has a price, and there are cases when
enabling parallel writes has a visible performance impact. Below are my
results for a couple of quick tests:
(The first two tests should be reproducible, since they were performed on an
Azure VM; the last one was done on a spinning disk in my environment; all
tests were executed over the https:// protocol.)
Importing 2000 files of Subversion's source code:
22.233 s → 30.546 s (37% slower)
Importing a 300 MB .zip file:
36.650 s → 46.255 s (26% slower)
Importing a 4 GB .iso file:
159.372 s → 212.559 s (33% slower)
After giving this topic a second thought, I wonder whether we are heading
in the right direction. We aim for faster svn commits over high-latency
networks. In order to achieve that, we are trying to implement parallel
PUTs, starting from the FS layer.
This leaves a couple of questions:
(1) Why do we start by adding a quite complex FS feature, given that we
don't know what kind of problems are associated with implementing this
in ra_serf?
(Can we actually do it? What can be parallelized while keeping the
necessary order of operations on the transaction? How do we plug that
into the commit editor? On top of that, HTTP/2 is currently not
officially supported by either httpd or serf.)
(2) Is making parallel PUTs the proper way to speed up commits?
As far as I know, squashing everything into a single POST would make the
commit up to 10-20 times faster, depending on the amount of changes (see
the rough latency sketch at the end of this point). Although there are
associated challenges, this approach doesn't require us to deal with
concurrency and doesn't introduce a dependency on httpd.
How much faster would a commit be with parallel PUTs? Would it be at
least twice as fast? Even if so, that would require us to keep non-trivial
code that is prone to deadlocks and various kinds of race conditions. For
instance, transaction.c is quite complex by itself and already contains
a mechanism to *prevent* concurrent writes. Adding a layer that allows
concurrent writes *on top of that* makes it even more complex.
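
For illustration, here is a back-of-the-envelope model of where the time
goes when committing many small files over a high-latency link. Every
number in it (the round-trip time, the file count, the degree of
parallelism) is a made-up assumption chosen only to show the shape of the
argument; none of it is a measurement of ra_serf or httpd:

  # Illustrative latency model only; none of these numbers are measured.
  RTT = 0.1          # assumed round-trip time, seconds (100 ms link)
  N_FILES = 2000     # number of small files in the commit
  PARALLEL = 8       # assumed number of concurrent PUTs

  # Serial PUTs: roughly one round trip per file.
  serial = N_FILES * RTT

  # Parallel PUTs: the per-file round trips overlap, so the wall-clock
  # cost is roughly divided by the number of concurrent requests.
  parallel = N_FILES * RTT / PARALLEL

  # Single POST carrying the whole change set: a small, constant number
  # of round trips, independent of the number of files.
  single_post = 3 * RTT

  print("serial PUTs:   ~%.0f s" % serial)       # ~200 s
  print("parallel PUTs: ~%.0f s" % parallel)     # ~25 s
  print("single POST:   ~%.1f s" % single_post)  # ~0.3 s

In other words, parallel PUTs only divide the per-file round-trip cost by
the degree of parallelism, while a single POST removes it almost entirely,
which is where the order-of-magnitude difference would come from.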
So, are we sure that we need to implement it this way?
Regards,
Evgeny Kotkov