On 13.02.2021 15:32, Evgeny Kotkov wrote:
Evgeny Kotkov <kot...@apache.org> writes:
URL: http://svn.apache.org/viewvc?rev=1886490&view=rev
Log:
In the update editor, stream data to both pristine and working files.
...
Several quick benchmarks:
- Checking out subversion/trunk over https://
  Total time: 3.861 s → 2.812 s
  Read IO:    57322 KB → 316 KB
  Write IO:   455013 KB → 359977 KB
- Checking out 4 large binary files (7.4 GB) over https://
  Total time: 91.594 s → 70.214 s
  Read IO:    7798883 KB → 19 KB
  Write IO:   15598167 KB → 15598005 KB
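The relative changes behind these figures can be recomputed with a few lines of Python. The script below uses only the numbers quoted above. It is an illustration added for clarity, not part of the commit.

```python
# Relative changes computed only from the benchmark figures quoted above.
BENCHMARKS = {
    "trunk over https": {
        "total_time_s": (3.861, 2.812),
        "read_io_kb": (57322, 316),
        "write_io_kb": (455013, 359977),
    },
    "4 large binaries (7.4 GB)": {
        "total_time_s": (91.594, 70.214),
        "read_io_kb": (7798883, 19),
        "write_io_kb": (15598167, 15598005),
    },
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    for name, metrics in BENCHMARKS.items():
        for metric, (before, after) in metrics.items():
            print(f"{name:28} {metric:13} -{reduction_pct(before, after):6.2f}%")
```

On these figures, total time drops by roughly 27% and 23%. In the 7.4 GB case Write IO is essentially unchanged, at roughly twice the data size both before and after. That pattern is consistent with still writing both a pristine and a working copy. The gain there comes from Read IO, which falls to almost nothing because the pristine files are no longer read back.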
Hey everyone,
Here's an improvement I have been working on recently.
Apparently, the client has an (implicit) limit on the size of directories that
can be safely checked out over HTTP without hitting a timeout. The problem is
that when the client installs the new working files, it does so in a separate
step. This step happens per-directory and involves copying and possibly
translating the pristine contents into new working files. While that happens,
nothing is read from the connection. So the amount of work that can be done
without hitting a timeout is limited.
Assuming httpd 2.4.x's default timeout of 60 seconds and a relatively fast
disk, that puts the limit at around 6 GB for any directory. Not cool.
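The 6 GB figure follows from a simple model. If nothing is read from the connection while a directory's files are installed, the largest directory that fits is the timeout multiplied by the install throughput. The sketch below uses hypothetical helper names and assumes decimal units. It shows that the quoted numbers imply an install rate of about 100 MB/s. That rate is inferred here, not measured.

```python
# Simplified model: the install step must finish before the server-side
# timeout fires, because nothing is read from the connection meanwhile.

GB = 10**9  # decimal units assumed
MB = 10**6


def max_directory_bytes(timeout_s: float, install_bytes_per_s: float) -> float:
    """Largest directory whose install step fits within the timeout."""
    return timeout_s * install_bytes_per_s


def implied_throughput(limit_bytes: float, timeout_s: float) -> float:
    """Install throughput implied by a given size limit and timeout."""
    return limit_bytes / timeout_s


print(implied_throughput(6 * GB, 60) / MB)  # 100.0
```

Under the same model, anything that lowers the effective throughput, such as a slower disk, lowers the limit proportionally.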
My attempt to fix this is by making checkout stream data to both pristine and
the (projected) working file, so that the actual install would then happen as
just an atomic rename. Since we now never stop reading from the connection,
the timeouts should no longer be an issue. The new approach also has several
nice properties, such as not having to re-read the pristine files and not
interfering with network-level buffering, TCP slow start, etc.
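The core of the change, as described, is to write each incoming chunk to two places and finish with a rename. Below is a minimal Python sketch of that idea under stated assumptions. The function name, the chunk size, and the flat SHA-1-named pristine layout are hypothetical, and EOL/keyword translation is left out. Subversion's actual C implementation in the update editor will differ.

```python
import hashlib
import os
import tempfile
from typing import BinaryIO, Tuple

CHUNK_SIZE = 64 * 1024  # hypothetical read size


def stream_to_pristine_and_working(
    source: BinaryIO, pristine_dir: str, working_path: str
) -> Tuple[str, str]:
    """Copy `source` into a temporary pristine file and a temporary working
    file in one pass, then install both with renames.

    Returns (pristine_path, sha1_hex). Illustrative only: the pristine
    layout is hypothetical and working-file translation is omitted.
    """
    work_dir = os.path.dirname(os.path.abspath(working_path))
    sha1 = hashlib.sha1()
    # Temp files are created next to their final locations so that
    # os.replace() stays on one filesystem and is atomic.
    p_fd, p_tmp = tempfile.mkstemp(dir=pristine_dir)
    w_fd, w_tmp = tempfile.mkstemp(dir=work_dir)
    try:
        with os.fdopen(p_fd, "wb") as pristine, os.fdopen(w_fd, "wb") as working:
            while True:
                chunk = source.read(CHUNK_SIZE)
                if not chunk:
                    break
                # Every chunk goes to both destinations, so the source is
                # read continuously and the pristine is never re-read.
                sha1.update(chunk)
                pristine.write(chunk)
                working.write(chunk)
        digest = sha1.hexdigest()
        pristine_path = os.path.join(pristine_dir, digest)
        os.replace(p_tmp, pristine_path)
        # "Installing" the working file is now just an atomic rename.
        os.replace(w_tmp, working_path)
        return pristine_path, digest
    except BaseException:
        # Remove whichever temp files were not renamed into place.
        for tmp in (p_tmp, w_tmp):
            if os.path.exists(tmp):
                os.remove(tmp)
        raise
```

Because both temporary files live next to their final paths, each `os.replace()` is a same-filesystem rename, which keeps the install step cheap and atomic.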
I see that it reduces both read and write I/O for all checkouts, which
should make checkouts mildly faster overall.
I like this concept very much indeed.
Note that this change only fixes "svn checkout", not "svn export".
It also affects 'svn update', right, since 'checkout' is implemented as
an update of an empty working copy.
Export uses a separate implementation of the delta editor, and it should
be possible to update it in a similar way, but I'm leaving that for future
work.
The only thing that mildly worries me about the implementation is that
the wc-db code is now responsible for installing working files. There's
a bit of an abstraction leak here. On the other hand, you need to make
the whole operation transactional with the sqlite updates, so ... maybe
it's better this way.
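The trade-off in the paragraph above can be illustrated with Python's sqlite3 module. Doing the rename inside the database transaction lets a failed rename roll back the row update. This is a hypothetical sketch with an invented function and `nodes` table, not the wc-db code. As the comments note, it is not fully crash-safe.

```python
import os
import sqlite3


def install_in_transaction(
    conn: sqlite3.Connection, tmp_path: str, final_path: str, checksum: str
) -> None:
    """Record the node and rename the working file in one DB transaction.

    Hypothetical schema: nodes(path TEXT PRIMARY KEY, checksum TEXT).
    If the rename raises, `with conn:` rolls the DB change back. A crash
    between the rename and the commit can still leave the two out of sync.
    """
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT OR REPLACE INTO nodes (path, checksum) VALUES (?, ?)",
            (final_path, checksum),
        )
        os.replace(tmp_path, final_path)
```

Keeping the rename next to the SQL is what ties the two outcomes together. Moving it out of the DB layer would be cleaner, but the caller would then have to coordinate the transaction itself.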
-- Brane