RE: large number of large binary files in subversion

Winston Smith Tue, 24 May 2011 00:02:39 -0700

Folks,

Thanks for your replies. So, in principle, I should not expect any problems.
The machine would be a decent single-core Athlon 3500+ with 2GB RAM,
doing nothing other than serving Bugzilla, Review Board and MediaWiki
with lighttpd, and the repositories are on a permanently mounted USB disk.
Network throughput would not be an issue, since the working copies
would be on the same machine.

I am aware that operations such as copy, move, update and commit will take
time and/or disk space, but these files and their locations rarely change.
I was more concerned about the risk of repository corruption,
but that does not seem to be a concern.

Thanks again.

- Winston

----------------------------------------
> Date: Tue, 24 May 2011 12:04:26 +0530
> From: ar...@collab.net
> To: dev@subversion.apache.org
> Subject: Re: large number of large binary files in subversion
>
> On Tuesday 24 May 2011 12:58 AM, Stefan Sperling wrote:
> > On Mon, May 23, 2011 at 11:07:50PM +0400, Konstantin Kolinko wrote:
> >> In svn 1.7 there is pristine storage area in the working copy, where
> >> all present files are stored by their checksums. If I understand this
> >> pristine storage correctly, if you move a file remotely on the server
> >> (svn mv URL URL) then when you update your working copy and both old
> >> and new paths are in the same working copy, Subversion will find the
> >> file in its pristine storage and won't re-download it over network. If
> >> what I wrote is true (I have not verified whether this actually works
> >> this way, but I have some hopes),
> >
> > Unfortunately, that's not how it works.
> >
> > When a new file is added during an update, the entire file content is
> > first spooled to a temporary file to calculate its checksum.
> > If a pristine with the same checksum is already present, the temporary
> > file is deleted.
> >
> > (see pristine_install_txn() in subversion/libsvn_wc/wc_db_pristine.c)
> Why can't we send the recorded checksum from the server instead of
> sending the whole file and then calculating it on the client side?
>
> If the checksum matches one of the pristine files, then use that to
> populate the nodes table. Only if there is no match do we spool to
> a temporary file and whatnot.
>
> This seems like a straightforward idea. Any pitfalls to this approach?
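The two update strategies discussed in the thread can be compared in a small sketch. This is a hypothetical Python model, not Subversion's C code. The names `PristineStore`, `install_by_spooling` and `install_checksum_first` are invented for illustration. The sketch assumes that pristines are keyed by a SHA-1 checksum and that the server can send that checksum before it sends any content. It does not reproduce pristine_install_txn() or the nodes table.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class PristineStore:
    """Hypothetical content-addressed store: each file kept once, named by its SHA-1."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def path_for(self, checksum):
        return self.root / checksum

    def has(self, checksum):
        return self.path_for(checksum).exists()

    def install_by_spooling(self, chunks):
        """Behaviour described in the thread: spool all content to a temp file,
        hashing while writing, then keep or discard the temp file."""
        sha1 = hashlib.sha1()
        fd, tmp_name = tempfile.mkstemp(dir=str(self.root))
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                sha1.update(chunk)
                tmp.write(chunk)
        checksum = sha1.hexdigest()
        if self.has(checksum):
            os.remove(tmp_name)  # already stored: the transfer was wasted
        else:
            os.replace(tmp_name, self.path_for(checksum))
        return checksum

    def install_checksum_first(self, server_checksum, fetch_chunks):
        """Proposed behaviour: look up the server-supplied checksum first and
        only fetch content on a miss. Returns (checksum, downloaded)."""
        if self.has(server_checksum):
            return server_checksum, False  # reuse the existing pristine, no fetch
        checksum = self.install_by_spooling(fetch_chunks())
        if checksum != server_checksum:
            # The bytes are stored under their real checksum, but they are not
            # what the server announced, so report the mismatch.
            raise ValueError("downloaded content does not match server checksum")
        return checksum, True
```

In the checksum-first path, a local pristine whose name matches is trusted without its bytes being re-read. Content is checked against the server's checksum only when it is actually downloaded.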
