Re: A two-part vision for Subversion and large binary objects.

Nathan Hartman Wed, 28 Jul 2021 08:24:27 -0700

On Tue, Jul 27, 2021 at 9:24 PM Karl Fogel <kfo...@red-bean.com> wrote:
>
> Hi, everyone.  I'd like feedback on an idea that I've had for some
> years now but never written up before.
>
> Subversion can already be used to manage large (usually binary)
> files.  In fact, we use SVN for this at my company and it works
> decently.  However, there are two possible features that would
> make Subversion go beyond "decent" all the way to "quite good" at
> this :-).  They are:
>
> 1) Make pristine text-base files optional.  See issue #525 for
> details.  In summary: currently, every large file uses twice the
> storage on the client side, and yet for most of these files
> there's little benefit.  They're usually not plaintext, so 'svn
> diff' against the pristine base is pointless (unless you have some
> specialized diff tool for the particular binary format, but that's
> rare), and 'svn commit' likewise just sends up the whole working
> file.  The only thing a local base gets you is local 'svn revert',
> which can be nice, but many of us would happily give it up for
> large files to avoid the 2x local storage cost.
>
> Note that this is a purely client-side change, controlled entirely
> by client-side configuration.  Different people can thus have
> different thresholds, depending on how much local disk space they
> have.  A server would never even know if a client is or isn't
> saving text-bases.



I like the thinking here. Either or both features (#525 and
'--depth=directories') would be very valuable additions.

Regarding #525, in addition to points discussed previously (i.e., that
SVN is stronger at large repos and blobs than alternatives, and #525
would make SVN even stronger in this area), it would improve the
experience for two additional types of users:

* those whose repositories are on the same machine; in this case it's
definitely more sensible to grab a file directly from the repository
than to keep an extra copy, which may never be used, in the wc.

* those (like me) who put their working copies on ramdisks, to speed
up in-tree builds while reducing flash wear caused by useless build
artifacts. In a typical system, RAM is scarcer than non-volatile
storage, so freeing up a good chunk of it by not storing pristines
might improve the computer's overall performance enough to justify
having to re-download files when needed (and if the repository is
local, even more so).
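
To put rough numbers on the trade-off, here is a minimal sketch of the
arithmetic. It assumes each pristine is a full, uncompressed copy of
the working file (as the "2x" figure quoted above implies) and that a
client-side size threshold decides which files skip their pristine.
The function name and the threshold value are hypothetical; no such
setting exists in Subversion as of this thread.

```python
def estimate_wc_bytes(file_sizes, threshold_bytes):
    """Estimate working-copy disk (or ramdisk) usage in bytes.

    Returns a pair:
      (usage when every file keeps a pristine copy,
       usage when files at or above the threshold skip the pristine)
    Assumption: a pristine costs exactly as much as its working file.
    """
    working = sum(file_sizes)
    # Only files below the threshold would still keep a pristine copy.
    small_pristines = sum(s for s in file_sizes if s < threshold_bytes)
    return 2 * working, working + small_pristines


if __name__ == "__main__":
    # Hypothetical working copy: one small source file, two large binaries.
    sizes = [4_000, 2_000_000_000, 500_000_000]
    with_all, with_threshold = estimate_wc_bytes(sizes, 100_000_000)
    print("all pristines kept:  ", with_all, "bytes")
    print("large pristines skipped:", with_threshold, "bytes")
```

On a ramdisk the difference between the two figures is memory handed
back to the rest of the system.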

More below...


> 2) Add a new '--depth=directories' depth type to make it easy to
> check out a sparse tree, that is, a skeleton directory tree
> without the files.  Then, within a given directory, you can do
> 'svn update --depth=files' or check out a particular file by name
> as needed.  There's no ticket associated with this feature, as far
> as I know, but I can file one after this post if people think this
> idea is worthwhile.


Regarding point 2, '--depth=directories', this sounds like a feature
that I would probably start using right away. I routinely work with a
large project consisting of nearly 100k files in many directories, so
I check it out with '--depth=immediates' and I have scripts that
expand several parts I need often; for other areas I expand
directories manually. Being able to check out the entire skeleton tree
as you described would save quite a few commands whenever I need to
expand some deeply nested directory.

When writing the previous paragraph, I realized this would encounter
the following issue:

Suppose you have a skeleton checked out with '--depth=directories' and
it contains a path a/b/c/d/e/f/g; each of a, b, c, etc., may contain
other subdirectories.

If you run 'svn up --set-depth=files' on a/b/c, then as things work
currently, d/e/f/g will be deleted (unless they contain
modifications). So there has to be a way to make some depth elements
sticky, or, alternatively, there has to be a way to do depth
arithmetic, e.g.:
'svn up --set-depth+=files'
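
The difference between the two behaviors can be illustrated with a toy
model. This is only a sketch: the working copy is modeled as a plain
set of paths, local modifications are ignored, and "+=" is the
hypothetical syntax suggested above, not an existing svn option. The
function names and file names are made up for illustration.

```python
def set_depth_files(wc, repo_files, target):
    """Model the current replace semantics described above: the target
    keeps only its files, so everything else beneath it is dropped."""
    prefix = target + "/"
    kept = {p for p in wc if not p.startswith(prefix)}
    return kept | {prefix + name for name in repo_files.get(target, [])}


def add_depth_files(wc, repo_files, target):
    """Model the hypothetical additive form: bring in the target's
    files and remove nothing."""
    return set(wc) | {target + "/" + name for name in repo_files.get(target, [])}


# Skeleton as produced by the proposed --depth=directories: a, a/b, ... a/b/c/d/e/f/g
parts = "a/b/c/d/e/f/g".split("/")
skeleton = {"/".join(parts[:i]) for i in range(1, len(parts) + 1)}
repo_files = {"a/b/c": ["x.bin", "y.bin"]}  # hypothetical files in a/b/c

replaced = set_depth_files(skeleton, repo_files, "a/b/c")
added = add_depth_files(skeleton, repo_files, "a/b/c")

print(sorted(replaced))  # a/b/c/d and everything below it are gone
print(sorted(added))     # skeleton intact, plus the two files
```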

The following might be a separate feature request but it might be cool
if 'svn update --set-depth=...' could support the '--parents'
argument, at least when setting depth=files. Rationale: a user might
want to bring in all the files in g (in a/b/c/d/e/f/g) and all of g's
ancestors, but not g's siblings nor those of its ancestors. (This use
case comes up often in a project I work on; I don't know whether
anyone else needs it.)
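
Until something like that exists, the same effect can be approximated
with one 'svn update' per missing ancestor. The sketch below only
plans the commands and does not run them. It relies on the existing
idiom of pulling a missing directory into a sparse working copy with
'--set-depth=empty'; the function name and the 'present' argument are
hypothetical. Ancestors that are already present are skipped so their
current depth is not reduced. The final command still has the problem
described earlier if g is already checked out with subdirectories.

```python
def plan_expand(target, present):
    """Return the svn command lines (as argv lists) that bring in the
    files of `target` plus any ancestors missing from the working copy.

    target:  path relative to the working copy root, e.g. "a/b/c/d/e/f/g"
    present: set of relative directory paths already in the working copy
    """
    parts = target.strip("/").split("/")
    commands = []
    # Pull in each missing ancestor at depth empty, top-down,
    # so no siblings come along.
    for i in range(1, len(parts)):
        ancestor = "/".join(parts[:i])
        if ancestor not in present:
            commands.append(["svn", "update", "--set-depth=empty", ancestor])
    # Finally bring in the target directory with its files only.
    commands.append(["svn", "update", "--set-depth=files", "/".join(parts)])
    return commands


if __name__ == "__main__":
    # Example: checked out with --depth=immediates, so only "a" is present.
    for argv in plan_expand("a/b/c/d/e/f/g", {"a"}):
        print(" ".join(argv))
```

Each planned command could be handed to subprocess.run() from the
working copy root once the plan looks right.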

Food for thought...

Cheers,
Nathan
