Re: A two-part vision for Subversion and large binary objects.

Daniel Shahaf Fri, 30 Jul 2021 16:23:02 -0700

Karl Fogel wrote on Tue, Jul 27, 2021 at 20:24:32 -0500:
> 1) Make pristine text-base files optional.  See issue #525 for details.  In
> summary: currently, every large file uses twice the storage on the client
> side, and yet for most of these files there's little benefit.  They're
> usually not plaintext, so 'svn diff' against the pristine base is pointless
> (unless you have some specialized diff tool for the particular binary
> format, but that's rare),


Then how do people do pre- or post-commit reviews of their changes?

> and 'svn commit' likewise just sends up the whole working file.  The
> only thing a local base gets you is local 'svn revert', which can be
> nice, but many of us would happily give it up for large files to avoid
> the 2x local storage cost.
> 

What about the ability to commit a change by uploading the delta as
opposed to the new fulltext?

> Note that this is a purely client-side change, controlled entirely by
> client-side configuration.  Different people can thus have different
> thresholds, depending on how much local disk space they have.  A server
> would never even know if a client is or isn't saving text-bases.
> 

What would «svn status» of a modified file without a pristine say?
How many network/worktree accesses would it involve?

Would it be possible to convert a file back and forth between
having and not having a pristine?

Suppose the user reverts the file without using «svn revert».  Would the
file show up as modified?  Would a commit cause a null change to the
file (new noderev with fulltext and props both identical to the
predecessor noderev's)?

How about (hard-|sym)linking the worktree file to the pristine and
making it read-only until the user requests it to be made editable?
Compare git-annex-unlock(1).

There was also a request to store pristines compressed, but I don't know
whether there's still demand for that.

> 2) Add a new '--depth=directories' depth type to make it easy to check out a
> sparse tree, that is, a skeleton directory tree without the files.  Then,
> within a given directory, you can do 'svn update --depth=files' or check out
> a particular file by name as needed.  There's no ticket associated with this
> feature, as far as I know, but I can file one after this post if people
> think this idea is worthwhile.
> 

Hmm.

Taking the FreeBSD ports tree (https://svnweb.freebsd.org/ports/head/)
as an example, the obvious next feature request would be to also fetch
the pkg-descr files from each port directory, even as new ports are
added, in order to facilitate a local search of port descriptions (via
«make search» in ports(7)).

Taking ASF's dist/release/ tree as an example, it might be useful to
automatically retrieve only READMEs and detached signatures, but not the
artifacts themselves.

In general, I suspect «svn_boolean_t download_it_p(dirent *foo) { return
foo->kind == svn_node_directory; }» is only half right: when FOO is a
directory, download_it_p() generally gets the right answer, but when FOO
is not a directory, download_it_p() sometimes returns a false negative.
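
To make the false-negative case concrete, here is a minimal sketch of
a predicate that also admits selected files.  The names want_node and
WANT_FILES are hypothetical, made up for illustration; nothing like
them exists in Subversion.

```shell
# Hypothetical predicate: succeed (exit 0) if a node should be fetched.
# $1 is the node kind ("directory" or "file"), $2 is the node's path.
# WANT_FILES is an assumed, space-separated list of basename globs.
WANT_FILES=${WANT_FILES:-"pkg-descr README*"}

want_node() {
    kind=$1 path=$2
    # Directories are always wanted, as with the proposed depth.
    [ "$kind" = directory ] && return 0
    base=${path##*/}
    set -f                  # keep the globs from expanding against the cwd
    for pat in $WANT_FILES; do
        case $base in
            $pat) set +f; return 0 ;;
        esac
    done
    set +f
    return 1
}
```

With the default list, this accepts every directory plus any file named
pkg-descr or README*, and rejects everything else.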

Separately, this sounds like it shouldn't be too hard to prototype,
e.g., with something along these lines:
.
    svn checkout --depth=empty -- "$URL" foo
    cd foo
    svn update --parents --set-depth=empty -- $(LC_ALL=C svn info -R -- "$URL" |
        grep-dctrl -F 'Node Kind' -ns Path directory | sort)
.
where grep-dctrl(1) is a generic "grep a list of rfc822 paragraphs" tool.
I realize such prototypes wouldn't automatically deepen the worktree as
new directories are added.
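
Where grep-dctrl(1) isn't installed, the filtering step could be
approximated with awk(1).  This is only a sketch: the function name
dirs_from_info is made up, and it assumes the English "Path:" and
"Node Kind:" labels that LC_ALL=C produces.  It prints the paths
exactly as «svn info» reports them, without adjusting them.

```shell
# Print the Path of every paragraph on stdin whose "Node Kind" is
# "directory".  Input is expected to look like `LC_ALL=C svn info -R`
# output: blank-line-separated paragraphs of "Key: value" lines.
dirs_from_info() {
    awk '
        /^Path: /      { path = substr($0, 7) }
        /^Node Kind: / { if (substr($0, 12) == "directory") print path }
        /^$/           { path = "" }
    ' | LC_ALL=C sort
}
```

Usage would be along the lines of:
LC_ALL=C svn info -R -- "$URL" | dirs_from_info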

There's also svn-viewspec.py.

> It's easy to see how these two features would work together to make
> Subversion a quite good system for managing blobs ("binary large objects"):
⋮
> * When someone needs a blob locally, they just check out (i.e., update) that
> blob.  There are various ways to do this, and it would even be easy to
> script new tools based on 'svn ls' that auto-complete the filenames or
> whatever.

«svn ls» is a network operation, so autocomplete scripts might not like
to use it due to latency.

> When one is done with the file, one can keep it around or make it disappear
> locally. (Right now making it go away requires some fancy dance moves, but we
> could fix 'svn update --depth=empty FILENAME' to Do The Right Thing, or we
> could add a new flag, or whatever.

There already is «svn update --set-depth=exclude».  An «svn cleanup» is
required thereafter to vacuum the unused pristine
(https://subversion.apache.org/docs/release-notes/1.7#wc-pristines).
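
A minimal sketch of that two-step sequence, wrapped in a hypothetical
helper (the name drop_blob and the SVN override variable are made up
for illustration):

```shell
# Remove a file from the working copy and reclaim its pristine.
# SVN can be overridden (e.g. SVN=echo) to preview the commands.
SVN=${SVN:-svn}

drop_blob() {
    # Exclude the target, then vacuum the now-unreferenced pristine.
    # "svn cleanup" acts on the current directory's working copy.
    "$SVN" update --set-depth=exclude -- "$1" && "$SVN" cleanup
}
```

Run from the working copy root as «drop_blob path/to/blob».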

> Also, people would presumably write scripts to help with blob
> management in SVN, and eventually some of those scripts would make
> their way into our contrib/ area.)

contrib/ is deprecated.

> * Subversion's existing path-based authorization can be used so that each
> person's sparse checkout has the directories it needs and doesn't have any
> subtrees that it shouldn't have.

Authz is completely orthogonal to these feature requests; they involve
no changes to authz implementation or configuration.

> Neither of these two proposed changes is huge.  Of the two, issue #525 is
> bigger, and recently there is some interest in solving it (I need to follow
> up with some other folks who have shown interest, and I will post back here
> if it looks like we have a coalition).  The --depth change shouldn't be very
> hard at all, though please correct me if I'm mistaken about that.

Does it involve extending svn_depth_t with an svn_depth_directories
value?  That type is used all over the place, so there might be
a non-negligible amount of code to review for correctness (and lack of
asserts) in the face of such an extension.

Also, I'm not sure whether new RA APIs would be required in order to
implement the new behaviour performantly.

> I wanted to circulate this to see if it sounds good to others, and because
> people might suggest refinements -- or even suggest better ideas entirely
> for managing blobs in Subversion.

Increase the svndiff window size, so a single byte addition at the start
of the file doesn't result in $filesize/100KB delta ops?
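
As a back-of-the-envelope illustration of that figure, the following
sketch counts the fixed-size windows needed to cover a file.  The
102400-byte default is an assumption standing in for the "100KB"
above, and the helper name delta_windows is made up.

```shell
# Number of fixed-size delta windows needed to cover a file
# (ceiling division).  $1 is the file size in bytes; $2 is the window
# size in bytes, defaulting to an assumed 102400.
delta_windows() {
    size=$1 window=${2:-102400}
    echo $(( (size + window - 1) / window ))
}
```

For a 1 GiB file, «delta_windows 1073741824» prints 10486; with
a hypothetical 1 MiB window, «delta_windows 1073741824 1048576» prints
1024.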

Cheers,

Daniel
