On Tue, Jul 27, 2021 at 9:24 PM Karl Fogel <kfo...@red-bean.com> wrote:
>
> Hi, everyone. I'd like feedback on an idea that I've had for some
> years now but never written up before.
>
> Subversion can already be used to manage large (usually binary)
> files. In fact, we use SVN for this at my company and it works
> decently. However, there are two possible features that would
> make Subversion go beyond "decent" all the way to "quite good" at
> this :-). They are:
>
> 1) Make pristine text-base files optional. See issue #525 for
> details. In summary: currently, every large file uses twice the
> storage on the client side, and yet for most of these files
> there's little benefit. They're usually not plaintext, so 'svn
> diff' against the pristine base is pointless (unless you have some
> specialized diff tool for the particular binary format, but that's
> rare), and 'svn commit' likewise just sends up the whole working
> file. The only thing a local base gets you is local 'svn revert',
> which can be nice, but many of us would happily give it up for
> large files to avoid the 2x local storage cost.
>
> Note that this is a purely client-side change, controlled entirely
> by client-side configuration. Different people can thus have
> different thresholds, depending on how much local disk space they
> have. A server would never even know if a client is or isn't
> saving text-bases.
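To make the per-client threshold idea concrete, here is a minimal Python sketch of how a client might decide which files keep a pristine text-base, and what that does to local storage. The size-threshold rule and the function names are hypothetical; nothing here is an existing Subversion option or API, and the file sizes are illustrative, not measurements.

```python
from typing import List, Optional

# Hypothetical client-side policy for issue #525. Neither the threshold
# nor these function names exist in Subversion; they only model the idea.


def keeps_pristine(file_size: int, threshold: Optional[int]) -> bool:
    """True if the client would store a pristine text-base for this file.

    threshold=None models today's behavior: every file gets a pristine.
    Files of `threshold` bytes or more get no pristine.
    """
    return threshold is None or file_size < threshold


def local_storage(file_sizes: List[int], threshold: Optional[int]) -> int:
    """Client-side bytes used: each working file plus any pristines kept.

    The server is not consulted; each client can pick its own threshold.
    """
    working = sum(file_sizes)
    pristines = sum(s for s in file_sizes if keeps_pristine(s, threshold))
    return working + pristines


if __name__ == "__main__":
    sizes = [4_000, 12_000, 800_000_000, 1_500_000_000]  # illustrative sizes
    print(local_storage(sizes, None))         # 4600032000 (2x everything)
    print(local_storage(sizes, 100_000_000))  # 2300032000
```

With no threshold the client stores every byte twice; with a 100 MB threshold only the two small files keep a pristine, so local storage drops to roughly the size of the working files.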
I like the thinking here. Either or both features (#525 and '--depth=directories') would be very valuable additions.

Regarding #525, in addition to the points discussed previously (i.e., that SVN is stronger than the alternatives at handling large repos and blobs, and #525 would make SVN even stronger in this area), it would improve the experience for two additional types of users:

* those whose repositories are on the same machine; in this case it's definitely more sensible to grab a file directly from the repository than to keep an extra copy, which may never be used, in the working copy.

* those (like me) who put their working copies on ramdisks, to speed up in-tree builds while reducing the flash wear caused by useless build artifacts. Obviously, in a typical system RAM is scarcer than non-volatile storage, and freeing up a good chunk of it by not storing pristines might improve the computer's overall performance enough to justify having to re-download files when needed (and all the more so if the repository is local).

More below...

> 2) Add a new '--depth=directories' depth type to make it easy to
> check out a sparse tree, that is, a skeleton directory tree
> without the files. Then, within a given directory, you can do
> 'svn update --depth=files' or check out a particular file by name
> as needed. There's no ticket associated with this feature, as far
> as I know, but I can file one after this post if people think this
> idea is worthwhile.

Regarding point 2, '--depth=directories' sounds like a feature that I would probably start using right away. I routinely work with a large project consisting of nearly 100k files in many directories, so I check it out with '--depth=immediates', and I have scripts that expand several parts I need often; other areas I expand manually. Being able to check out the entire skeleton tree as you described would save quite a few commands whenever I need to expand some deeply nested directory.
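For comparison, one way to expand a single deeply nested directory in a '--depth=immediates' checkout, using only plain '--set-depth' updates, is to raise each ancestor in turn so the next level appears. The Python sketch below only builds that command list; the helper names are hypothetical, while '--set-depth' with 'immediates' and 'infinity' are standard 'svn update' options. There may be shorter routes in current Subversion.

```python
from typing import List

# Sketch: build the 'svn update' commands that expand one nested directory
# in a checkout made with --depth=immediates, using only --set-depth.
# The helper names are hypothetical; the svn options are standard ones.


def ancestors(path: str) -> List[str]:
    """'a/b/c' -> ['a', 'a/b', 'a/b/c'] (paths relative to the wc root)."""
    parts = [p for p in path.split("/") if p]
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def expand_commands(target: str) -> List[List[str]]:
    """Commands that make `target` fully populated, one level at a time.

    After a --depth=immediates checkout, top-level directories exist at
    depth 'empty'. Raising each ancestor to 'immediates' makes the next
    level appear; the target itself is then raised to 'infinity'.
    """
    dirs = ancestors(target)
    if not dirs:
        raise ValueError("target must name a directory below the wc root")
    cmds = [["svn", "update", "--set-depth=immediates", d] for d in dirs[:-1]]
    cmds.append(["svn", "update", "--set-depth=infinity", dirs[-1]])
    return cmds


if __name__ == "__main__":
    for cmd in expand_commands("a/b/c/d"):
        print(" ".join(cmd))
```

Each list could be passed to subprocess.run(cmd, check=True) to actually run it. With the proposed '--depth=directories' skeleton, the intermediate directories would presumably already be present, leaving only the last command.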
While writing about expanding deeply nested directories, I realized that '--depth=directories' would run into the following issue: Suppose you have a skeleton checked out with '--depth=directories' and it contains a path a/b/c/d/e/f/g; each of a, b, c, etc., may contain other subdirectories. If you run 'svn up --set-depth=files' on a/b/c, then, as depth currently works, d/e/f/g will be deleted (unless they contain modifications), because depth=files excludes subdirectories. So there has to be a way to make some depth elements sticky, or, alternatively, a way to do depth arithmetic, e.g.: 'svn up --set-depth+=files'

The following might be a separate feature request, but it might be cool if 'svn update --set-depth=...' could support the '--parents' argument, at least when setting depth=files. Rationale: a user might want to bring in all the files in g (in a/b/c/d/e/f/g) and in each of g's ancestors, but not those in g's siblings or in its ancestors' siblings. (This use case comes up often in a project I work on; I don't know whether anyone else needs it.)

Food for thought...

Cheers,
Nathan