On 04 Mar 2022, Julian Foad wrote:
> I had a talk with Karl about this, and now I understand the concern much better.
>
> (Karl, please correct anything I misrepresent.)

You've described it well, Julian. Thank you (and thank you also for your patience in explaining to me the current State Of The Onion in a phone call, when I was still behind on reading dev@ posts -- I'm caught up now).

The one thing I would add to your summary below is that the concern on the client side is not just about wasted time (that is, the time spent fetching pristines for files that won't, in the end, actually need pristines locally).

The concern is also local *space*. It's not unusual for one of these working copies to bring a local disk to within a few enormous files of full disk usage -- in other words, to be in a situation where fetching a certain number of pristines could result in the disk running out of space. So if one has modified N of the large versioned files, and then an update brings down N correspondingly large pristines, well, hilarity could ensue :-).

But even beyond my experience with particular use cases, I think we should aim for the simplicity of a principle here:

Principle: When a file is checked out without its pristine, SVN should never fetch that pristine unless it actually needs to.

(Naturally, this principle applies, via the distributive property, to all the files in a fully pristine-less working copy. Since in the future we may offer UI to allow working copies in which some files are checked out with pristine and some without, I am being careful to articulate the principle here as being about files rather than about working copies.)

The justification for this principle is that there's presumably a *reason* why the user requested that there be no pristine for that file. Whatever that reason is, we have no reason to think we know better than the user does. The most likely reason is that the file is huge and the user doesn't want to pay the disk-space cost, nor the network-time cost in the case of updates for which the file hasn't changed in the repository. But maybe the reason is something else. Who knows? Not our business. The user told SVN what they wanted, and SVN should do that thing.

Now, if the user runs an operation that requires a pristine, that's different -- then they're effectively notifying us that they're changing their decision. We should obey the user in that case too. It's just that it would be bad form for us to go fetching a pristine when a) the user already said they don't want it and b) SVN has no identifiable need for it in this operation.
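To make the principle concrete, here is a minimal sketch in Python (every name here is a hypothetical illustration, not an actual Subversion API): the pristine is materialized lazily, and only when an operation that genuinely requires it asks for it.

```python
# Hypothetical sketch of the on-demand principle; none of these names
# are real Subversion APIs.

# Operations with an identifiable need for a file's pristine (illustrative).
OPS_NEEDING_PRISTINE = {"revert", "diff"}

class WorkingFile:
    def __init__(self, path, store_pristine):
        self.path = path
        self.store_pristine = store_pristine  # the user's checkout-time choice
        self.pristine = None                  # fetched lazily, if ever

    def get_pristine(self, operation, fetch):
        """Return the pristine, fetching it only if `operation` truly needs it."""
        if self.pristine is None:
            if operation not in OPS_NEEDING_PRISTINE:
                raise RuntimeError(
                    f"{operation!r} has no identifiable need for a pristine")
            # The user is effectively revising their decision: fetch now.
            self.pristine = fetch(self.path)
        return self.pristine

f = WorkingFile("huge.bin", store_pristine=False)
f.get_pristine("revert", fetch=lambda p: b"pristine bytes")  # fetched on demand
```

Under this sketch, 'update' simply never reaches the fetch path for a file it isn't updating, while 'revert' triggers the fetch at the moment of need.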

I do understand the reasons why Evgeny thought pre-fetching pristines for modified files as part of an 'update' could be a good idea. There would surely be _some_ occasions where a user would be pleasantly surprised to find that they have that pristine locally just when they need it. But in the end, I believe that

a) In the most common use cases, it's probably not what the user wants anyway;

b) The failure mode of unnecessary fetching and storing is much worse than the failure mode of not having fetched a pristine that someone might turn out to want (there are workarounds for the latter);

c) It's generally better if we have a simple and comprehensible principle, like the one I articulated above.

Best regards,
-Karl

> Karl shares the view that it would be unacceptable for 'svn update' to fetch pristines of files that have become locally modified since the previous fetch opportunity, but that are not actually being updated by this update.
>
> In his use cases, a developer locally modifies some large files. The developer also modifies some small files (such as 'readme' files describing the large files). The developer doesn't need to diff or revert the large files, and so chooses the checkout mode that doesn't keep the pristines initially.
>
> Before committing, the developer runs 'update', expecting it to fetch any remote changes to the small files (and not the large files, not in this case) and expecting it to be quick; then the developer continues work and eventually commits.

> The time taken to fetch the pristines of the large, modified files would be long (for example, ten minutes). Taking a long time for the commit is acceptable, because the commit is the end of the workflow (the developer can go away or move on to something else while it proceeds). The concern is that taking a long time at the update stage would be too disruptive.
>
> It wouldn't be a problem for an operation that really needs the pristines (revert, for example) to take a long time. The perception is that update doesn't really need them. That is, while update obviously needs, in principle, to fetch the new pristines of the files being updated to a new version from the server (or to fetch a delta and so be able to generate each pristine), it doesn't, in principle, need pristines of files that it isn't going to update. In this use case, it isn't going to update the large, locally modified files. And fetching their pristines wouldn't massively benefit the commit either, because they are poorly diffable kinds of files. So it is wasted time.
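A minimal sketch of that desired 'update' behavior (purely illustrative; these helper names are hypothetical, not the real libsvn_wc code): pristines are fetched only for the files the update actually changes, regardless of which files happen to be locally modified.

```python
def update(files, repo_changed_paths, fetch_pristine):
    """Fetch new pristines only for files this update actually changes.

    `files`: dict path -> {"locally_modified": bool, "pristine": bytes|None}
    `repo_changed_paths`: paths changed in the repository since our revision
    `fetch_pristine`: callable returning the new pristine text for a path
    """
    fetched = []
    for path, state in files.items():
        if path not in repo_changed_paths:
            # Not being updated: leave it alone, even if locally modified.
            continue
        # Updating to a new revision genuinely needs the new pristine
        # (or a delta from which it can be constructed).
        state["pristine"] = fetch_pristine(path)
        fetched.append(path)
    return fetched

files = {
    "huge.bin": {"locally_modified": True, "pristine": None},
    "readme":   {"locally_modified": True, "pristine": None},
}
update(files, repo_changed_paths={"readme"}, fetch_pristine=lambda p: b"new text")
```

In this run the large locally modified file is skipped, because the repository hasn't changed it; only the small updated file costs a fetch.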

> If the implementation currently requires these pristines, that would seem to be an implementation detail, and we would seek to change it.
>
> So my task now is to look for any way we can eliminate or optimise the unnecessary fetching, at least in this specific case.
>
> Filed as https://subversion.apache.org/issue/4892 .
>
> I will investigate this issue next week.
>
> - Julian
