On 04 Mar 2022, Julian Foad wrote:
> I had a talk with Karl about this, and now I understand the concern much better.
>
> (Karl, please correct anything I misrepresent.)

You've described it well, Julian. Thank you (and thank you also for your patience in explaining to me the current State Of The Onion in a phone call, when I was still behind on reading dev@ posts -- I'm caught up now).

The one thing I would add to your summary below is that the concern on the client side is not just about wasted time (that is, the time spent fetching pristines for files that won't, in the end, actually need pristines locally).

The concern is also local *space*. It's not unusual for one of these working copies to bring a local disk to within a few enormous files of full disk usage -- in other words, to be in a situation where fetching a certain number of pristines could result in the disk running out of space. So if one has modified N of the large versioned files, and then an update brings down N correspondingly large pristines, well, hilarity could ensue :-).

But even beyond my experience with particular use cases, I think we should aim for the simplicity of a principle here:

Principle: When a file is checked out without its pristine, SVN should never fetch that pristine unless it actually needs to.

(Naturally, this principle applies, via the distributive property, to all the files in a fully pristine-less working copy. Since in the future we may offer UI to allow working copies in which some files are checked out with pristine and some without, I am being careful to articulate the principle here as being about files rather than about working copies.)

The justification for this principle is that there's presumably a *reason* why the user requested that there be no pristine for that file. Whatever that reason is, we have no reason to think we know better than the user does. The most likely reason is that the file is huge and the user doesn't want to pay the disk-space cost, nor the network-time cost in the case of updates for which the file hasn't changed in the repository. But maybe the reason is something else. Who knows? Not our business. The user told SVN what they wanted, and SVN should do that thing.

Now, if the user runs an operation that requires a pristine, that's different -- then they're effectively notifying us that they're changing their decision. We should obey the user in that case too. It's just that it would be bad form for us to go fetching a pristine when a) the user already said they don't want it and b) SVN has no identifiable need for it in this operation.
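To make the principle concrete, here is a minimal sketch in Python (every name here is a hypothetical illustration, not an actual Subversion API): the pristine is materialized lazily, and only when an operation that genuinely requires it asks for it.

```python
# Hypothetical sketch of the on-demand principle; none of these names
# are real Subversion APIs.

# Operations with an identifiable need for a file's pristine (illustrative).
OPS_NEEDING_PRISTINE = {"revert", "diff"}

class WorkingFile:
    def __init__(self, path, store_pristine):
        self.path = path
        self.store_pristine = store_pristine  # the user's checkout-time choice
        self.pristine = None                  # fetched lazily, if ever

    def get_pristine(self, operation, fetch):
        """Return the pristine, fetching it only if `operation` truly needs it."""
        if self.pristine is None:
            if operation not in OPS_NEEDING_PRISTINE:
                raise RuntimeError(
                    f"{operation!r} has no identifiable need for a pristine")
            # The user is effectively revising their decision: fetch now.
            self.pristine = fetch(self.path)
        return self.pristine

f = WorkingFile("huge.bin", store_pristine=False)
f.get_pristine("revert", fetch=lambda p: b"pristine bytes")  # fetched on demand
```

Under this sketch, 'update' simply never reaches the fetch path for a file it isn't updating, while 'revert' triggers the fetch at the moment of need.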

I do understand the reasons why Evgeny thought pre-fetching pristines for modified files as part of an 'update' could be a good idea. There would surely be _some_ occasions where a user would be pleasantly surprised to find that they have that pristine locally just when they need it. But in the end, I believe that

a) In the most common use cases, it's probably not what the user wants anyway;

b) The failure mode of unnecessary fetching and storing is much worse than the failure mode of not having fetched a pristine that someone might turn out to want (there are workarounds for the latter);

c) It's generally better if we have a simple and comprehensible principle, like the one I articulated above.

Best regards,
-Karl

> Karl shares the view that it would be unacceptable for 'svn update' to fetch pristines of files that have become locally modified since the previous fetch opportunity, but that are not actually being updated by this update.
>
> In his use cases, a developer locally modifies some large files. The developer also modifies some small files (such as 'readme' files describing the large files). The developer doesn't need to diff or revert the large files, and so chooses the checkout mode that doesn't keep the pristines initially.
>
> Before committing, the developer runs 'update', expecting it to fetch any remote changes to the small files (and not the large files, not in this case) and expecting it to be quick; then the developer continues work and eventually commits.

> The time taken to fetch the pristines of the large, modified files would be long (for example, ten minutes). Taking a long time for the commit is acceptable, because the commit is the end of the workflow (the developer can go away or move on to something else while it proceeds). The concern is that taking a long time at the update stage would be too disruptive.
>
> It wouldn't be a problem for an operation that really needs the pristines (revert, for example) to take a long time. The perception is that update doesn't really need them. That is, while update obviously needs, in principle, to fetch the new pristines of the files being updated to a new version from the server (or to fetch a delta and so be able to generate each pristine), it doesn't, in principle, need pristines of files that it isn't going to update. In this use case, it isn't going to update the large, locally modified files. And fetching their pristines wouldn't massively benefit the commit either, because they are poorly diffable kinds of files. So it is wasted time.
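A minimal sketch of that desired 'update' behavior (purely illustrative; these helper names are hypothetical, not the real libsvn_wc code): pristines are fetched only for the files the update actually changes, regardless of which files happen to be locally modified.

```python
def update(files, repo_changed_paths, fetch_pristine):
    """Fetch new pristines only for files this update actually changes.

    `files`: dict path -> {"locally_modified": bool, "pristine": bytes|None}
    `repo_changed_paths`: paths changed in the repository since our revision
    `fetch_pristine`: callable returning the new pristine text for a path
    """
    fetched = []
    for path, state in files.items():
        if path not in repo_changed_paths:
            # Not being updated: leave it alone, even if locally modified.
            continue
        # Updating to a new revision genuinely needs the new pristine
        # (or a delta from which it can be constructed).
        state["pristine"] = fetch_pristine(path)
        fetched.append(path)
    return fetched

files = {
    "huge.bin": {"locally_modified": True, "pristine": None},
    "readme":   {"locally_modified": True, "pristine": None},
}
update(files, repo_changed_paths={"readme"}, fetch_pristine=lambda p: b"new text")
```

In this run the large locally modified file is skipped, because the repository hasn't changed it; only the small updated file costs a fetch.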

> If the implementation currently requires these pristines, that would seem to be an implementation detail, and we would seek to change it.
>
> So my task now is to look for any way we can eliminate or optimise the unnecessary fetching, at least in this specific case.
>
> Filed as https://subversion.apache.org/issue/4892 .
>
> I will investigate this issue next week.
>
> - Julian
