On 04 Mar 2022, Julian Foad wrote:
I had a talk with Karl about this, and now I understand the
concern much better.
(Karl, please correct anything I misrepresent.)
You've described it well, Julian. Thank you (and thank you also
for your patience in explaining to me the current State Of The
Onion in a phone call, when I was still behind on reading dev@
posts -- I'm caught up now).
The one thing I would add to your summary below is that the
concern on the client side is not just about wasted time (that is,
the time spent fetching pristines for files that won't, in the
end, actually need pristines locally).
The concern is also local *space*. It's not unusual for one of
these working copies to bring a local disk to within a few
enormous files of full disk usage -- in other words, to be in a
situation where fetching a certain number of pristines could
result in the disk running out of space. So if one has modified N
of the large versioned files, and then an update brings down N
correspondingly large pristines, well, hilarity could ensue :-).
But even beyond my experience with particular use cases, I think
we should aim for the simplicity of a principle here:
Principle: When a file is checked out without its pristine, then
SVN should never fetch that pristine unless we actually need to.
(Naturally, this principle applies, via the distributive property,
to all the files in a fully pristine-less working copy. Since in
the future we may offer UI to allow working copies in which some
files are checked out with pristine and some without, I am being
careful to articulate the principle here as being about files
rather than about working copies.)
The justification for this principle is that there's presumably a
*reason* why the user requested that there be no pristine for that
file. Whatever that reason is, we have no reason to think we know
better than the user does.
The most likely reason is that the file is huge and the user
doesn't want to pay the disk-space cost, nor the network-time cost
in the case of updates for which the file hasn't changed in the
repository. But maybe the reason is something else. Who knows?
Not our business. The user told SVN what they wanted, and SVN
should do that thing.
Now, if the user runs an operation that requires a pristine,
that's different -- then they're effectively notifying us that
they're changing their decision. We should obey the user in that
case too. It's just that it would be bad form for us to go
fetching a pristine when a) the user already said they don't want
it and b) SVN has no identifiable need for it in this operation.
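The rule being argued for here can be sketched as a small decision function. This is purely illustrative pseudologic, not Subversion code; all names (the function, the operation strings, the flags) are hypothetical:

```python
# Illustrative sketch of the principle: fetch a pristine only when the
# running operation has an identifiable need for it, never speculatively.
# All names here are hypothetical, not Subversion's actual API.

# Operations that genuinely need a file's pristine text to do their job.
PRISTINE_REQUIRING_OPS = {"diff", "revert"}

def should_fetch_pristine(operation, file_has_pristine,
                          file_changed_by_update=False):
    """Return True only if this operation itself needs the pristine."""
    if file_has_pristine:
        return False  # already present locally; nothing to fetch
    if operation in PRISTINE_REQUIRING_OPS:
        return True   # the user asked for something that needs it
    if operation == "update" and file_changed_by_update:
        return True   # update must install the new pristine anyway
    return False      # e.g. update touching other files: leave it alone

# The contested case: update must not pre-fetch pristines for locally
# modified files that it is not actually updating.
assert should_fetch_pristine("update", False) is False
assert should_fetch_pristine("update", False,
                             file_changed_by_update=True) is True
assert should_fetch_pristine("revert", False) is True
```

The point of expressing it this way is that "the user already said they don't want it" is the default, and only an explicit need (a pristine-requiring operation, or the update actually changing the file) overrides it.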
I do understand the reasons why Evgeny thought pre-fetching
pristines for modified files as part of an 'update' could be a
good idea. There would surely be _some_ occasions where a user
would be pleasantly surprised to find that they have that pristine
locally just when they need it. But in the end, I believe that
a) In the most common use cases, it's probably not what the user
wants anyway;
b) The failure mode of unnecessary fetching and storing is much
worse than the failure mode of not having fetched a pristine that
someone might turn out to want (there are workarounds for the
latter);
c) It's generally better if we have a simple and comprehensible
principle, like the one I articulated above.
Best regards,
-Karl
He shares the view that it would be unacceptable for 'svn update' to
fetch pristines of files that have become locally modified since the
previous fetch opportunity, but that are not actually being updated by
this update.
In his use cases a developer locally modifies some large files. The
developer also modifies some small files (such as 'readme' files
describing the large files). The developer doesn't need to diff or
revert the large files, and so chooses the checkout mode which doesn't
keep the pristines initially.
Before committing, the developer runs 'update', expecting to fetch any
remote changes to the small files (and not the large files, not in this
case), expecting it to be quick, and then the developer continues work
and eventually commits.
The time taken to fetch the pristines of the large, modified files
would be long (for example, ten minutes). Taking a long time for the
commit is acceptable because the commit is the end of the work flow
(and the developer can go away or move on to something else while it
proceeds).
The concern is that taking a long time at the update stage would
be too disruptive.
It wouldn't be a problem for an operation that really needs the
pristines (revert, for example) to take a long time. The perception is
that update doesn't really need them. That is, while update obviously
needs, in principle, to fetch the new pristines of the files it is
updating to a new version from the server (or fetch a delta and so be
able to generate the pristine), it doesn't, in principle, need
pristines of files that it isn't going to update. In this use case, it
isn't going to update the large, locally modified files. And fetching
their pristines wouldn't much benefit the commit either, because they
are poorly diffable kinds of files. So it is wasted time.
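To make the claim concrete, here is a toy model (hypothetical names, not Subversion's implementation) of which pristines an update would fetch under the behaviour described above:

```python
# Toy model of the use case: an update fetches new pristines only for
# the files it actually changes; local modifications alone trigger
# nothing. All names here are hypothetical.

def pristines_fetched_by_update(remote_changes, locally_modified):
    """Files the update will change need a new pristine; the set of
    locally modified files is deliberately irrelevant."""
    return sorted(set(remote_changes))

remote_changes = ["readme1.txt", "readme2.txt"]      # changed in repo
locally_modified = ["huge-data.bin", "readme1.txt"]  # edited locally

fetched = pristines_fetched_by_update(remote_changes, locally_modified)
assert fetched == ["readme1.txt", "readme2.txt"]
assert "huge-data.bin" not in fetched  # the long fetch is avoided
```

The large locally modified file never appears in the fetch set, which is the whole point of the use case: the update stays quick.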
If the implementation currently requires these pristines, that would
seem to be an implementation detail, and we would seek to change that.
So my task now is to investigate any way we can eliminate or optimise
the unnecessary fetching, at least in this specific case.
Filed as https://subversion.apache.org/issue/4892 .
I will investigate this issue next week.
- Julian