On 16. 1. 26 18:21, Evgeny Kotkov wrote:
Nathan Hartman writes:
In pristineless working copies, some pristines are available some of
the time, such as when they have been fetched for any reason by an
earlier operation. (In the current implementation, these may have been
fetched for no other reason than because they share a common subtree
with several modified files.) In this case, the content comparison
could be performed, rather than the checksum comparison. The decision
(whether to perform a content or checksum comparison) could be based
on whether the pristine in question is available at this time, rather
than on the pristineness of the working copy as a whole.
Pros:
- performs the "best" comparison possible with the available
information (if we consider a content comparison to be "better" or
"more definitive" than a checksum comparison)
- future effort to allow more granular user control over pristines
(rather than the all-or-nothing approach in 1.15.x) could benefit
from such logic. Specifically, if a working copy is partially-pristined,
I think we would want the content comparison performed
for pristined files.
- content comparison might be more performant than checksum
comparison, due to short-circuit evaluation when the first
difference is encountered; no such shortcut is possible with
checksum calculation.
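To make the short-circuit point above concrete, here is a minimal Python sketch. It is not Subversion code: the function names and the chunk size are hypothetical, and SHA-1 is assumed as the recorded checksum type. Both functions also return the number of bytes read from the working file, so the difference in I/O is visible.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen only for illustration


def contents_differ(working, pristine, chunk_size=CHUNK_SIZE):
    """Compare two binary streams chunk by chunk.

    Returns (differ, bytes_read_from_working). Stops at the first
    differing chunk, so an early difference costs very little I/O.
    Assumes read() returns full chunks until EOF (true for BytesIO
    and for buffered files opened in binary mode).
    """
    bytes_read = 0
    while True:
        w = working.read(chunk_size)
        p = pristine.read(chunk_size)
        bytes_read += len(w)
        if w != p:
            return True, bytes_read
        if not w:  # both streams hit EOF at the same point
            return False, bytes_read


def checksum_differs(working, recorded_sha1_hex, chunk_size=CHUNK_SIZE):
    """Compare a stream against a recorded SHA-1 digest.

    Returns (differ, bytes_read_from_working). The whole stream must
    be hashed before anything can be decided, but the pristine text
    is never opened.
    """
    digest = hashlib.sha1()
    bytes_read = 0
    while True:
        chunk = working.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        bytes_read += len(chunk)
    return digest.hexdigest() != recorded_sha1_hex, bytes_read
```

For a file that differs in its first byte, contents_differ() returns after one chunk, while checksum_differs() reads the file to the end. The trade-off runs the other way for unmodified files: both read everything, and only the content comparison has to open the pristine.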
I wouldn't say that one is universally "better" than the other, just that they
have different characteristics.
For example, checksum-based comparison can reduce the number of heavy
"open file" I/O syscalls, because in some cases we don't have to open
the pristine file at all.
Also, such a behavior change would currently only affect the subset of
files that were kept hydrated after the operation, i.e., the modified files,
while remaining unchanged for the majority of unmodified files.
Cons:
- inconsistency: status checks of a file may behave differently at
different times, since the pristine may be available during some
invocations and unavailable in others.
While this unpredictability by itself seems undesirable to me, I think there's
a bigger issue.
The current checksum-based approach avoids the complexity of *depending*
on the hydration state of individual pristines. This reflects the broader
intent of the pristineless WC design: avoiding the need for specialized
code paths by minimizing different behavior and state dependencies.
Since the "is the file modified?" check is a pretty low-level building block,
it can be a part of an operation that doesn't hold a write-lock on the working
copy subtree. If this check starts depending on the presence of individual
pristines, we would likely need to extend their lifetime, maybe by pinning
them for the duration of the comparison. In turn, that would effectively
introduce the need for a read lock within a low-level read-only operation.
And given the large surface area of this primitive, adding such locking
requirements is something that I think we'd better avoid.
So, from a technical perspective, while I think we could try to make this
check depend on the global mode of the working copy (which I'm currently
working on), I don't think the potential benefits of depending on individual
pristine states outweigh the added complexity.
For now, I agree.
But for example, on the better-pristines branch which I recently
revived, I intend to implement storing small (compressed) pristine texts
right in the wc-db, spilling over to on-disk files only if they exceed
some predefined size. Access to those blobs will be subject to
in-database locking and transactions, not filesystem metadata (locks).
So not using the content as a fallback if both the checksum and the size
are equal seems like a wasted opportunity.
The same argument can be made for almost-pristine-less working copies.
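As a rough sketch of the fallback I have in mind (Python for brevity, not the wc-db API; is_modified() and its parameters are hypothetical, and SHA-1 is assumed as the recorded checksum type): size and checksum decide first, and the content is consulted only when both match and the pristine text happens to be available.

```python
import hashlib
from typing import Optional


def is_modified(working: bytes,
                recorded_size: int,
                recorded_sha1_hex: str,
                pristine: Optional[bytes]) -> bool:
    """Hypothetical "is the file modified?" check with content fallback.

    `pristine` is None when the pristine text is not hydrated.
    """
    # Cheapest test first: a size mismatch proves a modification.
    if len(working) != recorded_size:
        return True
    # A checksum mismatch also proves a modification.
    if hashlib.sha1(working).hexdigest() != recorded_sha1_hex:
        return True
    # Size and checksum agree. If the pristine text is at hand,
    # compare content to rule out a hash collision.
    if pristine is not None:
        return working != pristine
    # No pristine available: trust the checksum.
    return False
```

The last two branches are the whole point: the answer depends on the pristine record for this one file, not on how the working copy as a whole is configured.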
Content comparison is the only sure defence against tripping over hash
collisions. The question that needs to be answered: how does the added
complexity of what you describe above compare with changing the hash
type in the pristines metadata, or adding another one, considering that
the latter is not going to happen soon?
On top of that, when the working copy is configured to use pristines, we
use content comparison, too. So it's not really a question of whether
there's added complexity but rather of the level at which it occurs:
based on working copy configuration – with externals bringing their own
concept of "fun" to the party – or based on records in the pristines
table that are the source of truth no matter what the WC configuration
says.
-- Brane