Re: Status of branches/pristine-checksum-salt

Branko Čibej Fri, 16 Jan 2026 09:52:08 -0800

On 16. 1. 26 18:21, Evgeny Kotkov wrote:

Nathan Hartman <[email protected]> writes:

In pristineless working copies, some pristines are available some of
the time, such as when they have been fetched for any reason by an
earlier operation. (In the current implementation, these may have been
fetched for no other reason than because they share a common subtree
with several modified files.) In this case, the content comparison
could be performed, rather than the checksum comparison. The decision
(whether to perform a content or checksum comparison) could be based
on whether the pristine in question is available at this time, rather
than on the pristineness of the working copy as a whole.

Pros:

- performs the "best" comparison possible with the available
   information (if we consider a content comparison to be "better" or
   "more definitive" than a checksum comparison)

- future effort to allow more granular user control over pristines
   (rather than the all-or-nothing approach in 1.15.x) could benefit
   from such logic. Specifically, if a working copy is partially-
   pristined, I think we would want the content comparison performed
   for pristined files.

- content comparison might be more performant than checksum
   comparison, due to short-circuit evaluation when the first
   difference is encountered; no such shortcut is possible with
   checksum calculation.

I wouldn't say that one is universally "better" than the other, just that they
have different characteristics.

For example, checksum-based comparison can reduce the number of heavy
"open file" I/O syscalls, because in some cases we don't have to open
the pristine file at all.

Also, such a behavior change would currently only affect the subset of
files that were kept hydrated after the operation, i.e., the modified files,
while remaining unchanged for the majority of unmodified files.
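To make the trade-off above concrete, here is a minimal sketch of the two
kinds of comparison. It is not Subversion code: the function names, the
chunk size, and the use of plain SHA-1 are assumptions chosen only for
illustration. The content comparison can stop at the first differing chunk
but must open both files; the checksum comparison has to read the whole
working file but never opens the pristine.

```python
import hashlib

CHUNK = 64 * 1024  # arbitrary read size, chosen for illustration


def same_by_content(working_path, pristine_path):
    """Compare two files chunk by chunk, stopping at the first difference.

    Needs the pristine to be present on disk and opens both files.
    """
    with open(working_path, "rb") as w, open(pristine_path, "rb") as p:
        while True:
            a = w.read(CHUNK)
            b = p.read(CHUNK)
            if a != b:
                return False  # short-circuit: no need to read further
            if not a:
                return True   # both files ended at the same point


def same_by_checksum(working_path, recorded_sha1):
    """Hash the whole working file and compare with a recorded checksum.

    No short-circuit is possible, but the pristine is never opened.
    """
    h = hashlib.sha1()
    with open(working_path, "rb") as w:
        for chunk in iter(lambda: w.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest() == recorded_sha1
```

Which of the two is cheaper in practice depends on how early the files
differ and on the cost of opening the extra file, which is the point made
above about the approaches having different characteristics.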

Cons:

- inconsistency: status checks of a file may behave differently at
   different times, since the pristine may be available during some
   invocations and unavailable in others.

While this unpredictability by itself seems undesirable to me, I think there's
a bigger issue.

The current checksum-based approach avoids the complexity of *depending*
on the hydration state of individual pristines.  This reflects the broader
intent of the pristineless WC design: avoiding the need for specialized
code paths by minimizing different behavior and state dependencies.

Since the "is the file modified?" check is a pretty low-level building block,
it can be a part of an operation that doesn't hold a write-lock on the working
copy subtree.  If this check starts depending on the presence of individual
pristines, we would likely need to extend their lifetime, maybe by pinning
them for the duration of the comparison.  In turn, that would effectively
introduce the need for a read lock within a low-level read-only operation.
And given the large surface area of this primitive, adding such locking
requirements is something that I think we'd better avoid.

So, from a technical perspective, while I think we could try to make this
check depend on the global mode of the working copy (which I'm currently
working on), I don't think the potential benefits of depending on individual
pristine states outweigh the added complexity.


For now, I agree.

But for example, on the better-pristines branch which I recently
revived, I intend to implement storing small (compressed) pristine texts
right in the wc-db, spilling over to on-disk files only if they exceed
some predefined size. Access to those blobs will be subject to
in-database locking and transactions, not filesystem metadata (locks).
So not using the content as a fallback if both the checksum and the size
are equal seems like a wasted opportunity.
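A rough sketch of the fallback I have in mind, purely as an illustration:
the function name, its parameters, and the use of in-memory byte strings
and plain SHA-1 are all hypothetical, and none of this is actual wc-db
code. The pristine content is optional, standing in for "whatever the
pristines table says is available right now".

```python
import hashlib
from typing import Optional


def is_modified(working: bytes, recorded_size: int, recorded_sha1: str,
                pristine: Optional[bytes] = None) -> bool:
    """Hypothetical tiered check: size, then checksum, then content."""
    # Cheapest test first: a size mismatch always means a modification.
    if len(working) != recorded_size:
        return True
    # Next, compare against the recorded checksum.
    if hashlib.sha1(working).hexdigest() != recorded_sha1:
        return True
    # Size and checksum agree. If the pristine text is at hand, use it
    # as the final arbiter; this is what guards against a collision.
    if pristine is not None:
        return working != pristine
    # No pristine available: trust the checksum.
    return False
```

With no pristine at hand this degrades to the checksum-only answer, so the
extra step only matters when the content is already available.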

The same argument can be made for almost-pristine-less working copies.
Content comparison is the only sure defence against tripping over hash
collisions. The question that needs to be answered: how does the added
complexity of what you describe above compare with changing the, or
adding another, hash type to the pristines metadata? Considering that
the latter is not going to happen soon.

On top of that, when the working copy is configured to use pristines, we
used content comparison, too. So it's not really a question of whether
there's added complexity but rather on which level it occurs: based on
working copy configuration – with externals bringing their own concept
of "fun" to the party – or based on records in the pristines table that
are the source of truth no matter what the WC configuration says.


-- Brane
