Re: Status of branches/pristine-checksum-salt

Evgeny Kotkov via dev Fri, 16 Jan 2026 09:24:20 -0800

Nathan Hartman writes:

> In pristineless working copies, some pristines are available some of
> the time, such as when they have been fetched for any reason by an
> earlier operation. (In the current implementation, these may have been
> fetched for no other reason than because they share a common subtree
> with several modified files.) In this case, the content comparison
> could be performed, rather than the checksum comparison. The decision
> (whether to perform a content or checksum comparison) could be based
> on whether the pristine in question is available at this time, rather
> than on the pristineness of the working copy as a whole.
>
> Pros:
>
> - performs the "best" comparison possible with the available
>   information (if we consider a content comparison to be "better" or
>   "more definitive" than a checksum comparison)
>
> - future effort to allow more granular user control over pristines
>   (rather than the all-or-nothing approach in 1.15.x) could benefit
>   from such logic. Specifically, if a working copy is partially-
>   pristined, I think we would want the content comparison performed
>   for pristined files.
>
> - content comparison might be more performant than checksum
>   comparison, due to short-circuit evaluation when the first
>   difference is encountered; no such shortcut is possible with
>   checksum calculation.


I wouldn't say that one is universally "better" than the other, just that they
have different characteristics.

For example, checksum-based comparison can reduce the number of heavy
"open file" I/O syscalls, because in some cases we don't have to open
the pristine file at all.
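To make the trade-off concrete, here is a minimal Python sketch of the two strategies. It is not Subversion's implementation, which is written in C. The function names, the 64 KiB chunk size, and the use of plain SHA-1 over the file bytes are illustrative assumptions, and the real checksum scheme (see the branch name) may differ. The content comparison must open both files but can stop at the first difference. The checksum comparison opens only the working file, but it has to read that file to the end.

```python
import hashlib

CHUNK = 64 * 1024  # illustrative read size (assumption)


def differs_by_content(working_path: str, pristine_path: str) -> bool:
    """Byte comparison: requires the pristine file to be present on disk.

    Opens two files, but stops reading at the first differing chunk.
    """
    with open(working_path, "rb") as w, open(pristine_path, "rb") as p:
        while True:
            a = w.read(CHUNK)
            b = p.read(CHUNK)
            if a != b:
                return True   # first difference found: short-circuit
            if not a:
                return False  # both files hit EOF with no difference


def differs_by_checksum(working_path: str, recorded_hex: str) -> bool:
    """Checksum comparison against a value recorded in WC metadata.

    The pristine file is never opened, but the whole working file must be
    hashed, so no early exit is possible. Plain SHA-1 is an assumption.
    """
    h = hashlib.sha1()
    with open(working_path, "rb") as w:
        for chunk in iter(lambda: w.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest() != recorded_hex
```

In pristineless mode only the second function is usable for most files, because their pristines are not on disk.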

Also, such a behavior change would currently only affect the subset of
files that were kept hydrated after the operation, i.e., the modified files,
while remaining unchanged for the majority of unmodified files.

> Cons:
>
> - inconsistency: status checks of a file may behave differently at
>   different times, since the pristine may be available during some
>   invocations and unavailable in others.

While this unpredictability by itself seems undesirable to me, I think there's
a bigger issue.

The current checksum-based approach avoids the complexity of *depending*
on the hydration state of individual pristines.  This reflects the broader
intent of the pristineless WC design: avoiding specialized code paths by
minimizing behavioral differences and state dependencies.

Since the "is the file modified?" check is a pretty low-level building block,
it can be part of an operation that doesn't hold a write lock on the working
copy subtree.  If this check starts depending on the presence of individual
pristines, we would likely need to extend their lifetime, maybe by pinning
them for the duration of the comparison.  In turn, that would effectively
introduce the need for a read lock within a low-level read-only operation.
And given the large surface area of this primitive, adding such locking
requirements is something that I think we'd better avoid.

So, from a technical perspective, while I think we could try to make this
check depend on the global mode of the working copy (which I'm currently
working on), I don't think the potential benefits of depending on individual
pristine states outweigh the added complexity.


Thanks,
Evgeny Kotkov
