On Thu, Jan 15, 2026 at 9:16 AM Evgeny Kotkov via dev <[email protected]> wrote:
> Branko Čibej <[email protected]> writes:
>
> > Didn't we have file size and modification time as additional checks if
> > a full-text compare was needed? The size is recorded in the wc-db and
> > should be even if the pristine file is absent, but the mtime is not, IIRC.
> > In any case, the more checks we use, the harder it is to construct a
> > collision.
>
> Yes, we begin by comparing file sizes and modification times against the
> values stored in wc.db. This logic is identical for both pristineful and
> pristineless working copies.
>
> It gets slightly trickier with eol/keyword translation, but if no translation
> is needed, I think it boils down to this:
>
> - If both the sizes and timestamps match, the file is considered unmodified.
> - If the sizes differ, the file is considered modified.
>
> However, there are still cases where these quick checks are inconclusive.
> For example, if a file is modified but retains the same size, or if the
> on-disk timestamps have somehow changed. In those cases, we fall back
> to a content comparison via questions.c:compare_and_verify():
>
> - In trunk, compare_and_verify() does not distinguish between pristineful
>   and pristineless working copies and always performs a checksum-based
>   comparison (for instance, because the pristine content is unavailable
>   in the pristineless case).
>
> - In 1.14, compare_and_verify() always performs a content comparison
>   between the pristine and the working file.

Thanks for explaining this. (This clears up some questions I was going to
try to answer by researching the history.)

Since the checksum-based check is new in trunk (and 1.15)...

> I'm currently thinking that we could make compare_and_verify() perform a
> content comparison for pristineful working copies, to avoid changing more
> characteristics than necessary. So my plan was to sketch a patch to see
> how this translates into code.

...I am inclined to agree with this plan.

In other words, if the plan comes to fruition, the behavior of
compare_and_verify() would remain unchanged since 1.14.x, unless the
working copy is pristineless.

One more thought:

In pristineless working copies, some pristines are available some of the
time, such as when they have been fetched for any reason by an earlier
operation. (In the current implementation, these may have been fetched for
no other reason than because they share a common subtree with several
modified files.) In this case, the content comparison could be performed,
rather than the checksum comparison. The decision (whether to perform a
content or checksum comparison) could be based on whether the pristine in
question is available at this time, rather than on the pristineness of the
working copy as a whole.

Pros:

- performs the "best" comparison possible with the available information
  (if we consider a content comparison to be "better" or "more definitive"
  than a checksum comparison)

- future effort to allow more granular user control over pristines (rather
  than the all-or-nothing approach in 1.15.x) could benefit from such
  logic. Specifically, if a working copy is partially-pristined, I think
  we would want the content comparison performed for pristined files.

- content comparison might be more performant than checksum comparison,
  due to short-circuit evaluation when the first difference is
  encountered; no such shortcut is possible with checksum calculation.

Cons:

- inconsistency: status checks of a file may behave differently at
  different times, since the pristine may be available during some
  invocations and unavailable in others.

Thoughts?

Cheers,
Nathan
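Illustration only: the per-file decision discussed above could be sketched
roughly as follows. This is a Python sketch, not Subversion code. The names
is_modified(), contents_equal() and sha1_of(), the parameter names, and the
use of SHA-1 are all assumptions made for the example, and eol/keyword
translation is ignored entirely. It runs the quick size/timestamp checks
first, then uses a content comparison when a pristine happens to be
available for this file and a checksum comparison otherwise.

```python
import hashlib
import os
from typing import Optional

CHUNK = 64 * 1024  # read size in bytes; arbitrary choice for the sketch


def contents_equal(path_a: str, path_b: str) -> bool:
    """Compare two files chunk by chunk, stopping at the first difference."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(CHUNK)
            chunk_b = b.read(CHUNK)
            if chunk_a != chunk_b:
                return False  # short-circuit: no need to read the rest
            if not chunk_a:
                return True  # both files ended without a difference


def sha1_of(path: str) -> str:
    """Hash the whole file; every byte must be read, no early exit."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_modified(working_path: str,
                recorded_size: int,
                recorded_mtime_ns: int,
                recorded_sha1: str,
                pristine_path: Optional[str] = None) -> bool:
    """Hypothetical status check (no eol/keyword translation)."""
    st = os.stat(working_path)
    if st.st_size != recorded_size:
        return True  # sizes differ: modified
    if st.st_mtime_ns == recorded_mtime_ns:
        return False  # size and timestamp both match: unmodified
    # Quick checks are inconclusive: same size, different timestamp.
    if pristine_path is not None and os.path.exists(pristine_path):
        # This file's pristine is available right now: compare contents.
        return not contents_equal(working_path, pristine_path)
    # No pristine at hand: fall back to comparing checksums.
    return sha1_of(working_path) != recorded_sha1
```

Note that the choice between the two fallbacks is made per file and per
invocation, which is exactly where the inconsistency listed under "Cons"
would come from.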

