On Wed, Aug 15, 2018 at 12:38:49PM -0700, Junio C Hamano wrote:
This should answer Duys comments as well.
> Torsten Bögershausen <[email protected]> writes:
>
[snip]
> > Should the following be protected by core.checkstat ?
> > if (check_stat) {
>
> I do not think such a if statement is strictly necessary.
>
> Even if check_stat tells us "when checking if a cached stat
> information tells us that the path may have modified, use minimum
> set of fields from the 'struct stat'", we still capture and update
> the values from the same "full" set of fields when we mark a cache
> entry up-to-date. So it all depends on why you are limiting with
> check_stat. Is it because stdev is unusable? Is it because nsec is
> unusable? Is it because ino is unusable? Only in the last case,
> paying attention to check_stat will reduce the false positive.
>
> But then you made me wonder what value check_stat has on Windows.
> If it is false, perhaps we do not even need the conditional
> compilation, which is a huge plus.
Agreed:
check_stat is 0 on Windows, and inum is allways 0 in lstat().
I was thinking about systems which don't have inodes and inum,
and then generate an inum in memory, sometimes random.
After a reboot or a re-mount of the file systems those ino values
change.
However, for the initial clone we are fine in any case.
>
> >> + if (dup->ce_stat_data.sd_ino == st->st_ino) {
> >> + dup->ce_flags |= CE_MATCHED;
> >> + break;
> >> + }
> >> + }
> >> +#endif
> >
> > Another thing is that we switch of the ASCII case-folding-detection-logic
> > off for Windows users, even if we otherwise rely on icase.
> > I think we can use fspathcmp() as a fallback. when inodes fail,
> > because we may be on a network file system.
> >
> > (I don't have a test setup at the moment, but what happens with inodes
> > when a Windows machine exports a share to Linux or Mac ?)
> >
> > Is there a chance to get the fspathcmp() back, like this ?
>
> If fspathcmp() never gives false positives, I do not think we would
> mind using it like your update. False negatives are fine, as that
> is better than just punting the whole thing when there is no usable
> inum. And we do not care all that much if it is more expensive;
> this is an error codepath after all.
>
> And from code structure's point of view, I think it makes sense. It
> would be even better if we can lose the conditional compilation.
The current implementation of fspathcmp() does not give false positvies,
and future versions should not either.
All case-insentive file systems have always treated 'a-z' equal to 'A-Z'.
In FAT MS/DOS there had only been uppercase letters as file names,
and `type file.txt` (the equivilant to ´cat file.txt´ in *nix)
simply resultet in `type FILE.TXT`
Later, with VFAT and later with HPFS/NTFS a file could be stored on
disk as "File.txt".
>From now on ´type FILE.TXT´ still worked, (and all other upper-lowercase
combinations).
This all is probably nothing new.
The main point should be that fspathcmp() should never return a false positive,
and I think we all agree on that.
Now back to the compiler switch:
Windows always set inum to 0 and I can't think about a situation where
a file in a working tree gets inum = 0, can we use the following:
static void mark_colliding_entries(const struct checkout *state,
struct cache_entry *ce, struct stat *st)
{
int i;
ce->ce_flags |= CE_MATCHED;
for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];
int folded = 0;
if (dup == ce)
break;
if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
continue;
/*
* Windows sets ino to 0. On other FS ino = 0 will already be
* used, so we don't see it for a file in a Git working tree
*/
if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino))
folded = 1;
/*
* Fallback for NTFS and other case insenstive FS,
* which don't use POSIX inums
*/
if (!fspathcmp(dup->name, ce->name))
folded = 1;
if (folded) {
dup->ce_flags |= CE_MATCHED;
break;
}
}
}
>
> Another thing we maybe want to see is if we can update the caller of
> this function so that we do not overwrite the earlier checkout with
> the data for this path. When two paths collide, we check out one of
> the paths without reporting (because we cannot notice), then attempt
> to check out the other path and report (because we do notice the
> previous one with lstat()). The current code then goes on and overwrites
> the file with the contents from the "other" path.
>
> Even if we had false negative in this loop, if we leave the contents
> for the earlier path while reporting the "other" path, then the user
> can get curious, inspect what contents the "other" path has on the
> filesystem, and can notice that it belongs to the (unreported--due
> to false negative) earlier path.
>
[snip]