Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

Torsten Bögershausen Thu, 16 Aug 2018 07:04:00 -0700

On Wed, Aug 15, 2018 at 12:38:49PM -0700, Junio C Hamano wrote:

This should answer Duys comments as well.
> Torsten Bögershausen <[email protected]> writes:
> 
[snip]
> > Should the following be protected by core.checkstat ? 
> >     if (check_stat) {
> 
> I do not think such a if statement is strictly necessary.
> 
> Even if check_stat tells us "when checking if a cached stat
> information tells us that the path may have modified, use minimum
> set of fields from the 'struct stat'", we still capture and update
> the values from the same "full" set of fields when we mark a cache
> entry up-to-date.  So it all depends on why you are limiting with
> check_stat.  Is it because stdev is unusable?  Is it because nsec is
> unusable?  Is it because ino is unusable?  Only in the last case,
> paying attention to check_stat will reduce the false positive.
> 
> But then you made me wonder what value check_stat has on Windows.
> If it is false, perhaps we do not even need the conditional
> compilation, which is a huge plus.


Agreed:
check_stat is 0 on Windows, and inum is allways 0 in lstat().
I was thinking about systems which don't have inodes and inum,
and then generate an inum in memory, sometimes random.
After a reboot or a re-mount of the file systems those ino values
change.
However, for the initial clone we are fine in any case.

> 
> >> +          if (dup->ce_stat_data.sd_ino == st->st_ino) {
> >> +                  dup->ce_flags |= CE_MATCHED;
> >> +                  break;
> >> +          }
> >> +  }
> >> +#endif
> >
> > Another thing is that we switch of the ASCII case-folding-detection-logic
> > off for Windows users, even if we otherwise rely on icase.
> > I think we can use fspathcmp() as a fallback. when inodes fail,
> > because we may be on a network file system.
> >
> > (I don't have a test setup at the moment, but what happens with inodes
> > when a Windows machine exports a share to Linux or Mac ?)
> >
> > Is there a chance to get the fspathcmp() back, like this ?
> 
> If fspathcmp() never gives false positives, I do not think we would
> mind using it like your update.  False negatives are fine, as that
> is better than just punting the whole thing when there is no usable
> inum.  And we do not care all that much if it is more expensive;
> this is an error codepath after all.
> 
> And from code structure's point of view, I think it makes sense.  It
> would be even better if we can lose the conditional compilation.

The current implementation of fspathcmp() does not give false positvies,
and future versions should not either.
All case-insentive file systems have always treated 'a-z' equal to 'A-Z'.
In FAT MS/DOS there had only been uppercase letters as file names,
and `type file.txt` (the equivilant to ´cat file.txt´ in *nix)
simply resultet in `type FILE.TXT`
Later, with VFAT and later with HPFS/NTFS a file could be stored on
disk as "File.txt".
>From now on  ´type FILE.TXT´ still worked, (and all other upper-lowercase
combinations).
This all is probably nothing new.
The main point should be that fspathcmp() should never return a false positive,
and I think we all agree on that. 


Now back to the compiler switch:
Windows always set inum to 0 and I can't think about a situation where
a file in a working tree gets inum = 0, can we use the following:

static void mark_colliding_entries(const struct checkout *state,
                                   struct cache_entry *ce, struct stat *st)
{
        int i;
        ce->ce_flags |= CE_MATCHED;

        for (i = 0; i < state->istate->cache_nr; i++) {
                struct cache_entry *dup = state->istate->cache[i];
                int folded = 0;

                if (dup == ce)
                        break;

                if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
                        continue;
                /*
                 * Windows sets ino to 0. On other FS ino = 0 will already be
                 *  used, so we don't see it for a file in a Git working tree
                 */
                if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino))
                        folded = 1;

                /*
                 * Fallback for NTFS and other case insenstive FS,
                 * which don't use POSIX inums
                 */
                if (!fspathcmp(dup->name, ce->name))
                        folded = 1;

                if (folded) {
                        dup->ce_flags |= CE_MATCHED;
                        break;
                }
        }
}


> 
> Another thing we maybe want to see is if we can update the caller of
> this function so that we do not overwrite the earlier checkout with
> the data for this path.  When two paths collide, we check out one of
> the paths without reporting (because we cannot notice), then attempt
> to check out the other path and report (because we do notice the
> previous one with lstat()).  The current code then goes on and overwrites
> the file with the contents from the "other" path.
> 
> Even if we had false negative in this loop, if we leave the contents
> for the earlier path while reporting the "other" path, then the user
> can get curious, inspect what contents the "other" path has on the
> filesystem, and can notice that it belongs to the (unreported--due
> to false negative) earlier path.
> 
[snip]

Re: [PATCH v4] clone: report duplicate entries on case-insensitive filesystems

Reply via email to