Samuel Lijin <[email protected]> writes:
> When we taught read_directory_recursive() to recurse into untracked
> directories in search of ignored files given DIR_SHOW_IGNORED_TOO, that
> had the side effect of teaching it to collect the untracked contents of
> untracked directories. It doesn't always make sense to return these,
> though (we do need them for `clean -d`), so we introduce a flag
> (DIR_KEEP_UNTRACKED_CONTENTS) to control whether or not read_directory()
> strips dir->entries of the untracked contents of untracked dirs.
>
> We also introduce check_contains() to check if one dir_entry corresponds
> to a path which contains the path corresponding to another dir_entry.
>
> Signed-off-by: Samuel Lijin <[email protected]>
> ---
> dir.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> dir.h | 3 ++-
> 2 files changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index 6bd0350e9..214a148ee 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1852,6 +1852,14 @@ static int cmp_name(const void *p1, const void *p2)
> return name_compare(e1->name, e1->len, e2->name, e2->len);
> }
>
> +/* check if *out lexically contains *in */
> +static int check_contains(const struct dir_entry *out, const struct
> dir_entry *in)
> +{
> + return (out->len < in->len) &&
> + (out->name[out->len - 1] == '/') &&
> + !memcmp(out->name, in->name, out->len);
> +}
OK, treat_one_path() and treat_pah_fast() both ensure that a path to
a directory is terminated with '/' before calling dir_add_name() and
dir_add_ignored(), so we know a dir_entry "out" that is a directory
must end with '/'. Good.
The second and third line being overly indented is a bit
distracting, though.
> static int treat_leading_path(struct dir_struct *dir,
> const char *path, int len,
> const struct pathspec *pathspec)
> @@ -2067,6 +2075,52 @@ int read_directory(struct dir_struct *dir, const char
> *path,
> read_directory_recursive(dir, path, len, untracked, 0,
> pathspec);
> QSORT(dir->entries, dir->nr, cmp_name);
> QSORT(dir->ignored, dir->ignored_nr, cmp_name);
> +
> + // if DIR_SHOW_IGNORED_TOO, read_directory_recursive() will also pick
> + // up untracked contents of untracked dirs; by default we discard these,
> + // but given DIR_KEEP_UNTRACKED_CONTENTS we do not
/*
* Our multi-line comments are formatted like this
* example. No C++/C99 // comments, outside of
* borrowed code and platform specific compat/ code,
* please.
*/
> + if ((dir->flags & DIR_SHOW_IGNORED_TOO)
> + && !(dir->flags & DIR_KEEP_UNTRACKED_CONTENTS)) {
Both having && at the end and && at the beginning are valid C, but
please stick to one style in a single file.
> + int i, j, nr_removed = 0;
> +
> + // remove from dir->entries untracked contents of untracked dirs
/* And our single-liner comments look like this */
> + for (i = 0; i < dir->nr; i++) {
> + if (!dir->entries[i])
> + continue;
> +
> + for (j = i + 1; j < dir->nr; j++) {
> + if (!dir->entries[j])
> + continue;
> + if (check_contains(dir->entries[i],
> dir->entries[j])) {
> + nr_removed++;
> + free(dir->entries[j]);
> + dir->entries[j] = NULL;
> + }
> + else {
> + break;
> + }
> + }
> + }
This loop is O(n^2). I wonder if we can do better, especially we
know dir->entries[] is sorted already.
Well, because it is sorted, if A/, A/B, and A/B/C are all untracked,
the first round that scans for A/ will nuke both A/B and A/B/C, so
we won't have to scan looking for entries inside A/B, which is a bit
of consolation ;-)
> + for (i = 0;;) {
> + while (i < dir->nr && dir->entries[i])
> + i++;
> + if (i == dir->nr)
> + break;
> + j = i;
> + while (j < dir->nr && !dir->entries[j])
> + j++;
> + if (j == dir->nr)
> + break;
> + dir->entries[i] = dir->entries[j];
> + dir->entries[j] = NULL;
> + }
> + dir->nr -= nr_removed;
This looks like an overly complicated way to scan an array and skip
NULLs. Are you doing an equivalent of this loop, or am I missing
something subtle?
for (src = dst = 0; src < nr; src++)
if (array[src])
array[dst++] = src;
nr = dst;