On Mon, May 22, 2017 at 12:48 AM, Junio C Hamano <[email protected]> wrote:
> Samuel Lijin <[email protected]> writes:
>
>> + for (j = i = 0; i < dir.nr;) {
>> + for (;
>> + j < dir.ignored_nr &&
>> + 0 <= cmp_dir_entry(&dir.entries[i], &dir.ignored[j]);
>> + j++);
>> +
>> + if ((j < dir.ignored_nr) &&
>> + check_dir_entry_contains(dir.entries[i],
>> dir.ignored[j])) {
>> + /* skip any dir.entries which contains a dir.ignored */
>> + free(dir.entries[i]);
>> + dir.entries[i++] = NULL;
>> + } else {
>> + /* prune the contents of a dir.entries which will be
>> removed */
>> + struct dir_entry *ent = dir.entries[i++];
>> + for (;
>> + i < dir.nr &&
>> + check_dir_entry_contains(ent, dir.entries[i]);
>> + i++) {
>> + free(dir.entries[i]);
>> + dir.entries[i] = NULL;
>> + }
>> + }
>> + }
>
> The second loop in the else clause is a bit tricky, and the comment
> "which will be removed" is not all that helpful to explain why the
> loop is there.
>
> But I think the code is correct. Here is how I understood it.
>
> While looking at dir.entries[i], the code noticed that nothing
> in that directory is ignored. But entries in dir.entries[] that
> come later may be contained in dir.entries[i] and we just want
> to show the top-level untracked one (e.g. "a/" and "a/b/" were
> in entries[], there is nothing in "a/", so naturally there is
> nothing in "a/b/", but we do not want to bother showing
> both---showing "a/" alone saying "the entire a/ is untracked" is
> what we want).
Yep, that's the gist of it.
More specifically: the contents of untracked dirs have to be picked up
so that clean -d can distinguish between purely untracked dirs and
untracked dirs which also contain ignored files. In the case of a
purely untracked dir, clean -d can remove the dir itself, and should
just skip over all the dir's contents; in the case of a mixed
untracked dir, clean -d should not remove the dir itself, just the
untracked contents.
> We may want to have a test to ensure "a/b/" is indeed omitted in
> such a situation from the output, though.
Is there a reason to ensure specifically a/b/ is tested, or are the
current tests, which do cover the a/b (i.e. where a/b is a file, not
where a/b/ is a dir) case, insufficient for some reason?
> By the way, instead of putting NULL, it may be easier to follow if
> you used two pointers, src and dst, into dir.entries[], just like
> you did in your latest version of [PATCH 4/6]. That way, you do not
> have to change anything in the later loop that walks over elements
> in the dir.entries[] array. It would also help the logic easier to
> follow if the above loop were its own helper function.
Agreed on the helper function. On the src-dst thing: I considered it,
but I figured another O(n) set of array moves was unnecessary. I guess
this is one of those cases where premature optimization doesn't make
sense?
> Putting them all together, here is what I came up with that can be
> squashed into your patch. I am undecided myself if this is easier
> to follow than your version, but it seems to pass your test ;-)
>
> Thanks.
>
> builtin/clean.c | 70
> ++++++++++++++++++++++++++++++++++-----------------------
> 1 file changed, 42 insertions(+), 28 deletions(-)
>
> diff --git a/builtin/clean.c b/builtin/clean.c
> index dd3308a447..c8712e7ac8 100644
> --- a/builtin/clean.c
> +++ b/builtin/clean.c
> @@ -851,9 +851,49 @@ static void interactive_main_loop(void)
> }
> }
>
> +static void simplify_untracked(struct dir_struct *dir)
> +{
> + int src, dst, ign;
> +
> + for (src = dst = ign = 0; src < dir->nr; src++) {
> + /*
> + * Skip entries in ignored[] that cannot be inside
> + * entries[src]
> + */
> + while (ign < dir->ignored_nr &&
> + 0 <= cmp_dir_entry(&dir->entries[src],
> &dir->ignored[ign]))
> + ign++;
> +
> + if (dir->ignored_nr <= ign ||
> + !check_dir_entry_contains(dir->entries[src],
> dir->ignored[ign])) {
> + /*
> + * entries[src] does not contain an ignored
> + * path -- we need to keep it. But we do not
> + * want to show entries[] that are contained
> + * in entries[src].
> + */
> + struct dir_entry *ent = dir->entries[src++];
> + dir->entries[dst++] = ent;
> + while (src < dir->nr &&
> + check_dir_entry_contains(ent,
> dir->entries[src])) {
> + free(dir->entries[src++]);
> + }
> + /* compensate for the outer loop's loop control */
> + src--;
> + } else {
> + /*
> + * entries[src] contains an ignored path --
> + * drop it.
> + */
> + free(dir->entries[src]);
> + }
> + }
> + dir->nr = dst;
> +}
> +
> int cmd_clean(int argc, const char **argv, const char *prefix)
> {
> - int i, j, res;
> + int i, res;
> int dry_run = 0, remove_directories = 0, quiet = 0, ignored = 0;
> int ignored_only = 0, config_set = 0, errors = 0, gone = 1;
> int rm_flags = REMOVE_DIR_KEEP_NESTED_GIT;
> @@ -928,30 +968,7 @@ int cmd_clean(int argc, const char **argv, const char
> *prefix)
> prefix, argv);
>
> fill_directory(&dir, &pathspec);
> -
> - for (j = i = 0; i < dir.nr;) {
> - for (;
> - j < dir.ignored_nr &&
> - 0 <= cmp_dir_entry(&dir.entries[i], &dir.ignored[j]);
> - j++);
> -
> - if ((j < dir.ignored_nr) &&
> - check_dir_entry_contains(dir.entries[i],
> dir.ignored[j])) {
> - /* skip any dir.entries which contains a dir.ignored
> */
> - free(dir.entries[i]);
> - dir.entries[i++] = NULL;
> - } else {
> - /* prune the contents of a dir.entries which will be
> removed */
> - struct dir_entry *ent = dir.entries[i++];
> - for (;
> - i < dir.nr &&
> - check_dir_entry_contains(ent, dir.entries[i]);
> - i++) {
> - free(dir.entries[i]);
> - dir.entries[i] = NULL;
> - }
> - }
> - }
> + simplify_untracked(&dir);
>
> for (i = 0; i < dir.nr; i++) {
> struct dir_entry *ent = dir.entries[i];
> @@ -959,9 +976,6 @@ int cmd_clean(int argc, const char **argv, const char
> *prefix)
> struct stat st;
> const char *rel;
>
> - if (!ent)
> - continue;
> -
> if (!cache_name_is_other(ent->name, ent->len))
> continue;
>