On Fri, Nov 13, 2015 at 04:29:15PM +0100, Andreas Krey wrote:

> > Likewise, I think dir.c:remove_dir_recurse is in a similar boat.
> > Grepping for resolve_gitlink_ref, it looks like there may be others,
> > too.
> 
> Can't we handle this in resolve_gitlink_ref itself? As I understand it,
> it should resolve a ref (here "HEAD") when path points to a submodule.
> When there isn't one it should return -1, so:

I'm not sure. I think part of the change to git-clean was that
is_git_directory() is a _better_ check than "can we resolve HEAD?"
because it covers empty repos, too.

> diff --git a/refs.c b/refs.c
> index 132eff5..f8648c5 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1553,6 +1553,10 @@ int resolve_gitlink_ref(const char *path, const char 
> *refname, unsigned char *sh
>       if (!len)
>               return -1;
>       submodule = xstrndup(path, len);
> +     if (!is_git_directory(submodule)) {
> +             free(submodule);
> +             return -1;
> +     }
>       refs = get_ref_cache(submodule);
>       free(submodule);
> 
> I'm way too little into the code to see what may this may get wrong.

I don't think it produces wrong outcomes, but I think it's sub-optimal.
In cases where we already have a ref cache, we'll hit the filesystem for
each lookup to re-confirm what we already know. That doesn't affect your
case, but it does when we actually _do_ have a submodule.

So if we were to follow this route, I think it would go better in
get_ref_cache itself (right after we determine there is no existing
cache, but before we call create_ref_cache()).

> But this, as well as the old hash-ref-cache patch speeds me
> up considerably, in this case a git ls-files -o from half a
> minute of mostly user CPU to a second.

Right, that makes sense to me.

> > All of these should be using the same test, I think. Doing that with
> > is_git_directory() is probably OK. It is a little more expensive than we
> > might want for mass-use (it actually opens and parses the HEAD file in
> > each directory),
> 
> This happens as well when we let resolve_gitlink_ref run its old course.
> (It (ls-files) even seems to try to open .git and then .git/HEAD, even
> if the former fails with ENOENT.)

Yes, I think my earlier comment that you are quoting was just misguided.
We only do the extra work if the directory actually does look like a
gitdir, and the many-directories case we are optimizing here is the
opposite of that.

So summing up, I think:

  1. We could get by with teaching get_ref_cache not to auto-create ref
     caches for non-git-directories.

  2. But for a little more work, pushing the is_git_directory() check
     out to the call-sites gives us probably saner semantics overall.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to