Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

2015-11-16 Thread Jeff King
On Sat, Nov 14, 2015 at 02:35:01PM +0100, Andreas Krey wrote:
> On Fri, 13 Nov 2015 19:01:18 +, Jeff King wrote:
> ...
> > 2. But for a little more work, pushing the is_git_directory() check
> > out to the call-sites gives us probably saner semantics overall.
>
> Oops, now I get it[1]:

Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

2015-11-14 Thread Andreas Krey
On Fri, 13 Nov 2015 19:01:18 +, Jeff King wrote:
...
> 2. But for a little more work, pushing the is_git_directory() check
> out to the call-sites gives us probably saner semantics overall.
Oops, now I get it[1]: You mean replacing resolve_gitlink_ref usages with is_git_directory, like:

Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

2015-11-14 Thread Andreas Krey
On Fri, 13 Nov 2015 19:01:18 +, Jeff King wrote:
> > Can't we handle this in resolve_gitlink_ref itself? As I understand it,
> > it should resolve a ref (here "HEAD") when path points to a submodule.
> > When there isn't one it should return -1, so:
>
> I'm not sure. I think part of the c

Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

2015-11-13 Thread Jeff King
On Fri, Nov 13, 2015 at 04:29:15PM +0100, Andreas Krey wrote:
> > Likewise, I think dir.c:remove_dir_recurse is in a similar boat.
> > Grepping for resolve_gitlink_ref, it looks like there may be others,
> > too.
>
> Can't we handle this in resolve_gitlink_ref itself? As I understand it,
> it sho

Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

2015-11-13 Thread Andreas Krey
On Tue, 17 Mar 2015 01:48:00 +, Jeff King wrote:
> On Mon, Mar 16, 2015 at 10:35:18PM -0700, Junio C Hamano wrote:
>
> > > It looks like we don't even really care about the value of HEAD. We just
> > > want to know "is it a git directory?". I think in other places (like
> > > "git add"), we ju

Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

2015-03-16 Thread Jeff King
On Mon, Mar 16, 2015 at 10:35:18PM -0700, Junio C Hamano wrote:
> > It looks like we don't even really care about the value of HEAD. We just
> > want to know "is it a git directory?". I think in other places (like
> > "git add"), we just do an existence check for "$dir/.git". That would
> > not ca

Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

2015-03-16 Thread Junio C Hamano
Jeff King writes:
> The get_ref_cache code was designed to scale to the actual number of
> submodules. I do not mind seeing it become a hash if people really do
> have a large number of submodules, but that is not what is happening
> here.
> ...
> So git-clean speculatively asks "what is HEAD in

Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

2015-03-16 Thread Jeff King
[+cc Michael for get_ref_cache wisdom]
On Mon, Mar 16, 2015 at 07:40:40PM +0100, Andreas Krey wrote:
> >I am guessing that the repository has tons
> > of submodules?
>
> Not a single one. That's the interesting thing that
> makes me think I'm not actually solving the right problem.
>
> This r

Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

2015-03-16 Thread Andreas Krey
On Mon, 16 Mar 2015 10:23:05 +, Junio C Hamano wrote:
> Andreas Krey writes:
> ...
> say "a lot of ignored directories", but do you mean directories in
> the working tree (which I suppose do not have much to do with the
> submodule_ref_caches[])?
Apparently, they do.
>I am guessing that the

Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

2015-03-16 Thread Junio C Hamano
Andreas Krey writes:
> get_ref_cache used a linear list, which obviously is O(n^2).
> Use a fixed bucket hash which just takes a factor of 10
> (~ 317^2) out of the n^2 - which is enough.
>
> Signed-off-by: Andreas Krey
> ---
>
> This brings 'git clean -ndx' times down from 17 minutes
> to 1

Re: [PATCH] refs.c: get_ref_cache: use a bucket hash

2015-03-16 Thread Thomas Gummerer
Hi,
On 03/16, Andreas Krey wrote:
> get_ref_cache used a linear list, which obviously is O(n^2).
> Use a fixed bucket hash which just takes a factor of 10
> (~ 317^2) out of the n^2 - which is enough.
>
> Signed-off-by: Andreas Krey
> ---
>
> This brings 'git clean -ndx' times down from 17 minutes
> to 11 seconds on one of our workspaces (whi

[PATCH] refs.c: get_ref_cache: use a bucket hash

2015-03-16 Thread Andreas Krey
get_ref_cache used a linear list, which obviously is O(n^2).
Use a fixed bucket hash which just takes a factor of 10
(~ 317^2) out of the n^2 - which is enough.
Signed-off-by: Andreas Krey
---
This brings 'git clean -ndx' times down from 17 minutes
to 11 seconds on one of our workspaces (whi
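
For readers following the thread, here is a minimal sketch of the idea in the patch description: replacing one linear list with a fixed array of buckets, each holding its own short chain. This is not the code from the patch. The names cache_entry, hash_name and lookup_or_create are hypothetical, the hash function is an arbitrary simple string hash chosen for illustration, and only the bucket count 317 is taken from the message above.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 317 /* bucket count mentioned in the patch description */

/* Hypothetical stand-in for a per-path cache entry; not git's struct. */
struct cache_entry {
	struct cache_entry *next; /* next entry in the same bucket */
	char name[];              /* lookup key, e.g. a directory path */
};

/* Fixed-size table; each slot is the head of a singly linked chain. */
static struct cache_entry *buckets[NUM_BUCKETS];

/* Simple multiplicative string hash (an assumption, chosen for brevity). */
static unsigned int hash_name(const char *s)
{
	unsigned int h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % NUM_BUCKETS;
}

/* Return the entry for name, creating it if it does not exist yet. */
static struct cache_entry *lookup_or_create(const char *name)
{
	unsigned int b = hash_name(name);
	struct cache_entry *e;
	size_t len;

	/* Only the chain of one bucket is scanned, not every entry. */
	for (e = buckets[b]; e; e = e->next)
		if (!strcmp(e->name, name))
			return e;

	len = strlen(name) + 1;
	e = malloc(sizeof(*e) + len);
	if (!e)
		return NULL;
	memcpy(e->name, name, len);
	e->next = buckets[b]; /* push onto the front of the chain */
	buckets[b] = e;
	return e;
}
```

With a single list, every lookup walks all existing entries, so n lookups cost on the order of n^2 comparisons. With a fixed number of buckets the growth is still quadratic, but each lookup walks only one chain, so the constant shrinks when names spread evenly across buckets.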