Re: Resolving deltas dominates clone time

2019-04-30 Thread Martin Fick
On Tuesday, April 30, 2019 2:02:32 PM MDT Jeff King wrote: > On Tue, Apr 23, 2019 at 02:09:31PM -0600, Martin Fick wrote: > > I think that if there were no default limit during a clone it could have > > disastrous effects on people using the repo tool from the android project, > > or any other "sub

Re: Resolving deltas dominates clone time

2019-04-30 Thread Jeff King
On Tue, Apr 30, 2019 at 08:48:08PM +0200, Ævar Arnfjörð Bjarmason wrote: > > So I'd say the right answer is probably either online_cpus() or half > > that. The latter would be more appropriate for the machines I have, but > > I'd worry that it would leave performance on the table for non-intel > >

Re: Resolving deltas dominates clone time

2019-04-30 Thread Ævar Arnfjörð Bjarmason
On Tue, Apr 30 2019, Jeff King wrote: > On Tue, Apr 23, 2019 at 05:08:40PM +0700, Duy Nguyen wrote: > >> On Tue, Apr 23, 2019 at 11:45 AM Jeff King wrote: >> > >> > On Mon, Apr 22, 2019 at 09:55:38PM -0400, Jeff King wrote: >> > >> > > Here are my p5302 numbers on linux.git, by the way. >> > >

Re: Resolving deltas dominates clone time

2019-04-30 Thread Jeff King
On Tue, Apr 23, 2019 at 02:09:31PM -0600, Martin Fick wrote: > Here are my index-pack results (I only ran them once since they take a while) > using vgit 1.8.3.2: > > Threads real user sys > 1 108m46.151s 106m14.420s 1m57.192s > 2 58m14.274s 106m23.158s 5m32.736s > 3

Re: Resolving deltas dominates clone time

2019-04-30 Thread Jeff King
On Tue, Apr 23, 2019 at 05:08:40PM +0700, Duy Nguyen wrote: > On Tue, Apr 23, 2019 at 11:45 AM Jeff King wrote: > > > > On Mon, Apr 22, 2019 at 09:55:38PM -0400, Jeff King wrote: > > > > > Here are my p5302 numbers on linux.git, by the way. > > > > > > Test

Re: Resolving deltas dominates clone time

2019-04-23 Thread Martin Fick
On Tuesday, April 23, 2019 5:08:40 PM MDT Duy Nguyen wrote: > On Tue, Apr 23, 2019 at 11:45 AM Jeff King wrote: > > On Mon, Apr 22, 2019 at 09:55:38PM -0400, Jeff King wrote: > > > Here are my p5302 numbers on linux.git, by the way. > > > > > > Test jk/

Re: Resolving deltas dominates clone time

2019-04-23 Thread Duy Nguyen
On Tue, Apr 23, 2019 at 11:45 AM Jeff King wrote: > > On Mon, Apr 22, 2019 at 09:55:38PM -0400, Jeff King wrote: > > > Here are my p5302 numbers on linux.git, by the way. > > > > Test jk/p5302-repeat-fix > >

Re: Resolving deltas dominates clone time

2019-04-23 Thread Ævar Arnfjörð Bjarmason
On Mon, Apr 22 2019, Jeff King wrote: > On Mon, Apr 22, 2019 at 08:01:15PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > Your patch is optionally removing the "woah, we got an object with a >> > duplicate sha1, let's check that the bytes are the same in both copies" >> > check. But Martin's probl

Re: Resolving deltas dominates clone time

2019-04-22 Thread Jeff King
On Mon, Apr 22, 2019 at 09:55:38PM -0400, Jeff King wrote: > Here are my p5302 numbers on linux.git, by the way. > > Test jk/p5302-repeat-fix > -- > 5302.2: index-pack 0 threads

Re: Resolving deltas dominates clone time

2019-04-22 Thread Jeff King
On Mon, Apr 22, 2019 at 04:32:16PM -0600, Martin Fick wrote: > > Hours? I think something might be wrong. It takes 20s to run on > > linux.git. > > OK, yes I was running this on a "bad" copy of the repo, see below because I > think it might be of some interest also... > > On the better copy, th

Re: Resolving deltas dominates clone time

2019-04-22 Thread Martin Fick
On Monday, April 22, 2019 4:56:54 PM MDT Jeff King wrote: > On Mon, Apr 22, 2019 at 02:21:40PM -0600, Martin Fick wrote: > > > Try this (with a recent version of git; your v1.8.2.1 won't have > > > > > > --batch-all-objects): > > > # count the on-disk size of all objects > > > git cat-file --b

Re: Resolving deltas dominates clone time

2019-04-22 Thread Jeff King
On Mon, Apr 22, 2019 at 04:56:54PM -0400, Jeff King wrote: > > I suspect at 3 threads, seems like the default? > > Ah, right, I forgot we cap it at 3 (which was determined experimentally, > and which we more or less attributed to lock contention as the > bottleneck). I think you need to use $GIT_

Re: Resolving deltas dominates clone time

2019-04-22 Thread Jeff King
On Mon, Apr 22, 2019 at 02:21:40PM -0600, Martin Fick wrote: > > Try this (with a recent version of git; your v1.8.2.1 won't have > > --batch-all-objects): > > > > # count the on-disk size of all objects > > git cat-file --batch-all-objects --batch-check='%(objectsize) > > %(objectsize:disk)'

Re: Resolving deltas dominates clone time

2019-04-22 Thread Martin Fick
On Friday, April 19, 2019 11:58:25 PM MDT Jeff King wrote: > On Fri, Apr 19, 2019 at 03:47:22PM -0600, Martin Fick wrote: > > I have been thinking about this problem, and I suspect that this compute > > time is actually spent doing SHA1 calculations, is that possible? Some > > basic back of the env

Re: Resolving deltas dominates clone time

2019-04-22 Thread Jeff King
On Mon, Apr 22, 2019 at 08:01:15PM +0200, Ævar Arnfjörð Bjarmason wrote: > > Your patch is optionally removing the "woah, we got an object with a > > duplicate sha1, let's check that the bytes are the same in both copies" > > check. But Martin's problem is a clone, so we wouldn't have any existing

Re: Resolving deltas dominates clone time

2019-04-22 Thread Ævar Arnfjörð Bjarmason
On Mon, Apr 22 2019, Jeff King wrote: > On Sat, Apr 20, 2019 at 09:59:12AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > If you don't mind losing the collision-detection, using openssl's sha1 >> > might help. The delta resolution should be threaded, too. So in _theory_ >> > you're using 66 minute

Re: Resolving deltas dominates clone time

2019-04-22 Thread Jeff King
On Sat, Apr 20, 2019 at 09:59:12AM +0200, Ævar Arnfjörð Bjarmason wrote: > > If you don't mind losing the collision-detection, using openssl's sha1 > > might help. The delta resolution should be threaded, too. So in _theory_ > > you're using 66 minutes of CPU time, but that should only take 1-2 >

Re: Resolving deltas dominates clone time

2019-04-20 Thread Ævar Arnfjörð Bjarmason
On Sat, Apr 20 2019, Jeff King wrote: > On Fri, Apr 19, 2019 at 03:47:22PM -0600, Martin Fick wrote: > >> I have been thinking about this problem, and I suspect that this compute time >> is actually spent doing SHA1 calculations, is that possible? Some basic back >> of the envelope math and scri

Re: Resolving deltas dominates clone time

2019-04-19 Thread Jeff King
On Fri, Apr 19, 2019 at 03:47:22PM -0600, Martin Fick wrote: > I have been thinking about this problem, and I suspect that this compute time > is actually spent doing SHA1 calculations, is that possible? Some basic back > of the envelope math and scripting seems to show that the repo may actuall

Resolving deltas dominates clone time

2019-04-19 Thread Martin Fick
We have a serious performance problem with one of our large repos. The repo is our internal version of the android platform/manifest project. Our repo after running a clean "repack -A -d -F" is close to 8G in size, has over 700K refs, and it has over 8M objects. The repo takes around 40min to cl