On Fri, Mar 01, 2019 at 08:14:26PM +0100, Alban Gruin wrote:
> > diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> > index f1cb45c227..7aaa86f1c0 100644
> > --- a/builtin/name-rev.c
> > +++ b/builtin/name-rev.c
> > @@ -431,6 +431,8 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
> > OPT_END(),
> > };
> >
> > + save_commit_buffer = 0;
> > +
> [...]
>
> Unfortunately this does not work in all cases, apparently. On my git
> copy, I have 3 origins. If I run this:
>
> git log --graph --oneline --abbrev=-1 -5 | git name-rev --stdin
>
> With or without your change, it uses 3GB of RAM. With this series, it
> uses 25MB of RAM.

Sorry if I was unclear. I meant to try that _in addition_ to your
changes. It helps by avoiding keeping the useless commit-object buffers
in RAM as we traverse. But the most it can save is the total
uncompressed bytes of all commit objects. I.e., in git.git:

  $ git cat-file --batch-check='%(objectsize) %(objecttype)' \
      --batch-all-objects |
    grep commit |
    perl -alne '$total += $F[0]; END { print $total }'
  74678114

or around 70MB. In linux.git, it's more like 700MB.

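If you want the same tally broken down by object type, here is a rough
Python sketch of what the grep/perl stage above computes. It is only an
illustration: the function name total_bytes_by_type and the sample sizes
are made up, and it assumes input lines in the "<size> <type>" form
produced by the --batch-check format shown above.

```python
from collections import defaultdict


def total_bytes_by_type(lines):
    """Sum uncompressed object sizes per object type.

    Each line is assumed to look like "<size> <type>", matching
    --batch-check='%(objectsize) %(objecttype)'.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        size, objtype = fields
        totals[objtype] += int(size)
    return dict(totals)


# Made-up sample input, not real repository data.
sample = ["250 commit", "1200 blob", "310 commit", "90 tree"]
totals = total_bytes_by_type(sample)
print(totals["commit"])  # prints 560
```

The "commit" entry is the upper bound on what turning off
save_commit_buffer could save. Passing sys.stdin instead of the sample
list should let it read the real cat-file output from a pipe.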
But in your examples, the problem is the inefficiencies in name-rev's
algorithm, and you're not actually traversing that many commits. So I
think you'd want to turn off save_commit_buffer as an extra patch in
your series. It may or may not be a big win for any given case, but it's
quite easy to do.

-Peff