> On 2019/05/09, at 11:00, Jonathan Tan <jonathanta...@google.com> wrote:
>
> Thanks for the numbers. Let me think about it some more, but I'm still
> reluctant to introduce multiple filter support in the protocol and the
> implementation for the following reasons:
Correction to the original command - I was tweaking it in the middle of running
it, and introduced an error that I didn’t notice. Here is one that will work
for an entire repo:
$ git rev-list --objects --filter=blob:none HEAD: | awk '{print $1}' | \
    xargs -n 1 git cat-file -s | awk '{ total += $1; print total }'
When run to completion, Chromium totaled 17 301 144 bytes.
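The same total can be had without spawning one cat-file process per object.
What follows is only a minimal sketch: sum_sizes and tree_bytes are names made
up here for illustration, and it assumes a Git recent enough that rev-list
accepts --filter.

```shell
# Hypothetical helpers; the function names are illustrative, not part of Git.

# Add up one size per input line and print the total (0 for empty input).
sum_sizes() {
  awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Total size in bytes of every non-blob object reachable from the root
# tree of the given commit (default: HEAD).
tree_bytes() {
  git rev-list --objects --filter=blob:none "${1:-HEAD}:" |
    awk '{print $1}' |
    git cat-file --batch-check='%(objectsize)' |
    sum_sizes
}
```

Unlike the command above, tree_bytes prints only the final total rather than a
running one, so there is no progress output while it runs.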
>
> - For large projects like Linux and Chromium, it may be reasonable to
> expect that an infrequent checkout would result in a few-megabyte
> download.
Anyone developing on Chromium would definitely consider a 17 MB initial clone
to be an improvement over the status quo, but it is still not ideal.
And the 17 MB initial download is only incurred once *assuming* the next idea is
implemented:
> - (After some in-office discussion) It may be possible to mitigate much
> of that by sending root trees that we have as "have" (e.g. by
> consulting the reflog), and that wouldn't need any protocol change.
This would complicate the code - not in Git itself, but in my FUSE-related
logic. We would have to explore the reflog and try to find the commits closest
in history to the target commit being checked out. This sounds a bit hacky and
roundabout, and it assumes that at the FUSE layer we can cleanly detect that a
checkout is happening, and detect it early enough (rather than only when one of
the sub-sub-trees is being accessed).
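To get a feel for the candidate set, the root trees reachable through the
reflog can be listed with stock Git. A minimal sketch follows; unique_in_order
and reflog_root_trees are illustrative names, and nothing here sends the trees
as "have" lines or picks the closest commit, which is the part that would have
to live in the FUSE logic.

```shell
# Hypothetical helpers; the function names are illustrative, not part of Git.

# Drop repeated lines while keeping first-seen order.
unique_in_order() {
  awk '!seen[$0]++'
}

# Root tree IDs of the commits recorded in a ref's reflog (default: HEAD),
# most recent reflog entry first.
reflog_root_trees() {
  git log -g --format='%T' "${1:-HEAD}" | unique_in_order
}
```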
> - Supporting any combination of filter means that we have more to
> implement and test, especially if we want to support more filters in
> the future. In particular, the different filters (e.g. blob, tree)
> have different code paths now in Git. One way to solve it would be to
> combine everything into one monolith, but I would like to avoid it if
> possible (after having to deal with revision walking a few times...)
I don’t believe there is any need to introduce monolithic code. The bulk of the
filter implementation is in list-objects-filter.c, and I don’t think the file
will get much longer with an additional filter that “combines” the existing
filters. The new filter is likely simpler than the sparse filter. Once I add the
new filter and send out the initial patch set, we can discuss splitting up the
file, if that appears to be necessary.
My idea - if it is not clear already - is to add another OO-like interface to
list-objects-filter.c that parallels the five that are already there.