Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax
On Fri, Dec 8, 2017 at 12:16 AM, Jeff King wrote: > Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments > with ",", 2017-10-01) switched the syntax of the trailers > placeholder, but forgot to update the documentation in > pretty-formats.txt. > > There's need to mention the old syntax; it was never in a I suppose you mean: s/need/no need/ > released version of Git. > > Signed-off-by: Jeff King
Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax
On Fri, Dec 08, 2017 at 03:10:34AM -0500, Eric Sunshine wrote: > On Fri, Dec 8, 2017 at 12:16 AM, Jeff King wrote: > > Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments > > with ",", 2017-10-01) switched the syntax of the trailers > > placeholder, but forgot to update the documentation in > > pretty-formats.txt. > > > > There's need to mention the old syntax; it was never in a > > I suppose you mean: s/need/no need/ Yes, indeed. Thanks. -Peff
RE: URGENT PLEASE
Re: [PATCH] hashmap: adjust documentation to reflect reality
On Thu, Dec 07, 2017 at 10:47:43PM +0100, Johannes Schindelin wrote: > > We could add that example to the test helper as then we have a good (tested) > > example for that case, too. > > What we could *also* do, and what would probably make *even more* sense, > is to simplify the example drastically, to a point where testing it in > test-hashmap is pointless, and where a reader can gather *very* quickly > what it takes to use the hashmap routines. Yes, I'd be in favor of that, too. > In any case, I would really like to see my patch applied first, as it is a > separate concern from what you gentle people talk about: I simply want > that incorrect documentation fixed. The earlier, the better. Definitely. I think it is in "next" already. -Peff
Re: [PATCH v2 5/7] diff: use skip-to-optional-val in parsing --relative
On Thu, Dec 07, 2017 at 02:31:38PM -0800, Junio C Hamano wrote: > If this goes on top as a standalone patch, then the reason why it is > separate from the other users of _default() is not because the way > it uses the null return is special, but because it was written by a > different author, I would think. Mostly I was just concerned that we missed a somewhat subtle bug in the first conversion, and I think it's worth calling out in the commit message why that callsite must be converted in a particular way. Whether that happens in a separate commit or squashed, I don't care too much. But I dunno. Maybe it is obvious now that the correct code is in there. ;) -Peff
Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote: > Sometimes users are given a hash of an object and they want to > identify it further (ex.: Use verify-pack to find the largest blobs, > but what are these? or [1]) > > One might be tempted to extend git-describe to also work with blobs, > such that `git describe ` gives a description as > ':'. This was implemented at [2]; as seen by the sheer > number of responses (>110), it turns out this is tricky to get right. > The hard part to get right is picking the correct 'commit-ish' as that > could be the commit that (re-)introduced the blob or the blob that > removed the blob; the blob could exist in different branches. Neat. I didn't follow the original thread very closely, but I think this is a good idea (and is definitely something I've simulated manually with "git log --raw | grep" before). Bear with me while I pontificate for a moment. We do something similar at GitHub to report on too-large objects during a push (we identify the object during index-pack, but then want to hand back a human-readable pathname). We do it with a patch to "rev-list", so that you can use the normal traversal options to limit the set of visited commits. Which effectively makes it "traverse and find a place where this object is referenced, and then tell me the path at which you found it (or tell me if it was not referenced at all)". That sidesteps the "describe" problem, because now it is the user's responsibility to tell you which commits to look in. :) But the rev-list solution only finds _a_ commit that contains the object, and which likely has nothing to do with the object at all. Your solution here finds the introduction and removal of the object, which are interesting for a wider variety of searches. So I wondered if this could replace my rev-list patch (which I've been meaning to upstream for a while). But I think the answer is "no", there is room for both features. What you're finding here is much more specific and useful data, but it's also much more expensive to generate. For my uses cases, it was enough to ask "show me a plausible path quickly" or even "is this object reachable" (which you can answer out of bitmaps!). > Junio hinted at a different approach of solving this problem, which this > patch implements. Teach the diff machinery another flag for restricting > the information to what is shown. For example: > > $ ./git log --oneline --blobfind=v2.0.0:Makefile > b2feb64309 Revert the whole "ask curl-config" topic for now > 47fbfded53 i18n: only extract comments marked with "TRANSLATORS:" > > we observe that the Makefile as shipped with 2.0 was introduced in > v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by > a different blob. So I like this very much as an addition to Git's toolbox. But does it really need to be limited to blobs? We should be able to ask for trees (both root trees and sub-trees), too. That's going to be less common, I think, but I have definitely done git log --raw -t --pretty=raw --no-abbrev | less +/$tree_sha1 to debug things (corrupted repos, funny merges, etc). You could even do it for submodule commits. Which demonstrates that this feature does not even need to actually have a resolvable object; you're just looking for the textual sha1. You do your filtering on the diff queue, which I think will have the entries you need if "-t" is specified (and should always have submodule commits, I think). So you'd just need to relax the incoming object check, like: diff --git a/diff.c b/diff.c index e4571df3bf..0007bb0a09 100644 --- a/diff.c +++ b/diff.c @@ -4490,8 +4490,12 @@ static int parse_blobfind_opt(struct diff_options *opt, const char *arg) { struct object_id oid; - if (get_oid_blob(arg, &oid) || sha1_object_info(oid.hash, NULL) != OBJ_BLOB) - return error("object '%s' is not a blob", arg); + /* +* We don't even need to have the object, and 40-hex sha1s will +* just resolve here. +*/ + if (get_oid_blob(arg, &oid)) + return error("unable to resolve '%s'", arg); if (!opt->blobfind) opt->blobfind = xcalloc(1, sizeof(*opt->blobfind)); At which point: ./git log --oneline -t --blobfind=v2.0.0:Documentation just works (the results are kind of interesting, too; you see that tree "go away" in a lot of different commits because many independent branches touched the directory). Also interesting: ./git log --oneline --blobfind=HEAD:sha1collisiondetection Well, the output for this particular case isn't that interesting. But it shows that we can find submodules, too. -Peff
Re: Unfortunate interaction between git diff-index and fakeroot
On 12/07/2017 05:31 PM, Junio C Hamano wrote: Correct. fakeroot would report that the files that are actually owned by the user who is running fakeroot are owned by root; the cached stat information in the index would be "corrected" to say that they are owned by root. So once the index is refreshed like so, things will become consistent. I still fail to understand, what's the benefit of storing the owner ID in the index, where no one seems to use this information. I mean, one could store many other meta information in the index, such as ACL lists, number of blocks allocated, etc, but the code seems to ignore unused information, such as commit 2cb45e95438c113871fbbea5b4f629f9463034e7 which ignores st_dev, because it doesn't actually matter, or commit e44794706eeb57f2ee38ed1604821aa38b8ad9d2 which ignores mode changes not relevant to the owner. In which situation does anyone care about the owner UID? What's the difference between that, and, e.g., st_dev? If you are using "diff-index" for the "is the tree dirty?" check without running "update-index --refresh", then you are not using the command correctly. Just want to clarify. It seems like the linux kernel is using diff-index to do just that, in scripts/setlocalversion https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/setlocalversion#n77 Do I understand correctly that linux should update the index first, or better, use porcelain "git diff --name-only" instead? It seems to be the case previously, but in commit cdf2bc632ebc9ef512345fe8e6015edfd367e256 git update-index was removed, to prevent error messages on read-only linux tree. Do I understand correctly that the more correct solution for modern git, is to use git diff --name-only instead? It should work on read-only as well as on read-write trees, and would not break if anyone uses git under fakeroot. Did I understand you correctly?
Re: [PATCH] decorate: clean up and document API
On Thu, Dec 07, 2017 at 04:14:24PM -0800, Jonathan Tan wrote: > Improve the names of the identifiers in decorate.h, document them, and > add an example of how to use these functions. > > The example is compiled and run as part of the test suite. > > Signed-off-by: Jonathan Tan > --- > This patch contains some example code in a test helper. > > There is a discussion at [1] about where example code for hashmap should > go. For something relatively simple, like decorate, perhaps example code > isn't necessary; but if we include example code anyway, I think that we > might as well make it compiled and tested, so this patch contains that > example code in a test helper. I have mixed feelings. On the one hand, compiling and running the code ensures that those things actually work. On the other hand, I expect you can make a much clearer example if instead of having running code, you show snippets of almost-code. E.g.: struct decoration d = { NULL }; add_decoration(&d, obj, "foo"); ... str = lookup_decoration(obj); pretty much gives the relevant overview, with very little boilerplate. Yes, it omits things like the return value of add_decoration(), but those sorts of details are probably better left to the function docstrings. Other than that philosophical point, the documentation you added looks pretty good to me. Two possible improvements to the API we could do on top: 1. Should there be a DECORATION_INIT macro (possibly taking the "name" as an argument)? (Actually, the whole name thing seems like a confusing and bad API design in the first place). 2. This is really just an oidmap to a void pointer. I wonder if we ought to be wrapping that code (I think we still want some interface so that the caller doesn't have to declare their own structs). -Peff
[PATCH] cvsimport: apply shell-quoting regex globally
Commit 5b4efea666 (cvsimport: shell-quote variable used in backticks, 2017-09-11) tried to shell-quote a variable, but forgot to use the "/g" modifier to apply the quoting to the whole variable. This means we'd miss any embedded single-quotes after the first one. Reported-by: Signed-off-by: Jeff King --- git-cvsimport.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/git-cvsimport.perl b/git-cvsimport.perl index 36929921ea..2d8df83172 100755 --- a/git-cvsimport.perl +++ b/git-cvsimport.perl @@ -642,7 +642,7 @@ sub is_sha1 { sub get_headref ($) { my $name = shift; - $name =~ s/'/'\\''/; + $name =~ s/'/'\\''/g; my $r = `git rev-parse --verify '$name' 2>/dev/null`; return undef unless $? == 0; chomp $r; -- 2.15.1.659.g8bd2eae3ea
Re: [PATCH] setup.c: fix comment about order of .git directory discovery
On Thu, Dec 07, 2017 at 09:59:49AM +0100, SZEDER Gábor wrote: > Since gitfiles were introduced in b44ebb19e (Add platform-independent > .git "symlink", 2008-02-20) the order of checks during .git directory > discovery is: gitfile, gitdir, bare repo. However, that commit did > only partially update the in-code comment describing this order, > missing the last line which still puts gitdir before gitfile. I was confused at first, because your patch only changes to ".git/" to ".git". So there is no ordering change, only a swapping out of one for the other. But that pushes done the ".git/" into the "etc." which is below. So I think this makes sense. Thanks. -Peff
Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
On Thu, Dec 07, 2017 at 01:47:14PM -0800, Junio C Hamano wrote: > > diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c > > index 22034f87e7..8e8a15ea4a 100644 > > --- a/builtin/fmt-merge-msg.c > > +++ b/builtin/fmt-merge-msg.c > > @@ -377,7 +377,8 @@ static void shortlog(const char *name, > > string_list_append(&subjects, > >oid_to_hex(&commit->object.oid)); > > else > > - string_list_append(&subjects, strbuf_detach(&sb, NULL)); > > + string_list_append_nodup(&subjects, > > +strbuf_detach(&sb, NULL)); > > } > > > > if (opts->credit_people) > > What is leaked comes from strbuf, so the title is not a lie, but I > tend to think that this leak is caused by a somewhat strange > string_list API. The subjects string-list is initialized as a "dup" > kind, but a caller that wants to avoid leaking can (and should) use > _nodup() call to add a string without duping. It all feels a bit > too convoluted. I'm not sure it's string-list's fault. Many callers (including this one) have _some_ entries whose strings must be duplicated and others which do not. So either: 1. The list gets marked as "nodup", and we add an extra xstrdup() to the oid_to_hex call above. And also need to remember to free() the strings later, since the list does not own them. or 2. We mark it as "dup" and incur an extra allocation and copy, like: string_list_append(&subjects, sb.buf); strbuf_release(&buf); So I'd really blame the caller, which doesn't want to do (2) out of a sense of optimization. It could also perhaps write it as: while (commit = get_revision(rev)) { strbuf_reset(&sb); ... maybe put some stuff in sb ... if (!sb.len) string_list_append(&subjects, oid_to_hex(obj)); else string_list_append(&subjects, sb.buf); } strbuf_release(&sb); which at least avoids the extra allocations. By the way, I think there's another quite subtle leak in this function. We do this: format_commit_message(commit, "%s", &sb, &ctx); strbuf_ltrim(&sb); and then only use "sb" if sb.len is non-zero. But we may have actually allocated to create our zero-length string (e.g., if we had a strbuf full of spaces and trimmed them all off). Since we reuse "sb" over and over as we loop, this will actually only leak once for the whole loop, not once per iteration. So it's probably not a big deal, but writing it with the explicit reset/release pattern fixes that (and is more idiomatic for our code base, I think). -Peff
[PATCH v2 4/4] t/Makefile: introduce TEST_SHELL_PATH
You may want to run the test suite with a different shell than you use to build Git. For instance, you may build with SHELL_PATH=/bin/sh (because it's faster, or it's what you expect to exist on systems where the build will be used) but want to run the test suite with bash (e.g., since that allows using "-x" reliably across the whole test suite). There's currently no good way to do this. You might think that doing two separate make invocations, like: make && make -C t SHELL_PATH=/bin/bash would work. And it _almost_ does. The second make will see our bash SHELL_PATH, and we'll use that to run the individual test scripts (or tell prove to use it to do so). So far so good. But this breaks down when "--tee" or "--verbose-log" is used. Those options cause the test script to actually re-exec itself using $SHELL_PATH. But wait, wouldn't our second make invocation have set SHELL_PATH correctly in the environment? Yes, but test-lib.sh sources GIT-BUILD-OPTIONS, which we built during the first "make". And that overrides the environment, giving us the original SHELL_PATH again. Let's introduce a new variable that lets you specify a specific shell to be run for the test scripts. Note that we have to touch both the main and t/ Makefiles, since we have to record it in GIT-BUILD-OPTIONS in one, and use it in the latter. Signed-off-by: Jeff King --- Makefile | 8 t/Makefile| 6 -- t/test-lib.sh | 2 +- 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index fef9c8d272..8a21c4d8f1 100644 --- a/Makefile +++ b/Makefile @@ -425,6 +425,10 @@ all:: # # to say "export LESS=FRX (and LV=-c) if the environment variable # LESS (and LV) is not set, respectively". +# +# Define TEST_SHELL_PATH if you want to use a shell besides SHELL_PATH for +# running the test scripts (e.g., bash has better support for "set -x" +# tracing). GIT-VERSION-FILE: FORCE @$(SHELL_PATH) ./GIT-VERSION-GEN @@ -729,6 +733,8 @@ endif export PERL_PATH export PYTHON_PATH +TEST_SHELL_PATH = $(SHELL_PATH) + LIB_FILE = libgit.a XDIFF_LIB = xdiff/lib.a VCSSVN_LIB = vcs-svn/lib.a @@ -1725,6 +1731,7 @@ prefix_SQ = $(subst ','\'',$(prefix)) gitwebdir_SQ = $(subst ','\'',$(gitwebdir)) SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH)) +TEST_SHELL_PATH_SQ = $(subst ','\'',$(TEST_SHELL_PATH)) PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH)) PYTHON_PATH_SQ = $(subst ','\'',$(PYTHON_PATH)) TCLTK_PATH_SQ = $(subst ','\'',$(TCLTK_PATH)) @@ -2355,6 +2362,7 @@ GIT-LDFLAGS: FORCE # and the first level quoting from the shell that runs "echo". GIT-BUILD-OPTIONS: FORCE @echo SHELL_PATH=\''$(subst ','\'',$(SHELL_PATH_SQ))'\' >$@+ + @echo TEST_SHELL_PATH=\''$(subst ','\'',$(TEST_SHELL_PATH_SQ))'\' >>$@+ @echo PERL_PATH=\''$(subst ','\'',$(PERL_PATH_SQ))'\' >>$@+ @echo DIFF=\''$(subst ','\'',$(subst ','\'',$(DIFF)))'\' >>$@+ @echo PYTHON_PATH=\''$(subst ','\'',$(PYTHON_PATH_SQ))'\' >>$@+ diff --git a/t/Makefile b/t/Makefile index 1bb06c36f2..96317a35f4 100644 --- a/t/Makefile +++ b/t/Makefile @@ -8,6 +8,7 @@ #GIT_TEST_OPTS = --verbose --debug SHELL_PATH ?= $(SHELL) +TEST_SHELL_PATH ?= $(SHELL_PATH) PERL_PATH ?= /usr/bin/perl TAR ?= $(TAR) RM ?= rm -f @@ -23,6 +24,7 @@ endif # Shell quote; SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH)) +TEST_SHELL_PATH_SQ = $(subst ','\'',$(TEST_SHELL_PATH)) PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH)) TEST_RESULTS_DIRECTORY_SQ = $(subst ','\'',$(TEST_RESULTS_DIRECTORY)) @@ -42,11 +44,11 @@ failed: test -z "$$failed" || $(MAKE) $$failed prove: pre-clean $(TEST_LINT) - @echo "*** prove ***"; $(PROVE) --exec '$(SHELL_PATH_SQ)' $(GIT_PROVE_OPTS) $(T) :: $(GIT_TEST_OPTS) + @echo "*** prove ***"; $(PROVE) --exec '$(TEST_SHELL_PATH_SQ)' $(GIT_PROVE_OPTS) $(T) :: $(GIT_TEST_OPTS) $(MAKE) clean-except-prove-cache $(T): - @echo "*** $@ ***"; '$(SHELL_PATH_SQ)' $@ $(GIT_TEST_OPTS) + @echo "*** $@ ***"; '$(TEST_SHELL_PATH_SQ)' $@ $(GIT_TEST_OPTS) pre-clean: $(RM) -r '$(TEST_RESULTS_DIRECTORY_SQ)' diff --git a/t/test-lib.sh b/t/test-lib.sh index b8dd5e79ac..1ae0fc02d0 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -80,7 +80,7 @@ done,*) # from any previous runs. >"$GIT_TEST_TEE_OUTPUT_FILE" - (GIT_TEST_TEE_STARTED=done ${SHELL_PATH} "$0" "$@" 2>&1; + (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1; echo $? >"$BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE" test "$(cat "$BASE.exit")" = 0 exit -- 2.15.1.659.g8bd2eae3ea
[PATCH v2 3/4] test-lib: make "-x" work with "--verbose-log"
The "-x" tracing option implies "--verbose". This is a problem when running under a TAP harness like "prove", where we need to use "--verbose-log" instead. Instead, let's handle this the same way we do for --valgrind, including the recent fix from 88c6e9d31c (test-lib: --valgrind should not override --verbose-log, 2017-09-05). Namely, let's enable --verbose only when we know there isn't a more specific verbosity option indicated. Note that we also have to tweak `want_trace` to turn it on (previously we just lumped $verbose_log in with $verbose, but now we don't necessarily auto-set the latter). Signed-off-by: Jeff King --- t/test-lib.sh | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/t/test-lib.sh b/t/test-lib.sh index 7914453a3b..b8dd5e79ac 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -264,7 +264,6 @@ do shift ;; -x) trace=t - verbose=t shift ;; --verbose-log) verbose_log=t @@ -283,6 +282,11 @@ then test -z "$verbose_log" && verbose=t fi +if test -n "$trace" && test -z "$verbose_log" +then + verbose=t +fi + if test -n "$color" then # Save the color control sequences now rather than run tput @@ -586,7 +590,9 @@ maybe_setup_valgrind () { } want_trace () { - test "$trace" = t && test "$verbose" = t + test "$trace" = t && { + test "$verbose" = t || test "$verbose_log" = t + } } # This is a separate function because some tests use -- 2.15.1.659.g8bd2eae3ea
[PATCH v2 2/4] t5615: avoid re-using descriptor 4
File descriptors 3 and 4 are special in our test suite, as they link back to the test script's original stdout and stderr. Normally this isn't something tests need to worry about: they are free to clobber these descriptors for sub-commands without affecting the overall script. But there's one very special thing about descriptor 4: since d88785e424 (test-lib: set BASH_XTRACEFD automatically, 2016-05-11), we ask bash to output "set -x" output to it by number. This goes to _any_ descriptor 4, even if it no longer points to the place it did when we set BASH_XTRACEFD. But in t5615, we run a shell loop with descriptor 4 redirected. As a result, t5615 works with non-bash shells even with "-x". And it works with bash without "-x". But the combination of "bash t5615-alternate-env.sh -x" gets a test failure (because our "set -x" output pollutes one of the files). We can fix this by using any descriptor _except_ the magical 4. So let's switch arbitrarily to using 5/6 in this loop, not 3/4. Another alternative is to use a different descriptor for BASH_XTRACEFD. But picking an unused one turns out to be hard. Most shells limit us to 9 numbered descriptors. Bash can handle more, but: - while the BASH_XTRACEFD is specific to bash, GIT_TRACE=4 has a similar problem, and would affect all shells - constructs like "999>/dev/null" are synticatically invalid to non-bash shells. So we have to actually bury it inside an eval, which creates more complications. Of the numbers 1-9, you might think that "9" would be less used than "4". But it's not; many of our scripts use descriptors 8 and 9 (probably under the assumption that they are high and therefore unused). The least-used descriptor is currently "7". We could switch to that, but we're just trading one magic number for another. Signed-off-by: Jeff King --- t/t5615-alternate-env.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/t/t5615-alternate-env.sh b/t/t5615-alternate-env.sh index d2d883f3a1..b4905b822c 100755 --- a/t/t5615-alternate-env.sh +++ b/t/t5615-alternate-env.sh @@ -7,9 +7,9 @@ check_obj () { alt=$1; shift while read obj expect do - echo "$obj" >&3 && - echo "$obj $expect" >&4 - done 3>input 4>expect && + echo "$obj" >&5 && + echo "$obj $expect" >&6 + done 5>input 6>expect && GIT_ALTERNATE_OBJECT_DIRECTORIES=$alt \ git "$@" cat-file --batch-check='%(objectname) %(objecttype)' \ actual && -- 2.15.1.659.g8bd2eae3ea
[PATCH v2 0/4] making test-suite tracing more useful
This series fixes some rough edges in the "-x" feature of the test suite. With it, it should be possible to turn on tracing for CI runs. This is mostly a repost of v1 at: https://public-inbox.org/git/20171019210140.64lb52cqtgdh2...@sigill.intra.peff.net which had some discussion, but wasn't picked up. I fixed a few typos pointed out by reviewers, and I tried to summarize the discussion around "magic descriptor 4" in the commit message of patch 2. My conclusion there was that none of the options is particularly good. The only thing thing that might make sense is making "7" the magical descriptor. But given that "4" only had one troubling call, I'm inclined to just leave it as-is and see if this ever comes up again (especially if we start using "-x" in the Travis builds, then we'd catch any problems). [1/4]: test-lib: silence "-x" cleanup under bash [2/4]: t5615: avoid re-using descriptor 4 [3/4]: test-lib: make "-x" work with "--verbose-log" [4/4]: t/Makefile: introduce TEST_SHELL_PATH Makefile | 8 t/Makefile | 6 -- t/t5615-alternate-env.sh | 6 +++--- t/test-lib.sh| 46 +- 4 files changed, 48 insertions(+), 18 deletions(-) -Peff
[PATCH v2 1/4] test-lib: silence "-x" cleanup under bash
When the test suite's "-x" option is used with bash, we end up seeing cleanup cruft in the output: $ bash t0001-init.sh -x [...] ++ diff -u expected actual + test_eval_ret_=0 + want_trace + test t = t + test t = t + set +x ok 42 - re-init from a linked worktree This ranges from mildly annoying (for a successful test) to downright confusing (when we say "last command exited with error", but it's really 5 commands back). We normally are able to suppress this cleanup. As the in-code comment explains, we can't convince the shell not to print it, but we can redirect its stderr elsewhere. But since d88785e424 (test-lib: set BASH_XTRACEFD automatically, 2016-05-11), that doesn't hold for bash. It sends the "set -x" output directly to descriptor 4, not to stderr. We can fix this by also redirecting descriptor 4, and paying close attention to which commands redirected and which are not (see the updated comment). Two alternatives I considered and rejected: - unsetting and setting BASH_XTRACEFD; doing so closes the descriptor, which we must avoid - we could keep everything in a single block as before, redirect 4>/dev/null there, but retain 5>&4 as a copy. And then selectively restore 4>&5 for commands which should be allowed to trace. This would work, but the descriptor swapping seems unnecessarily confusing. Signed-off-by: Jeff King --- t/test-lib.sh | 34 -- 1 file changed, 24 insertions(+), 10 deletions(-) diff --git a/t/test-lib.sh b/t/test-lib.sh index 116bd6a70c..7914453a3b 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -601,26 +601,40 @@ test_eval_inner_ () { } test_eval_ () { - # We run this block with stderr redirected to avoid extra cruft - # during a "-x" trace. Once in "set -x" mode, we cannot prevent + # If "-x" tracing is in effect, then we want to avoid polluting stderr + # with non-test commands. But once in "set -x" mode, we cannot prevent # the shell from printing the "set +x" to turn it off (nor the saving # of $? before that). But we can make sure that the output goes to # /dev/null. # - # The test itself is run with stderr put back to &4 (so either to - # /dev/null, or to the original stderr if --verbose was used). + # There are a few subtleties here: + # + # - we have to redirect descriptor 4 in addition to 2, to cover + # BASH_XTRACEFD + # + # - the actual eval has to come before the redirection block (since + # it needs to see descriptor 4 to set up its stderr) + # + # - likewise, any error message we print must be outside the block to + # access descriptor 4 + # + # - checking $? has to come immediately after the eval, but it must + # be _inside_ the block to avoid polluting the "set -x" output + # + + test_eval_inner_ "$@" &3 2>&4 { - test_eval_inner_ "$@" &3 2>&4 test_eval_ret_=$? if want_trace then set +x - if test "$test_eval_ret_" != 0 - then - say_color error >&4 "error: last command exited with \$?=$test_eval_ret_" - fi fi - } 2>/dev/null + } 2>/dev/null 4>&2 + + if test "$test_eval_ret_" != 0 && want_trace + then + say_color error >&4 "error: last command exited with \$?=$test_eval_ret_" + fi return $test_eval_ret_ } -- 2.15.1.659.g8bd2eae3ea
[PATCH] refs: drop "clear packed-refs while locked" assertion
This patch fixes a regression in v2.14.0. It's actually fixed already in v2.15.0 because all of the packed-ref code there was rewritten. So there's no point in applying this on "master" or even "maint". But I figured it was worth sharing here in case somebody else runs across it, and in case we ever do a v2.14.4 release. -- >8 -- In clear_packed_ref_cache(), we assert that we're not currently holding the packed-refs lock. But in each of the three code paths that can hit this, the assertion is either a noop or actively does the wrong thing: 1. in rollback_packed_refs(), we will have just released the lock before calling the function, and so the assertion can never trigger. 2. get_packed_ref_cache() can reach this assertion via validate_packed_ref_cache(). But it calls the validate function only when it knows that we're not holding the lock, so again, the assertion can never trigger. 3. lock_packed_refs() also calls validate_packed_ref_cache(). In this case we're _always_ holding the lock, which means any time the validate function has to clear the cache, we'll trigger this assertion and die. This doesn't happen often in practice because the validate function clears the cache only if we find that somebody else has racily rewritten the packed-refs file between the time we read it and the time we took the lock. So most of the time we don't reach the assertion at all (nobody has racily written the file so there's no need to clear the cache). And when we do, it is not actually indicative of a bug; clearing the cache while holding the lock is the right thing to do here. This final case is relatively new, being triggerd by the extra validation added in fed6ebebf1 (lock_packed_refs(): fix cache validity check, 2017-06-12). Signed-off-by: Jeff King --- refs/files-backend.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index f21a954ce7..dd41e1d382 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -99,8 +99,6 @@ static void clear_packed_ref_cache(struct files_ref_store *refs) if (refs->packed) { struct packed_ref_cache *packed_refs = refs->packed; - if (is_lock_file_locked(&refs->packed_refs_lock)) - die("BUG: packed-ref cache cleared while locked"); refs->packed = NULL; release_packed_ref_cache(packed_refs); } -- 2.15.1.659.g8bd2eae3ea
Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)
Hi, On Fri, 8 Dec 2017, Torsten Bögershausen wrote: > > * tb/check-crlf-for-safe-crlf (2017-11-27) 1 commit > > (merged to 'next' on 2017-12-05 at 7adaa1fe01) > > + convert: tighten the safe autocrlf handling > > > > The "safe crlf" check incorrectly triggered for contents that does > > not use CRLF as line endings, which has been corrected. > > > > Broken on Windows??? > > cf. > > Yes, broken on Windows. A fix is coming the next days. We might want to consider using a saner Continuous Testing workflow, to avoid re-testing (and re-finding) breakages in individual patch series just because completely unrelated patch got updated. I mean, yes, it seemed like a good idea a long time ago to have One Branch that contains All The Patch Series Currently Cooking, back when our most reliable (because only) test facilities were poor humans. But we see how many more subtle bugs are spotted nowadays where Git's source code is tested automatically on a growing number of Operating System/CPU architecture "coordinates", and it is probably time to save some human resources. How about testing the individual branches instead? This would save me a ton of time, as bisecting is just too expensive given the scattered base commits of the branches smooshed into `pu`. (There is a new Git/Error.pm breakage in pu for about a week that I simply have not gotten around to, or better put: that I did not want to tackle given the time committment).) Ciao, Dscho
Kindly Assist Me
Kindly Assist Me In good faith from Mr.Sonami, actually could you please consider to help me to relocate this sum of five million, three hundred thousand dollars (US$5.3 m) to your country for establishing a medium industry in your country? The said 5.3 million dollars was deposited in our bank by Mr. Jean-Noël Rey a Swiss land citizen who died in a terrorist attacks in 2016. We have never met before, but need not to worry as I am using the only secured and confidential medium available to seek for your foreign assistance in this business. I am contacting you independently of my investigation and no one is informed of this communication. I contacted you in this transaction on my search for a reliable, capable partner and I discovered that the late Swiss died along side with his supposed to be next of kin through CNN News on terrorist attacks. I will give you all vital information concerning the Swiss and the 5.3 million dollars in our custody so that you will contact my bank for them to release the money to you. You can come here in person or you can request the bank to give you the contact of the bank lawyer's who can represent your interest in the transfer process. Reply and send this information to me at: sonami...@gmail.com Your full Name. Age... Country Occupation.. Your Telephone Numbers. Have a nice day and good luck Mr.Sonami
[PATCH Outreachy 1/2] format: create pretty.h file
Create header for pretty.c to make formatting interface more structured. This is a middle point, this file would be merged futher with other files which contain formatting stuff. Signed-off-by: Olga Telezhnaia Mentored-by: Christian Couder Mentored by: Jeff King --- archive.c | 1 + builtin/notes.c | 2 +- builtin/reset.c | 2 +- builtin/show-branch.c | 2 +- combine-diff.c| 1 + commit.c | 1 + commit.h | 80 -- diffcore-pickaxe.c| 1 + grep.c| 1 + log-tree.c| 1 + notes-cache.c | 1 + pretty.h | 87 +++ revision.h| 2 +- sequencer.c | 1 + sha1_name.c | 1 + submodule.c | 1 + 16 files changed, 101 insertions(+), 84 deletions(-) create mode 100644 pretty.h diff --git a/archive.c b/archive.c index 0b7b62af0c3ec..60607e8c00857 100644 --- a/archive.c +++ b/archive.c @@ -2,6 +2,7 @@ #include "config.h" #include "refs.h" #include "commit.h" +#include "pretty.h" #include "tree-walk.h" #include "attr.h" #include "archive.h" diff --git a/builtin/notes.c b/builtin/notes.c index 1a2c7d92ad7e7..7c8176164561b 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -12,7 +12,7 @@ #include "builtin.h" #include "notes.h" #include "blob.h" -#include "commit.h" +#include "pretty.h" #include "refs.h" #include "exec_cmd.h" #include "run-command.h" diff --git a/builtin/reset.c b/builtin/reset.c index 906e541658230..e15f595799c40 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -12,7 +12,7 @@ #include "lockfile.h" #include "tag.h" #include "object.h" -#include "commit.h" +#include "pretty.h" #include "run-command.h" #include "refs.h" #include "diff.h" diff --git a/builtin/show-branch.c b/builtin/show-branch.c index 2e24b5c330e8e..e8a4aa40cb4b6 100644 --- a/builtin/show-branch.c +++ b/builtin/show-branch.c @@ -1,6 +1,6 @@ #include "cache.h" #include "config.h" -#include "commit.h" +#include "pretty.h" #include "refs.h" #include "builtin.h" #include "color.h" diff --git a/combine-diff.c b/combine-diff.c index 2505de119a2be..01ba1b03a06d2 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -1,5 +1,6 @@ #include "cache.h" #include "commit.h" +#include "pretty.h" #include "blob.h" #include "diff.h" #include "diffcore.h" diff --git a/commit.c b/commit.c index cab8d4455bdbd..ac17a27a4ab0a 100644 --- a/commit.c +++ b/commit.c @@ -1,6 +1,7 @@ #include "cache.h" #include "tag.h" #include "commit.h" +#include "pretty.h" #include "pkt-line.h" #include "utf8.h" #include "diff.h" diff --git a/commit.h b/commit.h index 99a3fea68d3f6..41a2067809444 100644 --- a/commit.h +++ b/commit.h @@ -121,93 +121,13 @@ struct commit_list *copy_commit_list(struct commit_list *list); void free_commit_list(struct commit_list *list); -/* Commit formats */ -enum cmit_fmt { - CMIT_FMT_RAW, - CMIT_FMT_MEDIUM, - CMIT_FMT_DEFAULT = CMIT_FMT_MEDIUM, - CMIT_FMT_SHORT, - CMIT_FMT_FULL, - CMIT_FMT_FULLER, - CMIT_FMT_ONELINE, - CMIT_FMT_EMAIL, - CMIT_FMT_MBOXRD, - CMIT_FMT_USERFORMAT, - - CMIT_FMT_UNSPECIFIED -}; - -static inline int cmit_fmt_is_mail(enum cmit_fmt fmt) -{ - return (fmt == CMIT_FMT_EMAIL || fmt == CMIT_FMT_MBOXRD); -} - struct rev_info; /* in revision.h, it circularly uses enum cmit_fmt */ -struct pretty_print_context { - /* -* Callers should tweak these to change the behavior of pp_* functions. -*/ - enum cmit_fmt fmt; - int abbrev; - const char *after_subject; - int preserve_subject; - struct date_mode date_mode; - unsigned date_mode_explicit:1; - int print_email_subject; - int expand_tabs_in_log; - int need_8bit_cte; - char *notes_message; - struct reflog_walk_info *reflog_info; - struct rev_info *rev; - const char *output_encoding; - struct string_list *mailmap; - int color; - struct ident_split *from_ident; - - /* -* Fields below here are manipulated internally by pp_* functions and -* should not be counted on by callers. -*/ - struct string_list in_body_headers; - int graph_width; -}; - -struct userformat_want { - unsigned notes:1; -}; - extern int has_non_ascii(const char *text); extern const char *logmsg_reencode(const struct commit *commit, char **commit_encoding, const char *output_encoding); -extern void get_commit_format(const char *arg, struct rev_info *); -extern const char *format_subject(struct strbuf *sb, const char *msg, - const char *line_separator); -extern void userformat_find_requirements(const char *fmt, struct userformat_want *w); -extern int commit_format_is_empty(enum cmit_fmt); ex
[PATCH Outreachy 2/2] format: create docs for pretty.h
Write some docs for functions in pretty.h. Take it as a first draft, they would be changed later. Signed-off-by: Olga Telezhnaia Mentored-by: Christian Couder Mentored by: Jeff King --- pretty.h | 44 1 file changed, 44 insertions(+) diff --git a/pretty.h b/pretty.h index ef5167484fb64..5c85d94e332d7 100644 --- a/pretty.h +++ b/pretty.h @@ -48,6 +48,7 @@ struct pretty_print_context { int graph_width; }; +/* Check whether commit format is mail. */ static inline int cmit_fmt_is_mail(enum cmit_fmt fmt) { return (fmt == CMIT_FMT_EMAIL || fmt == CMIT_FMT_MBOXRD); @@ -57,31 +58,74 @@ struct userformat_want { unsigned notes:1; }; +/* Set the flag "w->notes" if there is placeholder %N in "fmt". */ void userformat_find_requirements(const char *fmt, struct userformat_want *w); + +/* + * Shortcut for invoking pretty_print_commit if we do not have any context. + * Context would be set empty except "fmt". + */ void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit, struct strbuf *sb); + +/* + * Get information about user and date from "line", format it and + * put it into "sb". + * Format of "line" must be readable for split_ident_line function. + * The resulting format is "what: name date". + */ void pp_user_info(struct pretty_print_context *pp, const char *what, struct strbuf *sb, const char *line, const char *encoding); + +/* + * Format title line of commit message taken from "msg_p" and + * put it into "sb". + * First line of "msg_p" is also affected. + */ void pp_title_line(struct pretty_print_context *pp, const char **msg_p, struct strbuf *sb, const char *encoding, int need_8bit_cte); + +/* + * Get current state of commit message from "msg_p" and continue formatting + * by adding indentation and '>' signs. Put result into "sb". + */ void pp_remainder(struct pretty_print_context *pp, const char **msg_p, struct strbuf *sb, int indent); +/* + * Create a text message about commit using given "format" and "context". + * Put the result to "sb". + * Please use this function for custom formats. + */ void format_commit_message(const struct commit *commit, const char *format, struct strbuf *sb, const struct pretty_print_context *context); +/* + * Parse given arguments from "arg", check it for correctness and + * fill struct rev_info. + */ void get_commit_format(const char *arg, struct rev_info *); +/* + * Make a commit message with all rules from given "pp" + * and put it into "sb". + * Please use this function if you have a context (candidate for "pp"). + */ void pretty_print_commit(struct pretty_print_context *pp, const struct commit *commit, struct strbuf *sb); +/* + * Change line breaks in "msg" to "line_separator" and put it into "sb". + * Return "msg" itself. + */ const char *format_subject(struct strbuf *sb, const char *msg, const char *line_separator); +/* Check if "cmit_fmt" will produce an empty output. */ int commit_format_is_empty(enum cmit_fmt); #endif /* PRETTY_H */ -- https://github.com/git/git/pull/439
Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
Stefan Beller writes: > diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c > new file mode 100644 > index 00..e65c7cad6e > --- /dev/null > +++ b/diffcore-blobfind.c > @@ -0,0 +1,41 @@ > +/* > + * Copyright (c) 2017 Google Inc. > + */ > +#include "cache.h" > +#include "diff.h" > +#include "diffcore.h" > + > +static void diffcore_filter_blobs(struct diff_queue_struct *q, > + struct diff_options *options) > +{ > + int src, dst; > + > + if (!options->blobfind) > + BUG("blobfind oidset not initialized???"); > + > + for (src = dst = 0; src < q->nr; src++) { > + struct diff_filepair *p = q->queue[src]; > + > + if ((DIFF_FILE_VALID(p->one) && > + oidset_contains(options->blobfind, &p->one->oid)) || > + (DIFF_FILE_VALID(p->two) && > + oidset_contains(options->blobfind, &p->two->oid))) { Shouldn't this make sure that !DIFF_PAIR_UNMERGED(p) before looking at the sides of the pair? I think an unmerged pair is queued with filespecs whose modes are cleared, but we should not be relying on that---I think we even had bugs we fixed by introducing an explicit is_unmerged field to the filepair struct. > @@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs) > simplify_merges(revs); > if (revs->children.name) > set_children(revs); > + if (revs->diffopt.blobfind) > + revs->simplify_history = 0; > return 0; > } It makes sense to clear this bit by default, but is this an unconditonal clearing? In other words, is there a way for the user to countermand this default and ask for merge simplification from the command line, e.g. "diff --simplify-history --blobfind="? > +test_description='test finding specific blobs in the revision walking' > +. ./test-lib.sh > + > +test_expect_success 'setup ' ' > + git commit --allow-empty -m "empty initial commit" && > + > + echo "Hello, world!" >greeting && > + git add greeting && > + git commit -m "add the greeting blob" && # borrowed from Git from the > Bottom Up > + git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) && > + > + echo asdf >unrelated && > + git add unrelated && > + git commit -m "unrelated history" && > + > + git revert HEAD^ && > + > + git commit --allow-empty -m "another unrelated commit" > +' > + > +test_expect_success 'find the greeting blob' ' > + cat >expect <<-EOF && > + Revert "add the greeting blob" > + add the greeting blob > + EOF > + > + git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw && > + cut -c 14- actual.raw >actual && > + test_cmp expect actual The hardcoded numbers look strange and smell like a workaround of not asking for full object names. Also, now it has the simplify-history bit in the change, don't we want to check a mergy history, and also running "diff-files" while the index has unmerged entries? Other than that, the changes look quite sane and pleasnt read. Thanks. > +' > + > +test_done
Re: [PATCH v2 4/4] t/Makefile: introduce TEST_SHELL_PATH
Hi Peff, the other three patches look good to me. On Fri, 8 Dec 2017, Jeff King wrote: > You may want to run the test suite with a different shell > than you use to build Git. For instance, you may build with > SHELL_PATH=/bin/sh (because it's faster, or it's what you > expect to exist on systems where the build will be used) but > want to run the test suite with bash (e.g., since that > allows using "-x" reliably across the whole test suite). > There's currently no good way to do this. > > You might think that doing two separate make invocations, > like: > > make && > make -C t SHELL_PATH=/bin/bash > > would work. And it _almost_ does. The second make will see > our bash SHELL_PATH, and we'll use that to run the > individual test scripts (or tell prove to use it to do so). > So far so good. > > But this breaks down when "--tee" or "--verbose-log" is > used. Those options cause the test script to actually > re-exec itself using $SHELL_PATH. But wait, wouldn't our > second make invocation have set SHELL_PATH correctly in the > environment? > > Yes, but test-lib.sh sources GIT-BUILD-OPTIONS, which we > built during the first "make". And that overrides the > environment, giving us the original SHELL_PATH again. ... and we could simply see whether the environment variable TEST_SHELL_PATH (which we would set in t/Makefile from the passed-in SHELL_PATH) is set, and override it again. I still think we can do without recording test-phase details in the build-phase (which may, or may not, know what the test-phase wants to do). In other words, I believe that we can make the invocation you mentioned above work, by touching only t/Makefile (to pass SHELL_PATH as TEST_SHELL_PATH) and t/test-lib.sh (to override the SHELL_PATH from GIT-BUILD-OPTIONS with TEST_SHELL_PATH, if set). Ciao, Dscho
[PATCH] doc: reword gitworflows for neutrality
Changed 'he' to 'them' to be more neutral in "gitworkflows.txt". See discussion at: https://public-inbox.org/git/xmqqvahieeqy@gitster.mtv.corp.google.com/ Signed-off-by: Matthieu Moy Signed-off-by: Timothee Albertin Signed-off-by: Nathan Payre Signed-off-by: Daniel Bensoussan --- Documentation/gitworkflows.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/gitworkflows.txt b/Documentation/gitworkflows.txt index 02569d0..926e044 100644 --- a/Documentation/gitworkflows.txt +++ b/Documentation/gitworkflows.txt @@ -407,8 +407,8 @@ follows. `git pull ` = -Occasionally, the maintainer may get merge conflicts when he tries to -pull changes from downstream. In this case, he can ask downstream to +Occasionally, the maintainer may get merge conflicts when they try to +pull changes from downstream. In this case, they can ask downstream to do the merge and resolve the conflicts themselves (perhaps they will know better how to resolve them). It is one of the rare cases where downstream 'should' merge from upstream. -- 2.11.0
[PATCH v7 00/10] Partial clone part 2: fsck and promisors
From: Jeff Hostetler This is V7 of part 2 of partial clone. This builds upon V6 of part 1. This version squashes the fixup commits that I added to the V6p2 series. The net result is identical. Jonathan Tan (10): extension.partialclone: introduce partial clone extension fsck: introduce partialclone extension fsck: support refs pointing to promisor objects fsck: support referenced promisor objects fsck: support promisor objects as CLI argument index-pack: refactor writing of .keep files introduce fetch-object: fetch one promisor object sha1_file: support lazily fetching missing objects rev-list: support termination at promisor objects gc: do not repack promisor packfiles Documentation/git-pack-objects.txt | 11 + Documentation/gitremote-helpers.txt| 7 + Documentation/rev-list-options.txt | 11 + Documentation/technical/repository-version.txt | 12 + Makefile | 1 + builtin/cat-file.c | 2 + builtin/fetch-pack.c | 10 + builtin/fsck.c | 26 +- builtin/gc.c | 3 + builtin/index-pack.c | 113 builtin/pack-objects.c | 37 ++- builtin/prune.c| 7 + builtin/repack.c | 8 +- builtin/rev-list.c | 71 - cache.h| 13 +- environment.c | 1 + fetch-object.c | 27 ++ fetch-object.h | 6 + fetch-pack.c | 48 ++-- fetch-pack.h | 8 + list-objects.c | 29 ++- object.c | 2 +- packfile.c | 77 +- packfile.h | 13 + remote-curl.c | 14 +- revision.c | 33 ++- revision.h | 5 +- setup.c| 7 +- sha1_file.c| 32 ++- t/t0410-partial-clone.sh | 343 + transport.c| 8 + transport.h| 11 + 32 files changed, 899 insertions(+), 97 deletions(-) create mode 100644 fetch-object.c create mode 100644 fetch-object.h create mode 100755 t/t0410-partial-clone.sh -- 2.9.3
[PATCH v7 03/10] fsck: support refs pointing to promisor objects
From: Jonathan Tan Teach fsck to not treat refs referring to missing promisor objects as an error when extensions.partialclone is set. For the purposes of warning about no default refs, such refs are still treated as legitimate refs. Signed-off-by: Jonathan Tan --- builtin/fsck.c | 8 t/t0410-partial-clone.sh | 24 2 files changed, 32 insertions(+) diff --git a/builtin/fsck.c b/builtin/fsck.c index 2934299..ee937bb 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -434,6 +434,14 @@ static int fsck_handle_ref(const char *refname, const struct object_id *oid, obj = parse_object(oid); if (!obj) { + if (is_promisor_object(oid)) { + /* +* Increment default_refs anyway, because this is a +* valid ref. +*/ +default_refs++; +return 0; + } error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid)); errors_found |= ERROR_REACHABLE; /* We'll continue with the rest despite the error.. */ diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index 3ddb3b9..bf75162 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -13,6 +13,14 @@ pack_as_from_promisor () { >repo/.git/objects/pack/pack-$HASH.promisor } +promise_and_delete () { + HASH=$(git -C repo rev-parse "$1") && + git -C repo tag -a -m message my_annotated_tag "$HASH" && + git -C repo rev-parse my_annotated_tag | pack_as_from_promisor && + git -C repo tag -d my_annotated_tag && + delete_object repo "$HASH" +} + test_expect_success 'missing reflog object, but promised by a commit, passes fsck' ' test_create_repo repo && test_commit -C repo my_commit && @@ -78,4 +86,20 @@ test_expect_success 'missing reflog object alone fails fsck, even with extension test_must_fail git -C repo fsck ' +test_expect_success 'missing ref object, but promised, passes fsck' ' + rm -rf repo && + test_create_repo repo && + test_commit -C repo my_commit && + + A=$(git -C repo commit-tree -m a HEAD^{tree}) && + + # Reference $A only from ref + git -C repo branch my_branch "$A" && + promise_and_delete "$A" && + + git -C repo config core.repositoryformatversion 1 && + git -C repo config extensions.partialclone "arbitrary string" && + git -C repo fsck +' + test_done -- 2.9.3
[PATCH v7 01/10] extension.partialclone: introduce partial clone extension
From: Jonathan Tan Introduce new repository extension option: `extensions.partialclone` See the update to Documentation/technical/repository-version.txt in this patch for more information. Signed-off-by: Jonathan Tan --- Documentation/technical/repository-version.txt | 12 cache.h| 2 ++ environment.c | 1 + setup.c| 7 ++- 4 files changed, 21 insertions(+), 1 deletion(-) diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt index 00ad379..e03eacc 100644 --- a/Documentation/technical/repository-version.txt +++ b/Documentation/technical/repository-version.txt @@ -86,3 +86,15 @@ for testing format-1 compatibility. When the config key `extensions.preciousObjects` is set to `true`, objects in the repository MUST NOT be deleted (e.g., by `git-prune` or `git repack -d`). + +`partialclone` +~~ + +When the config key `extensions.partialclone` is set, it indicates +that the repo was created with a partial clone (or later performed +a partial fetch) and that the remote may have omitted sending +certain unwanted objects. Such a remote is called a "promisor remote" +and it promises that all such omitted objects can be fetched from it +in the future. + +The value of this key is the name of the promisor remote. diff --git a/cache.h b/cache.h index 6440e2b..35e3f5e 100644 --- a/cache.h +++ b/cache.h @@ -860,10 +860,12 @@ extern int grafts_replace_parents; #define GIT_REPO_VERSION 0 #define GIT_REPO_VERSION_READ 1 extern int repository_format_precious_objects; +extern char *repository_format_partial_clone; struct repository_format { int version; int precious_objects; + char *partial_clone; /* value of extensions.partialclone */ int is_bare; char *work_tree; struct string_list unknown_extensions; diff --git a/environment.c b/environment.c index 8289c25..e52aab3 100644 --- a/environment.c +++ b/environment.c @@ -27,6 +27,7 @@ int warn_ambiguous_refs = 1; int warn_on_object_refname_ambiguity = 1; int ref_paranoia = -1; int repository_format_precious_objects; +char *repository_format_partial_clone; const char *git_commit_encoding; const char *git_log_output_encoding; const char *apply_default_whitespace; diff --git a/setup.c b/setup.c index 03f51e0..58536bd 100644 --- a/setup.c +++ b/setup.c @@ -420,7 +420,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata) ; else if (!strcmp(ext, "preciousobjects")) data->precious_objects = git_config_bool(var, value); - else + else if (!strcmp(ext, "partialclone")) { + if (!value) + return config_error_nonbool(var); + data->partial_clone = xstrdup(value); + } else string_list_append(&data->unknown_extensions, ext); } else if (strcmp(var, "core.bare") == 0) { data->is_bare = git_config_bool(var, value); @@ -463,6 +467,7 @@ static int check_repository_format_gently(const char *gitdir, int *nongit_ok) } repository_format_precious_objects = candidate.precious_objects; + repository_format_partial_clone = candidate.partial_clone; string_list_clear(&candidate.unknown_extensions, 0); if (!has_common) { if (candidate.is_bare != -1) { -- 2.9.3
[PATCH v7 07/10] introduce fetch-object: fetch one promisor object
From: Jonathan Tan Introduce fetch-object, providing the ability to fetch one object from a promisor remote. This uses fetch-pack. To do this, the transport mechanism has been updated with 2 flags, "from-promisor" to indicate that the resulting pack comes from a promisor remote (and thus should be annotated as such by index-pack), and "no-dependents" to indicate that only the objects themselves need to be fetched (but fetching additional objects is nevertheless safe). Whenever "no-dependents" is used, fetch-pack will refrain from using any object flags, because it is most likely invoked as part of a dynamic object fetch by another Git command (which may itself use object flags). An alternative to this is to leave fetch-pack alone, and instead update the allocation of flags so that fetch-pack's flags never overlap with any others, but this will end up shrinking the number of flags available to nearly every other Git command (that is, every Git command that accesses objects), so the approach in this commit was used instead. This will be tested in a subsequent commit. Signed-off-by: Jonathan Tan --- Documentation/gitremote-helpers.txt | 7 ++ Makefile| 1 + builtin/fetch-pack.c| 8 +++ builtin/index-pack.c| 16 ++--- fetch-object.c | 24 +++ fetch-object.h | 6 + fetch-pack.c| 48 + fetch-pack.h| 8 +++ remote-curl.c | 14 ++- transport.c | 8 +++ transport.h | 11 + 11 files changed, 126 insertions(+), 25 deletions(-) create mode 100644 fetch-object.c create mode 100644 fetch-object.h diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt index 4a584f3..4b8c93e 100644 --- a/Documentation/gitremote-helpers.txt +++ b/Documentation/gitremote-helpers.txt @@ -466,6 +466,13 @@ set by Git if the remote helper has the 'option' capability. Transmit as a push option. As the push option must not contain LF or NUL characters, the string is not encoded. +'option from-promisor' {'true'|'false'}:: + Indicate that these objects are being fetched from a promisor. + +'option no-dependents' {'true'|'false'}:: + Indicate that only the objects wanted need to be fetched, not + their dependents. + SEE ALSO linkgit:git-remote[1] diff --git a/Makefile b/Makefile index ca378a4..795e0c7 100644 --- a/Makefile +++ b/Makefile @@ -792,6 +792,7 @@ LIB_OBJS += ewah/ewah_bitmap.o LIB_OBJS += ewah/ewah_io.o LIB_OBJS += ewah/ewah_rlw.o LIB_OBJS += exec_cmd.o +LIB_OBJS += fetch-object.o LIB_OBJS += fetch-pack.o LIB_OBJS += fsck.o LIB_OBJS += gettext.o diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index 366b9d1..02abe72 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -143,6 +143,14 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) args.update_shallow = 1; continue; } + if (!strcmp("--from-promisor", arg)) { + args.from_promisor = 1; + continue; + } + if (!strcmp("--no-dependents", arg)) { + args.no_dependents = 1; + continue; + } usage(fetch_pack_usage); } if (deepen_not.nr) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 4f305a7..24c2f05 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1429,14 +1429,16 @@ static void write_special_file(const char *suffix, const char *msg, if (close(fd) != 0) die_errno(_("cannot close written %s file '%s'"), suffix, filename); - *report = suffix; + if (report) + *report = suffix; } strbuf_release(&name_buf); } static void final(const char *final_pack_name, const char *curr_pack_name, const char *final_index_name, const char *curr_index_name, - const char *keep_msg, unsigned char *sha1) + const char *keep_msg, const char *promisor_msg, + unsigned char *sha1) { const char *report = "pack"; struct strbuf pack_name = STRBUF_INIT; @@ -1455,6 +1457,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name, if (keep_msg) write_special_file("keep", keep_msg, final_pack_name, sha1, &report); + if (promisor_msg) + write_special_file("promisor", promisor_msg, final_pack_name, + sha1, NULL); if (final_pack_name != curr_pack_name) {
[PATCH v7 09/10] rev-list: support termination at promisor objects
From: Jonathan Tan Teach rev-list to support termination of an object traversal at any object from a promisor remote (whether one that the local repo also has, or one that the local repo knows about because it has another promisor object that references it). This will be used subsequently in gc and in the connectivity check used by fetch. For efficiency, if an object is referenced by a promisor object, and is in the local repo only as a non-promisor object, object traversal will not stop there. This is to avoid building the list of promisor object references. (In list-objects.c, the case where obj is NULL in process_blob() and process_tree() do not need to be changed because those happen only when there is a conflict between the expected type and the existing object. If the object doesn't exist, an object will be synthesized, which is fine.) Signed-off-by: Jonathan Tan Signed-off-by: Jeff Hostetler --- Documentation/rev-list-options.txt | 11 builtin/rev-list.c | 71 +++--- list-objects.c | 29 ++- object.c | 2 +- revision.c | 33 +++- revision.h | 5 +- t/t0410-partial-clone.sh | 101 + 7 files changed, 240 insertions(+), 12 deletions(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 8d8b7f4..0ce8ccd 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -745,10 +745,21 @@ The form '--missing=allow-any' will allow object traversal to continue if a missing object is encountered. Missing objects will silently be omitted from the results. + +The form '--missing=allow-promisor' is like 'allow-any', but will only +allow object traversal to continue for EXPECTED promisor missing objects. +Unexpected missing objects will raise an error. ++ The form '--missing=print' is like 'allow-any', but will also print a list of the missing objects. Object IDs are prefixed with a ``?'' character. endif::git-rev-list[] +--exclude-promisor-objects:: + (For internal use only.) Prefilter object traversal at + promisor boundary. This is used with partial clone. This is + stronger than `--missing=allow-promisor` because it limits the + traversal, rather than just silencing errors about missing + objects. + --no-walk[=(sorted|unsorted)]:: Only show the given commits, but do not traverse their ancestors. This has no effect if a range is specified. If the argument diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 547a3e0..48f922d 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -15,6 +15,7 @@ #include "progress.h" #include "reflog-walk.h" #include "oidset.h" +#include "packfile.h" static const char rev_list_usage[] = "git rev-list [OPTION] ... [ -- paths... ]\n" @@ -67,6 +68,7 @@ enum missing_action { MA_ERROR = 0,/* fail if any missing objects are encountered */ MA_ALLOW_ANY,/* silently allow ALL missing objects */ MA_PRINT,/* print ALL missing objects in special section */ + MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */ }; static enum missing_action arg_missing_action; @@ -197,6 +199,12 @@ static void finish_commit(struct commit *commit, void *data) static inline void finish_object__ma(struct object *obj) { + /* +* Whether or not we try to dynamically fetch missing objects +* from the server, we currently DO NOT have the object. We +* can either print, allow (ignore), or conditionally allow +* (ignore) them. +*/ switch (arg_missing_action) { case MA_ERROR: die("missing blob object '%s'", oid_to_hex(&obj->oid)); @@ -209,25 +217,36 @@ static inline void finish_object__ma(struct object *obj) oidset_insert(&missing_objects, &obj->oid); return; + case MA_ALLOW_PROMISOR: + if (is_promisor_object(&obj->oid)) + return; + die("unexpected missing blob object '%s'", + oid_to_hex(&obj->oid)); + return; + default: BUG("unhandled missing_action"); return; } } -static void finish_object(struct object *obj, const char *name, void *cb_data) +static int finish_object(struct object *obj, const char *name, void *cb_data) { struct rev_list_info *info = cb_data; - if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) + if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) { finish_object__ma(obj); + return 1; + } if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT) parse_object(&obj->oid); + return 0; } static void show_object(stru
[PATCH v7 06/10] index-pack: refactor writing of .keep files
From: Jonathan Tan In a subsequent commit, index-pack will be taught to write ".promisor" files which are similar to the ".keep" files it knows how to write. Refactor the writing of ".keep" files, so that the implementation of writing ".promisor" files becomes easier. Signed-off-by: Jonathan Tan --- builtin/index-pack.c | 99 1 file changed, 53 insertions(+), 46 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 8ec459f..4f305a7 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1389,15 +1389,58 @@ static void fix_unresolved_deltas(struct sha1file *f) free(sorted_by_pos); } +static const char *derive_filename(const char *pack_name, const char *suffix, + struct strbuf *buf) +{ + size_t len; + if (!strip_suffix(pack_name, ".pack", &len)) + die(_("packfile name '%s' does not end with '.pack'"), + pack_name); + strbuf_add(buf, pack_name, len); + strbuf_addch(buf, '.'); + strbuf_addstr(buf, suffix); + return buf->buf; +} + +static void write_special_file(const char *suffix, const char *msg, + const char *pack_name, const unsigned char *sha1, + const char **report) +{ + struct strbuf name_buf = STRBUF_INIT; + const char *filename; + int fd; + int msg_len = strlen(msg); + + if (pack_name) + filename = derive_filename(pack_name, suffix, &name_buf); + else + filename = odb_pack_name(&name_buf, sha1, suffix); + + fd = odb_pack_keep(filename); + if (fd < 0) { + if (errno != EEXIST) + die_errno(_("cannot write %s file '%s'"), + suffix, filename); + } else { + if (msg_len > 0) { + write_or_die(fd, msg, msg_len); + write_or_die(fd, "\n", 1); + } + if (close(fd) != 0) + die_errno(_("cannot close written %s file '%s'"), + suffix, filename); + *report = suffix; + } + strbuf_release(&name_buf); +} + static void final(const char *final_pack_name, const char *curr_pack_name, const char *final_index_name, const char *curr_index_name, - const char *keep_name, const char *keep_msg, - unsigned char *sha1) + const char *keep_msg, unsigned char *sha1) { const char *report = "pack"; struct strbuf pack_name = STRBUF_INIT; struct strbuf index_name = STRBUF_INIT; - struct strbuf keep_name_buf = STRBUF_INIT; int err; if (!from_stdin) { @@ -1409,28 +1452,9 @@ static void final(const char *final_pack_name, const char *curr_pack_name, die_errno(_("error while closing pack file")); } - if (keep_msg) { - int keep_fd, keep_msg_len = strlen(keep_msg); - - if (!keep_name) - keep_name = odb_pack_name(&keep_name_buf, sha1, "keep"); - - keep_fd = odb_pack_keep(keep_name); - if (keep_fd < 0) { - if (errno != EEXIST) - die_errno(_("cannot write keep file '%s'"), - keep_name); - } else { - if (keep_msg_len > 0) { - write_or_die(keep_fd, keep_msg, keep_msg_len); - write_or_die(keep_fd, "\n", 1); - } - if (close(keep_fd) != 0) - die_errno(_("cannot close written keep file '%s'"), - keep_name); - report = "keep"; - } - } + if (keep_msg) + write_special_file("keep", keep_msg, final_pack_name, sha1, + &report); if (final_pack_name != curr_pack_name) { if (!final_pack_name) @@ -1472,7 +1496,6 @@ static void final(const char *final_pack_name, const char *curr_pack_name, strbuf_release(&index_name); strbuf_release(&pack_name); - strbuf_release(&keep_name_buf); } static int git_index_pack_config(const char *k, const char *v, void *cb) @@ -1615,26 +1638,13 @@ static void show_pack_info(int stat_only) } } -static const char *derive_filename(const char *pack_name, const char *suffix, - struct strbuf *buf) -{ - size_t len; - if (!strip_suffix(pack_name, ".pack", &len)) - die(_("packfile name '%s' does not end with '.pack'"), - pack_name); - strbuf_add(buf, pack_name, len); - strbuf_addstr(buf, suffix); - return
[PATCH v7 08/10] sha1_file: support lazily fetching missing objects
From: Jonathan Tan Teach sha1_file to fetch objects from the remote configured in extensions.partialclone whenever an object is requested but missing. The fetching of objects can be suppressed through a global variable. This is used by fsck and index-pack. However, by default, such fetching is not suppressed. This is meant as a temporary measure to ensure that all Git commands work in such a situation. Future patches will update some commands to either tolerate missing objects (without fetching them) or be more efficient in fetching them. In order to determine the code changes in sha1_file.c necessary, I investigated the following: (1) functions in sha1_file.c that take in a hash, without the user regarding how the object is stored (loose or packed) (2) functions in packfile.c (because I need to check callers that know about the loose/packed distinction and operate on both differently, and ensure that they can handle the concept of objects that are neither loose nor packed) (1) is handled by the modification to sha1_object_info_extended(). For (2), I looked at for_each_packed_object and others. For for_each_packed_object, the callers either already work or are fixed in this patch: - reachable - only to find recent objects - builtin/fsck - already knows about missing objects - builtin/cat-file - warning message added in this commit Callers of the other functions do not need to be changed: - parse_pack_index - http - indirectly from http_get_info_packs - find_pack_entry_one - this searches a single pack that is provided as an argument; the caller already knows (through other means) that the sought object is in a specific pack - find_sha1_pack - fast-import - appears to be an optimization to not store a file if it is already in a pack - http-walker - to search through a struct alt_base - http-push - to search through remote packs - has_sha1_pack - builtin/fsck - already knows about promisor objects - builtin/count-objects - informational purposes only (check if loose object is also packed) - builtin/prune-packed - check if object to be pruned is packed (if not, don't prune it) - revision - used to exclude packed objects if requested by user - diff - just for optimization Signed-off-by: Jonathan Tan --- builtin/cat-file.c | 2 ++ builtin/fetch-pack.c | 2 ++ builtin/fsck.c | 3 +++ builtin/index-pack.c | 6 ++ cache.h | 8 fetch-object.c | 3 +++ sha1_file.c | 32 ++ t/t0410-partial-clone.sh | 51 8 files changed, 99 insertions(+), 8 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index f5fa4fd..cf9ea5c 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -475,6 +475,8 @@ static int batch_objects(struct batch_options *opt) for_each_loose_object(batch_loose_object, &sa, 0); for_each_packed_object(batch_packed_object, &sa, 0); + if (repository_format_partial_clone) + warning("This repository has extensions.partialClone set. Some objects may not be loaded."); cb.opt = opt; cb.expand = &data; diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index 02abe72..15eeed7 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) struct oid_array shallow = OID_ARRAY_INIT; struct string_list deepen_not = STRING_LIST_INIT_DUP; + fetch_if_missing = 0; + packet_trace_identity("fetch-pack"); memset(&args, 0, sizeof(args)); diff --git a/builtin/fsck.c b/builtin/fsck.c index 578a7c8..3b76c0e 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -678,6 +678,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) int i; struct alternate_object_database *alt; + /* fsck knows how to handle missing promisor objects */ + fetch_if_missing = 0; + errors_found = 0; check_replace_refs = 0; diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 24c2f05..a0a35e6 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) unsigned foreign_nr = 1;/* zero is a "good" value, assume bad */ int report_end_of_input = 0; + /* +* index-pack never needs to fetch missing objects, since it only +* accesses the repo to do hash collision checks +*/ + fetch_if_missing = 0; + if (argc == 2 && !strcmp(argv[1], "-h")) usage(index_pack_usage); diff --git a/cache.h b/cache.h index c76f2e9..6980072 100644 --- a/cache.h +++ b/cache.h @@ -1727,6 +1727,14 @@ struct object_info { #define OBJECT_INFO
[PATCH v7 10/10] gc: do not repack promisor packfiles
From: Jonathan Tan Teach gc to stop traversal at promisor objects, and to leave promisor packfiles alone. This has the effect of only repacking non-promisor packfiles, and preserves the distinction between promisor packfiles and non-promisor packfiles. Signed-off-by: Jonathan Tan Signed-off-by: Jeff Hostetler --- Documentation/git-pack-objects.txt | 11 builtin/gc.c | 3 +++ builtin/pack-objects.c | 37 -- builtin/prune.c| 7 + builtin/repack.c | 8 -- t/t0410-partial-clone.sh | 54 -- 6 files changed, 114 insertions(+), 6 deletions(-) diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index aa403d0..81bc490 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -255,6 +255,17 @@ a missing object is encountered. This is the default action. The form '--missing=allow-any' will allow object traversal to continue if a missing object is encountered. Missing objects will silently be omitted from the results. ++ +The form '--missing=allow-promisor' is like 'allow-any', but will only +allow object traversal to continue for EXPECTED promisor missing objects. +Unexpected missing object will raise an error. + +--exclude-promisor-objects:: + Omit objects that are known to be in the promisor remote. (This + option has the purpose of operating only on locally created objects, + so that when we repack, we still maintain a distinction between + locally created objects [without .promisor] and objects from the + promisor remote [with .promisor].) This is used with partial clone. SEE ALSO diff --git a/builtin/gc.c b/builtin/gc.c index 3c5eae0..77fa720 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -458,6 +458,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix) argv_array_push(&prune, prune_expire); if (quiet) argv_array_push(&prune, "--no-progress"); + if (repository_format_partial_clone) + argv_array_push(&prune, + "--exclude-promisor-objects"); if (run_command_v_opt(prune.argv, RUN_GIT_CMD)) return error(FAILED_RUN, prune.argv[0]); } diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 45ad35d..f5fc401 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -75,6 +75,8 @@ static int use_bitmap_index = -1; static int write_bitmap_index; static uint16_t write_bitmap_options; +static int exclude_promisor_objects; + static unsigned long delta_cache_size = 0; static unsigned long max_delta_cache_size = 256 * 1024 * 1024; static unsigned long cache_max_small_delta_size = 1000; @@ -84,8 +86,9 @@ static unsigned long window_memory_limit = 0; static struct list_objects_filter_options filter_options; enum missing_action { - MA_ERROR = 0,/* fail if any missing objects are encountered */ - MA_ALLOW_ANY,/* silently allow ALL missing objects */ + MA_ERROR = 0, /* fail if any missing objects are encountered */ + MA_ALLOW_ANY, /* silently allow ALL missing objects */ + MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */ }; static enum missing_action arg_missing_action; static show_object_fn fn_show_object; @@ -2577,6 +2580,20 @@ static void show_object__ma_allow_any(struct object *obj, const char *name, void show_object(obj, name, data); } +static void show_object__ma_allow_promisor(struct object *obj, const char *name, void *data) +{ + assert(arg_missing_action == MA_ALLOW_PROMISOR); + + /* +* Quietly ignore EXPECTED missing objects. This avoids problems with +* staging them now and getting an odd error later. +*/ + if (!has_object_file(&obj->oid) && is_promisor_object(&obj->oid)) + return; + + show_object(obj, name, data); +} + static int option_parse_missing_action(const struct option *opt, const char *arg, int unset) { @@ -2591,10 +2608,18 @@ static int option_parse_missing_action(const struct option *opt, if (!strcmp(arg, "allow-any")) { arg_missing_action = MA_ALLOW_ANY; + fetch_if_missing = 0; fn_show_object = show_object__ma_allow_any; return 0; } + if (!strcmp(arg, "allow-promisor")) { + arg_missing_action = MA_ALLOW_PROMISOR; + fetch_if_missing = 0; + fn_show_object = show_object__ma_allow_promisor; + return 0; + } + die(_("invalid value for --missing")); return 0; } @@ -3008,6 +3033,8 @@ int cmd_pack
[PATCH v7 02/10] fsck: introduce partialclone extension
From: Jonathan Tan Currently, Git does not support repos with very large numbers of objects or repos that wish to minimize manipulation of certain blobs (for example, because they are very large) very well, even if the user operates mostly on part of the repo, because Git is designed on the assumption that every referenced object is available somewhere in the repo storage. In such an arrangement, the full set of objects is usually available in remote storage, ready to be lazily downloaded. Teach fsck about the new state of affairs. In this commit, teach fsck that missing promisor objects referenced from the reflog are not an error case; in future commits, fsck will be taught about other cases. Signed-off-by: Jonathan Tan --- builtin/fsck.c | 2 +- cache.h | 3 +- packfile.c | 77 +++-- packfile.h | 13 t/t0410-partial-clone.sh | 81 5 files changed, 171 insertions(+), 5 deletions(-) create mode 100755 t/t0410-partial-clone.sh diff --git a/builtin/fsck.c b/builtin/fsck.c index 56afe40..2934299 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -398,7 +398,7 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid, xstrfmt("%s@{%"PRItime"}", refname, timestamp)); obj->flags |= USED; mark_object_reachable(obj); - } else { + } else if (!is_promisor_object(oid)) { error("%s: invalid reflog entry %s", refname, oid_to_hex(oid)); errors_found |= ERROR_REACHABLE; } diff --git a/cache.h b/cache.h index 35e3f5e..c76f2e9 100644 --- a/cache.h +++ b/cache.h @@ -1587,7 +1587,8 @@ extern struct packed_git { unsigned pack_local:1, pack_keep:1, freshened:1, -do_not_close:1; +do_not_close:1, +pack_promisor:1; unsigned char sha1[20]; struct revindex_entry *revindex; /* something like ".git/objects/pack/x.pack" */ diff --git a/packfile.c b/packfile.c index 4a5fe7a..234797c 100644 --- a/packfile.c +++ b/packfile.c @@ -8,6 +8,11 @@ #include "list.h" #include "streaming.h" #include "sha1-lookup.h" +#include "commit.h" +#include "object.h" +#include "tag.h" +#include "tree-walk.h" +#include "tree.h" char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, @@ -643,10 +648,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local) return NULL; /* -* ".pack" is long enough to hold any suffix we're adding (and +* ".promisor" is long enough to hold any suffix we're adding (and * the use xsnprintf double-checks that) */ - alloc = st_add3(path_len, strlen(".pack"), 1); + alloc = st_add3(path_len, strlen(".promisor"), 1); p = alloc_packed_git(alloc); memcpy(p->pack_name, path, path_len); @@ -654,6 +659,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local) if (!access(p->pack_name, F_OK)) p->pack_keep = 1; + xsnprintf(p->pack_name + path_len, alloc - path_len, ".promisor"); + if (!access(p->pack_name, F_OK)) + p->pack_promisor = 1; + xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack"); if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) { free(p); @@ -781,7 +790,8 @@ static void prepare_packed_git_one(char *objdir, int local) if (ends_with(de->d_name, ".idx") || ends_with(de->d_name, ".pack") || ends_with(de->d_name, ".bitmap") || - ends_with(de->d_name, ".keep")) + ends_with(de->d_name, ".keep") || + ends_with(de->d_name, ".promisor")) string_list_append(&garbage, path.buf); else report_garbage(PACKDIR_FILE_GARBAGE, path.buf); @@ -1889,6 +1899,9 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags) for (p = packed_git; p; p = p->next) { if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local) continue; + if ((flags & FOR_EACH_OBJECT_PROMISOR_ONLY) && + !p->pack_promisor) + continue; if (open_pack_index(p)) { pack_errors = 1; continue; @@ -1899,3 +1912,61 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags) } return r ? r : pack_errors; } + +static int add_promisor_object(const struct object_id *oid, + struct packed_git *pack, +
[PATCH v7 05/10] fsck: support promisor objects as CLI argument
From: Jonathan Tan Teach fsck to not treat missing promisor objects provided on the CLI as an error when extensions.partialclone is set. Signed-off-by: Jonathan Tan --- builtin/fsck.c | 2 ++ t/t0410-partial-clone.sh | 13 + 2 files changed, 15 insertions(+) diff --git a/builtin/fsck.c b/builtin/fsck.c index 4c2a56d..578a7c8 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -750,6 +750,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) struct object *obj = lookup_object(oid.hash); if (!obj || !(obj->flags & HAS_OBJ)) { + if (is_promisor_object(&oid)) + continue; error("%s: object missing", oid_to_hex(&oid)); errors_found |= ERROR_OBJECT; continue; diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index 4f9931f..e96f436 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -125,4 +125,17 @@ test_expect_success 'missing object, but promised, passes fsck' ' git -C repo fsck ' +test_expect_success 'missing CLI object, but promised, passes fsck' ' + rm -rf repo && + test_create_repo repo && + test_commit -C repo my_commit && + + A=$(git -C repo commit-tree -m a HEAD^{tree}) && + promise_and_delete "$A" && + + git -C repo config core.repositoryformatversion 1 && + git -C repo config extensions.partialclone "arbitrary string" && + git -C repo fsck "$A" +' + test_done -- 2.9.3
[PATCH v7 04/10] fsck: support referenced promisor objects
From: Jonathan Tan Teach fsck to not treat missing promisor objects indirectly pointed to by refs as an error when extensions.partialclone is set. Signed-off-by: Jonathan Tan --- builtin/fsck.c | 11 +++ t/t0410-partial-clone.sh | 23 +++ 2 files changed, 34 insertions(+) diff --git a/builtin/fsck.c b/builtin/fsck.c index ee937bb..4c2a56d 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -149,6 +149,15 @@ static int mark_object(struct object *obj, int type, void *data, struct fsck_opt if (obj->flags & REACHABLE) return 0; obj->flags |= REACHABLE; + + if (is_promisor_object(&obj->oid)) + /* +* Further recursion does not need to be performed on this +* object since it is a promisor object (so it does not need to +* be added to "pending"). +*/ + return 0; + if (!(obj->flags & HAS_OBJ)) { if (parent && !has_object_file(&obj->oid)) { printf("broken link from %7s %s\n", @@ -208,6 +217,8 @@ static void check_reachable_object(struct object *obj) * do a full fsck */ if (!(obj->flags & HAS_OBJ)) { + if (is_promisor_object(&obj->oid)) + return; if (has_sha1_pack(obj->oid.hash)) return; /* it is in pack - forget about it */ printf("missing %s %s\n", printable_type(obj), diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index bf75162..4f9931f 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -102,4 +102,27 @@ test_expect_success 'missing ref object, but promised, passes fsck' ' git -C repo fsck ' +test_expect_success 'missing object, but promised, passes fsck' ' + rm -rf repo && + test_create_repo repo && + test_commit -C repo 1 && + test_commit -C repo 2 && + test_commit -C repo 3 && + git -C repo tag -a annotated_tag -m "annotated tag" && + + C=$(git -C repo rev-parse 1) && + T=$(git -C repo rev-parse 2^{tree}) && + B=$(git hash-object repo/3.t) && + AT=$(git -C repo rev-parse annotated_tag) && + + promise_and_delete "$C" && + promise_and_delete "$T" && + promise_and_delete "$B" && + promise_and_delete "$AT" && + + git -C repo config core.repositoryformatversion 1 && + git -C repo config extensions.partialclone "arbitrary string" && + git -C repo fsck +' + test_done -- 2.9.3
Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)
On Thu, Dec 7, 2017 at 7:04 PM, Junio C Hamano wrote: > * cc/skip-to-optional-val (2017-12-07) 7 commits > - t4045: test 'diff --relative' for real > - t4045: reindent to make helpers readable > - diff: use skip-to-optional-val in parsing --relative > - diff: use skip_to_optional_val_default() > - diff: use skip_to_optional_val() > - index-pack: use skip_to_optional_val() > - git-compat-util: introduce skip_to_optional_val() > > Introduce a helper to simplify code to parse a common pattern that > expects either "--key" or "--key=". > > Even though I queued fixes for "diff --relative" on top, it may > still want a final reroll to make it harder to misuse by allowing > NULL at the valp part of the argument. Yeah, I already implemented that and it will be in the next v3 version. > Also s/_val/_arg/. I am not sure that is a good idea, because it could suggest that the functions are designed to parse only command option arguments, while they can be used to parse any "key=val" string where "key" is also allowed. > cf. > cf. It doesn't look like s/_val/_arg/ was discussed in the above messages.
Re: Unfortunate interaction between git diff-index and fakeroot
Elazar Leibovich writes: > ignore unused information, such as commit > 2cb45e95438c113871fbbea5b4f629f9463034e7 > which ignores st_dev, because it doesn't actually matter, or I do not think it ignores because "it doesn't matter". st_dev is known not to be stable (e.g. across reboots and reconnects nfs), so it is a poor indicator to use as a quick measure to see if the file contents may have changed since we last checked. On the other hand, in a sane (it is of course open to debate "by whose definition") environment, the fact that the owner of the file has changed _is_ a significant enough sign to suspect that the file contents have changed (i.e. the file is suspect to be dirty; you can actually check and refresh, of course, though). > commit e44794706eeb57f2ee38ed1604821aa38b8ad9d2 which ignores > mode changes not relevant to the owner. And that one is not even about the cached stat information used for the quick "dirty-ness" check. The change is about the mode bits in recorded history. File's mtime is not recorded in the history, either, but we do care and record it in the index, because it can be used as a good (albeit a bit too coarse grained) indicator that the contents of a file may have changed. The owner and group fall into the same category.
Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)
Christian Couder writes: > On Thu, Dec 7, 2017 at 7:04 PM, Junio C Hamano wrote: > > >> * cc/skip-to-optional-val (2017-12-07) 7 commits >> - t4045: test 'diff --relative' for real >> - t4045: reindent to make helpers readable >> - diff: use skip-to-optional-val in parsing --relative >> - diff: use skip_to_optional_val_default() >> - diff: use skip_to_optional_val() >> - index-pack: use skip_to_optional_val() >> - git-compat-util: introduce skip_to_optional_val() >> >> Introduce a helper to simplify code to parse a common pattern that >> expects either "--key" or "--key=". >> >> Even though I queued fixes for "diff --relative" on top, it may >> still want a final reroll to make it harder to misuse by allowing >> NULL at the valp part of the argument. > > Yeah, I already implemented that and it will be in the next v3 version. Good. I am hoping that you've followed the discussion on the tests, where all of us agreed that the approach taken by Jacob's one is preferrable over what is queued above? >> Also s/_val/_arg/. > > I am not sure that is a good idea, because it could suggest that the > functions are designed to parse only command option arguments, while > they can be used to parse any "key=val" string where "key" is also > allowed. > >> cf. >> cf. > > It doesn't look like s/_val/_arg/ was discussed in the above messages. It came from your statement that was made before the thread, where you said you'll rename it to use arg after I said I suspect that arg would make more sense than val. https://public-inbox.org/git/CAP8UFD2OSsqzhyAL-QG1TOowB-xgbf=kc9whre+flc+0j1x...@mail.gmail.com/ Thanks.
[PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
From: Jeff Hostetler This is V7 of part 3 of partial clone. It builds upon V7 of part 2 (which builds upon V6 of part 1). This version adds additional tests, fixes test errors on the MAC version, and squashes some fixup commits. It also restores functionality accidentally dropped from the V6 series for "git fetch" to automatically inherit the partial-clone filter-spec when appropriate. This version extends the --no-filter argument to override this inheritance. Jeff Hostetler (8): upload-pack: add object filtering for partial clone fetch-pack, index-pack, transport: partial clone fetch-pack: add --no-filter fetch: support filters partial-clone: define partial clone settings in config t5616: end-to-end tests for partial clone fetch: inherit filter-spec from partial clone t5616: test bulk prefetch after partial fetch Jonathan Tan (8): sha1_file: support lazily fetching missing objects rev-list: support termination at promisor objects gc: do not repack promisor packfiles fetch-pack: test support excluding large blobs fetch: refactor calculation of remote list clone: partial clone unpack-trees: batch fetching of missing blobs fetch-pack: restore save_commit_buffer after use Documentation/config.txt | 4 + Documentation/git-pack-objects.txt| 11 ++ Documentation/rev-list-options.txt| 11 ++ Documentation/technical/pack-protocol.txt | 8 + Documentation/technical/protocol-capabilities.txt | 8 + builtin/cat-file.c| 2 + builtin/clone.c | 22 ++- builtin/fetch-pack.c | 10 ++ builtin/fetch.c | 83 - builtin/fsck.c| 3 + builtin/gc.c | 3 + builtin/index-pack.c | 6 + builtin/pack-objects.c| 37 +++- builtin/prune.c | 7 + builtin/repack.c | 8 +- builtin/rev-list.c| 73 +++- cache.h | 9 + config.c | 5 + connected.c | 2 + environment.c | 1 + fetch-object.c| 29 ++- fetch-object.h| 5 + fetch-pack.c | 17 ++ fetch-pack.h | 2 + list-objects-filter-options.c | 92 -- list-objects-filter-options.h | 18 ++ list-objects.c| 29 ++- object.c | 2 +- remote-curl.c | 6 + revision.c| 33 +++- revision.h| 5 +- sha1_file.c | 32 +++- t/t0410-partial-clone.sh | 206 +- t/t5500-fetch-pack.sh | 63 +++ t/t5601-clone.sh | 101 +++ t/t5616-partial-clone.sh | 146 +++ transport-helper.c| 5 + transport.c | 4 + transport.h | 5 + unpack-trees.c| 22 +++ upload-pack.c | 31 +++- 41 files changed, 1110 insertions(+), 56 deletions(-) create mode 100755 t/t5616-partial-clone.sh -- 2.9.3
[PATCH v7 03/16] gc: do not repack promisor packfiles
From: Jonathan Tan Teach gc to stop traversal at promisor objects, and to leave promisor packfiles alone. This has the effect of only repacking non-promisor packfiles, and preserves the distinction between promisor packfiles and non-promisor packfiles. Signed-off-by: Jonathan Tan Signed-off-by: Jeff Hostetler --- Documentation/git-pack-objects.txt | 11 builtin/gc.c | 3 +++ builtin/pack-objects.c | 37 -- builtin/prune.c| 7 + builtin/repack.c | 8 -- t/t0410-partial-clone.sh | 54 -- 6 files changed, 114 insertions(+), 6 deletions(-) diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index aa403d0..81bc490 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -255,6 +255,17 @@ a missing object is encountered. This is the default action. The form '--missing=allow-any' will allow object traversal to continue if a missing object is encountered. Missing objects will silently be omitted from the results. ++ +The form '--missing=allow-promisor' is like 'allow-any', but will only +allow object traversal to continue for EXPECTED promisor missing objects. +Unexpected missing object will raise an error. + +--exclude-promisor-objects:: + Omit objects that are known to be in the promisor remote. (This + option has the purpose of operating only on locally created objects, + so that when we repack, we still maintain a distinction between + locally created objects [without .promisor] and objects from the + promisor remote [with .promisor].) This is used with partial clone. SEE ALSO diff --git a/builtin/gc.c b/builtin/gc.c index 3c5eae0..77fa720 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -458,6 +458,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix) argv_array_push(&prune, prune_expire); if (quiet) argv_array_push(&prune, "--no-progress"); + if (repository_format_partial_clone) + argv_array_push(&prune, + "--exclude-promisor-objects"); if (run_command_v_opt(prune.argv, RUN_GIT_CMD)) return error(FAILED_RUN, prune.argv[0]); } diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 45ad35d..f5fc401 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -75,6 +75,8 @@ static int use_bitmap_index = -1; static int write_bitmap_index; static uint16_t write_bitmap_options; +static int exclude_promisor_objects; + static unsigned long delta_cache_size = 0; static unsigned long max_delta_cache_size = 256 * 1024 * 1024; static unsigned long cache_max_small_delta_size = 1000; @@ -84,8 +86,9 @@ static unsigned long window_memory_limit = 0; static struct list_objects_filter_options filter_options; enum missing_action { - MA_ERROR = 0,/* fail if any missing objects are encountered */ - MA_ALLOW_ANY,/* silently allow ALL missing objects */ + MA_ERROR = 0, /* fail if any missing objects are encountered */ + MA_ALLOW_ANY, /* silently allow ALL missing objects */ + MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */ }; static enum missing_action arg_missing_action; static show_object_fn fn_show_object; @@ -2577,6 +2580,20 @@ static void show_object__ma_allow_any(struct object *obj, const char *name, void show_object(obj, name, data); } +static void show_object__ma_allow_promisor(struct object *obj, const char *name, void *data) +{ + assert(arg_missing_action == MA_ALLOW_PROMISOR); + + /* +* Quietly ignore EXPECTED missing objects. This avoids problems with +* staging them now and getting an odd error later. +*/ + if (!has_object_file(&obj->oid) && is_promisor_object(&obj->oid)) + return; + + show_object(obj, name, data); +} + static int option_parse_missing_action(const struct option *opt, const char *arg, int unset) { @@ -2591,10 +2608,18 @@ static int option_parse_missing_action(const struct option *opt, if (!strcmp(arg, "allow-any")) { arg_missing_action = MA_ALLOW_ANY; + fetch_if_missing = 0; fn_show_object = show_object__ma_allow_any; return 0; } + if (!strcmp(arg, "allow-promisor")) { + arg_missing_action = MA_ALLOW_PROMISOR; + fetch_if_missing = 0; + fn_show_object = show_object__ma_allow_promisor; + return 0; + } + die(_("invalid value for --missing")); return 0; } @@ -3008,6 +3033,8 @@ int cmd_pack
[PATCH v7 09/16] fetch: support filters
From: Jeff Hostetler Teach fetch to support filters. This is only allowed for the remote configured in extensions.partialcloneremote. Signed-off-by: Jonathan Tan --- builtin/fetch.c | 23 +-- connected.c | 2 ++ remote-curl.c | 6 ++ t/t5500-fetch-pack.sh | 36 4 files changed, 65 insertions(+), 2 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index 1b1f039..14aab71 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -18,6 +18,7 @@ #include "argv-array.h" #include "utf8.h" #include "packfile.h" +#include "list-objects-filter-options.h" static const char * const builtin_fetch_usage[] = { N_("git fetch [] [ [...]]"), @@ -55,6 +56,7 @@ static int recurse_submodules_default = RECURSE_SUBMODULES_ON_DEMAND; static int shown_url = 0; static int refmap_alloc, refmap_nr; static const char **refmap_array; +static struct list_objects_filter_options filter_options; static int git_fetch_config(const char *k, const char *v, void *cb) { @@ -160,6 +162,7 @@ static struct option builtin_fetch_options[] = { TRANSPORT_FAMILY_IPV4), OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"), TRANSPORT_FAMILY_IPV6), + OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options), OPT_END() }; @@ -1044,6 +1047,11 @@ static struct transport *prepare_transport(struct remote *remote, int deepen) set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes"); if (update_shallow) set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes"); + if (filter_options.choice) { + set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER, + filter_options.filter_spec); + set_option(transport, TRANS_OPT_FROM_PROMISOR, "1"); + } return transport; } @@ -1328,6 +1336,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) packet_trace_identity("fetch"); + fetch_if_missing = 0; + /* Record the command line for the reflog */ strbuf_addstr(&default_rla, "fetch"); for (i = 1; i < argc; i++) @@ -1361,6 +1371,9 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) if (depth || deepen_since || deepen_not.nr) deepen = 1; + if (filter_options.choice && !repository_format_partial_clone) + die("--filter can only be used when extensions.partialClone is set"); + if (all) { if (argc == 1) die(_("fetch --all does not take a repository argument")); @@ -1390,10 +1403,16 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) } } - if (remote) + if (remote) { + if (filter_options.choice && + strcmp(remote->name, repository_format_partial_clone)) + die(_("--filter can only be used with the remote configured in core.partialClone")); result = fetch_one(remote, argc, argv); - else + } else { + if (filter_options.choice) + die(_("--filter can only be used with the remote configured in core.partialClone")); result = fetch_multiple(&list); + } if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) { struct argv_array options = ARGV_ARRAY_INIT; diff --git a/connected.c b/connected.c index f416b05..3a5bd67 100644 --- a/connected.c +++ b/connected.c @@ -56,6 +56,8 @@ int check_connected(sha1_iterate_fn fn, void *cb_data, argv_array_push(&rev_list.args,"rev-list"); argv_array_push(&rev_list.args, "--objects"); argv_array_push(&rev_list.args, "--stdin"); + if (repository_format_partial_clone) + argv_array_push(&rev_list.args, "--exclude-promisor-objects"); argv_array_push(&rev_list.args, "--not"); argv_array_push(&rev_list.args, "--all"); argv_array_push(&rev_list.args, "--quiet"); diff --git a/remote-curl.c b/remote-curl.c index 4318391..6ec5352 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -24,6 +24,7 @@ struct options { char *deepen_since; struct string_list deepen_not; struct string_list push_options; + char *filter; unsigned progress : 1, check_self_contained_and_connected : 1, cloning : 1, @@ -165,6 +166,9 @@ static int set_option(const char *name, const char *value) } else if (!strcmp(name, "no-dependents")) { options.no_dependents = 1; return 0; + } else if (!strcmp(name, "filter")) { + options.filter = xstrdup(value);; + return 0; } else { return 1 /* unsupported */; } @@ -834,6 +838,8 @@ static int fetch_git(struct discovery *heads, argv_
[PATCH v7 13/16] fetch-pack: restore save_commit_buffer after use
From: Jonathan Tan In fetch-pack, the global variable save_commit_buffer is set to 0, but not restored to its original value after use. In particular, if show_log() (in log-tree.c) is invoked after fetch_pack() in the same process, show_log() will return before printing out the commit message (because the invocation to get_cached_commit_buffer() returns NULL, because the commit buffer was not saved). I discovered this when attempting to run "git log -S" in a partial clone, triggering the case where revision walking lazily loads missing objects. Therefore, restore save_commit_buffer to its original value after use. An alternative to solve the problem I had is to replace get_cached_commit_buffer() with get_commit_buffer(). That invocation was introduced in commit a97934d ("use get_cached_commit_buffer where appropriate", 2014-06-13) to replace "commit->buffer" introduced in commit 3131b71 ("Add "--show-all" revision walker flag for debugging", 2008-02-13). In the latter commit, the commit author seems to be deciding between not showing an unparsed commit at all and showing an unparsed commit without the message (which is what the commit does), and did not mention parsing the unparsed commit, so I prefer to preserve the existing behavior. Signed-off-by: Jonathan Tan Signed-off-by: Jeff Hostetler --- fetch-pack.c | 4 1 file changed, 4 insertions(+) diff --git a/fetch-pack.c b/fetch-pack.c index 3c5f064..773453c 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -717,6 +717,7 @@ static int everything_local(struct fetch_pack_args *args, { struct ref *ref; int retval; + int old_save_commit_buffer = save_commit_buffer; timestamp_t cutoff = 0; save_commit_buffer = 0; @@ -786,6 +787,9 @@ static int everything_local(struct fetch_pack_args *args, print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote), ref->name); } + + save_commit_buffer = old_save_commit_buffer; + return retval; } -- 2.9.3
[PATCH v7 12/16] unpack-trees: batch fetching of missing blobs
From: Jonathan Tan When running checkout, first prefetch all blobs that are to be updated but are missing. This means that only one pack is downloaded during such operations, instead of one per missing blob. This operates only on the blob level - if a repository has a missing tree, they are still fetched one at a time. This does not use the delayed checkout mechanism introduced in commit 2841e8f ("convert: add "status=delayed" to filter process protocol", 2017-06-30) due to significant conceptual differences - in particular, for partial clones, we already know what needs to be fetched based on the contents of the local repo alone, whereas for status=delayed, it is the filter process that tells us what needs to be checked in the end. Signed-off-by: Jonathan Tan Signed-off-by: Jeff Hostetler --- fetch-object.c | 26 ++ fetch-object.h | 5 + t/t5601-clone.sh | 52 unpack-trees.c | 22 ++ 4 files changed, 101 insertions(+), 4 deletions(-) diff --git a/fetch-object.c b/fetch-object.c index 258fcfa..853624f 100644 --- a/fetch-object.c +++ b/fetch-object.c @@ -5,11 +5,10 @@ #include "transport.h" #include "fetch-object.h" -void fetch_object(const char *remote_name, const unsigned char *sha1) +static void fetch_refs(const char *remote_name, struct ref *ref) { struct remote *remote; struct transport *transport; - struct ref *ref; int original_fetch_if_missing = fetch_if_missing; fetch_if_missing = 0; @@ -18,10 +17,29 @@ void fetch_object(const char *remote_name, const unsigned char *sha1) die(_("Remote with no URL")); transport = transport_get(remote, remote->url[0]); - ref = alloc_ref(sha1_to_hex(sha1)); - hashcpy(ref->old_oid.hash, sha1); transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1"); transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1"); transport_fetch_refs(transport, ref); fetch_if_missing = original_fetch_if_missing; } + +void fetch_object(const char *remote_name, const unsigned char *sha1) +{ + struct ref *ref = alloc_ref(sha1_to_hex(sha1)); + hashcpy(ref->old_oid.hash, sha1); + fetch_refs(remote_name, ref); +} + +void fetch_objects(const char *remote_name, const struct oid_array *to_fetch) +{ + struct ref *ref = NULL; + int i; + + for (i = 0; i < to_fetch->nr; i++) { + struct ref *new_ref = alloc_ref(oid_to_hex(&to_fetch->oid[i])); + oidcpy(&new_ref->old_oid, &to_fetch->oid[i]); + new_ref->next = ref; + ref = new_ref; + } + fetch_refs(remote_name, ref); +} diff --git a/fetch-object.h b/fetch-object.h index f371300..4b269d0 100644 --- a/fetch-object.h +++ b/fetch-object.h @@ -1,6 +1,11 @@ #ifndef FETCH_OBJECT_H #define FETCH_OBJECT_H +#include "sha1-array.h" + extern void fetch_object(const char *remote_name, const unsigned char *sha1); +extern void fetch_objects(const char *remote_name, + const struct oid_array *to_fetch); + #endif diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 6d37c6d..13610b7 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -611,6 +611,58 @@ test_expect_success 'partial clone: warn if server does not support object filte test_i18ngrep "filtering not recognized by server" err ' +test_expect_success 'batch missing blob request during checkout' ' + rm -rf server client && + + test_create_repo server && + echo a >server/a && + echo b >server/b && + git -C server add a b && + + git -C server commit -m x && + echo aa >server/a && + echo bb >server/b && + git -C server add a b && + git -C server commit -m x && + + test_config -C server uploadpack.allowfilter 1 && + test_config -C server uploadpack.allowanysha1inwant 1 && + + git clone --filter=blob:limit=0 "file://$(pwd)/server" client && + + # Ensure that there is only one negotiation by checking that there is + # only "done" line sent. ("done" marks the end of negotiation.) + GIT_TRACE_PACKET="$(pwd)/trace" git -C client checkout HEAD^ && + grep "git> done" trace >done_lines && + test_line_count = 1 done_lines +' + +test_expect_success 'batch missing blob request does not inadvertently try to fetch gitlinks' ' + rm -rf server client && + + test_create_repo repo_for_submodule && + test_commit -C repo_for_submodule x && + + test_create_repo server && + echo a >server/a && + echo b >server/b && + git -C server add a b && + git -C server commit -m x && + + echo aa >server/a && + echo bb >server/b && + # Also add a gitlink pointing to an arbitrary repository + git -C server submodule add "$(pwd)/repo_for_submodule" c && + git -C server add a b c && +
[PATCH v7 11/16] clone: partial clone
From: Jonathan Tan Signed-off-by: Jonathan Tan Signed-off-by: Jeff Hostetler --- builtin/clone.c | 22 -- t/t5601-clone.sh | 49 + 2 files changed, 69 insertions(+), 2 deletions(-) diff --git a/builtin/clone.c b/builtin/clone.c index dbddd98..f519bd4 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -26,6 +26,7 @@ #include "run-command.h" #include "connected.h" #include "packfile.h" +#include "list-objects-filter-options.h" /* * Overall FIXMEs: @@ -60,6 +61,7 @@ static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP; static int option_dissociate; static int max_jobs = -1; static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP; +static struct list_objects_filter_options filter_options; static int recurse_submodules_cb(const struct option *opt, const char *arg, int unset) @@ -135,6 +137,7 @@ static struct option builtin_clone_options[] = { TRANSPORT_FAMILY_IPV4), OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"), TRANSPORT_FAMILY_IPV6), + OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options), OPT_END() }; @@ -886,6 +889,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix) struct refspec *refspec; const char *fetch_pattern; + fetch_if_missing = 0; + packet_trace_identity("clone"); argc = parse_options(argc, argv, prefix, builtin_clone_options, builtin_clone_usage, 0); @@ -1073,6 +1078,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix) warning(_("--shallow-since is ignored in local clones; use file:// instead.")); if (option_not.nr) warning(_("--shallow-exclude is ignored in local clones; use file:// instead.")); + if (filter_options.choice) + warning(_("--filter is ignored in local clones; use file:// instead.")); if (!access(mkpath("%s/shallow", path), F_OK)) { if (option_local > 0) warning(_("source repository is shallow, ignoring --local")); @@ -1104,7 +,13 @@ int cmd_clone(int argc, const char **argv, const char *prefix) transport_set_option(transport, TRANS_OPT_UPLOADPACK, option_upload_pack); - if (transport->smart_options && !deepen) + if (filter_options.choice) { + transport_set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER, +filter_options.filter_spec); + transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1"); + } + + if (transport->smart_options && !deepen && !filter_options.choice) transport->smart_options->check_self_contained_and_connected = 1; refs = transport_get_remote_refs(transport); @@ -1164,13 +1177,17 @@ int cmd_clone(int argc, const char **argv, const char *prefix) write_refspec_config(src_ref_prefix, our_head_points_at, remote_head_points_at, &branch_top); + if (filter_options.choice) + partial_clone_register("origin", &filter_options); + if (is_local) clone_local(path, git_dir); else if (refs && complete_refs_before_fetch) transport_fetch_refs(transport, mapped_refs); update_remote_refs(refs, mapped_refs, remote_head_points_at, - branch_top.buf, reflog_msg.buf, transport, !is_local); + branch_top.buf, reflog_msg.buf, transport, + !is_local && !filter_options.choice); update_head(our_head_points_at, remote_head, reflog_msg.buf); @@ -1191,6 +1208,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix) } junk_mode = JUNK_LEAVE_REPO; + fetch_if_missing = 1; err = checkout(submodule_progress); strbuf_release(&reflog_msg); diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 9c56f77..6d37c6d 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -571,4 +571,53 @@ test_expect_success 'GIT_TRACE_PACKFILE produces a usable pack' ' git -C replay.git index-pack -v --stdin err && + + test_i18ngrep "filtering not recognized by server" err +' + +. "$TEST_DIRECTORY"/lib-httpd.sh +start_httpd + +test_expect_success 'partial clone using HTTP' ' + partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" "$HTTPD_URL/smart/server" +' + +stop_httpd + test_done -- 2.9.3
[PATCH v7 15/16] fetch: inherit filter-spec from partial clone
From: Jeff Hostetler Teach (partial) fetch to inherit the filter-spec used by the partial clone. Extend --no-filter to override this inheritance. Signed-off-by: Jeff Hostetler --- builtin/fetch-pack.c | 2 +- builtin/fetch.c | 56 --- builtin/rev-list.c| 2 +- list-objects-filter-options.c | 2 +- list-objects-filter-options.h | 12 ++ t/t5616-partial-clone.sh | 22 - 6 files changed, 89 insertions(+), 7 deletions(-) diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index cbf5035..a7bc136 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -158,7 +158,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) continue; } if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) { - list_objects_filter_release(&args.filter_options); + list_objects_filter_set_no_filter(&args.filter_options); continue; } usage(fetch_pack_usage); diff --git a/builtin/fetch.c b/builtin/fetch.c index 14aab71..79c866c 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1275,6 +1275,56 @@ static int fetch_multiple(struct string_list *list) return result; } +/* + * Fetching from the promisor remote should use the given filter-spec + * or inherit the default filter-spec from the config. + */ +static inline void fetch_one_setup_partial(struct remote *remote) +{ + /* +* Explicit --no-filter argument overrides everything, regardless +* of any prior partial clones and fetches. +*/ + if (filter_options.no_filter) + return; + + /* +* If no prior partial clone/fetch and the current fetch DID NOT +* request a partial-fetch, do a normal fetch. +*/ + if (!repository_format_partial_clone && !filter_options.choice) + return; + + /* +* If this is the FIRST partial-fetch request, we enable partial +* on this repo and remember the given filter-spec as the default +* for subsequent fetches to this remote. +*/ + if (!repository_format_partial_clone && filter_options.choice) { + partial_clone_register(remote->name, &filter_options); + return; + } + + /* +* We are currently limited to only ONE promisor remote and only +* allow partial-fetches from the promisor remote. +*/ + if (strcmp(remote->name, repository_format_partial_clone)) { + if (filter_options.choice) + die(_("--filter can only be used with the remote configured in core.partialClone")); + return; + } + + /* +* Do a partial-fetch from the promisor remote using either the +* explicitly given filter-spec or inherit the filter-spec from +* the config. +*/ + if (!filter_options.choice) + partial_clone_get_default_filter_spec(&filter_options); + return; +} + static int fetch_one(struct remote *remote, int argc, const char **argv) { static const char **refs = NULL; @@ -1404,13 +1454,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) } if (remote) { - if (filter_options.choice && - strcmp(remote->name, repository_format_partial_clone)) - die(_("--filter can only be used with the remote configured in core.partialClone")); + if (filter_options.choice || repository_format_partial_clone) + fetch_one_setup_partial(remote); result = fetch_one(remote, argc, argv); } else { if (filter_options.choice) die(_("--filter can only be used with the remote configured in core.partialClone")); + /* TODO should this also die if we have a previous partial-clone? */ result = fetch_multiple(&list); } diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 48f922d..8503dea 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -460,7 +460,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) continue; } if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) { - list_objects_filter_release(&filter_options); + list_objects_filter_set_no_filter(&filter_options); continue; } if (!strcmp(arg, "--filter-print-omitted")) { diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index 5c47e2b..6a3cc98 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -94,7 +94,7 @@ int opt_parse_list_objects_filter(const struct option *opt, struct list_o
[PATCH v7 16/16] t5616: test bulk prefetch after partial fetch
From: Jeff Hostetler Add test to t5616 to bulk fetch missing objects following a partial fetch. A technique like this could be used in a pre-command hook for example. Signed-off-by: Jeff Hostetler --- t/t5616-partial-clone.sh | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh index 433e07e..29d8631 100755 --- a/t/t5616-partial-clone.sh +++ b/t/t5616-partial-clone.sh @@ -105,7 +105,7 @@ test_expect_success 'push new commits to server for file.2.txt' ' git -C src push -u srv master ' -# Do FULL fetch by disabling filter-spec using --no-filter. +# Do FULL fetch by disabling inherited filter-spec using --no-filter. # Verify we have all the new blobs. test_expect_success 'override inherited filter-spec using --no-filter' ' git -C pc1 fetch --no-filter origin && @@ -113,4 +113,34 @@ test_expect_success 'override inherited filter-spec using --no-filter' ' test_line_count = 0 observed ' +# create new commits in "src" repo to establish a history on file.3.txt +# and push to "srv.bare". +test_expect_success 'push new commits to server for file.3.txt' ' + for x in a b c d e f + do + echo "Mod file.3.txt $x" >>src/file.3.txt + git -C src add file.3.txt + git -C src commit -m "mod $x" + done && + git -C src push -u srv master +' + +# Do a partial fetch and then try to manually fetch the missing objects. +# This can be used as the basis of a pre-command hook to bulk fetch objects +# perhaps combined with a command in dry-run mode. +test_expect_success 'manual prefetch of missing objects' ' + git -C pc1 fetch --filter=blob:none origin && + git -C pc1 rev-list master..origin/master --quiet --objects --missing=print \ + | awk -f print_1.awk \ + | sed "s/?//" \ + | sort >observed.oids && + test_line_count = 6 observed.oids && + git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" observed.oids && + test_line_count = 0 observed.oids +' + test_done -- 2.9.3
[PATCH v7 10/16] partial-clone: define partial clone settings in config
From: Jeff Hostetler Create get and set routines for "partial clone" config settings. These will be used in a future commit by clone and fetch to remember the promisor remote and the default filter-spec. Signed-off-by: Jeff Hostetler --- cache.h | 1 + config.c | 5 +++ environment.c | 1 + list-objects-filter-options.c | 90 +++ list-objects-filter-options.h | 6 +++ 5 files changed, 88 insertions(+), 15 deletions(-) diff --git a/cache.h b/cache.h index 6980072..bccc510 100644 --- a/cache.h +++ b/cache.h @@ -861,6 +861,7 @@ extern int grafts_replace_parents; #define GIT_REPO_VERSION_READ 1 extern int repository_format_precious_objects; extern char *repository_format_partial_clone; +extern const char *core_partial_clone_filter_default; struct repository_format { int version; diff --git a/config.c b/config.c index adb7d7a..adeee04 100644 --- a/config.c +++ b/config.c @@ -1241,6 +1241,11 @@ static int git_default_core_config(const char *var, const char *value) return 0; } + if (!strcmp(var, "core.partialclonefilter")) { + return git_config_string(&core_partial_clone_filter_default, +var, value); + } + /* Add other config variables here and to Documentation/config.txt. */ return 0; } diff --git a/environment.c b/environment.c index e52aab3..7537565 100644 --- a/environment.c +++ b/environment.c @@ -28,6 +28,7 @@ int warn_on_object_refname_ambiguity = 1; int ref_paranoia = -1; int repository_format_precious_objects; char *repository_format_partial_clone; +const char *core_partial_clone_filter_default; const char *git_commit_encoding; const char *git_log_output_encoding; const char *apply_default_whitespace; diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index 4c5b34e..5c47e2b 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -21,29 +21,36 @@ * subordinate commands when necessary. We also "intern" the arg for * the convenience of the current command. */ -int parse_list_objects_filter(struct list_objects_filter_options *filter_options, - const char *arg) +static int gently_parse_list_objects_filter( + struct list_objects_filter_options *filter_options, + const char *arg, + struct strbuf *errbuf) { const char *v0; - if (filter_options->choice) - die(_("multiple object filter types cannot be combined")); + if (filter_options->choice) { + if (errbuf) { + strbuf_init(errbuf, 0); + strbuf_addstr( + errbuf, + _("multiple filter-specs cannot be combined")); + } + return 1; + } filter_options->filter_spec = strdup(arg); if (!strcmp(arg, "blob:none")) { filter_options->choice = LOFC_BLOB_NONE; return 0; - } - if (skip_prefix(arg, "blob:limit=", &v0)) { - if (!git_parse_ulong(v0, &filter_options->blob_limit_value)) - die(_("invalid filter-spec expression '%s'"), arg); - filter_options->choice = LOFC_BLOB_LIMIT; - return 0; - } + } else if (skip_prefix(arg, "blob:limit=", &v0)) { + if (git_parse_ulong(v0, &filter_options->blob_limit_value)) { + filter_options->choice = LOFC_BLOB_LIMIT; + return 0; + } - if (skip_prefix(arg, "sparse:oid=", &v0)) { + } else if (skip_prefix(arg, "sparse:oid=", &v0)) { struct object_context oc; struct object_id sparse_oid; @@ -57,15 +64,27 @@ int parse_list_objects_filter(struct list_objects_filter_options *filter_options filter_options->sparse_oid_value = oiddup(&sparse_oid); filter_options->choice = LOFC_SPARSE_OID; return 0; - } - if (skip_prefix(arg, "sparse:path=", &v0)) { + } else if (skip_prefix(arg, "sparse:path=", &v0)) { filter_options->choice = LOFC_SPARSE_PATH; filter_options->sparse_path_value = strdup(v0); return 0; } - die(_("invalid filter-spec expression '%s'"), arg); + if (errbuf) { + strbuf_init(errbuf, 0); + strbuf_addf(errbuf, "invalid filter-spec '%s'", arg); + } + memset(filter_options, 0, sizeof(*filter_options)); + return 1; +} + +int parse_list_objects_filter(struct list_objects_filter_options *filter_options, + const char *arg) +{ + struct strbuf buf = STRBUF_INIT; + if (gently_parse_list_objects_filter(filter_options, arg, &buf)) + di
[PATCH v7 14/16] t5616: end-to-end tests for partial clone
From: Jeff Hostetler Additional end-to-end tests for partial clone. Signed-off-by: Jeff Hostetler --- t/t5616-partial-clone.sh | 96 1 file changed, 96 insertions(+) create mode 100755 t/t5616-partial-clone.sh diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh new file mode 100755 index 000..3b8cc0b --- /dev/null +++ b/t/t5616-partial-clone.sh @@ -0,0 +1,96 @@ +#!/bin/sh + +test_description='git partial clone' + +. ./test-lib.sh + +# create a normal "src" repo where we can later create new commits. +# expect_1.oids will contain a list of the OIDs of all blobs. +test_expect_success 'setup normal src repo' ' + echo "{print \$1}" >print_1.awk && + echo "{print \$2}" >print_2.awk && + + git init src && + for n in 1 2 3 4 + do + echo "This is file: $n" > src/file.$n.txt + git -C src add file.$n.txt + git -C src commit -m "file $n" + git -C src ls-files -s file.$n.txt >>temp + done && + awk -f print_2.awk expect_1.oids && + test_line_count = 4 expect_1.oids +' + +# bare clone "src" giving "srv.bare" for use as our server. +test_expect_success 'setup bare clone for server' ' + git clone --bare "file://$(pwd)/src" srv.bare && + git -C srv.bare config --local uploadpack.allowfilter 1 && + git -C srv.bare config --local uploadpack.allowanysha1inwant 1 +' + +# do basic partial clone from "srv.bare" +# confirm we are missing all of the known blobs. +# confirm partial clone was registered in the local config. +test_expect_success 'do partial clone 1' ' + git clone --no-checkout --filter=blob:none "file://$(pwd)/srv.bare" pc1 && + git -C pc1 rev-list HEAD --quiet --objects --missing=print \ + | awk -f print_1.awk \ + | sed "s/?//" \ + | sort >observed.oids && + test_cmp expect_1.oids observed.oids && + test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" && + test "$(git -C pc1 config --local extensions.partialclone)" = "origin" && + test "$(git -C pc1 config --local core.partialclonefilter)" = "blob:none" +' + +# checkout master to force dynamic object fetch of blobs at HEAD. +test_expect_success 'verify checkout with dynamic object fetch' ' + git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed && + test_line_count = 4 observed && + git -C pc1 checkout master && + git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed && + test_line_count = 0 observed +' + +# create new commits in "src" repo to establish a blame history on file.1.txt +# and push to "srv.bare". +test_expect_success 'push new commits to server' ' + git -C src remote add srv "file://$(pwd)/srv.bare" && + for x in a b c d e + do + echo "Mod $x" >>src/file.1.txt + git -C src add file.1.txt + git -C src commit -m "mod $x" + done && + git -C src blame master -- file.1.txt >expect.blame && + git -C src push -u srv master +' + +# (partial) fetch in the partial clone repo from the promisor remote. +# verify that fetch inherited the filter-spec from the config and DOES NOT +# have the new blobs. +test_expect_success 'partial fetch inherits filter settings' ' + git -C pc1 fetch origin && + git -C pc1 rev-list master..origin/master --quiet --objects --missing=print >observed && + test_line_count = 5 observed +' + +# force dynamic object fetch using diff. +# we should only get 1 new blob (for the file in origin/master). +test_expect_success 'verify diff causes dynamic object fetch' ' + git -C pc1 diff master..origin/master -- file.1.txt && + git -C pc1 rev-list master..origin/master --quiet --objects --missing=print >observed && + test_line_count = 4 observed +' + +# force full dynamic object fetch of the file's history using blame. +# we should get the intermediate blobs for the file. +test_expect_success 'verify blame causes dynamic object fetch' ' + git -C pc1 blame origin/master -- file.1.txt >observed.blame && + test_cmp expect.blame observed.blame && + git -C pc1 rev-list master..origin/master --quiet --objects --missing=print >observed && + test_line_count = 0 observed +' + +test_done -- 2.9.3
[PATCH v7 06/16] fetch-pack: add --no-filter
From: Jeff Hostetler Fixup fetch-pack to accept --no-filter to be consistent with rev-list and pack-objects. Signed-off-by: Jeff Hostetler --- builtin/fetch-pack.c | 4 1 file changed, 4 insertions(+) diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index 7957807..cbf5035 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -157,6 +157,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) parse_list_objects_filter(&args.filter_options, arg); continue; } + if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) { + list_objects_filter_release(&args.filter_options); + continue; + } usage(fetch_pack_usage); } if (deepen_not.nr) -- 2.9.3
[PATCH v7 08/16] fetch: refactor calculation of remote list
From: Jonathan Tan Separate out the calculation of remotes to be fetched from and the actual fetching. This will allow us to include an additional step before the actual fetching in a subsequent commit. Signed-off-by: Jonathan Tan --- builtin/fetch.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index 225c734..1b1f039 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1322,7 +1322,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) { int i; struct string_list list = STRING_LIST_INIT_DUP; - struct remote *remote; + struct remote *remote = NULL; int result = 0; struct argv_array argv_gc_auto = ARGV_ARRAY_INIT; @@ -1367,17 +1367,14 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) else if (argc > 1) die(_("fetch --all does not make sense with refspecs")); (void) for_each_remote(get_one_remote_for_fetch, &list); - result = fetch_multiple(&list); } else if (argc == 0) { /* No arguments -- use default remote */ remote = remote_get(NULL); - result = fetch_one(remote, argc, argv); } else if (multiple) { /* All arguments are assumed to be remotes or groups */ for (i = 0; i < argc; i++) if (!add_remote_or_group(argv[i], &list)) die(_("No such remote or remote group: %s"), argv[i]); - result = fetch_multiple(&list); } else { /* Single remote or group */ (void) add_remote_or_group(argv[0], &list); @@ -1385,14 +1382,19 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) /* More than one remote */ if (argc > 1) die(_("Fetching a group and specifying refspecs does not make sense")); - result = fetch_multiple(&list); } else { /* Zero or one remotes */ remote = remote_get(argv[0]); - result = fetch_one(remote, argc-1, argv+1); + argc--; + argv++; } } + if (remote) + result = fetch_one(remote, argc, argv); + else + result = fetch_multiple(&list); + if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) { struct argv_array options = ARGV_ARRAY_INIT; -- 2.9.3
[PATCH v7 05/16] fetch-pack, index-pack, transport: partial clone
From: Jeff Hostetler Signed-off-by: Jeff Hostetler --- builtin/fetch-pack.c | 4 fetch-pack.c | 13 + fetch-pack.h | 2 ++ transport-helper.c | 5 + transport.c | 4 transport.h | 5 + 6 files changed, 33 insertions(+) diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index 15eeed7..7957807 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -153,6 +153,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) args.no_dependents = 1; continue; } + if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) { + parse_list_objects_filter(&args.filter_options, arg); + continue; + } usage(fetch_pack_usage); } if (deepen_not.nr) diff --git a/fetch-pack.c b/fetch-pack.c index 0798e0b..3c5f064 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -29,6 +29,7 @@ static int deepen_not_ok; static int fetch_fsck_objects = -1; static int transfer_fsck_objects = -1; static int agent_supported; +static int server_supports_filtering; static struct lock_file shallow_lock; static const char *alternate_shallow_file; @@ -379,6 +380,8 @@ static int find_common(struct fetch_pack_args *args, if (deepen_not_ok) strbuf_addstr(&c, " deepen-not"); if (agent_supported)strbuf_addf(&c, " agent=%s", git_user_agent_sanitized()); + if (args->filter_options.choice) + strbuf_addstr(&c, " filter"); packet_buf_write(&req_buf, "want %s%s\n", remote_hex, c.buf); strbuf_release(&c); } else @@ -407,6 +410,9 @@ static int find_common(struct fetch_pack_args *args, packet_buf_write(&req_buf, "deepen-not %s", s->string); } } + if (server_supports_filtering && args->filter_options.choice) + packet_buf_write(&req_buf, "filter %s", +args->filter_options.filter_spec); packet_buf_flush(&req_buf); state_len = req_buf.len; @@ -969,6 +975,13 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args, else prefer_ofs_delta = 0; + if (server_supports("filter")) { + server_supports_filtering = 1; + print_verbose(args, _("Server supports filter")); + } else if (args->filter_options.choice) { + warning("filtering not recognized by server, ignoring"); + } + if ((agent_feature = server_feature_value("agent", &agent_len))) { agent_supported = 1; if (agent_len) diff --git a/fetch-pack.h b/fetch-pack.h index aeac152..3e224a1 100644 --- a/fetch-pack.h +++ b/fetch-pack.h @@ -3,6 +3,7 @@ #include "string-list.h" #include "run-command.h" +#include "list-objects-filter-options.h" struct oid_array; @@ -12,6 +13,7 @@ struct fetch_pack_args { int depth; const char *deepen_since; const struct string_list *deepen_not; + struct list_objects_filter_options filter_options; unsigned deepen_relative:1; unsigned quiet:1; unsigned keep_pack:1; diff --git a/transport-helper.c b/transport-helper.c index c948d52..0650ca0 100644 --- a/transport-helper.c +++ b/transport-helper.c @@ -671,6 +671,11 @@ static int fetch(struct transport *transport, if (data->transport_options.update_shallow) set_helper_option(transport, "update-shallow", "true"); + if (data->transport_options.filter_options.choice) + set_helper_option( + transport, "filter", + data->transport_options.filter_options.filter_spec); + if (data->fetch) return fetch_with_fetch(transport, nr_heads, to_fetch); diff --git a/transport.c b/transport.c index f2fbc6f..cce50f5 100644 --- a/transport.c +++ b/transport.c @@ -166,6 +166,9 @@ static int set_git_option(struct git_transport_options *opts, } else if (!strcmp(name, TRANS_OPT_NO_DEPENDENTS)) { opts->no_dependents = !!value; return 0; + } else if (!strcmp(name, TRANS_OPT_LIST_OBJECTS_FILTER)) { + parse_list_objects_filter(&opts->filter_options, value); + return 0; } return 1; } @@ -236,6 +239,7 @@ static int fetch_refs_via_pack(struct transport *transport, args.update_shallow = data->options.update_shallow; args.from_promisor = data->options.from_promisor; args.no_dependents = data->options.no_dependents; + args.filter_options = data->options.filter_options; if (!data->got_remote_heads) { connect_setup(transport,
[PATCH v7 07/16] fetch-pack: test support excluding large blobs
From: Jonathan Tan Created tests to verify fetch-pack and upload-pack support for excluding large blobs using --filter=blobs:limit= parameter. Signed-off-by: Jonathan Tan Signed-off-by: Jeff Hostetler --- t/t5500-fetch-pack.sh | 27 +++ upload-pack.c | 13 + 2 files changed, 36 insertions(+), 4 deletions(-) diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index 80a1a32..c57916b 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -755,4 +755,31 @@ test_expect_success 'fetching deepen' ' ) ' +test_expect_success 'filtering by size' ' + rm -rf server client && + test_create_repo server && + test_commit -C server one && + test_config -C server uploadpack.allowfilter 1 && + + test_create_repo client && + git -C client fetch-pack --filter=blob:limit=0 ../server HEAD && + + # Ensure that object is not inadvertently fetched + test_must_fail git -C client cat-file -e $(git hash-object server/one.t) +' + +test_expect_success 'filtering by size has no effect if support for it is not advertised' ' + rm -rf server client && + test_create_repo server && + test_commit -C server one && + + test_create_repo client && + git -C client fetch-pack --filter=blob:limit=0 ../server HEAD 2> err && + + # Ensure that object is fetched + git -C client cat-file -e $(git hash-object server/one.t) && + + test_i18ngrep "filtering not recognized by server" err +' + test_done diff --git a/upload-pack.c b/upload-pack.c index e6d38b9..15b6605 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -139,10 +139,15 @@ static void create_pack_file(void) if (use_include_tag) argv_array_push(&pack_objects.args, "--include-tag"); if (filter_options.filter_spec) { - struct strbuf buf = STRBUF_INIT; - sq_quote_buf(&buf, filter_options.filter_spec); - argv_array_pushf(&pack_objects.args, "--filter=%s", buf.buf); - strbuf_release(&buf); + if (pack_objects.use_shell) { + struct strbuf buf = STRBUF_INIT; + sq_quote_buf(&buf, filter_options.filter_spec); + argv_array_pushf(&pack_objects.args, "--filter=%s", buf.buf); + strbuf_release(&buf); + } else { + argv_array_pushf(&pack_objects.args, "--filter=%s", +filter_options.filter_spec); + } } pack_objects.in = -1; -- 2.9.3
[PATCH v7 01/16] sha1_file: support lazily fetching missing objects
From: Jonathan Tan Teach sha1_file to fetch objects from the remote configured in extensions.partialclone whenever an object is requested but missing. The fetching of objects can be suppressed through a global variable. This is used by fsck and index-pack. However, by default, such fetching is not suppressed. This is meant as a temporary measure to ensure that all Git commands work in such a situation. Future patches will update some commands to either tolerate missing objects (without fetching them) or be more efficient in fetching them. In order to determine the code changes in sha1_file.c necessary, I investigated the following: (1) functions in sha1_file.c that take in a hash, without the user regarding how the object is stored (loose or packed) (2) functions in packfile.c (because I need to check callers that know about the loose/packed distinction and operate on both differently, and ensure that they can handle the concept of objects that are neither loose nor packed) (1) is handled by the modification to sha1_object_info_extended(). For (2), I looked at for_each_packed_object and others. For for_each_packed_object, the callers either already work or are fixed in this patch: - reachable - only to find recent objects - builtin/fsck - already knows about missing objects - builtin/cat-file - warning message added in this commit Callers of the other functions do not need to be changed: - parse_pack_index - http - indirectly from http_get_info_packs - find_pack_entry_one - this searches a single pack that is provided as an argument; the caller already knows (through other means) that the sought object is in a specific pack - find_sha1_pack - fast-import - appears to be an optimization to not store a file if it is already in a pack - http-walker - to search through a struct alt_base - http-push - to search through remote packs - has_sha1_pack - builtin/fsck - already knows about promisor objects - builtin/count-objects - informational purposes only (check if loose object is also packed) - builtin/prune-packed - check if object to be pruned is packed (if not, don't prune it) - revision - used to exclude packed objects if requested by user - diff - just for optimization Signed-off-by: Jonathan Tan --- builtin/cat-file.c | 2 ++ builtin/fetch-pack.c | 2 ++ builtin/fsck.c | 3 +++ builtin/index-pack.c | 6 ++ cache.h | 8 fetch-object.c | 3 +++ sha1_file.c | 32 ++ t/t0410-partial-clone.sh | 51 8 files changed, 99 insertions(+), 8 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index f5fa4fd..cf9ea5c 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -475,6 +475,8 @@ static int batch_objects(struct batch_options *opt) for_each_loose_object(batch_loose_object, &sa, 0); for_each_packed_object(batch_packed_object, &sa, 0); + if (repository_format_partial_clone) + warning("This repository has extensions.partialClone set. Some objects may not be loaded."); cb.opt = opt; cb.expand = &data; diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index 02abe72..15eeed7 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) struct oid_array shallow = OID_ARRAY_INIT; struct string_list deepen_not = STRING_LIST_INIT_DUP; + fetch_if_missing = 0; + packet_trace_identity("fetch-pack"); memset(&args, 0, sizeof(args)); diff --git a/builtin/fsck.c b/builtin/fsck.c index 578a7c8..3b76c0e 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -678,6 +678,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) int i; struct alternate_object_database *alt; + /* fsck knows how to handle missing promisor objects */ + fetch_if_missing = 0; + errors_found = 0; check_replace_refs = 0; diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 24c2f05..a0a35e6 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) unsigned foreign_nr = 1;/* zero is a "good" value, assume bad */ int report_end_of_input = 0; + /* +* index-pack never needs to fetch missing objects, since it only +* accesses the repo to do hash collision checks +*/ + fetch_if_missing = 0; + if (argc == 2 && !strcmp(argv[1], "-h")) usage(index_pack_usage); diff --git a/cache.h b/cache.h index c76f2e9..6980072 100644 --- a/cache.h +++ b/cache.h @@ -1727,6 +1727,14 @@ struct object_info { #define OBJECT_INFO
[PATCH v7 02/16] rev-list: support termination at promisor objects
From: Jonathan Tan Teach rev-list to support termination of an object traversal at any object from a promisor remote (whether one that the local repo also has, or one that the local repo knows about because it has another promisor object that references it). This will be used subsequently in gc and in the connectivity check used by fetch. For efficiency, if an object is referenced by a promisor object, and is in the local repo only as a non-promisor object, object traversal will not stop there. This is to avoid building the list of promisor object references. (In list-objects.c, the case where obj is NULL in process_blob() and process_tree() do not need to be changed because those happen only when there is a conflict between the expected type and the existing object. If the object doesn't exist, an object will be synthesized, which is fine.) Signed-off-by: Jonathan Tan Signed-off-by: Jeff Hostetler --- Documentation/rev-list-options.txt | 11 builtin/rev-list.c | 71 +++--- list-objects.c | 29 ++- object.c | 2 +- revision.c | 33 +++- revision.h | 5 +- t/t0410-partial-clone.sh | 101 + 7 files changed, 240 insertions(+), 12 deletions(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 8d8b7f4..0ce8ccd 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -745,10 +745,21 @@ The form '--missing=allow-any' will allow object traversal to continue if a missing object is encountered. Missing objects will silently be omitted from the results. + +The form '--missing=allow-promisor' is like 'allow-any', but will only +allow object traversal to continue for EXPECTED promisor missing objects. +Unexpected missing objects will raise an error. ++ The form '--missing=print' is like 'allow-any', but will also print a list of the missing objects. Object IDs are prefixed with a ``?'' character. endif::git-rev-list[] +--exclude-promisor-objects:: + (For internal use only.) Prefilter object traversal at + promisor boundary. This is used with partial clone. This is + stronger than `--missing=allow-promisor` because it limits the + traversal, rather than just silencing errors about missing + objects. + --no-walk[=(sorted|unsorted)]:: Only show the given commits, but do not traverse their ancestors. This has no effect if a range is specified. If the argument diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 547a3e0..48f922d 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -15,6 +15,7 @@ #include "progress.h" #include "reflog-walk.h" #include "oidset.h" +#include "packfile.h" static const char rev_list_usage[] = "git rev-list [OPTION] ... [ -- paths... ]\n" @@ -67,6 +68,7 @@ enum missing_action { MA_ERROR = 0,/* fail if any missing objects are encountered */ MA_ALLOW_ANY,/* silently allow ALL missing objects */ MA_PRINT,/* print ALL missing objects in special section */ + MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */ }; static enum missing_action arg_missing_action; @@ -197,6 +199,12 @@ static void finish_commit(struct commit *commit, void *data) static inline void finish_object__ma(struct object *obj) { + /* +* Whether or not we try to dynamically fetch missing objects +* from the server, we currently DO NOT have the object. We +* can either print, allow (ignore), or conditionally allow +* (ignore) them. +*/ switch (arg_missing_action) { case MA_ERROR: die("missing blob object '%s'", oid_to_hex(&obj->oid)); @@ -209,25 +217,36 @@ static inline void finish_object__ma(struct object *obj) oidset_insert(&missing_objects, &obj->oid); return; + case MA_ALLOW_PROMISOR: + if (is_promisor_object(&obj->oid)) + return; + die("unexpected missing blob object '%s'", + oid_to_hex(&obj->oid)); + return; + default: BUG("unhandled missing_action"); return; } } -static void finish_object(struct object *obj, const char *name, void *cb_data) +static int finish_object(struct object *obj, const char *name, void *cb_data) { struct rev_list_info *info = cb_data; - if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) + if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) { finish_object__ma(obj); + return 1; + } if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT) parse_object(&obj->oid); + return 0; } static void show_object(stru
[PATCH v7 04/16] upload-pack: add object filtering for partial clone
From: Jeff Hostetler Teach upload-pack to negotiate object filtering over the protocol and to send filter parameters to pack-objects. This is intended for partial clone and fetch. The idea to make upload-pack configurable using uploadpack.allowFilter comes from Jonathan Tan's work in [1]. [1] https://public-inbox.org/git/f211093280b422c32cc1b7034130072f35c5ed51.1506714999.git.jonathanta...@google.com/ Signed-off-by: Jeff Hostetler --- Documentation/config.txt | 4 Documentation/technical/pack-protocol.txt | 8 +++ Documentation/technical/protocol-capabilities.txt | 8 +++ upload-pack.c | 26 ++- 4 files changed, 45 insertions(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 1ac0ae6..e528210 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -3268,6 +3268,10 @@ uploadpack.packObjectsHook:: was run. I.e., `upload-pack` will feed input intended for `pack-objects` to the hook, and expects a completed packfile on stdout. + +uploadpack.allowFilter:: + If this option is set, `upload-pack` will advertise partial + clone and partial fetch object filtering. + Note that this configuration variable is ignored if it is seen in the repository-level config (this is a safety measure against fetching from diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt index ed1eae8..a43a113 100644 --- a/Documentation/technical/pack-protocol.txt +++ b/Documentation/technical/pack-protocol.txt @@ -212,6 +212,7 @@ out of what the server said it could do with the first 'want' line. upload-request= want-list *shallow-line *1depth-request + [filter-request] flush-pkt want-list = first-want @@ -227,6 +228,8 @@ out of what the server said it could do with the first 'want' line. additional-want = PKT-LINE("want" SP obj-id) depth = 1*DIGIT + + filter-request= PKT-LINE("filter" SP filter-spec) Clients MUST send all the obj-ids it wants from the reference @@ -249,6 +252,11 @@ complete those commits. Commits whose parents are not received as a result are defined as shallow and marked as such in the server. This information is sent back to the client in the next step. +The client can optionally request that pack-objects omit various +objects from the packfile using one of several filtering techniques. +These are intended for use with partial clone and partial fetch +operations. See `rev-list` for possible "filter-spec" values. + Once all the 'want's and 'shallow's (and optional 'deepen') are transferred, clients MUST send a flush-pkt, to tell the server side that it is done sending the list. diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt index 26dcc6f..332d209 100644 --- a/Documentation/technical/protocol-capabilities.txt +++ b/Documentation/technical/protocol-capabilities.txt @@ -309,3 +309,11 @@ to accept a signed push certificate, and asks the to be included in the push certificate. A send-pack client MUST NOT send a push-cert packet unless the receive-pack server advertises this capability. + +filter +-- + +If the upload-pack server advertises the 'filter' capability, +fetch-pack may send "filter" commands to request a partial clone +or partial fetch and request that the server omit various objects +from the packfile. diff --git a/upload-pack.c b/upload-pack.c index e25f725..e6d38b9 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -10,6 +10,8 @@ #include "diff.h" #include "revision.h" #include "list-objects.h" +#include "list-objects-filter.h" +#include "list-objects-filter-options.h" #include "run-command.h" #include "connect.h" #include "sigchain.h" @@ -18,6 +20,7 @@ #include "parse-options.h" #include "argv-array.h" #include "prio-queue.h" +#include "quote.h" static const char * const upload_pack_usage[] = { N_("git upload-pack [] "), @@ -64,6 +67,10 @@ static int advertise_refs; static int stateless_rpc; static const char *pack_objects_hook; +static int filter_capability_requested; +static int filter_advertise; +static struct list_objects_filter_options filter_options; + static void reset_timeout(void) { alarm(timeout); @@ -131,6 +138,12 @@ static void create_pack_file(void) argv_array_push(&pack_objects.args, "--delta-base-offset"); if (use_include_tag) argv_array_push(&pack_objects.args, "--include-tag"); + if (filter_options.filter_spec) { + struct strbuf buf = STRBUF_INIT; + sq_quote_buf(&buf, filter_options.filter_spec); + argv_array_pushf(&pack_objects.args, "--filter=%s", buf.buf); + strbuf_release(&buf);
Re: [SCRIPT/RFC 0/3] git-commit --onto-parent (three-way merge, no working tree file changes)
Igor Djordjevic writes: > To get back on track, and regarding what`s already been said, would > having something like this(1) feel useful? > > (1) git commit --onto Are you asking me if _I_ find it useful? It is not a very useful question to ask, as I've taken things that I do not find useful myself. Having said that, I do not see me personally using it. You keep claiming that committing without ever materializing the exact state that is committed in the working tree is a good thing. I do not subscribe to that view. I'd rather do a quick fix-up on top (which ensures that at least the fix-up works in the context of the tip), and then "rebase -i" to move it a more appropriate place in the history (during which I have a chance to ensure that the fix-up works in the context it is intended to apply to). I know that every time I say this, people who prefer to commit things that never existed in the working tree will say "but we'll test it later after we make these commit without having their state in the working tree". But I also know better that "later" often do not come, ever, at least for people like me ;-). The amount of work _required_ to record the fix-up at its final resting place deeper in the history would be larger with "rebase -i" approach, simply because approaches like "commit --onto" and "git post" that throw a new commit deep in the history would not require ever materializing it in the working tree. But because I care about what I am actually committing, and because I am just lazy as any other human (if not more), I'd prefer an apporach that _forces_ me to have a checkout of the exact state that I'd be committing. That would prod me to actually looking at and testing the state after the change in the context it is meant to go.
Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
On 08/12/17 09:34, Jeff King wrote: > On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote: [snip] >> Junio hinted at a different approach of solving this problem, which this >> patch implements. Teach the diff machinery another flag for restricting >> the information to what is shown. For example: >> >> $ ./git log --oneline --blobfind=v2.0.0:Makefile >> b2feb64309 Revert the whole "ask curl-config" topic for now >> 47fbfded53 i18n: only extract comments marked with "TRANSLATORS:" >> >> we observe that the Makefile as shipped with 2.0 was introduced in >> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by >> a different blob. Sorry, this has nothing to do with this specific thread. :( However, the above caught my eye. Commit b2feb64309 does not 'replace the Makefile with a different blob'. That commit was a 'last minute' revert of a topic _prior_ to v2.0.0, which resulted in the _same_ blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20). [I haven't been following this topic, so just ignore me if I have misunderstood what the above was describing! :-D ] ATB, Ramsay Jones
Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax
Jeff King writes: > Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments > with ",", 2017-10-01) switched the syntax of the trailers > placeholder, but forgot to update the documentation in > pretty-formats.txt. > > There's need to mention the old syntax; it was never in a > released version of Git. There's or There's no? > > Signed-off-by: Jeff King > --- > This should go on top of tb/delimit-pretty-trailers-args-with-comma. > > Documentation/pretty-formats.txt | 12 +++- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/Documentation/pretty-formats.txt > b/Documentation/pretty-formats.txt > index d433d50f81..e664c088a5 100644 > --- a/Documentation/pretty-formats.txt > +++ b/Documentation/pretty-formats.txt > @@ -204,11 +204,13 @@ endif::git-rev-list[] >than given and there are spaces on its left, use those spaces > - '%><()', '%><|()': similar to '% <()', '%<|()' >respectively, but padding both sides (i.e. the text is centered) > -- %(trailers): display the trailers of the body as interpreted by > - linkgit:git-interpret-trailers[1]. If the `:only` option is given, > - omit non-trailer lines from the trailer block. If the `:unfold` > - option is given, behave as if interpret-trailer's `--unfold` option > - was given. E.g., `%(trailers:only:unfold)` to do both. > +- %(trailers[:options]): display the trailers of the body as interpreted > + by linkgit:git-interpret-trailers[1]. The `trailers` string may be > + followed by a colon and zero or more comma-separated options. If the > + `only` option is given, omit non-trailer lines from the trailer block. > + If the `unfold` option is given, behave as if interpret-trailer's > + `--unfold` option was given. E.g., `%(trailers:only,unfold)` to do > + both. > > NOTE: Some placeholders may depend on other options given to the > revision traversal engine. For example, the `%g*` reflog options will
Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax
Jeff King writes: > On Fri, Dec 08, 2017 at 03:10:34AM -0500, Eric Sunshine wrote: > >> On Fri, Dec 8, 2017 at 12:16 AM, Jeff King wrote: >> > Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments >> > with ",", 2017-10-01) switched the syntax of the trailers >> > placeholder, but forgot to update the documentation in >> > pretty-formats.txt. >> > >> > There's need to mention the old syntax; it was never in a >> >> I suppose you mean: s/need/no need/ > > Yes, indeed. Thanks. Ah, I probably should switch to 'read-only' mode until I finish my inbox.
[PATCH 0/2] Offer more information in `git version --build-options`'s output
In Git for Windows, we ask users to paste the output of said command into their bug reports, with the idea that this frequently helps identify where the problems are coming from. There are some obvious missing bits of information in said output, though, and this patch series tries to fill the gaps at least a little. Adric Norris (1): git version --build-options: report the build platform, too Johannes Schindelin (1): version --build-options: report commit, too, if possible Makefile | 4 +++- help.c| 4 help.h| 13 + version.c | 1 + version.h | 1 + 5 files changed, 22 insertions(+), 1 deletion(-) base-commit: 95ec6b1b3393eb6e26da40c565520a8db9796e9f Published-As: https://github.com/dscho/git/releases/tag/built-from-commit-v1 Fetch-It-Via: git fetch https://github.com/dscho/git built-from-commit-v1 -- 2.15.1.windows.2
[PATCH 2/2] version --build-options: report commit, too, if possible
In particular when local tags are used (or tags that are pushed to some fork) to build Git, it is very hard to figure out from which particular revision a particular Git executable was built. Let's just report that in our build options. We need to be careful, though, to report when the current commit cannot be determined, e.g. when building from a tarball without any associated Git repository. Signed-off-by: Johannes Schindelin --- Makefile | 4 +++- help.c| 2 ++ version.c | 1 + version.h | 1 + 4 files changed, 7 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index fef9c8d2725..92a0ae3d8e3 100644 --- a/Makefile +++ b/Makefile @@ -1893,7 +1893,9 @@ builtin/help.sp builtin/help.s builtin/help.o: EXTRA_CPPFLAGS = \ version.sp version.s version.o: GIT-VERSION-FILE GIT-USER-AGENT version.sp version.s version.o: EXTRA_CPPFLAGS = \ '-DGIT_VERSION="$(GIT_VERSION)"' \ - '-DGIT_USER_AGENT=$(GIT_USER_AGENT_CQ_SQ)' + '-DGIT_USER_AGENT=$(GIT_USER_AGENT_CQ_SQ)' \ + '-DGIT_BUILT_FROM_COMMIT="$(shell git rev-parse -q --verify HEAD || \ + echo "(unknown)")"' $(BUILT_INS): git$X $(QUIET_BUILT_IN)$(RM) $@ && \ diff --git a/help.c b/help.c index df1332fa3c9..6ebea665780 100644 --- a/help.c +++ b/help.c @@ -413,6 +413,8 @@ int cmd_version(int argc, const char **argv, const char *prefix) printf("git version %s\n", git_version_string); if (build_options) { + printf("built from commit: %s\n", + git_built_from_commit_string); printf("sizeof-long: %d\n", (int)sizeof(long)); printf("machine: %s\n", build_platform); /* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */ diff --git a/version.c b/version.c index 6106a8098c8..41b718c29e1 100644 --- a/version.c +++ b/version.c @@ -3,6 +3,7 @@ #include "strbuf.h" const char git_version_string[] = GIT_VERSION; +const char git_built_from_commit_string[] = GIT_BUILT_FROM_COMMIT; const char *git_user_agent(void) { diff --git a/version.h b/version.h index 6911a4f1558..7c62e805771 100644 --- a/version.h +++ b/version.h @@ -2,6 +2,7 @@ #define VERSION_H extern const char git_version_string[]; +extern const char git_built_from_commit_string[]; const char *git_user_agent(void); const char *git_user_agent_sanitized(void); -- 2.15.1.windows.2
[PATCH 1/2] git version --build-options: report the build platform, too
From: Adric Norris When asking for bug reports to include the output of `git version --build-options`, the idea is that we get a better idea of the environment where said bug occurs. In this context, it is useful to distinguish between 32 and 64-bit builds. We start by distinguishing between x86 and x86_64 (hoping that interested parties will extend this to other architectures in the future). In addition, all 32-bit variants (i686, i586, etc.) are collapsed into `x86'. An example of the output is: $ git version --build-options git version 2.9.3.windows.2.826.g06c0f2f sizeof-long: 4 machine: x86_64 The label of `machine' was chosen so the new information will approximate the output of `uname -m'. Signed-off-by: Adric Norris Signed-off-by: Johannes Schindelin --- help.c | 2 ++ help.h | 13 + 2 files changed, 15 insertions(+) diff --git a/help.c b/help.c index 88a3aeaeb9f..df1332fa3c9 100644 --- a/help.c +++ b/help.c @@ -390,6 +390,7 @@ const char *help_unknown_cmd(const char *cmd) int cmd_version(int argc, const char **argv, const char *prefix) { + static char build_platform[] = GIT_BUILD_PLATFORM; int build_options = 0; const char * const usage[] = { N_("git version []"), @@ -413,6 +414,7 @@ int cmd_version(int argc, const char **argv, const char *prefix) if (build_options) { printf("sizeof-long: %d\n", (int)sizeof(long)); + printf("machine: %s\n", build_platform); /* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */ } return 0; diff --git a/help.h b/help.h index b21d7c94e8c..42dd9852194 100644 --- a/help.h +++ b/help.h @@ -33,3 +33,16 @@ extern void list_commands(unsigned int colopts, struct cmdnames *main_cmds, stru */ extern void help_unknown_ref(const char *ref, const char *cmd, const char *error); #endif /* HELP_H */ + +/* + * identify build platform + */ +#ifndef GIT_BUILD_PLATFORM + #if defined __x86__ || defined __i386__ || defined __i586__ || defined __i686__ + #define GIT_BUILD_PLATFORM "x86" + #elif defined __x86_64__ + #define GIT_BUILD_PLATFORM "x86_64" + #else + #define GIT_BUILD_PLATFORM "unknown" + #endif +#endif -- 2.15.1.windows.2
Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
Junio C Hamano writes: > Stefan Beller writes: > ... >> @@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs) >> simplify_merges(revs); >> if (revs->children.name) >> set_children(revs); >> +if (revs->diffopt.blobfind) >> +revs->simplify_history = 0; >> return 0; >> } > > It makes sense to clear this bit by default, but is this an > unconditonal clearing? In other words, is there a way for the user > to countermand this default and ask for merge simplification from > the command line, e.g. "diff --simplify-history --blobfind="? Looking at the places that assign to revs->simplify_history in revision.c it seems that "this option turns the merge simplification off unconditionally" is pretty much the norm, and this change just follows suit. So perhaps it is OK to let this pass, at least for now.
Re: [PATCH 1/2] git version --build-options: report the build platform, too
Hi, Johannes Schindelin wrote: > From: Adric Norris > > When asking for bug reports to include the output of `git version > --build-options`, the idea is that we get a better idea of the > environment where said bug occurs. In this context, it is useful to > distinguish between 32 and 64-bit builds. Neat! The cover letter gives a clearer idea of the motivation: In Git for Windows, we ask users to paste the output of said command into their bug reports, with the idea that this frequently helps identify where the problems are coming from. There are some obvious missing bits of information in said output, though, and this patch series tries to fill the gaps at least a little. Could some of that text go here, too? [...] > --- a/help.c > +++ b/help.c > @@ -390,6 +390,7 @@ const char *help_unknown_cmd(const char *cmd) > > int cmd_version(int argc, const char **argv, const char *prefix) > { > + static char build_platform[] = GIT_BUILD_PLATFORM; > int build_options = 0; > const char * const usage[] = { > N_("git version []"), > @@ -413,6 +414,7 @@ int cmd_version(int argc, const char **argv, const char > *prefix) > > if (build_options) { > printf("sizeof-long: %d\n", (int)sizeof(long)); > + printf("machine: %s\n", build_platform); Can this use GIT_BUILD_PLATFORM directly instead of going via the indirection of a mutable static string? That is, something like printf("machine: %s\n", GIT_BUILD_PLATFORM); > /* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */ What do you think of this idea? uname_M isn't one of the variables in that file, though, so that's orthogonal to this patch. [...] > --- a/help.h > +++ b/help.h > @@ -33,3 +33,16 @@ extern void list_commands(unsigned int colopts, struct > cmdnames *main_cmds, stru > */ > extern void help_unknown_ref(const char *ref, const char *cmd, const char > *error); > #endif /* HELP_H */ > + > +/* > + * identify build platform > + */ > +#ifndef GIT_BUILD_PLATFORM > + #if defined __x86__ || defined __i386__ || defined __i586__ || defined > __i686__ > + #define GIT_BUILD_PLATFORM "x86" > + #elif defined __x86_64__ > + #define GIT_BUILD_PLATFORM "x86_64" > + #else > + #define GIT_BUILD_PLATFORM "unknown" > + #endif > +#endif This code needs to be inside the HELP_H header guard. Thanks and hope that helps, Jonathan
Re: [PATCH 2/2] version --build-options: report commit, too, if possible
Hi, Johannes Schindelin wrote: > In particular when local tags are used (or tags that are pushed to some > fork) to build Git, it is very hard to figure out from which particular > revision a particular Git executable was built. Hm, can you say more about how this comes up in practice? I wonder if we should always augment the version string with the commit hash. E.g. I am running git version 2.15.1.424.g9478a66081 now, which already includes the commit hash because it disambiguates the .424 thing, but depending on the motivation, maybe we would also want git version 2.15.1.0.g9b185bef0c15 for people running v2.15.1 (or at least when such a tag is not a well known, published tag). > We need to be careful, though, to report when the current commit cannot be > determined, e.g. when building from a tarball without any associated Git > repository. This means that on Debian, it would always print built from commit: (unknown) Maybe I shouldn't care, but I wonder if there's a way to improve on that. E.g. should there be a makefile knob to allow Debian to specify what to put there? Thanks, Jonathan
Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
Am 08.12.2017 um 11:14 schrieb Jeff King: > On Thu, Dec 07, 2017 at 01:47:14PM -0800, Junio C Hamano wrote: > >>> diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c >>> index 22034f87e7..8e8a15ea4a 100644 >>> --- a/builtin/fmt-merge-msg.c >>> +++ b/builtin/fmt-merge-msg.c >>> @@ -377,7 +377,8 @@ static void shortlog(const char *name, >>> string_list_append(&subjects, >>>oid_to_hex(&commit->object.oid)); >>> else >>> - string_list_append(&subjects, strbuf_detach(&sb, NULL)); >>> + string_list_append_nodup(&subjects, >>> +strbuf_detach(&sb, NULL)); >>> } >>> >>> if (opts->credit_people) >> >> What is leaked comes from strbuf, so the title is not a lie, but I >> tend to think that this leak is caused by a somewhat strange >> string_list API. The subjects string-list is initialized as a "dup" >> kind, but a caller that wants to avoid leaking can (and should) use >> _nodup() call to add a string without duping. It all feels a bit >> too convoluted. > > I'm not sure it's string-list's fault. Many callers (including this one) > have _some_ entries whose strings must be duplicated and others which do > not. > > So either: > >1. The list gets marked as "nodup", and we add an extra xstrdup() to the > oid_to_hex call above. And also need to remember to free() the > strings later, since the list does not own them. > > or > >2. We mark it as "dup" and incur an extra allocation and copy, like: > > string_list_append(&subjects, sb.buf); > strbuf_release(&buf); The two modes (dup/nodup) make string_list code tricky. Not sure how far we'd get with something simpler (e.g. an array of char pointers), but having the caller do all string allocations would make the code easier to analyze. > So I'd really blame the caller, which doesn't want to do (2) out of a > sense of optimization. It could also perhaps write it as: > >while (commit = get_revision(rev)) { > strbuf_reset(&sb); > ... maybe put some stuff in sb ... > if (!sb.len) > string_list_append(&subjects, oid_to_hex(obj)); > else > string_list_append(&subjects, sb.buf); >} >strbuf_release(&sb); > > which at least avoids the extra allocations. Right, we'd just have extra string copies in that case. > By the way, I think there's another quite subtle leak in this function. > We do this: > >format_commit_message(commit, "%s", &sb, &ctx); >strbuf_ltrim(&sb); > > and then only use "sb" if sb.len is non-zero. But we may have actually > allocated to create our zero-length string (e.g., if we had a strbuf > full of spaces and trimmed them all off). Since we reuse "sb" over and > over as we loop, this will actually only leak once for the whole loop, > not once per iteration. So it's probably not a big deal, but writing it > with the explicit reset/release pattern fixes that (and is more > idiomatic for our code base, I think). It's subtle, but I think it's not leaking, at least not in your example case (and I can't think of another way). IIUC format_subject(), which handles the "%s" part, doesn't touch sb if the subject is made up only of whitespace. René
Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
Am 07.12.2017 um 22:27 schrieb Jeff King: > Grepping for "list_append.*detach" shows a few other possible cases in > transport-helper.c, which I think are leaks. -- >8 -- Subject: [PATCH] transport-helper: plug strbuf and string_list leaks Transfer ownership of detached strbufs to string_lists of the duplicating variety by calling string_list_append_nodup() instead of string_list_append() to avoid duplicating and then leaking the buffer. While at it make sure to release the string_list when done; push_refs_with_export() already does that. Reported-by: Jeff King Signed-off-by: Rene Scharfe --- transport-helper.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/transport-helper.c b/transport-helper.c index bf05a2dcf1..f682e7c534 100644 --- a/transport-helper.c +++ b/transport-helper.c @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport *transport, struct strbuf cas = STRBUF_INIT; strbuf_addf(&cas, "%s:%s", ref->name, oid_to_hex(&ref->old_oid_expect)); - string_list_append(&cas_options, strbuf_detach(&cas, NULL)); + string_list_append_nodup(&cas_options, +strbuf_detach(&cas, NULL)); } } if (buf.len == 0) { @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport *transport, strbuf_addch(&buf, '\n'); sendline(data, &buf); strbuf_release(&buf); + string_list_release(&cas_options, 0); return push_update_refs_status(data, remote_refs, flags); } @@ -930,7 +932,8 @@ static int push_refs_with_export(struct transport *transport, private = apply_refspecs(data->refspecs, data->refspec_nr, ref->name); if (private && !get_oid(private, &oid)) { strbuf_addf(&buf, "^%s", private); - string_list_append(&revlist_args, strbuf_detach(&buf, NULL)); + string_list_append_nodup(&revlist_args, +strbuf_detach(&buf, NULL)); oidcpy(&ref->old_oid, &oid); } free(private); -- 2.15.1
Re: [PATCH v5 4/4] builtin/branch: strip refs/heads/ using skip_prefix
On Friday 08 December 2017 04:44 AM, Junio C Hamano wrote: Junio C Hamano writes: Somehow this fell underneath my radar horizon. I see v4 and v5 of 4/4 but do not seem to find 1-3/4. Is this meant to be a standalone patch, or am I expected to already have 1-3 that we already are committed to take? Ah, I am guessing that this would apply on top of 1-3/4 in the thread with <20171118172648.17918-1-kaartic.sivar...@gmail.com> You guessed right; at the right time. I was about to ask why this got "out of your radar" in reply to your recent "What's cooking" email :-) The base of the series seems to predate 16169285 ("Merge branch 'jc/branch-name-sanity'", 2017-11-28), so let me see how it looks by applying those three plus this one on top of 'master' before that point. Let me know if this has terrible conflicts so that I can rebase the series on top of 'master'. Thanks, Kaartic
Re: [PATCH Outreachy 1/2] format: create pretty.h file
Olga Telezhnaya writes: > -extern void get_commit_format(const char *arg, struct rev_info *); > -extern const char *format_subject(struct strbuf *sb, const char *msg, > - const char *line_separator); > -extern void userformat_find_requirements(const char *fmt, struct > userformat_want *w); > -extern int commit_format_is_empty(enum cmit_fmt); > extern const char *skip_blank_lines(const char *msg); > -extern void format_commit_message(const struct commit *commit, > - const char *format, struct strbuf *sb, > - const struct pretty_print_context *context); > -extern void pretty_print_commit(struct pretty_print_context *pp, > - const struct commit *commit, > - struct strbuf *sb); > -extern void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit, > -struct strbuf *sb); > -void pp_user_info(struct pretty_print_context *pp, > - const char *what, struct strbuf *sb, > - const char *line, const char *encoding); > -void pp_title_line(struct pretty_print_context *pp, > -const char **msg_p, > -struct strbuf *sb, > -const char *encoding, > -int need_8bit_cte); > -void pp_remainder(struct pretty_print_context *pp, > - const char **msg_p, > - struct strbuf *sb, > - int indent); > ... > +void userformat_find_requirements(const char *fmt, struct userformat_want > *w); > +void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit, > + struct strbuf *sb); > +void pp_user_info(struct pretty_print_context *pp, const char *what, > + struct strbuf *sb, const char *line, > + const char *encoding); > +void pp_title_line(struct pretty_print_context *pp, const char **msg_p, > + struct strbuf *sb, const char *encoding, > + int need_8bit_cte); > +void pp_remainder(struct pretty_print_context *pp, const char **msg_p, > + struct strbuf *sb, int indent); > + > +void format_commit_message(const struct commit *commit, > + const char *format, struct strbuf *sb, > + const struct pretty_print_context *context); > + > +void get_commit_format(const char *arg, struct rev_info *); > + > +void pretty_print_commit(struct pretty_print_context *pp, > + const struct commit *commit, > + struct strbuf *sb); > + > +const char *format_subject(struct strbuf *sb, const char *msg, > + const char *line_separator); > + > +int commit_format_is_empty(enum cmit_fmt); I see you've "standardized" to drop "extern" from the declarations in the header; I have an impression that our preference however is to go in the other direction. The choice of bits that are moved to the new header looks quite sensible to me. Thanks.
Re: [PATCH 1/2] git version --build-options: report the build platform, too
Jonathan Nieder writes: >> @@ -390,6 +390,7 @@ const char *help_unknown_cmd(const char *cmd) >> >> int cmd_version(int argc, const char **argv, const char *prefix) >> { >> +static char build_platform[] = GIT_BUILD_PLATFORM; >> int build_options = 0; >> const char * const usage[] = { >> N_("git version []"), >> @@ -413,6 +414,7 @@ int cmd_version(int argc, const char **argv, const char >> *prefix) >> >> if (build_options) { >> printf("sizeof-long: %d\n", (int)sizeof(long)); >> +printf("machine: %s\n", build_platform); > > Can this use GIT_BUILD_PLATFORM directly instead of going via the indirection > of a mutable static string? That is, something like > > printf("machine: %s\n", GIT_BUILD_PLATFORM); Good point. And if this is externally identified as "machine", probably the macro should also use the same word, not "platform". We can go either way, as long as we are consistent, though. >> --- a/help.h >> +++ b/help.h >> @@ -33,3 +33,16 @@ extern void list_commands(unsigned int colopts, struct >> cmdnames *main_cmds, stru >> */ >> extern void help_unknown_ref(const char *ref, const char *cmd, const char >> *error); >> #endif /* HELP_H */ >> + >> +/* >> + * identify build platform >> + */ >> +#ifndef GIT_BUILD_PLATFORM >> +#if defined __x86__ || defined __i386__ || defined __i586__ || defined >> __i686__ >> +#define GIT_BUILD_PLATFORM "x86" >> +#elif defined __x86_64__ >> +#define GIT_BUILD_PLATFORM "x86_64" >> +#else >> +#define GIT_BUILD_PLATFORM "unknown" >> +#endif >> +#endif > > This code needs to be inside the HELP_H header guard. Certainly. Thanks.
[PATCH v1 2/2] t0027: Adapt the new MIX tests to Windows
From: Torsten Bögershausen The new MIX tests don't pass under Windows, adapt them to use the correct native line ending. Signed-off-by: Torsten Bögershausen --- Sorry for the breakage. This needs to go on top of tb/check-crlf-for-safe-crlf t/t0027-auto-crlf.sh | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh index 97154f5c79..8d929b76dc 100755 --- a/t/t0027-auto-crlf.sh +++ b/t/t0027-auto-crlf.sh @@ -170,22 +170,22 @@ commit_MIX_chkwrn () { git -c core.autocrlf=$crlf add $fname 2>"${pfx}_$f.err" done - test_expect_success "commit file with mixed EOL crlf=$crlf attr=$attr LF" ' + test_expect_success "commit file with mixed EOL onto LF crlf=$crlf attr=$attr" ' check_warning "$lfwarn" ${pfx}_LF.err ' - test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol crlf=$crlf CRLF" ' + test_expect_success "commit file with mixed EOL onto CLRF attr=$attr aeol=$aeol crlf=$crlf" ' check_warning "$crlfwarn" ${pfx}_CRLF.err ' - test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol crlf=$crlf CRLF_mix_LF" ' + test_expect_success "commit file with mixed EOL onto CRLF_mix_LF attr=$attr aeol=$aeol crlf=$crlf" ' check_warning "$lfmixcrlf" ${pfx}_CRLF_mix_LF.err ' - test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol crlf=$crlf LF_mix_cr" ' + test_expect_success "commit file with mixed EOL onto LF_mix_cr attr=$attr aeol=$aeol crlf=$crlf " ' check_warning "$lfmixcr" ${pfx}_LF_mix_CR.err ' - test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol crlf=$crlf CRLF_nul" ' + test_expect_success "commit file with mixed EOL onto CRLF_nul attr=$attr aeol=$aeol crlf=$crlf" ' check_warning "$crlfnul" ${pfx}_CRLF_nul.err ' } @@ -378,7 +378,7 @@ test_expect_success 'setup master' ' printf "\$Id: \$\r\nLINEONE\r\nLINETWO\rLINETHREE" >CRLF_mix_CR && printf "\$Id: \$\r\nLINEONEQ\r\nLINETWO\r\nLINETHREE" | q_to_nul >CRLF_nul && printf "\$Id: \$\nLINEONEQ\nLINETWO\nLINETHREE" | q_to_nul >LF_nul && - create_NNO_MIX_files CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF && + create_NNO_MIX_files && git -c core.autocrlf=false add NNO_*.txt MIX_*.txt && git commit -m "mixed line endings" && test_tick @@ -441,13 +441,14 @@ test_expect_success 'commit files attr=crlf' ' ' # Commit "CRLFmixLF" on top of these files already in the repo: -# LF, CRLF, CRLFmixLF LF_mix_CR CRLFNULL +# mixed mixed mixed mixed mixed +# onto onto ontoonto onto # attrLFCRLF CRLFmixLF LF_mix_CR CRLFNUL commit_MIX_chkwrn "" "" false """""" "" "" commit_MIX_chkwrn "" "" true"LF_CRLF" """" "LF_CRLF" "LF_CRLF" commit_MIX_chkwrn "" "" input "CRLF_LF" """" "CRLF_LF" "CRLF_LF" -commit_MIX_chkwrn "auto" "" false "CRLF_LF" """" "CRLF_LF" "CRLF_LF" +commit_MIX_chkwrn "auto" "" false "$WAMIX" """" "$WAMIX""$WAMIX" commit_MIX_chkwrn "auto" "" true"LF_CRLF" """" "LF_CRLF" "LF_CRLF" commit_MIX_chkwrn "auto" "" input "CRLF_LF" """" "CRLF_LF" "CRLF_LF" -- 2.15.1.271.g1a4e40aa5d
[PATCH v1 1/2] t0027: Don't use git commit
From: Torsten Bögershausen Replace `git commit -m "comment" ""` with `git commit -m "comment"` to remove the empty path spec. Signed-off-by: Torsten Bögershausen --- t/t0027-auto-crlf.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh index c2c208fdcd..97154f5c79 100755 --- a/t/t0027-auto-crlf.sh +++ b/t/t0027-auto-crlf.sh @@ -370,7 +370,7 @@ test_expect_success 'setup master' ' echo >.gitattributes && git checkout -b master && git add .gitattributes && - git commit -m "add .gitattributes" "" && + git commit -m "add .gitattributes" && printf "\$Id: \$\nLINEONE\nLINETWO\nLINETHREE" >LF && printf "\$Id: \$\r\nLINEONE\r\nLINETWO\r\nLINETHREE" >CRLF && printf "\$Id: \$\nLINEONE\r\nLINETWO\nLINETHREE" >CRLF_mix_LF && -- 2.15.1.271.g1a4e40aa5d
Re: [PATCH 2/2] version --build-options: report commit, too, if possible
Jonathan Nieder writes: >> We need to be careful, though, to report when the current commit cannot be >> determined, e.g. when building from a tarball without any associated Git >> repository. > > This means that on Debian, it would always print > > built from commit: (unknown) > > Maybe I shouldn't care, but I wonder if there's a way to improve on > that. E.g. should there be a makefile knob to allow Debian to specify > what to put there? Another "interesting" possibility is to build from a tarball extracted into a directory hierarchy that is controlled by an unrelated Git repository. E.g. "my $HOME is under $HOME/.git repository, and then I have a tarball extract in $HOME/src/git". We shouldn't embed the HEAD commit of that $HOME directory project in the resulting executable in such a case. We should be able to do this by being a bit more careful than the presented patch. Make sure that the toplevel is at the same directory as we assumed to be (i.e. where we found that Makefile) and trust rev-parse output only when that is the case, or something like that.
Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
Jeff Hostetler writes: > From: Jeff Hostetler > > This is V7 of part 3 of partial clone. It builds upon V7 of part 2 > (which builds upon V6 of part 1). Aren't the three patches at the bottom sort-of duplicate from the part 2 series?
Re: [PATCH] doc: reword gitworflows for neutrality
On Fri, Dec 8, 2017 at 10:18 AM, Daniel Bensoussan wrote: > doc: reword gitworflows for neutrality s/gitworflows/gitworkflows/ > Changed 'he' to 'them' to be more neutral in "gitworkflows.txt". > > See discussion at: > https://public-inbox.org/git/xmqqvahieeqy@gitster.mtv.corp.google.com/ > > Signed-off-by: Matthieu Moy > Signed-off-by: Timothee Albertin > Signed-off-by: Nathan Payre > Signed-off-by: Daniel Bensoussan > --- > diff --git a/Documentation/gitworkflows.txt b/Documentation/gitworkflows.txt > @@ -407,8 +407,8 @@ follows. > -Occasionally, the maintainer may get merge conflicts when he tries to > -pull changes from downstream. In this case, he can ask downstream to > +Occasionally, the maintainer may get merge conflicts when they try to > +pull changes from downstream. In this case, they can ask downstream to As a native English speaker, I find the new phrasing odd, and think this may a step backward. How about trying a different approach? For example: Occasionally, the maintainer may get merge conflicts when trying to pull changes from downstream. In this case, it may make sense to ask downstream to do the merge and resolve the conflicts instead (since, presumably, downstream will know better how to resolve them). > do the merge and resolve the conflicts themselves (perhaps they will > know better how to resolve them). It is one of the rare cases where > downstream 'should' merge from upstream.
Re: [WIP 01/15] pkt-line: introduce packet_read_with_status
On 12/07, Stefan Beller wrote: > On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams wrote: > > > diff --git a/pkt-line.h b/pkt-line.h > > index 3dad583e2..f1545929b 100644 > > --- a/pkt-line.h > > +++ b/pkt-line.h > > @@ -60,8 +60,16 @@ int write_packetized_from_buf(const char *src_in, size_t > > len, int fd_out); > > * If options contains PACKET_READ_CHOMP_NEWLINE, a trailing newline (if > > * present) is removed from the buffer before returning. > > */ > > +enum packet_read_status { > > + PACKET_READ_ERROR = -1, > > + PACKET_READ_NORMAL, > > + PACKET_READ_FLUSH, > > +}; > > #define PACKET_READ_GENTLE_ON_EOF (1u<<0) > > #define PACKET_READ_CHOMP_NEWLINE (1u<<1) > > +enum packet_read_status packet_read_with_status(int fd, char **src_buffer, > > size_t *src_len, > > + char *buffer, unsigned > > size, int *pktlen, > > + int options); > > int packet_read(int fd, char **src_buffer, size_t *src_len, char > > *buffer, unsigned size, int options); > > The documentation that is preceding these lines is very specific to > packet_read, e.g. > > If options does contain PACKET_READ_GENTLE_ON_EOF, > we will not die on condition 4 (truncated input), but instead return -1 > > which doesn't hold true for the _status version. Can you adapt the comment > to explain both read functions? Good point, I'll makes changes and document the _status version separately. -- Brandon Williams
Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
On 12/8/2017 12:58 PM, Junio C Hamano wrote: Jeff Hostetler writes: From: Jeff Hostetler This is V7 of part 3 of partial clone. It builds upon V7 of part 2 (which builds upon V6 of part 1). Aren't the three patches at the bottom sort-of duplicate from the part 2 series? oops. yes, you're right. it looks like i selected pc*6*_p2..pc7_p3 rather than pc*7*_p2..pc7_p3. sorry for the typo. and since the only changes in p2 were to squash those 2 commits near the tip of p2, only those 3 commits changed SHAs in v7 over v6. so, please disregard the duplicates. would you like me to send a corrected V8 for p3 ? Jeff
Re: [WIP 02/15] pkt-line: introduce struct packet_reader
On 12/07, Stefan Beller wrote: > On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams wrote: > > Sometimes it is advantageous to be able to peek the next packet line > > without consuming it (e.g. to be able to determine the protocol version > > a server is speaking). In order to do that introduce 'struct > > packet_reader' which is an abstraction around the normal packet reading > > logic. This enables a caller to be able to peek a single line at a time > > using 'packet_reader_peek()' and having a caller consume a line by > > calling 'packet_reader_read()'. > > > > Signed-off-by: Brandon Williams > > --- > > pkt-line.c | 61 > > + > > pkt-line.h | 20 > > 2 files changed, 81 insertions(+) > > > > diff --git a/pkt-line.c b/pkt-line.c > > index ac619f05b..518109bbe 100644 > > --- a/pkt-line.c > > +++ b/pkt-line.c > > @@ -406,3 +406,64 @@ ssize_t read_packetized_to_strbuf(int fd_in, struct > > strbuf *sb_out) > > } > > return sb_out->len - orig_len; > > } > > + > > +/* Packet Reader Functions */ > > +void packet_reader_init(struct packet_reader *reader, int fd, > > + char *src_buffer, size_t src_len) > > +{ > > + reader->fd = fd; > > + reader->src_buffer = src_buffer; > > + reader->src_len = src_len; > > + reader->buffer = packet_buffer; > > + reader->buffer_size = sizeof(packet_buffer); > > + reader->options = PACKET_READ_CHOMP_NEWLINE | > > PACKET_READ_GENTLE_ON_EOF; > > I assume the future user of this packet reader will need exactly > these options coincidentally. ;) > > I think it might be ok for now and later we can extend the reader if needed > to also take the flags as args. However given this set of args, this is a > gentle > packet reader, as it corresponds to the _gently version of reading > packets AFAICT. > > Unlike the pkt_read function this constructor of a packet reader doesn't need > the arguments for its buffer (packet_buffer and sizeof thereof), which > packet_read > unfortunately needs. We pass in packet_buffer all the time except in > builtin/receive-pack > for obtaining the gpg cert. I think that's ok here. Yeah, all of the callers I wanted to migrate all passed the same flags and same buffer. I figured there may be a point in the future where we'd want to extend this so instead of hard coding the flags in the read functions, I did so in the constructor. That way if we wanted to extend it in the future, it would be a little bit easier. > > > +enum packet_read_status packet_reader_read(struct packet_reader *reader) > > +{ > > + if (reader->line_peeked) { > > + reader->line_peeked = 0; > > + return reader->status; > > + } > > + > > + reader->status = packet_read_with_status(reader->fd, > > +&reader->src_buffer, > > +&reader->src_len, > > +reader->buffer, > > +reader->buffer_size, > > +&reader->pktlen, > > +reader->options); > > + > > + switch (reader->status) { > > + case PACKET_READ_ERROR: > > + reader->pktlen = -1; > > In case of error the read function might not > have assigned the pktlen as requested, so we assign > it to -1/NULL here. Though the caller ought to already know > that they handle bogus, as the state is surely the first thing > they'd inspect? Exactly, I thought it would be easier to ensure that pktlen and line were had consistent state no matter what the output of the read was. But yes, callers should be looking at the status to determine what to do. > > > + reader->line = NULL; > > + break; > > + case PACKET_READ_NORMAL: > > + reader->line = reader->buffer; > > + break; > > + case PACKET_READ_FLUSH: > > + reader->pktlen = 0; > > + reader->line = NULL; > > + break; > > + } > > Oh, this gives an interesting interface for someone who is > just inspecting the len/line instead of the state, so it might be > worth keeping it this way. > > Can we have an API documentation in the header file, > explaining what to expect in each field given the state > of the (read, peaked) packet? Yep I can write some better API docs for these functions. -- Brandon Williams
Re: [PATCH Outreachy 1/2] format: create pretty.h file
On Fri, Dec 8, 2017 at 8:21 AM, Olga Telezhnaya wrote: > Create header for pretty.c to make formatting interface more structured. > This is a middle point, this file would be merged futher with other s/futher/further/ > files which contain formatting stuff. > > Signed-off-by: Olga Telezhnaia > Mentored-by: Christian Couder > Mentored by: Jeff King
Re: [PATCH v1 1/2] t0027: Don't use git commit
tbo...@web.de writes: > From: Torsten Bögershausen > > Replace `git commit -m "comment" ""` with `git commit -m "comment"` to > remove the empty path spec. > > Signed-off-by: Torsten Bögershausen > --- > t/t0027-auto-crlf.sh | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) This looks a bit strange. The intent seems to commit all changes made in the working tree, so I'd understand it if it replaced the empty string with a single dot. I also thought that we deprecated use of an empty string as "match all" pathspec recently, and expected that this to break. ... goes and looks ... Indeed, 229a95aa ("t0027: do not use an empty string as a pathspec element", 2017-06-23) does exactly that. > diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh > index c2c208fdcd..97154f5c79 100755 > --- a/t/t0027-auto-crlf.sh > +++ b/t/t0027-auto-crlf.sh > @@ -370,7 +370,7 @@ test_expect_success 'setup master' ' > echo >.gitattributes && > git checkout -b master && > git add .gitattributes && > - git commit -m "add .gitattributes" "" && > + git commit -m "add .gitattributes" && > printf "\$Id: > \$\nLINEONE\nLINETWO\nLINETHREE" >LF && > printf "\$Id: > \$\r\nLINEONE\r\nLINETWO\r\nLINETHREE" >CRLF && > printf "\$Id: > \$\nLINEONE\r\nLINETWO\nLINETHREE" >CRLF_mix_LF &&
Re: [PATCH v1 1/2] t0027: Don't use git commit
Junio C Hamano writes: > tbo...@web.de writes: > >> From: Torsten Bögershausen >> >> Replace `git commit -m "comment" ""` with `git commit -m "comment"` to >> remove the empty path spec. >> >> Signed-off-by: Torsten Bögershausen >> --- >> t/t0027-auto-crlf.sh | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) > > This looks a bit strange. The intent seems to commit all changes > made in the working tree, so I'd understand it if it replaced the > empty string with a single dot. > > I also thought that we deprecated use of an empty string as "match > all" pathspec recently, and expected that this to break. > > ... goes and looks ... > > Indeed, 229a95aa ("t0027: do not use an empty string as a pathspec > element", 2017-06-23) does exactly that. OK, I think I can safely omit this patch, because (1) when 2/2 is queued on top of tb/check-crlf-for-safe-crlf, because ex/deprecate-empty-pathspec-as-match-all is not yet in the topic, an empty pathspec still means "match all" and nothing breaks; and (2) when tb/check-crlf-for-safe-crlf plus 2/2 is merged to any branch with ex/deprecate-empty-pathspec-as-match-all, the topic already fixes what this 1/2 tries to Thanks for being thorough, though.
Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests
Jeff Hostetler writes: > On 12/8/2017 12:58 PM, Junio C Hamano wrote: >> Jeff Hostetler writes: >> >>> From: Jeff Hostetler >>> >>> This is V7 of part 3 of partial clone. It builds upon V7 of part 2 >>> (which builds upon V6 of part 1). >> >> Aren't the three patches at the bottom sort-of duplicate from the >> part 2 series? >> > > oops. yes, you're right. it looks like i selected pc*6*_p2..pc7_p3 > rather than pc*7*_p2..pc7_p3. sorry for the typo. > > and since the only changes in p2 were to squash those 2 commits near > the tip of p2, only those 3 commits changed SHAs in v7 over v6. > > so, please disregard the duplicates. > > would you like me to send a corrected V8 for p3 ? Nah. I just wanted to make sure that I am discarding the right ones (i.e. 1-3/16 of partial-clone, not 8-10/10 of fsck-promisors). Thanks for an update.
Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
René Scharfe writes: >> I'm not sure it's string-list's fault. Many callers (including this one) >> ... > The two modes (dup/nodup) make string_list code tricky. Not sure > how far we'd get with something simpler (e.g. an array of char pointers), > but having the caller do all string allocations would make the code > easier to analyze. Yes. It probably would have been more sensible if the API did not have two modes (instead, have the caller pass whatever string to be stored, *and* make the caller responsible for freeing them *if* it passed an allocated string). For the push_refs_with_push() patch you sent, another possible fix would be to make cas_options a nodup kind so that the result of strbuf_detach() does not get an extra strdup to be lost when placed in cas_options. With the current string-list API that would not quite work, because freeing done in _release() is tied to the "dup/nodup" ness of the string list. I think there even is a codepath that initializes a string_list as nodup kind, stuffs string in it giving the ownership, and then flips it into dup kind just before calling _release() only to have it free the strings, or something silly/ugly like that. In any case, the patch looks sensible. Thanks for plugging the leaks.
Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
René Scharfe writes: > Am 07.12.2017 um 22:27 schrieb Jeff King: >> Grepping for "list_append.*detach" shows a few other possible cases in >> transport-helper.c, which I think are leaks. > > -- >8 -- > Subject: [PATCH] transport-helper: plug strbuf and string_list leaks > > Transfer ownership of detached strbufs to string_lists of the > duplicating variety by calling string_list_append_nodup() instead of > string_list_append() to avoid duplicating and then leaking the buffer. > > While at it make sure to release the string_list when done; > push_refs_with_export() already does that. > > Reported-by: Jeff King > Signed-off-by: Rene Scharfe > --- > transport-helper.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/transport-helper.c b/transport-helper.c > index bf05a2dcf1..f682e7c534 100644 > --- a/transport-helper.c > +++ b/transport-helper.c > @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport > *transport, > struct strbuf cas = STRBUF_INIT; > strbuf_addf(&cas, "%s:%s", > ref->name, > oid_to_hex(&ref->old_oid_expect)); > - string_list_append(&cas_options, strbuf_detach(&cas, > NULL)); > + string_list_append_nodup(&cas_options, > + strbuf_detach(&cas, NULL)); > } > } > if (buf.len == 0) { > @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport > *transport, > strbuf_addch(&buf, '\n'); > sendline(data, &buf); > strbuf_release(&buf); > + string_list_release(&cas_options, 0); There is no such function; you meant _clear() perhaps?
Re: [PATCH v1 1/2] t0027: Don't use git commit
On Fri, Dec 08, 2017 at 10:21:19AM -0800, Junio C Hamano wrote: > Junio C Hamano writes: > > > tbo...@web.de writes: > > > >> From: Torsten Bögershausen > >> > >> Replace `git commit -m "comment" ""` with `git commit -m "comment"` to > >> remove the empty path spec. > >> > >> Signed-off-by: Torsten Bögershausen > >> --- > >> t/t0027-auto-crlf.sh | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > > > > This looks a bit strange. The intent seems to commit all changes > > made in the working tree, so I'd understand it if it replaced the > > empty string with a single dot. > > > > I also thought that we deprecated use of an empty string as "match > > all" pathspec recently, and expected that this to break. > > > > ... goes and looks ... > > > > Indeed, 229a95aa ("t0027: do not use an empty string as a pathspec > > element", 2017-06-23) does exactly that. > > OK, I think I can safely omit this patch, because > > (1) when 2/2 is queued on top of tb/check-crlf-for-safe-crlf, > because ex/deprecate-empty-pathspec-as-match-all is not yet in > the topic, an empty pathspec still means "match all" and > nothing breaks; and > > (2) when tb/check-crlf-for-safe-crlf plus 2/2 is merged to any > branch with ex/deprecate-empty-pathspec-as-match-all, the topic > already fixes what this 1/2 tries to > > Thanks for being thorough, though. > Sure, the credit goes 100% to you: commit 229a95aafa77b583b46a3156b4fad469c264ddfd Author: Junio C Hamano Date: Fri Jun 23 11:04:14 2017 -0700 t0027: do not use an empty string as a pathspec element My brain did just assume that this had mad it to master, sorry for the noise
Re: [PATCH Outreachy 2/2] format: create docs for pretty.h
On Fri, Dec 8, 2017 at 8:21 AM, Olga Telezhnaya wrote: > Write some docs for functions in pretty.h. > Take it as a first draft, they would be changed later. > > Signed-off-by: Olga Telezhnaia > Mentored-by: Christian Couder > Mentored by: Jeff King > --- > diff --git a/pretty.h b/pretty.h > @@ -57,31 +58,74 @@ struct userformat_want { > +/* > + * Create a text message about commit using given "format" and "context". > + * Put the result to "sb". > + * Please use this function for custom formats. > + */ > void format_commit_message(const struct commit *commit, > const char *format, struct strbuf *sb, > const struct pretty_print_context *context); > > +/* > + * Parse given arguments from "arg", check it for correctness and > + * fill struct rev_info. To be consistent with the way you formatted the other comments, I think you'd want quotes around rev_info. > + */ > void get_commit_format(const char *arg, struct rev_info *);
[PATCH] Partial clone design document
From: Jeff Hostetler This patch contains a design document that Jonathan Tan and I have been working on that describes the partial clone feature currently under development. Since edits to this document are independent of the code, I did not include in the part 1,2,3 patch series. Please let us know if there are any major sections we should add or any areas that need clarification. Thanks! Jeff Hostetler (1): partial-clone: design doc Documentation/technical/partial-clone.txt | 240 ++ 1 file changed, 240 insertions(+) create mode 100644 Documentation/technical/partial-clone.txt -- 2.9.3
[PATCH] partial-clone: design doc
From: Jeff Hostetler First draft of design document for partial clone feature. Signed-off-by: Jeff Hostetler Signed-off-by: Jonathan Tan --- Documentation/technical/partial-clone.txt | 240 ++ 1 file changed, 240 insertions(+) create mode 100644 Documentation/technical/partial-clone.txt diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt new file mode 100644 index 000..7ab39d8 --- /dev/null +++ b/Documentation/technical/partial-clone.txt @@ -0,0 +1,240 @@ +Partial Clone Design Notes +== + +The "Partial Clone" feature is a performance optimization for git that +allows git to function without having a complete copy of the repository. + +During clone and fetch operations, git normally downloads the complete +contents and history of the repository. That is, during clone the client +receives all of the commits, trees, and blobs in the repository into a +local ODB. Subsequent fetches extend the local ODB with any new objects. +For large repositories, this can take significant time to download and +large amounts of diskspace to store. + +The goal of this work is to allow git better handle extremely large +repositories. Often in these repositories there are many files that the +user does not need such as ancient versions of source files, files in +portions of the worktree outside of the user's work area, or large binary +assets. If we can avoid downloading such unneeded objects *in advance* +during clone and fetch operations, we can decrease download times and +reduce ODB disk usage. + + +Non-Goals +- + +Partial clone is independent of and not intended to conflict with +shallow-clone, refspec, or limited-ref mechanisms since these all operate +at the DAG level whereas partial clone and fetch works *within* the set +of commits already chosen for download. + + +Design Overview +--- + +Partial clone logically consists of the following parts: + +- A mechanism for the client to describe unneeded or unwanted objects to + the server. + +- A mechanism for the server to omit such unwanted objects from packfiles + sent to the client. + +- A mechanism for the client to gracefully handle missing objects (that + were previously omitted by the server). + +- A mechanism for the client to backfill missing objects as needed. + + +Design Details +-- + +- A new pack-protocol capability "filter" is added to the fetch-pack and + upload-pack negotiation. + + This uses the existing capability discovery mechanism. + See "filter" in Documentation/technical/pack-protocol.txt. + +- Clients pass a "filter-spec" to clone and fetch which is passed to the + server to request filtering during packfile construction. + + There are various filters available to accomodate different situations. + See "--filter=" in Documentation/rev-list-options.txt. + +- On the server pack-objects applies the requested filter-spec as it + creates "filtered" packfiles for the client. + + These filtered packfiles are incomplete in the traditional sense because + they may contain trees that reference blobs that the client does not have. + + + How the local repository gracefully handles missing objects + +With partial clone, the fact that objects can be missing makes such +repositories incompatible with older versions of Git, necessitating a +repository extension (see the documentation of "extensions.partialClone" +for more information). + +An object may be missing due to a partial clone or fetch, or missing due +to repository corruption. To differentiate these cases, the local +repository specially indicates packfiles obtained from the promisor +remote. These "promisor packfiles" consist of a ".promisor" file +with arbitrary contents (like the ".keep" files), in addition to +their ".pack" and ".idx" files. (In the future, this ability +may be extended to loose objects[a].) + +The local repository considers a "promisor object" to be an object that +it knows (to the best of its ability) that the promisor remote has, +either because the local repository has that object in one of its +promisor packfiles, or because another promisor object refers to it. Git +can then check if the missing object is a promisor object, and if yes, +this situation is common and expected. This also means that there is no +need to explicitly maintain an expensive-to-modify list of missing +objects on the client. + +Almost all Git code currently expects any objects referred to by other +objects to be present. Therefore, a fallback mechanism is added: +whenever Git attempts to read an object that is found to be missing, it +will attempt to fetch it from the promisor remote, expanding the subset +of objects available locally, then reattempt the read. This allows +objects to be "faulted in" from the promisor remote without complicated +prediction algorithms. For efficiency reasons, no check as to whether +the missing object is a promisor o
Re: [WIP 03/15] pkt-line: add delim packet support
On 12/07, Stefan Beller wrote: > On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams wrote: > > One of the design goals of protocol-v2 is to improve the semantics of > > flush packets. Currently in protocol-v1, flush packets are used both to > > indicate a break in a list of packet lines as well as an indication that > > one side has finished speaking. This makes it particularly difficult > > to implement proxies as a proxy would need to completely understand git > > protocol instead of simply looking for a flush packet. > > > > To do this, introduce the special deliminator packet '0001'. A delim > > packet can then be used as a deliminator between lists of packet lines > > while flush packets can be reserved to indicate the end of a response. > > > > Signed-off-by: Brandon Williams > > I presume the update for Documentation/technical/* comes at a later patch in > the > series, clarifying the exact semantic difference between the packet types? Yeah, currently there isn't a use for the delim packet but there will be one when v2 is introduced. -- Brandon Williams
Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()
Am 08.12.2017 um 19:44 schrieb Junio C Hamano: > René Scharfe writes: > >> Am 07.12.2017 um 22:27 schrieb Jeff King: >>> Grepping for "list_append.*detach" shows a few other possible cases in >>> transport-helper.c, which I think are leaks. >> >> -- >8 -- >> Subject: [PATCH] transport-helper: plug strbuf and string_list leaks >> >> Transfer ownership of detached strbufs to string_lists of the >> duplicating variety by calling string_list_append_nodup() instead of >> string_list_append() to avoid duplicating and then leaking the buffer. >> >> While at it make sure to release the string_list when done; >> push_refs_with_export() already does that. >> >> Reported-by: Jeff King >> Signed-off-by: Rene Scharfe >> --- >> transport-helper.c | 7 +-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/transport-helper.c b/transport-helper.c >> index bf05a2dcf1..f682e7c534 100644 >> --- a/transport-helper.c >> +++ b/transport-helper.c >> @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport >> *transport, >> struct strbuf cas = STRBUF_INIT; >> strbuf_addf(&cas, "%s:%s", >> ref->name, >> oid_to_hex(&ref->old_oid_expect)); >> -string_list_append(&cas_options, strbuf_detach(&cas, >> NULL)); >> +string_list_append_nodup(&cas_options, >> + strbuf_detach(&cas, NULL)); >> } >> } >> if (buf.len == 0) { >> @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport >> *transport, >> strbuf_addch(&buf, '\n'); >> sendline(data, &buf); >> strbuf_release(&buf); >> +string_list_release(&cas_options, 0); > > There is no such function; you meant _clear() perhaps? Yes, of course, I'm sorry. Not sure what happened there. O_o René
Re: [WIP 08/15] connect: discover protocol version outside of get_remote_heads
On 12/07, Junio C Hamano wrote: > Brandon Williams writes: > > > While we could wrap the preamble into a function it sort of defeats the > > purpose since you want to be able to call different functions based on > > the protocol version you're speaking. That way you can have hard > > separations between the code paths which operate on v0/v1 and v2. > > As I envision that the need for different processing across protocol > versions in real code would become far greater than it would fit in > cases within a single switch() statement, I imagined that the reader > structure would have a field that says which version of the protocol > it is, so that the caller of the preamble thing can inspect it later > to switch on it. > Yes, patch 9 changes this so that the protocol version is stored in the transport structure. This way the fetch and push code can know what to do in v2. -- Brandon Williams
Re: [WIP 04/15] upload-pack: convert to a builtin
On 12/06, Junio C Hamano wrote: > Brandon Williams writes: > > > In order to allow for code sharing with the server-side of fetch in > > protocol-v2 convert upload-pack to be a builtin. > > > > Signed-off-by: Brandon Williams > > --- > > This looks obvious and straight-forward to a cursory look. > > I vaguely recalled and feared that we on purpose kept this program > separate from the rest of the system for a reason, but my mailing > list search is coming up empty. > > > Makefile | 3 ++- > > builtin.h | 1 + > > git.c | 1 + > > upload-pack.c | 2 +- > > 4 files changed, 5 insertions(+), 2 deletions(-) > > ... > > diff --git a/upload-pack.c b/upload-pack.c > > index ef99a029c..2d16952a3 100644 > > --- a/upload-pack.c > > +++ b/upload-pack.c > > @@ -1033,7 +1033,7 @@ static int upload_pack_config(const char *var, const > > char *value, void *unused) > > return parse_hide_refs_config(var, value, "uploadpack"); > > } > > > > -int cmd_main(int argc, const char **argv) > > +int cmd_upload_pack(int argc, const char **argv, const char *prefix) > > { > > const char *dir; > > int strict = 0; > > Shouldn't this file be moved to builtin/ directory, though? I can definitely move the file to builtin if you would prefer. -- Brandon Williams
Re: [PATCH] partial-clone: design doc
Jeff Hostetler writes: > From: Jeff Hostetler > > First draft of design document for partial clone feature. > > Signed-off-by: Jeff Hostetler > Signed-off-by: Jonathan Tan > --- Thanks. > +Non-Goals > +- > + > +Partial clone is independent of and not intended to conflict with > +shallow-clone, refspec, or limited-ref mechanisms since these all operate > +at the DAG level whereas partial clone and fetch works *within* the set > +of commits already chosen for download. It probably is not a huge deal (simply because it is about "Non-Goals") but I have no idea what "refspec" and "limited-ref mechanism" refer to in the above sentence, and I suspect many others share the same puzzlement. > +An object may be missing due to a partial clone or fetch, or missing due > +to repository corruption. To differentiate these cases, the local > +repository specially indicates packfiles obtained from the promisor > +remote. These "promisor packfiles" consist of a ".promisor" file > +with arbitrary contents (like the ".keep" files), in addition to > +their ".pack" and ".idx" files. (In the future, this ability > +may be extended to loose objects[a].) > + ... > +Foot Notes > +-- > + > +[a] Remembering that loose objects are promisor objects is mainly > +important for trees, since they may refer to promisor blobs that > +the user does not have. We do not need to mark loose blobs as > +promisor because they do not refer to other objects. I fail to see any logical link between the "loose" and "tree". Putting it differently, I do not see why "tree" is so special. A promisor pack that contains a tree but lacks blobs the tree refers to would be sufficient to let us remember that these missing blobs are not corruption. A loose commit or a tag that is somehow marked as obtained from a promisor, if it can serve just like a commit or a tag in a promisor pack to promise its direct pointee, would equally be useful (if very inefficient). In any case, I suspect "since they may refer to promisor blobs" is a typo of "since they may refer to promised blobs". > +- Currently, dynamic object fetching invokes fetch-pack for each item > + because most algorithms stumble upon a missing object and need to have > + it resolved before continuing their work. This may incur significant > + overhead -- and multiple authentication requests -- if many objects are > + needed. > + > + We need to investigate use of a long-running process, such as proposed > + in [5,6] to reduce process startup and overhead costs. Also perhaps in some operations we can enumerate the objects we will need upfront and ask for them in one go (e.g. "git log -p A..B" may internally want to do "rev-list --objects A..B" to enumerate trees and blobs that we may lack upfront). I do not think having the other side guess is a good idea, though. > +- We currently only promisor packfiles. We need to add support for > + promisor loose objects as described earlier. The earlier description was not convincing enough to feel the need to me; at least not yet.
Re: [WIP 07/15] connect: convert get_remote_heads to use struct packet_reader
On 12/06, Junio C Hamano wrote: > Brandon Williams writes: > > > EXPECTING_DONE sounded like we are expecting to see 'done' packet > sent from the other side, but I was mistaken. It is the state > where we are "done" expecting anything ;-). > > Having an (unconditional) assignment to 'state' in the above switch > makes me feel somewhat uneasy, as the next "switch (state)" is what > is meant as the state machine that would allow us to say things like > "from this state, transition to that state is impossible". When we > get a flush while we are expecting the first ref, for example, we'd > just go into the "done" state. There is no provision for a future > update to say "no, getting a flush in this state is an error". I believe this is accepted behavior, receiving a flush in that state. And I don't think there is ever an instance during the ref advertisement where a flush would be an error. It just indicates that the advertisement is finished. > > That is no different from the current code; when read_remote_ref() > notices that it got a flush, it just leaves the loop without even > touching 'state' variable. But at least, I find that the current > code is more honest---it does not even touch 'state' and allows the > code after the loop to inspect it, if needed. From that point of > vhew, the way the new code uses 'state' to leave the loop upon > seeing a flush is a regression---it makes it harder to notice and > report when we got a flush in a wrong state. > > Perhaps getting rid of "EXPECTING_DONE" from the enum and then: > > int got_flush = 0; > while (1) { > switch (reader_read()) { > case PACKET_READ_FLUSH: > got_flush = 1; > break; > ... other cases ... > } > > if (got_flush) > break; > > switch (state) { > ... current code ... > } > } > > would be an improvement; we can later extend "if (got_flush)" part > to check what state we are in if we wanted to notice and report an > error there before breaking out of the loop. > I don't really see how this is any clearer from what this patch does though. I thought it made it easier to read as you no longer have an infinite loop, but rather know when it will end (when you move to the done state). -- Brandon Williams
Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
On Fri, Dec 08, 2017 at 04:28:23PM +, Ramsay Jones wrote: > On 08/12/17 09:34, Jeff King wrote: > > On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote: > [snip] > >> Junio hinted at a different approach of solving this problem, which this > >> patch implements. Teach the diff machinery another flag for restricting > >> the information to what is shown. For example: > >> > >> $ ./git log --oneline --blobfind=v2.0.0:Makefile > >> b2feb64309 Revert the whole "ask curl-config" topic for now > >> 47fbfded53 i18n: only extract comments marked with "TRANSLATORS:" > >> > >> we observe that the Makefile as shipped with 2.0 was introduced in > >> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by > >> a different blob. > > Sorry, this has nothing to do with this specific thread. :( > > However, the above caught my eye. Commit b2feb64309 does not 'replace > the Makefile with a different blob'. That commit was a 'last minute' > revert of a topic _prior_ to v2.0.0, which resulted in the _same_ > blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20). Right, I think the patch is working as advertised but the commit message misinterprets the results. :) If you add --raw, you can see that both commits introduce that blob, and it never "goes away". That's because that happened in a merge, which we don't diff in a default log invocation. And in fact finding where this "goes away" is hard, because the merge doesn't trigger "-c" or "--cc". It's 8eaf517835, which presumably was forked before the revert in b2feb64309, and then the merge was able to replace that blob completely with the other side of the merge. Viewing with "-m" turns up a bunch of uninteresting merges. Looking at "--first-parent" is how I got 8eaf517835. So I think this one is tricky because of the revert. In the same way that pathspec-limiting is often tricky in the face of a revert, because the merges "hide" interesting things happening. -Peff
Re: [WIP 11/15] serve: introduce git-serve
On 12/07, Junio C Hamano wrote: > Brandon Williams writes: > > > +static struct protocol_capability *get_capability(const char *key) > > +{ > > + int i; > > + > > + if (!key) > > + return NULL; > > + > > + for (i = 0; i < ARRAY_SIZE(capabilities); i++) { > > + struct protocol_capability *c = &capabilities[i]; > > + const char *out; > > + if (skip_prefix(key, c->name, &out) && (!*out || *out == '=')) > > + return c; > > Looks familiar and resembles what was recently discussed on list ;-) > > > +int cmd_serve(int argc, const char **argv, const char *prefix) > > +{ > > + > > + struct option options[] = { > > + OPT_END() > > + }; > > + > > + /* ignore all unknown cmdline switches for now */ > > + argc = parse_options(argc, argv, prefix, options, grep_usage, > > +PARSE_OPT_KEEP_DASHDASH | > > +PARSE_OPT_KEEP_UNKNOWN); > > + serve(); > > + > > + return 0; > > +} I assume that at some point we may want to have a new endpoint that just does v2 without needing the side channel to tell it to do so. Maybe for brand new server commands, like a remote grep or a remote object-stat or something that don't have a v1 equivalent that can be fallen back to. That's why I included a builtin/serve.c > > ... > > +/* Main serve loop for protocol version 2 */ > > +void serve(void) > > +{ > > + /* serve by default supports v2 */ > > + packet_write_fmt(1, "version 2\n"); > > + > > + advertise_capabilities(); > > + > > + for (;;) > > + if (process_request()) > > + break; > > +} > > I am guessing that this would be run just like upload-pack, > i.e. invoked via ssh or via git-daemon, and that is why it can just > assume that fd#0/fd#1 are already connected to the other end. It > may be helpful to document somewhere how we envision to invoke this > program. > This function I was planning to just be executed by upload-pack and receive-pack when a client requests protocol v2. But yes the idea would be that fd#0/fd#1 would be already setup like they are for upload-pack and receive-pack. -- Brandon Williams
Re: [WIP 04/15] upload-pack: convert to a builtin
Johannes Schindelin writes: > On Wed, 6 Dec 2017, Junio C Hamano wrote: > ... >> I vaguely recalled and feared that we on purpose kept this program >> separate from the rest of the system for a reason, but my mailing >> list search is coming up empty. > > I only recall that we kept it in the bin/ directory (as opposed to > mlibexec/git-core/) to help with fetching via SSH. Yes, that is about where it is installed (i.e. on $PATH), which is a different issue. My vague recollection was about what is (and what is not) included in and linked into the program built, with some reason that is different from but similar to the reason why remote helpers that link to curl and openssl libraries are excluded from the builtin deliberately. I know we exclude remote-helpers from builtin in order to save the start-up overhead for other more important built-in commands by not having to link these heavyweight libs. I suspect there was some valid reason why we didn't make upload-pack a built-in, but am failing to recall what the reason was.
Re: Re: bug deleting "unmerged" branch (2.12.3)
From: "Ulrich Windl" Hi Philip! I'm unsure what you are asking for... Ulrich Hi Ulrich, I was doing a retrospective follow up (of the second kind [1]). In your initial email https://public-inbox.org/git/5a1d70fd02a100029...@gwsmtp1.uni-regensburg.de/ you said "I wanted to delete the temporary branch (which is of no use now), I got a message that the branch is unmerged. I think if more than one branches are pointing to the same commit, one should be allowed to delete all but the last one without warning." My retrospectives question was to find what what part of the documentation could be improved to assist fellow coders and Git users in gaining a better understanding here. I think it's an easy mistake [2] to make and that we should try to make the man pages more assistive. I suspect that the description for the `git branch -d` needs a few more words to clarify the 'merged/unmerged' issue for those who recieve the warning message. Or maybe the git-glossary, etc. I tend to believe that most users will read some of the man pages, and would continue to do so if they are useful. I'd welcome any feedback or suggestions you could provide. -- Philip >>> "Philip Oakley" 04.12.17 0.30 Uhr >>> From: "Junio C Hamano" > "Philip Oakley" writes: > >> I think it was that currently you are on M, and neither A nor B are >> ancestors (i.e. merged) of M. >> >> As Junio said:- "branch -d" protects branches that are yet to be >> merged to the **current branch**. > > Actually, I think people loosened this over time and removal of > branch X is not rejected even if the range HEAD..X is not empty, as > long as X is marked to integrate with/build on something else with > branch.X.{remote,merge} and the range X@{upstream}..X is empty. > > So the stress of "current branch" above you added is a bit of a > white lie. Ah, thanks. [I haven't had chance to check the code] The man page does say: .-d .Delete a branch. The branch must be fully merged in its upstream .branch, or in HEAD if no upstream was set with --track .or --set-upstream. It's whether or not Ulrich had joined the two aspects together, and if the doc was sufficient to help recognise the 'unmerged' issue. Ulrich? -- Philip [1] Retrospective Second Directive, section 3.4.2 of (15th Ed) Agile Processes in software engineering and extreme programming. ISBN 1628251042 (for the perspective of the retrospective..) [2] 'mistake' colloquial part of the error categories of slips lapses and mistakes : Human Error, by Reason (James, prof) ISBN 0521314194 (worthwhile)
Re: [PATCH Outreachy 1/2] format: create pretty.h file
Olga Telezhnaya writes: > archive.c | 1 + > builtin/notes.c | 2 +- > builtin/reset.c | 2 +- > builtin/show-branch.c | 2 +- > combine-diff.c| 1 + > commit.c | 1 + > commit.h | 80 -- > diffcore-pickaxe.c| 1 + > grep.c| 1 + > log-tree.c| 1 + > notes-cache.c | 1 + > pretty.h | 87 > +++ > revision.h| 2 +- > sequencer.c | 1 + > sha1_name.c | 1 + > submodule.c | 1 + > 16 files changed, 101 insertions(+), 84 deletions(-) > create mode 100644 pretty.h > > diff --git a/archive.c b/archive.c > index 0b7b62af0c3ec..60607e8c00857 100644 > --- a/archive.c > +++ b/archive.c > @@ -2,6 +2,7 @@ > #include "config.h" > #include "refs.h" > #include "commit.h" > +#include "pretty.h" > #include "tree-walk.h" > #include "attr.h" > #include "archive.h" This has a toll on topics in flight that expect the symbols for pretty are available in "commit.h"; they are forced to include this new file they did not even know about. I notice that "commit.h" is included in "builtin.h"; perhaps adding a new include for "pretty.h" there would be of lessor impact? I dunno.
Re: [PATCH 1/1] diffcore: add a filter to find a specific blob
On Fri, Dec 8, 2017 at 12:19 PM, Jeff King wrote: > On Fri, Dec 08, 2017 at 04:28:23PM +, Ramsay Jones wrote: > >> On 08/12/17 09:34, Jeff King wrote: >> > On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote: >> [snip] >> >> Junio hinted at a different approach of solving this problem, which this >> >> patch implements. Teach the diff machinery another flag for restricting >> >> the information to what is shown. For example: >> >> >> >> $ ./git log --oneline --blobfind=v2.0.0:Makefile >> >> b2feb64309 Revert the whole "ask curl-config" topic for now >> >> 47fbfded53 i18n: only extract comments marked with "TRANSLATORS:" >> >> >> >> we observe that the Makefile as shipped with 2.0 was introduced in >> >> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by >> >> a different blob. >> >> Sorry, this has nothing to do with this specific thread. :( >> >> However, the above caught my eye. Commit b2feb64309 does not 'replace >> the Makefile with a different blob'. That commit was a 'last minute' >> revert of a topic _prior_ to v2.0.0, which resulted in the _same_ >> blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20). > > Right, I think the patch is working as advertised but the commit message > misinterprets the results. :) Thanks for digging. I came to a similar realization. > If you add --raw, you can see that both commits introduce that blob, and > it never "goes away". That's because that happened in a merge, which we > don't diff in a default log invocation. We should when --raw is given. --raw is documented as "For each commit, show a summary of changes using the raw diff format." and I would argue that 'each commit' includes merges. Though I guess this may have implications for long time users. > And in fact finding where this "goes away" is hard, because the merge > doesn't trigger "-c" or "--cc". It's 8eaf517835, which presumably was > forked before the revert in b2feb64309, and then the merge was able to > replace that blob completely with the other side of the merge. > > Viewing with "-m" turns up a bunch of uninteresting merges. Looking at > "--first-parent" is how I got 8eaf517835. > > So I think this one is tricky because of the revert. In the same way > that pathspec-limiting is often tricky in the face of a revert, because > the merges "hide" interesting things happening. yup, hidden merges are unfortunate. Is there an easy way to find out about merges? (Junio hints at having tests around merges, which I'll do next)