Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax

2017-12-08 Thread Eric Sunshine
On Fri, Dec 8, 2017 at 12:16 AM, Jeff King  wrote:
> Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments
> with ",", 2017-10-01) switched the syntax of the trailers
> placeholder, but forgot to update the documentation in
> pretty-formats.txt.
>
> There's need to mention the old syntax; it was never in a

I suppose you mean: s/need/no need/

> released version of Git.
>
> Signed-off-by: Jeff King 


Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax

2017-12-08 Thread Jeff King
On Fri, Dec 08, 2017 at 03:10:34AM -0500, Eric Sunshine wrote:

> On Fri, Dec 8, 2017 at 12:16 AM, Jeff King  wrote:
> > Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments
> > with ",", 2017-10-01) switched the syntax of the trailers
> > placeholder, but forgot to update the documentation in
> > pretty-formats.txt.
> >
> > There's need to mention the old syntax; it was never in a
> 
> I suppose you mean: s/need/no need/

Yes, indeed. Thanks.

-Peff


RE: URGENT PLEASE

2017-12-08 Thread Office Contact


Re: [PATCH] hashmap: adjust documentation to reflect reality

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 10:47:43PM +0100, Johannes Schindelin wrote:

> > We could add that example to the test helper as then we have a good (tested)
> > example for that case, too.
> 
> What we could *also* do, and what would probably make *even more* sense,
> is to simplify the example drastically, to a point where testing it in
> test-hashmap is pointless, and where a reader can gather *very* quickly
> what it takes to use the hashmap routines.

Yes, I'd be in favor of that, too.

> In any case, I would really like to see my patch applied first, as it is a
> separate concern from what you gentle people talk about: I simply want
> that incorrect documentation fixed. The earlier, the better.

Definitely. I think it is in "next" already.

-Peff


Re: [PATCH v2 5/7] diff: use skip-to-optional-val in parsing --relative

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 02:31:38PM -0800, Junio C Hamano wrote:

> If this goes on top as a standalone patch, then the reason why it is
> separate from the other users of _default() is not because the way
> it uses the null return is special, but because it was written by a
> different author, I would think.

Mostly I was just concerned that we missed a somewhat subtle bug in the
first conversion, and I think it's worth calling out in the commit
message why that callsite must be converted in a particular way. Whether
that happens in a separate commit or squashed, I don't care too much.

But I dunno. Maybe it is obvious now that the correct code is in there.
;)

-Peff


Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:

> Sometimes users are given a hash of an object and they want to
> identify it further (ex.: Use verify-pack to find the largest blobs,
> but what are these? or [1])
> 
> One might be tempted to extend git-describe to also work with blobs,
> such that `git describe ` gives a description as
> ':'.  This was implemented at [2]; as seen by the sheer
> number of responses (>110), it turns out this is tricky to get right.
> The hard part to get right is picking the correct 'commit-ish' as that
> could be the commit that (re-)introduced the blob or the blob that
> removed the blob; the blob could exist in different branches.

Neat. I didn't follow the original thread very closely, but I think this
is a good idea (and is definitely something I've simulated manually with
"git log --raw | grep" before).

Bear with me while I pontificate for a moment.

We do something similar at GitHub to report on too-large objects
during a push (we identify the object during index-pack, but then want
to hand back a human-readable pathname).

We do it with a patch to "rev-list", so that you can use the normal
traversal options to limit the set of visited commits. Which effectively
makes it "traverse and find a place where this object is referenced, and
then tell me the path at which you found it (or tell me if it was not
referenced at all)".

That sidesteps the "describe" problem, because now it is the user's
responsibility to tell you which commits to look in. :)

But the rev-list solution only finds _a_ commit that contains the
object, and which likely has nothing to do with the object at all. Your
solution here finds the introduction and removal of the object, which
are interesting for a wider variety of searches.

So I wondered if this could replace my rev-list patch (which I've been
meaning to upstream for a while). But I think the answer is "no", there
is room for both features. What you're finding here is much more
specific and useful data, but it's also much more expensive to generate.
For my uses cases, it was enough to ask "show me a plausible path
quickly" or even "is this object reachable" (which you can answer out of
bitmaps!).

> Junio hinted at a different approach of solving this problem, which this
> patch implements. Teach the diff machinery another flag for restricting
> the information to what is shown. For example:
> 
>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>   b2feb64309 Revert the whole "ask curl-config" topic for now
>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
> 
> we observe that the Makefile as shipped with 2.0 was introduced in
> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
> a different blob.

So I like this very much as an addition to Git's toolbox. But does it
really need to be limited to blobs? We should be able to ask for trees
(both root trees and sub-trees), too. That's going to be less common, I
think, but I have definitely done

  git log --raw -t --pretty=raw --no-abbrev |
  less +/$tree_sha1

to debug things (corrupted repos, funny merges, etc).

You could even do it for submodule commits. Which demonstrates that this
feature does not even need to actually have a resolvable object; you're
just looking for the textual sha1.

You do your filtering on the diff queue, which I think will have the
entries you need if "-t" is specified (and should always have submodule
commits, I think). So you'd just need to relax the incoming object
check, like:

diff --git a/diff.c b/diff.c
index e4571df3bf..0007bb0a09 100644
--- a/diff.c
+++ b/diff.c
@@ -4490,8 +4490,12 @@ static int parse_blobfind_opt(struct diff_options *opt, 
const char *arg)
 {
struct object_id oid;
 
-   if (get_oid_blob(arg, &oid) || sha1_object_info(oid.hash, NULL) != 
OBJ_BLOB)
-   return error("object '%s' is not a blob", arg);
+   /*
+* We don't even need to have the object, and 40-hex sha1s will
+* just resolve here.
+*/
+   if (get_oid_blob(arg, &oid))
+   return error("unable to resolve '%s'", arg);
 
if (!opt->blobfind)
opt->blobfind = xcalloc(1, sizeof(*opt->blobfind));

At which point:

  ./git log --oneline -t --blobfind=v2.0.0:Documentation

just works (the results are kind of interesting, too; you see that tree
"go away" in a lot of different commits because many independent
branches touched the directory).

Also interesting:

  ./git log --oneline --blobfind=HEAD:sha1collisiondetection

Well, the output for this particular case isn't that interesting. But it
shows that we can find submodules, too.

-Peff


Re: Unfortunate interaction between git diff-index and fakeroot

2017-12-08 Thread Elazar Leibovich


On 12/07/2017 05:31 PM, Junio C Hamano wrote:

Correct.  fakeroot would report that the files that are actually
owned by the user who is running fakeroot are owned by root; the
cached stat information in the index would be "corrected" to say
that they are owned by root.  So once the index is refreshed like
so, things will become consistent.

I still fail to understand, what's the benefit of storing the owner ID in
the index, where no one seems to use this information.
I mean, one could store many other meta information in the index, such as
ACL lists,  number of blocks allocated, etc, but the code seems to
ignore unused information, such as commit 
2cb45e95438c113871fbbea5b4f629f9463034e7

which ignores st_dev, because it doesn't actually matter, or
commit e44794706eeb57f2ee38ed1604821aa38b8ad9d2 which ignores
mode changes not relevant to the owner.

In which situation does anyone care about the owner UID?
What's the difference between that, and, e.g., st_dev?

If you are using "diff-index" for the "is the tree dirty?" check
without running "update-index --refresh", then you are not using
the command correctly.

Just want to clarify. It seems like the linux kernel is using diff-index
to do just that, in scripts/setlocalversion

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/setlocalversion#n77

Do I understand correctly that linux should update the index
first, or better, use porcelain "git diff --name-only" instead?

It seems to be the case previously, but in commit 
cdf2bc632ebc9ef512345fe8e6015edfd367e256

git update-index was removed, to prevent error messages on
read-only linux tree.

Do I understand correctly that the more correct solution
for modern git, is to use git diff --name-only instead? It should
work on read-only as well as on read-write trees, and
would not break if anyone uses git under fakeroot.

Did I understand you correctly?


Re: [PATCH] decorate: clean up and document API

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 04:14:24PM -0800, Jonathan Tan wrote:

> Improve the names of the identifiers in decorate.h, document them, and
> add an example of how to use these functions.
> 
> The example is compiled and run as part of the test suite.
> 
> Signed-off-by: Jonathan Tan 
> ---
> This patch contains some example code in a test helper.
> 
> There is a discussion at [1] about where example code for hashmap should
> go. For something relatively simple, like decorate, perhaps example code
> isn't necessary; but if we include example code anyway, I think that we
> might as well make it compiled and tested, so this patch contains that
> example code in a test helper.

I have mixed feelings. On the one hand, compiling and running the code
ensures that those things actually work. On the other hand, I expect you
can make a much clearer example if instead of having running code, you
show snippets of almost-code.

E.g.:

  struct decoration d = { NULL };

  add_decoration(&d, obj, "foo");
  ...
  str = lookup_decoration(obj);

pretty much gives the relevant overview, with very little boilerplate.
Yes, it omits things like the return value of add_decoration(), but
those sorts of details are probably better left to the function
docstrings.

Other than that philosophical point, the documentation you added looks
pretty good to me. Two possible improvements to the API we could do on
top:

  1. Should there be a DECORATION_INIT macro (possibly taking the "name"
 as an argument)? (Actually, the whole name thing seems like a
 confusing and bad API design in the first place).

  2. This is really just an oidmap to a void pointer. I wonder if we
 ought to be wrapping that code (I think we still want some
 interface so that the caller doesn't have to declare their own
 structs).

-Peff


[PATCH] cvsimport: apply shell-quoting regex globally

2017-12-08 Thread Jeff King
Commit 5b4efea666 (cvsimport: shell-quote variable used in
backticks, 2017-09-11) tried to shell-quote a variable, but
forgot to use the "/g" modifier to apply the quoting to the
whole variable. This means we'd miss any embedded
single-quotes after the first one.

Reported-by: 
Signed-off-by: Jeff King 
---
 git-cvsimport.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index 36929921ea..2d8df83172 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -642,7 +642,7 @@ sub is_sha1 {
 
 sub get_headref ($) {
my $name = shift;
-   $name =~ s/'/'\\''/;
+   $name =~ s/'/'\\''/g;
my $r = `git rev-parse --verify '$name' 2>/dev/null`;
return undef unless $? == 0;
chomp $r;
-- 
2.15.1.659.g8bd2eae3ea


Re: [PATCH] setup.c: fix comment about order of .git directory discovery

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 09:59:49AM +0100, SZEDER Gábor wrote:

> Since gitfiles were introduced in b44ebb19e (Add platform-independent
> .git "symlink", 2008-02-20) the order of checks during .git directory
> discovery is: gitfile, gitdir, bare repo.  However, that commit did
> only partially update the in-code comment describing this order,
> missing the last line which still puts gitdir before gitfile.

I was confused at first, because your patch only changes to ".git/" to
".git". So there is no ordering change, only a swapping out of one for
the other. But that pushes done the ".git/" into the "etc." which is
below.

So I think this makes sense.

Thanks.

-Peff


Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread Jeff King
On Thu, Dec 07, 2017 at 01:47:14PM -0800, Junio C Hamano wrote:

> > diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
> > index 22034f87e7..8e8a15ea4a 100644
> > --- a/builtin/fmt-merge-msg.c
> > +++ b/builtin/fmt-merge-msg.c
> > @@ -377,7 +377,8 @@ static void shortlog(const char *name,
> > string_list_append(&subjects,
> >oid_to_hex(&commit->object.oid));
> > else
> > -   string_list_append(&subjects, strbuf_detach(&sb, NULL));
> > +   string_list_append_nodup(&subjects,
> > +strbuf_detach(&sb, NULL));
> > }
> >  
> > if (opts->credit_people)
> 
> What is leaked comes from strbuf, so the title is not a lie, but I
> tend to think that this leak is caused by a somewhat strange
> string_list API.  The subjects string-list is initialized as a "dup"
> kind, but a caller that wants to avoid leaking can (and should) use
> _nodup() call to add a string without duping.  It all feels a bit
> too convoluted.

I'm not sure it's string-list's fault. Many callers (including this one)
have _some_ entries whose strings must be duplicated and others which do
not.

So either:

  1. The list gets marked as "nodup", and we add an extra xstrdup() to the
 oid_to_hex call above. And also need to remember to free() the
 strings later, since the list does not own them.

or

  2. We mark it as "dup" and incur an extra allocation and copy, like:

   string_list_append(&subjects, sb.buf);
   strbuf_release(&buf);

So I'd really blame the caller, which doesn't want to do (2) out of a
sense of optimization. It could also perhaps write it as:

  while (commit = get_revision(rev)) {
strbuf_reset(&sb);
... maybe put some stuff in sb ...
if (!sb.len)
string_list_append(&subjects, oid_to_hex(obj));
else
string_list_append(&subjects, sb.buf);
  }
  strbuf_release(&sb);

which at least avoids the extra allocations.

By the way, I think there's another quite subtle leak in this function.
We do this:

  format_commit_message(commit, "%s", &sb, &ctx);
  strbuf_ltrim(&sb);

and then only use "sb" if sb.len is non-zero. But we may have actually
allocated to create our zero-length string (e.g., if we had a strbuf
full of spaces and trimmed them all off). Since we reuse "sb" over and
over as we loop, this will actually only leak once for the whole loop,
not once per iteration. So it's probably not a big deal, but writing it
with the explicit reset/release pattern fixes that (and is more
idiomatic for our code base, I think).

-Peff


[PATCH v2 4/4] t/Makefile: introduce TEST_SHELL_PATH

2017-12-08 Thread Jeff King
You may want to run the test suite with a different shell
than you use to build Git. For instance, you may build with
SHELL_PATH=/bin/sh (because it's faster, or it's what you
expect to exist on systems where the build will be used) but
want to run the test suite with bash (e.g., since that
allows using "-x" reliably across the whole test suite).
There's currently no good way to do this.

You might think that doing two separate make invocations,
like:

  make &&
  make -C t SHELL_PATH=/bin/bash

would work. And it _almost_ does. The second make will see
our bash SHELL_PATH, and we'll use that to run the
individual test scripts (or tell prove to use it to do so).
So far so good.

But this breaks down when "--tee" or "--verbose-log" is
used. Those options cause the test script to actually
re-exec itself using $SHELL_PATH. But wait, wouldn't our
second make invocation have set SHELL_PATH correctly in the
environment?

Yes, but test-lib.sh sources GIT-BUILD-OPTIONS, which we
built during the first "make". And that overrides the
environment, giving us the original SHELL_PATH again.

Let's introduce a new variable that lets you specify a
specific shell to be run for the test scripts. Note that we
have to touch both the main and t/ Makefiles, since we have
to record it in GIT-BUILD-OPTIONS in one, and use it in the
latter.

Signed-off-by: Jeff King 
---
 Makefile  | 8 
 t/Makefile| 6 --
 t/test-lib.sh | 2 +-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index fef9c8d272..8a21c4d8f1 100644
--- a/Makefile
+++ b/Makefile
@@ -425,6 +425,10 @@ all::
 #
 # to say "export LESS=FRX (and LV=-c) if the environment variable
 # LESS (and LV) is not set, respectively".
+#
+# Define TEST_SHELL_PATH if you want to use a shell besides SHELL_PATH for
+# running the test scripts (e.g., bash has better support for "set -x"
+# tracing).
 
 GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -729,6 +733,8 @@ endif
 export PERL_PATH
 export PYTHON_PATH
 
+TEST_SHELL_PATH = $(SHELL_PATH)
+
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
@@ -1725,6 +1731,7 @@ prefix_SQ = $(subst ','\'',$(prefix))
 gitwebdir_SQ = $(subst ','\'',$(gitwebdir))
 
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
+TEST_SHELL_PATH_SQ = $(subst ','\'',$(TEST_SHELL_PATH))
 PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))
 PYTHON_PATH_SQ = $(subst ','\'',$(PYTHON_PATH))
 TCLTK_PATH_SQ = $(subst ','\'',$(TCLTK_PATH))
@@ -2355,6 +2362,7 @@ GIT-LDFLAGS: FORCE
 # and the first level quoting from the shell that runs "echo".
 GIT-BUILD-OPTIONS: FORCE
@echo SHELL_PATH=\''$(subst ','\'',$(SHELL_PATH_SQ))'\' >$@+
+   @echo TEST_SHELL_PATH=\''$(subst ','\'',$(TEST_SHELL_PATH_SQ))'\' >>$@+
@echo PERL_PATH=\''$(subst ','\'',$(PERL_PATH_SQ))'\' >>$@+
@echo DIFF=\''$(subst ','\'',$(subst ','\'',$(DIFF)))'\' >>$@+
@echo PYTHON_PATH=\''$(subst ','\'',$(PYTHON_PATH_SQ))'\' >>$@+
diff --git a/t/Makefile b/t/Makefile
index 1bb06c36f2..96317a35f4 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -8,6 +8,7 @@
 
 #GIT_TEST_OPTS = --verbose --debug
 SHELL_PATH ?= $(SHELL)
+TEST_SHELL_PATH ?= $(SHELL_PATH)
 PERL_PATH ?= /usr/bin/perl
 TAR ?= $(TAR)
 RM ?= rm -f
@@ -23,6 +24,7 @@ endif
 
 # Shell quote;
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
+TEST_SHELL_PATH_SQ = $(subst ','\'',$(TEST_SHELL_PATH))
 PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))
 TEST_RESULTS_DIRECTORY_SQ = $(subst ','\'',$(TEST_RESULTS_DIRECTORY))
 
@@ -42,11 +44,11 @@ failed:
test -z "$$failed" || $(MAKE) $$failed
 
 prove: pre-clean $(TEST_LINT)
-   @echo "*** prove ***"; $(PROVE) --exec '$(SHELL_PATH_SQ)' 
$(GIT_PROVE_OPTS) $(T) :: $(GIT_TEST_OPTS)
+   @echo "*** prove ***"; $(PROVE) --exec '$(TEST_SHELL_PATH_SQ)' 
$(GIT_PROVE_OPTS) $(T) :: $(GIT_TEST_OPTS)
$(MAKE) clean-except-prove-cache
 
 $(T):
-   @echo "*** $@ ***"; '$(SHELL_PATH_SQ)' $@ $(GIT_TEST_OPTS)
+   @echo "*** $@ ***"; '$(TEST_SHELL_PATH_SQ)' $@ $(GIT_TEST_OPTS)
 
 pre-clean:
$(RM) -r '$(TEST_RESULTS_DIRECTORY_SQ)'
diff --git a/t/test-lib.sh b/t/test-lib.sh
index b8dd5e79ac..1ae0fc02d0 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -80,7 +80,7 @@ done,*)
# from any previous runs.
>"$GIT_TEST_TEE_OUTPUT_FILE"
 
-   (GIT_TEST_TEE_STARTED=done ${SHELL_PATH} "$0" "$@" 2>&1;
+   (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1;
 echo $? >"$BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE"
test "$(cat "$BASE.exit")" = 0
exit
-- 
2.15.1.659.g8bd2eae3ea


[PATCH v2 3/4] test-lib: make "-x" work with "--verbose-log"

2017-12-08 Thread Jeff King
The "-x" tracing option implies "--verbose". This is a
problem when running under a TAP harness like "prove", where
we need to use "--verbose-log" instead. Instead, let's
handle this the same way we do for --valgrind, including the
recent fix from 88c6e9d31c (test-lib: --valgrind should not
override --verbose-log, 2017-09-05). Namely, let's enable
--verbose only when we know there isn't a more specific
verbosity option indicated.

Note that we also have to tweak `want_trace` to turn it on
(previously we just lumped $verbose_log in with $verbose,
but now we don't necessarily auto-set the latter).

Signed-off-by: Jeff King 
---
 t/test-lib.sh | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 7914453a3b..b8dd5e79ac 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -264,7 +264,6 @@ do
shift ;;
-x)
trace=t
-   verbose=t
shift ;;
--verbose-log)
verbose_log=t
@@ -283,6 +282,11 @@ then
test -z "$verbose_log" && verbose=t
 fi
 
+if test -n "$trace" && test -z "$verbose_log"
+then
+   verbose=t
+fi
+
 if test -n "$color"
 then
# Save the color control sequences now rather than run tput
@@ -586,7 +590,9 @@ maybe_setup_valgrind () {
 }
 
 want_trace () {
-   test "$trace" = t && test "$verbose" = t
+   test "$trace" = t && {
+   test "$verbose" = t || test "$verbose_log" = t
+   }
 }
 
 # This is a separate function because some tests use
-- 
2.15.1.659.g8bd2eae3ea



[PATCH v2 2/4] t5615: avoid re-using descriptor 4

2017-12-08 Thread Jeff King
File descriptors 3 and 4 are special in our test suite, as
they link back to the test script's original stdout and
stderr. Normally this isn't something tests need to worry
about: they are free to clobber these descriptors for
sub-commands without affecting the overall script.

But there's one very special thing about descriptor 4: since
d88785e424 (test-lib: set BASH_XTRACEFD automatically,
2016-05-11), we ask bash to output "set -x" output to it by
number. This goes to _any_ descriptor 4, even if it no
longer points to the place it did when we set BASH_XTRACEFD.

But in t5615, we run a shell loop with descriptor 4
redirected.  As a result, t5615 works with non-bash shells
even with "-x". And it works with bash without "-x". But the
combination of "bash t5615-alternate-env.sh -x" gets a test
failure (because our "set -x" output pollutes one of the
files).

We can fix this by using any descriptor _except_ the magical
4. So let's switch arbitrarily to using 5/6 in this loop,
not 3/4.

Another alternative is to use a different descriptor for
BASH_XTRACEFD. But picking an unused one turns out to be
hard. Most shells limit us to 9 numbered descriptors. Bash
can handle more, but:

  - while the BASH_XTRACEFD is specific to bash, GIT_TRACE=4
has a similar problem, and would affect all shells

  - constructs like "999>/dev/null" are synticatically
invalid to non-bash shells. So we have to actually bury
it inside an eval, which creates more complications.

Of the numbers 1-9, you might think that "9" would be less
used than "4". But it's not; many of our scripts use
descriptors 8 and 9 (probably under the assumption that they
are high and therefore unused). The least-used descriptor is
currently "7". We could switch to that, but we're just
trading one magic number for another.

Signed-off-by: Jeff King 
---
 t/t5615-alternate-env.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t5615-alternate-env.sh b/t/t5615-alternate-env.sh
index d2d883f3a1..b4905b822c 100755
--- a/t/t5615-alternate-env.sh
+++ b/t/t5615-alternate-env.sh
@@ -7,9 +7,9 @@ check_obj () {
alt=$1; shift
while read obj expect
do
-   echo "$obj" >&3 &&
-   echo "$obj $expect" >&4
-   done 3>input 4>expect &&
+   echo "$obj" >&5 &&
+   echo "$obj $expect" >&6
+   done 5>input 6>expect &&
GIT_ALTERNATE_OBJECT_DIRECTORIES=$alt \
git "$@" cat-file --batch-check='%(objectname) %(objecttype)' \
actual &&
-- 
2.15.1.659.g8bd2eae3ea



[PATCH v2 0/4] making test-suite tracing more useful

2017-12-08 Thread Jeff King
This series fixes some rough edges in the "-x" feature of the test
suite. With it, it should be possible to turn on tracing for CI runs.

This is mostly a repost of v1 at:

  
https://public-inbox.org/git/20171019210140.64lb52cqtgdh2...@sigill.intra.peff.net

which had some discussion, but wasn't picked up.

I fixed a few typos pointed out by reviewers, and I tried to summarize
the discussion around "magic descriptor 4" in the commit message of
patch 2.  My conclusion there was that none of the options is
particularly good.  The only thing thing that might make sense is making
"7" the magical descriptor. But given that "4" only had one troubling
call, I'm inclined to just leave it as-is and see if this ever comes up
again (especially if we start using "-x" in the Travis builds, then we'd
catch any problems).

  [1/4]: test-lib: silence "-x" cleanup under bash
  [2/4]: t5615: avoid re-using descriptor 4
  [3/4]: test-lib: make "-x" work with "--verbose-log"
  [4/4]: t/Makefile: introduce TEST_SHELL_PATH

 Makefile |  8 
 t/Makefile   |  6 --
 t/t5615-alternate-env.sh |  6 +++---
 t/test-lib.sh| 46 +-
 4 files changed, 48 insertions(+), 18 deletions(-)

-Peff


[PATCH v2 1/4] test-lib: silence "-x" cleanup under bash

2017-12-08 Thread Jeff King
When the test suite's "-x" option is used with bash, we end
up seeing cleanup cruft in the output:

  $ bash t0001-init.sh -x
  [...]
  ++ diff -u expected actual
  + test_eval_ret_=0
  + want_trace
  + test t = t
  + test t = t
  + set +x
  ok 42 - re-init from a linked worktree

This ranges from mildly annoying (for a successful test) to
downright confusing (when we say "last command exited with
error", but it's really 5 commands back).

We normally are able to suppress this cleanup. As the
in-code comment explains, we can't convince the shell not to
print it, but we can redirect its stderr elsewhere.

But since d88785e424 (test-lib: set BASH_XTRACEFD
automatically, 2016-05-11), that doesn't hold for bash. It
sends the "set -x" output directly to descriptor 4, not to
stderr.

We can fix this by also redirecting descriptor 4, and
paying close attention to which commands redirected and
which are not (see the updated comment).

Two alternatives I considered and rejected:

  - unsetting and setting BASH_XTRACEFD; doing so closes the
descriptor, which we must avoid

  - we could keep everything in a single block as before,
redirect 4>/dev/null there, but retain 5>&4 as a copy.
And then selectively restore 4>&5 for commands which
should be allowed to trace. This would work, but the
descriptor swapping seems unnecessarily confusing.

Signed-off-by: Jeff King 
---
 t/test-lib.sh | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 116bd6a70c..7914453a3b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -601,26 +601,40 @@ test_eval_inner_ () {
 }
 
 test_eval_ () {
-   # We run this block with stderr redirected to avoid extra cruft
-   # during a "-x" trace. Once in "set -x" mode, we cannot prevent
+   # If "-x" tracing is in effect, then we want to avoid polluting stderr
+   # with non-test commands. But once in "set -x" mode, we cannot prevent
# the shell from printing the "set +x" to turn it off (nor the saving
# of $? before that). But we can make sure that the output goes to
# /dev/null.
#
-   # The test itself is run with stderr put back to &4 (so either to
-   # /dev/null, or to the original stderr if --verbose was used).
+   # There are a few subtleties here:
+   #
+   #   - we have to redirect descriptor 4 in addition to 2, to cover
+   # BASH_XTRACEFD
+   #
+   #   - the actual eval has to come before the redirection block (since
+   # it needs to see descriptor 4 to set up its stderr)
+   #
+   #   - likewise, any error message we print must be outside the block to
+   # access descriptor 4
+   #
+   #   - checking $? has to come immediately after the eval, but it must
+   # be _inside_ the block to avoid polluting the "set -x" output
+   #
+
+   test_eval_inner_ "$@" &3 2>&4
{
-   test_eval_inner_ "$@" &3 2>&4
test_eval_ret_=$?
if want_trace
then
set +x
-   if test "$test_eval_ret_" != 0
-   then
-   say_color error >&4 "error: last command exited 
with \$?=$test_eval_ret_"
-   fi
fi
-   } 2>/dev/null
+   } 2>/dev/null 4>&2
+
+   if test "$test_eval_ret_" != 0 && want_trace
+   then
+   say_color error >&4 "error: last command exited with 
\$?=$test_eval_ret_"
+   fi
return $test_eval_ret_
 }
 
-- 
2.15.1.659.g8bd2eae3ea



[PATCH] refs: drop "clear packed-refs while locked" assertion

2017-12-08 Thread Jeff King
This patch fixes a regression in v2.14.0. It's actually fixed already in
v2.15.0 because all of the packed-ref code there was rewritten. So
there's no point in applying this on "master" or even "maint". But I
figured it was worth sharing here in case somebody else runs across it,
and in case we ever do a v2.14.4 release.

-- >8 --
In clear_packed_ref_cache(), we assert that we're not
currently holding the packed-refs lock. But in each of the
three code paths that can hit this, the assertion is either
a noop or actively does the wrong thing:

 1. in rollback_packed_refs(), we will have just released
the lock before calling the function, and so the
assertion can never trigger.

 2. get_packed_ref_cache() can reach this assertion via
validate_packed_ref_cache(). But it calls the validate
function only when it knows that we're not holding the
lock, so again, the assertion can never trigger.

 3. lock_packed_refs() also calls validate_packed_ref_cache().
In this case we're _always_ holding the lock, which
means any time the validate function has to clear the
cache, we'll trigger this assertion and die.

This doesn't happen often in practice because the
validate function clears the cache only if we find that
somebody else has racily rewritten the packed-refs file
between the time we read it and the time we took the lock.

So most of the time we don't reach the assertion at all
(nobody has racily written the file so there's no need
to clear the cache). And when we do, it is not actually
indicative of a bug; clearing the cache while holding
the lock is the right thing to do here.

This final case is relatively new, being triggerd by the
extra validation added in fed6ebebf1 (lock_packed_refs():
fix cache validity check, 2017-06-12).

Signed-off-by: Jeff King 
---
 refs/files-backend.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index f21a954ce7..dd41e1d382 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -99,8 +99,6 @@ static void clear_packed_ref_cache(struct files_ref_store 
*refs)
if (refs->packed) {
struct packed_ref_cache *packed_refs = refs->packed;
 
-   if (is_lock_file_locked(&refs->packed_refs_lock))
-   die("BUG: packed-ref cache cleared while locked");
refs->packed = NULL;
release_packed_ref_cache(packed_refs);
}
-- 
2.15.1.659.g8bd2eae3ea


Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)

2017-12-08 Thread Johannes Schindelin
Hi,

On Fri, 8 Dec 2017, Torsten Bögershausen wrote:

> > * tb/check-crlf-for-safe-crlf (2017-11-27) 1 commit
> >   (merged to 'next' on 2017-12-05 at 7adaa1fe01)
> >  + convert: tighten the safe autocrlf handling
> > 
> >  The "safe crlf" check incorrectly triggered for contents that does
> >  not use CRLF as line endings, which has been corrected.
> > 
> >  Broken on Windows???
> >  cf. 
> 
> Yes, broken on Windows. A fix is coming the next days.

We might want to consider using a saner Continuous Testing workflow, to
avoid re-testing (and re-finding) breakages in individual patch series
just because completely unrelated patch got updated.

I mean, yes, it seemed like a good idea a long time ago to have One Branch
that contains All The Patch Series Currently Cooking, back when our most
reliable (because only) test facilities were poor humans.

But we see how many more subtle bugs are spotted nowadays where Git's
source code is tested automatically on a growing number of Operating
System/CPU architecture "coordinates", and it is probably time to save
some human resources.

How about testing the individual branches instead?

This would save me a ton of time, as bisecting is just too expensive given
the scattered base commits of the branches smooshed into `pu`. (There is a
new Git/Error.pm breakage in pu for about a week that I simply have not
gotten around to, or better put: that I did not want to tackle given the
time committment).)

Ciao,
Dscho

Kindly Assist Me

2017-12-08 Thread Mr. Sonami
Kindly Assist Me


In good faith from Mr.Sonami, actually could you please consider to
help me to relocate this sum of five million, three hundred thousand
dollars (US$5.3 m) to your country for establishing a medium industry
in your country? The said 5.3 million dollars was deposited in our
bank by Mr. Jean-Noël Rey a Swiss land citizen who died in a terrorist
attacks in 2016. We have never met before, but need not to worry as I
am using the only secured and confidential medium available to seek
for your foreign assistance in this business. I am contacting you
independently of my investigation and no one is informed of this
communication.

I contacted you in this transaction on my search for a reliable,
capable partner and I discovered that the late Swiss died along side
with his supposed to be next of kin through CNN News on terrorist
attacks. I will give you all vital information concerning the Swiss
and the 5.3 million dollars in our custody so that you will contact my
bank for them to release the money to you. You can come here in person
or you can request the bank to give you the contact of the bank
lawyer's who can represent your interest in the transfer process.

Reply and send this information to me at: sonami...@gmail.com

Your full Name.
Age...
Country
Occupation..
Your Telephone Numbers.

Have a nice day and good luck
Mr.Sonami


[PATCH Outreachy 1/2] format: create pretty.h file

2017-12-08 Thread Olga Telezhnaya
Create header for pretty.c to make formatting interface more structured.
This is a middle point, this file would be merged futher with other
files which contain formatting stuff.

Signed-off-by: Olga Telezhnaia 
Mentored-by: Christian Couder 
Mentored by: Jeff King 
---
 archive.c |  1 +
 builtin/notes.c   |  2 +-
 builtin/reset.c   |  2 +-
 builtin/show-branch.c |  2 +-
 combine-diff.c|  1 +
 commit.c  |  1 +
 commit.h  | 80 --
 diffcore-pickaxe.c|  1 +
 grep.c|  1 +
 log-tree.c|  1 +
 notes-cache.c |  1 +
 pretty.h  | 87 +++
 revision.h|  2 +-
 sequencer.c   |  1 +
 sha1_name.c   |  1 +
 submodule.c   |  1 +
 16 files changed, 101 insertions(+), 84 deletions(-)
 create mode 100644 pretty.h

diff --git a/archive.c b/archive.c
index 0b7b62af0c3ec..60607e8c00857 100644
--- a/archive.c
+++ b/archive.c
@@ -2,6 +2,7 @@
 #include "config.h"
 #include "refs.h"
 #include "commit.h"
+#include "pretty.h"
 #include "tree-walk.h"
 #include "attr.h"
 #include "archive.h"
diff --git a/builtin/notes.c b/builtin/notes.c
index 1a2c7d92ad7e7..7c8176164561b 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -12,7 +12,7 @@
 #include "builtin.h"
 #include "notes.h"
 #include "blob.h"
-#include "commit.h"
+#include "pretty.h"
 #include "refs.h"
 #include "exec_cmd.h"
 #include "run-command.h"
diff --git a/builtin/reset.c b/builtin/reset.c
index 906e541658230..e15f595799c40 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -12,7 +12,7 @@
 #include "lockfile.h"
 #include "tag.h"
 #include "object.h"
-#include "commit.h"
+#include "pretty.h"
 #include "run-command.h"
 #include "refs.h"
 #include "diff.h"
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 2e24b5c330e8e..e8a4aa40cb4b6 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -1,6 +1,6 @@
 #include "cache.h"
 #include "config.h"
-#include "commit.h"
+#include "pretty.h"
 #include "refs.h"
 #include "builtin.h"
 #include "color.h"
diff --git a/combine-diff.c b/combine-diff.c
index 2505de119a2be..01ba1b03a06d2 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "commit.h"
+#include "pretty.h"
 #include "blob.h"
 #include "diff.h"
 #include "diffcore.h"
diff --git a/commit.c b/commit.c
index cab8d4455bdbd..ac17a27a4ab0a 100644
--- a/commit.c
+++ b/commit.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "tag.h"
 #include "commit.h"
+#include "pretty.h"
 #include "pkt-line.h"
 #include "utf8.h"
 #include "diff.h"
diff --git a/commit.h b/commit.h
index 99a3fea68d3f6..41a2067809444 100644
--- a/commit.h
+++ b/commit.h
@@ -121,93 +121,13 @@ struct commit_list *copy_commit_list(struct commit_list 
*list);
 
 void free_commit_list(struct commit_list *list);
 
-/* Commit formats */
-enum cmit_fmt {
-   CMIT_FMT_RAW,
-   CMIT_FMT_MEDIUM,
-   CMIT_FMT_DEFAULT = CMIT_FMT_MEDIUM,
-   CMIT_FMT_SHORT,
-   CMIT_FMT_FULL,
-   CMIT_FMT_FULLER,
-   CMIT_FMT_ONELINE,
-   CMIT_FMT_EMAIL,
-   CMIT_FMT_MBOXRD,
-   CMIT_FMT_USERFORMAT,
-
-   CMIT_FMT_UNSPECIFIED
-};
-
-static inline int cmit_fmt_is_mail(enum cmit_fmt fmt)
-{
-   return (fmt == CMIT_FMT_EMAIL || fmt == CMIT_FMT_MBOXRD);
-}
-
 struct rev_info; /* in revision.h, it circularly uses enum cmit_fmt */
 
-struct pretty_print_context {
-   /*
-* Callers should tweak these to change the behavior of pp_* functions.
-*/
-   enum cmit_fmt fmt;
-   int abbrev;
-   const char *after_subject;
-   int preserve_subject;
-   struct date_mode date_mode;
-   unsigned date_mode_explicit:1;
-   int print_email_subject;
-   int expand_tabs_in_log;
-   int need_8bit_cte;
-   char *notes_message;
-   struct reflog_walk_info *reflog_info;
-   struct rev_info *rev;
-   const char *output_encoding;
-   struct string_list *mailmap;
-   int color;
-   struct ident_split *from_ident;
-
-   /*
-* Fields below here are manipulated internally by pp_* functions and
-* should not be counted on by callers.
-*/
-   struct string_list in_body_headers;
-   int graph_width;
-};
-
-struct userformat_want {
-   unsigned notes:1;
-};
-
 extern int has_non_ascii(const char *text);
 extern const char *logmsg_reencode(const struct commit *commit,
   char **commit_encoding,
   const char *output_encoding);
-extern void get_commit_format(const char *arg, struct rev_info *);
-extern const char *format_subject(struct strbuf *sb, const char *msg,
- const char *line_separator);
-extern void userformat_find_requirements(const char *fmt, struct 
userformat_want *w);
-extern int commit_format_is_empty(enum cmit_fmt);
 ex

[PATCH Outreachy 2/2] format: create docs for pretty.h

2017-12-08 Thread Olga Telezhnaya
Write some docs for functions in pretty.h.
Take it as a first draft, they would be changed later.

Signed-off-by: Olga Telezhnaia 
Mentored-by: Christian Couder 
Mentored by: Jeff King 
---
 pretty.h | 44 
 1 file changed, 44 insertions(+)

diff --git a/pretty.h b/pretty.h
index ef5167484fb64..5c85d94e332d7 100644
--- a/pretty.h
+++ b/pretty.h
@@ -48,6 +48,7 @@ struct pretty_print_context {
int graph_width;
 };
 
+/* Check whether commit format is mail. */
 static inline int cmit_fmt_is_mail(enum cmit_fmt fmt)
 {
return (fmt == CMIT_FMT_EMAIL || fmt == CMIT_FMT_MBOXRD);
@@ -57,31 +58,74 @@ struct userformat_want {
unsigned notes:1;
 };
 
+/* Set the flag "w->notes" if there is placeholder %N in "fmt". */
 void userformat_find_requirements(const char *fmt, struct userformat_want *w);
+
+/*
+ * Shortcut for invoking pretty_print_commit if we do not have any context.
+ * Context would be set empty except "fmt".
+ */
 void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit,
struct strbuf *sb);
+
+/*
+ * Get information about user and date from "line", format it and
+ * put it into "sb".
+ * Format of "line" must be readable for split_ident_line function.
+ * The resulting format is "what: name  date".
+ */
 void pp_user_info(struct pretty_print_context *pp, const char *what,
struct strbuf *sb, const char *line,
const char *encoding);
+
+/*
+ * Format title line of commit message taken from "msg_p" and
+ * put it into "sb".
+ * First line of "msg_p" is also affected.
+ */
 void pp_title_line(struct pretty_print_context *pp, const char **msg_p,
struct strbuf *sb, const char *encoding,
int need_8bit_cte);
+
+/*
+ * Get current state of commit message from "msg_p" and continue formatting
+ * by adding indentation and '>' signs. Put result into "sb".
+ */
 void pp_remainder(struct pretty_print_context *pp, const char **msg_p,
struct strbuf *sb, int indent);
 
+/*
+ * Create a text message about commit using given "format" and "context".
+ * Put the result to "sb".
+ * Please use this function for custom formats.
+ */
 void format_commit_message(const struct commit *commit,
const char *format, struct strbuf *sb,
const struct pretty_print_context *context);
 
+/*
+ * Parse given arguments from "arg", check it for correctness and
+ * fill struct rev_info.
+ */
 void get_commit_format(const char *arg, struct rev_info *);
 
+/*
+ * Make a commit message with all rules from given "pp"
+ * and put it into "sb".
+ * Please use this function if you have a context (candidate for "pp").
+ */
 void pretty_print_commit(struct pretty_print_context *pp,
const struct commit *commit,
struct strbuf *sb);
 
+/*
+ * Change line breaks in "msg" to "line_separator" and put it into "sb".
+ * Return "msg" itself.
+ */
 const char *format_subject(struct strbuf *sb, const char *msg,
const char *line_separator);
 
+/* Check if "cmit_fmt" will produce an empty output. */
 int commit_format_is_empty(enum cmit_fmt);
 
 #endif /* PRETTY_H */

--
https://github.com/git/git/pull/439


Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Junio C Hamano
Stefan Beller  writes:

> diff --git a/diffcore-blobfind.c b/diffcore-blobfind.c
> new file mode 100644
> index 00..e65c7cad6e
> --- /dev/null
> +++ b/diffcore-blobfind.c
> @@ -0,0 +1,41 @@
> +/*
> + * Copyright (c) 2017 Google Inc.
> + */
> +#include "cache.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +
> +static void diffcore_filter_blobs(struct diff_queue_struct *q,
> +   struct diff_options *options)
> +{
> + int src, dst;
> +
> + if (!options->blobfind)
> + BUG("blobfind oidset not initialized???");
> +
> + for (src = dst = 0; src < q->nr; src++) {
> + struct diff_filepair *p = q->queue[src];
> +
> + if ((DIFF_FILE_VALID(p->one) &&
> +  oidset_contains(options->blobfind, &p->one->oid)) ||
> + (DIFF_FILE_VALID(p->two) &&
> +  oidset_contains(options->blobfind, &p->two->oid))) {

Shouldn't this make sure that !DIFF_PAIR_UNMERGED(p) before looking
at the sides of the pair?  I think an unmerged pair is queued with
filespecs whose modes are cleared, but we should not be relying on
that---I think we even had bugs we fixed by introducing an explicit
is_unmerged field to the filepair struct.

> @@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
>   simplify_merges(revs);
>   if (revs->children.name)
>   set_children(revs);
> + if (revs->diffopt.blobfind)
> + revs->simplify_history = 0;
>   return 0;
>  }

It makes sense to clear this bit by default, but is this an
unconditonal clearing?  In other words, is there a way for the user
to countermand this default and ask for merge simplification from
the command line, e.g. "diff --simplify-history --blobfind="?

> +test_description='test finding specific blobs in the revision walking'
> +. ./test-lib.sh
> +
> +test_expect_success 'setup ' '
> + git commit --allow-empty -m "empty initial commit" &&
> +
> + echo "Hello, world!" >greeting &&
> + git add greeting &&
> + git commit -m "add the greeting blob" && # borrowed from Git from the 
> Bottom Up
> + git tag -m "the blob" greeting $(git rev-parse HEAD:greeting) &&
> +
> + echo asdf >unrelated &&
> + git add unrelated &&
> + git commit -m "unrelated history" &&
> +
> + git revert HEAD^ &&
> +
> + git commit --allow-empty -m "another unrelated commit"
> +'
> +
> +test_expect_success 'find the greeting blob' '
> + cat >expect <<-EOF &&
> + Revert "add the greeting blob"
> + add the greeting blob
> + EOF
> +
> + git log --abbrev=12 --oneline --blobfind=greeting^{blob} >actual.raw &&
> + cut -c 14- actual.raw >actual &&
> + test_cmp expect actual

The hardcoded numbers look strange and smell like a workaround of
not asking for full object names.

Also, now it has the simplify-history bit in the change, don't we
want to check a mergy history, and also running "diff-files" while
the index has unmerged entries?

Other than that, the changes look quite sane and pleasnt read.

Thanks.


> +'
> +
> +test_done


Re: [PATCH v2 4/4] t/Makefile: introduce TEST_SHELL_PATH

2017-12-08 Thread Johannes Schindelin
Hi Peff,

the other three patches look good to me.

On Fri, 8 Dec 2017, Jeff King wrote:

> You may want to run the test suite with a different shell
> than you use to build Git. For instance, you may build with
> SHELL_PATH=/bin/sh (because it's faster, or it's what you
> expect to exist on systems where the build will be used) but
> want to run the test suite with bash (e.g., since that
> allows using "-x" reliably across the whole test suite).
> There's currently no good way to do this.
> 
> You might think that doing two separate make invocations,
> like:
> 
>   make &&
>   make -C t SHELL_PATH=/bin/bash
> 
> would work. And it _almost_ does. The second make will see
> our bash SHELL_PATH, and we'll use that to run the
> individual test scripts (or tell prove to use it to do so).
> So far so good.
> 
> But this breaks down when "--tee" or "--verbose-log" is
> used. Those options cause the test script to actually
> re-exec itself using $SHELL_PATH. But wait, wouldn't our
> second make invocation have set SHELL_PATH correctly in the
> environment?
> 
> Yes, but test-lib.sh sources GIT-BUILD-OPTIONS, which we
> built during the first "make". And that overrides the
> environment, giving us the original SHELL_PATH again.

... and we could simply see whether the environment variable
TEST_SHELL_PATH (which we would set in t/Makefile from the passed-in
SHELL_PATH) is set, and override it again.

I still think we can do without recording test-phase details in the
build-phase (which may, or may not, know what the test-phase wants to do).

In other words, I believe that we can make the invocation you mentioned
above work, by touching only t/Makefile (to pass SHELL_PATH as
TEST_SHELL_PATH) and t/test-lib.sh (to override the SHELL_PATH from
GIT-BUILD-OPTIONS with TEST_SHELL_PATH, if set).

Ciao,
Dscho


[PATCH] doc: reword gitworflows for neutrality

2017-12-08 Thread Daniel Bensoussan
Changed 'he' to 'them' to be more neutral in "gitworkflows.txt".

See discussion at: 
https://public-inbox.org/git/xmqqvahieeqy@gitster.mtv.corp.google.com/

Signed-off-by: Matthieu Moy 
Signed-off-by: Timothee Albertin 
Signed-off-by: Nathan Payre 
Signed-off-by: Daniel Bensoussan 
---
 Documentation/gitworkflows.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/gitworkflows.txt b/Documentation/gitworkflows.txt
index 02569d0..926e044 100644
--- a/Documentation/gitworkflows.txt
+++ b/Documentation/gitworkflows.txt
@@ -407,8 +407,8 @@ follows.
 `git pull  `
 =
 
-Occasionally, the maintainer may get merge conflicts when he tries to
-pull changes from downstream.  In this case, he can ask downstream to
+Occasionally, the maintainer may get merge conflicts when they try to
+pull changes from downstream.  In this case, they can ask downstream to
 do the merge and resolve the conflicts themselves (perhaps they will
 know better how to resolve them).  It is one of the rare cases where
 downstream 'should' merge from upstream.
-- 
2.11.0



[PATCH v7 00/10] Partial clone part 2: fsck and promisors

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

This is V7 of part 2 of partial clone.  This builds upon V6 of part 1.

This version squashes the fixup commits that I added to the V6p2 series.
The net result is identical.

Jonathan Tan (10):
  extension.partialclone: introduce partial clone extension
  fsck: introduce partialclone extension
  fsck: support refs pointing to promisor objects
  fsck: support referenced promisor objects
  fsck: support promisor objects as CLI argument
  index-pack: refactor writing of .keep files
  introduce fetch-object: fetch one promisor object
  sha1_file: support lazily fetching missing objects
  rev-list: support termination at promisor objects
  gc: do not repack promisor packfiles

 Documentation/git-pack-objects.txt |  11 +
 Documentation/gitremote-helpers.txt|   7 +
 Documentation/rev-list-options.txt |  11 +
 Documentation/technical/repository-version.txt |  12 +
 Makefile   |   1 +
 builtin/cat-file.c |   2 +
 builtin/fetch-pack.c   |  10 +
 builtin/fsck.c |  26 +-
 builtin/gc.c   |   3 +
 builtin/index-pack.c   | 113 
 builtin/pack-objects.c |  37 ++-
 builtin/prune.c|   7 +
 builtin/repack.c   |   8 +-
 builtin/rev-list.c |  71 -
 cache.h|  13 +-
 environment.c  |   1 +
 fetch-object.c |  27 ++
 fetch-object.h |   6 +
 fetch-pack.c   |  48 ++--
 fetch-pack.h   |   8 +
 list-objects.c |  29 ++-
 object.c   |   2 +-
 packfile.c |  77 +-
 packfile.h |  13 +
 remote-curl.c  |  14 +-
 revision.c |  33 ++-
 revision.h |   5 +-
 setup.c|   7 +-
 sha1_file.c|  32 ++-
 t/t0410-partial-clone.sh   | 343 +
 transport.c|   8 +
 transport.h|  11 +
 32 files changed, 899 insertions(+), 97 deletions(-)
 create mode 100644 fetch-object.c
 create mode 100644 fetch-object.h
 create mode 100755 t/t0410-partial-clone.sh

-- 
2.9.3



[PATCH v7 03/10] fsck: support refs pointing to promisor objects

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach fsck to not treat refs referring to missing promisor objects as an
error when extensions.partialclone is set.

For the purposes of warning about no default refs, such refs are still
treated as legitimate refs.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c   |  8 
 t/t0410-partial-clone.sh | 24 
 2 files changed, 32 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 2934299..ee937bb 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -434,6 +434,14 @@ static int fsck_handle_ref(const char *refname, const 
struct object_id *oid,
 
obj = parse_object(oid);
if (!obj) {
+   if (is_promisor_object(oid)) {
+   /*
+* Increment default_refs anyway, because this is a
+* valid ref.
+*/
+default_refs++;
+return 0;
+   }
error("%s: invalid sha1 pointer %s", refname, oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
/* We'll continue with the rest despite the error.. */
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 3ddb3b9..bf75162 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -13,6 +13,14 @@ pack_as_from_promisor () {
>repo/.git/objects/pack/pack-$HASH.promisor
 }
 
+promise_and_delete () {
+   HASH=$(git -C repo rev-parse "$1") &&
+   git -C repo tag -a -m message my_annotated_tag "$HASH" &&
+   git -C repo rev-parse my_annotated_tag | pack_as_from_promisor &&
+   git -C repo tag -d my_annotated_tag &&
+   delete_object repo "$HASH"
+}
+
 test_expect_success 'missing reflog object, but promised by a commit, passes 
fsck' '
test_create_repo repo &&
test_commit -C repo my_commit &&
@@ -78,4 +86,20 @@ test_expect_success 'missing reflog object alone fails fsck, 
even with extension
test_must_fail git -C repo fsck
 '
 
+test_expect_success 'missing ref object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo my_commit &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+
+   # Reference $A only from ref
+   git -C repo branch my_branch "$A" &&
+   promise_and_delete "$A" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.partialclone "arbitrary string" &&
+   git -C repo fsck
+'
+
 test_done
-- 
2.9.3



[PATCH v7 01/10] extension.partialclone: introduce partial clone extension

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Introduce new repository extension option:
`extensions.partialclone`

See the update to Documentation/technical/repository-version.txt
in this patch for more information.

Signed-off-by: Jonathan Tan 
---
 Documentation/technical/repository-version.txt | 12 
 cache.h|  2 ++
 environment.c  |  1 +
 setup.c|  7 ++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/repository-version.txt 
b/Documentation/technical/repository-version.txt
index 00ad379..e03eacc 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -86,3 +86,15 @@ for testing format-1 compatibility.
 When the config key `extensions.preciousObjects` is set to `true`,
 objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
 `git repack -d`).
+
+`partialclone`
+~~
+
+When the config key `extensions.partialclone` is set, it indicates
+that the repo was created with a partial clone (or later performed
+a partial fetch) and that the remote may have omitted sending
+certain unwanted objects.  Such a remote is called a "promisor remote"
+and it promises that all such omitted objects can be fetched from it
+in the future.
+
+The value of this key is the name of the promisor remote.
diff --git a/cache.h b/cache.h
index 6440e2b..35e3f5e 100644
--- a/cache.h
+++ b/cache.h
@@ -860,10 +860,12 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION 0
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
+extern char *repository_format_partial_clone;
 
 struct repository_format {
int version;
int precious_objects;
+   char *partial_clone; /* value of extensions.partialclone */
int is_bare;
char *work_tree;
struct string_list unknown_extensions;
diff --git a/environment.c b/environment.c
index 8289c25..e52aab3 100644
--- a/environment.c
+++ b/environment.c
@@ -27,6 +27,7 @@ int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
+char *repository_format_partial_clone;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 const char *apply_default_whitespace;
diff --git a/setup.c b/setup.c
index 03f51e0..58536bd 100644
--- a/setup.c
+++ b/setup.c
@@ -420,7 +420,11 @@ static int check_repo_format(const char *var, const char 
*value, void *vdata)
;
else if (!strcmp(ext, "preciousobjects"))
data->precious_objects = git_config_bool(var, value);
-   else
+   else if (!strcmp(ext, "partialclone")) {
+   if (!value)
+   return config_error_nonbool(var);
+   data->partial_clone = xstrdup(value);
+   } else
string_list_append(&data->unknown_extensions, ext);
} else if (strcmp(var, "core.bare") == 0) {
data->is_bare = git_config_bool(var, value);
@@ -463,6 +467,7 @@ static int check_repository_format_gently(const char 
*gitdir, int *nongit_ok)
}
 
repository_format_precious_objects = candidate.precious_objects;
+   repository_format_partial_clone = candidate.partial_clone;
string_list_clear(&candidate.unknown_extensions, 0);
if (!has_common) {
if (candidate.is_bare != -1) {
-- 
2.9.3



[PATCH v7 07/10] introduce fetch-object: fetch one promisor object

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.

This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-dependents" to indicate that only the objects
themselves need to be fetched (but fetching additional objects is
nevertheless safe).

Whenever "no-dependents" is used, fetch-pack will refrain from using any
object flags, because it is most likely invoked as part of a dynamic
object fetch by another Git command (which may itself use object flags).
An alternative to this is to leave fetch-pack alone, and instead update
the allocation of flags so that fetch-pack's flags never overlap with
any others, but this will end up shrinking the number of flags available
to nearly every other Git command (that is, every Git command that
accesses objects), so the approach in this commit was used instead.

This will be tested in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 Documentation/gitremote-helpers.txt |  7 ++
 Makefile|  1 +
 builtin/fetch-pack.c|  8 +++
 builtin/index-pack.c| 16 ++---
 fetch-object.c  | 24 +++
 fetch-object.h  |  6 +
 fetch-pack.c| 48 +
 fetch-pack.h|  8 +++
 remote-curl.c   | 14 ++-
 transport.c |  8 +++
 transport.h | 11 +
 11 files changed, 126 insertions(+), 25 deletions(-)
 create mode 100644 fetch-object.c
 create mode 100644 fetch-object.h

diff --git a/Documentation/gitremote-helpers.txt 
b/Documentation/gitremote-helpers.txt
index 4a584f3..4b8c93e 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -466,6 +466,13 @@ set by Git if the remote helper has the 'option' 
capability.
Transmit  as a push option. As the push option
must not contain LF or NUL characters, the string is not encoded.
 
+'option from-promisor' {'true'|'false'}::
+   Indicate that these objects are being fetched from a promisor.
+
+'option no-dependents' {'true'|'false'}::
+   Indicate that only the objects wanted need to be fetched, not
+   their dependents.
+
 SEE ALSO
 
 linkgit:git-remote[1]
diff --git a/Makefile b/Makefile
index ca378a4..795e0c7 100644
--- a/Makefile
+++ b/Makefile
@@ -792,6 +792,7 @@ LIB_OBJS += ewah/ewah_bitmap.o
 LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
+LIB_OBJS += fetch-object.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += gettext.o
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d1..02abe72 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -143,6 +143,14 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
args.update_shallow = 1;
continue;
}
+   if (!strcmp("--from-promisor", arg)) {
+   args.from_promisor = 1;
+   continue;
+   }
+   if (!strcmp("--no-dependents", arg)) {
+   args.no_dependents = 1;
+   continue;
+   }
usage(fetch_pack_usage);
}
if (deepen_not.nr)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 4f305a7..24c2f05 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1429,14 +1429,16 @@ static void write_special_file(const char *suffix, 
const char *msg,
if (close(fd) != 0)
die_errno(_("cannot close written %s file '%s'"),
  suffix, filename);
-   *report = suffix;
+   if (report)
+   *report = suffix;
}
strbuf_release(&name_buf);
 }
 
 static void final(const char *final_pack_name, const char *curr_pack_name,
  const char *final_index_name, const char *curr_index_name,
- const char *keep_msg, unsigned char *sha1)
+ const char *keep_msg, const char *promisor_msg,
+ unsigned char *sha1)
 {
const char *report = "pack";
struct strbuf pack_name = STRBUF_INIT;
@@ -1455,6 +1457,9 @@ static void final(const char *final_pack_name, const char 
*curr_pack_name,
if (keep_msg)
write_special_file("keep", keep_msg, final_pack_name, sha1,
   &report);
+   if (promisor_msg)
+   write_special_file("promisor", promisor_msg, final_pack_name,
+  sha1, NULL);
 
if (final_pack_name != curr_pack_name) {

[PATCH v7 09/10] rev-list: support termination at promisor objects

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach rev-list to support termination of an object traversal at any
object from a promisor remote (whether one that the local repo also has,
or one that the local repo knows about because it has another promisor
object that references it).

This will be used subsequently in gc and in the connectivity check used
by fetch.

For efficiency, if an object is referenced by a promisor object, and is
in the local repo only as a non-promisor object, object traversal will
not stop there. This is to avoid building the list of promisor object
references.

(In list-objects.c, the case where obj is NULL in process_blob() and
process_tree() do not need to be changed because those happen only when
there is a conflict between the expected type and the existing object.
If the object doesn't exist, an object will be synthesized, which is
fine.)

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 Documentation/rev-list-options.txt |  11 
 builtin/rev-list.c |  71 +++---
 list-objects.c |  29 ++-
 object.c   |   2 +-
 revision.c |  33 +++-
 revision.h |   5 +-
 t/t0410-partial-clone.sh   | 101 +
 7 files changed, 240 insertions(+), 12 deletions(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 8d8b7f4..0ce8ccd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -745,10 +745,21 @@ The form '--missing=allow-any' will allow object 
traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
 +
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing objects will raise an error.
++
 The form '--missing=print' is like 'allow-any', but will also print a
 list of the missing objects.  Object IDs are prefixed with a ``?'' character.
 endif::git-rev-list[]
 
+--exclude-promisor-objects::
+   (For internal use only.)  Prefilter object traversal at
+   promisor boundary.  This is used with partial clone.  This is
+   stronger than `--missing=allow-promisor` because it limits the
+   traversal, rather than just silencing errors about missing
+   objects.
+
 --no-walk[=(sorted|unsorted)]::
Only show the given commits, but do not traverse their ancestors.
This has no effect if a range is specified. If the argument
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 547a3e0..48f922d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "reflog-walk.h"
 #include "oidset.h"
+#include "packfile.h"
 
 static const char rev_list_usage[] =
 "git rev-list [OPTION] ... [ -- paths... ]\n"
@@ -67,6 +68,7 @@ enum missing_action {
MA_ERROR = 0,/* fail if any missing objects are encountered */
MA_ALLOW_ANY,/* silently allow ALL missing objects */
MA_PRINT,/* print ALL missing objects in special section */
+   MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 
@@ -197,6 +199,12 @@ static void finish_commit(struct commit *commit, void 
*data)
 
 static inline void finish_object__ma(struct object *obj)
 {
+   /*
+* Whether or not we try to dynamically fetch missing objects
+* from the server, we currently DO NOT have the object.  We
+* can either print, allow (ignore), or conditionally allow
+* (ignore) them.
+*/
switch (arg_missing_action) {
case MA_ERROR:
die("missing blob object '%s'", oid_to_hex(&obj->oid));
@@ -209,25 +217,36 @@ static inline void finish_object__ma(struct object *obj)
oidset_insert(&missing_objects, &obj->oid);
return;
 
+   case MA_ALLOW_PROMISOR:
+   if (is_promisor_object(&obj->oid))
+   return;
+   die("unexpected missing blob object '%s'",
+   oid_to_hex(&obj->oid));
+   return;
+
default:
BUG("unhandled missing_action");
return;
}
 }
 
-static void finish_object(struct object *obj, const char *name, void *cb_data)
+static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid))
+   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
finish_object__ma(obj);
+   return 1;
+   }
if (info->revs->verify_objects && !obj->parsed && obj->type != 
OBJ_COMMIT)
parse_object(&obj->oid);
+   return 0;
 }
 
 static void show_object(stru

[PATCH v7 06/10] index-pack: refactor writing of .keep files

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

In a subsequent commit, index-pack will be taught to write ".promisor"
files which are similar to the ".keep" files it knows how to write.
Refactor the writing of ".keep" files, so that the implementation of
writing ".promisor" files becomes easier.

Signed-off-by: Jonathan Tan 
---
 builtin/index-pack.c | 99 
 1 file changed, 53 insertions(+), 46 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8ec459f..4f305a7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1389,15 +1389,58 @@ static void fix_unresolved_deltas(struct sha1file *f)
free(sorted_by_pos);
 }
 
+static const char *derive_filename(const char *pack_name, const char *suffix,
+  struct strbuf *buf)
+{
+   size_t len;
+   if (!strip_suffix(pack_name, ".pack", &len))
+   die(_("packfile name '%s' does not end with '.pack'"),
+   pack_name);
+   strbuf_add(buf, pack_name, len);
+   strbuf_addch(buf, '.');
+   strbuf_addstr(buf, suffix);
+   return buf->buf;
+}
+
+static void write_special_file(const char *suffix, const char *msg,
+  const char *pack_name, const unsigned char *sha1,
+  const char **report)
+{
+   struct strbuf name_buf = STRBUF_INIT;
+   const char *filename;
+   int fd;
+   int msg_len = strlen(msg);
+
+   if (pack_name)
+   filename = derive_filename(pack_name, suffix, &name_buf);
+   else
+   filename = odb_pack_name(&name_buf, sha1, suffix);
+
+   fd = odb_pack_keep(filename);
+   if (fd < 0) {
+   if (errno != EEXIST)
+   die_errno(_("cannot write %s file '%s'"),
+ suffix, filename);
+   } else {
+   if (msg_len > 0) {
+   write_or_die(fd, msg, msg_len);
+   write_or_die(fd, "\n", 1);
+   }
+   if (close(fd) != 0)
+   die_errno(_("cannot close written %s file '%s'"),
+ suffix, filename);
+   *report = suffix;
+   }
+   strbuf_release(&name_buf);
+}
+
 static void final(const char *final_pack_name, const char *curr_pack_name,
  const char *final_index_name, const char *curr_index_name,
- const char *keep_name, const char *keep_msg,
- unsigned char *sha1)
+ const char *keep_msg, unsigned char *sha1)
 {
const char *report = "pack";
struct strbuf pack_name = STRBUF_INIT;
struct strbuf index_name = STRBUF_INIT;
-   struct strbuf keep_name_buf = STRBUF_INIT;
int err;
 
if (!from_stdin) {
@@ -1409,28 +1452,9 @@ static void final(const char *final_pack_name, const 
char *curr_pack_name,
die_errno(_("error while closing pack file"));
}
 
-   if (keep_msg) {
-   int keep_fd, keep_msg_len = strlen(keep_msg);
-
-   if (!keep_name)
-   keep_name = odb_pack_name(&keep_name_buf, sha1, "keep");
-
-   keep_fd = odb_pack_keep(keep_name);
-   if (keep_fd < 0) {
-   if (errno != EEXIST)
-   die_errno(_("cannot write keep file '%s'"),
- keep_name);
-   } else {
-   if (keep_msg_len > 0) {
-   write_or_die(keep_fd, keep_msg, keep_msg_len);
-   write_or_die(keep_fd, "\n", 1);
-   }
-   if (close(keep_fd) != 0)
-   die_errno(_("cannot close written keep file 
'%s'"),
- keep_name);
-   report = "keep";
-   }
-   }
+   if (keep_msg)
+   write_special_file("keep", keep_msg, final_pack_name, sha1,
+  &report);
 
if (final_pack_name != curr_pack_name) {
if (!final_pack_name)
@@ -1472,7 +1496,6 @@ static void final(const char *final_pack_name, const char 
*curr_pack_name,
 
strbuf_release(&index_name);
strbuf_release(&pack_name);
-   strbuf_release(&keep_name_buf);
 }
 
 static int git_index_pack_config(const char *k, const char *v, void *cb)
@@ -1615,26 +1638,13 @@ static void show_pack_info(int stat_only)
}
 }
 
-static const char *derive_filename(const char *pack_name, const char *suffix,
-  struct strbuf *buf)
-{
-   size_t len;
-   if (!strip_suffix(pack_name, ".pack", &len))
-   die(_("packfile name '%s' does not end with '.pack'"),
-   pack_name);
-   strbuf_add(buf, pack_name, len);
-   strbuf_addstr(buf, suffix);
-   return 

[PATCH v7 08/10] sha1_file: support lazily fetching missing objects

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
 regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
 about the loose/packed distinction and operate on both differently,
 and ensure that they can handle the concept of objects that are
 neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
 - this searches a single pack that is provided as an argument; the
   caller already knows (through other means) that the sought object
   is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
 it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
 object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
 not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan 
---
 builtin/cat-file.c   |  2 ++
 builtin/fetch-pack.c |  2 ++
 builtin/fsck.c   |  3 +++
 builtin/index-pack.c |  6 ++
 cache.h  |  8 
 fetch-object.c   |  3 +++
 sha1_file.c  | 32 ++
 t/t0410-partial-clone.sh | 51 
 8 files changed, 99 insertions(+), 8 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f5fa4fd..cf9ea5c 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -475,6 +475,8 @@ static int batch_objects(struct batch_options *opt)
 
for_each_loose_object(batch_loose_object, &sa, 0);
for_each_packed_object(batch_packed_object, &sa, 0);
+   if (repository_format_partial_clone)
+   warning("This repository has extensions.partialClone 
set. Some objects may not be loaded.");
 
cb.opt = opt;
cb.expand = &data;
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 02abe72..15eeed7 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
struct oid_array shallow = OID_ARRAY_INIT;
struct string_list deepen_not = STRING_LIST_INIT_DUP;
 
+   fetch_if_missing = 0;
+
packet_trace_identity("fetch-pack");
 
memset(&args, 0, sizeof(args));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 578a7c8..3b76c0e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -678,6 +678,9 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
int i;
struct alternate_object_database *alt;
 
+   /* fsck knows how to handle missing promisor objects */
+   fetch_if_missing = 0;
+
errors_found = 0;
check_replace_refs = 0;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 24c2f05..a0a35e6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const 
char *prefix)
unsigned foreign_nr = 1;/* zero is a "good" value, assume bad */
int report_end_of_input = 0;
 
+   /*
+* index-pack never needs to fetch missing objects, since it only
+* accesses the repo to do hash collision checks
+*/
+   fetch_if_missing = 0;
+
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(index_pack_usage);
 
diff --git a/cache.h b/cache.h
index c76f2e9..6980072 100644
--- a/cache.h
+++ b/cache.h
@@ -1727,6 +1727,14 @@ struct object_info {
 #define OBJECT_INFO

[PATCH v7 10/10] gc: do not repack promisor packfiles

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 Documentation/git-pack-objects.txt | 11 
 builtin/gc.c   |  3 +++
 builtin/pack-objects.c | 37 --
 builtin/prune.c|  7 +
 builtin/repack.c   |  8 --
 t/t0410-partial-clone.sh   | 54 --
 6 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-pack-objects.txt 
b/Documentation/git-pack-objects.txt
index aa403d0..81bc490 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -255,6 +255,17 @@ a missing object is encountered.  This is the default 
action.
 The form '--missing=allow-any' will allow object traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
++
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing object will raise an error.
+
+--exclude-promisor-objects::
+   Omit objects that are known to be in the promisor remote.  (This
+   option has the purpose of operating only on locally created objects,
+   so that when we repack, we still maintain a distinction between
+   locally created objects [without .promisor] and objects from the
+   promisor remote [with .promisor].)  This is used with partial clone.
 
 SEE ALSO
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 3c5eae0..77fa720 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -458,6 +458,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
argv_array_push(&prune, prune_expire);
if (quiet)
argv_array_push(&prune, "--no-progress");
+   if (repository_format_partial_clone)
+   argv_array_push(&prune,
+   "--exclude-promisor-objects");
if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
return error(FAILED_RUN, prune.argv[0]);
}
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 45ad35d..f5fc401 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -75,6 +75,8 @@ static int use_bitmap_index = -1;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
+static int exclude_promisor_objects;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -84,8 +86,9 @@ static unsigned long window_memory_limit = 0;
 static struct list_objects_filter_options filter_options;
 
 enum missing_action {
-   MA_ERROR = 0,/* fail if any missing objects are encountered */
-   MA_ALLOW_ANY,/* silently allow ALL missing objects */
+   MA_ERROR = 0,  /* fail if any missing objects are encountered */
+   MA_ALLOW_ANY,  /* silently allow ALL missing objects */
+   MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
@@ -2577,6 +2580,20 @@ static void show_object__ma_allow_any(struct object 
*obj, const char *name, void
show_object(obj, name, data);
 }
 
+static void show_object__ma_allow_promisor(struct object *obj, const char 
*name, void *data)
+{
+   assert(arg_missing_action == MA_ALLOW_PROMISOR);
+
+   /*
+* Quietly ignore EXPECTED missing objects.  This avoids problems with
+* staging them now and getting an odd error later.
+*/
+   if (!has_object_file(&obj->oid) && is_promisor_object(&obj->oid))
+   return;
+
+   show_object(obj, name, data);
+}
+
 static int option_parse_missing_action(const struct option *opt,
   const char *arg, int unset)
 {
@@ -2591,10 +2608,18 @@ static int option_parse_missing_action(const struct 
option *opt,
 
if (!strcmp(arg, "allow-any")) {
arg_missing_action = MA_ALLOW_ANY;
+   fetch_if_missing = 0;
fn_show_object = show_object__ma_allow_any;
return 0;
}
 
+   if (!strcmp(arg, "allow-promisor")) {
+   arg_missing_action = MA_ALLOW_PROMISOR;
+   fetch_if_missing = 0;
+   fn_show_object = show_object__ma_allow_promisor;
+   return 0;
+   }
+
die(_("invalid value for --missing"));
return 0;
 }
@@ -3008,6 +3033,8 @@ int cmd_pack

[PATCH v7 02/10] fsck: introduce partialclone extension

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Currently, Git does not support repos with very large numbers of objects
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage. In such an arrangement, the full set of objects is usually
available in remote storage, ready to be lazily downloaded.

Teach fsck about the new state of affairs. In this commit, teach fsck
that missing promisor objects referenced from the reflog are not an
error case; in future commits, fsck will be taught about other cases.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c   |  2 +-
 cache.h  |  3 +-
 packfile.c   | 77 +++--
 packfile.h   | 13 
 t/t0410-partial-clone.sh | 81 
 5 files changed, 171 insertions(+), 5 deletions(-)
 create mode 100755 t/t0410-partial-clone.sh

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 56afe40..2934299 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -398,7 +398,7 @@ static void fsck_handle_reflog_oid(const char *refname, 
struct object_id *oid,
xstrfmt("%s@{%"PRItime"}", refname, 
timestamp));
obj->flags |= USED;
mark_object_reachable(obj);
-   } else {
+   } else if (!is_promisor_object(oid)) {
error("%s: invalid reflog entry %s", refname, 
oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
}
diff --git a/cache.h b/cache.h
index 35e3f5e..c76f2e9 100644
--- a/cache.h
+++ b/cache.h
@@ -1587,7 +1587,8 @@ extern struct packed_git {
unsigned pack_local:1,
 pack_keep:1,
 freshened:1,
-do_not_close:1;
+do_not_close:1,
+pack_promisor:1;
unsigned char sha1[20];
struct revindex_entry *revindex;
/* something like ".git/objects/pack/x.pack" */
diff --git a/packfile.c b/packfile.c
index 4a5fe7a..234797c 100644
--- a/packfile.c
+++ b/packfile.c
@@ -8,6 +8,11 @@
 #include "list.h"
 #include "streaming.h"
 #include "sha1-lookup.h"
+#include "commit.h"
+#include "object.h"
+#include "tag.h"
+#include "tree-walk.h"
+#include "tree.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -643,10 +648,10 @@ struct packed_git *add_packed_git(const char *path, 
size_t path_len, int local)
return NULL;
 
/*
-* ".pack" is long enough to hold any suffix we're adding (and
+* ".promisor" is long enough to hold any suffix we're adding (and
 * the use xsnprintf double-checks that)
 */
-   alloc = st_add3(path_len, strlen(".pack"), 1);
+   alloc = st_add3(path_len, strlen(".promisor"), 1);
p = alloc_packed_git(alloc);
memcpy(p->pack_name, path, path_len);
 
@@ -654,6 +659,10 @@ struct packed_git *add_packed_git(const char *path, size_t 
path_len, int local)
if (!access(p->pack_name, F_OK))
p->pack_keep = 1;
 
+   xsnprintf(p->pack_name + path_len, alloc - path_len, ".promisor");
+   if (!access(p->pack_name, F_OK))
+   p->pack_promisor = 1;
+
xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) {
free(p);
@@ -781,7 +790,8 @@ static void prepare_packed_git_one(char *objdir, int local)
if (ends_with(de->d_name, ".idx") ||
ends_with(de->d_name, ".pack") ||
ends_with(de->d_name, ".bitmap") ||
-   ends_with(de->d_name, ".keep"))
+   ends_with(de->d_name, ".keep") ||
+   ends_with(de->d_name, ".promisor"))
string_list_append(&garbage, path.buf);
else
report_garbage(PACKDIR_FILE_GARBAGE, path.buf);
@@ -1889,6 +1899,9 @@ int for_each_packed_object(each_packed_object_fn cb, void 
*data, unsigned flags)
for (p = packed_git; p; p = p->next) {
if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
continue;
+   if ((flags & FOR_EACH_OBJECT_PROMISOR_ONLY) &&
+   !p->pack_promisor)
+   continue;
if (open_pack_index(p)) {
pack_errors = 1;
continue;
@@ -1899,3 +1912,61 @@ int for_each_packed_object(each_packed_object_fn cb, 
void *data, unsigned flags)
}
return r ? r : pack_errors;
 }
+
+static int add_promisor_object(const struct object_id *oid,
+  struct packed_git *pack,
+

[PATCH v7 05/10] fsck: support promisor objects as CLI argument

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach fsck to not treat missing promisor objects provided on the CLI as
an error when extensions.partialclone is set.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c   |  2 ++
 t/t0410-partial-clone.sh | 13 +
 2 files changed, 15 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 4c2a56d..578a7c8 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -750,6 +750,8 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
struct object *obj = lookup_object(oid.hash);
 
if (!obj || !(obj->flags & HAS_OBJ)) {
+   if (is_promisor_object(&oid))
+   continue;
error("%s: object missing", oid_to_hex(&oid));
errors_found |= ERROR_OBJECT;
continue;
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4f9931f..e96f436 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -125,4 +125,17 @@ test_expect_success 'missing object, but promised, passes 
fsck' '
git -C repo fsck
 '
 
+test_expect_success 'missing CLI object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo my_commit &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+   promise_and_delete "$A" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.partialclone "arbitrary string" &&
+   git -C repo fsck "$A"
+'
+
 test_done
-- 
2.9.3



[PATCH v7 04/10] fsck: support referenced promisor objects

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach fsck to not treat missing promisor objects indirectly pointed to
by refs as an error when extensions.partialclone is set.

Signed-off-by: Jonathan Tan 
---
 builtin/fsck.c   | 11 +++
 t/t0410-partial-clone.sh | 23 +++
 2 files changed, 34 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index ee937bb..4c2a56d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -149,6 +149,15 @@ static int mark_object(struct object *obj, int type, void 
*data, struct fsck_opt
if (obj->flags & REACHABLE)
return 0;
obj->flags |= REACHABLE;
+
+   if (is_promisor_object(&obj->oid))
+   /*
+* Further recursion does not need to be performed on this
+* object since it is a promisor object (so it does not need to
+* be added to "pending").
+*/
+   return 0;
+
if (!(obj->flags & HAS_OBJ)) {
if (parent && !has_object_file(&obj->oid)) {
printf("broken link from %7s %s\n",
@@ -208,6 +217,8 @@ static void check_reachable_object(struct object *obj)
 * do a full fsck
 */
if (!(obj->flags & HAS_OBJ)) {
+   if (is_promisor_object(&obj->oid))
+   return;
if (has_sha1_pack(obj->oid.hash))
return; /* it is in pack - forget about it */
printf("missing %s %s\n", printable_type(obj),
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index bf75162..4f9931f 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -102,4 +102,27 @@ test_expect_success 'missing ref object, but promised, 
passes fsck' '
git -C repo fsck
 '
 
+test_expect_success 'missing object, but promised, passes fsck' '
+   rm -rf repo &&
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+   test_commit -C repo 2 &&
+   test_commit -C repo 3 &&
+   git -C repo tag -a annotated_tag -m "annotated tag" &&
+
+   C=$(git -C repo rev-parse 1) &&
+   T=$(git -C repo rev-parse 2^{tree}) &&
+   B=$(git hash-object repo/3.t) &&
+   AT=$(git -C repo rev-parse annotated_tag) &&
+
+   promise_and_delete "$C" &&
+   promise_and_delete "$T" &&
+   promise_and_delete "$B" &&
+   promise_and_delete "$AT" &&
+
+   git -C repo config core.repositoryformatversion 1 &&
+   git -C repo config extensions.partialclone "arbitrary string" &&
+   git -C repo fsck
+'
+
 test_done
-- 
2.9.3



Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)

2017-12-08 Thread Christian Couder
On Thu, Dec 7, 2017 at 7:04 PM, Junio C Hamano  wrote:


> * cc/skip-to-optional-val (2017-12-07) 7 commits
>  - t4045: test 'diff --relative' for real
>  - t4045: reindent to make helpers readable
>  - diff: use skip-to-optional-val in parsing --relative
>  - diff: use skip_to_optional_val_default()
>  - diff: use skip_to_optional_val()
>  - index-pack: use skip_to_optional_val()
>  - git-compat-util: introduce skip_to_optional_val()
>
>  Introduce a helper to simplify code to parse a common pattern that
>  expects either "--key" or "--key=".
>
>  Even though I queued fixes for "diff --relative" on top, it may
>  still want a final reroll to make it harder to misuse by allowing
>  NULL at the valp part of the argument.

Yeah, I already implemented that and it will be in the next v3 version.

> Also s/_val/_arg/.

I am not sure that is a good idea, because it could suggest that the
functions are designed to parse only command option arguments, while
they can be used to parse any "key=val" string where "key" is also
allowed.

>  cf. 
>  cf. 

It doesn't look like s/_val/_arg/ was discussed in the above messages.


Re: Unfortunate interaction between git diff-index and fakeroot

2017-12-08 Thread Junio C Hamano
Elazar Leibovich  writes:

> ignore unused information, such as commit
> 2cb45e95438c113871fbbea5b4f629f9463034e7
> which ignores st_dev, because it doesn't actually matter, or

I do not think it ignores because "it doesn't matter".  st_dev is
known not to be stable (e.g. across reboots and reconnects nfs), so
it is a poor indicator to use as a quick measure to see if the file
contents may have changed since we last checked.  On the other hand,
in a sane (it is of course open to debate "by whose definition")
environment, the fact that the owner of the file has changed _is_ a
significant enough sign to suspect that the file contents have
changed (i.e. the file is suspect to be dirty; you can actually
check and refresh, of course, though).

> commit e44794706eeb57f2ee38ed1604821aa38b8ad9d2 which ignores
> mode changes not relevant to the owner.

And that one is not even about the cached stat information used for
the quick "dirty-ness" check.  The change is about the mode bits in
recorded history.

File's mtime is not recorded in the history, either, but we do care
and record it in the index, because it can be used as a good (albeit
a bit too coarse grained) indicator that the contents of a file may
have changed.  

The owner and group fall into the same category.



Re: What's cooking in git.git (Dec 2017, #02; Thu, 7)

2017-12-08 Thread Junio C Hamano
Christian Couder  writes:

> On Thu, Dec 7, 2017 at 7:04 PM, Junio C Hamano  wrote:
>
>
>> * cc/skip-to-optional-val (2017-12-07) 7 commits
>>  - t4045: test 'diff --relative' for real
>>  - t4045: reindent to make helpers readable
>>  - diff: use skip-to-optional-val in parsing --relative
>>  - diff: use skip_to_optional_val_default()
>>  - diff: use skip_to_optional_val()
>>  - index-pack: use skip_to_optional_val()
>>  - git-compat-util: introduce skip_to_optional_val()
>>
>>  Introduce a helper to simplify code to parse a common pattern that
>>  expects either "--key" or "--key=".
>>
>>  Even though I queued fixes for "diff --relative" on top, it may
>>  still want a final reroll to make it harder to misuse by allowing
>>  NULL at the valp part of the argument.
>
> Yeah, I already implemented that and it will be in the next v3 version.

Good.  I am hoping that you've followed the discussion on the tests,
where all of us agreed that the approach taken by Jacob's one is
preferrable over what is queued above?

>> Also s/_val/_arg/.
>
> I am not sure that is a good idea, because it could suggest that the
> functions are designed to parse only command option arguments, while
> they can be used to parse any "key=val" string where "key" is also
> allowed.
>
>>  cf. 
>>  cf. 
>
> It doesn't look like s/_val/_arg/ was discussed in the above messages.

It came from your statement that was made before the thread, where
you said you'll rename it to use arg after I said I suspect that arg
would make more sense than val.

https://public-inbox.org/git/CAP8UFD2OSsqzhyAL-QG1TOowB-xgbf=kc9whre+flc+0j1x...@mail.gmail.com/


Thanks.


[PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
(which builds upon V6 of part 1).

This version adds additional tests, fixes test errors on the MAC version,
and squashes some fixup commits.

It also restores functionality accidentally dropped from the V6 series
for "git fetch" to automatically inherit the partial-clone filter-spec
when appropriate.  This version extends the --no-filter argument to
override this inheritance.

Jeff Hostetler (8):
  upload-pack: add object filtering for partial clone
  fetch-pack, index-pack, transport: partial clone
  fetch-pack: add --no-filter
  fetch: support filters
  partial-clone: define partial clone settings in config
  t5616: end-to-end tests for partial clone
  fetch: inherit filter-spec from partial clone
  t5616: test bulk prefetch after partial fetch

Jonathan Tan (8):
  sha1_file: support lazily fetching missing objects
  rev-list: support termination at promisor objects
  gc: do not repack promisor packfiles
  fetch-pack: test support excluding large blobs
  fetch: refactor calculation of remote list
  clone: partial clone
  unpack-trees: batch fetching of missing blobs
  fetch-pack: restore save_commit_buffer after use

 Documentation/config.txt  |   4 +
 Documentation/git-pack-objects.txt|  11 ++
 Documentation/rev-list-options.txt|  11 ++
 Documentation/technical/pack-protocol.txt |   8 +
 Documentation/technical/protocol-capabilities.txt |   8 +
 builtin/cat-file.c|   2 +
 builtin/clone.c   |  22 ++-
 builtin/fetch-pack.c  |  10 ++
 builtin/fetch.c   |  83 -
 builtin/fsck.c|   3 +
 builtin/gc.c  |   3 +
 builtin/index-pack.c  |   6 +
 builtin/pack-objects.c|  37 +++-
 builtin/prune.c   |   7 +
 builtin/repack.c  |   8 +-
 builtin/rev-list.c|  73 +++-
 cache.h   |   9 +
 config.c  |   5 +
 connected.c   |   2 +
 environment.c |   1 +
 fetch-object.c|  29 ++-
 fetch-object.h|   5 +
 fetch-pack.c  |  17 ++
 fetch-pack.h  |   2 +
 list-objects-filter-options.c |  92 --
 list-objects-filter-options.h |  18 ++
 list-objects.c|  29 ++-
 object.c  |   2 +-
 remote-curl.c |   6 +
 revision.c|  33 +++-
 revision.h|   5 +-
 sha1_file.c   |  32 +++-
 t/t0410-partial-clone.sh  | 206 +-
 t/t5500-fetch-pack.sh |  63 +++
 t/t5601-clone.sh  | 101 +++
 t/t5616-partial-clone.sh  | 146 +++
 transport-helper.c|   5 +
 transport.c   |   4 +
 transport.h   |   5 +
 unpack-trees.c|  22 +++
 upload-pack.c |  31 +++-
 41 files changed, 1110 insertions(+), 56 deletions(-)
 create mode 100755 t/t5616-partial-clone.sh

-- 
2.9.3



[PATCH v7 03/16] gc: do not repack promisor packfiles

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 Documentation/git-pack-objects.txt | 11 
 builtin/gc.c   |  3 +++
 builtin/pack-objects.c | 37 --
 builtin/prune.c|  7 +
 builtin/repack.c   |  8 --
 t/t0410-partial-clone.sh   | 54 --
 6 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-pack-objects.txt 
b/Documentation/git-pack-objects.txt
index aa403d0..81bc490 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -255,6 +255,17 @@ a missing object is encountered.  This is the default 
action.
 The form '--missing=allow-any' will allow object traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
++
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing object will raise an error.
+
+--exclude-promisor-objects::
+   Omit objects that are known to be in the promisor remote.  (This
+   option has the purpose of operating only on locally created objects,
+   so that when we repack, we still maintain a distinction between
+   locally created objects [without .promisor] and objects from the
+   promisor remote [with .promisor].)  This is used with partial clone.
 
 SEE ALSO
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 3c5eae0..77fa720 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -458,6 +458,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
argv_array_push(&prune, prune_expire);
if (quiet)
argv_array_push(&prune, "--no-progress");
+   if (repository_format_partial_clone)
+   argv_array_push(&prune,
+   "--exclude-promisor-objects");
if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
return error(FAILED_RUN, prune.argv[0]);
}
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 45ad35d..f5fc401 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -75,6 +75,8 @@ static int use_bitmap_index = -1;
 static int write_bitmap_index;
 static uint16_t write_bitmap_options;
 
+static int exclude_promisor_objects;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -84,8 +86,9 @@ static unsigned long window_memory_limit = 0;
 static struct list_objects_filter_options filter_options;
 
 enum missing_action {
-   MA_ERROR = 0,/* fail if any missing objects are encountered */
-   MA_ALLOW_ANY,/* silently allow ALL missing objects */
+   MA_ERROR = 0,  /* fail if any missing objects are encountered */
+   MA_ALLOW_ANY,  /* silently allow ALL missing objects */
+   MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
@@ -2577,6 +2580,20 @@ static void show_object__ma_allow_any(struct object 
*obj, const char *name, void
show_object(obj, name, data);
 }
 
+static void show_object__ma_allow_promisor(struct object *obj, const char 
*name, void *data)
+{
+   assert(arg_missing_action == MA_ALLOW_PROMISOR);
+
+   /*
+* Quietly ignore EXPECTED missing objects.  This avoids problems with
+* staging them now and getting an odd error later.
+*/
+   if (!has_object_file(&obj->oid) && is_promisor_object(&obj->oid))
+   return;
+
+   show_object(obj, name, data);
+}
+
 static int option_parse_missing_action(const struct option *opt,
   const char *arg, int unset)
 {
@@ -2591,10 +2608,18 @@ static int option_parse_missing_action(const struct 
option *opt,
 
if (!strcmp(arg, "allow-any")) {
arg_missing_action = MA_ALLOW_ANY;
+   fetch_if_missing = 0;
fn_show_object = show_object__ma_allow_any;
return 0;
}
 
+   if (!strcmp(arg, "allow-promisor")) {
+   arg_missing_action = MA_ALLOW_PROMISOR;
+   fetch_if_missing = 0;
+   fn_show_object = show_object__ma_allow_promisor;
+   return 0;
+   }
+
die(_("invalid value for --missing"));
return 0;
 }
@@ -3008,6 +3033,8 @@ int cmd_pack

[PATCH v7 09/16] fetch: support filters

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

Teach fetch to support filters. This is only allowed for the remote
configured in extensions.partialcloneremote.

Signed-off-by: Jonathan Tan 
---
 builtin/fetch.c   | 23 +--
 connected.c   |  2 ++
 remote-curl.c |  6 ++
 t/t5500-fetch-pack.sh | 36 
 4 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 1b1f039..14aab71 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -18,6 +18,7 @@
 #include "argv-array.h"
 #include "utf8.h"
 #include "packfile.h"
+#include "list-objects-filter-options.h"
 
 static const char * const builtin_fetch_usage[] = {
N_("git fetch [] [ [...]]"),
@@ -55,6 +56,7 @@ static int recurse_submodules_default = 
RECURSE_SUBMODULES_ON_DEMAND;
 static int shown_url = 0;
 static int refmap_alloc, refmap_nr;
 static const char **refmap_array;
+static struct list_objects_filter_options filter_options;
 
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
@@ -160,6 +162,7 @@ static struct option builtin_fetch_options[] = {
TRANSPORT_FAMILY_IPV4),
OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+   OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
OPT_END()
 };
 
@@ -1044,6 +1047,11 @@ static struct transport *prepare_transport(struct remote 
*remote, int deepen)
set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, "yes");
if (update_shallow)
set_option(transport, TRANS_OPT_UPDATE_SHALLOW, "yes");
+   if (filter_options.choice) {
+   set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
+  filter_options.filter_spec);
+   set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+   }
return transport;
 }
 
@@ -1328,6 +1336,8 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
 
packet_trace_identity("fetch");
 
+   fetch_if_missing = 0;
+
/* Record the command line for the reflog */
strbuf_addstr(&default_rla, "fetch");
for (i = 1; i < argc; i++)
@@ -1361,6 +1371,9 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
if (depth || deepen_since || deepen_not.nr)
deepen = 1;
 
+   if (filter_options.choice && !repository_format_partial_clone)
+   die("--filter can only be used when extensions.partialClone is 
set");
+
if (all) {
if (argc == 1)
die(_("fetch --all does not take a repository 
argument"));
@@ -1390,10 +1403,16 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
}
}
 
-   if (remote)
+   if (remote) {
+   if (filter_options.choice &&
+   strcmp(remote->name, repository_format_partial_clone))
+   die(_("--filter can only be used with the remote 
configured in core.partialClone"));
result = fetch_one(remote, argc, argv);
-   else
+   } else {
+   if (filter_options.choice)
+   die(_("--filter can only be used with the remote 
configured in core.partialClone"));
result = fetch_multiple(&list);
+   }
 
if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
struct argv_array options = ARGV_ARRAY_INIT;
diff --git a/connected.c b/connected.c
index f416b05..3a5bd67 100644
--- a/connected.c
+++ b/connected.c
@@ -56,6 +56,8 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
argv_array_push(&rev_list.args,"rev-list");
argv_array_push(&rev_list.args, "--objects");
argv_array_push(&rev_list.args, "--stdin");
+   if (repository_format_partial_clone)
+   argv_array_push(&rev_list.args, "--exclude-promisor-objects");
argv_array_push(&rev_list.args, "--not");
argv_array_push(&rev_list.args, "--all");
argv_array_push(&rev_list.args, "--quiet");
diff --git a/remote-curl.c b/remote-curl.c
index 4318391..6ec5352 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -24,6 +24,7 @@ struct options {
char *deepen_since;
struct string_list deepen_not;
struct string_list push_options;
+   char *filter;
unsigned progress : 1,
check_self_contained_and_connected : 1,
cloning : 1,
@@ -165,6 +166,9 @@ static int set_option(const char *name, const char *value)
} else if (!strcmp(name, "no-dependents")) {
options.no_dependents = 1;
return 0;
+   } else if (!strcmp(name, "filter")) {
+   options.filter = xstrdup(value);;
+   return 0;
} else {
return 1 /* unsupported */;
}
@@ -834,6 +838,8 @@ static int fetch_git(struct discovery *heads,
argv_

[PATCH v7 13/16] fetch-pack: restore save_commit_buffer after use

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

In fetch-pack, the global variable save_commit_buffer is set to 0, but
not restored to its original value after use.

In particular, if show_log() (in log-tree.c) is invoked after
fetch_pack() in the same process, show_log() will return before printing
out the commit message (because the invocation to
get_cached_commit_buffer() returns NULL, because the commit buffer was
not saved). I discovered this when attempting to run "git log -S" in a
partial clone, triggering the case where revision walking lazily loads
missing objects.

Therefore, restore save_commit_buffer to its original value after use.

An alternative to solve the problem I had is to replace
get_cached_commit_buffer() with get_commit_buffer(). That invocation was
introduced in commit a97934d ("use get_cached_commit_buffer where
appropriate", 2014-06-13) to replace "commit->buffer" introduced in
commit 3131b71 ("Add "--show-all" revision walker flag for debugging",
2008-02-13). In the latter commit, the commit author seems to be
deciding between not showing an unparsed commit at all and showing an
unparsed commit without the message (which is what the commit does), and
did not mention parsing the unparsed commit, so I prefer to preserve the
existing behavior.

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 fetch-pack.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index 3c5f064..773453c 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -717,6 +717,7 @@ static int everything_local(struct fetch_pack_args *args,
 {
struct ref *ref;
int retval;
+   int old_save_commit_buffer = save_commit_buffer;
timestamp_t cutoff = 0;
 
save_commit_buffer = 0;
@@ -786,6 +787,9 @@ static int everything_local(struct fetch_pack_args *args,
print_verbose(args, _("already have %s (%s)"), 
oid_to_hex(remote),
  ref->name);
}
+
+   save_commit_buffer = old_save_commit_buffer;
+
return retval;
 }
 
-- 
2.9.3



[PATCH v7 12/16] unpack-trees: batch fetching of missing blobs

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

When running checkout, first prefetch all blobs that are to be updated
but are missing. This means that only one pack is downloaded during such
operations, instead of one per missing blob.

This operates only on the blob level - if a repository has a missing
tree, they are still fetched one at a time.

This does not use the delayed checkout mechanism introduced in commit
2841e8f ("convert: add "status=delayed" to filter process protocol",
2017-06-30) due to significant conceptual differences - in particular,
for partial clones, we already know what needs to be fetched based on
the contents of the local repo alone, whereas for status=delayed, it is
the filter process that tells us what needs to be checked in the end.

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 fetch-object.c   | 26 ++
 fetch-object.h   |  5 +
 t/t5601-clone.sh | 52 
 unpack-trees.c   | 22 ++
 4 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/fetch-object.c b/fetch-object.c
index 258fcfa..853624f 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -5,11 +5,10 @@
 #include "transport.h"
 #include "fetch-object.h"
 
-void fetch_object(const char *remote_name, const unsigned char *sha1)
+static void fetch_refs(const char *remote_name, struct ref *ref)
 {
struct remote *remote;
struct transport *transport;
-   struct ref *ref;
int original_fetch_if_missing = fetch_if_missing;
 
fetch_if_missing = 0;
@@ -18,10 +17,29 @@ void fetch_object(const char *remote_name, const unsigned 
char *sha1)
die(_("Remote with no URL"));
transport = transport_get(remote, remote->url[0]);
 
-   ref = alloc_ref(sha1_to_hex(sha1));
-   hashcpy(ref->old_oid.hash, sha1);
transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
transport_fetch_refs(transport, ref);
fetch_if_missing = original_fetch_if_missing;
 }
+
+void fetch_object(const char *remote_name, const unsigned char *sha1)
+{
+   struct ref *ref = alloc_ref(sha1_to_hex(sha1));
+   hashcpy(ref->old_oid.hash, sha1);
+   fetch_refs(remote_name, ref);
+}
+
+void fetch_objects(const char *remote_name, const struct oid_array *to_fetch)
+{
+   struct ref *ref = NULL;
+   int i;
+
+   for (i = 0; i < to_fetch->nr; i++) {
+   struct ref *new_ref = alloc_ref(oid_to_hex(&to_fetch->oid[i]));
+   oidcpy(&new_ref->old_oid, &to_fetch->oid[i]);
+   new_ref->next = ref;
+   ref = new_ref;
+   }
+   fetch_refs(remote_name, ref);
+}
diff --git a/fetch-object.h b/fetch-object.h
index f371300..4b269d0 100644
--- a/fetch-object.h
+++ b/fetch-object.h
@@ -1,6 +1,11 @@
 #ifndef FETCH_OBJECT_H
 #define FETCH_OBJECT_H
 
+#include "sha1-array.h"
+
 extern void fetch_object(const char *remote_name, const unsigned char *sha1);
 
+extern void fetch_objects(const char *remote_name,
+ const struct oid_array *to_fetch);
+
 #endif
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 6d37c6d..13610b7 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -611,6 +611,58 @@ test_expect_success 'partial clone: warn if server does 
not support object filte
test_i18ngrep "filtering not recognized by server" err
 '
 
+test_expect_success 'batch missing blob request during checkout' '
+   rm -rf server client &&
+
+   test_create_repo server &&
+   echo a >server/a &&
+   echo b >server/b &&
+   git -C server add a b &&
+
+   git -C server commit -m x &&
+   echo aa >server/a &&
+   echo bb >server/b &&
+   git -C server add a b &&
+   git -C server commit -m x &&
+
+   test_config -C server uploadpack.allowfilter 1 &&
+   test_config -C server uploadpack.allowanysha1inwant 1 &&
+
+   git clone --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+   # Ensure that there is only one negotiation by checking that there is
+   # only "done" line sent. ("done" marks the end of negotiation.)
+   GIT_TRACE_PACKET="$(pwd)/trace" git -C client checkout HEAD^ &&
+   grep "git> done" trace >done_lines &&
+   test_line_count = 1 done_lines
+'
+
+test_expect_success 'batch missing blob request does not inadvertently try to 
fetch gitlinks' '
+   rm -rf server client &&
+
+   test_create_repo repo_for_submodule &&
+   test_commit -C repo_for_submodule x &&
+
+   test_create_repo server &&
+   echo a >server/a &&
+   echo b >server/b &&
+   git -C server add a b &&
+   git -C server commit -m x &&
+
+   echo aa >server/a &&
+   echo bb >server/b &&
+   # Also add a gitlink pointing to an arbitrary repository
+   git -C server submodule add "$(pwd)/repo_for_submodule" c &&
+   git -C server add a b c &&
+  

[PATCH v7 11/16] clone: partial clone

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 builtin/clone.c  | 22 --
 t/t5601-clone.sh | 49 +
 2 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index dbddd98..f519bd4 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -26,6 +26,7 @@
 #include "run-command.h"
 #include "connected.h"
 #include "packfile.h"
+#include "list-objects-filter-options.h"
 
 /*
  * Overall FIXMEs:
@@ -60,6 +61,7 @@ static struct string_list option_optional_reference = 
STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
+static struct list_objects_filter_options filter_options;
 
 static int recurse_submodules_cb(const struct option *opt,
 const char *arg, int unset)
@@ -135,6 +137,7 @@ static struct option builtin_clone_options[] = {
TRANSPORT_FAMILY_IPV4),
OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+   OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
OPT_END()
 };
 
@@ -886,6 +889,8 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
struct refspec *refspec;
const char *fetch_pattern;
 
+   fetch_if_missing = 0;
+
packet_trace_identity("clone");
argc = parse_options(argc, argv, prefix, builtin_clone_options,
 builtin_clone_usage, 0);
@@ -1073,6 +1078,8 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
warning(_("--shallow-since is ignored in local clones; 
use file:// instead."));
if (option_not.nr)
warning(_("--shallow-exclude is ignored in local 
clones; use file:// instead."));
+   if (filter_options.choice)
+   warning(_("--filter is ignored in local clones; use 
file:// instead."));
if (!access(mkpath("%s/shallow", path), F_OK)) {
if (option_local > 0)
warning(_("source repository is shallow, 
ignoring --local"));
@@ -1104,7 +,13 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
transport_set_option(transport, TRANS_OPT_UPLOADPACK,
 option_upload_pack);
 
-   if (transport->smart_options && !deepen)
+   if (filter_options.choice) {
+   transport_set_option(transport, TRANS_OPT_LIST_OBJECTS_FILTER,
+filter_options.filter_spec);
+   transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+   }
+
+   if (transport->smart_options && !deepen && !filter_options.choice)
transport->smart_options->check_self_contained_and_connected = 
1;
 
refs = transport_get_remote_refs(transport);
@@ -1164,13 +1177,17 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
write_refspec_config(src_ref_prefix, our_head_points_at,
remote_head_points_at, &branch_top);
 
+   if (filter_options.choice)
+   partial_clone_register("origin", &filter_options);
+
if (is_local)
clone_local(path, git_dir);
else if (refs && complete_refs_before_fetch)
transport_fetch_refs(transport, mapped_refs);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
-  branch_top.buf, reflog_msg.buf, transport, 
!is_local);
+  branch_top.buf, reflog_msg.buf, transport,
+  !is_local && !filter_options.choice);
 
update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
@@ -1191,6 +1208,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
}
 
junk_mode = JUNK_LEAVE_REPO;
+   fetch_if_missing = 1;
err = checkout(submodule_progress);
 
strbuf_release(&reflog_msg);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 9c56f77..6d37c6d 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -571,4 +571,53 @@ test_expect_success 'GIT_TRACE_PACKFILE produces a usable 
pack' '
git -C replay.git index-pack -v --stdin  err &&
+
+   test_i18ngrep "filtering not recognized by server" err
+'
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'partial clone using HTTP' '
+   partial_clone "$HTTPD_DOCUMENT_ROOT_PATH/server" 
"$HTTPD_URL/smart/server"
+'
+
+stop_httpd
+
 test_done
-- 
2.9.3



[PATCH v7 15/16] fetch: inherit filter-spec from partial clone

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

Teach (partial) fetch to inherit the filter-spec used by
the partial clone.  Extend --no-filter to override this
inheritance.

Signed-off-by: Jeff Hostetler 
---
 builtin/fetch-pack.c  |  2 +-
 builtin/fetch.c   | 56 ---
 builtin/rev-list.c|  2 +-
 list-objects-filter-options.c |  2 +-
 list-objects-filter-options.h | 12 ++
 t/t5616-partial-clone.sh  | 22 -
 6 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index cbf5035..a7bc136 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -158,7 +158,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
continue;
}
if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-   list_objects_filter_release(&args.filter_options);
+   list_objects_filter_set_no_filter(&args.filter_options);
continue;
}
usage(fetch_pack_usage);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 14aab71..79c866c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1275,6 +1275,56 @@ static int fetch_multiple(struct string_list *list)
return result;
 }
 
+/*
+ * Fetching from the promisor remote should use the given filter-spec
+ * or inherit the default filter-spec from the config.
+ */
+static inline void fetch_one_setup_partial(struct remote *remote)
+{
+   /*
+* Explicit --no-filter argument overrides everything, regardless
+* of any prior partial clones and fetches.
+*/
+   if (filter_options.no_filter)
+   return;
+
+   /*
+* If no prior partial clone/fetch and the current fetch DID NOT
+* request a partial-fetch, do a normal fetch.
+*/
+   if (!repository_format_partial_clone && !filter_options.choice)
+   return;
+
+   /*
+* If this is the FIRST partial-fetch request, we enable partial
+* on this repo and remember the given filter-spec as the default
+* for subsequent fetches to this remote.
+*/
+   if (!repository_format_partial_clone && filter_options.choice) {
+   partial_clone_register(remote->name, &filter_options);
+   return;
+   }
+
+   /*
+* We are currently limited to only ONE promisor remote and only
+* allow partial-fetches from the promisor remote.
+*/
+   if (strcmp(remote->name, repository_format_partial_clone)) {
+   if (filter_options.choice)
+   die(_("--filter can only be used with the remote 
configured in core.partialClone"));
+   return;
+   }
+
+   /*
+* Do a partial-fetch from the promisor remote using either the
+* explicitly given filter-spec or inherit the filter-spec from
+* the config.
+*/
+   if (!filter_options.choice)
+   partial_clone_get_default_filter_spec(&filter_options);
+   return;
+}
+
 static int fetch_one(struct remote *remote, int argc, const char **argv)
 {
static const char **refs = NULL;
@@ -1404,13 +1454,13 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
}
 
if (remote) {
-   if (filter_options.choice &&
-   strcmp(remote->name, repository_format_partial_clone))
-   die(_("--filter can only be used with the remote 
configured in core.partialClone"));
+   if (filter_options.choice || repository_format_partial_clone)
+   fetch_one_setup_partial(remote);
result = fetch_one(remote, argc, argv);
} else {
if (filter_options.choice)
die(_("--filter can only be used with the remote 
configured in core.partialClone"));
+   /* TODO should this also die if we have a previous 
partial-clone? */
result = fetch_multiple(&list);
}
 
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 48f922d..8503dea 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -460,7 +460,7 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
continue;
}
if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
-   list_objects_filter_release(&filter_options);
+   list_objects_filter_set_no_filter(&filter_options);
continue;
}
if (!strcmp(arg, "--filter-print-omitted")) {
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 5c47e2b..6a3cc98 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -94,7 +94,7 @@ int opt_parse_list_objects_filter(const struct option *opt,
struct list_o

[PATCH v7 16/16] t5616: test bulk prefetch after partial fetch

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

Add test to t5616 to bulk fetch missing objects following
a partial fetch.  A technique like this could be used in
a pre-command hook for example.

Signed-off-by: Jeff Hostetler 
---
 t/t5616-partial-clone.sh | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 433e07e..29d8631 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -105,7 +105,7 @@ test_expect_success 'push new commits to server for 
file.2.txt' '
git -C src push -u srv master
 '
 
-# Do FULL fetch by disabling filter-spec using --no-filter.
+# Do FULL fetch by disabling inherited filter-spec using --no-filter.
 # Verify we have all the new blobs.
 test_expect_success 'override inherited filter-spec using --no-filter' '
git -C pc1 fetch --no-filter origin &&
@@ -113,4 +113,34 @@ test_expect_success 'override inherited filter-spec using 
--no-filter' '
test_line_count = 0 observed
 '
 
+# create new commits in "src" repo to establish a history on file.3.txt
+# and push to "srv.bare".
+test_expect_success 'push new commits to server for file.3.txt' '
+   for x in a b c d e f
+   do
+   echo "Mod file.3.txt $x" >>src/file.3.txt
+   git -C src add file.3.txt
+   git -C src commit -m "mod $x"
+   done &&
+   git -C src push -u srv master
+'
+
+# Do a partial fetch and then try to manually fetch the missing objects.
+# This can be used as the basis of a pre-command hook to bulk fetch objects
+# perhaps combined with a command in dry-run mode.
+test_expect_success 'manual prefetch of missing objects' '
+   git -C pc1 fetch --filter=blob:none origin &&
+   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print \
+   | awk -f print_1.awk \
+   | sed "s/?//" \
+   | sort >observed.oids &&
+   test_line_count = 6 observed.oids &&
+   git -C pc1 fetch-pack --stdin "file://$(pwd)/srv.bare" observed.oids &&
+   test_line_count = 0 observed.oids
+'
+
 test_done
-- 
2.9.3



[PATCH v7 10/16] partial-clone: define partial clone settings in config

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

Create get and set routines for "partial clone" config settings.
These will be used in a future commit by clone and fetch to
remember the promisor remote and the default filter-spec.

Signed-off-by: Jeff Hostetler 
---
 cache.h   |  1 +
 config.c  |  5 +++
 environment.c |  1 +
 list-objects-filter-options.c | 90 +++
 list-objects-filter-options.h |  6 +++
 5 files changed, 88 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index 6980072..bccc510 100644
--- a/cache.h
+++ b/cache.h
@@ -861,6 +861,7 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
 extern char *repository_format_partial_clone;
+extern const char *core_partial_clone_filter_default;
 
 struct repository_format {
int version;
diff --git a/config.c b/config.c
index adb7d7a..adeee04 100644
--- a/config.c
+++ b/config.c
@@ -1241,6 +1241,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.partialclonefilter")) {
+   return git_config_string(&core_partial_clone_filter_default,
+var, value);
+   }
+
/* Add other config variables here and to Documentation/config.txt. */
return 0;
 }
diff --git a/environment.c b/environment.c
index e52aab3..7537565 100644
--- a/environment.c
+++ b/environment.c
@@ -28,6 +28,7 @@ int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
 char *repository_format_partial_clone;
+const char *core_partial_clone_filter_default;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 const char *apply_default_whitespace;
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index 4c5b34e..5c47e2b 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -21,29 +21,36 @@
  * subordinate commands when necessary.  We also "intern" the arg for
  * the convenience of the current command.
  */
-int parse_list_objects_filter(struct list_objects_filter_options 
*filter_options,
- const char *arg)
+static int gently_parse_list_objects_filter(
+   struct list_objects_filter_options *filter_options,
+   const char *arg,
+   struct strbuf *errbuf)
 {
const char *v0;
 
-   if (filter_options->choice)
-   die(_("multiple object filter types cannot be combined"));
+   if (filter_options->choice) {
+   if (errbuf) {
+   strbuf_init(errbuf, 0);
+   strbuf_addstr(
+   errbuf,
+   _("multiple filter-specs cannot be combined"));
+   }
+   return 1;
+   }
 
filter_options->filter_spec = strdup(arg);
 
if (!strcmp(arg, "blob:none")) {
filter_options->choice = LOFC_BLOB_NONE;
return 0;
-   }
 
-   if (skip_prefix(arg, "blob:limit=", &v0)) {
-   if (!git_parse_ulong(v0, &filter_options->blob_limit_value))
-   die(_("invalid filter-spec expression '%s'"), arg);
-   filter_options->choice = LOFC_BLOB_LIMIT;
-   return 0;
-   }
+   } else if (skip_prefix(arg, "blob:limit=", &v0)) {
+   if (git_parse_ulong(v0, &filter_options->blob_limit_value)) {
+   filter_options->choice = LOFC_BLOB_LIMIT;
+   return 0;
+   }
 
-   if (skip_prefix(arg, "sparse:oid=", &v0)) {
+   } else if (skip_prefix(arg, "sparse:oid=", &v0)) {
struct object_context oc;
struct object_id sparse_oid;
 
@@ -57,15 +64,27 @@ int parse_list_objects_filter(struct 
list_objects_filter_options *filter_options
filter_options->sparse_oid_value = oiddup(&sparse_oid);
filter_options->choice = LOFC_SPARSE_OID;
return 0;
-   }
 
-   if (skip_prefix(arg, "sparse:path=", &v0)) {
+   } else if (skip_prefix(arg, "sparse:path=", &v0)) {
filter_options->choice = LOFC_SPARSE_PATH;
filter_options->sparse_path_value = strdup(v0);
return 0;
}
 
-   die(_("invalid filter-spec expression '%s'"), arg);
+   if (errbuf) {
+   strbuf_init(errbuf, 0);
+   strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
+   }
+   memset(filter_options, 0, sizeof(*filter_options));
+   return 1;
+}
+
+int parse_list_objects_filter(struct list_objects_filter_options 
*filter_options,
+ const char *arg)
+{
+   struct strbuf buf = STRBUF_INIT;
+   if (gently_parse_list_objects_filter(filter_options, arg, &buf))
+   di

[PATCH v7 14/16] t5616: end-to-end tests for partial clone

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

Additional end-to-end tests for partial clone.

Signed-off-by: Jeff Hostetler 
---
 t/t5616-partial-clone.sh | 96 
 1 file changed, 96 insertions(+)
 create mode 100755 t/t5616-partial-clone.sh

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
new file mode 100755
index 000..3b8cc0b
--- /dev/null
+++ b/t/t5616-partial-clone.sh
@@ -0,0 +1,96 @@
+#!/bin/sh
+
+test_description='git partial clone'
+
+. ./test-lib.sh
+
+# create a normal "src" repo where we can later create new commits.
+# expect_1.oids will contain a list of the OIDs of all blobs.
+test_expect_success 'setup normal src repo' '
+   echo "{print \$1}" >print_1.awk &&
+   echo "{print \$2}" >print_2.awk &&
+
+   git init src &&
+   for n in 1 2 3 4
+   do
+   echo "This is file: $n" > src/file.$n.txt
+   git -C src add file.$n.txt
+   git -C src commit -m "file $n"
+   git -C src ls-files -s file.$n.txt >>temp
+   done &&
+   awk -f print_2.awk expect_1.oids &&
+   test_line_count = 4 expect_1.oids
+'
+
+# bare clone "src" giving "srv.bare" for use as our server.
+test_expect_success 'setup bare clone for server' '
+   git clone --bare "file://$(pwd)/src" srv.bare &&
+   git -C srv.bare config --local uploadpack.allowfilter 1 &&
+   git -C srv.bare config --local uploadpack.allowanysha1inwant 1
+'
+
+# do basic partial clone from "srv.bare"
+# confirm we are missing all of the known blobs.
+# confirm partial clone was registered in the local config.
+test_expect_success 'do partial clone 1' '
+   git clone --no-checkout --filter=blob:none "file://$(pwd)/srv.bare" pc1 
&&
+   git -C pc1 rev-list HEAD --quiet --objects --missing=print \
+   | awk -f print_1.awk \
+   | sed "s/?//" \
+   | sort >observed.oids &&
+   test_cmp expect_1.oids observed.oids &&
+   test "$(git -C pc1 config --local core.repositoryformatversion)" = "1" 
&&
+   test "$(git -C pc1 config --local extensions.partialclone)" = "origin" 
&&
+   test "$(git -C pc1 config --local core.partialclonefilter)" = 
"blob:none"
+'
+
+# checkout master to force dynamic object fetch of blobs at HEAD.
+test_expect_success 'verify checkout with dynamic object fetch' '
+   git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed &&
+   test_line_count = 4 observed &&
+   git -C pc1 checkout master &&
+   git -C pc1 rev-list HEAD --quiet --objects --missing=print >observed &&
+   test_line_count = 0 observed
+'
+
+# create new commits in "src" repo to establish a blame history on file.1.txt
+# and push to "srv.bare".
+test_expect_success 'push new commits to server' '
+   git -C src remote add srv "file://$(pwd)/srv.bare" &&
+   for x in a b c d e
+   do
+   echo "Mod $x" >>src/file.1.txt
+   git -C src add file.1.txt
+   git -C src commit -m "mod $x"
+   done &&
+   git -C src blame master -- file.1.txt >expect.blame &&
+   git -C src push -u srv master
+'
+
+# (partial) fetch in the partial clone repo from the promisor remote.
+# verify that fetch inherited the filter-spec from the config and DOES NOT
+# have the new blobs.
+test_expect_success 'partial fetch inherits filter settings' '
+   git -C pc1 fetch origin &&
+   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   test_line_count = 5 observed
+'
+
+# force dynamic object fetch using diff.
+# we should only get 1 new blob (for the file in origin/master).
+test_expect_success 'verify diff causes dynamic object fetch' '
+   git -C pc1 diff master..origin/master -- file.1.txt &&
+   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   test_line_count = 4 observed
+'
+
+# force full dynamic object fetch of the file's history using blame.
+# we should get the intermediate blobs for the file.
+test_expect_success 'verify blame causes dynamic object fetch' '
+   git -C pc1 blame origin/master -- file.1.txt >observed.blame &&
+   test_cmp expect.blame observed.blame &&
+   git -C pc1 rev-list master..origin/master --quiet --objects 
--missing=print >observed &&
+   test_line_count = 0 observed
+'
+
+test_done
-- 
2.9.3



[PATCH v7 06/16] fetch-pack: add --no-filter

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

Fixup fetch-pack to accept --no-filter to be consistent with
rev-list and pack-objects.

Signed-off-by: Jeff Hostetler 
---
 builtin/fetch-pack.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 7957807..cbf5035 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -157,6 +157,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
parse_list_objects_filter(&args.filter_options, arg);
continue;
}
+   if (!strcmp(arg, ("--no-" CL_ARG__FILTER))) {
+   list_objects_filter_release(&args.filter_options);
+   continue;
+   }
usage(fetch_pack_usage);
}
if (deepen_not.nr)
-- 
2.9.3



[PATCH v7 08/16] fetch: refactor calculation of remote list

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Separate out the calculation of remotes to be fetched from and the
actual fetching. This will allow us to include an additional step before
the actual fetching in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 builtin/fetch.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 225c734..1b1f039 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1322,7 +1322,7 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
 {
int i;
struct string_list list = STRING_LIST_INIT_DUP;
-   struct remote *remote;
+   struct remote *remote = NULL;
int result = 0;
struct argv_array argv_gc_auto = ARGV_ARRAY_INIT;
 
@@ -1367,17 +1367,14 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
else if (argc > 1)
die(_("fetch --all does not make sense with refspecs"));
(void) for_each_remote(get_one_remote_for_fetch, &list);
-   result = fetch_multiple(&list);
} else if (argc == 0) {
/* No arguments -- use default remote */
remote = remote_get(NULL);
-   result = fetch_one(remote, argc, argv);
} else if (multiple) {
/* All arguments are assumed to be remotes or groups */
for (i = 0; i < argc; i++)
if (!add_remote_or_group(argv[i], &list))
die(_("No such remote or remote group: %s"), 
argv[i]);
-   result = fetch_multiple(&list);
} else {
/* Single remote or group */
(void) add_remote_or_group(argv[0], &list);
@@ -1385,14 +1382,19 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
/* More than one remote */
if (argc > 1)
die(_("Fetching a group and specifying refspecs 
does not make sense"));
-   result = fetch_multiple(&list);
} else {
/* Zero or one remotes */
remote = remote_get(argv[0]);
-   result = fetch_one(remote, argc-1, argv+1);
+   argc--;
+   argv++;
}
}
 
+   if (remote)
+   result = fetch_one(remote, argc, argv);
+   else
+   result = fetch_multiple(&list);
+
if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
struct argv_array options = ARGV_ARRAY_INIT;
 
-- 
2.9.3



[PATCH v7 05/16] fetch-pack, index-pack, transport: partial clone

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 builtin/fetch-pack.c |  4 
 fetch-pack.c | 13 +
 fetch-pack.h |  2 ++
 transport-helper.c   |  5 +
 transport.c  |  4 
 transport.h  |  5 +
 6 files changed, 33 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 15eeed7..7957807 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -153,6 +153,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
args.no_dependents = 1;
continue;
}
+   if (skip_prefix(arg, ("--" CL_ARG__FILTER "="), &arg)) {
+   parse_list_objects_filter(&args.filter_options, arg);
+   continue;
+   }
usage(fetch_pack_usage);
}
if (deepen_not.nr)
diff --git a/fetch-pack.c b/fetch-pack.c
index 0798e0b..3c5f064 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -29,6 +29,7 @@ static int deepen_not_ok;
 static int fetch_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
 static int agent_supported;
+static int server_supports_filtering;
 static struct lock_file shallow_lock;
 static const char *alternate_shallow_file;
 
@@ -379,6 +380,8 @@ static int find_common(struct fetch_pack_args *args,
if (deepen_not_ok)  strbuf_addstr(&c, " 
deepen-not");
if (agent_supported)strbuf_addf(&c, " agent=%s",

git_user_agent_sanitized());
+   if (args->filter_options.choice)
+   strbuf_addstr(&c, " filter");
packet_buf_write(&req_buf, "want %s%s\n", remote_hex, 
c.buf);
strbuf_release(&c);
} else
@@ -407,6 +410,9 @@ static int find_common(struct fetch_pack_args *args,
packet_buf_write(&req_buf, "deepen-not %s", s->string);
}
}
+   if (server_supports_filtering && args->filter_options.choice)
+   packet_buf_write(&req_buf, "filter %s",
+args->filter_options.filter_spec);
packet_buf_flush(&req_buf);
state_len = req_buf.len;
 
@@ -969,6 +975,13 @@ static struct ref *do_fetch_pack(struct fetch_pack_args 
*args,
else
prefer_ofs_delta = 0;
 
+   if (server_supports("filter")) {
+   server_supports_filtering = 1;
+   print_verbose(args, _("Server supports filter"));
+   } else if (args->filter_options.choice) {
+   warning("filtering not recognized by server, ignoring");
+   }
+
if ((agent_feature = server_feature_value("agent", &agent_len))) {
agent_supported = 1;
if (agent_len)
diff --git a/fetch-pack.h b/fetch-pack.h
index aeac152..3e224a1 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -3,6 +3,7 @@
 
 #include "string-list.h"
 #include "run-command.h"
+#include "list-objects-filter-options.h"
 
 struct oid_array;
 
@@ -12,6 +13,7 @@ struct fetch_pack_args {
int depth;
const char *deepen_since;
const struct string_list *deepen_not;
+   struct list_objects_filter_options filter_options;
unsigned deepen_relative:1;
unsigned quiet:1;
unsigned keep_pack:1;
diff --git a/transport-helper.c b/transport-helper.c
index c948d52..0650ca0 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -671,6 +671,11 @@ static int fetch(struct transport *transport,
if (data->transport_options.update_shallow)
set_helper_option(transport, "update-shallow", "true");
 
+   if (data->transport_options.filter_options.choice)
+   set_helper_option(
+   transport, "filter",
+   data->transport_options.filter_options.filter_spec);
+
if (data->fetch)
return fetch_with_fetch(transport, nr_heads, to_fetch);
 
diff --git a/transport.c b/transport.c
index f2fbc6f..cce50f5 100644
--- a/transport.c
+++ b/transport.c
@@ -166,6 +166,9 @@ static int set_git_option(struct git_transport_options 
*opts,
} else if (!strcmp(name, TRANS_OPT_NO_DEPENDENTS)) {
opts->no_dependents = !!value;
return 0;
+   } else if (!strcmp(name, TRANS_OPT_LIST_OBJECTS_FILTER)) {
+   parse_list_objects_filter(&opts->filter_options, value);
+   return 0;
}
return 1;
 }
@@ -236,6 +239,7 @@ static int fetch_refs_via_pack(struct transport *transport,
args.update_shallow = data->options.update_shallow;
args.from_promisor = data->options.from_promisor;
args.no_dependents = data->options.no_dependents;
+   args.filter_options = data->options.filter_options;
 
if (!data->got_remote_heads) {
connect_setup(transport, 

[PATCH v7 07/16] fetch-pack: test support excluding large blobs

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Created tests to verify fetch-pack and upload-pack support
for excluding large blobs using --filter=blobs:limit=
parameter.

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 t/t5500-fetch-pack.sh | 27 +++
 upload-pack.c | 13 +
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 80a1a32..c57916b 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -755,4 +755,31 @@ test_expect_success 'fetching deepen' '
)
 '
 
+test_expect_success 'filtering by size' '
+   rm -rf server client &&
+   test_create_repo server &&
+   test_commit -C server one &&
+   test_config -C server uploadpack.allowfilter 1 &&
+
+   test_create_repo client &&
+   git -C client fetch-pack --filter=blob:limit=0 ../server HEAD &&
+
+   # Ensure that object is not inadvertently fetched
+   test_must_fail git -C client cat-file -e $(git hash-object server/one.t)
+'
+
+test_expect_success 'filtering by size has no effect if support for it is not 
advertised' '
+   rm -rf server client &&
+   test_create_repo server &&
+   test_commit -C server one &&
+
+   test_create_repo client &&
+   git -C client fetch-pack --filter=blob:limit=0 ../server HEAD 2> err &&
+
+   # Ensure that object is fetched
+   git -C client cat-file -e $(git hash-object server/one.t) &&
+
+   test_i18ngrep "filtering not recognized by server" err
+'
+
 test_done
diff --git a/upload-pack.c b/upload-pack.c
index e6d38b9..15b6605 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -139,10 +139,15 @@ static void create_pack_file(void)
if (use_include_tag)
argv_array_push(&pack_objects.args, "--include-tag");
if (filter_options.filter_spec) {
-   struct strbuf buf = STRBUF_INIT;
-   sq_quote_buf(&buf, filter_options.filter_spec);
-   argv_array_pushf(&pack_objects.args, "--filter=%s", buf.buf);
-   strbuf_release(&buf);
+   if (pack_objects.use_shell) {
+   struct strbuf buf = STRBUF_INIT;
+   sq_quote_buf(&buf, filter_options.filter_spec);
+   argv_array_pushf(&pack_objects.args, "--filter=%s", 
buf.buf);
+   strbuf_release(&buf);
+   } else {
+   argv_array_pushf(&pack_objects.args, "--filter=%s",
+filter_options.filter_spec);
+   }
}
 
pack_objects.in = -1;
-- 
2.9.3



[PATCH v7 01/16] sha1_file: support lazily fetching missing objects

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
 regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
 about the loose/packed distinction and operate on both differently,
 and ensure that they can handle the concept of objects that are
 neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
 - this searches a single pack that is provided as an argument; the
   caller already knows (through other means) that the sought object
   is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
 it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
 object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
 not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan 
---
 builtin/cat-file.c   |  2 ++
 builtin/fetch-pack.c |  2 ++
 builtin/fsck.c   |  3 +++
 builtin/index-pack.c |  6 ++
 cache.h  |  8 
 fetch-object.c   |  3 +++
 sha1_file.c  | 32 ++
 t/t0410-partial-clone.sh | 51 
 8 files changed, 99 insertions(+), 8 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index f5fa4fd..cf9ea5c 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -475,6 +475,8 @@ static int batch_objects(struct batch_options *opt)
 
for_each_loose_object(batch_loose_object, &sa, 0);
for_each_packed_object(batch_packed_object, &sa, 0);
+   if (repository_format_partial_clone)
+   warning("This repository has extensions.partialClone 
set. Some objects may not be loaded.");
 
cb.opt = opt;
cb.expand = &data;
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 02abe72..15eeed7 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -53,6 +53,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
struct oid_array shallow = OID_ARRAY_INIT;
struct string_list deepen_not = STRING_LIST_INIT_DUP;
 
+   fetch_if_missing = 0;
+
packet_trace_identity("fetch-pack");
 
memset(&args, 0, sizeof(args));
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 578a7c8..3b76c0e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -678,6 +678,9 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
int i;
struct alternate_object_database *alt;
 
+   /* fsck knows how to handle missing promisor objects */
+   fetch_if_missing = 0;
+
errors_found = 0;
check_replace_refs = 0;
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 24c2f05..a0a35e6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1657,6 +1657,12 @@ int cmd_index_pack(int argc, const char **argv, const 
char *prefix)
unsigned foreign_nr = 1;/* zero is a "good" value, assume bad */
int report_end_of_input = 0;
 
+   /*
+* index-pack never needs to fetch missing objects, since it only
+* accesses the repo to do hash collision checks
+*/
+   fetch_if_missing = 0;
+
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(index_pack_usage);
 
diff --git a/cache.h b/cache.h
index c76f2e9..6980072 100644
--- a/cache.h
+++ b/cache.h
@@ -1727,6 +1727,14 @@ struct object_info {
 #define OBJECT_INFO

[PATCH v7 02/16] rev-list: support termination at promisor objects

2017-12-08 Thread Jeff Hostetler
From: Jonathan Tan 

Teach rev-list to support termination of an object traversal at any
object from a promisor remote (whether one that the local repo also has,
or one that the local repo knows about because it has another promisor
object that references it).

This will be used subsequently in gc and in the connectivity check used
by fetch.

For efficiency, if an object is referenced by a promisor object, and is
in the local repo only as a non-promisor object, object traversal will
not stop there. This is to avoid building the list of promisor object
references.

(In list-objects.c, the case where obj is NULL in process_blob() and
process_tree() do not need to be changed because those happen only when
there is a conflict between the expected type and the existing object.
If the object doesn't exist, an object will be synthesized, which is
fine.)

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 
---
 Documentation/rev-list-options.txt |  11 
 builtin/rev-list.c |  71 +++---
 list-objects.c |  29 ++-
 object.c   |   2 +-
 revision.c |  33 +++-
 revision.h |   5 +-
 t/t0410-partial-clone.sh   | 101 +
 7 files changed, 240 insertions(+), 12 deletions(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 8d8b7f4..0ce8ccd 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -745,10 +745,21 @@ The form '--missing=allow-any' will allow object 
traversal to continue
 if a missing object is encountered.  Missing objects will silently be
 omitted from the results.
 +
+The form '--missing=allow-promisor' is like 'allow-any', but will only
+allow object traversal to continue for EXPECTED promisor missing objects.
+Unexpected missing objects will raise an error.
++
 The form '--missing=print' is like 'allow-any', but will also print a
 list of the missing objects.  Object IDs are prefixed with a ``?'' character.
 endif::git-rev-list[]
 
+--exclude-promisor-objects::
+   (For internal use only.)  Prefilter object traversal at
+   promisor boundary.  This is used with partial clone.  This is
+   stronger than `--missing=allow-promisor` because it limits the
+   traversal, rather than just silencing errors about missing
+   objects.
+
 --no-walk[=(sorted|unsorted)]::
Only show the given commits, but do not traverse their ancestors.
This has no effect if a range is specified. If the argument
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 547a3e0..48f922d 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "reflog-walk.h"
 #include "oidset.h"
+#include "packfile.h"
 
 static const char rev_list_usage[] =
 "git rev-list [OPTION] ... [ -- paths... ]\n"
@@ -67,6 +68,7 @@ enum missing_action {
MA_ERROR = 0,/* fail if any missing objects are encountered */
MA_ALLOW_ANY,/* silently allow ALL missing objects */
MA_PRINT,/* print ALL missing objects in special section */
+   MA_ALLOW_PROMISOR, /* silently allow all missing PROMISOR objects */
 };
 static enum missing_action arg_missing_action;
 
@@ -197,6 +199,12 @@ static void finish_commit(struct commit *commit, void 
*data)
 
 static inline void finish_object__ma(struct object *obj)
 {
+   /*
+* Whether or not we try to dynamically fetch missing objects
+* from the server, we currently DO NOT have the object.  We
+* can either print, allow (ignore), or conditionally allow
+* (ignore) them.
+*/
switch (arg_missing_action) {
case MA_ERROR:
die("missing blob object '%s'", oid_to_hex(&obj->oid));
@@ -209,25 +217,36 @@ static inline void finish_object__ma(struct object *obj)
oidset_insert(&missing_objects, &obj->oid);
return;
 
+   case MA_ALLOW_PROMISOR:
+   if (is_promisor_object(&obj->oid))
+   return;
+   die("unexpected missing blob object '%s'",
+   oid_to_hex(&obj->oid));
+   return;
+
default:
BUG("unhandled missing_action");
return;
}
 }
 
-static void finish_object(struct object *obj, const char *name, void *cb_data)
+static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid))
+   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
finish_object__ma(obj);
+   return 1;
+   }
if (info->revs->verify_objects && !obj->parsed && obj->type != 
OBJ_COMMIT)
parse_object(&obj->oid);
+   return 0;
 }
 
 static void show_object(stru

[PATCH v7 04/16] upload-pack: add object filtering for partial clone

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

Teach upload-pack to negotiate object filtering over the protocol and
to send filter parameters to pack-objects.  This is intended for partial
clone and fetch.

The idea to make upload-pack configurable using uploadpack.allowFilter
comes from Jonathan Tan's work in [1].

[1] 
https://public-inbox.org/git/f211093280b422c32cc1b7034130072f35c5ed51.1506714999.git.jonathanta...@google.com/

Signed-off-by: Jeff Hostetler 
---
 Documentation/config.txt  |  4 
 Documentation/technical/pack-protocol.txt |  8 +++
 Documentation/technical/protocol-capabilities.txt |  8 +++
 upload-pack.c | 26 ++-
 4 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1ac0ae6..e528210 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3268,6 +3268,10 @@ uploadpack.packObjectsHook::
was run. I.e., `upload-pack` will feed input intended for
`pack-objects` to the hook, and expects a completed packfile on
stdout.
+
+uploadpack.allowFilter::
+   If this option is set, `upload-pack` will advertise partial
+   clone and partial fetch object filtering.
 +
 Note that this configuration variable is ignored if it is seen in the
 repository-level config (this is a safety measure against fetching from
diff --git a/Documentation/technical/pack-protocol.txt 
b/Documentation/technical/pack-protocol.txt
index ed1eae8..a43a113 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -212,6 +212,7 @@ out of what the server said it could do with the first 
'want' line.
   upload-request=  want-list
   *shallow-line
   *1depth-request
+  [filter-request]
   flush-pkt
 
   want-list =  first-want
@@ -227,6 +228,8 @@ out of what the server said it could do with the first 
'want' line.
   additional-want   =  PKT-LINE("want" SP obj-id)
 
   depth =  1*DIGIT
+
+  filter-request=  PKT-LINE("filter" SP filter-spec)
 
 
 Clients MUST send all the obj-ids it wants from the reference
@@ -249,6 +252,11 @@ complete those commits. Commits whose parents are not 
received as a
 result are defined as shallow and marked as such in the server. This
 information is sent back to the client in the next step.
 
+The client can optionally request that pack-objects omit various
+objects from the packfile using one of several filtering techniques.
+These are intended for use with partial clone and partial fetch
+operations.  See `rev-list` for possible "filter-spec" values.
+
 Once all the 'want's and 'shallow's (and optional 'deepen') are
 transferred, clients MUST send a flush-pkt, to tell the server side
 that it is done sending the list.
diff --git a/Documentation/technical/protocol-capabilities.txt 
b/Documentation/technical/protocol-capabilities.txt
index 26dcc6f..332d209 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -309,3 +309,11 @@ to accept a signed push certificate, and asks the  
to be
 included in the push certificate.  A send-pack client MUST NOT
 send a push-cert packet unless the receive-pack server advertises
 this capability.
+
+filter
+--
+
+If the upload-pack server advertises the 'filter' capability,
+fetch-pack may send "filter" commands to request a partial clone
+or partial fetch and request that the server omit various objects
+from the packfile.
diff --git a/upload-pack.c b/upload-pack.c
index e25f725..e6d38b9 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -10,6 +10,8 @@
 #include "diff.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "list-objects-filter.h"
+#include "list-objects-filter-options.h"
 #include "run-command.h"
 #include "connect.h"
 #include "sigchain.h"
@@ -18,6 +20,7 @@
 #include "parse-options.h"
 #include "argv-array.h"
 #include "prio-queue.h"
+#include "quote.h"
 
 static const char * const upload_pack_usage[] = {
N_("git upload-pack [] "),
@@ -64,6 +67,10 @@ static int advertise_refs;
 static int stateless_rpc;
 static const char *pack_objects_hook;
 
+static int filter_capability_requested;
+static int filter_advertise;
+static struct list_objects_filter_options filter_options;
+
 static void reset_timeout(void)
 {
alarm(timeout);
@@ -131,6 +138,12 @@ static void create_pack_file(void)
argv_array_push(&pack_objects.args, "--delta-base-offset");
if (use_include_tag)
argv_array_push(&pack_objects.args, "--include-tag");
+   if (filter_options.filter_spec) {
+   struct strbuf buf = STRBUF_INIT;
+   sq_quote_buf(&buf, filter_options.filter_spec);
+   argv_array_pushf(&pack_objects.args, "--filter=%s", buf.buf);
+   strbuf_release(&buf);

Re: [SCRIPT/RFC 0/3] git-commit --onto-parent (three-way merge, no working tree file changes)

2017-12-08 Thread Junio C Hamano
Igor Djordjevic  writes:

> To get back on track, and regarding what`s already been said, would 
> having something like this(1) feel useful?
>
> (1) git commit --onto 

Are you asking me if _I_ find it useful?  It is not a very useful
question to ask, as I've taken things that I do not find useful
myself.

Having said that, I do not see me personally using it.  You keep
claiming that committing without ever materializing the exact state
that is committed in the working tree is a good thing.

I do not subscribe to that view.  

I'd rather do a quick fix-up on top (which ensures that at least the
fix-up works in the context of the tip), and then "rebase -i" to
move it a more appropriate place in the history (during which I have
a chance to ensure that the fix-up works in the context it is
intended to apply to).

I know that every time I say this, people who prefer to commit
things that never existed in the working tree will say "but we'll
test it later after we make these commit without having their state
in the working tree".  But I also know better that "later" often do
not come, ever, at least for people like me ;-).

The amount of work _required_ to record the fix-up at its final
resting place deeper in the history would be larger with "rebase -i"
approach, simply because approaches like "commit --onto" and "git
post" that throw a new commit deep in the history would not require
ever materializing it in the working tree.  But because I care about
what I am actually committing, and because I am just lazy as any
other human (if not more), I'd prefer an apporach that _forces_ me
to have a checkout of the exact state that I'd be committing.  That
would prod me to actually looking at and testing the state after the
change in the context it is meant to go.


Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Ramsay Jones


On 08/12/17 09:34, Jeff King wrote:
> On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
[snip]
>> Junio hinted at a different approach of solving this problem, which this
>> patch implements. Teach the diff machinery another flag for restricting
>> the information to what is shown. For example:
>>
>>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>>   b2feb64309 Revert the whole "ask curl-config" topic for now
>>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>>
>> we observe that the Makefile as shipped with 2.0 was introduced in
>> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
>> a different blob.

Sorry, this has nothing to do with this specific thread. :(

However, the above caught my eye. Commit b2feb64309 does not 'replace
the Makefile with a different blob'. That commit was a 'last minute'
revert of a topic _prior_ to v2.0.0, which resulted in the _same_
blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).

[I haven't been following this topic, so just ignore me if I have
misunderstood what the above was describing! :-D ]

ATB,
Ramsay Jones



Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax

2017-12-08 Thread Junio C Hamano
Jeff King  writes:

> Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments
> with ",", 2017-10-01) switched the syntax of the trailers
> placeholder, but forgot to update the documentation in
> pretty-formats.txt.
>
> There's need to mention the old syntax; it was never in a
> released version of Git.

There's or There's no?

>
> Signed-off-by: Jeff King 
> ---
> This should go on top of tb/delimit-pretty-trailers-args-with-comma.
>
>  Documentation/pretty-formats.txt | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/pretty-formats.txt 
> b/Documentation/pretty-formats.txt
> index d433d50f81..e664c088a5 100644
> --- a/Documentation/pretty-formats.txt
> +++ b/Documentation/pretty-formats.txt
> @@ -204,11 +204,13 @@ endif::git-rev-list[]
>than given and there are spaces on its left, use those spaces
>  - '%><()', '%><|()': similar to '% <()', '%<|()'
>respectively, but padding both sides (i.e. the text is centered)
> -- %(trailers): display the trailers of the body as interpreted by
> -  linkgit:git-interpret-trailers[1]. If the `:only` option is given,
> -  omit non-trailer lines from the trailer block.  If the `:unfold`
> -  option is given, behave as if interpret-trailer's `--unfold` option
> -  was given. E.g., `%(trailers:only:unfold)` to do both.
> +- %(trailers[:options]): display the trailers of the body as interpreted
> +  by linkgit:git-interpret-trailers[1]. The `trailers` string may be
> +  followed by a colon and zero or more comma-separated options. If the
> +  `only` option is given, omit non-trailer lines from the trailer block.
> +  If the `unfold` option is given, behave as if interpret-trailer's
> +  `--unfold` option was given.  E.g., `%(trailers:only,unfold)` to do
> +  both.
>  
>  NOTE: Some placeholders may depend on other options given to the
>  revision traversal engine. For example, the `%g*` reflog options will


Re: [PATCH] docs/pretty-formats: mention commas in %(trailers) syntax

2017-12-08 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Dec 08, 2017 at 03:10:34AM -0500, Eric Sunshine wrote:
>
>> On Fri, Dec 8, 2017 at 12:16 AM, Jeff King  wrote:
>> > Commit 84ff053d47 (pretty.c: delimit "%(trailers)" arguments
>> > with ",", 2017-10-01) switched the syntax of the trailers
>> > placeholder, but forgot to update the documentation in
>> > pretty-formats.txt.
>> >
>> > There's need to mention the old syntax; it was never in a
>> 
>> I suppose you mean: s/need/no need/
>
> Yes, indeed. Thanks.

Ah, I probably should switch to 'read-only' mode until I finish my
inbox.



[PATCH 0/2] Offer more information in `git version --build-options`'s output

2017-12-08 Thread Johannes Schindelin
In Git for Windows, we ask users to paste the output of said command
into their bug reports, with the idea that this frequently helps
identify where the problems are coming from.

There are some obvious missing bits of information in said output,
though, and this patch series tries to fill the gaps at least a little.


Adric Norris (1):
  git version --build-options: report the build platform, too

Johannes Schindelin (1):
  version --build-options: report commit, too, if possible

 Makefile  |  4 +++-
 help.c|  4 
 help.h| 13 +
 version.c |  1 +
 version.h |  1 +
 5 files changed, 22 insertions(+), 1 deletion(-)


base-commit: 95ec6b1b3393eb6e26da40c565520a8db9796e9f
Published-As: https://github.com/dscho/git/releases/tag/built-from-commit-v1
Fetch-It-Via: git fetch https://github.com/dscho/git built-from-commit-v1
-- 
2.15.1.windows.2



[PATCH 2/2] version --build-options: report commit, too, if possible

2017-12-08 Thread Johannes Schindelin
In particular when local tags are used (or tags that are pushed to some
fork) to build Git, it is very hard to figure out from which particular
revision a particular Git executable was built.

Let's just report that in our build options.

We need to be careful, though, to report when the current commit cannot be
determined, e.g. when building from a tarball without any associated Git
repository.

Signed-off-by: Johannes Schindelin 
---
 Makefile  | 4 +++-
 help.c| 2 ++
 version.c | 1 +
 version.h | 1 +
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index fef9c8d2725..92a0ae3d8e3 100644
--- a/Makefile
+++ b/Makefile
@@ -1893,7 +1893,9 @@ builtin/help.sp builtin/help.s builtin/help.o: 
EXTRA_CPPFLAGS = \
 version.sp version.s version.o: GIT-VERSION-FILE GIT-USER-AGENT
 version.sp version.s version.o: EXTRA_CPPFLAGS = \
'-DGIT_VERSION="$(GIT_VERSION)"' \
-   '-DGIT_USER_AGENT=$(GIT_USER_AGENT_CQ_SQ)'
+   '-DGIT_USER_AGENT=$(GIT_USER_AGENT_CQ_SQ)' \
+   '-DGIT_BUILT_FROM_COMMIT="$(shell git rev-parse -q --verify HEAD || \
+   echo "(unknown)")"'
 
 $(BUILT_INS): git$X
$(QUIET_BUILT_IN)$(RM) $@ && \
diff --git a/help.c b/help.c
index df1332fa3c9..6ebea665780 100644
--- a/help.c
+++ b/help.c
@@ -413,6 +413,8 @@ int cmd_version(int argc, const char **argv, const char 
*prefix)
printf("git version %s\n", git_version_string);
 
if (build_options) {
+   printf("built from commit: %s\n",
+  git_built_from_commit_string);
printf("sizeof-long: %d\n", (int)sizeof(long));
printf("machine: %s\n", build_platform);
/* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */
diff --git a/version.c b/version.c
index 6106a8098c8..41b718c29e1 100644
--- a/version.c
+++ b/version.c
@@ -3,6 +3,7 @@
 #include "strbuf.h"
 
 const char git_version_string[] = GIT_VERSION;
+const char git_built_from_commit_string[] = GIT_BUILT_FROM_COMMIT;
 
 const char *git_user_agent(void)
 {
diff --git a/version.h b/version.h
index 6911a4f1558..7c62e805771 100644
--- a/version.h
+++ b/version.h
@@ -2,6 +2,7 @@
 #define VERSION_H
 
 extern const char git_version_string[];
+extern const char git_built_from_commit_string[];
 
 const char *git_user_agent(void);
 const char *git_user_agent_sanitized(void);
-- 
2.15.1.windows.2


[PATCH 1/2] git version --build-options: report the build platform, too

2017-12-08 Thread Johannes Schindelin
From: Adric Norris 

When asking for bug reports to include the output of `git version
--build-options`, the idea is that we get a better idea of the
environment where said bug occurs. In this context, it is useful to
distinguish between 32 and 64-bit builds.

We start by distinguishing between x86 and x86_64 (hoping that
interested parties will extend this to other architectures in the
future).  In addition, all 32-bit variants (i686, i586, etc.) are
collapsed into `x86'. An example of the output is:

   $ git version --build-options
   git version 2.9.3.windows.2.826.g06c0f2f
   sizeof-long: 4
   machine: x86_64

The label of `machine' was chosen so the new information will approximate
the output of `uname -m'.

Signed-off-by: Adric Norris 
Signed-off-by: Johannes Schindelin 
---
 help.c |  2 ++
 help.h | 13 +
 2 files changed, 15 insertions(+)

diff --git a/help.c b/help.c
index 88a3aeaeb9f..df1332fa3c9 100644
--- a/help.c
+++ b/help.c
@@ -390,6 +390,7 @@ const char *help_unknown_cmd(const char *cmd)
 
 int cmd_version(int argc, const char **argv, const char *prefix)
 {
+   static char build_platform[] = GIT_BUILD_PLATFORM;
int build_options = 0;
const char * const usage[] = {
N_("git version []"),
@@ -413,6 +414,7 @@ int cmd_version(int argc, const char **argv, const char 
*prefix)
 
if (build_options) {
printf("sizeof-long: %d\n", (int)sizeof(long));
+   printf("machine: %s\n", build_platform);
/* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */
}
return 0;
diff --git a/help.h b/help.h
index b21d7c94e8c..42dd9852194 100644
--- a/help.h
+++ b/help.h
@@ -33,3 +33,16 @@ extern void list_commands(unsigned int colopts, struct 
cmdnames *main_cmds, stru
  */
 extern void help_unknown_ref(const char *ref, const char *cmd, const char 
*error);
 #endif /* HELP_H */
+
+/*
+ * identify build platform
+ */
+#ifndef GIT_BUILD_PLATFORM
+   #if defined __x86__ || defined __i386__ || defined __i586__ || defined 
__i686__
+   #define GIT_BUILD_PLATFORM "x86"
+   #elif defined __x86_64__
+   #define GIT_BUILD_PLATFORM "x86_64"
+   #else
+   #define GIT_BUILD_PLATFORM "unknown"
+   #endif
+#endif
-- 
2.15.1.windows.2




Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
> ...
>> @@ -2883,6 +2884,8 @@ int prepare_revision_walk(struct rev_info *revs)
>>  simplify_merges(revs);
>>  if (revs->children.name)
>>  set_children(revs);
>> +if (revs->diffopt.blobfind)
>> +revs->simplify_history = 0;
>>  return 0;
>>  }
>
> It makes sense to clear this bit by default, but is this an
> unconditonal clearing?  In other words, is there a way for the user
> to countermand this default and ask for merge simplification from
> the command line, e.g. "diff --simplify-history --blobfind="?

Looking at the places that assign to revs->simplify_history in
revision.c it seems that "this option turns the merge simplification
off unconditionally" is pretty much the norm, and this change just
follows suit.  So perhaps it is OK to let this pass, at least for
now.



Re: [PATCH 1/2] git version --build-options: report the build platform, too

2017-12-08 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> From: Adric Norris 
>
> When asking for bug reports to include the output of `git version
> --build-options`, the idea is that we get a better idea of the
> environment where said bug occurs. In this context, it is useful to
> distinguish between 32 and 64-bit builds.

Neat!

The cover letter gives a clearer idea of the motivation:

In Git for Windows, we ask users to paste the output of said command
into their bug reports, with the idea that this frequently helps
identify where the problems are coming from.

There are some obvious missing bits of information in said output,
though, and this patch series tries to fill the gaps at least a little.

Could some of that text go here, too?

[...]
> --- a/help.c
> +++ b/help.c
> @@ -390,6 +390,7 @@ const char *help_unknown_cmd(const char *cmd)
>  
>  int cmd_version(int argc, const char **argv, const char *prefix)
>  {
> + static char build_platform[] = GIT_BUILD_PLATFORM;
>   int build_options = 0;
>   const char * const usage[] = {
>   N_("git version []"),
> @@ -413,6 +414,7 @@ int cmd_version(int argc, const char **argv, const char 
> *prefix)
>  
>   if (build_options) {
>   printf("sizeof-long: %d\n", (int)sizeof(long));
> + printf("machine: %s\n", build_platform);

Can this use GIT_BUILD_PLATFORM directly instead of going via the indirection
of a mutable static string?  That is, something like

printf("machine: %s\n", GIT_BUILD_PLATFORM);

>   /* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */

What do you think of this idea?  uname_M isn't one of the variables
in that file, though, so that's orthogonal to this patch.

[...]
> --- a/help.h
> +++ b/help.h
> @@ -33,3 +33,16 @@ extern void list_commands(unsigned int colopts, struct 
> cmdnames *main_cmds, stru
>   */
>  extern void help_unknown_ref(const char *ref, const char *cmd, const char 
> *error);
>  #endif /* HELP_H */
> +
> +/*
> + * identify build platform
> + */
> +#ifndef GIT_BUILD_PLATFORM
> + #if defined __x86__ || defined __i386__ || defined __i586__ || defined 
> __i686__
> + #define GIT_BUILD_PLATFORM "x86"
> + #elif defined __x86_64__
> + #define GIT_BUILD_PLATFORM "x86_64"
> + #else
> + #define GIT_BUILD_PLATFORM "unknown"
> + #endif
> +#endif

This code needs to be inside the HELP_H header guard.

Thanks and hope that helps,
Jonathan


Re: [PATCH 2/2] version --build-options: report commit, too, if possible

2017-12-08 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> In particular when local tags are used (or tags that are pushed to some
> fork) to build Git, it is very hard to figure out from which particular
> revision a particular Git executable was built.

Hm, can you say more about how this comes up in practice?  I wonder if
we should always augment the version string with the commit hash.
E.g. I am running

git version 2.15.1.424.g9478a66081

now, which already includes the commit hash because it disambiguates
the .424 thing, but depending on the motivation, maybe we would also
want

git version 2.15.1.0.g9b185bef0c15

for people running v2.15.1 (or at least when such a tag is not a well
known, published tag).

> We need to be careful, though, to report when the current commit cannot be
> determined, e.g. when building from a tarball without any associated Git
> repository.

This means that on Debian, it would always print

built from commit: (unknown)

Maybe I shouldn't care, but I wonder if there's a way to improve on
that. E.g. should there be a makefile knob to allow Debian to specify
what to put there?

Thanks,
Jonathan


Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread René Scharfe
Am 08.12.2017 um 11:14 schrieb Jeff King:
> On Thu, Dec 07, 2017 at 01:47:14PM -0800, Junio C Hamano wrote:
> 
>>> diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
>>> index 22034f87e7..8e8a15ea4a 100644
>>> --- a/builtin/fmt-merge-msg.c
>>> +++ b/builtin/fmt-merge-msg.c
>>> @@ -377,7 +377,8 @@ static void shortlog(const char *name,
>>> string_list_append(&subjects,
>>>oid_to_hex(&commit->object.oid));
>>> else
>>> -   string_list_append(&subjects, strbuf_detach(&sb, NULL));
>>> +   string_list_append_nodup(&subjects,
>>> +strbuf_detach(&sb, NULL));
>>> }
>>>   
>>> if (opts->credit_people)
>>
>> What is leaked comes from strbuf, so the title is not a lie, but I
>> tend to think that this leak is caused by a somewhat strange
>> string_list API.  The subjects string-list is initialized as a "dup"
>> kind, but a caller that wants to avoid leaking can (and should) use
>> _nodup() call to add a string without duping.  It all feels a bit
>> too convoluted.
> 
> I'm not sure it's string-list's fault. Many callers (including this one)
> have _some_ entries whose strings must be duplicated and others which do
> not.
> 
> So either:
> 
>1. The list gets marked as "nodup", and we add an extra xstrdup() to the
>   oid_to_hex call above. And also need to remember to free() the
>   strings later, since the list does not own them.
> 
> or
> 
>2. We mark it as "dup" and incur an extra allocation and copy, like:
> 
> string_list_append(&subjects, sb.buf);
> strbuf_release(&buf);

The two modes (dup/nodup) make string_list code tricky.  Not sure
how far we'd get with something simpler (e.g. an array of char pointers),
but having the caller do all string allocations would make the code
easier to analyze.

> So I'd really blame the caller, which doesn't want to do (2) out of a
> sense of optimization. It could also perhaps write it as:
> 
>while (commit = get_revision(rev)) {
>   strbuf_reset(&sb);
>   ... maybe put some stuff in sb ...
>   if (!sb.len)
>   string_list_append(&subjects, oid_to_hex(obj));
>   else
>   string_list_append(&subjects, sb.buf);
>}
>strbuf_release(&sb);
> 
> which at least avoids the extra allocations.

Right, we'd just have extra string copies in that case.

> By the way, I think there's another quite subtle leak in this function.
> We do this:
> 
>format_commit_message(commit, "%s", &sb, &ctx);
>strbuf_ltrim(&sb);
> 
> and then only use "sb" if sb.len is non-zero. But we may have actually
> allocated to create our zero-length string (e.g., if we had a strbuf
> full of spaces and trimmed them all off). Since we reuse "sb" over and
> over as we loop, this will actually only leak once for the whole loop,
> not once per iteration. So it's probably not a big deal, but writing it
> with the explicit reset/release pattern fixes that (and is more
> idiomatic for our code base, I think).

It's subtle, but I think it's not leaking, at least not in your example
case (and I can't think of another way).  IIUC format_subject(), which
handles the "%s" part, doesn't touch sb if the subject is made up only
of whitespace.

René


Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread René Scharfe
Am 07.12.2017 um 22:27 schrieb Jeff King:
> Grepping for "list_append.*detach" shows a few other possible cases in
> transport-helper.c, which I think are leaks.

-- >8 --
Subject: [PATCH] transport-helper: plug strbuf and string_list leaks

Transfer ownership of detached strbufs to string_lists of the
duplicating variety by calling string_list_append_nodup() instead of
string_list_append() to avoid duplicating and then leaking the buffer.

While at it make sure to release the string_list when done;
push_refs_with_export() already does that.

Reported-by: Jeff King 
Signed-off-by: Rene Scharfe 
---
 transport-helper.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index bf05a2dcf1..f682e7c534 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport *transport,
struct strbuf cas = STRBUF_INIT;
strbuf_addf(&cas, "%s:%s",
ref->name, 
oid_to_hex(&ref->old_oid_expect));
-   string_list_append(&cas_options, strbuf_detach(&cas, 
NULL));
+   string_list_append_nodup(&cas_options,
+strbuf_detach(&cas, NULL));
}
}
if (buf.len == 0) {
@@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport *transport,
strbuf_addch(&buf, '\n');
sendline(data, &buf);
strbuf_release(&buf);
+   string_list_release(&cas_options, 0);
 
return push_update_refs_status(data, remote_refs, flags);
 }
@@ -930,7 +932,8 @@ static int push_refs_with_export(struct transport 
*transport,
private = apply_refspecs(data->refspecs, data->refspec_nr, 
ref->name);
if (private && !get_oid(private, &oid)) {
strbuf_addf(&buf, "^%s", private);
-   string_list_append(&revlist_args, strbuf_detach(&buf, 
NULL));
+   string_list_append_nodup(&revlist_args,
+strbuf_detach(&buf, NULL));
oidcpy(&ref->old_oid, &oid);
}
free(private);
-- 
2.15.1


Re: [PATCH v5 4/4] builtin/branch: strip refs/heads/ using skip_prefix

2017-12-08 Thread Kaartic Sivaraam

On Friday 08 December 2017 04:44 AM, Junio C Hamano wrote:

Junio C Hamano  writes:


Somehow this fell underneath my radar horizon.  I see v4 and v5 of
4/4 but do not seem to find 1-3/4.  Is this meant to be a standalone
patch, or am I expected to already have 1-3 that we already are
committed to take?


Ah, I am guessing that this would apply on top of 1-3/4 in the
thread with <20171118172648.17918-1-kaartic.sivar...@gmail.com>



You guessed right; at the right time. I was about to ask why this got 
"out of your radar" in reply to your recent "What's cooking" email :-)




The base of the series seems to predate 16169285 ("Merge branch
'jc/branch-name-sanity'", 2017-11-28), so let me see how it looks by
applying those three plus this one on top of 'master' before that
point.



Let me know if this has terrible conflicts so that I can rebase the 
series on top of 'master'.



Thanks,
Kaartic


Re: [PATCH Outreachy 1/2] format: create pretty.h file

2017-12-08 Thread Junio C Hamano
Olga Telezhnaya  writes:

> -extern void get_commit_format(const char *arg, struct rev_info *);
> -extern const char *format_subject(struct strbuf *sb, const char *msg,
> -   const char *line_separator);
> -extern void userformat_find_requirements(const char *fmt, struct 
> userformat_want *w);
> -extern int commit_format_is_empty(enum cmit_fmt);
>  extern const char *skip_blank_lines(const char *msg);
> -extern void format_commit_message(const struct commit *commit,
> -   const char *format, struct strbuf *sb,
> -   const struct pretty_print_context *context);
> -extern void pretty_print_commit(struct pretty_print_context *pp,
> - const struct commit *commit,
> - struct strbuf *sb);
> -extern void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit,
> -struct strbuf *sb);
> -void pp_user_info(struct pretty_print_context *pp,
> -   const char *what, struct strbuf *sb,
> -   const char *line, const char *encoding);
> -void pp_title_line(struct pretty_print_context *pp,
> -const char **msg_p,
> -struct strbuf *sb,
> -const char *encoding,
> -int need_8bit_cte);
> -void pp_remainder(struct pretty_print_context *pp,
> -   const char **msg_p,
> -   struct strbuf *sb,
> -   int indent);
> ...
> +void userformat_find_requirements(const char *fmt, struct userformat_want 
> *w);
> +void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit,
> + struct strbuf *sb);
> +void pp_user_info(struct pretty_print_context *pp, const char *what,
> + struct strbuf *sb, const char *line,
> + const char *encoding);
> +void pp_title_line(struct pretty_print_context *pp, const char **msg_p,
> + struct strbuf *sb, const char *encoding,
> + int need_8bit_cte);
> +void pp_remainder(struct pretty_print_context *pp, const char **msg_p,
> + struct strbuf *sb, int indent);
> +
> +void format_commit_message(const struct commit *commit,
> + const char *format, struct strbuf *sb,
> + const struct pretty_print_context *context);
> +
> +void get_commit_format(const char *arg, struct rev_info *);
> +
> +void pretty_print_commit(struct pretty_print_context *pp,
> + const struct commit *commit,
> + struct strbuf *sb);
> +
> +const char *format_subject(struct strbuf *sb, const char *msg,
> + const char *line_separator);
> +
> +int commit_format_is_empty(enum cmit_fmt);

I see you've "standardized" to drop "extern" from the declarations
in the header; I have an impression that our preference however is
to go in the other direction.

The choice of bits that are moved to the new header looks quite
sensible to me.

Thanks.


Re: [PATCH 1/2] git version --build-options: report the build platform, too

2017-12-08 Thread Junio C Hamano
Jonathan Nieder  writes:

>> @@ -390,6 +390,7 @@ const char *help_unknown_cmd(const char *cmd)
>>  
>>  int cmd_version(int argc, const char **argv, const char *prefix)
>>  {
>> +static char build_platform[] = GIT_BUILD_PLATFORM;
>>  int build_options = 0;
>>  const char * const usage[] = {
>>  N_("git version []"),
>> @@ -413,6 +414,7 @@ int cmd_version(int argc, const char **argv, const char 
>> *prefix)
>>  
>>  if (build_options) {
>>  printf("sizeof-long: %d\n", (int)sizeof(long));
>> +printf("machine: %s\n", build_platform);
>
> Can this use GIT_BUILD_PLATFORM directly instead of going via the indirection
> of a mutable static string?  That is, something like
>
>   printf("machine: %s\n", GIT_BUILD_PLATFORM);

Good point.  And if this is externally identified as "machine",
probably the macro should also use the same word, not "platform".
We can go either way, as long as we are consistent, though.

>> --- a/help.h
>> +++ b/help.h
>> @@ -33,3 +33,16 @@ extern void list_commands(unsigned int colopts, struct 
>> cmdnames *main_cmds, stru
>>   */
>>  extern void help_unknown_ref(const char *ref, const char *cmd, const char 
>> *error);
>>  #endif /* HELP_H */
>> +
>> +/*
>> + * identify build platform
>> + */
>> +#ifndef GIT_BUILD_PLATFORM
>> +#if defined __x86__ || defined __i386__ || defined __i586__ || defined 
>> __i686__
>> +#define GIT_BUILD_PLATFORM "x86"
>> +#elif defined __x86_64__
>> +#define GIT_BUILD_PLATFORM "x86_64"
>> +#else
>> +#define GIT_BUILD_PLATFORM "unknown"
>> +#endif
>> +#endif
>
> This code needs to be inside the HELP_H header guard.

Certainly.

Thanks.


[PATCH v1 2/2] t0027: Adapt the new MIX tests to Windows

2017-12-08 Thread tboegi
From: Torsten Bögershausen 

The new MIX tests don't pass under Windows, adapt them
to use the correct native line ending.

Signed-off-by: Torsten Bögershausen 
---

 Sorry for the breakage.
 This needs to go on top of tb/check-crlf-for-safe-crlf
 
 t/t0027-auto-crlf.sh | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 97154f5c79..8d929b76dc 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -170,22 +170,22 @@ commit_MIX_chkwrn () {
git -c core.autocrlf=$crlf add $fname 2>"${pfx}_$f.err"
done
 
-   test_expect_success "commit file with mixed EOL crlf=$crlf attr=$attr 
LF" '
+   test_expect_success "commit file with mixed EOL onto LF crlf=$crlf 
attr=$attr" '
check_warning "$lfwarn" ${pfx}_LF.err
'
-   test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol 
crlf=$crlf CRLF" '
+   test_expect_success "commit file with mixed EOL onto CLRF attr=$attr 
aeol=$aeol crlf=$crlf" '
check_warning "$crlfwarn" ${pfx}_CRLF.err
'
 
-   test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol 
crlf=$crlf CRLF_mix_LF" '
+   test_expect_success "commit file with mixed EOL onto CRLF_mix_LF 
attr=$attr aeol=$aeol crlf=$crlf" '
check_warning "$lfmixcrlf" ${pfx}_CRLF_mix_LF.err
'
 
-   test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol 
crlf=$crlf LF_mix_cr" '
+   test_expect_success "commit file with mixed EOL onto LF_mix_cr 
attr=$attr aeol=$aeol crlf=$crlf " '
check_warning "$lfmixcr" ${pfx}_LF_mix_CR.err
'
 
-   test_expect_success "commit file with mixed EOL attr=$attr aeol=$aeol 
crlf=$crlf CRLF_nul" '
+   test_expect_success "commit file with mixed EOL onto CRLF_nul 
attr=$attr aeol=$aeol crlf=$crlf" '
check_warning "$crlfnul" ${pfx}_CRLF_nul.err
'
 }
@@ -378,7 +378,7 @@ test_expect_success 'setup master' '
printf "\$Id:  
\$\r\nLINEONE\r\nLINETWO\rLINETHREE"   >CRLF_mix_CR &&
printf "\$Id:  
\$\r\nLINEONEQ\r\nLINETWO\r\nLINETHREE" | q_to_nul >CRLF_nul &&
printf "\$Id:  
\$\nLINEONEQ\nLINETWO\nLINETHREE" | q_to_nul >LF_nul &&
-   create_NNO_MIX_files CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF CRLF_mix_LF 
CRLF_mix_LF &&
+   create_NNO_MIX_files &&
git -c core.autocrlf=false add NNO_*.txt MIX_*.txt &&
git commit -m "mixed line endings" &&
test_tick
@@ -441,13 +441,14 @@ test_expect_success 'commit files attr=crlf' '
 '
 
 # Commit "CRLFmixLF" on top of these files already in the repo:
-# LF, CRLF, CRLFmixLF LF_mix_CR CRLFNULL
+# mixed mixed mixed   
mixed   mixed
+# onto  onto  ontoonto 
   onto
 # attrLFCRLF  CRLFmixLF   
LF_mix_CR   CRLFNUL
 commit_MIX_chkwrn ""  ""  false   """"""  ""   
   ""
 commit_MIX_chkwrn ""  ""  true"LF_CRLF" """"  
"LF_CRLF"   "LF_CRLF"
 commit_MIX_chkwrn ""  ""  input   "CRLF_LF" """"  
"CRLF_LF"   "CRLF_LF"
 
-commit_MIX_chkwrn "auto"  ""  false   "CRLF_LF" """"  
"CRLF_LF"   "CRLF_LF"
+commit_MIX_chkwrn "auto"  ""  false   "$WAMIX"  """"  
"$WAMIX""$WAMIX"
 commit_MIX_chkwrn "auto"  ""  true"LF_CRLF" """"  
"LF_CRLF"   "LF_CRLF"
 commit_MIX_chkwrn "auto"  ""  input   "CRLF_LF" """"  
"CRLF_LF"   "CRLF_LF"
 
-- 
2.15.1.271.g1a4e40aa5d



[PATCH v1 1/2] t0027: Don't use git commit

2017-12-08 Thread tboegi
From: Torsten Bögershausen 

Replace `git commit -m "comment" ""` with `git commit -m "comment"` to
remove the empty path spec.

Signed-off-by: Torsten Bögershausen 
---
 t/t0027-auto-crlf.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index c2c208fdcd..97154f5c79 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -370,7 +370,7 @@ test_expect_success 'setup master' '
echo >.gitattributes &&
git checkout -b master &&
git add .gitattributes &&
-   git commit -m "add .gitattributes" "" &&
+   git commit -m "add .gitattributes" &&
printf "\$Id:  
\$\nLINEONE\nLINETWO\nLINETHREE" >LF &&
printf "\$Id:  
\$\r\nLINEONE\r\nLINETWO\r\nLINETHREE" >CRLF &&
printf "\$Id:  
\$\nLINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF &&
-- 
2.15.1.271.g1a4e40aa5d



Re: [PATCH 2/2] version --build-options: report commit, too, if possible

2017-12-08 Thread Junio C Hamano
Jonathan Nieder  writes:

>> We need to be careful, though, to report when the current commit cannot be
>> determined, e.g. when building from a tarball without any associated Git
>> repository.
>
> This means that on Debian, it would always print
>
>   built from commit: (unknown)
>
> Maybe I shouldn't care, but I wonder if there's a way to improve on
> that. E.g. should there be a makefile knob to allow Debian to specify
> what to put there?

Another "interesting" possibility is to build from a tarball
extracted into a directory hierarchy that is controlled by an
unrelated Git repository.  E.g. "my $HOME is under $HOME/.git
repository, and then I have a tarball extract in $HOME/src/git".
We shouldn't embed the HEAD commit of that $HOME directory project
in the resulting executable in such a case.

We should be able to do this by being a bit more careful than the
presented patch.  Make sure that the toplevel is at the same
directory as we assumed to be (i.e. where we found that Makefile)
and trust rev-parse output only when that is the case, or something
like that.


Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Junio C Hamano
Jeff Hostetler  writes:

> From: Jeff Hostetler 
>
> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
> (which builds upon V6 of part 1).

Aren't the three patches at the bottom sort-of duplicate from the
part 2 series?



Re: [PATCH] doc: reword gitworflows for neutrality

2017-12-08 Thread Eric Sunshine
On Fri, Dec 8, 2017 at 10:18 AM, Daniel Bensoussan
 wrote:
> doc: reword gitworflows for neutrality

s/gitworflows/gitworkflows/

> Changed 'he' to 'them' to be more neutral in "gitworkflows.txt".
>
> See discussion at: 
> https://public-inbox.org/git/xmqqvahieeqy@gitster.mtv.corp.google.com/
>
> Signed-off-by: Matthieu Moy 
> Signed-off-by: Timothee Albertin 
> Signed-off-by: Nathan Payre 
> Signed-off-by: Daniel Bensoussan 
> ---
> diff --git a/Documentation/gitworkflows.txt b/Documentation/gitworkflows.txt
> @@ -407,8 +407,8 @@ follows.
> -Occasionally, the maintainer may get merge conflicts when he tries to
> -pull changes from downstream.  In this case, he can ask downstream to
> +Occasionally, the maintainer may get merge conflicts when they try to
> +pull changes from downstream.  In this case, they can ask downstream to

As a native English speaker, I find the new phrasing odd, and think
this may a step backward. How about trying a different approach? For
example:

Occasionally, the maintainer may get merge conflicts when trying
to pull changes from downstream. In this case, it may make sense
to ask downstream to do the merge and resolve the conflicts
instead (since, presumably, downstream will know better how to
resolve them).

>  do the merge and resolve the conflicts themselves (perhaps they will
>  know better how to resolve them).  It is one of the rare cases where
>  downstream 'should' merge from upstream.


Re: [WIP 01/15] pkt-line: introduce packet_read_with_status

2017-12-08 Thread Brandon Williams
On 12/07, Stefan Beller wrote:
> On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams  wrote:
> 
> > diff --git a/pkt-line.h b/pkt-line.h
> > index 3dad583e2..f1545929b 100644
> > --- a/pkt-line.h
> > +++ b/pkt-line.h
> > @@ -60,8 +60,16 @@ int write_packetized_from_buf(const char *src_in, size_t 
> > len, int fd_out);
> >   * If options contains PACKET_READ_CHOMP_NEWLINE, a trailing newline (if
> >   * present) is removed from the buffer before returning.
> >   */
> > +enum packet_read_status {
> > +   PACKET_READ_ERROR = -1,
> > +   PACKET_READ_NORMAL,
> > +   PACKET_READ_FLUSH,
> > +};
> >  #define PACKET_READ_GENTLE_ON_EOF (1u<<0)
> >  #define PACKET_READ_CHOMP_NEWLINE (1u<<1)
> > +enum packet_read_status packet_read_with_status(int fd, char **src_buffer, 
> > size_t *src_len,
> > +   char *buffer, unsigned 
> > size, int *pktlen,
> > +   int options);
> >  int packet_read(int fd, char **src_buffer, size_t *src_len, char
> > *buffer, unsigned size, int options);
> 
> The documentation that is preceding these lines is very specific to
> packet_read, e.g.
> 
> If options does contain PACKET_READ_GENTLE_ON_EOF,
> we will not die on condition 4 (truncated input), but instead return -1
> 
> which doesn't hold true for the _status version. Can you adapt the comment
> to explain both read functions?

Good point, I'll makes changes and document the _status version
separately.

-- 
Brandon Williams


Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Jeff Hostetler



On 12/8/2017 12:58 PM, Junio C Hamano wrote:

Jeff Hostetler  writes:


From: Jeff Hostetler 

This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
(which builds upon V6 of part 1).


Aren't the three patches at the bottom sort-of duplicate from the
part 2 series?



oops.  yes, you're right.  it looks like i selected pc*6*_p2..pc7_p3
rather than pc*7*_p2..pc7_p3.  sorry for the typo.

and since the only changes in p2 were to squash those 2 commits near
the tip of p2, only those 3 commits changed SHAs in v7 over v6.

so, please disregard the duplicates.

would you like me to send a corrected V8 for p3 ?

Jeff


Re: [WIP 02/15] pkt-line: introduce struct packet_reader

2017-12-08 Thread Brandon Williams
On 12/07, Stefan Beller wrote:
> On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams  wrote:
> > Sometimes it is advantageous to be able to peek the next packet line
> > without consuming it (e.g. to be able to determine the protocol version
> > a server is speaking).  In order to do that introduce 'struct
> > packet_reader' which is an abstraction around the normal packet reading
> > logic.  This enables a caller to be able to peek a single line at a time
> > using 'packet_reader_peek()' and having a caller consume a line by
> > calling 'packet_reader_read()'.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >  pkt-line.c | 61 
> > +
> >  pkt-line.h | 20 
> >  2 files changed, 81 insertions(+)
> >
> > diff --git a/pkt-line.c b/pkt-line.c
> > index ac619f05b..518109bbe 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -406,3 +406,64 @@ ssize_t read_packetized_to_strbuf(int fd_in, struct 
> > strbuf *sb_out)
> > }
> > return sb_out->len - orig_len;
> >  }
> > +
> > +/* Packet Reader Functions */
> > +void packet_reader_init(struct packet_reader *reader, int fd,
> > +   char *src_buffer, size_t src_len)
> > +{
> > +   reader->fd = fd;
> > +   reader->src_buffer = src_buffer;
> > +   reader->src_len = src_len;
> > +   reader->buffer = packet_buffer;
> > +   reader->buffer_size = sizeof(packet_buffer);
> > +   reader->options = PACKET_READ_CHOMP_NEWLINE | 
> > PACKET_READ_GENTLE_ON_EOF;
> 
> I assume the future user of this packet reader will need exactly
> these options coincidentally. ;)
> 
> I think it might be ok for now and later we can extend the reader if needed
> to also take the flags as args. However given this set of args, this is a 
> gentle
> packet reader, as it corresponds to the _gently version of reading
> packets AFAICT.
> 
> Unlike the pkt_read function this constructor of a packet reader doesn't need
> the arguments for its buffer (packet_buffer and sizeof thereof), which
> packet_read
> unfortunately needs. We pass in packet_buffer all the time except in
> builtin/receive-pack
> for obtaining the gpg cert. I think that's ok here.

Yeah, all of the callers I wanted to migrate all passed the same flags
and same buffer.  I figured there may be a point in the future where
we'd want to extend this so instead of hard coding the flags in the read
functions, I did so in the constructor.  That way if we wanted to extend
it in the future, it would be a little bit easier.

> 
> > +enum packet_read_status packet_reader_read(struct packet_reader *reader)
> > +{
> > +   if (reader->line_peeked) {
> > +   reader->line_peeked = 0;
> > +   return reader->status;
> > +   }
> > +
> > +   reader->status = packet_read_with_status(reader->fd,
> > +&reader->src_buffer,
> > +&reader->src_len,
> > +reader->buffer,
> > +reader->buffer_size,
> > +&reader->pktlen,
> > +reader->options);
> > +
> > +   switch (reader->status) {
> > +   case PACKET_READ_ERROR:
> > +   reader->pktlen = -1;
> 
> In case of error the read function might not
> have assigned the pktlen as requested, so we assign
> it to -1/NULL here. Though the caller ought to already know
> that they handle bogus, as the state is surely the first thing
> they'd inspect?

Exactly, I thought it would be easier to ensure that pktlen and line
were had consistent state no matter what the output of the read was.
But yes, callers should be looking at the status to determine what to
do.

> 
> > +   reader->line = NULL;
> > +   break;
> > +   case PACKET_READ_NORMAL:
> > +   reader->line = reader->buffer;
> > +   break;
> > +   case PACKET_READ_FLUSH:
> > +   reader->pktlen = 0;
> > +   reader->line = NULL;
> > +   break;
> > +   }
> 
> Oh, this gives an interesting interface for someone who is
> just inspecting the len/line instead of the state, so it might be
> worth keeping it this way.
> 
> Can we have an API documentation in the header file,
> explaining what to expect in each field given the state
> of the (read, peaked) packet?

Yep I can write some better API docs for these functions.

-- 
Brandon Williams


Re: [PATCH Outreachy 1/2] format: create pretty.h file

2017-12-08 Thread Eric Sunshine
On Fri, Dec 8, 2017 at 8:21 AM, Olga Telezhnaya
 wrote:
> Create header for pretty.c to make formatting interface more structured.
> This is a middle point, this file would be merged futher with other

s/futher/further/

> files which contain formatting stuff.
>
> Signed-off-by: Olga Telezhnaia 
> Mentored-by: Christian Couder 
> Mentored by: Jeff King 


Re: [PATCH v1 1/2] t0027: Don't use git commit

2017-12-08 Thread Junio C Hamano
tbo...@web.de writes:

> From: Torsten Bögershausen 
>
> Replace `git commit -m "comment" ""` with `git commit -m "comment"` to
> remove the empty path spec.
>
> Signed-off-by: Torsten Bögershausen 
> ---
>  t/t0027-auto-crlf.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

This looks a bit strange.  The intent seems to commit all changes
made in the working tree, so I'd understand it if it replaced the
empty string with a single dot.

I also thought that we deprecated use of an empty string as "match
all" pathspec recently, and expected that this to break.

... goes and looks ...

Indeed, 229a95aa ("t0027: do not use an empty string as a pathspec
element", 2017-06-23) does exactly that.


> diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
> index c2c208fdcd..97154f5c79 100755
> --- a/t/t0027-auto-crlf.sh
> +++ b/t/t0027-auto-crlf.sh
> @@ -370,7 +370,7 @@ test_expect_success 'setup master' '
>   echo >.gitattributes &&
>   git checkout -b master &&
>   git add .gitattributes &&
> - git commit -m "add .gitattributes" "" &&
> + git commit -m "add .gitattributes" &&
>   printf "\$Id:  
> \$\nLINEONE\nLINETWO\nLINETHREE" >LF &&
>   printf "\$Id:  
> \$\r\nLINEONE\r\nLINETWO\r\nLINETHREE" >CRLF &&
>   printf "\$Id:  
> \$\nLINEONE\r\nLINETWO\nLINETHREE"   >CRLF_mix_LF &&


Re: [PATCH v1 1/2] t0027: Don't use git commit

2017-12-08 Thread Junio C Hamano
Junio C Hamano  writes:

> tbo...@web.de writes:
>
>> From: Torsten Bögershausen 
>>
>> Replace `git commit -m "comment" ""` with `git commit -m "comment"` to
>> remove the empty path spec.
>>
>> Signed-off-by: Torsten Bögershausen 
>> ---
>>  t/t0027-auto-crlf.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> This looks a bit strange.  The intent seems to commit all changes
> made in the working tree, so I'd understand it if it replaced the
> empty string with a single dot.
>
> I also thought that we deprecated use of an empty string as "match
> all" pathspec recently, and expected that this to break.
>
> ... goes and looks ...
>
> Indeed, 229a95aa ("t0027: do not use an empty string as a pathspec
> element", 2017-06-23) does exactly that.

OK, I think I can safely omit this patch, because

 (1) when 2/2 is queued on top of tb/check-crlf-for-safe-crlf,
 because ex/deprecate-empty-pathspec-as-match-all is not yet in
 the topic, an empty pathspec still means "match all" and
 nothing breaks; and

 (2) when tb/check-crlf-for-safe-crlf plus 2/2 is merged to any
 branch with ex/deprecate-empty-pathspec-as-match-all, the topic
 already fixes what this 1/2 tries to

Thanks for being thorough, though.



Re: [PATCH v7 00/16] Parial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Junio C Hamano
Jeff Hostetler  writes:

> On 12/8/2017 12:58 PM, Junio C Hamano wrote:
>> Jeff Hostetler  writes:
>>
>>> From: Jeff Hostetler 
>>>
>>> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
>>> (which builds upon V6 of part 1).
>>
>> Aren't the three patches at the bottom sort-of duplicate from the
>> part 2 series?
>>
>
> oops.  yes, you're right.  it looks like i selected pc*6*_p2..pc7_p3
> rather than pc*7*_p2..pc7_p3.  sorry for the typo.
>
> and since the only changes in p2 were to squash those 2 commits near
> the tip of p2, only those 3 commits changed SHAs in v7 over v6.
>
> so, please disregard the duplicates.
>
> would you like me to send a corrected V8 for p3 ?

Nah.  I just wanted to make sure that I am discarding the right ones
(i.e. 1-3/16 of partial-clone, not 8-10/10 of fsck-promisors).

Thanks for an update.



Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread Junio C Hamano
René Scharfe  writes:

>> I'm not sure it's string-list's fault. Many callers (including this one)
>> ...
> The two modes (dup/nodup) make string_list code tricky.  Not sure
> how far we'd get with something simpler (e.g. an array of char pointers),
> but having the caller do all string allocations would make the code
> easier to analyze.

Yes.

It probably would have been more sensible if the API did not have
two modes (instead, have the caller pass whatever string to be
stored, *and* make the caller responsible for freeing them *if* it
passed an allocated string).

For the push_refs_with_push() patch you sent, another possible fix
would be to make cas_options a nodup kind so that the result of
strbuf_detach() does not get an extra strdup to be lost when placed
in cas_options.  With the current string-list API that would not
quite work, because freeing done in _release() is tied to the
"dup/nodup" ness of the string list.  I think there even is a
codepath that initializes a string_list as nodup kind, stuffs string
in it giving the ownership, and then flips it into dup kind just
before calling _release() only to have it free the strings, or
something silly/ugly like that.

In any case, the patch looks sensible.  Thanks for plugging the
leaks.


Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread Junio C Hamano
René Scharfe  writes:

> Am 07.12.2017 um 22:27 schrieb Jeff King:
>> Grepping for "list_append.*detach" shows a few other possible cases in
>> transport-helper.c, which I think are leaks.
>
> -- >8 --
> Subject: [PATCH] transport-helper: plug strbuf and string_list leaks
>
> Transfer ownership of detached strbufs to string_lists of the
> duplicating variety by calling string_list_append_nodup() instead of
> string_list_append() to avoid duplicating and then leaking the buffer.
>
> While at it make sure to release the string_list when done;
> push_refs_with_export() already does that.
>
> Reported-by: Jeff King 
> Signed-off-by: Rene Scharfe 
> ---
>  transport-helper.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/transport-helper.c b/transport-helper.c
> index bf05a2dcf1..f682e7c534 100644
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport 
> *transport,
>   struct strbuf cas = STRBUF_INIT;
>   strbuf_addf(&cas, "%s:%s",
>   ref->name, 
> oid_to_hex(&ref->old_oid_expect));
> - string_list_append(&cas_options, strbuf_detach(&cas, 
> NULL));
> + string_list_append_nodup(&cas_options,
> +  strbuf_detach(&cas, NULL));
>   }
>   }
>   if (buf.len == 0) {
> @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport 
> *transport,
>   strbuf_addch(&buf, '\n');
>   sendline(data, &buf);
>   strbuf_release(&buf);
> + string_list_release(&cas_options, 0);

There is no such function; you meant _clear() perhaps?


Re: [PATCH v1 1/2] t0027: Don't use git commit

2017-12-08 Thread Torsten Bögershausen
On Fri, Dec 08, 2017 at 10:21:19AM -0800, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> > tbo...@web.de writes:
> >
> >> From: Torsten Bögershausen 
> >>
> >> Replace `git commit -m "comment" ""` with `git commit -m "comment"` to
> >> remove the empty path spec.
> >>
> >> Signed-off-by: Torsten Bögershausen 
> >> ---
> >>  t/t0027-auto-crlf.sh | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > This looks a bit strange.  The intent seems to commit all changes
> > made in the working tree, so I'd understand it if it replaced the
> > empty string with a single dot.
> >
> > I also thought that we deprecated use of an empty string as "match
> > all" pathspec recently, and expected that this to break.
> >
> > ... goes and looks ...
> >
> > Indeed, 229a95aa ("t0027: do not use an empty string as a pathspec
> > element", 2017-06-23) does exactly that.
> 
> OK, I think I can safely omit this patch, because
> 
>  (1) when 2/2 is queued on top of tb/check-crlf-for-safe-crlf,
>  because ex/deprecate-empty-pathspec-as-match-all is not yet in
>  the topic, an empty pathspec still means "match all" and
>  nothing breaks; and
> 
>  (2) when tb/check-crlf-for-safe-crlf plus 2/2 is merged to any
>  branch with ex/deprecate-empty-pathspec-as-match-all, the topic
>  already fixes what this 1/2 tries to
> 
> Thanks for being thorough, though.
> 

Sure, the credit goes 100% to you:

commit 229a95aafa77b583b46a3156b4fad469c264ddfd
Author: Junio C Hamano 
Date:   Fri Jun 23 11:04:14 2017 -0700

t0027: do not use an empty string as a pathspec element

My brain did just assume that this had mad it to master, sorry for the noise



Re: [PATCH Outreachy 2/2] format: create docs for pretty.h

2017-12-08 Thread Eric Sunshine
On Fri, Dec 8, 2017 at 8:21 AM, Olga Telezhnaya
 wrote:
> Write some docs for functions in pretty.h.
> Take it as a first draft, they would be changed later.
>
> Signed-off-by: Olga Telezhnaia 
> Mentored-by: Christian Couder 
> Mentored by: Jeff King 
> ---
> diff --git a/pretty.h b/pretty.h
> @@ -57,31 +58,74 @@ struct userformat_want {
> +/*
> + * Create a text message about commit using given "format" and "context".
> + * Put the result to "sb".
> + * Please use this function for custom formats.
> + */
>  void format_commit_message(const struct commit *commit,
> const char *format, struct strbuf *sb,
> const struct pretty_print_context *context);
>
> +/*
> + * Parse given arguments from "arg", check it for correctness and
> + * fill struct rev_info.

To be consistent with the way you formatted the other comments, I
think you'd want quotes around rev_info.

> + */
>  void get_commit_format(const char *arg, struct rev_info *);


[PATCH] Partial clone design document

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

This patch contains a design document that Jonathan Tan and I have
been working on that describes the partial clone feature currently
under development.

Since edits to this document are independent of the code, I did not
include in the part 1,2,3 patch series.

Please let us know if there are any major sections we should add
or any areas that need clarification.

Thanks!


Jeff Hostetler (1):
  partial-clone: design doc

 Documentation/technical/partial-clone.txt | 240 ++
 1 file changed, 240 insertions(+)
 create mode 100644 Documentation/technical/partial-clone.txt

-- 
2.9.3



[PATCH] partial-clone: design doc

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

First draft of design document for partial clone feature.

Signed-off-by: Jeff Hostetler 
Signed-off-by: Jonathan Tan 
---
 Documentation/technical/partial-clone.txt | 240 ++
 1 file changed, 240 insertions(+)
 create mode 100644 Documentation/technical/partial-clone.txt

diff --git a/Documentation/technical/partial-clone.txt 
b/Documentation/technical/partial-clone.txt
new file mode 100644
index 000..7ab39d8
--- /dev/null
+++ b/Documentation/technical/partial-clone.txt
@@ -0,0 +1,240 @@
+Partial Clone Design Notes
+==
+
+The "Partial Clone" feature is a performance optimization for git that
+allows git to function without having a complete copy of the repository.
+
+During clone and fetch operations, git normally downloads the complete
+contents and history of the repository.  That is, during clone the client
+receives all of the commits, trees, and blobs in the repository into a
+local ODB.  Subsequent fetches extend the local ODB with any new objects.
+For large repositories, this can take significant time to download and
+large amounts of diskspace to store.
+
+The goal of this work is to allow git better handle extremely large
+repositories.  Often in these repositories there are many files that the
+user does not need such as ancient versions of source files, files in
+portions of the worktree outside of the user's work area, or large binary
+assets.  If we can avoid downloading such unneeded objects *in advance*
+during clone and fetch operations, we can decrease download times and
+reduce ODB disk usage.
+
+
+Non-Goals
+-
+
+Partial clone is independent of and not intended to conflict with
+shallow-clone, refspec, or limited-ref mechanisms since these all operate
+at the DAG level whereas partial clone and fetch works *within* the set
+of commits already chosen for download.
+
+
+Design Overview
+---
+
+Partial clone logically consists of the following parts:
+
+- A mechanism for the client to describe unneeded or unwanted objects to
+  the server.
+
+- A mechanism for the server to omit such unwanted objects from packfiles
+  sent to the client.
+
+- A mechanism for the client to gracefully handle missing objects (that
+  were previously omitted by the server).
+
+- A mechanism for the client to backfill missing objects as needed.
+
+
+Design Details
+--
+
+- A new pack-protocol capability "filter" is added to the fetch-pack and
+  upload-pack negotiation.
+
+  This uses the existing capability discovery mechanism.
+  See "filter" in Documentation/technical/pack-protocol.txt.
+
+- Clients pass a "filter-spec" to clone and fetch which is passed to the
+  server to request filtering during packfile construction.
+
+  There are various filters available to accomodate different situations.
+  See "--filter=" in Documentation/rev-list-options.txt.
+
+- On the server pack-objects applies the requested filter-spec as it
+  creates "filtered" packfiles for the client.
+
+  These filtered packfiles are incomplete in the traditional sense because
+  they may contain trees that reference blobs that the client does not have.
+
+
+ How the local repository gracefully handles missing objects
+
+With partial clone, the fact that objects can be missing makes such
+repositories incompatible with older versions of Git, necessitating a
+repository extension (see the documentation of "extensions.partialClone"
+for more information).
+
+An object may be missing due to a partial clone or fetch, or missing due
+to repository corruption. To differentiate these cases, the local
+repository specially indicates packfiles obtained from the promisor
+remote. These "promisor packfiles" consist of a ".promisor" file
+with arbitrary contents (like the ".keep" files), in addition to
+their ".pack" and ".idx" files. (In the future, this ability
+may be extended to loose objects[a].)
+
+The local repository considers a "promisor object" to be an object that
+it knows (to the best of its ability) that the promisor remote has,
+either because the local repository has that object in one of its
+promisor packfiles, or because another promisor object refers to it. Git
+can then check if the missing object is a promisor object, and if yes,
+this situation is common and expected. This also means that there is no
+need to explicitly maintain an expensive-to-modify list of missing
+objects on the client.
+
+Almost all Git code currently expects any objects referred to by other
+objects to be present. Therefore, a fallback mechanism is added:
+whenever Git attempts to read an object that is found to be missing, it
+will attempt to fetch it from the promisor remote, expanding the subset
+of objects available locally, then reattempt the read. This allows
+objects to be "faulted in" from the promisor remote without complicated
+prediction algorithms. For efficiency reasons, no check as to whether
+the missing object is a promisor o

Re: [WIP 03/15] pkt-line: add delim packet support

2017-12-08 Thread Brandon Williams
On 12/07, Stefan Beller wrote:
> On Mon, Dec 4, 2017 at 3:58 PM, Brandon Williams  wrote:
> > One of the design goals of protocol-v2 is to improve the semantics of
> > flush packets.  Currently in protocol-v1, flush packets are used both to
> > indicate a break in a list of packet lines as well as an indication that
> > one side has finished speaking.  This makes it particularly difficult
> > to implement proxies as a proxy would need to completely understand git
> > protocol instead of simply looking for a flush packet.
> >
> > To do this, introduce the special deliminator packet '0001'.  A delim
> > packet can then be used as a deliminator between lists of packet lines
> > while flush packets can be reserved to indicate the end of a response.
> >
> > Signed-off-by: Brandon Williams 
> 
> I presume the update for Documentation/technical/* comes at a later patch in 
> the
> series, clarifying the exact semantic difference between the packet types?

Yeah, currently there isn't a use for the delim packet but there will be
one when v2 is introduced.

-- 
Brandon Williams


Re: [PATCH] fmt-merge-msg: avoid leaking strbuf in shortlog()

2017-12-08 Thread René Scharfe
Am 08.12.2017 um 19:44 schrieb Junio C Hamano:
> René Scharfe  writes:
> 
>> Am 07.12.2017 um 22:27 schrieb Jeff King:
>>> Grepping for "list_append.*detach" shows a few other possible cases in
>>> transport-helper.c, which I think are leaks.
>>
>> -- >8 --
>> Subject: [PATCH] transport-helper: plug strbuf and string_list leaks
>>
>> Transfer ownership of detached strbufs to string_lists of the
>> duplicating variety by calling string_list_append_nodup() instead of
>> string_list_append() to avoid duplicating and then leaking the buffer.
>>
>> While at it make sure to release the string_list when done;
>> push_refs_with_export() already does that.
>>
>> Reported-by: Jeff King 
>> Signed-off-by: Rene Scharfe 
>> ---
>>   transport-helper.c | 7 +--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/transport-helper.c b/transport-helper.c
>> index bf05a2dcf1..f682e7c534 100644
>> --- a/transport-helper.c
>> +++ b/transport-helper.c
>> @@ -882,7 +882,8 @@ static int push_refs_with_push(struct transport 
>> *transport,
>>  struct strbuf cas = STRBUF_INIT;
>>  strbuf_addf(&cas, "%s:%s",
>>  ref->name, 
>> oid_to_hex(&ref->old_oid_expect));
>> -string_list_append(&cas_options, strbuf_detach(&cas, 
>> NULL));
>> +string_list_append_nodup(&cas_options,
>> + strbuf_detach(&cas, NULL));
>>  }
>>  }
>>  if (buf.len == 0) {
>> @@ -897,6 +898,7 @@ static int push_refs_with_push(struct transport 
>> *transport,
>>  strbuf_addch(&buf, '\n');
>>  sendline(data, &buf);
>>  strbuf_release(&buf);
>> +string_list_release(&cas_options, 0);
> 
> There is no such function; you meant _clear() perhaps?

Yes, of course, I'm sorry.  Not sure what happened there. O_o

René


Re: [WIP 08/15] connect: discover protocol version outside of get_remote_heads

2017-12-08 Thread Brandon Williams
On 12/07, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > While we could wrap the preamble into a function it sort of defeats the
> > purpose since you want to be able to call different functions based on
> > the protocol version you're speaking.  That way you can have hard
> > separations between the code paths which operate on v0/v1 and v2.
> 
> As I envision that the need for different processing across protocol
> versions in real code would become far greater than it would fit in
> cases within a single switch() statement, I imagined that the reader
> structure would have a field that says which version of the protocol
> it is, so that the caller of the preamble thing can inspect it later
> to switch on it.
> 

Yes, patch 9 changes this so that the protocol version is stored in the
transport structure.  This way the fetch and push code can know what to
do in v2.

-- 
Brandon Williams


Re: [WIP 04/15] upload-pack: convert to a builtin

2017-12-08 Thread Brandon Williams
On 12/06, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > In order to allow for code sharing with the server-side of fetch in
> > protocol-v2 convert upload-pack to be a builtin.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> 
> This looks obvious and straight-forward to a cursory look.
> 
> I vaguely recalled and feared that we on purpose kept this program
> separate from the rest of the system for a reason, but my mailing
> list search is coming up empty.
> 
> >  Makefile  | 3 ++-
> >  builtin.h | 1 +
> >  git.c | 1 +
> >  upload-pack.c | 2 +-
> >  4 files changed, 5 insertions(+), 2 deletions(-)
> > ...
> > diff --git a/upload-pack.c b/upload-pack.c
> > index ef99a029c..2d16952a3 100644
> > --- a/upload-pack.c
> > +++ b/upload-pack.c
> > @@ -1033,7 +1033,7 @@ static int upload_pack_config(const char *var, const 
> > char *value, void *unused)
> > return parse_hide_refs_config(var, value, "uploadpack");
> >  }
> >  
> > -int cmd_main(int argc, const char **argv)
> > +int cmd_upload_pack(int argc, const char **argv, const char *prefix)
> >  {
> > const char *dir;
> > int strict = 0;
> 
> Shouldn't this file be moved to builtin/ directory, though?

I can definitely move the file to builtin if you would prefer.

-- 
Brandon Williams


Re: [PATCH] partial-clone: design doc

2017-12-08 Thread Junio C Hamano
Jeff Hostetler  writes:

> From: Jeff Hostetler 
>
> First draft of design document for partial clone feature.
>
> Signed-off-by: Jeff Hostetler 
> Signed-off-by: Jonathan Tan 
> ---

Thanks.

> +Non-Goals
> +-
> +
> +Partial clone is independent of and not intended to conflict with
> +shallow-clone, refspec, or limited-ref mechanisms since these all operate
> +at the DAG level whereas partial clone and fetch works *within* the set
> +of commits already chosen for download.

It probably is not a huge deal (simply because it is about
"Non-Goals") but I have no idea what "refspec" and "limited-ref
mechanism" refer to in the above sentence, and I suspect many others
share the same puzzlement.

> +An object may be missing due to a partial clone or fetch, or missing due
> +to repository corruption. To differentiate these cases, the local
> +repository specially indicates packfiles obtained from the promisor
> +remote. These "promisor packfiles" consist of a ".promisor" file
> +with arbitrary contents (like the ".keep" files), in addition to
> +their ".pack" and ".idx" files. (In the future, this ability
> +may be extended to loose objects[a].)
> + ...
> +Foot Notes
> +--
> +
> +[a] Remembering that loose objects are promisor objects is mainly
> +important for trees, since they may refer to promisor blobs that
> +the user does not have.  We do not need to mark loose blobs as
> +promisor because they do not refer to other objects.

I fail to see any logical link between the "loose" and "tree".
Putting it differently, I do not see why "tree" is so special.

A promisor pack that contains a tree but lacks blobs the tree refers
to would be sufficient to let us remember that these missing blobs
are not corruption.  A loose commit or a tag that is somehow marked
as obtained from a promisor, if it can serve just like a commit or a
tag in a promisor pack to promise its direct pointee, would equally
be useful (if very inefficient).

In any case, I suspect "since they may refer to promisor blobs" is a
typo of "since they may refer to promised blobs".

> +- Currently, dynamic object fetching invokes fetch-pack for each item
> +  because most algorithms stumble upon a missing object and need to have
> +  it resolved before continuing their work.  This may incur significant
> +  overhead -- and multiple authentication requests -- if many objects are
> +  needed.
> +
> +  We need to investigate use of a long-running process, such as proposed
> +  in [5,6] to reduce process startup and overhead costs.

Also perhaps in some operations we can enumerate the objects we will
need upfront and ask for them in one go (e.g. "git log -p A..B" may
internally want to do "rev-list --objects A..B" to enumerate trees
and blobs that we may lack upfront).  I do not think having the
other side guess is a good idea, though.

> +- We currently only promisor packfiles.  We need to add support for
> +  promisor loose objects as described earlier.

The earlier description was not convincing enough to feel the need
to me; at least not yet.


Re: [WIP 07/15] connect: convert get_remote_heads to use struct packet_reader

2017-12-08 Thread Brandon Williams
On 12/06, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> 
> EXPECTING_DONE sounded like we are expecting to see 'done' packet
> sent from the other side, but I was mistaken.  It is the state
> where we are "done" expecting anything ;-).
> 
> Having an (unconditional) assignment to 'state' in the above switch
> makes me feel somewhat uneasy, as the next "switch (state)" is what
> is meant as the state machine that would allow us to say things like
> "from this state, transition to that state is impossible".  When we
> get a flush while we are expecting the first ref, for example, we'd
> just go into the "done" state.  There is no provision for a future
> update to say "no, getting a flush in this state is an error".

I believe this is accepted behavior, receiving a flush in that state.
And I don't think there is ever an instance during the ref advertisement
where a flush would be an error.  It just indicates that the
advertisement is finished.

> 
> That is no different from the current code; when read_remote_ref()
> notices that it got a flush, it just leaves the loop without even
> touching 'state' variable.  But at least, I find that the current
> code is more honest---it does not even touch 'state' and allows the
> code after the loop to inspect it, if needed.  From that point of
> vhew, the way the new code uses 'state' to leave the loop upon
> seeing a flush is a regression---it makes it harder to notice and
> report when we got a flush in a wrong state.
> 
> Perhaps getting rid of "EXPECTING_DONE" from the enum and then:
> 
>   int got_flush = 0;
>   while (1) {
>   switch (reader_read()) {
>   case PACKET_READ_FLUSH:
>   got_flush = 1;
>   break;
>   ... other cases ...
>   }
> 
>   if (got_flush)
>   break;
> 
>   switch (state) {
>   ... current code ...
>   }
>   }
> 
> would be an improvement; we can later extend "if (got_flush)" part
> to check what state we are in if we wanted to notice and report an
> error there before breaking out of the loop.
> 

I don't really see how this is any clearer from what this patch does
though.  I thought it made it easier to read as you no longer have an
infinite loop, but rather know when it will end (when you move to the
done state).

-- 
Brandon Williams


Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Jeff King
On Fri, Dec 08, 2017 at 04:28:23PM +, Ramsay Jones wrote:

> On 08/12/17 09:34, Jeff King wrote:
> > On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
> [snip]
> >> Junio hinted at a different approach of solving this problem, which this
> >> patch implements. Teach the diff machinery another flag for restricting
> >> the information to what is shown. For example:
> >>
> >>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
> >>   b2feb64309 Revert the whole "ask curl-config" topic for now
> >>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
> >>
> >> we observe that the Makefile as shipped with 2.0 was introduced in
> >> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
> >> a different blob.
> 
> Sorry, this has nothing to do with this specific thread. :(
> 
> However, the above caught my eye. Commit b2feb64309 does not 'replace
> the Makefile with a different blob'. That commit was a 'last minute'
> revert of a topic _prior_ to v2.0.0, which resulted in the _same_
> blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).

Right, I think the patch is working as advertised but the commit message
misinterprets the results. :)

If you add --raw, you can see that both commits introduce that blob, and
it never "goes away". That's because that happened in a merge, which we
don't diff in a default log invocation.

And in fact finding where this "goes away" is hard, because the merge
doesn't trigger "-c" or "--cc". It's 8eaf517835, which presumably was
forked before the revert in b2feb64309, and then the merge was able to
replace that blob completely with the other side of the merge.

Viewing with "-m" turns up a bunch of uninteresting merges. Looking at
"--first-parent" is how I got 8eaf517835.

So I think this one is tricky because of the revert. In the same way
that pathspec-limiting is often tricky in the face of a revert, because
the merges "hide" interesting things happening.

-Peff


Re: [WIP 11/15] serve: introduce git-serve

2017-12-08 Thread Brandon Williams
On 12/07, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > +static struct protocol_capability *get_capability(const char *key)
> > +{
> > +   int i;
> > +
> > +   if (!key)
> > +   return NULL;
> > +
> > +   for (i = 0; i < ARRAY_SIZE(capabilities); i++) {
> > +   struct protocol_capability *c = &capabilities[i];
> > +   const char *out;
> > +   if (skip_prefix(key, c->name, &out) && (!*out || *out == '='))
> > +   return c;
> 
> Looks familiar and resembles what was recently discussed on list ;-)
> 
> > +int cmd_serve(int argc, const char **argv, const char *prefix)
> > +{
> > +
> > +   struct option options[] = {
> > +   OPT_END()
> > +   };
> > +
> > +   /* ignore all unknown cmdline switches for now */
> > +   argc = parse_options(argc, argv, prefix, options, grep_usage,
> > +PARSE_OPT_KEEP_DASHDASH |
> > +PARSE_OPT_KEEP_UNKNOWN);
> > +   serve();
> > +
> > +   return 0;
> > +}

I assume that at some point we may want to have a new endpoint that just
does v2 without needing the side channel to tell it to do so.  Maybe for
brand new server commands, like a remote grep or a remote object-stat or
something that don't have a v1 equivalent that can be fallen back to.
That's why I included a builtin/serve.c 

> > ...
> > +/* Main serve loop for protocol version 2 */
> > +void serve(void)
> > +{
> > +   /* serve by default supports v2 */
> > +   packet_write_fmt(1, "version 2\n");
> > +
> > +   advertise_capabilities();
> > +
> > +   for (;;)
> > +   if (process_request())
> > +   break;
> > +}
> 
> I am guessing that this would be run just like upload-pack,
> i.e. invoked via ssh or via git-daemon, and that is why it can just
> assume that fd#0/fd#1 are already connected to the other end.  It
> may be helpful to document somewhere how we envision to invoke this
> program.
> 

This function I was planning to just be executed by upload-pack and
receive-pack when a client requests protocol v2.  But yes the idea would
be that fd#0/fd#1 would be already setup like they are for upload-pack
and receive-pack.

-- 
Brandon Williams


Re: [WIP 04/15] upload-pack: convert to a builtin

2017-12-08 Thread Junio C Hamano
Johannes Schindelin  writes:

> On Wed, 6 Dec 2017, Junio C Hamano wrote:
> ...
>> I vaguely recalled and feared that we on purpose kept this program
>> separate from the rest of the system for a reason, but my mailing
>> list search is coming up empty.
>
> I only recall that we kept it in the bin/ directory (as opposed to
> mlibexec/git-core/) to help with fetching via SSH.

Yes, that is about where it is installed (i.e. on $PATH), which is a
different issue.  

My vague recollection was about what is (and what is not) included
in and linked into the program built, with some reason that is
different from but similar to the reason why remote helpers that
link to curl and openssl libraries are excluded from the builtin
deliberately.  I know we exclude remote-helpers from builtin in
order to save the start-up overhead for other more important
built-in commands by not having to link these heavyweight libs.  I
suspect there was some valid reason why we didn't make upload-pack
a built-in, but am failing to recall what the reason was.




Re: Re: bug deleting "unmerged" branch (2.12.3)

2017-12-08 Thread Philip Oakley

From: "Ulrich Windl" 

Hi Philip!

I'm unsure what you are asking for...

Ulrich


Hi Ulrich,

I was doing a retrospective follow up (of the second kind [1]).

In your initial email
https://public-inbox.org/git/5a1d70fd02a100029...@gwsmtp1.uni-regensburg.de/
you said

"I wanted to delete the temporary branch (which is of no use now), I got a
message that the branch is unmerged.
I think if more than one branches are pointing to the same commit, one
should be allowed to delete all but the last one without warning."

My retrospectives question was to find what what part of the documentation
could be improved to assist fellow coders and Git users in gaining a better
understanding here. I think it's an easy mistake [2] to make and that we
should try to make the man pages more assistive.

I suspect that the description for the `git branch -d` needs a few more
words to clarify the 'merged/unmerged' issue for those who recieve the
warning message. Or maybe the git-glossary, etc. I tend to believe that most
users will read some of the man pages, and would continue to do so if they
are useful.

I'd welcome any feedback or suggestions you could provide.
--
Philip


>>> "Philip Oakley"  04.12.17 0.30 Uhr >>>
From: "Junio C Hamano" 
> "Philip Oakley"  writes:
>
>> I think it was that currently you are on M, and neither A nor B are
>> ancestors (i.e. merged) of M.
>>
>> As Junio said:- "branch -d" protects branches that are yet to be
>> merged to the **current branch**.
>
> Actually, I think people loosened this over time and removal of
> branch X is not rejected even if the range HEAD..X is not empty, as
> long as X is marked to integrate with/build on something else with
> branch.X.{remote,merge} and the range X@{upstream}..X is empty.
>
> So the stress of "current branch" above you added is a bit of a
> white lie.

Ah, thanks. [I haven't had chance to check the code]

The man page does say:
.-d
.Delete a branch. The branch must be fully merged in its upstream
.branch, or in HEAD if no upstream was set with --track
.or --set-upstream.

It's whether or not Ulrich had joined the two aspects together, and if the
doc was sufficient to help recognise the 'unmerged' issue. Ulrich?
--
Philip




[1] Retrospective Second Directive, section 3.4.2 of (15th Ed) Agile
Processes in software engineering and extreme programming. ISBN 1628251042
(for the perspective of the retrospective..)
[2] 'mistake' colloquial part of the error categories of slips lapses and
mistakes : Human Error, by Reason (James, prof) ISBN 0521314194 (worthwhile)



Re: [PATCH Outreachy 1/2] format: create pretty.h file

2017-12-08 Thread Junio C Hamano
Olga Telezhnaya  writes:

>  archive.c |  1 +
>  builtin/notes.c   |  2 +-
>  builtin/reset.c   |  2 +-
>  builtin/show-branch.c |  2 +-
>  combine-diff.c|  1 +
>  commit.c  |  1 +
>  commit.h  | 80 --
>  diffcore-pickaxe.c|  1 +
>  grep.c|  1 +
>  log-tree.c|  1 +
>  notes-cache.c |  1 +
>  pretty.h  | 87 
> +++
>  revision.h|  2 +-
>  sequencer.c   |  1 +
>  sha1_name.c   |  1 +
>  submodule.c   |  1 +
>  16 files changed, 101 insertions(+), 84 deletions(-)
>  create mode 100644 pretty.h
>
> diff --git a/archive.c b/archive.c
> index 0b7b62af0c3ec..60607e8c00857 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -2,6 +2,7 @@
>  #include "config.h"
>  #include "refs.h"
>  #include "commit.h"
> +#include "pretty.h"
>  #include "tree-walk.h"
>  #include "attr.h"
>  #include "archive.h"

This has a toll on topics in flight that expect the symbols for
pretty are available in "commit.h"; they are forced to include
this new file they did not even know about.

I notice that "commit.h" is included in "builtin.h"; perhaps adding
a new include for "pretty.h" there would be of lessor impact?  I
dunno.



Re: [PATCH 1/1] diffcore: add a filter to find a specific blob

2017-12-08 Thread Stefan Beller
On Fri, Dec 8, 2017 at 12:19 PM, Jeff King  wrote:
> On Fri, Dec 08, 2017 at 04:28:23PM +, Ramsay Jones wrote:
>
>> On 08/12/17 09:34, Jeff King wrote:
>> > On Thu, Dec 07, 2017 at 04:24:47PM -0800, Stefan Beller wrote:
>> [snip]
>> >> Junio hinted at a different approach of solving this problem, which this
>> >> patch implements. Teach the diff machinery another flag for restricting
>> >> the information to what is shown. For example:
>> >>
>> >>   $ ./git log --oneline --blobfind=v2.0.0:Makefile
>> >>   b2feb64309 Revert the whole "ask curl-config" topic for now
>> >>   47fbfded53 i18n: only extract comments marked with "TRANSLATORS:"
>> >>
>> >> we observe that the Makefile as shipped with 2.0 was introduced in
>> >> v1.9.2-471-g47fbfded53 and replaced in v2.0.0-rc1-5-gb2feb64309 by
>> >> a different blob.
>>
>> Sorry, this has nothing to do with this specific thread. :(
>>
>> However, the above caught my eye. Commit b2feb64309 does not 'replace
>> the Makefile with a different blob'. That commit was a 'last minute'
>> revert of a topic _prior_ to v2.0.0, which resulted in the _same_
>> blob as commit 47fbfded53. (i.e. a53f3a8326c2e62dc79bae7169d64137ac3dab20).
>
> Right, I think the patch is working as advertised but the commit message
> misinterprets the results. :)

Thanks for digging. I came to a similar realization.

> If you add --raw, you can see that both commits introduce that blob, and
> it never "goes away". That's because that happened in a merge, which we
> don't diff in a default log invocation.

We should when --raw is given.
--raw is documented as  "For each commit, show a summary of changes
using the raw diff format." and I would argue that 'each commit' includes
merges. Though I guess this may have implications for long time users.

> And in fact finding where this "goes away" is hard, because the merge
> doesn't trigger "-c" or "--cc". It's 8eaf517835, which presumably was
> forked before the revert in b2feb64309, and then the merge was able to
> replace that blob completely with the other side of the merge.
>
> Viewing with "-m" turns up a bunch of uninteresting merges. Looking at
> "--first-parent" is how I got 8eaf517835.
>
> So I think this one is tricky because of the revert. In the same way
> that pathspec-limiting is often tricky in the face of a revert, because
> the merges "hide" interesting things happening.

yup, hidden merges are unfortunate. Is there an easy way to find out
about merges? (Junio hints at having tests around merges, which I'll do
next)


  1   2   >