Re: [PATCH v2 08/21] read_packed_refs(): read references with minimal copying

2017-09-21 Thread Michael Haggerty
On 09/20/2017 08:27 PM, Jeff King wrote:
> On Tue, Sep 19, 2017 at 08:22:16AM +0200, Michael Haggerty wrote:
> 
>> Instead of copying data from the `packed-refs` file one line at time
>> and then processing it, process the data in place as much as possible.
>>
>> Also, instead of processing one line per iteration of the main loop,
>> process a reference line plus its corresponding peeled line (if
>> present) together.
>>
>> Note that this change slightly tightens up the parsing of the
>> `parse-ref` file. Previously, the parser would have accepted multiple
> 
> s/parse-ref/packed-refs/, I assume

Thanks; will fix.

> The patch itself looks good, though I did notice an interesting tangent.
> 
>> +if (eof - pos < GIT_SHA1_HEXSZ + 2 ||
>> +parse_oid_hex(p, &oid, &p) ||
>> +!isspace(*p++))
>> +die_invalid_line(refs->path, pos, eof - pos);
> 
> I wondered why you didn't just check the output of parse_oid_hex(), and
> included the length check (since in the long run we'd like to get rid of
> uses of the static GIT_SHA1_HEXSZ macro). I imagine the answer is that
> this is an mmap'd buffer, and we can't guarantee that parse_oid_hex()
> wouldn't walk off the end of it.

Yes.

> That's fine for now, but I suspect it may become a problem when we move
> to having a second hash function with a different length. You can't just
> say "it must have as many bytes as the longest hash", because of course
> we could have the shorter hash at the end of the buffer. But we also
> can't say "it must have as many bytes as the shortest hash", because if
> the content implies it's a longer hash, we'd read off the end of the
> buffer.
> 
> I think in the long run we will need a parse_oid_hex() function that
> takes a ptr/len (or start/end) pair.

Yes, that makes sense.

> [...]

Michael


Re: [PATCH v2 02/21] prefix_ref_iterator: break when we leave the prefix

2017-09-21 Thread Michael Haggerty
On 09/20/2017 10:25 PM, Stefan Beller wrote:
> On Mon, Sep 18, 2017 at 11:22 PM, Michael Haggerty  
> wrote:
>> [...]
>> +/* Return -1, 0, 1 if refname is before, inside, or after the prefix. */
>> +static int compare_prefix(const char *refname, const char *prefix)
>> +{
>> +   while (*prefix) {
>> +   if (*refname != *prefix)
>> +   return ((unsigned char)*refname < (unsigned 
>> char)*prefix) ? -1 : +1;
> 
> This looks interesting.
> 
> We get a signed char* , cast it to unsigned char* and then
> compare byte by byte.

Not quite. We get a `char *` of unknown signedness (it's
implementation-dependent), dereference it into a `char`, then cast that
value to `unsigned char`. What you described would be

*(const unsigned char *)refname < *(const unsigned char *)prefix

But I assume that these two variants would result in identical assembly
code.

> [...]

Michael


[PATCH v3 2/4] filter-branch: preserve and restore $GIT_AUTHOR_* and $GIT_COMMITTER_*

2017-09-21 Thread Ian Campbell
These are modified by set_ident() but a subsequent patch would like to operate
on their original values.

Signed-off-by: Ian Campbell 
---
 git-filter-branch.sh | 33 +
 1 file changed, 33 insertions(+)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 3da281f8a..9edb94206 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -219,6 +219,13 @@ trap 'cd "$orig_dir"; rm -rf "$tempdir"' 0
 ORIG_GIT_DIR="$GIT_DIR"
 ORIG_GIT_WORK_TREE="$GIT_WORK_TREE"
 ORIG_GIT_INDEX_FILE="$GIT_INDEX_FILE"
+ORIG_GIT_AUTHOR_NAME="$GIT_AUTHOR_NAME"
+ORIG_GIT_AUTHOR_EMAIL="$GIT_AUTHOR_EMAIL"
+ORIG_GIT_AUTHOR_DATE="$GIT_AUTHOR_DATE"
+ORIG_GIT_COMMITTER_NAME="$GIT_COMMITTER_NAME"
+ORIG_GIT_COMMITTER_EMAIL="$GIT_COMMITTER_EMAIL"
+ORIG_GIT_COMMITTER_DATE="$GIT_COMMITTER_DATE"
+
 GIT_WORK_TREE=.
 export GIT_DIR GIT_WORK_TREE
 
@@ -545,6 +552,8 @@ if [ "$filter_tag_name" ]; then
 fi
 
 unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE
+unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
+unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE
 test -z "$ORIG_GIT_DIR" || {
GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR
 }
@@ -556,6 +565,30 @@ test -z "$ORIG_GIT_INDEX_FILE" || {
GIT_INDEX_FILE="$ORIG_GIT_INDEX_FILE" &&
export GIT_INDEX_FILE
 }
+test -z "$ORIG_GIT_AUTHOR_NAME" || {
+   GIT_AUTHOR_NAME="$ORIG_GIT_AUTHOR_NAME" &&
+   export GIT_AUTHOR_NAME
+}
+test -z "$ORIG_GIT_AUTHOR_EMAIL" || {
+   GIT_AUTHOR_EMAIL="$ORIG_GIT_AUTHOR_EMAIL" &&
+   export GIT_AUTHOR_EMAIL
+}
+test -z "$ORIG_GIT_AUTHOR_DATE" || {
+   GIT_AUTHOR_DATE="$ORIG_GIT_AUTHOR_DATE" &&
+   export GIT_AUTHOR_DATE
+}
+test -z "$ORIG_GIT_COMMITTER_NAME" || {
+   GIT_COMMITTER_NAME="$ORIG_GIT_COMMITTER_NAME" &&
+   export GIT_COMMITTER_NAME
+}
+test -z "$ORIG_GIT_COMMITTER_EMAIL" || {
+   GIT_COMMITTER_EMAIL="$ORIG_GIT_COMMITTER_EMAIL" &&
+   export GIT_COMMITTER_EMAIL
+}
+test -z "$ORIG_GIT_COMMITTER_DATE" || {
+   GIT_COMMITTER_DATE="$ORIG_GIT_COMMITTER_DATE" &&
+   export GIT_COMMITTER_DATE
+}
 
 cd "$orig_dir"
 rm -rf "$tempdir"
-- 
2.11.0



[PATCH v3 0/4] filter-branch: support for incremental update + fix for ancient tag format

2017-09-21 Thread Ian Campbell
This is the third version of my patches to add incremental support to
git-filter-branch. Since the last time I have replaced `git mktag --
allow-missing-tagger` with `git hash-object -t tag -w --stdin`.

I've force pushed to [1] (Travis is still running) and have set off the
process of re-rewriting the devicetree tree from scratch (a multi-day
affair) to validate (it's looking good).

Ian.

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/devicetree/devicetree-rebasing.git/
[1] https://github.com/ijc/git/tree/git-filter-branch


[PATCH v3 4/4] filter-branch: use hash-object instead of mktag

2017-09-21 Thread Ian Campbell
This allows us to recreate even historical tags which would now be consider
invalid, such as v2.6.12-rc2..v2.6.13-rc3 in the Linux kernel source tree which
lack the `tagger` header.

$ git rev-parse v2.6.12-rc2
9e734775f7c22d2f89943ad6c745571f1930105f
$ git cat-file tag v2.6.12-rc2 | git mktag
error: char76: could not find "tagger "
fatal: invalid tag signature file
$ git cat-file tag v2.6.12-rc2 | git hash-object -t tag -w --stdin
9e734775f7c22d2f89943ad6c745571f1930105f

Signed-off-by: Ian Campbell 
---
 git-filter-branch.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 956869b8e..3365a3b86 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -561,7 +561,7 @@ if [ "$filter_tag_name" ]; then
}' \
-e '/^-BEGIN PGP SIGNATURE-/q' \
-e 'p' ) |
-   git mktag) ||
+   git hash-object -t tag -w --stdin) ||
die "Could not create new tag object for $ref"
if git cat-file tag "$ref" | \
   sane_grep '^-BEGIN PGP SIGNATURE-' 
>/dev/null 2>&1
-- 
2.11.0



[PATCH v3 1/4] filter-branch: reset $GIT_* before cleaning up

2017-09-21 Thread Ian Campbell
This is pure code motion to enable a subsequent patch to add code which needs
to happen with the reset $GIT_* but before the temporary directory has been
cleaned up.

Signed-off-by: Ian Campbell 
---
 git-filter-branch.sh | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 3a74602ef..3da281f8a 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -544,11 +544,6 @@ if [ "$filter_tag_name" ]; then
done
 fi
 
-cd "$orig_dir"
-rm -rf "$tempdir"
-
-trap - 0
-
 unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE
 test -z "$ORIG_GIT_DIR" || {
GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR
@@ -562,6 +557,11 @@ test -z "$ORIG_GIT_INDEX_FILE" || {
export GIT_INDEX_FILE
 }
 
+cd "$orig_dir"
+rm -rf "$tempdir"
+
+trap - 0
+
 if [ "$(is_bare_repository)" = false ]; then
git read-tree -u -m HEAD || exit
 fi
-- 
2.11.0



[PATCH v3 3/4] filter-branch: stash away ref map in a branch

2017-09-21 Thread Ian Campbell
With "--state-branch=" option, the mapping from old object names
and filtered ones in ./map/ directory is stashed away in the object database,
and the one from the previous run is read to populate the ./map/ directory,
allowing for incremental updates of large trees.

Signed-off-by: Ian Campbell 
---
I have been using this as part of the device tree extraction from the Linux
kernel source since 2013, about time I sent the patch upstream!

v2:
- added several preceding cleanup patches, including:
  - new: use of mktag --allow-missing tagger.
  - split-out: preserving $GIT_*.
- use git rev-parse rather than git show-ref.
- improved error handling for Perl sub-processes.
- collapsed some shell pipelines involving piping output of git and ls into
  Perl into the Perl scripts.
- style fixes for conditionals and sub-shells.
- fixup indentation.
- added documentation.
- improved commit message.
---
 Documentation/git-filter-branch.txt |  8 +-
 git-filter-branch.sh| 49 -
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-filter-branch.txt 
b/Documentation/git-filter-branch.txt
index 9e5169aa6..bebdcdec5 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -14,7 +14,7 @@ SYNOPSIS
[--commit-filter ] [--tag-name-filter ]
[--subdirectory-filter ] [--prune-empty]
[--original ] [-d ] [-f | --force]
-   [--] [...]
+   [--state-branch ] [--] [...]
 
 DESCRIPTION
 ---
@@ -198,6 +198,12 @@ to other tags will be rewritten to point to the underlying 
commit.
directory or when there are already refs starting with
'refs/original/', unless forced.
 
+--state-branch ::
+   This option will cause the mapping from old to new objects to
+   be loaded from named branch upon startup and saved as a new
+   commit to that branch upon exit, enabling incremental of large
+   trees. If '' does not exist it will be created.
+
 ...::
Arguments for 'git rev-list'.  All positive refs included by
these options are rewritten.  You may also specify options
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 9edb94206..956869b8e 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -86,7 +86,7 @@ USAGE="[--setup ] [--env-filter ]
[--parent-filter ] [--msg-filter ]
[--commit-filter ] [--tag-name-filter ]
[--subdirectory-filter ] [--original ]
-   [-d ] [-f | --force]
+   [-d ] [-f | --force] [--state-branch ]
[--] [...]"
 
 OPTIONS_SPEC=
@@ -106,6 +106,7 @@ filter_msg=cat
 filter_commit=
 filter_tag_name=
 filter_subdir=
+state_branch=
 orig_namespace=refs/original/
 force=
 prune_empty=
@@ -181,6 +182,9 @@ do
--original)
orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
;;
+   --state-branch)
+   state_branch="$OPTARG"
+   ;;
*)
usage
;;
@@ -259,6 +263,26 @@ export GIT_INDEX_FILE
 # map old->new commit ids for rewriting parents
 mkdir ../map || die "Could not create map/ directory"
 
+if test -n "$state_branch"
+then
+   state_commit=$(git rev-parse --no-flags --revs-only "$state_branch")
+   if test -n "$state_commit"
+   then
+   echo "Populating map from $state_branch ($state_commit)" 1>&2
+   perl -e'open(MAP, "-|", "git show $ARGV[0]:filter.map") or die;
+   while () {
+   m/(.*):(.*)/ or die;
+   open F, ">../map/$1" or die;
+   print F "$2" or die;
+   close(F) or die;
+   }
+   close(MAP) or die;' "$state_commit" \
+   || die "Unable to load state from 
$state_branch:filter.map"
+   else
+   echo "Branch $state_branch does not exist. Will create" 1>&2
+   fi
+fi
+
 # we need "--" only if there are no path arguments in $@
 nonrevs=$(git rev-parse --no-revs "$@") || exit
 if test -z "$nonrevs"
@@ -590,6 +614,29 @@ test -z "$ORIG_GIT_COMMITTER_DATE" || {
export GIT_COMMITTER_DATE
 }
 
+if test -n "$state_branch"
+then
+   echo "Saving rewrite state to $state_branch" 1>&2
+   state_blob=$(
+   perl -e'opendir D, "../map" or die;
+   open H, "|-", "git hash-object -w --stdin" or die;
+   foreach (sort readdir(D)) {
+   next if m/^\.\.?$/;
+   open F, "<../map/$_" or die;
+   chomp($f = );
+   print H "$_:$f\n" or die;
+   }
+   close(H) or die;' || die "Unable to save state")
+   state_tree=$(/bin/echo -e "100644 blob $state_blob\tfilter.map" | git 
mktree)
+   if test -n "$state_commit"

Re: [PATCH v2 13/21] packed_ref_cache: keep the `packed-refs` file mmapped if possible

2017-09-21 Thread Michael Haggerty
On 09/20/2017 08:40 PM, Jeff King wrote:
> [...]
> The overall strategy for this compile-time knob makes sense, but one
> thing confused me:
> 
>> +ifdef MMAP_PREVENTS_DELETE
>> +BASIC_CFLAGS += -DMMAP_PREVENTS_DELETE
>> +else
>> +ifdef USE_WIN32_MMAP
>> +BASIC_CFLAGS += -DMMAP_PREVENTS_DELETE
>> +endif
>> +endif
> 
> So setting the knob does what you'd expect. But if you don't set it,
> then we still auto-tweak it based on the USE_WIN32_MMAP knob. Do we need
> that? It seems like we set our new knob in config.mak.uname any time
> we'd set USE_WIN32_MMAP. So this only has an effect in two cases:
> 
>  1. You aren't on Windows, but you set USE_WIN32_MMAP yourself.
> 
>  2. You are on Windows, but you manually unset MMAP_PREVENTS_DELETE.
> 
> I expect both cases are rare (and would probably involve somebody
> actively debugging these knobs). Probably it's a minor convenience in
> case 1, but in case 2 it would be actively confusing, I'd think.

Makes sense. I'll delete the `else` clause from the above hunk.

On 09/20/2017 08:51 PM, Jeff King wrote:
> On Wed, Sep 20, 2017 at 02:40:58PM -0400, Jeff King wrote:
> 
>>> +enum mmap_strategy {
>>> +   /*
>>> +* Don't use mmap() at all for reading `packed-refs`.
>>> +*/
>>> +   MMAP_NONE,
>>> +
>>> +   /*
>>> +* Can use mmap() for reading `packed-refs`, but the file must
>>> +* not remain mmapped. This is the usual option on Windows,
>>> +* where you cannot rename a new version of a file onto a file
>>> +* that is currently mmapped.
>>> +*/
>>> +   MMAP_TEMPORARY,
>>
>> I suspect you originally distinguished these cases so that NO_MMAP does
>> not read into a fake-mmap buffer, followed by us copying it into another
>> buffer. But AFAICT we handle the "NONE" and "TEMPORARY" cases exactly
>> the same (by just doing a read_in_full() into our own buffer). Do we
>> actually need separate strategies?
> 
> In case you are reading these sequentially, I think I talked myself into
> the utility of this during the next patch. ;)

Sorry about that. I'll add a forward breadcrumb to the log message of
this commit.

Michael


Re: [PATCH v2 14/21] read_packed_refs(): ensure that references are ordered when read

2017-09-21 Thread Michael Haggerty
On 09/20/2017 08:50 PM, Jeff King wrote:
> On Tue, Sep 19, 2017 at 08:22:22AM +0200, Michael Haggerty wrote:
>> If `packed-refs` was already sorted, then (if the system allows it) we
>> can use the mmapped file contents directly. But if the system doesn't
>> allow a file that is currently mmapped to be replaced using
>> `rename()`, then it would be bad for us to keep the file mmapped for
>> any longer than necessary. So, on such systems, always make a copy of
>> the file contents, either as part of the sorting process, or
>> afterwards.
> 
> So this sort-of answers my question from the previous commit (why we
> care about the distinction between NONE and TEMPORARY), since we now
> start treating them differently.
> 
> But I'm a bit confused why there's any advantage in the TEMPORARY case
> to doing the mmap-and-copy versus just treating it like NONE and reading
> it directly.
> 
> Hmm, I guess it comes down to the double-allocation thing again? Let me
> see if I have this right:
> 
>   1. For NO_MMAP, we'd read the buffer once. If it's sorted, good. If
>  not, then we temporarily hold it in memory twice while we copy it
>  into the new sorted buffer.
> 
>   2. For TEMPORARY, we mmap once. If it's sorted, then we make a single
>  copy. If it's not sorted, then we do the copy+sort as a single
>  step.
> 
>   3. For MMAP_OK, if it's sorted, we're done. Otherwise, we do the
>  single copy.
> 
> So this is really there to help the TEMPORARY case reading an old
> unsorted file avoid needing to use double-the-ram during the copy?
> 
> That seems like a good reason (and it does not seem to add too much
> complexity).

Yes, that's correct. A deeper question is whether it's worth this extra
effort to optimize the conversion from "unsorted" to "known-sorted",
which it seems should only happen once per repository. But

* Many more Git commands read the `packed-refs` file than write it.
  So an "unsorted" file might have to be read multiple times before
  the first time it is rewritten with the "sorted" trait.

* Users might have multiple Git clients writing their `packed-refs`
  file (e.g., the git CLI plus JGit in Eclipse), and they might not
  upgrade both clients at the same time to versions that write the
  "sorted" trait. So a single repository might go back and forth
  between "unsorted" and "known-sorted" multiple times in its
  life.

* Theoretically, somebody might have a `packed-refs` file that is so
  big that it takes up more than half of memory. I think there are
  scenarios where such a file could be converted to "sorted" form
  in `MMAP_TEMPORARY` mode but would fail in `MMAP_NONE` mode.

On the downside, in my benchmarking on Linux, there were hints that the
`MMAP_TEMPORARY` branch might be a *tiny* bit slower than the `MMAP_OK`
branch when operating on a known-sorted `packed-refs` file. But the
speed difference was barely detectable (and might be illusory). And
anyway, the speed difference on Linux is moot, since on Linux `MMAP_OK`
mode will usually be used. It *would* be interesting to know if there is
a significant speed difference on Windows. Dscho?

Michael


Re: [RFC v2] refs: strip out not allowed flags from ref_transaction_update

2017-09-21 Thread Michael Haggerty
On 09/13/2017 12:59 AM, Thomas Gummerer wrote:
> Callers are only allowed to pass certain flags into
> ref_transaction_update, other flags are internal to it.  To prevent
> mistakes from the callers, strip the internal only flags out before
> continuing.
> 
> This was noticed because of a compiler warning gcc 7.1.1 issued about
> passing a NULL parameter as second parameter to memcpy (through
> hashcpy):
> 
> In file included from refs.c:5:0:
> refs.c: In function ‘ref_transaction_verify’:
> cache.h:948:2: error: argument 2 null where non-null expected 
> [-Werror=nonnull]
>   memcpy(sha_dst, sha_src, GIT_SHA1_RAWSZ);
>   ^~~~
> In file included from git-compat-util.h:165:0,
>  from cache.h:4,
>  from refs.c:5:
> /usr/include/string.h:43:14: note: in a call to function ‘memcpy’ declared 
> here
>  extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
>   ^~
> 
> The call to hascpy in ref_transaction_add_update is protected by the

s/hascpy/hashcpy/

> passed in flags, but as we only add flags there, gcc notices
> REF_HAVE_NEW or REF_HAVE_OLD flags could be passed in from the outside,
> which would potentially result in passing in NULL as second parameter to
> memcpy.
> 
> Fix both the compiler warning, and make the interface safer for its
> users by stripping the internal flags out.
> 
> Suggested-by: Michael Haggerty 
> Signed-off-by: Thomas Gummerer 
> ---
> 
>> This might be a nice change to have anyway, to isolate
>> `ref_transaction_update()` from mistakes by its callers. For that
>> matter, one might want to be even more selective about what bits are
>> allowed in the `flags` argument to `ref_transaction_update()`'s
>> callers:
>>
>>> flags &= REF_ALLOWED_FLAGS; /* value would need to be determined */
> 
> Here's my attempt at doing this.
> 
> The odd flag out as the flag that's advertised as internal but can't
> stripped out is REF_ISPRUNING.  REF_ISPRUNING is passed in as argument
> to 'ref_transaction_delete()' in 'prune_ref()'.
> 
> Maybe this flag should be public, or maybe I'm missing something here?
> Having only this internal flags as part of the allowed flags feels a
> bit ugly, but I'm also unfamiliar with the refs code, hence the RFC.
> If someone has more suggestions they would be very welcome :)

I wouldn't worry too much about this anomaly. `REF_ISPRUNING` is an ugly
internal kludge, but allowing it in the mask doesn't make anything worse.

>  refs.c | 2 ++
>  refs.h | 8 
>  2 files changed, 10 insertions(+)
> 
> diff --git a/refs.c b/refs.c
> index ba22f4acef..fad61be1da 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -921,6 +921,8 @@ int ref_transaction_update(struct ref_transaction 
> *transaction,
>   return -1;
>   }
>  
> + flags &= REF_TRANSACTION_UPDATE_ALLOWED_FLAGS;
> +

I would advocate considering it a bug if the caller passes in options
that we are going to ignore anyway:

if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
BUG("illegal flags %x in ref_transaction_update", flags);

Would this also squelch the compiler warning?

>   flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 0);
>  
>   ref_transaction_add_update(transaction, refname, flags,
> diff --git a/refs.h b/refs.h
> index 6daa78eb50..4d75c207e1 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -354,6 +354,14 @@ int refs_pack_refs(struct ref_store *refs, unsigned int 
> flags);
>  #define REF_NODEREF  0x01
>  #define REF_FORCE_CREATE_REFLOG 0x40
>  
> +/*
> + * Flags that can be passed in to ref_transaction_update
> + */
> +#define REF_TRANSACTION_UPDATE_ALLOWED_FLAGS \
> + REF_ISPRUNING |  \
> + REF_FORCE_CREATE_REFLOG |\
> + REF_NODEREF
> +
>  /*
>   * Setup reflog before using. Fill in err and return -1 on failure.
>   */
> 

Thanks for working on this.

Michael


Re: [PATCH] fast-export: do not copy from modified file

2017-09-21 Thread Juraj Oršulić
Thank you, Jonathan!

On Thu, Sep 21, 2017 at 1:55 AM, Jonathan Tan  wrote:
> When run with the "-C" option, fast-export writes 'C' commands in its
> output whenever the internal diff mechanism detects a file copy,
> indicating that fast-import should copy the given existing file to the
> given new filename. However, the diff mechanism works against the
> prior version of the file, whereas fast-import uses whatever is current.
> This causes issues when a commit both modifies a file and uses it as the
> source for a copy.
>
> Therefore, teach fast-export to refrain from writing 'C' when it has
> already written a modification command for a file.
>
> An existing test in t9350-fast-export is also fixed in this patch. The
> existing line "C file6 file7" copies the wrong version of file6, but it
> has coincidentally worked because file7 was subsequently overridden.
>
> Reported-by: Juraj Oršulić 
> Signed-off-by: Jonathan Tan 
> ---
> I tested this with the reproduction commands given by Juraj Oršulić and
> it works.
>
> I also have a version that prints the redundant 'C' (and does not
> require the change to t9350), omitting it only if it would cause a wrong
> result. That seems imprecise to me, but I can send that out if the
> redundant 'C' is preferred.
> ---
>  builtin/fast-export.c  | 46 --
>  t/t9350-fast-export.sh | 20 +++-
>  2 files changed, 51 insertions(+), 15 deletions(-)
>
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index d412c0a8f..da42ee5e6 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -344,6 +344,7 @@ static void show_filemodify(struct diff_queue_struct *q,
> struct diff_options *options, void *data)
>  {
> int i;
> +   struct string_list *changed = data;
>
> /*
>  * Handle files below a directory first, in case they are all deleted
> @@ -359,20 +360,31 @@ static void show_filemodify(struct diff_queue_struct *q,
> case DIFF_STATUS_DELETED:
> printf("D ");
> print_path(spec->path);
> +   string_list_insert(changed, spec->path);
> putchar('\n');
> break;
>
> case DIFF_STATUS_COPIED:
> case DIFF_STATUS_RENAMED:
> -   printf("%c ", q->queue[i]->status);
> -   print_path(ospec->path);
> -   putchar(' ');
> -   print_path(spec->path);
> -   putchar('\n');
> -
> -   if (!oidcmp(&ospec->oid, &spec->oid) &&
> -   ospec->mode == spec->mode)
> -   break;
> +   /*
> +* If a change in the file corresponding to 
> ospec->path
> +* has been observed, we cannot trust its contents
> +* because the diff is calculated based on the prior
> +* contents, not the current contents.  So, declare a
> +* copy or rename only if there was no change 
> observed.
> +*/
> +   if (!string_list_has_string(changed, ospec->path)) {
> +   printf("%c ", q->queue[i]->status);
> +   print_path(ospec->path);
> +   putchar(' ');
> +   print_path(spec->path);
> +   string_list_insert(changed, spec->path);
> +   putchar('\n');
> +
> +   if (!oidcmp(&ospec->oid, &spec->oid) &&
> +   ospec->mode == spec->mode)
> +   break;
> +   }
> /* fallthrough */
>
> case DIFF_STATUS_TYPE_CHANGED:
> @@ -393,6 +405,7 @@ static void show_filemodify(struct diff_queue_struct *q,
>get_object_mark(object));
> }
> print_path(spec->path);
> +   string_list_insert(changed, spec->path);
> putchar('\n');
> break;
>
> @@ -528,7 +541,8 @@ static void anonymize_ident_line(const char **beg, const 
> char **end)
> *end = out->buf + out->len;
>  }
>
> -static void handle_commit(struct commit *commit, struct rev_info *rev)
> +static void handle_commit(struct commit *commit, struct rev_info *rev,
> + struct string_list *paths_of_changed_objects)
>  {
> int saved_output_format = rev->diffopt.output_format;
> const char *commit_buffer;
> @@ -615,6 +629,7 @@ static void handle_commit(struct commit *commit, struct 
> rev_info *rev)
> if (full_tree)
> printf("deleteal

Re: [PATCH 2/3] merge-base: return fork-point outside reflog

2017-09-21 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 21.09.2017 08:27:
> Junio C Hamano  writes:
> 
>> ...  I agree that there is a value in what your patch 2/3
>> wants to do when the current one that is more strict would say
>> "there is no known fork-point"---we would gain a way to say "... but
>> this is the best guess based on available data that may be better
>> than getting no answer." which we lack.
>>
>> Having said all that, I do not agree with your last sentence in the
>> paragraph I quoted above.  It is a mere implementation detail to
>> consult the reflog to find out the set of "historical tips of the
>> Branch"; the current tip by definition is among the commits in that
>> set, even when the reflog of Branch is missing.  What 4f21454b55 did
>> was a reasonable "fix" that is still in line with the definition of
>> "--fork-point" from that point of view.
>>
>> Whether we add a "looser" version of "--fork-point" to the system or
>> not, the more strict version should still use the current tip as one
>> of the historical tips (i.e. those that we would take from the
>> reflog if the reflog were not empty) in the more "strict" mode.  The
>> looser version may also want to do so as well.
> 
> So, should I mark this in What's cooking report as "expecting a
> reroll", anticipating that a new option would be added to trigger
> the new & looser behaviour?
> 

I dunno. Some participants in this thread considered my patch to be a
fix rather than alternative behaviour. So I hoped for more responses to
your response. (Re-adding dscho on cc - our thread graph forked...)

Also, I'm undecided about about your reflog argument above - if we leave
"--fork-point" to be the current behaviour including Jeff's fix then the
documentation would need an even bigger overhaul, because it's neither
"reflog also" (as claimed in the doc) nor "reflog only" (as in the
original implementation) but "historical tips as inferred from the
current value and the reflog".

In any case, for two modes we need two names for the options. Maybe
--fork-point and --fork-base because in the loose mode, you may get a
"base of a strict fork point"?

Michael


[PATCH 0/2] Two small comment fixes.

2017-09-21 Thread Han-Wen Nienhuys
Han-Wen Nienhuys (2):
  Fix typo in submodule.h
  Document the string_list structure

 string-list.h | 6 ++
 submodule.h   | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

--
2.14.1.821.g8fa685d3b7-goog


[PATCH 2/2] Document the string_list structure

2017-09-21 Thread Han-Wen Nienhuys
Signed-off-by: Han-Wen Nienhuys 
---
 string-list.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/string-list.h b/string-list.h
index 29bfb7ae4..08b534166 100644
--- a/string-list.h
+++ b/string-list.h
@@ -8,6 +8,12 @@ struct string_list_item {
 
 typedef int (*compare_strings_fn)(const char *, const char *);
 
+/* A resizable array of strings. The strings are owned if
+ * 'strdup_strings' is set. It can be used as a sorted array, and a
+ * custom comparison may be given in 'cmp'. The field 'items[i].util'
+ * may be used to implement an array of pairs. In that case, the
+ * caller is responsible for managing memory pointed to by 'util'.
+ */
 struct string_list {
struct string_list_item *items;
unsigned int nr, alloc;
-- 
2.14.1.821.g8fa685d3b7-goog



[PATCH 1/2] Fix typo in submodule.h

2017-09-21 Thread Han-Wen Nienhuys
Signed-off-by: Han-Wen Nienhuys 
---
 submodule.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/submodule.h b/submodule.h
index 6b52133c8..f0da0277a 100644
--- a/submodule.h
+++ b/submodule.h
@@ -120,7 +120,7 @@ extern int submodule_move_head(const char *path,
 
 /*
  * Prepare the "env_array" parameter of a "struct child_process" for executing
- * a submodule by clearing any repo-specific envirionment variables, but
+ * a submodule by clearing any repo-specific environment variables, but
  * retaining any config in the environment.
  */
 extern void prepare_submodule_repo_env(struct argv_array *out);
-- 
2.14.1.821.g8fa685d3b7-goog



Re: [PATCH v7 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-21 Thread Ben Peart



On 9/20/2017 10:24 PM, Ben Peart wrote:



On 9/20/2017 10:00 PM, Junio C Hamano wrote:

Ben Peart  writes:


Pretty much the same places you would also use CE_MATCH_IGNORE_VALID
and CE_MATCH_IGNORE_SKIP_WORKTREE which serve the same role for those
features.  That is generally when you are about to overwrite data so
want to be *really* sure you have what you think you have.


Now that makes me worried gravely.

IGNORE_VALID is ignored in these places because we have been burned
by end-users lying to us.  IGNORE_SKIP_WORKTREE must be ignored
because we know that the working tree state does not match the
"reality" the index wants to have.  The fact that the code treats
the status reported and kept up to date by fsmonitor the same way as
these two implies that it is merely advisory and cannot be trusted?
Is that the reason why we tell the codepath with IGNORE_FSMONITOR to
ignore the state fsmonitor reported and check the state ourselves?



Sorry for causing unnecessary worry.  The fsmonitor data can be trusted 
(as much as you can trust that Watchman or your file system monitor is 
not buggy).  I wasn't 100% sure *why* these places passed the various 
IGNORE_VALID and IGNORE_SKIP_WORKTREE flags.  When I looked at them, 
that lack of trust seemed to be the reason.


Adding IGNORE_FSMONITOR in those same places was simply an abundance of 
caution on my part.  The only down side of passing the flag for 
fsmonitor is that we will end up calling lstat() on a file where we 
technically didn't need too.  That seemed safer than potentially missing 
a change if I had misunderstood the code.


I'd much rather return correct results (and fall back to the old 
performance) than potentially be incorrect.  I followed that same 
principal in the entire design of fsmonitor - if anything doesn't look 
right, fall back to the old code path just in case...




I spent some time with git blame/show trying to figure out the *why* for 
all the places CE_MATCH_IGNORE_* are passed without gaining a lot of 
additional understanding.  Based on your description above of why these 
exist, I believe there are very few places we actually need to pass 
CE_MATCH_IGNORE_FSMONITOR and that I was being overly cautious.


Here is a patch that removes the unnecessary CE_MATCH_IGNORE_FSMONITOR 
instances.  While the test suite passes with this change, I'm not 100% 
confident that we actually have test cases that would have detected all 
the places that we needed the CE_MATCH_IGNORE_* flags.


If this seems like a reasonable additional optimization to make, I can 
roll it into the next iteration of the patch series as I have some 
spelling, documentation changes and other tweaks as a result of all the 
feedback.



From 6ff7ed0467fd736dca73efe62391bb3ee9b4e771 Mon Sep 17 00:00:00 2001
From: Ben Peart 
Date: Thu, 21 Sep 2017 09:09:42 -0400
Subject: [PATCH] fsmonitor: remove unnecessary uses of
 CE_MATCH_IGNORE_FSMONITOR

With a better understanding of *why* the CE_MATCH_IGNORE_* flags are
used, it is now more clear they are not required in most cases where
CE_MATCH_IGNORE_FSMONITOR was being passed out of an abundance of
caution.

Since the fsmonitor data can be trusted and is kept in sync with the
working directory, the only remaining valid uses are those locations
where we don't want to trigger an unneeded refresh_fsmonitor() call.

One is where preload_index() is doing a fast precompute of state for
the bulk of the index entries but is not required for correctness as
refresh_cache_ent() will ensure any "missed" by preload_index() are
up-to-date if/when they are needed.

The second is in is_staging_gitmodules_ok() where we don't want to
trigger a complete refresh just to check the .gitignore file.

The net result of this change will be that there are more cases where
we will be able to use the cached index state and avoid unnecessary
lstat() calls.

Signed-off-by: Ben Peart 
---
 apply.c| 2 +-
 entry.c| 2 +-
 read-cache.c   | 4 ++--
 unpack-trees.c | 6 +++---
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/apply.c b/apply.c
index 9061cc5f15..71cbbd141c 100644
--- a/apply.c
+++ b/apply.c
@@ -3399,7 +3399,7 @@ static int verify_index_match(const struct 
cache_entry *ce, struct stat *st)

return -1;
return 0;
}
-	return ce_match_stat(ce, st, 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_IGNORE_FSMONITOR);
+	return ce_match_stat(ce, st, 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);

 }

 #define SUBMODULE_PATCH_WITHOUT_INDEX 1
diff --git a/entry.c b/entry.c
index 5e6794f9fc..3a7b667373 100644
--- a/entry.c
+++ b/entry.c
@@ -404,7 +404,7 @@ int checkout_entry(struct cache_entry *ce,

if (!check_path(path.buf, path.len, &st, state->base_dir_len)) {
const struct submodule *sub;
-		unsigned changed = ce_match_stat(ce, &st, 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_IGNORE_FSMONITOR);
+		unsigned changed =

[GSoC][PATCH v4 1/4] submodule--helper: introduce get_submodule_displaypath()

2017-09-21 Thread Han-Wen Nienhuys
LGTM with nits.

+static char *get_submodule_displaypath(const char *path, const char *prefix)

this could do with a comment

  /* the result should be freed by the caller. */

+   } else if (super_prefix) {
+   int len = strlen(super_prefix);
+   const char *format = is_dir_sep(super_prefix[len - 1]) ? "%s%s" 
: "%s/%s";

what if len == 0? The handling of '/' looks like a change from the original.


[GSoC][PATCH v4 3/4] submodule: port set_name_rev() from shell to C

2017-09-21 Thread Han-Wen Nienhuys
LGTM with nits

commit message:

"revision name, and later handles its formating and printing."

typo: formatting

+   if (!capture_command(&cp, &sb, 0) && sb.len) {
+   strbuf_strip_suffix(&sb, "\n");
+   return strbuf_detach(&sb, NULL);
+   }

you discard all output if these commands fail, so if the argument is a
not a submodule, or the other is not a sha1, it will just print
nothing without error message. Maybe that is OK, though? I don't see
documentation for these commands, so maybe this is not meant to be
usable?


Re: [PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list

2017-09-21 Thread Andreas Schwab
On Sep 21 2017, Junio C Hamano  wrote:

> Now you make me curious.  How would that variant be different from
> option C. in Jonathan's message?

Only in the parity of the condition.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


[GSoC][PATCH v4 4/4] submodule: port submodule subcommand 'status' from shell to C

2017-09-21 Thread Han-Wen Nienhuys
+   const char *const git_submodule_helper_usage[] = {
+   N_("git submodule status [--quiet] [--cached] [--recursive] 
[]"),
+   NULL

the manpage over here says

  git submodule [--quiet] status [--cached] [--recursive] [--] [...]

ie. multiple path arguments. Should this usage string be tweaked?

+static void print_status(struct status_cb *info, char state, const char *path,
+const struct object_id *oid, const char *displaypath)
+{

could do with a comment. What are the options for the `state` char?

+   if (state == ' ' || state == '+') {
+   struct argv_array get_rev_args = ARGV_ARRAY_INIT;
+
+   argv_array_pushl(&get_rev_args, "get-rev-name",
+path, oid_to_hex(oid), NULL);
+   get_rev_name(get_rev_args.argc, get_rev_args.argv,
+info->prefix);

since you're not really subprocessing, can't you simply have a

  do_print_rev_name(char *path, char *sha) {
 ..
 printf("\n");
  }

and call that directly? Or call compute_rev_name directly. Then you
don't have to do argv setup here.

Also, the name get_rev_name() is a little unfortunate, since it
doesn't return a name, but rather prints it. Maybe the functions
implementing helper commands could be named like:

  command_get_rev_name

or similar.


Re: [PATCH 2/2] Document the string_list structure

2017-09-21 Thread Jeff King
On Thu, Sep 21, 2017 at 02:43:38PM +0200, Han-Wen Nienhuys wrote:

> Signed-off-by: Han-Wen Nienhuys 
> ---
>  string-list.h | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/string-list.h b/string-list.h
> index 29bfb7ae4..08b534166 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -8,6 +8,12 @@ struct string_list_item {
>  
>  typedef int (*compare_strings_fn)(const char *, const char *);
>  
> +/* A resizable array of strings. The strings are owned if
> + * 'strdup_strings' is set. It can be used as a sorted array, and a
> + * custom comparison may be given in 'cmp'. The field 'items[i].util'
> + * may be used to implement an array of pairs. In that case, the
> + * caller is responsible for managing memory pointed to by 'util'.
> + */
>  struct string_list {
>   struct string_list_item *items;
>   unsigned int nr, alloc;

There's a considerable amount of documentation for string-list in
Documentation/technical/api-string-list.txt.

Perhaps we should look at migrating it into string-list.h, where it's
more likely to be found (and kept up to date). We did something similar
for strbuf.h a while back, and I think the result is much better. See
the commits in:

  git log bdfdaa4978^..d468fa2721

for an example.

-Peff


[PATCH 0/4] avoid some -Wsign-compare warnings

2017-09-21 Thread Ramsay Jones

This series removes 687 '-Wsign-compare' warnings when applied to the
current master branch (@ commit 59c0ea183). Using gcc v5.4.0 and adding
'-Wextra' to the compilation flags, we can summarize the warnings as
follows:

  $ grep warning out1 | sed -e 's/.*\[/\[/' | sort | uniq -c | sort -rn
 1437 [-Wunused-parameter]
 1410 [-Wsign-compare]
  670 [-Wmissing-field-initializers]
7 [-Wempty-body]
  $ 

After applying this series, we see:

  $ grep warning out2 | sed -e 's/.*\[/\[/' | sort | uniq -c | sort -rn
 1437 [-Wunused-parameter]
  723 [-Wsign-compare]
  670 [-Wmissing-field-initializers]
7 [-Wempty-body]
  $ 

The number of -Wunused-parameter warnings is not as bad as it seems; for
example, we can get rid of 690 such warning with the following patch:

  $ git diff
  diff --git a/git-compat-util.h b/git-compat-util.h
  index cedad4d58..8e7388082 100644
  --- a/git-compat-util.h
  +++ b/git-compat-util.h
  @@ -118,6 +118,9 @@
   /* Approximation of the length of the decimal representation of this type. */
   #define decimal_length(x)((int)(sizeof(x) * 2.56 + 0.5) + 1)
   
  +/* suppress 'unused parameter' warnings */
  +#define UNUSED(x) ((void)(x))
  +
   #if defined(__sun__)
/*
 * On Solaris, when _XOPEN_EXTENDED is set, its header file
  @@ -341,6 +344,7 @@ typedef uintmax_t timestamp_t;
   #ifndef has_dos_drive_prefix
   static inline int git_has_dos_drive_prefix(const char *path)
   {
  + UNUSED(path);
return 0;
   }
   #define has_dos_drive_prefix git_has_dos_drive_prefix
  @@ -349,6 +353,7 @@ static inline int git_has_dos_drive_prefix(const char 
*path)
   #ifndef skip_dos_drive_prefix
   static inline int git_skip_dos_drive_prefix(char **path)
   {
  + UNUSED(path);
return 0;
   }
   #define skip_dos_drive_prefix git_skip_dos_drive_prefix
  $ 
  
  $ grep warning out3 | sed -e 's/.*\[/\[/' | sort | uniq -c | sort -rn
  747 [-Wunused-parameter]
  723 [-Wsign-compare]
  670 [-Wmissing-field-initializers]
7 [-Wempty-body]
  $ 

The original version of the UNUSED macro used the gcc __unused__ attribute,
but that caused some issues with msvc IIRC, so this version has been
simplified. I have been meaning to check that this does not cause any bloat
in the git executable (with all optimisation levels), so I haven't submitted
it before. What do you think?

These patches reduce the error count quite a bit, without touching too many
files, but additional patches may have to be batched up and submitted over
several releases, viz:

  $ grep warning out2 | cut -d: -f1 | sort | uniq | wc -l
  264
  $ 

Ramsay Jones (4):
  git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
  commit-slab.h: avoid -Wsign-compare warnings
  cache.h: hex2chr() - avoid -Wsign-compare warnings
  ALLOC_GROW: avoid -Wsign-compare warnings

 builtin/pack-objects.c |  4 ++--
 cache.h|  4 ++--
 commit-slab.h  |  6 +++---
 config.c   |  2 +-
 diff.c |  2 +-
 git-compat-util.h  |  6 --
 line-log.c | 18 +-
 line-log.h |  2 +-
 revision.c |  2 +-
 tree-walk.c|  3 +--
 10 files changed, 25 insertions(+), 24 deletions(-)

-- 
2.14.0


[PATCH 1/4] git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings

2017-09-21 Thread Ramsay Jones

Signed-off-by: Ramsay Jones 
---
 git-compat-util.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 9bc15b036..cedad4d58 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -898,9 +898,11 @@ static inline char *xstrdup_or_null(const char *str)
 
 static inline size_t xsize_t(off_t len)
 {
-   if (len > (size_t) len)
+   size_t size = (size_t) len;
+
+   if (len != (off_t) size)
die("Cannot handle files this big");
-   return (size_t)len;
+   return size;
 }
 
 __attribute__((format (printf, 3, 4)))
-- 
2.14.0


[PATCH 2/4] commit-slab.h: avoid -Wsign-compare warnings

2017-09-21 Thread Ramsay Jones

Signed-off-by: Ramsay Jones 
---
 commit-slab.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/commit-slab.h b/commit-slab.h
index 333d81e37..dcaab8ca0 100644
--- a/commit-slab.h
+++ b/commit-slab.h
@@ -78,7 +78,7 @@ static MAYBE_UNUSED void init_ ##slabname(struct slabname *s) 
\
\
 static MAYBE_UNUSED void clear_ ##slabname(struct slabname *s) \
 {  \
-   int i;  \
+   unsigned int i; \
for (i = 0; i < s->slab_count; i++) \
free(s->slab[i]);   \
s->slab_count = 0;  \
@@ -89,13 +89,13 @@ static MAYBE_UNUSED elemtype *slabname## _at_peek(struct 
slabname *s,   \
  const struct commit *c, \
  int add_if_missing)   \
 {  \
-   int nth_slab, nth_slot; \
+   unsigned int nth_slab, nth_slot;\
\
nth_slab = c->index / s->slab_size; \
nth_slot = c->index % s->slab_size; \
\
if (s->slab_count <= nth_slab) {\
-   int i;  \
+   unsigned int i; \
if (!add_if_missing)\
return NULL;\
REALLOC_ARRAY(s->slab, nth_slab + 1);   \
-- 
2.14.0


[PATCH 3/4] cache.h: hex2chr() - avoid -Wsign-compare warnings

2017-09-21 Thread Ramsay Jones

Signed-off-by: Ramsay Jones 
---
 cache.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index a916bc79e..a0e3e362c 100644
--- a/cache.h
+++ b/cache.h
@@ -1243,8 +1243,8 @@ static inline unsigned int hexval(unsigned char c)
  */
 static inline int hex2chr(const char *s)
 {
-   int val = hexval(s[0]);
-   return (val < 0) ? val : (val << 4) | hexval(s[1]);
+   unsigned int val = hexval(s[0]);
+   return (val & ~0xf) ? val : (val << 4) | hexval(s[1]);
 }
 
 /* Convert to/from hex/sha1 representation */
-- 
2.14.0


[PATCH 4/4] ALLOC_GROW: avoid -Wsign-compare warnings

2017-09-21 Thread Ramsay Jones

Signed-off-by: Ramsay Jones 
---
 builtin/pack-objects.c |  4 ++--
 config.c   |  2 +-
 diff.c |  2 +-
 line-log.c | 18 +-
 line-log.h |  2 +-
 revision.c |  2 +-
 tree-walk.c|  3 +--
 7 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a57b4f058..a6ee653bf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2563,8 +2563,8 @@ struct in_pack_object {
 };
 
 struct in_pack {
-   int alloc;
-   int nr;
+   unsigned int alloc;
+   unsigned int  nr;
struct in_pack_object *array;
 };
 
diff --git a/config.c b/config.c
index cd5a69e63..aeab02c06 100644
--- a/config.c
+++ b/config.c
@@ -2200,7 +2200,7 @@ static struct {
size_t *offset;
unsigned int offset_alloc;
enum { START, SECTION_SEEN, SECTION_END_SEEN, KEY_SEEN } state;
-   int seen;
+   unsigned int seen;
 } store;
 
 static int matches(const char *key, const char *value)
diff --git a/diff.c b/diff.c
index ea7e5978b..be94ad4f4 100644
--- a/diff.c
+++ b/diff.c
@@ -1541,7 +1541,7 @@ static void emit_rewrite_diff(const char *name_a,
 
 struct diff_words_buffer {
mmfile_t text;
-   long alloc;
+   unsigned long alloc;
struct diff_words_orig {
const char *begin, *end;
} *orig;
diff --git a/line-log.c b/line-log.c
index ab0709f9a..545ad0f28 100644
--- a/line-log.c
+++ b/line-log.c
@@ -90,7 +90,7 @@ static int range_cmp(const void *_r, const void *_s)
  */
 static void range_set_check_invariants(struct range_set *rs)
 {
-   int i;
+   unsigned int i;
 
if (!rs)
return;
@@ -110,8 +110,8 @@ static void range_set_check_invariants(struct range_set *rs)
  */
 void sort_and_merge_range_set(struct range_set *rs)
 {
-   int i;
-   int o = 0; /* output cursor */
+   unsigned int i;
+   unsigned int o = 0; /* output cursor */
 
QSORT(rs->ranges, rs->nr, range_cmp);
 
@@ -144,7 +144,7 @@ void sort_and_merge_range_set(struct range_set *rs)
 static void range_set_union(struct range_set *out,
 struct range_set *a, struct range_set *b)
 {
-   int i = 0, j = 0;
+   unsigned int i = 0, j = 0;
struct range *ra = a->ranges;
struct range *rb = b->ranges;
/* cannot make an alias of out->ranges: it may change during grow */
@@ -186,7 +186,7 @@ static void range_set_union(struct range_set *out,
 static void range_set_difference(struct range_set *out,
  struct range_set *a, struct range_set *b)
 {
-   int i, j =  0;
+   unsigned int i, j =  0;
for (i = 0; i < a->nr; i++) {
long start = a->ranges[i].start;
long end = a->ranges[i].end;
@@ -397,7 +397,7 @@ static void diff_ranges_filter_touched(struct diff_ranges 
*out,
   struct diff_ranges *diff,
   struct range_set *rs)
 {
-   int i, j = 0;
+   unsigned int i, j = 0;
 
assert(out->target.nr == 0);
 
@@ -426,7 +426,7 @@ static void range_set_shift_diff(struct range_set *out,
 struct range_set *rs,
 struct diff_ranges *diff)
 {
-   int i, j = 0;
+   unsigned int i, j = 0;
long offset = 0;
struct range *src = rs->ranges;
struct range *target = diff->target.ranges;
@@ -873,7 +873,7 @@ static char *output_prefix(struct diff_options *opt)
 
 static void dump_diff_hacky_one(struct rev_info *rev, struct line_log_data 
*range)
 {
-   int i, j = 0;
+   unsigned int i, j = 0;
long p_lines, t_lines;
unsigned long *p_ends = NULL, *t_ends = NULL;
struct diff_filepair *pair = range->pair;
@@ -906,7 +906,7 @@ static void dump_diff_hacky_one(struct rev_info *rev, 
struct line_log_data *rang
long t_start = range->ranges.ranges[i].start;
long t_end = range->ranges.ranges[i].end;
long t_cur = t_start;
-   int j_last;
+   unsigned int j_last;
 
while (j < diff->target.nr && diff->target.ranges[j].end < 
t_start)
j++;
diff --git a/line-log.h b/line-log.h
index 7a5c24e2d..e2a5ee7c6 100644
--- a/line-log.h
+++ b/line-log.h
@@ -14,7 +14,7 @@ struct range {
 
 /* A set of ranges.  The ranges must always be disjoint and sorted. */
 struct range_set {
-   int alloc, nr;
+   unsigned int alloc, nr;
struct range *ranges;
 };
 
diff --git a/revision.c b/revision.c
index f9a90d71d..c8c9cb32c 100644
--- a/revision.c
+++ b/revision.c
@@ -1105,7 +1105,7 @@ static void add_rev_cmdline(struct rev_info *revs,
unsigned flags)
 {
struct rev_cmdline_info *info = &revs->cmdline;
-   int nr = info->nr;
+   unsigned int nr = info->nr

Re: [PATCH v2 02/21] prefix_ref_iterator: break when we leave the prefix

2017-09-21 Thread Stefan Beller
On Wed, Sep 20, 2017 at 9:59 PM, Jeff King  wrote:

>> But this compare function is not to order by the natural encoding order,
>> but it's used to detect the '0' at the end of prefix, which orders
>> before *any* unsigned char.
>
> It's not just detecting the "0". We care about the ordering overall (so
> that "refs/foo" comes after "refs/bar", and we know that "refs/bar/baz"
> cannot come after "refs/foo", and we can stop iterating).

refs/{foo,bar,bar/baz} is all ASCII, such that the ordering by byte value
and byte position (2nd order) orders 'correctly'. correct in the sense as
Git expects. But if you use other encodings, the natural encoding
may differ from the arbitrary order that we have here. (Example
utf-8 with BOM and smileys)

However these different orders do not matter (according to my initial
conclusion), because we need (a) find out about '\0' and (b) only need
only 'arbitrary' order (which is referred to by the " ordered" trait).

>> Essentially compare_prefix wants to provide the same return
>> value as `strncmp(refname, prefix, min(strlen(refname), strlen(prefix)));`
>> except that it is optimized as we do not have to walk over a string
>> multiple times (to determine length and then pass it to compare).
>
> Hmm, yeah, I think that would be an equivalent. I didn't think of that,
> but as you say it would be less efficient.

I was just too lazy to check if we had the results of strlen around already,
such that the comparison would be equally cheap, but more readable.
Also IIRC later patches enhance this function, so we shall not make use
of the strncmp function here.

> The patch is credited to me, but I actually screwed up the ordering by
> failing to do the unsigned cast. Michael fixed that part before posting
> it. :)

Thanks,
Stefan


Re: [PATCH 2/2] Document the string_list structure

2017-09-21 Thread Brandon Williams
On 09/21, Han-Wen Nienhuys wrote:
> Signed-off-by: Han-Wen Nienhuys 
> ---
>  string-list.h | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/string-list.h b/string-list.h
> index 29bfb7ae4..08b534166 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -8,6 +8,12 @@ struct string_list_item {
>  
>  typedef int (*compare_strings_fn)(const char *, const char *);
>  
> +/* A resizable array of strings. The strings are owned if

nit: the start of the comment block should be on its own line:

  /*
   * A resizable ...

> + * 'strdup_strings' is set. It can be used as a sorted array, and a
> + * custom comparison may be given in 'cmp'. The field 'items[i].util'
> + * may be used to implement an array of pairs. In that case, the
> + * caller is responsible for managing memory pointed to by 'util'.
> + */

The util field can be freed if 'free_util' is set when calling
string_list_clear() but yeah, in general the caller is managing that
memory especially if a complex free function is required (in the case of
allocating a struct with dynamically allocated fields to be stored in
the util feild).

Just pointing that out, and I'm happy with this and the previous patch.

>  struct string_list {
>   struct string_list_item *items;
>   unsigned int nr, alloc;
> -- 
> 2.14.1.821.g8fa685d3b7-goog
> 



-- 
Brandon Williams


Re: RFC: Design and code of partial clones (now, missing commits and trees OK)

2017-09-21 Thread Jeff Hostetler


There's a lot in this patch series.  I'm still studying it, but here
are some notes and questions.  I'll start with direct responses to
the RFC here and follow up in a second email with specific questions
and comments to keep this from being too long).


On 9/15/2017 4:43 PM, Jonathan Tan wrote:

For those interested in partial clones and/or missing objects in repos,
I've updated my original partialclone patches to not require an explicit
list of promises. Fetch/clone still only permits exclusion of blobs, but
the infrastructure is there for a local repo to support missing trees
and commits as well.

They can be found here:

https://github.com/jonathantanmy/git/tree/partialclone2

To make the number of patches more manageable, I have omitted support
for a custom fetching hook (but it can be readded in fetch-object.c),
and only support promisor packfiles for now (but I don't have any
objection to supporting promisor loose objects in the future).

Let me know what you think of the overall approach. In particular, I'm
still wondering if there is a better approach than to toggle
"fetch_if_missing" whenever we need lazy fetching (or need to suppress
it).

Also, if there any patches that you think might be useful to others, let
me know and I'll send them to this mailing list for review.

A demo and an overview of the design (also available from that
repository's README):

Demo


Obtain a repository.

 $ make prefix=$HOME/local install
 $ cd $HOME/tmp
 $ git clone https://github.com/git/git

Make it advertise the new feature and allow requests for arbitrary blobs.

 $ git -C git config uploadpack.advertiseblobmaxbytes 1
 $ git -C git config uploadpack.allowanysha1inwant 1

Perform the partial clone and check that it is indeed smaller. Specify
"file://" in order to test the partial clone mechanism. (If not, Git will
perform a local clone, which unselectively copies every object.)

 $ git clone --blob-max-bytes=10 "file://$(pwd)/git" git2
 $ git clone "file://$(pwd)/git" git3
 $ du -sh git2 git3
 116M   git2
 129M   git3

Observe that the new repo is automatically configured to fetch missing objects
from the original repo. Subsequent fetches will also be partial.

 $ cat git2/.git/config
 [core]
repositoryformatversion = 1
filemode = true
bare = false
logallrefupdates = true
 [remote "origin"]
url = [snip]
fetch = +refs/heads/*:refs/remotes/origin/*
blobmaxbytes = 10
 [extensions]
partialclone = origin
 [branch "master"]
remote = origin
merge = refs/heads/master



I like that git-clone saves the partial clone settings in the
.git/config.  This should make it easier for subsequent commands to
default to the right settings.

Do we allow a partial-fetch following a regular clone (such that git-fetch
would create these config file settings)?  This seems like a reasonable
upgrade path for a user with an existing repo to take advantage of partial
fetches going forward.

Or do we require that git-fetch only be allowed to do partial-fetches
after a partial-clone (so that only clone creates these settings) and
fetch always does partial-fetches thereafter?  It might be useful to
allow fetch to do a full fetch on top of a partial-clone, but there
are probably thin-pack issues to work out first.

Also, there is an assumption here that the user will want to keep
using the same filtering settings on subsequent fetches.  That's
probably fine for now and until we get a chance to try it out for
real.



Unlike in an older version of this code (see the `partialclone` branch), this
also works with the HTTP/HTTPS protocols.

Design
==

Local repository layout
---

A repository declares its dependence on a *promisor remote* (a remote that
declares that it can serve certain objects when requested) by a repository
extension "partialclone". `extensions.partialclone` must be set to the name of
the remote ("origin" in the demo above).



Do we allow EXACTLY ONE promisor-remote?  That is, can we later add
another remote as a promisor-remote?  And then do partial-fetches from
either?  Do we need to disallow removing or altering a remote that is
listed as a promisor-remote?

I think for now, one promisor-remote is fine.  Just asking.

Changing a remote's URL might be fine, but deleting the promisor-remote
would put the user in a weird state.  We don't need to worry about it
now though.



A packfile can be annotated as originating from the promisor remote by the
existence of a "(packfile name).promisor" file with arbitrary contents (similar
to the ".keep" file). Whenever a promisor remote sends an object, it declares
that it can serve every object directly or indirectly referenced by the sent
object.

A promisor packfile is a packfile annotated with the ".promisor" file. A
promisor object is an object in a promisor packfile. A promised object is an
object dire

commit-msg hook with verbose commit flag

2017-09-21 Thread GaveUp
When using the --verbose flag on git-commit with a commit-msg hook the
contents of the diff are passed to the commit-msg hook.  I would have
expected the diff to have been removed since it's below the scissor
line.  Since the scissor line is removed and not visible to commit-msg
there doesn't seem to be a great way to ignore the diff contents either.


Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 2/3)

2017-09-21 Thread Jeff Hostetler

(part 2)

Additional overall comments on:
https://github.com/jonathantanmy/git/commits/partialclone2

{} I think it would help to split the blob-max-bytes filtering and the
   promisor/promised concepts and discuss them independently.

   {} Then we can talk about about the promisor/promised functionality
  independent of any kind of filter.  The net-net is that the client
  has missing objects and it doesn't matter what filter criteria
  or mechanism caused that to happened.

   {} blob-max-bytes is but one such filter we should have.  This might
  be sufficient if the goal is replace LFS (where you rarely ever
  need any given very very large object) and dynamically loading
  them as needed is sufficient and the network round-trip isn't
  too much of a perf penalty.

   {} But if we want to do things like a "sparse-enlistments" where the
  client only needs a small part of the tree using sparse-checkout.
  For example, only populating 50,000 files from a tree of 3.5M files
  at HEAD, then we need a more general filtering.

   {} And as I said above, how we chose to filter should be independent
  of how the client handles promisor/promised objects.


{} Also, if we rely strictly on dynamic object fetching to fetch missing
   objects, we are effectively tethered to the server during operations
   (such as checkout) that the user might not think about as requiring
   a network connection.  And we are forced to keep the same limitations
   of LFS in that you can't prefetch and go offline (without actually
   checking out to your worktree first).  And we can't bulk or parallel
   fetch objects.


{} I think it would also help to move the blob-max-bytes calculation out
   of pack-objects.c : add_object_entry() [1].  The current code isolates
   the computation there so that only pack-objects can do the filtering.

   Instead, put it in list-objects.c and traverse_commit_list() so that
   pack-objects and rev-list can share it (as Peff suggested [2] in
   response to my first patch series in March).

   For example, this would let the client have a pre-checkout hook, use
   rev-list to compute the set of missing objects needed for that commit,
   and pipe that to a command to BULK fetch them from the server BEFORE
   starting the actual checkout.  This would allow the savy user to
   manually run a prefetch before going offline.

[1] 
https://github.com/jonathantanmy/git/commit/68e529484169f4800115c5a32e0904c25ad14bd8#diff-a8d2c9cf879e775d748056cfed48440cR1110

[2] 
https://public-inbox.org/git/20170309073117.g3br5btsfwntc...@sigill.intra.peff.net/


{} This also locks us into size-only filtering and makes it more
   difficult to add other filters.  In that the add_object_entry()
   code gets called on an object after the traversal has decided
   what to do with it.  It would be difficult to add tree-trimming
   at this level, for example.


{} An early draft of this type of filtering is here [3].  I hope to push
   up a revised draft of this shortly.

[3] https://public-inbox.org/git/20170713173459.3559-1-...@jeffhostetler.com/


Thanks,
Jeff



Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 3)

2017-09-21 Thread Jeff Hostetler

(part 3)

Additional overall comments on:
https://github.com/jonathantanmy/git/commits/partialclone2

{} WRT the code in is_promised() [1]

[1] 
https://github.com/jonathantanmy/git/commit/7a9c2d9b6e2fce293817b595dee29a7eede0#diff-5d5d5dc185ef37dc30bb7d9a7ae0c4e8R1960

   {} it looked like it was adding ALL promisor- and promised-objects to
  the "promised" OIDSET, rather than just promised-objects, but I
  could be mistaken.

   {} Is this iterating over ALL promisor-packfiles?

   {} It looked like this was being used by fsck and rev-list.  I have
  concerns about how big this OIDSET will get and how it will scale,
  since if we start with a partial-clone all packfiles will be
  promisor-packfiles.

   {} When iterating thru a tree object, you add everything that it
  references (everything in that folder).  This adds all of the
  child OIDs -- without regard to whether they are new to this
  version of the tree object. (Granted, this is hard to compute.)

  My concern is that this will add too many objects to the OIDSET.
  That is, a new tree object (because of a recent change to something
  in that folder) will also have the OIDs of the other *unchanged*
  files which may be present in an earlier non-provisor packfile
  from an earlier commit.

  I worry that this will grow the OIDSET to essentially include
  everything.  And possibly defeating its own purpose.  I could be
  wrong here, but that's my concern.


{} I'm not opposed to the .promisor file concept, but I have concerns
   that in practice all packfiles after a partial clone will be
   promisor-packfiles and therefore short-cut during fsck, so fsck
   still won't gain anything.

   It would help if there are also non-promisor packfiles present,
   but only for objects referenced by non-promisor packfiles.

   But then I also have to wonder whether we can even support non-promisor
   packfiles after starting with a partial clone -- because of the
   properties of received thin-packs on a non-partial fetch.
   


Thanks,
Jeff



Re: [PATCH] git: add --no-optional-locks option

2017-09-21 Thread Johannes Sixt

Am 21.09.2017 um 06:32 schrieb Jeff King:

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 6e3a6767e5..8dd3ae05ae 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -159,6 +159,10 @@ foo.bar= ...`) sets `foo.bar` to the empty string which ` 
git config
Add "icase" magic to all pathspec. This is equivalent to setting
the `GIT_ICASE_PATHSPECS` environment variable to `1`.
  
+--no-optional-locks::

+   Do not perform optional operations that require locks. This is
+   equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`.
+
  GIT COMMANDS
  
  
@@ -697,6 +701,15 @@ of clones and fetches.

which feed potentially-untrusted URLS to git commands.  See
linkgit:git-config[1] for more details.
  
+`GIT_OPTIONAL_LOCKS`::

+   If set to `0`, Git will avoid performing any operations which
+   require taking a lock and which are not required to complete the
+   requested operation. For example, this will prevent `git status`
+   from refreshing the index as a side effect. This is useful for
+   processes running in the background which do not want to cause
+   lock contention with other operations on the repository.
+   Defaults to `1`.


I don't think we should pass this environment variable to remote 
repositories. It should be listed in local_repo_env[] in environment.c.


-- Hannes


Re: [PATCH 1/4] git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings

2017-09-21 Thread Torsten Bögershausen
On 2017-09-21 18:46, Ramsay Jones wrote:
> 
> Signed-off-by: Ramsay Jones 
> ---
>  git-compat-util.h | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 9bc15b036..cedad4d58 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -898,9 +898,11 @@ static inline char *xstrdup_or_null(const char *str)
>  
>  static inline size_t xsize_t(off_t len)
>  {
> - if (len > (size_t) len)
> + size_t size = (size_t) len;
> +
> + if (len != (off_t) size)
>   die("Cannot handle files this big");
> - return (size_t)len;
> + return size;

Hm, can someone help me out ?
Why is the cast not needed ?

>  }
>  
>  __attribute__((format (printf, 3, 4)))
> 



Re: [PATCH 1/4] git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings

2017-09-21 Thread Stefan Beller
On Thu, Sep 21, 2017 at 11:49 AM, Torsten Bögershausen  wrote:
> On 2017-09-21 18:46, Ramsay Jones wrote:
>>
>> Signed-off-by: Ramsay Jones 
>> ---
>>  git-compat-util.h | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/git-compat-util.h b/git-compat-util.h
>> index 9bc15b036..cedad4d58 100644
>> --- a/git-compat-util.h
>> +++ b/git-compat-util.h
>> @@ -898,9 +898,11 @@ static inline char *xstrdup_or_null(const char *str)
>>
>>  static inline size_t xsize_t(off_t len)
>>  {
>> - if (len > (size_t) len)
>> + size_t size = (size_t) len;
>> +
>> + if (len != (off_t) size)
>>   die("Cannot handle files this big");
>> - return (size_t)len;
>> + return size;
>
> Hm, can someone help me out ?
> Why is the cast not needed ?

Because 'size' is already size_t,
(cast was done in first line of the function:
size_t size = (size_t) len;
), previously we cast len multiple times,
now we cast it once and use size thereafter.

>
>>  }
>>
>>  __attribute__((format (printf, 3, 4)))
>>
>


Re: [PATCH 1/4] git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings

2017-09-21 Thread Ramsay Jones


On 21/09/17 19:49, Torsten Bögershausen wrote:
> On 2017-09-21 18:46, Ramsay Jones wrote:
>>
>> Signed-off-by: Ramsay Jones 
>> ---
>>  git-compat-util.h | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/git-compat-util.h b/git-compat-util.h
>> index 9bc15b036..cedad4d58 100644
>> --- a/git-compat-util.h
>> +++ b/git-compat-util.h
>> @@ -898,9 +898,11 @@ static inline char *xstrdup_or_null(const char *str)
>>  
>>  static inline size_t xsize_t(off_t len)
>>  {
>> -if (len > (size_t) len)
>> +size_t size = (size_t) len;
>> +
>> +if (len != (off_t) size)
>>  die("Cannot handle files this big");
>> -return (size_t)len;
>> +return size;
> 
> Hm, can someone help me out ?
> Why is the cast not needed ?

The cast in the return statement? If so, because the 'size' variable
has been declared with the size_t type. (The truncation, if any, would
happen when 'size' is initialised in the declaration).

Hope that helps.

ATB,
Ramsay Jones




[PATCH] Documentation/githooks: mention merge in commit-msg hook

2017-09-21 Thread Stefan Beller
The commit-msg hook is invoked by both commit and merge now.

Reported-by: Kaartic Sivaraam 
Signed-off-by: Stefan Beller 
---
 Documentation/githooks.txt | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 623ed1a138..f4e75b9c90 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -127,11 +127,10 @@ help message found in the commented portion of the commit 
template.
 commit-msg
 ~~
 
-This hook is invoked by 'git commit', and can be bypassed
-with the `--no-verify` option.  It takes a single parameter, the
-name of the file that holds the proposed commit log message.
-Exiting with a non-zero status causes the 'git commit' to
-abort.
+This hook is invoked by 'git commit' and 'git merge', and can be
+bypassed with the `--no-verify` option.  It takes a single parameter,
+the name of the file that holds the proposed commit log message.
+Exiting with a non-zero status causes the command to abort.
 
 The hook is allowed to edit the message file in place, and can be used
 to normalize the message into some project standard format. It
-- 
2.14.0.rc0.3.g6c2e499285



[PATCH] connect: in ref advertisement, shallows are last

2017-09-21 Thread Jonathan Tan
Currently, get_remote_heads() parses the ref advertisement in one loop,
allowing refs and shallow lines to intersperse, despite this not being
allowed by the specification. Refactor get_remote_heads() to use two
loops instead, enforcing that refs come first, and then shallows.

This also makes it easier to teach get_remote_heads() to interpret other
lines in the ref advertisement, which will be done in a subsequent
patch.

Signed-off-by: Jonathan Tan 
---
It seems that some people are concerned about looseness in interpreting
the ref advertisement, so here is a patch to tighten it instead. This is
a replacement for Brandon's PATCH 1.5.

I think this is what Jonathan Nieder meant by his instruction flow idea.

I've checked that Brandon's other patches apply cleanly on this patch,
except for "connect: teach client to recognize v1 server response" which
has to be modified to the following:

@@ -149,6 +150,26 @@ struct ref **get_remote_heads(int in, char *src_buf, 
size_t src_len,
*list = NULL;
 
len = read_remote_ref(in, &src_buf, &src_len, &responded);
+
+   switch (determine_protocol_version_client(packet_buffer)) {
+   case protocol_v1:
+   /*
+* First pkt-line contained the version string.
+* Continue on to process the ref advertisement.
+*/
+   len = read_remote_ref(in, &src_buf, &src_len, &responded);
+   break;
+   case protocol_v0:
+   /*
+* Server is speaking protocol v0 and sent a
+* ref so we need to process it.
+*/
+   break;
+   default:
+   die("server is speaking an unknown protocol");
+   break;
+   }
+

 connect.c | 112 ++
 1 file changed, 61 insertions(+), 51 deletions(-)

diff --git a/connect.c b/connect.c
index 49b28b83b..9bf97adf6 100644
--- a/connect.c
+++ b/connect.c
@@ -107,6 +107,26 @@ static void annotate_refs_with_symref_info(struct ref *ref)
string_list_clear(&symref, 0);
 }
 
+/*
+ * Read one line of a server's ref advertisement into packet_buffer.
+ */
+int read_remote_ref(int in, char **src_buf, size_t *src_len, int *responded)
+{
+   int len = packet_read(in, src_buf, src_len,
+ packet_buffer, sizeof(packet_buffer),
+ PACKET_READ_GENTLE_ON_EOF |
+ PACKET_READ_CHOMP_NEWLINE);
+   const char *arg;
+   if (len < 0)
+   die_initial_contact(*responded);
+   if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg))
+   die("remote error: %s", arg);
+
+   *responded = 1;
+
+   return len;
+}
+
 /*
  * Read all the refs from the other end
  */
@@ -123,46 +143,23 @@ struct ref **get_remote_heads(int in, char *src_buf, 
size_t src_len,
 * willing to talk to us.  A hang-up before seeing any
 * response does not necessarily mean an ACL problem, though.
 */
-   int saw_response;
-   int got_dummy_ref_with_capabilities_declaration = 0;
+   int responded = 0;
+   int len;
 
*list = NULL;
-   for (saw_response = 0; ; saw_response = 1) {
-   struct ref *ref;
+
+   len = read_remote_ref(in, &src_buf, &src_len, &responded);
+   do {
struct object_id old_oid;
char *name;
-   int len, name_len;
-   char *buffer = packet_buffer;
-   const char *arg;
+   int name_len;
 
-   len = packet_read(in, &src_buf, &src_len,
- packet_buffer, sizeof(packet_buffer),
- PACKET_READ_GENTLE_ON_EOF |
- PACKET_READ_CHOMP_NEWLINE);
-   if (len < 0)
-   die_initial_contact(saw_response);
-
-   if (!len)
+   if (len < GIT_SHA1_HEXSZ + 2 ||
+   get_oid_hex(packet_buffer, &old_oid) ||
+   packet_buffer[GIT_SHA1_HEXSZ] != ' ')
break;
 
-   if (len > 4 && skip_prefix(buffer, "ERR ", &arg))
-   die("remote error: %s", arg);
-
-   if (len == GIT_SHA1_HEXSZ + strlen("shallow ") &&
-   skip_prefix(buffer, "shallow ", &arg)) {
-   if (get_oid_hex(arg, &old_oid))
-   die("protocol error: expected shallow sha-1, 
got '%s'", arg);
-   if (!shallow_points)
-   die("repository on the other end cannot be 
shallow");
-   oid_array_append(shallow_points, &old_oid);
-   continue;
-   }
-
-   if (len < GIT_SHA1_HEXSZ + 2 || get_oid_hex(buffer, &old_oid) ||
-   buffer[GIT_SHA1_HEXSZ] != ' ')
- 

[PATCH v1] travis-ci: fix "skip_branch_tip_with_tag()" string comparison

2017-09-21 Thread larsxschneider
From: Lars Schneider 

09f5e97 ("travis-ci: skip a branch build if equal tag is present",
2017-09-17) introduced the "skip_branch_tip_with_tag" function with
a broken string comparison. Fix it!

Reported-by: SZEDER Gábor 
Signed-off-by: Lars Schneider 
---

Hi,

previous discussion:
https://public-inbox.org/git/3b175d35-5b1c-43cd-a7e9-85693335b...@gmail.com/

Sorry that this trivial fix took so long.

Cheers,
Lars


Notes:
Base Commit: a81423d7cf (a81423d7cfdc57238783f053941064c99165)
Diff on Web: https://github.com/larsxschneider/git/commit/6b532a42f0
Checkout:git fetch https://github.com/larsxschneider/git 
travisci/fix-skip-branch-v1 && git checkout 6b532a42f0

 ci/lib-travisci.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ci/lib-travisci.sh b/ci/lib-travisci.sh
index 9c4ae9bdd0..c3b46f4a7d 100755
--- a/ci/lib-travisci.sh
+++ b/ci/lib-travisci.sh
@@ -14,7 +14,7 @@ skip_branch_tip_with_tag () {
# of a tag.

if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) &&
-   $TAG != $TRAVIS_BRANCH
+   [ "$TAG" != "$TRAVIS_BRANCH" ]
then
echo "Tip of $TRAVIS_BRANCH is exactly at $TAG"
exit 0
--
2.14.1



Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)

2017-09-21 Thread Sahil Dua
>
> * sd/branch-copy (2017-06-18) 3 commits
>  - branch: add a --copy (-c) option to go with --move (-m)
>  - branch: add test for -m renaming multiple config sections
>  - config: create a function to format section headers
>
>  "git branch" learned "-c/-C" to create and switch to a new branch
>  by copying an existing one.
>
>  I personally do not think "branch --copy master backup" while on
>  "master" that switches to "backup" is a good UI, and I *will* say
>  "I told you so" when users complain after we merge this down to
>  'master'.

Junio, what's up with this one? It's been a long time for this being
cooked in next. Do we have a plan/idea going forward or is it just
waiting for more feedback?


Re: [PATCH v1] travis-ci: fix "skip_branch_tip_with_tag()" string comparison

2017-09-21 Thread Jonathan Nieder
larsxschnei...@gmail.com wrote:

> 09f5e97 ("travis-ci: skip a branch build if equal tag is present",
> 2017-09-17) introduced the "skip_branch_tip_with_tag" function with
> a broken string comparison. Fix it!
>
> Reported-by: SZEDER Gábor 
> Signed-off-by: Lars Schneider 
> ---

Thanks for the fix.

09f5e97 appears to be for the ls/travis-scriptify branch, which is
already part of "next" (if it weren't, I'd suggest just squashing your
patch into that commit).

> --- a/ci/lib-travisci.sh
> +++ b/ci/lib-travisci.sh
> @@ -14,7 +14,7 @@ skip_branch_tip_with_tag () {
>   # of a tag.
> 
>   if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) &&
> - $TAG != $TRAVIS_BRANCH
> + [ "$TAG" != "$TRAVIS_BRANCH" ]

Git style is to use 'test' instead of '[' for this.  See
https://public-inbox.org/git/2f3cdc85-f051-c0ae-b9db-fd13cac78...@gmail.com/
for more on that subject.

Could you squash in the following?

Thanks,
Jonathan

diff --git i/ci/lib-travisci.sh w/ci/lib-travisci.sh
index c3b46f4a7d..b3ed0a0dda 100755
--- i/ci/lib-travisci.sh
+++ w/ci/lib-travisci.sh
@@ -14,7 +14,7 @@ skip_branch_tip_with_tag () {
# of a tag.
 
if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) &&
-   [ "$TAG" != "$TRAVIS_BRANCH" ]
+   test "$TAG" != "$TRAVIS_BRANCH"
then
echo "Tip of $TRAVIS_BRANCH is exactly at $TAG"
exit 0


Re: [PATCH 3/8] daemon: recognize hidden request arguments

2017-09-21 Thread Brandon Williams
On 09/20, Jonathan Tan wrote:
> On Wed, 20 Sep 2017 17:24:43 -0700
> Jonathan Tan  wrote:
> 
> > On Wed, 13 Sep 2017 14:54:43 -0700
> > Brandon Williams  wrote:
> > 
> > > A normal request to git-daemon is structured as
> > > "command path/to/repo\0host=..\0" and due to a bug in an old version of
> > > git-daemon 73bb33a94 (daemon: Strictly parse the "extra arg" part of the
> > > command, 2009-06-04) we aren't able to place any extra args (separated
> > > by NULs) besides the host.
> > > 
> > > In order to get around this limitation teach git-daemon to recognize
> > > additional request arguments hidden behind a second NUL byte.  Requests
> > > can then be structured like:
> > > "command path/to/repo\0host=..\0\0version=1\0key=value\0".  git-daemon
> > > can then parse out the extra arguments and set 'GIT_PROTOCOL'
> > > accordingly.
> > 
> > A test in this patch (if possible) would be nice, but it is probably
> > clearer to test this when one of the commands (e.g. upload-pack) is
> > done. Could a test be added to the next patch to verify (using
> > GIT_TRACE_PACKET, maybe) that the expected strings are sent? Then
> > mention in this commit message that this will be tested in the next
> > patch too.
> 
> Ah, I see that it is tested in 6/8. You can ignore this comment.

Yeah I felt it would have been difficult to test any earlier without
both the client and server sides done.

-- 
Brandon Williams


Re: [PATCH v1] travis-ci: fix "skip_branch_tip_with_tag()" string comparison

2017-09-21 Thread Lars Schneider

> On 21 Sep 2017, at 23:28, Jonathan Nieder  wrote:
> 
> larsxschnei...@gmail.com wrote:
> 
>> 09f5e97 ("travis-ci: skip a branch build if equal tag is present",
>> 2017-09-17) introduced the "skip_branch_tip_with_tag" function with
>> a broken string comparison. Fix it!
>> 
>> Reported-by: SZEDER Gábor 
>> Signed-off-by: Lars Schneider 
>> ---
> 
> Thanks for the fix.
> 
> 09f5e97 appears to be for the ls/travis-scriptify branch, which is
> already part of "next" (if it weren't, I'd suggest just squashing your
> patch into that commit).
> 
>> --- a/ci/lib-travisci.sh
>> +++ b/ci/lib-travisci.sh
>> @@ -14,7 +14,7 @@ skip_branch_tip_with_tag () {
>>  # of a tag.
>> 
>>  if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) &&
>> -$TAG != $TRAVIS_BRANCH
>> +[ "$TAG" != "$TRAVIS_BRANCH" ]
> 
> Git style is to use 'test' instead of '[' for this.  See
> https://public-inbox.org/git/2f3cdc85-f051-c0ae-b9db-fd13cac78...@gmail.com/
> for more on that subject.

Oh, you're right!


> Could you squash in the following?

@Junio: Can you squash it when you apply the patch?

Thank you,
Lars


> 
> Thanks,
> Jonathan
> 
> diff --git i/ci/lib-travisci.sh w/ci/lib-travisci.sh
> index c3b46f4a7d..b3ed0a0dda 100755
> --- i/ci/lib-travisci.sh
> +++ w/ci/lib-travisci.sh
> @@ -14,7 +14,7 @@ skip_branch_tip_with_tag () {
>   # of a tag.
> 
>   if TAG=$(git describe --exact-match "$TRAVIS_BRANCH" 2>/dev/null) &&
> - [ "$TAG" != "$TRAVIS_BRANCH" ]
> + test "$TAG" != "$TRAVIS_BRANCH"
>   then
>   echo "Tip of $TRAVIS_BRANCH is exactly at $TAG"
>   exit 0



Re: RFC: Design and code of partial clones (now, missing commits and trees OK)

2017-09-21 Thread Jonathan Tan
On Thu, 21 Sep 2017 13:57:30 -0400
Jeff Hostetler  wrote:

> There's a lot in this patch series.  I'm still studying it, but here
> are some notes and questions.  I'll start with direct responses to
> the RFC here and follow up in a second email with specific questions
> and comments to keep this from being too long).

OK - thanks for your detailed comments.

> I like that git-clone saves the partial clone settings in the
> .git/config.  This should make it easier for subsequent commands to
> default to the right settings.
> 
> Do we allow a partial-fetch following a regular clone (such that
> git-fetch would create these config file settings)?  This seems like
> a reasonable upgrade path for a user with an existing repo to take
> advantage of partial fetches going forward.

A "git-fetch ...options... --save-options" does not sound unreasonable,
although I would think that (i) partial fetches/clones are useful on
very large repositories, and (ii) the fact that you had already cloned
this large repository shows that you can handle the disk and network
load, so partial fetch after non-partial clone doesn't seem very useful.

But if there is a use case, I think it could work. Although note that
the GC in my patch set stops at promisor objects, so if an object
originally cloned cannot be reached through a walk (with non-promisor
intermediate objects only), it might end up GC-ed (which might be fine).

> Or do we require that git-fetch only be allowed to do partial-fetches
> after a partial-clone (so that only clone creates these settings) and
> fetch always does partial-fetches thereafter?  It might be useful to
> allow fetch to do a full fetch on top of a partial-clone, but there
> are probably thin-pack issues to work out first.

About the thin-pack issue, I wonder if it is sufficient to turn on
fetch_if_missing while index-pack is trying to fix the thin pack.

If the thin-pack issues are worked out, then non-partial fetch after
partial clone seems doable (all commits from our tip to their tip are
fetched, as well as all new trees and all new blobs; any trees and blobs
still missing are already promised).

Thanks for these questions - I am concentrating on repos in which both
clone and fetch are partial, but it is good to discuss what happens if
the user does something else.

> Also, there is an assumption here that the user will want to keep
> using the same filtering settings on subsequent fetches.  That's
> probably fine for now and until we get a chance to try it out for
> real.

Yes. Having said that, the fetching of missing objects does not take
into account the filter at all, so the filter can be easily changed.

> Do we allow EXACTLY ONE promisor-remote?  That is, can we later add
> another remote as a promisor-remote?  And then do partial-fetches from
> either?

Yes, exactly one (because we need to know which remote to fetch missing
objects from, and which remote to allow partial fetches from). But the
promisor remote can be switched to another.

> Do we need to disallow removing or altering a remote that is
> listed as a promisor-remote?

Perhaps, although I think that right now configs are checked when we run
the command using the config, not when we run "git config".

> I think for now, one promisor-remote is fine.  Just asking.
> 
> Changing a remote's URL might be fine, but deleting the
> promisor-remote would put the user in a weird state.  We don't need
> to worry about it now though.

Agreed.

> I struggled with the terms here a little when looking at the source.
> () A remote responding to a partial-clone is termed a
> "promisor-remote". () Packfiles received from a promisor-remote are
> marked with ".promisor" like ".keep" names.
> () An object actually contained in such packfiles is called a
> "promisor-object". () An object not-present but referenced by one of
> the above promisor-objects is called a "promised-object" (aka a
> "missing-object").
> 
> I think the similarity of the words "promisOR" and "promisED" threw
> me here and as I was looking at the code.  The code in is_promised()
> [1] looked like it was adding all promisor- and promised-objects to
> the "promised" OIDSET, but I could be mistaken.
> 
> [1]
> https://github.com/jonathantanmy/git/commit/7a9c2d9b6e2fce293817b595dee29a7eede0#diff-5d5d5dc185ef37dc30bb7d9a7ae0c4e8R1960

I was struggling a bit with the terminology, true.

Right now I'm thinking of:
 - promisor remote (as you defined)
 - promisor packfile (as you defined)
 - promisor object is an object known to belong to the promisor (whether
   because we have it in a promisor packfile or because it is referenced
   by an object in a promisor packfile)

This might eliminate "promise(d)", and thus eliminate the confusion
between "promised" and "promisor", but I haven't done an exhaustive
search.

> The contents of the ".promisor" file are described as arbitrary?
> Should we write the name of the remote (or some other structured data)
> into this file so that later fe

Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 2/3)

2017-09-21 Thread Jonathan Tan
On Thu, 21 Sep 2017 13:59:43 -0400
Jeff Hostetler  wrote:

> (part 2)
> 
> Additional overall comments on:
> https://github.com/jonathantanmy/git/commits/partialclone2
> 
> {} I think it would help to split the blob-max-bytes filtering and the
> promisor/promised concepts and discuss them independently.
> 
> {} Then we can talk about about the promisor/promised
> functionality independent of any kind of filter.  The net-net is that
> the client has missing objects and it doesn't matter what filter
> criteria or mechanism caused that to happened.
> 
> {} blob-max-bytes is but one such filter we should have.  This
> might be sufficient if the goal is replace LFS (where you rarely ever
>need any given very very large object) and dynamically loading
>them as needed is sufficient and the network round-trip isn't
>too much of a perf penalty.
> 
> {} But if we want to do things like a "sparse-enlistments" where
> the client only needs a small part of the tree using sparse-checkout.
>For example, only populating 50,000 files from a tree of 3.5M
> files at HEAD, then we need a more general filtering.
> 
> {} And as I said above, how we chose to filter should be
> independent of how the client handles promisor/promised objects.

I agree that they are independent. (I put everything together so that
people could see how they work together, but they can be changed
independently of each other.)

> {} Also, if we rely strictly on dynamic object fetching to fetch
> missing objects, we are effectively tethered to the server during
> operations (such as checkout) that the user might not think about as
> requiring a network connection.  And we are forced to keep the same
> limitations of LFS in that you can't prefetch and go offline (without
> actually checking out to your worktree first).  And we can't bulk or
> parallel fetch objects.

I don't think dynamic object fetching precludes any other more optimized
way of fetching or prefetching - I implemented dynamic object fetching
first so that we would have a fallback, but the others definitely can be
implemented too.

> {} I think it would also help to move the blob-max-bytes calculation
> out of pack-objects.c : add_object_entry() [1].  The current code
> isolates the computation there so that only pack-objects can do the
> filtering.
> 
> Instead, put it in list-objects.c and traverse_commit_list() so
> that pack-objects and rev-list can share it (as Peff suggested [2] in
> response to my first patch series in March).
> 
> For example, this would let the client have a pre-checkout hook,
> use rev-list to compute the set of missing objects needed for that
> commit, and pipe that to a command to BULK fetch them from the server
> BEFORE starting the actual checkout.  This would allow the savy user
> to manually run a prefetch before going offline.
> 
> [1]
> https://github.com/jonathantanmy/git/commit/68e529484169f4800115c5a32e0904c25ad14bd8#diff-a8d2c9cf879e775d748056cfed48440cR1110
> 
> [2]
> https://public-inbox.org/git/20170309073117.g3br5btsfwntc...@sigill.intra.peff.net/

In your specific example, how would rev-list know, on the client, to
include (or exclude) a large blob in its output if it does not have it,
and thus does not know its size?

My reason for including it in pack-objects.c is because I only needed it
there and it is much simpler, but I agree that if it can be used
elsewhere, we can put it in a more general place.

> {} This also locks us into size-only filtering and makes it more
> difficult to add other filters.  In that the add_object_entry()
> code gets called on an object after the traversal has decided
> what to do with it.  It would be difficult to add tree-trimming
> at this level, for example.

That is true.

> {} An early draft of this type of filtering is here [3].  I hope to
> push up a revised draft of this shortly.
> 
> [3]
> https://public-inbox.org/git/20170713173459.3559-1-...@jeffhostetler.com/

OK - I'll take a look when that is done (I think I commented on an
earlier version on that).


Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 3)

2017-09-21 Thread Jonathan Tan
On Thu, 21 Sep 2017 14:00:40 -0400
Jeff Hostetler  wrote:

> (part 3)
> 
> Additional overall comments on:
> https://github.com/jonathantanmy/git/commits/partialclone2
> 
> {} WRT the code in is_promised() [1]
> 
> [1]
> https://github.com/jonathantanmy/git/commit/7a9c2d9b6e2fce293817b595dee29a7eede0#diff-5d5d5dc185ef37dc30bb7d9a7ae0c4e8R1960
> 
> {} it looked like it was adding ALL promisor- and
> promised-objects to the "promised" OIDSET, rather than just
> promised-objects, but I could be mistaken.

As far as I can tell, it is just adding the promised objects (some of
which may also be promisor objects). If you're saying that you expected
it to add the promisor objects as well, that might be a reasonable
expectation...I'm thinking of doing that.

> {} Is this iterating over ALL promisor-packfiles?

Yes.

> {} It looked like this was being used by fsck and rev-list.  I
> have concerns about how big this OIDSET will get and how it will
> scale, since if we start with a partial-clone all packfiles will be
>promisor-packfiles.

It's true that scaling is an issue. I'm not sure if omitting the oidset
will solve anything, though - as it is, Git maintains an object hash and
adds to it quite liberally.

One thing that might help is some sort of flushing of objects in
promisor packfiles from the local repository - that way, the oidset
won't be so large.

> 
> {} When iterating thru a tree object, you add everything that it
>references (everything in that folder).  This adds all of the
>child OIDs -- without regard to whether they are new to this
>version of the tree object. (Granted, this is hard to compute.)

The oidset will deduplicate OIDs.

>My concern is that this will add too many objects to the
> OIDSET. That is, a new tree object (because of a recent change to
> something in that folder) will also have the OIDs of the other
> *unchanged* files which may be present in an earlier non-provisor
> packfile from an earlier commit.
> 
>I worry that this will grow the OIDSET to essentially include
>everything.  And possibly defeating its own purpose.  I could
> be wrong here, but that's my concern.

Same answer as above (about flushing of objects in promisor packfiles).

> {} I'm not opposed to the .promisor file concept, but I have concerns
> that in practice all packfiles after a partial clone will be
> promisor-packfiles and therefore short-cut during fsck, so fsck
> still won't gain anything.
> 
> It would help if there are also non-promisor packfiles present,
> but only for objects referenced by non-promisor packfiles.
> 
> But then I also have to wonder whether we can even support
> non-promisor packfiles after starting with a partial clone -- because
> of the properties of received thin-packs on a non-partial fetch.

Same reply as to your other e-mail - locally created objects are not in
promisor packfiles. (Or were you thinking of a situation where locally
created objects are immediately uploaded to the promisor remote, thus
making them promisor objects too?)


[PATCH v2] connect: in ref advertisement, shallows are last

2017-09-21 Thread Jonathan Tan
Currently, get_remote_heads() parses the ref advertisement in one loop,
allowing refs and shallow lines to intersperse, despite this not being
allowed by the specification. Refactor get_remote_heads() to use two
loops instead, enforcing that refs come first, and then shallows.

This also makes it easier to teach get_remote_heads() to interpret other
lines in the ref advertisement, which will be done in a subsequent
patch.

Signed-off-by: Jonathan Tan 
---
In some in-office discussion, I was informed that my original patch
relaxed the ordering of ".keep" lines. Here is an update.

I'm also using a switch statement now, which avoids having multiple
lines of read_remote_ref().

 connect.c | 159 --
 1 file changed, 93 insertions(+), 66 deletions(-)

diff --git a/connect.c b/connect.c
index 49b28b83b..e0821dbff 100644
--- a/connect.c
+++ b/connect.c
@@ -107,6 +107,84 @@ static void annotate_refs_with_symref_info(struct ref *ref)
string_list_clear(&symref, 0);
 }
 
+/*
+ * Read one line of a server's ref advertisement into packet_buffer.
+ */
+static int read_remote_ref(int in, char **src_buf, size_t *src_len,
+  int *responded)
+{
+   int len = packet_read(in, src_buf, src_len,
+ packet_buffer, sizeof(packet_buffer),
+ PACKET_READ_GENTLE_ON_EOF |
+ PACKET_READ_CHOMP_NEWLINE);
+   const char *arg;
+   if (len < 0)
+   die_initial_contact(*responded);
+   if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg))
+   die("remote error: %s", arg);
+
+   *responded = 1;
+
+   return len;
+}
+
+#define EXPECTING_REF 0
+#define EXPECTING_SHALLOW 1
+
+static int process_ref(int *state, int len, struct ref ***list,
+  unsigned int flags, struct oid_array *extra_have)
+{
+   struct object_id old_oid;
+   char *name;
+   int name_len;
+
+   if (len < GIT_SHA1_HEXSZ + 2 ||
+   get_oid_hex(packet_buffer, &old_oid) ||
+   packet_buffer[GIT_SHA1_HEXSZ] != ' ') {
+   (*state)++;
+   return 0;
+   }
+
+   name = packet_buffer + GIT_SHA1_HEXSZ + 1;
+   name_len = strlen(name);
+   if (len != name_len + GIT_SHA1_HEXSZ + 1) {
+   free(server_capabilities);
+   server_capabilities = xstrdup(name + name_len + 1);
+   }
+
+   if (extra_have && !strcmp(name, ".have")) {
+   oid_array_append(extra_have, &old_oid);
+   } else if (!strcmp(name, "capabilities^{}")) {
+   if (**list)
+   /* cannot coexist with other refs */
+   die("protocol error: unexpected capabilities^{}");
+   /* There should be no more refs; proceed to the next state. */
+   (*state)++;
+   } else if (check_ref(name, flags)) {
+   struct ref *ref = alloc_ref(name);
+   oidcpy(&ref->old_oid, &old_oid);
+   **list = ref;
+   *list = &ref->next;
+   }
+   return 1;
+}
+
+static int process_shallow(int *state, struct oid_array *shallow_points)
+{
+   const char *arg;
+   struct object_id old_oid;
+
+   if (!skip_prefix(packet_buffer, "shallow ", &arg))
+   return 0;
+
+   if (get_oid_hex(arg, &old_oid))
+   die("protocol error: expected shallow sha-1, got '%s'", arg);
+   if (!shallow_points)
+   die("repository on the other end cannot be shallow");
+   oid_array_append(shallow_points, &old_oid);
+   return 1;
+}
+
 /*
  * Read all the refs from the other end
  */
@@ -123,76 +201,25 @@ struct ref **get_remote_heads(int in, char *src_buf, 
size_t src_len,
 * willing to talk to us.  A hang-up before seeing any
 * response does not necessarily mean an ACL problem, though.
 */
-   int saw_response;
-   int got_dummy_ref_with_capabilities_declaration = 0;
+   int responded = 0;
+   int len;
+   int state = EXPECTING_REF;
 
*list = NULL;
-   for (saw_response = 0; ; saw_response = 1) {
-   struct ref *ref;
-   struct object_id old_oid;
-   char *name;
-   int len, name_len;
-   char *buffer = packet_buffer;
-   const char *arg;
-
-   len = packet_read(in, &src_buf, &src_len,
- packet_buffer, sizeof(packet_buffer),
- PACKET_READ_GENTLE_ON_EOF |
- PACKET_READ_CHOMP_NEWLINE);
-   if (len < 0)
-   die_initial_contact(saw_response);
-
-   if (!len)
-   break;
-
-   if (len > 4 && skip_prefix(buffer, "ERR ", &arg))
-   die("remote error: %s", arg);
-
-   if (len == GIT_SHA1_HEXSZ + strlen("sha

Re: [PATCH v2] connect: in ref advertisement, shallows are last

2017-09-21 Thread Brandon Williams
On 09/21, Jonathan Tan wrote:
> Currently, get_remote_heads() parses the ref advertisement in one loop,
> allowing refs and shallow lines to intersperse, despite this not being
> allowed by the specification. Refactor get_remote_heads() to use two
> loops instead, enforcing that refs come first, and then shallows.
> 
> This also makes it easier to teach get_remote_heads() to interpret other
> lines in the ref advertisement, which will be done in a subsequent
> patch.
> 
> Signed-off-by: Jonathan Tan 
> ---
> In some in-office discussion, I was informed that my original patch
> relaxed the ordering of ".keep" lines. Here is an update.
> 
> I'm also using a switch statement now, which avoids having multiple
> lines of read_remote_ref().


Looks cleaner than the last patch.

> 
>  connect.c | 159 
> --
>  1 file changed, 93 insertions(+), 66 deletions(-)
> 
> diff --git a/connect.c b/connect.c
> index 49b28b83b..e0821dbff 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -107,6 +107,84 @@ static void annotate_refs_with_symref_info(struct ref 
> *ref)
>   string_list_clear(&symref, 0);
>  }
>  
> +/*
> + * Read one line of a server's ref advertisement into packet_buffer.
> + */
> +static int read_remote_ref(int in, char **src_buf, size_t *src_len,
> +int *responded)
> +{
> + int len = packet_read(in, src_buf, src_len,
> +   packet_buffer, sizeof(packet_buffer),
> +   PACKET_READ_GENTLE_ON_EOF |
> +   PACKET_READ_CHOMP_NEWLINE);
> + const char *arg;
> + if (len < 0)
> + die_initial_contact(*responded);
> + if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg))
> + die("remote error: %s", arg);
> +
> + *responded = 1;
> +
> + return len;
> +}
> +
> +#define EXPECTING_REF 0
> +#define EXPECTING_SHALLOW 1
> +
> +static int process_ref(int *state, int len, struct ref ***list,
> +unsigned int flags, struct oid_array *extra_have)
> +{
> + struct object_id old_oid;
> + char *name;
> + int name_len;
> +
> + if (len < GIT_SHA1_HEXSZ + 2 ||
> + get_oid_hex(packet_buffer, &old_oid) ||
> + packet_buffer[GIT_SHA1_HEXSZ] != ' ') {
> + (*state)++;

I think it may be cleaner if the state variable is updated outside of
this function based on a return value from this function.

> + return 0;
> + }
> +
> + name = packet_buffer + GIT_SHA1_HEXSZ + 1;
> + name_len = strlen(name);
> + if (len != name_len + GIT_SHA1_HEXSZ + 1) {
> + free(server_capabilities);
> + server_capabilities = xstrdup(name + name_len + 1);
> + }
> +
> + if (extra_have && !strcmp(name, ".have")) {
> + oid_array_append(extra_have, &old_oid);
> + } else if (!strcmp(name, "capabilities^{}")) {
> + if (**list)
> + /* cannot coexist with other refs */
> + die("protocol error: unexpected capabilities^{}");
> + /* There should be no more refs; proceed to the next state. */
> + (*state)++;
> + } else if (check_ref(name, flags)) {
> + struct ref *ref = alloc_ref(name);
> + oidcpy(&ref->old_oid, &old_oid);
> + **list = ref;
> + *list = &ref->next;
> + }
> + return 1;
> +}
> +
> +static int process_shallow(int *state, struct oid_array *shallow_points)

state isn't needed here and could be dropped from the parameter list.

> +{
> + const char *arg;
> + struct object_id old_oid;
> +
> + if (!skip_prefix(packet_buffer, "shallow ", &arg))
> + return 0;
> +
> + if (get_oid_hex(arg, &old_oid))
> + die("protocol error: expected shallow sha-1, got '%s'", arg);
> + if (!shallow_points)
> + die("repository on the other end cannot be shallow");
> + oid_array_append(shallow_points, &old_oid);
> + return 1;
> +}
> +
>  /*
>   * Read all the refs from the other end
>   */
> @@ -123,76 +201,25 @@ struct ref **get_remote_heads(int in, char *src_buf, 
> size_t src_len,
>* willing to talk to us.  A hang-up before seeing any
>* response does not necessarily mean an ACL problem, though.
>*/
> - int saw_response;
> - int got_dummy_ref_with_capabilities_declaration = 0;
> + int responded = 0;
> + int len;
> + int state = EXPECTING_REF;
>  
>   *list = NULL;
> - for (saw_response = 0; ; saw_response = 1) {
> - struct ref *ref;
> - struct object_id old_oid;
> - char *name;
> - int len, name_len;
> - char *buffer = packet_buffer;
> - const char *arg;
> -
> - len = packet_read(in, &src_buf, &src_len,
> -   packet_buffer, sizeof(packet_buffer),
> -   PACKET_READ_GENTLE_ON_

[PATCH v3] connect: in ref advertisement, shallows are last

2017-09-21 Thread Jonathan Tan
Currently, get_remote_heads() parses the ref advertisement in one loop,
allowing refs and shallow lines to intersperse, despite this not being
allowed by the specification. Refactor get_remote_heads() to use two
loops instead, enforcing that refs come first, and then shallows.

This also makes it easier to teach get_remote_heads() to interpret other
lines in the ref advertisement, which will be done in a subsequent
patch.

Signed-off-by: Jonathan Tan 
---
I sent the wrong version of this patch :-(

This should be the correct one. A bit less clean because I introduced a
3rd state, however.

 connect.c | 167 +-
 1 file changed, 101 insertions(+), 66 deletions(-)

diff --git a/connect.c b/connect.c
index 49b28b83b..ef6358cfc 100644
--- a/connect.c
+++ b/connect.c
@@ -107,6 +107,91 @@ static void annotate_refs_with_symref_info(struct ref *ref)
string_list_clear(&symref, 0);
 }
 
+/*
+ * Read one line of a server's ref advertisement into packet_buffer.
+ */
+static int read_remote_ref(int in, char **src_buf, size_t *src_len,
+  int *responded)
+{
+   int len = packet_read(in, src_buf, src_len,
+ packet_buffer, sizeof(packet_buffer),
+ PACKET_READ_GENTLE_ON_EOF |
+ PACKET_READ_CHOMP_NEWLINE);
+   const char *arg;
+   if (len < 0)
+   die_initial_contact(*responded);
+   if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg))
+   die("remote error: %s", arg);
+
+   *responded = 1;
+
+   return len;
+}
+
+#define EXPECTING_REF_WITH_CAPABILITIES 0
+#define EXPECTING_REF 1
+#define EXPECTING_SHALLOW 2
+
+static int process_ref(int *state, int len, struct ref ***list,
+  unsigned int flags, struct oid_array *extra_have)
+{
+   struct object_id old_oid;
+   char *name;
+   int name_len;
+
+   if (len < GIT_SHA1_HEXSZ + 2 ||
+   get_oid_hex(packet_buffer, &old_oid) ||
+   packet_buffer[GIT_SHA1_HEXSZ] != ' ') {
+   *state = EXPECTING_SHALLOW;
+   return 0;
+   }
+
+   name = packet_buffer + GIT_SHA1_HEXSZ + 1;
+   name_len = strlen(name);
+   if (*state == EXPECTING_REF_WITH_CAPABILITIES &&
+   len != name_len + GIT_SHA1_HEXSZ + 1) {
+   free(server_capabilities);
+   server_capabilities = xstrdup(name + name_len + 1);
+   } else if (*state == EXPECTING_REF) {
+   if (len != name_len + GIT_SHA1_HEXSZ + 1)
+   die("unexpected capabilities after ref name");
+   }
+
+   if (extra_have && !strcmp(name, ".have")) {
+   oid_array_append(extra_have, &old_oid);
+   } else if (!strcmp(name, "capabilities^{}")) {
+   if (**list)
+   /* cannot coexist with other refs */
+   die("protocol error: unexpected capabilities^{}");
+   /* There should be no more refs; proceed to the next state. */
+   *state = EXPECTING_SHALLOW;
+   return 1;
+   } else if (check_ref(name, flags)) {
+   struct ref *ref = alloc_ref(name);
+   oidcpy(&ref->old_oid, &old_oid);
+   **list = ref;
+   *list = &ref->next;
+   }
+   *state = EXPECTING_REF;
+   return 1;
+}
+
+static int process_shallow(struct oid_array *shallow_points)
+{
+   const char *arg;
+   struct object_id old_oid;
+
+   if (!skip_prefix(packet_buffer, "shallow ", &arg))
+   return 0;
+
+   if (get_oid_hex(arg, &old_oid))
+   die("protocol error: expected shallow sha-1, got '%s'", arg);
+   if (!shallow_points)
+   die("repository on the other end cannot be shallow");
+   oid_array_append(shallow_points, &old_oid);
+   return 1;
+}
+
 /*
  * Read all the refs from the other end
  */
@@ -123,76 +208,26 @@ struct ref **get_remote_heads(int in, char *src_buf, 
size_t src_len,
 * willing to talk to us.  A hang-up before seeing any
 * response does not necessarily mean an ACL problem, though.
 */
-   int saw_response;
-   int got_dummy_ref_with_capabilities_declaration = 0;
+   int responded = 0;
+   int len;
+   int state = EXPECTING_REF_WITH_CAPABILITIES;
 
*list = NULL;
-   for (saw_response = 0; ; saw_response = 1) {
-   struct ref *ref;
-   struct object_id old_oid;
-   char *name;
-   int len, name_len;
-   char *buffer = packet_buffer;
-   const char *arg;
-
-   len = packet_read(in, &src_buf, &src_len,
- packet_buffer, sizeof(packet_buffer),
- PACKET_READ_GENTLE_ON_EOF |
- PACKET_READ_CHOMP_NEWLINE);
-   if (len < 0)
-   

Re: [PATCH 1/4] cat-file: handle NULL object_context.path

2017-09-21 Thread Jonathan Nieder
Jeff King wrote:

> Commit dc944b65f1 (get_sha1_with_context: dynamically
> allocate oc->path, 2017-05-19) changed the rules that
> callers must follow for seeing if we parsed a path in the
> object name. The rules switched from "check if the oc.path
> buffer is empty" to "check if the oc.path pointer is NULL".
> But that commit forgot to update some sites in
> cat_one_file(), meaning we might dereference a NULL pointer.
>
> You can see this by making a path-aware request like
> --textconv without specifying --path, and giving an object
> name that doesn't have a path in it. Like:
>
>   git cat-file --textconv HEAD
>
> which will reliably segfault.
>
> Signed-off-by: Jeff King 
> ---
>  builtin/cat-file.c  | 4 ++--
>  t/t8010-cat-file-filters.sh | 5 +
>  2 files changed, 7 insertions(+), 2 deletions(-)

Yikes.  Commit dc944b65f1 even touched this function, so we reviewers
have no excuse for not having found it.

Technically this changes the behavior of cat-file --path='', but I
don't think that matters.

Do other GET_SHA1_RECORD_PATH callers need similar treatment?

* builtin/grep.c appears to do the right thing (it stores NULL in
  list, so it passes NULL to grep_object, which calls grep_oid, which
  calls grep_source_init, which stores NULL for the grep machinery
  that is able to cope with a NULL).

* builtin/log.c is correctly updated as part of the patch.

Those are the only other callers.  So we're safe. *phew*

Reviewed-by: Jonathan Nieder 


Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)

2017-09-21 Thread Junio C Hamano
Sahil Dua  writes:

>>
>> * sd/branch-copy (2017-06-18) 3 commits
>>  - branch: add a --copy (-c) option to go with --move (-m)
>>  - branch: add test for -m renaming multiple config sections
>>  - config: create a function to format section headers
>>
>>  "git branch" learned "-c/-C" to create and switch to a new branch
>>  by copying an existing one.
>>
>>  I personally do not think "branch --copy master backup" while on
>>  "master" that switches to "backup" is a good UI, and I *will* say
>>  "I told you so" when users complain after we merge this down to
>>  'master'.
>
> Junio, what's up with this one? It's been a long time for this being
> cooked in next. Do we have a plan/idea going forward or is it just
> waiting for more feedback?

Thanks for pinging.

I was (in a strange way) hoping that it was just me who felt that a
"copy" operation of the current branch that moves me to a new branch
is a design mistake, and I was planning to start merging this down
to 'next' based on that assumption, but IIRC it turned out that it
wasn't just me.  During the renewed discussion I somehow got an
impression that the concensus was for "copy" to just copy without
ever changing what HEAD says, and if an operation that switches to a
new branch based on the current branch is needed, the right place to
do so is to enhance/extend "checkout -b".

My understanding of the next step was for those who are interested
in moving this topic forward to update these patches in that
direction.




Re: [PATCH v7 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-21 Thread Junio C Hamano
Ben Peart  writes:

> Since the fsmonitor data can be trusted and is kept in sync with the
> working directory, the only remaining valid uses are those locations
> where we don't want to trigger an unneeded refresh_fsmonitor() call.

Now that is a lot more assuring ;-)  And the exceptions below also
make sense to me.

Thanks for thinking this through.


Re: [PATCH v1 1/1] test-lint: echo -e (or -E) is not portable

2017-09-21 Thread Junio C Hamano
Torsten Bögershausen  writes:

> Junio, if you wouldn't mind to squash that in, 
> another fix is needed as well(trailing '-' after '-E') :
>
> s/'-n', '-e' or '-E-'/'-n', '-e' or '-E'

Yup.  Thanks all.



Re: [PATCH v3] connect: in ref advertisement, shallows are last

2017-09-21 Thread Junio C Hamano
Jonathan Tan  writes:

> Currently, get_remote_heads() parses the ref advertisement in one loop,
> allowing refs and shallow lines to intersperse, despite this not being
> allowed by the specification. Refactor get_remote_heads() to use two
> loops instead, enforcing that refs come first, and then shallows.
>
> This also makes it easier to teach get_remote_heads() to interpret other
> lines in the ref advertisement, which will be done in a subsequent
> patch.

Sounds sensible.  This still replaces the earlier 1.5?

>
> Signed-off-by: Jonathan Tan 
> ---
> I sent the wrong version of this patch :-(
>
> This should be the correct one. A bit less clean because I introduced a
> 3rd state, however.
>
>  connect.c | 167 
> +-
>  1 file changed, 101 insertions(+), 66 deletions(-)
>
> diff --git a/connect.c b/connect.c
> index 49b28b83b..ef6358cfc 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -107,6 +107,91 @@ static void annotate_refs_with_symref_info(struct ref 
> *ref)
>   string_list_clear(&symref, 0);
>  }
>  
> +/*
> + * Read one line of a server's ref advertisement into packet_buffer.
> + */
> +static int read_remote_ref(int in, char **src_buf, size_t *src_len,
> +int *responded)
> +{
> + int len = packet_read(in, src_buf, src_len,
> +   packet_buffer, sizeof(packet_buffer),
> +   PACKET_READ_GENTLE_ON_EOF |
> +   PACKET_READ_CHOMP_NEWLINE);
> + const char *arg;
> + if (len < 0)
> + die_initial_contact(*responded);
> + if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg))
> + die("remote error: %s", arg);
> +
> + *responded = 1;
> +
> + return len;
> +}
> +
> +#define EXPECTING_REF_WITH_CAPABILITIES 0
> +#define EXPECTING_REF 1
> +#define EXPECTING_SHALLOW 2
> +
> +static int process_ref(int *state, int len, struct ref ***list,
> +unsigned int flags, struct oid_array *extra_have)
> +{
> + struct object_id old_oid;
> + char *name;
> + int name_len;
> +
> + if (len < GIT_SHA1_HEXSZ + 2 ||
> + get_oid_hex(packet_buffer, &old_oid) ||
> + packet_buffer[GIT_SHA1_HEXSZ] != ' ') {
> + *state = EXPECTING_SHALLOW;
> + return 0;
> + }
> +
> + name = packet_buffer + GIT_SHA1_HEXSZ + 1;
> + name_len = strlen(name);
> + if (*state == EXPECTING_REF_WITH_CAPABILITIES &&
> + len != name_len + GIT_SHA1_HEXSZ + 1) {
> + free(server_capabilities);
> + server_capabilities = xstrdup(name + name_len + 1);
> + } else if (*state == EXPECTING_REF) {
> + if (len != name_len + GIT_SHA1_HEXSZ + 1)
> + die("unexpected capabilities after ref name");
> + }
> +
> + if (extra_have && !strcmp(name, ".have")) {
> + oid_array_append(extra_have, &old_oid);
> + } else if (!strcmp(name, "capabilities^{}")) {
> + if (**list)
> + /* cannot coexist with other refs */
> + die("protocol error: unexpected capabilities^{}");
> + /* There should be no more refs; proceed to the next state. */
> + *state = EXPECTING_SHALLOW;
> + return 1;
> + } else if (check_ref(name, flags)) {
> + struct ref *ref = alloc_ref(name);
> + oidcpy(&ref->old_oid, &old_oid);
> + **list = ref;
> + *list = &ref->next;
> + }
> + *state = EXPECTING_REF;
> + return 1;
> +}
> +
> +static int process_shallow(struct oid_array *shallow_points)
> +{
> + const char *arg;
> + struct object_id old_oid;
> +
> + if (!skip_prefix(packet_buffer, "shallow ", &arg))
> + return 0;
> +
> + if (get_oid_hex(arg, &old_oid))
> + die("protocol error: expected shallow sha-1, got '%s'", arg);
> + if (!shallow_points)
> + die("repository on the other end cannot be shallow");
> + oid_array_append(shallow_points, &old_oid);
> + return 1;
> +}
> +
>  /*
>   * Read all the refs from the other end
>   */
> @@ -123,76 +208,26 @@ struct ref **get_remote_heads(int in, char *src_buf, 
> size_t src_len,
>* willing to talk to us.  A hang-up before seeing any
>* response does not necessarily mean an ACL problem, though.
>*/
> - int saw_response;
> - int got_dummy_ref_with_capabilities_declaration = 0;
> + int responded = 0;
> + int len;
> + int state = EXPECTING_REF_WITH_CAPABILITIES;
>  
>   *list = NULL;
> - for (saw_response = 0; ; saw_response = 1) {
> - struct ref *ref;
> - struct object_id old_oid;
> - char *name;
> - int len, name_len;
> - char *buffer = packet_buffer;
> - const char *arg;
> -
> - len = packet_read(in, &src_buf, &src_len,
> -   pack

Re: [PATCH v3] connect: in ref advertisement, shallows are last

2017-09-21 Thread Junio C Hamano
Junio C Hamano  writes:

> Jonathan Tan  writes:
>
>> Currently, get_remote_heads() parses the ref advertisement in one loop,
>> allowing refs and shallow lines to intersperse, despite this not being
>> allowed by the specification. Refactor get_remote_heads() to use two
>> loops instead, enforcing that refs come first, and then shallows.
>>
>> This also makes it easier to teach get_remote_heads() to interpret other
>> lines in the ref advertisement, which will be done in a subsequent
>> patch.
>
> Sounds sensible.  This still replaces the earlier 1.5?

Well, it does, but it also invalidates how the new "pick the version
offered and used" feature is integrated to this callchain.  I guess
we'd need a new "we are now expecting the version info" state in a
patch to replace "connect: teach client to recognize v1 server
response".

>> +static int process_ref(int *state, int len, struct ref ***list,
>> +   unsigned int flags, struct oid_array *extra_have)
>> +{
>> +struct object_id old_oid;
>> +char *name;
>> +int name_len;
>> +
>> +if (len < GIT_SHA1_HEXSZ + 2 ||
>> +get_oid_hex(packet_buffer, &old_oid) ||
>> +packet_buffer[GIT_SHA1_HEXSZ] != ' ') {
>> +*state = EXPECTING_SHALLOW;
>> +return 0;
>> +}
>> +
>> +name = packet_buffer + GIT_SHA1_HEXSZ + 1;
>> +name_len = strlen(name);
>> +if (*state == EXPECTING_REF_WITH_CAPABILITIES &&
>> +len != name_len + GIT_SHA1_HEXSZ + 1) {
>> +free(server_capabilities);

Is this free() still needed?  After hitting this block, you'd set
*state to EXPECTING_REF before you return, so nobody would set
server_capabilities by hitting this block twice, and an attempt to
do so will hit the die("unexpected cap") below, no?

Or it may be a signal that this patch tightens it too much and
breaks older or third-party implementations of the other side that
can emit more than one refs with capability advertisement?

>> +server_capabilities = xstrdup(name + name_len + 1);
>> +} else if (*state == EXPECTING_REF) {
>> +if (len != name_len + GIT_SHA1_HEXSZ + 1)
>> +die("unexpected capabilities after ref name");
>> +}
>> +...
>> +}
>> +*state = EXPECTING_REF;
>> +return 1;
>> +}

>> @@ -123,76 +208,26 @@ struct ref **get_remote_heads(int in, char *src_buf, 
>> size_t src_len,
>>   * willing to talk to us.  A hang-up before seeing any
>>   * response does not necessarily mean an ACL problem, though.
>>   */
>> -int saw_response;
>> -int got_dummy_ref_with_capabilities_declaration = 0;
>> +int responded = 0;
>> +int len;
>> +int state = EXPECTING_REF_WITH_CAPABILITIES;
>>  
>>  *list = NULL;

>> +while ((len = read_remote_ref(in, &src_buf, &src_len, &responded))) {
>> +switch (state) {
>> +case EXPECTING_REF_WITH_CAPABILITIES:
>> +case EXPECTING_REF:
>> +if (process_ref(&state, len, &list, flags, extra_have))
>> +break;
>> +/* fallthrough */

OK.  This fallthrough is because expecting-ref is really expecting
ref or shallow and once we see a shallow, we no longer expect ref
and expect only shallow.  So from that point of view, an assignment
to set state to EXPECTING_SHALLOW could happen here, not inside
process_ref.  I mention this because in general, passing state
around and let it be updated in helper functions would make the
state transition harder to follow, not easier, even though
refactoring the processing needed in different stages into helper
functions like this patch does ought to make it easier to see by
shrinking the outer loop (i.e. this one) that controls the whole
process.

I think if we split process_ref() further into two, then we no
longer need to pass &state to that function?  We start this loop
with "expecting the dummy ref (or other)" state, have a new
process_dummy_ref() function check if we got "capabilities^{}" thing
and do its thing if that is the case (otherwise we fall through to
the call to process_ref(), just like the above code falls through to
call process_shallow() when it realizes what it got is not a ref),
and after the first call to process_dummy_ref() we'd be in the
"expecting ref (or other)" state---and the state transition can
happen in this caller, not in process_dummy_ref() or process_ref().

Inside process_dummy_ref() and process_ref(), there would be a call
to the same helper that notices and extracts the server capability
and stores it (or barfs against the second line that advertises the
capability, by noticing that server_capabilities is not NULL).

Wouldn't that make the presentation of the state machine cleaner?




Re: [PATCH 2/3] merge-base: return fork-point outside reflog

2017-09-21 Thread Junio C Hamano
Michael J Gruber  writes:

> Also, I'm undecided about about your reflog argument above - if we leave
> "--fork-point" to be the current behaviour including Jeff's fix then the
> documentation would need an even bigger overhaul, because it's neither
> "reflog also" (as claimed in the doc) nor "reflog only" (as in the
> original implementation) but "historical tips as inferred from the
> current value and the reflog".

Even though things like "reflog only", "reflog also", may be
something implementors may care about, they are irrelevant
implementation details to the intended audience.  "The bases that
are not in reflogs are ignored" _does_ matter, as it affects the
outcome, and that may be a bit too strict a filter (which this
series takes us in a good direction to fix) but what the readers
need to know is the real-world implications of the choices made at
the implementation detail level, and more importantly, what the
implementation is trying to compute.

It is a documentation bug (with or without these patches) if the
current text gives an impression that the code is trying to do
anything but "guessing the fork point using historical tips".

> In any case, for two modes we need two names for the options. Maybe
> --fork-point and --fork-base because in the loose mode, you may get a
> "base of a strict fork point"?

Perhaps.


Re: [PATCH] Documentation/githooks: mention merge in commit-msg hook

2017-09-21 Thread Junio C Hamano
Stefan Beller  writes:

> The commit-msg hook is invoked by both commit and merge now.
>
> Reported-by: Kaartic Sivaraam 
> Signed-off-by: Stefan Beller 
> ---

Thanks for tying the loose end.  Very much appreciated.

>  Documentation/githooks.txt | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
> index 623ed1a138..f4e75b9c90 100644
> --- a/Documentation/githooks.txt
> +++ b/Documentation/githooks.txt
> @@ -127,11 +127,10 @@ help message found in the commented portion of the 
> commit template.
>  commit-msg
>  ~~
>  
> -This hook is invoked by 'git commit', and can be bypassed
> -with the `--no-verify` option.  It takes a single parameter, the
> -name of the file that holds the proposed commit log message.
> -Exiting with a non-zero status causes the 'git commit' to
> -abort.
> +This hook is invoked by 'git commit' and 'git merge', and can be
> +bypassed with the `--no-verify` option.  It takes a single parameter,
> +the name of the file that holds the proposed commit log message.
> +Exiting with a non-zero status causes the command to abort.
>  
>  The hook is allowed to edit the message file in place, and can be used
>  to normalize the message into some project standard format. It


Re: [PATCH v1] travis-ci: fix "skip_branch_tip_with_tag()" string comparison

2017-09-21 Thread Junio C Hamano
Lars Schneider  writes:

> Oh, you're right!
>
>
>> Could you squash in the following?
>
> @Junio: Can you squash it when you apply the patch?

I do not mind and I already did.

The patches in the series this patch is fixing up were solely about
splitting these scripts out of the YAML file as-is, and I think it
was correct to carry these style differences over without adjusting
them.

But resulting scripts in ci/ are riddled with styleguide deviations,
which may want to be cleaned up later.  There also are some
bash-isms marked with "#!env bash" in them, but I think they are OK
because we know we are running them only at a very specific place
and the need to make them portable is very small.

Thanks.


Re: What's cooking in git.git (Sep 2017, #03; Fri, 15)

2017-09-21 Thread Junio C Hamano
Junio C Hamano  writes:

> My understanding of the next step was for those who are interested
> in moving this topic forward to update these patches in that
> direction.

Well, I am one of those who are interested in moving this topic
forward, not because I'm likely to use it, but because the fewer
number of topics I have to keep in flight, the easier my work gets.

So here is such an update.  As the topic is not in 'next' yet, it
could also be implemented by replacing patch(es) in the series, but
doing it as a follow-up fix made it easier to see what got changed
(both in the code and in the tests), so that is how I decided to do
this patch.

-- >8 --
Subject: [PATCH] branch: fix "copy" to never touch HEAD

Probably because "git branch -c A B" piggybacked its implementation
on "git branch -m A B", when creating a new branch B by copying the
branch A that happens to be the current branch, it also updated HEAD
to point at the new branch.

This does not match the usual expectation.  If I were sitting on a
blue chair, and somebody comes and repaints it to red, I would
accept ending up sitting on a red chair, but if somebody creates a
new red chair, modelling it after the blue chair I am sitting on, I
do not expect to be booted off of the blue chair and ending up on
sitting on the red one.

Let's fix this strange behaviour before it hits 'next'.  Those who
want to create a new branch and switch to it can do "git checkout B"
after creating it by copying the current branch, or if that is so
useful to deserve a short-hand way to do so, perhaps extend "git
checkout -b B" to copy configurations while creating the new branch
B.  A "copy" should remain to be "copy", not "copy and sometimes
checkout".

Signed-off-by: Junio C Hamano 
---
 builtin/branch.c  |  9 +++--
 t/t3200-branch.sh | 10 +-
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 89f64f4123..e2e3692838 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -506,12 +506,9 @@ static void copy_or_rename_branch(const char *oldname, 
const char *newname, int
oldref.buf + 11);
}
 
-   if (replace_each_worktree_head_symref(oldref.buf, newref.buf, 
logmsg.buf)) {
-   if (copy)
-   die(_("Branch copied to %s, but HEAD is not updated!"), 
newname);
-   else
-   die(_("Branch renamed to %s, but HEAD is not 
updated!"), newname);
-   }
+   if (!copy &&
+   replace_each_worktree_head_symref(oldref.buf, newref.buf, 
logmsg.buf))
+   die(_("Branch renamed to %s, but HEAD is not updated!"), 
newname);
 
strbuf_release(&logmsg);
 
diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 5d03ad16f6..be9b3784c6 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -422,7 +422,7 @@ test_expect_success 'git branch --copy is a synonym for -c' 
'
test_cmp expect actual
 '
 
-test_expect_success 'git branch -c ee ef should copy and checkout branch ef' '
+test_expect_success 'git branch -c ee ef should copy to create branch ef' '
git checkout -b ee &&
git reflog exists refs/heads/ee &&
git config branch.ee.dummy Hello &&
@@ -431,7 +431,7 @@ test_expect_success 'git branch -c ee ef should copy and 
checkout branch ef' '
git reflog exists refs/heads/ef &&
test $(git config branch.ee.dummy) = Hello &&
test $(git config branch.ef.dummy) = Hello &&
-   test $(git rev-parse --abbrev-ref HEAD) = ef
+   test $(git rev-parse --abbrev-ref HEAD) = ee
 '
 
 test_expect_success 'git branch -c f/f g/g should work' '
@@ -494,12 +494,12 @@ test_expect_success 'git branch -C c1 c2 should succeed 
when c1 is checked out'
git checkout -b c1 &&
git branch c2 &&
git branch -C c1 c2 &&
-   test $(git rev-parse --abbrev-ref HEAD) = c2
+   test $(git rev-parse --abbrev-ref HEAD) = c1
 '
 
-test_expect_success 'git branch -C c1 c2 should add entries to .git/logs/HEAD' 
'
+test_expect_success 'git branch -C c1 c2 should never touch HEAD' '
msg="Branch: copied refs/heads/c1 to refs/heads/c2" &&
-   grep "$msg$" .git/logs/HEAD
+   ! grep "$msg$" .git/logs/HEAD
 '
 
 test_expect_success 'git branch -C master should work when master is checked 
out' '
-- 
2.14.1-907-g5aa63875cf





Re: [PATCH 2/4] test-line-buffer: simplify command parsing

2017-09-21 Thread Junio C Hamano
Jeff King  writes:

> However, each case arm of the switch falls through to the
> one below it. This is pointless (we know that a command
> starting with 'b' does not need to check any of the commands
> in the 'c' block), and it makes gcc's -Wimplicit-fallthrough
> complain.

Wow, this is an embarassment. The code tries to cleverly optimize
it, and then fails miserably.

The updated one reads good.  Thanks.


KDevelop developers obviously don't use git :(

2017-09-21 Thread Daniel Santos
https://bugs.kde.org/show_bug.cgi?id=379219

$ git push minor-rant-warning

KDevelop has a despicable habit of performing `git status --porcelain`
whenever it damn well pleases ... or rather when it sees an alteration. 
This this breaks git rebase because it grabs the $*()&@#$ lock file,
often leading to catastrophic heart failure and sometimes really forcing
you to git rebase --abort and start over (at least, that's been my
solution when git rebase --continue complained that there was nothing to
commit and maybe I had forgotten to add something).

I have argued that they should at least verify that there isn't a
.git/rebase-merge directory present before doing that, but if I want to
prevent a heart attack, it looks like I'm going to have to fix this one
myself.  (Sorry for the rant!)

$ git reset minor-rant-warning~1

So first I need to inquire if there is a current mechanism to say "git
status, but please fail if you're busy with a non-atomic operation".  If
there is not one, then it would seem that there should be and how do I
go about requesting it?

Thanks,
Daniel




Re: [PATCH 4/4] ALLOC_GROW: avoid -Wsign-compare warnings

2017-09-21 Thread Junio C Hamano
Ramsay Jones  writes:

> Signed-off-by: Ramsay Jones 
> ---
>  builtin/pack-objects.c |  4 ++--
>  config.c   |  2 +-
>  diff.c |  2 +-
>  line-log.c | 18 +-
>  line-log.h |  2 +-
>  revision.c |  2 +-
>  tree-walk.c|  3 +--
>  7 files changed, 16 insertions(+), 17 deletions(-)

Thanks.  

I did not spot any questionable conversion (e.g. "something that
used to be signed was because it wanted to store -1 as a sentinel"
would be broken if we just change that to unsigned) by going over
the places these variables and fields are actually used.

A review of a patch like this involves reading through 10x more
lines than we see in the above diffstat, and producing it would most
likely have taken the same amount of effort, at least.  Very much
appreciated.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index a57b4f058..a6ee653bf 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2563,8 +2563,8 @@ struct in_pack_object {
>  };
>  
>  struct in_pack {
> - int alloc;
> - int nr;
> + unsigned int alloc;
> + unsigned int  nr;

This is a bit questionable ;-) but it is something I can locally
tweak easily.



Re: KDevelop developers obviously don't use git :(

2017-09-21 Thread Jeff King
On Thu, Sep 21, 2017 at 11:03:53PM -0500, Daniel Santos wrote:

> So first I need to inquire if there is a current mechanism to say "git
> status, but please fail if you're busy with a non-atomic operation".  If
> there is not one, then it would seem that there should be and how do I
> go about requesting it?

You might be interested in this patch from yesterday:

  
https://public-inbox.org/git/20170921043214.pyhdsrpy4omy5...@sigill.intra.peff.net/

-Peff


Re: [RFC v2] refs: strip out not allowed flags from ref_transaction_update

2017-09-21 Thread Junio C Hamano
Michael Haggerty  writes:

> I wouldn't worry too much about this anomaly. `REF_ISPRUNING` is an ugly
> internal kludge, but allowing it in the mask doesn't make anything worse.
>
>>  refs.c | 2 ++
>>  refs.h | 8 
>>  2 files changed, 10 insertions(+)
>> 
>> diff --git a/refs.c b/refs.c
>> index ba22f4acef..fad61be1da 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -921,6 +921,8 @@ int ref_transaction_update(struct ref_transaction 
>> *transaction,
>>  return -1;
>>  }
>>  
>> +flags &= REF_TRANSACTION_UPDATE_ALLOWED_FLAGS;
>> +
>
> I would advocate considering it a bug if the caller passes in options
> that we are going to ignore anyway:
>
> if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
> BUG("illegal flags %x in ref_transaction_update", flags);

It sounds like a sensible thing to do.  Thanks.


Re: [PATCH] git: add --no-optional-locks option

2017-09-21 Thread Jeff King
On Thu, Sep 21, 2017 at 08:25:50PM +0200, Johannes Sixt wrote:

> > +`GIT_OPTIONAL_LOCKS`::
> > +   If set to `0`, Git will avoid performing any operations which
> > +   require taking a lock and which are not required to complete the
> > +   requested operation. For example, this will prevent `git status`
> > +   from refreshing the index as a side effect. This is useful for
> > +   processes running in the background which do not want to cause
> > +   lock contention with other operations on the repository.
> > +   Defaults to `1`.
> 
> I don't think we should pass this environment variable to remote
> repositories. It should be listed in local_repo_env[] in environment.c.

I'm not sure I agree. This is really about the context in which the
command is executing, not anything about the particular repository
you're operating on.

For fetch/push operations that touch a remote, I doubt it would matter
either way (and anyway, those often cross network boundaries that don't
propagate environment variables anyway).

But imagine that "git status" learns to recurse into submodules and run
"git status" inside them. Surely we would want the submodule repos to
also avoid taking any unnecessary locks?

-Peff


Re: KDevelop developers obviously don't use git :(

2017-09-21 Thread Junio C Hamano
Daniel Santos  writes:

> So first I need to inquire if there is a current mechanism to say "git
> status, but please fail if you're busy with a non-atomic operation".

Reading this list sometimes makes me wonder if there is some higher
intelligence telling independent souls to come up with the same wish
at the same time.

https://public-inbox.org/git/20170921050835.mrbgx2zryy3ju...@sigill.intra.peff.net/




Re: [PATCH v3 0/4] filter-branch: support for incremental update + fix for ancient tag format

2017-09-21 Thread Junio C Hamano
Ian Campbell  writes:

> This is the third version of my patches to add incremental support to
> git-filter-branch. Since the last time I have replaced `git mktag --
> allow-missing-tagger` with `git hash-object -t tag -w --stdin`.
>
> I've force pushed to [1] (Travis is still running) and have set off the
> process of re-rewriting the devicetree tree from scratch (a multi-day
> affair) to validate (it's looking good).

Thanks.


Re: KDevelop developers obviously don't use git :(

2017-09-21 Thread Daniel Santos
On 09/21/2017 11:38 PM, Junio C Hamano wrote:
> Daniel Santos  writes:
>
>> So first I need to inquire if there is a current mechanism to say "git
>> status, but please fail if you're busy with a non-atomic operation".
> Reading this list sometimes makes me wonder if there is some higher
> intelligence telling independent souls to come up with the same wish
> at the same time.
>
> https://public-inbox.org/git/20170921050835.mrbgx2zryy3ju...@sigill.intra.peff.net/

LMAO! This has been irritating me for some time, but I had to abort a
somewhat complex interactive rebase earlier and it finally drove me to
action.  I'm juggling so many projects right now that I really don't
want to fix this myself, but. grrr! and raaarh!, etc.

Daniel


[PATCH 4/3] branch: fix "copy" to never touch HEAD

2017-09-21 Thread Junio C Hamano
When creating a new branch B by copying the branch A that happens to
be the current branch, it also updates HEAD to point at the new
branch.  It probably was made this way because "git branch -c A B"
piggybacked its implementation on "git branch -m A B",

This does not match the usual expectation.  If I were sitting on a
blue chair, and somebody comes and repaints it to red, I would
accept ending up sitting on a chair that is now red (I am also OK to
stand, instead, as there no longer is my favourite blue chair).  But
if somebody creates a new red chair, modelling it after the blue
chair I am sitting on, I do not expect to be booted off of the blue
chair and ending up on sitting on the new red one.

Let's fix this before it hits 'next'.  Those who want to create a
new branch and switch to it can do "git checkout B" after doing a
"git branch -c B", and if that operation is so useful and deserves a
short-hand way to do so, perhaps extend "git checkout -b B" to copy
configurations while creating the new branch B.

Signed-off-by: Junio C Hamano 
---

 * Let's send an updated one as a follow-up to the discussion thread
   that it is a follow-up to.  The patch in this message is the same
   as the one I previously sent; the proposed log message has been
   updated somewhat.

 builtin/branch.c  |  9 +++--
 t/t3200-branch.sh | 10 +-
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 89f64f4123..e2e3692838 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -506,12 +506,9 @@ static void copy_or_rename_branch(const char *oldname, 
const char *newname, int
oldref.buf + 11);
}
 
-   if (replace_each_worktree_head_symref(oldref.buf, newref.buf, 
logmsg.buf)) {
-   if (copy)
-   die(_("Branch copied to %s, but HEAD is not updated!"), 
newname);
-   else
-   die(_("Branch renamed to %s, but HEAD is not 
updated!"), newname);
-   }
+   if (!copy &&
+   replace_each_worktree_head_symref(oldref.buf, newref.buf, 
logmsg.buf))
+   die(_("Branch renamed to %s, but HEAD is not updated!"), 
newname);
 
strbuf_release(&logmsg);
 
diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 5d03ad16f6..be9b3784c6 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -422,7 +422,7 @@ test_expect_success 'git branch --copy is a synonym for -c' 
'
test_cmp expect actual
 '
 
-test_expect_success 'git branch -c ee ef should copy and checkout branch ef' '
+test_expect_success 'git branch -c ee ef should copy to create branch ef' '
git checkout -b ee &&
git reflog exists refs/heads/ee &&
git config branch.ee.dummy Hello &&
@@ -431,7 +431,7 @@ test_expect_success 'git branch -c ee ef should copy and 
checkout branch ef' '
git reflog exists refs/heads/ef &&
test $(git config branch.ee.dummy) = Hello &&
test $(git config branch.ef.dummy) = Hello &&
-   test $(git rev-parse --abbrev-ref HEAD) = ef
+   test $(git rev-parse --abbrev-ref HEAD) = ee
 '
 
 test_expect_success 'git branch -c f/f g/g should work' '
@@ -494,12 +494,12 @@ test_expect_success 'git branch -C c1 c2 should succeed 
when c1 is checked out'
git checkout -b c1 &&
git branch c2 &&
git branch -C c1 c2 &&
-   test $(git rev-parse --abbrev-ref HEAD) = c2
+   test $(git rev-parse --abbrev-ref HEAD) = c1
 '
 
-test_expect_success 'git branch -C c1 c2 should add entries to .git/logs/HEAD' 
'
+test_expect_success 'git branch -C c1 c2 should never touch HEAD' '
msg="Branch: copied refs/heads/c1 to refs/heads/c2" &&
-   grep "$msg$" .git/logs/HEAD
+   ! grep "$msg$" .git/logs/HEAD
 '
 
 test_expect_success 'git branch -C master should work when master is checked 
out' '
-- 
2.14.1-929-g25eae544e9



Re: [PATCH 2/4] commit-slab.h: avoid -Wsign-compare warnings

2017-09-21 Thread Jeff King
On Thu, Sep 21, 2017 at 05:47:36PM +0100, Ramsay Jones wrote:

> diff --git a/commit-slab.h b/commit-slab.h
> index 333d81e37..dcaab8ca0 100644
> --- a/commit-slab.h
> +++ b/commit-slab.h
> @@ -78,7 +78,7 @@ static MAYBE_UNUSED void init_ ##slabname(struct slabname 
> *s)   \
>   \
>  static MAYBE_UNUSED void clear_ ##slabname(struct slabname *s)   
> \
>  {\
> - int i;  \
> + unsigned int i; \
>   for (i = 0; i < s->slab_count; i++) \
>   free(s->slab[i]);   \
>   s->slab_count = 0;  \
> @@ -89,13 +89,13 @@ static MAYBE_UNUSED elemtype *slabname## _at_peek(struct 
> slabname *s, \
> const struct commit *c, \
> int add_if_missing)   \
>  {\
> - int nth_slab, nth_slot; \
> + unsigned int nth_slab, nth_slot;\

I have a feeling that in the long run these should all be size_t, but
it's probably pretty unlikely to overflow in practice. At any rate, the
slab index itself is an unsigned, so it probably makes sense to match
that for now.

-Peff


Re: [PATCH 3/4] cache.h: hex2chr() - avoid -Wsign-compare warnings

2017-09-21 Thread Jeff King
On Thu, Sep 21, 2017 at 05:48:38PM +0100, Ramsay Jones wrote:

> diff --git a/cache.h b/cache.h
> index a916bc79e..a0e3e362c 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1243,8 +1243,8 @@ static inline unsigned int hexval(unsigned char c)
>   */
>  static inline int hex2chr(const char *s)
>  {
> - int val = hexval(s[0]);
> - return (val < 0) ? val : (val << 4) | hexval(s[1]);
> + unsigned int val = hexval(s[0]);
> + return (val & ~0xf) ? val : (val << 4) | hexval(s[1]);
>  }

Ironically, the unsigned return from hexval() comes from internally
converting the signed char in hexval_table. And then we again return it
as a signed int from hex2chr().

Would it make sense to return a signed int from hexval()? That would
make hex2chr just work as it tries to above. I admit that shifting
signed values is a little funny, but it should be fine here since we
know they're no larger than 8 bits in the first place.

As an aside, I also see some uses of hexval() that don't appear to be
quite as rigorous in checking for invalid characters. A few
unconditionally shift the first nibble and assume that there will still
be high bits set. I think that's generally true for twos-complement
negative numbers, but isn't shifting off the left side of a signed
integer undefined behavior?

And mailinfo's decode_q_segment() does not seem to check for errors at
all.

Handling those is getting far off your original patch, but I'm having
trouble figuring out if it's saner for us to consistently stick to
all-signed or all-unsigned here.

-Peff


Re: [PATCH] git: add --no-optional-locks option

2017-09-21 Thread Daniel Santos
On 09/20/2017 11:32 PM, Jeff King wrote:
> Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> to lock the index and update it, 2016-08-12). Folks working on GitHub
> Desktop complained to me that it's only available on Windows. :)
>
> I expanded the scope a bit to let us give the same treatment to more
> commands in the long run.  I'd also be OK with just cherry-picking your
> patch to non-Windows Git if you don't find my reasoning below
> compelling. But I think we need _something_ like this, as the other
> solutions I could come up with don't seem very promising.
>
> -Peff
>
> -- >8 --
> Some tools like IDEs or fancy editors may periodically run
> commands like "git status" in the background to keep track
> of the state of the repository. Some of these commands may
> refresh the index and write out the result in an
> opportunistic way: if they can get the index lock, then they
> update the on-disk index with any updates they find. And if
> not, then their in-core refresh is lost and just has to be
> recomputed by the next caller.
>
> But taking the index lock may conflict with other operations
> in the repository. Especially ones that the user is doing
> themselves, which _aren't_ opportunistic. In other words,
> "git status" knows how to back off when somebody else is
> holding the lock, but other commands don't know that status
> would be happy to drop the lock if somebody else wanted it.

Interestingly, this usually slaps me when performing an _interactive_
rebase.  It occurred to me that if I'm performing an interaction
operation, it doesn't seem unreasonable for git wait up to 125ms or so
for the lock and then prompting the user to ask if they want to continue
waiting for the lock.

> There are a couple possible solutions:
>
>   1. Have some kind of "pseudo-lock" that allows other
>  commands to tell status that they want the lock.
>
>  This is likely to be complicated and error-prone to
>  implement (and maybe even impossible with just
>  dotlocks to work from, as it requires some
>  inter-process communication).
>
>   2. Avoid background runs of commands like "git status"
>  that want to do opportunistic updates, preferring
>  instead plumbing like diff-files, etc.
>
>  This is awkward for a couple of reasons. One is that
>  "status --porcelain" reports a lot more about the
>  repository state than is available from individual
>  plumbing commands. And two is that we actually _do_
>  want to see the refreshed index. We just don't want to
>  take a lock or write out the result. Whereas commands
>  like diff-files expect us to refresh the index
>  separately and write it to disk so that they can depend
>  on the result. But that write is exactly what we're
>  trying to avoid.
>
>   3. Ask "status" not to lock or write the index.
>
>  This is easy to implement. The big downside is that any
>  work done in refreshing the index for such a call is
>  lost when the process exits. So a background process
>  may end up re-hashing a changed file multiple times
>  until the user runs a command that does an index
>  refresh themselves.

That is not necessarily the case.  I don't actually know git on the
inside, but I would ask you to consider a read-write lock and a hybrid
of one and three.

I don't know what dotlocks are, but I'm certain that you can implement a
rw lock using lock files and no other IPC, although it does increase the
complexity.  The way this works is that `git status' acquires a read
lock and does its thing.  If it has real changes, instead of discarding
them it attempts to upgrade to a write lock.  If that fails, you throw
it away, otherwise you write them and release.

In order to implement rw locks with only lock files, "off the cuff" I
say you have a single "lock list" file that should never be deleted and
a "lock lock" file that is held in order to read or modify the list. 
The format of the lock list would have a pair of 32-bit wrapping
modification counts (or versions) at the top -- one for modifications to
the lock list its self and another for modifications to the underlying
data (i.e., the number of times a write lock has been acquired).  This
header is followed by entries something like this:

   

  'r' if waiting for a read lock
 'R' if actively reading
 'w' if waiting for write lock
 'W' if actively writing
The pid
If active, the version of the data at the time lock acquired or 
zero.
  Time began waiting or time lock acquired


An operation of 'r' or 'w' means that you are waiting and upper case
means that you are active.   is the version of the data at the
time the lock became active and writers increment it when they acquire a
lock.  You wait with file alteration notification on the lock list (if
there is any doubt based upon timestamp precision then you can examine
the lock list version).  When you want to read o