strange behavior of git diff-index

2019-05-22 Thread Oussama Ghorbel
Hi,

git diff-index is giving me incorrect information, however if I run git diff, 
then git diff-index again it will show the correct information.
The steps are the following:
$ git diff-index --name-only HEAD
git appears to list all files in the project irrespective if they modified or 
not
$ git diff
$ git diff --cached
will show nothing. So I don't have any modification. 
Now strangely if I run git diff-index, it will also show nothing which is 
correction behavior.

my git version is 2.7.4
Any explanation?

Thanks,
Oussama

[ANNOUNCE] Git Rev News edition 51

2019-05-22 Thread Christian Couder
Hi everyone,

The 51th edition of Git Rev News is now published:

  https://git.github.io/rev_news/2019/05/22/edition-51/

Thanks a lot to David Pursehouse, Luca Milanesio and Denton Liu who
contributed this month!

Enjoy,
Christian, Jakub, Markus and Gabriel.


Re: [PATCH] upload-pack: strip namespace from symref data

2019-05-22 Thread Ævar Arnfjörð Bjarmason


On Wed, May 22 2019, Jeff King wrote:

> Since 7171d8c15f (upload-pack: send symbolic ref information as
> capability, 2013-09-17), we've sent cloning and fetching clients special
> information about which branch HEAD is pointing to, so that they don't
> have to guess based on matching up commit ids.

I'd add a 2nd paragraph here:

The client will then use the corresponding code added in a45b5f0552
("connect: annotate refs with their symref information in
get_remote_head()", 2013-09-17) to find the intended symref. See
9907d1359c ("Merge branch 'jc/upload-pack-send-symref'", 2013-10-30)
for the full set of changes related to this.

...because, see later...

> However, this feature has never worked properly with the GIT_NAMESPACE
> feature.  Because upload-pack uses head_ref_namespaced(find_symref), we
> do find and report on refs/namespaces/foo/HEAD instead of the actual
> HEAD of the repo. This makes sense, since the branch pointed to by the
> top-level HEAD may not be advertised at all. But we do two things wrong:
>
>   1. We report the full name refs/namespaces/foo/HEAD, instead of just
>  HEAD. Meaning no client is going to bother doing anything with that
>  symref, since we're not otherwise advertising it.
>
>   2. We report the symref destination using its full name (e.g.,
>  refs/namespaces/foo/refs/heads/master). That's similarly useless to
>  the client, who only saw "refs/heads/master" in the advertisement.
>
> We should be stripping the namespace prefix off of both places (which
> this patch fixes).
>
> Likely nobody noticed because bug (1) means that from the client's
> perspective, we did not report on HEAD at all. And thus it uses the
> pre-7171d8c15f fallback code to guess the correct HEAD, which is usually
> right. It only falls down in ambiguous cases (like the one laid out in
> the included test).

...because here you're talking about "the client's perspective" and "it
uses the pre-7171d8c15f [...] code", but this should say "the
pre-a45b5f0552 code", i.e. mention the commit that changed the *client*
logic.

Well, the client also "uses" the server-side code indirectly, but I
think it's easier to follow along if we note what both sides are (as I
dug up when reviewing this...).

> This also means that we don't have to worry about breaking anybody who
> was putting pre-stripped names into their namespace symrefs when we fix
> bug (2). Because of bug (1), nobody would have been using the symref we
> advertised in the first place (not to mention that those symrefs would
> have appeared broken for any non-namespaced access).
>
> Note that we have separate fixes here for the v0 and v2 protocols. The
> symref advertisement moved in v2 to be a part of the ls-refs command.
> This actually gets part (1) right, since the symref annotation
> piggy-backs on the existing ref advertisement, which is properly
> stripped. But it still needs a fix for part (2). The included tests
> cover both protocols.
>
> Reported-by: Bryan Turner 
> Signed-off-by: Jeff King 
> ---
> This is the same as my earlier fix, but with the v2 bit added, and of
> course tests and a commit message. Thanks (as usual) for a helpful bug
> report.
>
> I don't know if we have a general philosophy for testing v0 versus v2.
> Without specifying the protocol at all, we'd catch the former on a
> regular run and the latter under a GIT_TEST_PROTOCOL_VERSION=2 run. So
> we _could_ just rely on that, but since I had to do two separate fixes,
> it made sense to me to include explicit tests.

Yeah, we shouldn't rely on the GIT_TEST_* stuff for coverage, it should
just be used to find things that fall between the cracks.

>  ls-refs.c|  3 ++-
>  t/t5509-fetch-push-namespaces.sh | 28 
>  upload-pack.c|  4 ++--
>  3 files changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/ls-refs.c b/ls-refs.c
> index 0a7dbc6442..818aef70a0 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -57,7 +57,8 @@ static int send_ref(const char *refname, const struct 
> object_id *oid,
>   if (!symref_target)
>   die("'%s' is a symref but it is not?", refname);
>
> - strbuf_addf(&refline, " symref-target:%s", symref_target);
> + strbuf_addf(&refline, " symref-target:%s",
> + strip_namespace(symref_target));
>   }
>
>   if (data->peel) {
> diff --git a/t/t5509-fetch-push-namespaces.sh 
> b/t/t5509-fetch-push-namespaces.sh
> index c88df78c0b..75cbfcc392 100755
> --- a/t/t5509-fetch-push-namespaces.sh
> +++ b/t/t5509-fetch-push-namespaces.sh
> @@ -124,4 +124,32 @@ test_expect_success 'try to update a hidden full ref' '
>   test_must_fail git -C original push pushee-namespaced master
>  '
>
> +test_expect_success 'set up ambiguous HEAD' '
> + git init ambiguous &&
> + (
> + cd ambiguous &&
> + git commit --allow-empty -m foo &&
> + 

GIT by github 2.21.0 got reviewed on Software Informer

2019-05-22 Thread Kasey Bloome
Good day!

 would like to inform you that your product GIT by github 2.21.0 has been 
reviewed by our editors and your program got "Editor's Pick Award", "100% Clean 
Award". You can read the review at https://git.software.informer.com/.

We would be grateful if you place our award with a link to our review on your 
website. On our part, we can offer featuring your program in our Today's 
Highlight block. This block is shown in the rotator at the top of the main page 
and also on every page of our website in the upper left corner.

You are welcome to order a free video tutorial for your program as well. Once 
it's finished, you can add it to your website.

We also offer you to take advantage of our free storage by hosting your 
installation package on our servers and listing us as one of the mirror 
downloads for your program. There is a selection of predesigned buttons 
available to fit the look of your website.

Please let me know if you're interested in any of these offers.

We are on the list of the world's 1000 most visited websites, so this could get 
your program some extra exposure.

Kind regards,
Kasey Bloome




Re: [PATCH 0/1] trace2: fix tracing when NO_PTHREADS is defined

2019-05-22 Thread Jeff Hostetler




On 5/21/2019 5:27 PM, Jeff King wrote:

On Tue, May 21, 2019 at 12:33:58PM -0700, Jeff Hostetler via GitGitGadget wrote:


As Duy suggested, pthread_getspecific() just returns NULL when NO_PTHREADS
is defined. And pthread_setspecific() silently does not nothing. So this
problem was hidden from view.

I have to wonder if we should update pthread_*specific() to call BUG() when
NO_PTHREADS is defined as a way to catch unguarded usages easier or make
this issue more clear.


I think it should actually store the data asked for by the caller, as if
we were the single thread running. We discussed this as the time of
refactoring NO_PTHREADS, but there was only one caller that would have
benefited. Now there are two. ;)

Discussion in the subthread of this patch:

   https://public-inbox.org/git/20181027071003.1347-2-pclo...@gmail.com/

-Peff



I was wondering about that too as the proper long term solution.
We would need (as the discussion suggests [1]) to properly
respect/represent the pthread_key_t argument.

For now, I've guarded my usage of pthread_getspecific() in the trace2
(similar to what index-pack does), so its not urgent that we update it.
And I'd rather we take this simple trace2 fix now and not try to combine
it with fixes for the pthread macros.  Especially now as we're in the RC
cycle for 2.22.

I'll make a note to revisit the pthread code after 2.22.

Thanks
Jeff


[1] 
https://public-inbox.org/git/CACsJy8DLW_smOJd6aCoRcJZxQ2Lzut5US=sPadj7=fhne0u...@mail.gmail.com/




RE: [Breakage] 2.22.0-rc1 t5401-update-hooks.sh

2019-05-22 Thread Randall S. Becker
On May 21, 2019 20:48, brian m. carlson wrote:
> To: Randall S. Becker 
> Cc: 'Git Mailing List' 
> Subject: Re: [Breakage] 2.22.0-rc1 t5401-update-hooks.sh
> 
> On 2019-05-21 at 21:47:54, Randall S. Becker wrote:
> > When running the test in isolation, it passes without incident whether
> > or not --verbose is used. So far, this only occurs on the first run
> > through. I wanted to report it, based on the inconsistency of results.
> > This is not the first time tests have acted in this fashion, and I
> > realize it is difficult to do anything about it without being able to 
> > recreate
> the situation.
> 
> Does running git clean -dxf cause it to be reproducible?

I will give it a go. Having exactly the same behaviour in t7519 subtest 19. I 
wonder whether there are breadcrumbs not being cleaned up. Will report back 
when I am able - may take a day or so.

Cheers,
Randall




Re: strange behavior of git diff-index

2019-05-22 Thread Johannes Schindelin
Hi Oussama,

On Wed, 22 May 2019, Oussama Ghorbel wrote:

> git diff-index is giving me incorrect information, however if I run git diff, 
> then git diff-index again it will show the correct information.
> The steps are the following:
> $ git diff-index --name-only HEAD
> git appears to list all files in the project irrespective if they modified or 
> not
> $ git diff
> $ git diff --cached
> will show nothing. So I don't have any modification. 
> Now strangely if I run git diff-index, it will also show nothing which is 
> correction behavior.
>
> my git version is 2.7.4
> Any explanation?

The low-level `diff-index` command is meant to be used in scripts, and
does specifically *not* refresh the index before running. Meaning that it
could possibly mistake a file for being modified even if it is not
modified, just because it is marked "modified" in the index [*1*].

Short answer: use the high-level command `git diff` that is intended for
human consumption.

Ciao,
Johannes

Footnote *1*: It is actually a bit more complicated than that: the index
stores metadata such as mtime, size, uid, etc, and compares that to the
metadata on disk. If there is any mismatch, or if everything matches but
the mtime *also* matches the index file's itself, the file is considered
not up to date, i.e. marked as modified.


Re: strange behavior of git diff-index

2019-05-22 Thread Oussama Ghorbel
Hi Johannes,

Thanks a lot for the clarification!

Regards,
Oussama
On 5/22/19 4:54 PM, Johannes Schindelin wrote:
> Hi Oussama,
>
> On Wed, 22 May 2019, Oussama Ghorbel wrote:
>
>> git diff-index is giving me incorrect information, however if I run git 
>> diff, then git diff-index again it will show the correct information.
>> The steps are the following:
>> $ git diff-index --name-only HEAD
>> git appears to list all files in the project irrespective if they modified 
>> or not
>> $ git diff
>> $ git diff --cached
>> will show nothing. So I don't have any modification.
>> Now strangely if I run git diff-index, it will also show nothing which is 
>> correction behavior.
>>
>> my git version is 2.7.4
>> Any explanation?
> The low-level `diff-index` command is meant to be used in scripts, and
> does specifically *not* refresh the index before running. Meaning that it
> could possibly mistake a file for being modified even if it is not
> modified, just because it is marked "modified" in the index [*1*].
>
> Short answer: use the high-level command `git diff` that is intended for
> human consumption.
>
> Ciao,
> Johannes
>
> Footnote *1*: It is actually a bit more complicated than that: the index
> stores metadata such as mtime, size, uid, etc, and compares that to the
> metadata on disk. If there is any mismatch, or if everything matches but
> the mtime *also* matches the index file's itself, the file is considered
> not up to date, i.e. marked as modified.


Re: Revision walking, commit dates, slop

2019-05-22 Thread Jakub Narebski
Derrick Stolee  writes:
> On 5/20/2019 7:27 PM, Jakub Narebski wrote:
>> Jakub Narebski  writes:
>>> Derrick Stolee  writes:
 On 5/20/2019 7:02 AM, Jakub Narebski wrote:
>
> Are there any blockers that prevent the switch to this
> "generation number v2"?
>> [...]
>> 
  Using the generation number column for the corrected
 commit-date offsets (assuming we also guarantee the offset is strictly
 increasing from parent to child), these new values will be backwards-
 compatible _except_ for 'git commit-graph verify'.
>>>
>>> O.K., so the "generation number v2 (legacy)" would be incremental and
>>> backward-compatibile in use (though not in generation and validation).
>>>
>>> Do I understand it correctly how it is calculated:
>>>
>>>   corrected_date(C) = max(committer_date(C),
>>>   max_{P ∈ parents(C)}(corrected_date(P)) + 1)
>> 
>> This should probably read
>> 
>> offset_date(P) = committer_date(P) + gen_v2(P)
>> corrected_date(C) = max(committer_date(C),
>> max_{P ∈ parents(C)}(offset_date(P)) + 1)

Restating it yet again:

   A.  corrected_date(C) = max(committer_date(C),
   max_P(committer_date(P) + offset(P)) + 1)

   B.  offset(C) = max(corrected_date(C) - committer_date(C),
   max_P(offset(P)) + 1)

> The final definition needs two conditions on the offset of a commit C for
> every parent P:
>
>  1. committer_date(C) + offset(C) > committer_date(P) + offset(P)
>  2. offset(C) > offset(P)

The equation (B) ensures the (2) condition, i.e offset(C) > offset(P).
The equation (A) ensures that condition (1) is fulfulled, because from
(B) we have

   corrected_date(C) <= committer_date(C) + offset(C)

This from (B) and (A( we get:

   committer_date(C) + offset(C) >= corrected_date(C) >
 >  committer_date(P) + offset(P)

> Condition (1) will give us the performance benefits related to the
> committer-date heuristic. Condition (2) will give us backwards-compatibility
> with generation numbers.

Well, we should check/test if performance benefits of "offset date"
("corrected date with rising offset") truly holds.

Best,
--
Jakub Narębski


standalone library/tool to query commit-graph?

2019-05-22 Thread Karl Ostmo
After producing the file ".git/objects/info/commit-graph" with the
command "git commit-graph write", is there a way to answer queries
like "git merge-base --is-ancestor" without having a .git directory?
E.g. is there a library that will operate on the "commit-graph" file
all by itself?

Thanks,
Karl


Re: standalone library/tool to query commit-graph?

2019-05-22 Thread Derrick Stolee
On 5/22/2019 2:49 PM, Karl Ostmo wrote:
> After producing the file ".git/objects/info/commit-graph" with the
> command "git commit-graph write", is there a way to answer queries
> like "git merge-base --is-ancestor" without having a .git directory?
> E.g. is there a library that will operate on the "commit-graph" file
> all by itself?

You could certainly build such a tool, assuming your merge-base parameters are
full-length commit ids. If you try to start at ref names, you'll need the .git
directory.

I would not expect such a tool to ever exist in the Git codebase. Instead, you
would need a new project, say "graph-analyzer --graph= --is-ancestor 
 "

Thanks,
-Stolee


Re: Revision walking, commit dates, slop

2019-05-22 Thread Derrick Stolee
On 5/22/2019 2:29 PM, Jakub Narebski wrote:
> Derrick Stolee  writes:
>> On 5/20/2019 7:27 PM, Jakub Narebski wrote:
> Restating it yet again:
> 
>A.  corrected_date(C) = max(committer_date(C),
>max_P(committer_date(P) + offset(P)) + 1)
> 
>B.  offset(C) = max(corrected_date(C) - committer_date(C),
>max_P(offset(P)) + 1)

The problem with this definition is that it "defines" the corrected date, and
then _adjusts_ it by updating the offset. I consider

corrected_date(C) = committer_date(C) + offset(C)

to be part of the definition. You could restate the definition as follows:

corrected_date = max(committer_date(C) + max_P(offset(P)) + 1,
 max_P(corrected_date(P)))

or, equivalently

corrected_date = max(committer_date(C) + max_P(offset(P)) + 1,
 max_P(committer_date(P) + offset(P)))

This definition, in a single step, satisfies the conditions below:

> 
>> The final definition needs two conditions on the offset of a commit C for
>> every parent P:
>>
>>  1. committer_date(C) + offset(C) > committer_date(P) + offset(P)
>>  2. offset(C) > offset(P)

Plus, the "+ 1" in the first step takes into account that "0" is a special 
offset
value in the commit-graph file format meaning "not computed".

> Well, we should check/test if performance benefits of "offset date"
> ("corrected date with rising offset") truly holds.

Yes, a full performance test will be required. I have full confidence that the
monotonic offset requirement will have only positive effect. That is, it will
not affect the case where committer-date was better than generation number, but 
will
help the cases where all the committer-dates are equal.

Thanks,
-Stolee



[PATCH v2 07/11] commit-graph: write commit-graph chains

2019-05-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Extend write_commit_graph() to write a commit-graph chain when given the
COMMIT_GRAPH_SPLIT flag.

This implementation is purposefully simplistic in how it creates a new
chain. The commits not already in the chain are added to a new tip
commit-graph file.

Much of the logic around writing a graph-{hash}.graph file and updating
the commit-graph-chain file is the same as the commit-graph file case.
However, there are several places where we need to do some extra logic
in the split case.

Track the list of graph filenames before and after the planned write.
This will be more important when we start merging graph files, but it
also allows us to upgrade our commit-graph file to the appropriate
graph-{hash}.graph file when we upgrade to a chain of commit-graphs.

Note that we use the eighth byte of the commit-graph header to store the
number of base graph files. This determines the length of the base
graphs chunk.

A subtle change of behavior with the new logic is that we do not write a
commit-graph if we our commit list is empty. This extends to the typical
case, which is reflected in t5318-commit-graph.sh.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 286 ++--
 commit-graph.h  |   2 +
 t/t5318-commit-graph.sh |   2 +-
 3 files changed, 278 insertions(+), 12 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b8a1444217..fb972c0bc2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -300,12 +300,18 @@ static struct commit_graph *load_commit_graph_one(const 
char *graph_file)
 
struct stat st;
int fd;
+   struct commit_graph *g;
int open_ok = open_commit_graph(graph_file, &fd, &st);
 
if (!open_ok)
return NULL;
 
-   return load_commit_graph_one_fd_st(fd, &st);
+   g = load_commit_graph_one_fd_st(fd, &st);
+
+   if (g)
+   g->filename = xstrdup(graph_file);
+
+   return g;
 }
 
 static int prepare_commit_graph_v1(struct repository *r, const char *obj_dir)
@@ -709,8 +715,19 @@ struct write_commit_graph_context {
struct progress *progress;
int progress_done;
uint64_t progress_cnt;
+
+   char *base_graph_name;
+   int num_commit_graphs_before;
+   int num_commit_graphs_after;
+   char **commit_graph_filenames_before;
+   char **commit_graph_filenames_after;
+   char **commit_graph_hash_after;
+   uint32_t new_num_commits_in_base;
+   struct commit_graph *new_base_graph;
+
unsigned append:1,
-report_progress:1;
+report_progress:1,
+split:1;
 };
 
 static void write_graph_chunk_fanout(struct hashfile *f,
@@ -780,6 +797,16 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
  ctx->commits.nr,
  commit_to_sha1);
 
+   if (edge_value >= 0)
+   edge_value += ctx->new_num_commits_in_base;
+   else {
+   uint32_t pos;
+   if (find_commit_in_graph(parent->item,
+ctx->new_base_graph,
+&pos))
+   edge_value = pos;
+   }
+
if (edge_value < 0)
BUG("missing parent %s for commit %s",
oid_to_hex(&parent->item->object.oid),
@@ -800,6 +827,17 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
  ctx->commits.list,
  ctx->commits.nr,
  commit_to_sha1);
+
+   if (edge_value >= 0)
+   edge_value += ctx->new_num_commits_in_base;
+   else {
+   uint32_t pos;
+   if (find_commit_in_graph(parent->item,
+ctx->new_base_graph,
+&pos))
+   edge_value = pos;
+   }
+
if (edge_value < 0)
BUG("missing parent %s for commit %s",
oid_to_hex(&parent->item->object.oid),
@@ -857,6 +895,16 @@ static void write_graph_chunk_extra_edges(struct hashfile 
*f,
  ctx->commits.nr,
  commit_to_sha1);
 
+   if (edge_value >= 0)
+   edge_value += ctx->new_num_commits_in_base;
+   else {
+   

[PATCH v2 05/11] commit-graph: add base graphs chunk

2019-05-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

To quickly verify a commit-graph chain is valid on load, we will
read from the new "Base Graphs Chunk" of each file in the chain.
This will prevent accidentally loading incorrect data from manually
editing the commit-graph-chain file or renaming graph-{hash}.graph
files.

The commit_graph struct already had an object_id struct "oid", but
it was never initialized or used. Add a line to read the hash from
the end of the commit-graph file and into the oid member.

Signed-off-by: Derrick Stolee 
---
 .../technical/commit-graph-format.txt | 11 --
 commit-graph.c| 22 +++
 commit-graph.h|  1 +
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/commit-graph-format.txt 
b/Documentation/technical/commit-graph-format.txt
index 16452a0504..a4f17441ae 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -44,8 +44,9 @@ HEADER:
 
   1-byte number (C) of "chunks"
 
-  1-byte (reserved for later use)
- Current clients should ignore this value.
+  1-byte number (B) of base commit-graphs
+  We infer the length (H*B) of the Base Graphs chunk
+  from this value.
 
 CHUNK LOOKUP:
 
@@ -92,6 +93,12 @@ CHUNK DATA:
   positions for the parents until reaching a value with the 
most-significant
   bit on. The other bits correspond to the position of the last parent.
 
+  Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
+  This list of H-byte hashes describe a set of B commit-graph files that
+  form a commit-graph chain. The graph position for the ith commit in this
+  file's OID Lookup chunk is equal to i plus the number of commits in all
+  base graphs.  If B is non-zero, this chunk must exist.
+
 TRAILER:
 
H-byte HASH-checksum of all of the above.
diff --git a/commit-graph.c b/commit-graph.c
index 70e44393b8..060897fff0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -22,6 +22,7 @@
 #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
 #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */
+#define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */
 
 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
@@ -262,6 +263,12 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
else
graph->chunk_extra_edges = data + chunk_offset;
break;
+
+   case GRAPH_CHUNKID_BASE:
+   if (graph->chunk_base_graphs)
+   chunk_repeated = 1;
+   else
+   graph->chunk_base_graphs = data + chunk_offset;
}
 
if (chunk_repeated) {
@@ -280,6 +287,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
last_chunk_offset = chunk_offset;
}
 
+   hashcpy(graph->oid.hash, graph->data + graph->data_len - 
graph->hash_len);
+
if (verify_commit_graph_lite(graph))
return NULL;
 
@@ -315,8 +324,21 @@ static int add_graph_to_chain(struct commit_graph *g,
 {
struct commit_graph *cur_g = chain;
 
+   if (n && !g->chunk_base_graphs) {
+   warning(_("commit-graph has no base graphs chunk"));
+   return 0;
+   }
+
while (n) {
n--;
+
+   if (!oideq(&oids[n], &cur_g->oid) ||
+   !hasheq(oids[n].hash, g->chunk_base_graphs + g->hash_len * 
n)) {
+   warning(_("commit-graph chain does not match"));
+   return 0;
+   }
+
+
cur_g = cur_g->base_graph;
}
 
diff --git a/commit-graph.h b/commit-graph.h
index f9fe32ebe3..80f4917ddb 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -55,6 +55,7 @@ struct commit_graph {
const unsigned char *chunk_oid_lookup;
const unsigned char *chunk_commit_data;
const unsigned char *chunk_extra_edges;
+   const unsigned char *chunk_base_graphs;
 };
 
 struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st);
-- 
gitgitgadget



[PATCH v2 04/11] commit-graph: load commit-graph chains

2019-05-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Prepare the logic for reading a chain of commit-graphs.

First, look for a file at $OBJDIR/info/commit-graph. If it exists,
then use that file and stop.

Next, look for the chain file at $OBJDIR/info/commit-graphs/commit-graph-chain.
If this file exists, then load the hash values as line-separated values in that
file and load $OBJDIR/info/commit-graphs/graph-{hash[i]}.graph for each hash[i]
in that file. The file is given in order, so the first hash corresponds to the
"base" file and the final hash corresponds to the "tip" file.

This implementation assumes that all of the graph-{hash}.graph files are in
the same object directory as the commit-graph-chain file. This will be updated
in a future change. This change is purposefully simple so we can isolate the
different concerns.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 98 +++---
 1 file changed, 93 insertions(+), 5 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e2f438f6a3..70e44393b8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -45,6 +45,19 @@ char *get_commit_graph_filename(const char *obj_dir)
return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static char *get_split_graph_filename(const char *obj_dir,
+ const char *oid_hex)
+{
+   return xstrfmt("%s/info/commit-graphs/graph-%s.graph",
+  obj_dir,
+  oid_hex);
+}
+
+static char *get_chain_filename(const char *obj_dir)
+{
+   return xstrfmt("%s/info/commit-graphs/commit-graph-chain", obj_dir);
+}
+
 static uint8_t oid_version(void)
 {
return 1;
@@ -286,18 +299,93 @@ static struct commit_graph *load_commit_graph_one(const 
char *graph_file)
return load_commit_graph_one_fd_st(fd, &st);
 }
 
+static int prepare_commit_graph_v1(struct repository *r, const char *obj_dir)
+{
+   char *graph_name = get_commit_graph_filename(obj_dir);
+   r->objects->commit_graph = load_commit_graph_one(graph_name);
+   free(graph_name);
+
+   return r->objects->commit_graph ? 0 : -1;
+}
+
+static int add_graph_to_chain(struct commit_graph *g,
+ struct commit_graph *chain,
+ struct object_id *oids,
+ int n)
+{
+   struct commit_graph *cur_g = chain;
+
+   while (n) {
+   n--;
+   cur_g = cur_g->base_graph;
+   }
+
+   g->base_graph = chain;
+
+   if (chain)
+   g->num_commits_in_base = chain->num_commits + 
chain->num_commits_in_base;
+
+   return 1;
+}
+
+static void prepare_commit_graph_chain(struct repository *r, const char 
*obj_dir)
+{
+   struct strbuf line = STRBUF_INIT;
+   struct stat st;
+   struct object_id *oids;
+   int i = 0, valid = 1;
+   char *chain_name = get_chain_filename(obj_dir);
+   FILE *fp;
+
+   if (stat(chain_name, &st))
+   return;
+
+   if (st.st_size <= the_hash_algo->hexsz)
+   return;
+
+   fp = fopen(chain_name, "r");
+   free(chain_name);
+
+   if (!fp)
+   return;
+
+   oids = xcalloc(st.st_size / (the_hash_algo->hexsz + 1), sizeof(struct 
object_id));
+
+   while (strbuf_getline_lf(&line, fp) != EOF && valid) {
+   char *graph_name;
+   struct commit_graph *g;
+
+   if (get_oid_hex(line.buf, &oids[i])) {
+   warning(_("invalid commit-graph chain: line '%s' not a 
hash"),
+   line.buf);
+   valid = 0;
+   break;
+   }
+
+   graph_name = get_split_graph_filename(obj_dir, line.buf);
+   g = load_commit_graph_one(graph_name);
+   free(graph_name);
+
+   if (g && add_graph_to_chain(g, r->objects->commit_graph, oids, 
i))
+   r->objects->commit_graph = g;
+   else
+   valid = 0;
+   }
+
+   free(oids);
+   fclose(fp);
+}
+
 static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
 {
-   char *graph_name;
 
if (r->objects->commit_graph)
return;
 
-   graph_name = get_commit_graph_filename(obj_dir);
-   r->objects->commit_graph =
-   load_commit_graph_one(graph_name);
+   if (!prepare_commit_graph_v1(r, obj_dir))
+   return;
 
-   FREE_AND_NULL(graph_name);
+   prepare_commit_graph_chain(r, obj_dir);
 }
 
 /*
-- 
gitgitgadget



[PATCH v2 09/11] commit-graph: merge commit-graph chains

2019-05-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When searching for a commit in a commit-graph chain of G graphs with N
commits, the search takes O(G log N) time. If we always add a new tip
graph with every write, the linear G term will start to dominate and
slow the lookup process.

To keep lookups fast, but also keep most incremental writes fast, create
a strategy for merging levels of the commit-graph chain. The strategy is
detailed in the commit-graph design document, but is summarized by these
two conditions:

  1. If the number of commits we are adding is more than half the number
 of commits in the graph below, then merge with that graph.

  2. If we are writing more than 64,000 commits into a single graph,
 then merge with all lower graphs.

The numeric values in the conditions above are currently constant, but
can become config options in a future update.

As we merge levels of the commit-graph chain, check that the commits
still exist in the repository. A garbage-collection operation may have
removed those commits from the object store and we do not want to
persist them in the commit-graph chain. This is a non-issue if the
'git gc' process wrote a new, single-level commit-graph file.

After we merge levels, the old graph-{hash}.graph files are no longer
referenced by the commit-graph-chain file. We will expire these files in
a future change.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt |  81 ++
 commit-graph.c   | 190 +++
 t/t5323-split-commit-graph.sh|  13 ++
 3 files changed, 251 insertions(+), 33 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index 1dca3bd8fe..083fff9927 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -186,6 +186,87 @@ positions to refer to their parents, which may be in 
`graph-{hash1}.graph` or
 its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 +
 X2).
 
+Each commit-graph file (except the base, `graph-{hash0}.graph`) contains data
+specifying the hashes of all files in the lower layers. In the above example,
+`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
+`{hash0}` and `{hash1}`.
+
+## Merging commit-graph files
+
+If we only added a new commit-graph file on every write, we would run into a
+linear search problem through many commit-graph files.  Instead, we use a merge
+strategy to decide when the stack should collapse some number of levels.
+
+The diagram below shows such a collapse. As a set of new commits are added, it
+is determined by the merge strategy that the files should collapse to
+`graph-{hash1}`. Thus, the new commits, the commits in `graph-{hash2}` and
+the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
+file.
+
+   +-+
+   | |
+   |(new commits)|
+   | |
+   +-+
+   | |
+ +---+  +-+
+ |  graph-{hash2} |->| |
+ +---+  +-+
+ | | |
+ +---+  +-+
+ |   |  | |
+ |  graph-{hash1} |->| |
+ |   |  | |
+ +---+  +-+
+ |  tmp_graphXXX
+ +---+
+ |   |
+ |   |
+ |   |
+ |  graph-{hash0} |
+ |   |
+ |   |
+ |   |
+ +---+
+
+During this process, the commits to write are combined, sorted and we write the
+contents to a temporary file, all while holding a `commit-graph-chain.lock`
+lock-file.  When the file is flushed, we rename it to `graph-{hash3}`
+according to the computed `{hash3}`. Finally, we write the new chain data to
+`commit-graph-chain.lock`:
+
+```
+   {hash3}
+   {hash0}
+```
+
+We then close the lock-file.
+
+## Merge Strategy
+
+When writing a set of commits that do not exist in the commit-graph stack of
+height N, we default to creating a new file at level N + 1. We then decide to
+merge with the Nth level if one of two conditions hold:
+
+  1. The expected file size for level N + 1 is at least half the file size for
+ level N.
+
+  2. Level N + 1 contains more than MAX_SPLIT_COMMITS commits (64,
+ commits).
+
+This decision cascades down the levels: when we merge a level we create a new
+set of commits that then compares to the next level.
+
+The first condition bounds the number of levels to be logarithmic in the t

[PATCH v2 11/11] commit-graph: expire commit-graph files

2019-05-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

As we merge commit-graph files in a commit-graph chain, we should clean
up the files that are no longer used.

This change introduces an 'expiry_window' value to the context, which is
always zero (for now). We then check the modified time of each
graph-{hash}.graph file in the $OBJDIR/info/commit-graphs folder and
unlink the files that are older than the expiry_window.

Since this is always zero, this immediately clears all unused graph
files. We will update the value to match a config setting in a future
change.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 15 +
 commit-graph.c   | 71 
 t/t5323-split-commit-graph.sh|  2 +-
 3 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index b6fe8b2321..4ecec54148 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -267,6 +267,21 @@ The merge strategy values (2 for the size multiple, 64,000 
for the maximum
 number of commits) could be extracted into config settings for full
 flexibility.
 
+## Deleting graph-{hash} files
+
+After a new tip file is written, some `graph-{hash}` files may no longer
+be part of a chain. It is important to remove these files from disk, 
eventually.
+The main reason to delay removal is that another process could read the
+`commit-graph-chain` file before it is rewritten, but then look for the
+`graph-{hash}` files after they are deleted.
+
+To allow holding old split commit-graphs for a while after they are 
unreferenced,
+we update the modified times of the files when they become unreferenced. Then,
+we scan the `$OBJDIR/info/commit-graphs/` directory for `graph-{hash}`
+files whose modified times are older than a given expiry window. This window
+defaults to zero, but can be changed using command-line arguments or a config
+setting.
+
 ## Chains across multiple object directories
 
 In a repo with alternates, we look for the `commit-graph-chain` file starting
diff --git a/commit-graph.c b/commit-graph.c
index 0d8b942e2b..69c4f83c2a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -739,6 +739,8 @@ struct write_commit_graph_context {
unsigned append:1,
 report_progress:1,
 split:1;
+
+   time_t expire_window;
 };
 
 static void write_graph_chunk_fanout(struct hashfile *f,
@@ -1629,6 +1631,70 @@ static void merge_commit_graphs(struct 
write_commit_graph_context *ctx)
deduplicate_commits(ctx);
 }
 
+static void mark_commit_graphs(struct write_commit_graph_context *ctx)
+{
+   uint32_t i;
+   time_t now = time(NULL);
+
+   for (i = ctx->num_commit_graphs_after - 1; i < 
ctx->num_commit_graphs_before; i++) {
+   struct stat st;
+   struct utimbuf updated_time;
+
+   stat(ctx->commit_graph_filenames_before[i], &st);
+
+   updated_time.actime = st.st_atime;
+   updated_time.modtime = now;
+   utime(ctx->commit_graph_filenames_before[i], &updated_time);
+   }
+}
+
+static void expire_commit_graphs(struct write_commit_graph_context *ctx)
+{
+   struct strbuf path = STRBUF_INIT;
+   DIR *dir;
+   struct dirent *de;
+   size_t dirnamelen;
+   time_t expire_time = time(NULL) - ctx->expire_window;
+
+   strbuf_addstr(&path, ctx->obj_dir);
+   strbuf_addstr(&path, "/info/commit-graphs");
+   dir = opendir(path.buf);
+
+   if (!dir) {
+   strbuf_release(&path);
+   return;
+   }
+
+   strbuf_addch(&path, '/');
+   dirnamelen = path.len;
+   while ((de = readdir(dir)) != NULL) {
+   struct stat st;
+   uint32_t i, found = 0;
+
+   strbuf_setlen(&path, dirnamelen);
+   strbuf_addstr(&path, de->d_name);
+
+   stat(path.buf, &st);
+
+   if (st.st_mtime > expire_time)
+   continue;
+   if (path.len < 6 || strcmp(path.buf + path.len - 6, ".graph"))
+   continue;
+
+   for (i = 0; i < ctx->num_commit_graphs_after; i++) {
+   if (!strcmp(ctx->commit_graph_filenames_after[i],
+   path.buf)) {
+   found = 1;
+   break;
+   }
+   }
+
+   if (!found)
+   unlink(path.buf);
+
+   }
+}
+
 int write_commit_graph(const char *obj_dir,
   struct string_list *pack_indexes,
   struct string_list *commit_hex,
@@ -1741,6 +1807,11 @@ int write_commit_graph(const char *obj_dir,
 
res = write_commit_graph_file(ctx);
 
+   if (ctx->split) {
+   mark_commit_graphs(ctx);
+   expire_commit_graphs(ctx);
+   }
+
 cleanu

[PATCH v2 06/11] commit-graph: rearrange chunk count logic

2019-05-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The number of chunks in a commit-graph file can change depending on
whether we need the Extra Edges Chunk. We are going to add more optional
chunks, and it will be helpful to rearrange this logic around the chunk
count before doing so.

Specifically, we need to finalize the number of chunks before writing
the commit-graph header. Further, we also need to fill out the chunk
lookup table dynamically and using "num_chunks" as we add optional
chunks is useful for adding optional chunks in the future.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 35 +--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 060897fff0..b8a1444217 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1192,7 +1192,7 @@ static int write_commit_graph_file(struct 
write_commit_graph_context *ctx)
uint64_t chunk_offsets[5];
const unsigned hashsz = the_hash_algo->rawsz;
struct strbuf progress_title = STRBUF_INIT;
-   int num_chunks = ctx->num_extra_edges ? 4 : 3;
+   int num_chunks = 3;
 
ctx->graph_name = get_commit_graph_filename(ctx->obj_dir);
if (safe_create_leading_directories(ctx->graph_name)) {
@@ -1205,27 +1205,34 @@ static int write_commit_graph_file(struct 
write_commit_graph_context *ctx)
hold_lock_file_for_update(&lk, ctx->graph_name, LOCK_DIE_ON_ERROR);
f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
 
-   hashwrite_be32(f, GRAPH_SIGNATURE);
-
-   hashwrite_u8(f, GRAPH_VERSION);
-   hashwrite_u8(f, oid_version());
-   hashwrite_u8(f, num_chunks);
-   hashwrite_u8(f, 0); /* unused padding byte */
-
chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
chunk_ids[2] = GRAPH_CHUNKID_DATA;
-   if (ctx->num_extra_edges)
-   chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES;
-   else
-   chunk_ids[3] = 0;
-   chunk_ids[4] = 0;
+   if (ctx->num_extra_edges) {
+   chunk_ids[num_chunks] = GRAPH_CHUNKID_EXTRAEDGES;
+   num_chunks++;
+   }
+
+   chunk_ids[num_chunks] = 0;
 
chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr;
chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr;
-   chunk_offsets[4] = chunk_offsets[3] + 4 * ctx->num_extra_edges;
+
+   num_chunks = 3;
+   if (ctx->num_extra_edges) {
+   chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] +
+   4 * ctx->num_extra_edges;
+   num_chunks++;
+   }
+
+   hashwrite_be32(f, GRAPH_SIGNATURE);
+
+   hashwrite_u8(f, GRAPH_VERSION);
+   hashwrite_u8(f, oid_version());
+   hashwrite_u8(f, num_chunks);
+   hashwrite_u8(f, 0);
 
for (i = 0; i <= num_chunks; i++) {
uint32_t chunk_write[3];
-- 
gitgitgadget



[PATCH v2 03/11] commit-graph: rename commit_compare to oid_compare

2019-05-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The helper function commit_compare() actually compares object_id
structs, not commits. A future change to commit-graph.c will need
to sort commit structs, so rename this function in advance.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 3afedcd7f5..e2f438f6a3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -761,7 +761,7 @@ static void write_graph_chunk_extra_edges(struct hashfile 
*f,
}
 }
 
-static int commit_compare(const void *_a, const void *_b)
+static int oid_compare(const void *_a, const void *_b)
 {
const struct object_id *a = (const struct object_id *)_a;
const struct object_id *b = (const struct object_id *)_b;
@@ -1030,7 +1030,7 @@ static uint32_t count_distinct_commits(struct 
write_commit_graph_context *ctx)
_("Counting distinct commits in commit graph"),
ctx->oids.nr);
display_progress(ctx->progress, 0); /* TODO: Measure QSORT() progress */
-   QSORT(ctx->oids.list, ctx->oids.nr, commit_compare);
+   QSORT(ctx->oids.list, ctx->oids.nr, oid_compare);
 
for (i = 1; i < ctx->oids.nr; i++) {
display_progress(ctx->progress, i + 1);
-- 
gitgitgadget



[PATCH v2 08/11] commit-graph: add --split option to builtin

2019-05-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Add a new "--split" option to the 'git commit-graph write' subcommand. This
option allows the optional behavior of writing a commit-graph chain.

The current behavior will add a tip commit-graph containing any commits that
are not in the existing commit-graph or commit-graph chain. Later changes
will allow merging the chain and expiring out-dated files.

Add a new test script (t5323-split-commit-graph.sh) that demonstrates this
behavior.

Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c|  10 ++-
 t/t5323-split-commit-graph.sh | 122 ++
 2 files changed, 129 insertions(+), 3 deletions(-)
 create mode 100755 t/t5323-split-commit-graph.sh

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 828b1a713f..c2c07d3917 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
N_("git commit-graph verify [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append|--split] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -25,7 +25,7 @@ static const char * const builtin_commit_graph_read_usage[] = 
{
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append|--split] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -35,9 +35,9 @@ static struct opts_commit_graph {
int stdin_packs;
int stdin_commits;
int append;
+   int split;
 } opts;
 
-
 static int graph_verify(int argc, const char **argv)
 {
struct commit_graph *graph = NULL;
@@ -156,6 +156,8 @@ static int graph_write(int argc, const char **argv)
N_("start walk at commits listed by stdin")),
OPT_BOOL(0, "append", &opts.append,
N_("include all commits already in the commit-graph 
file")),
+   OPT_BOOL(0, "split", &opts.split,
+   N_("allow writing an incremental commit-graph file")),
OPT_END(),
};
 
@@ -169,6 +171,8 @@ static int graph_write(int argc, const char **argv)
opts.obj_dir = get_object_directory();
if (opts.append)
flags |= COMMIT_GRAPH_APPEND;
+   if (opts.split)
+   flags |= COMMIT_GRAPH_SPLIT;
 
read_replace_refs = 0;
 
diff --git a/t/t5323-split-commit-graph.sh b/t/t5323-split-commit-graph.sh
new file mode 100755
index 00..96704b9f5b
--- /dev/null
+++ b/t/t5323-split-commit-graph.sh
@@ -0,0 +1,122 @@
+#!/bin/sh
+
+test_description='split commit graph'
+. ./test-lib.sh
+
+GIT_TEST_COMMIT_GRAPH=0
+
+test_expect_success 'setup repo' '
+   git init &&
+   git config core.commitGraph true &&
+   infodir=".git/objects/info" &&
+   graphdir="$infodir/commit-graphs" &&
+   test_oid_init
+'
+
+graph_read_expect() {
+   NUM_BASE=0
+   if test ! -z $2
+   then
+   NUM_BASE=$2
+   fi
+   cat >expect <<- EOF
+   header: 43475048 1 1 3 $NUM_BASE
+   num_commits: $1
+   chunks: oid_fanout oid_lookup commit_metadata
+   EOF
+   git commit-graph read >output &&
+   test_cmp expect output
+}
+
+test_expect_success 'create commits and write commit-graph' '
+   for i in $(test_seq 3)
+   do
+   test_commit $i &&
+   git branch commits/$i
+   done &&
+   git commit-graph write --reachable &&
+   test_path_is_file $infodir/commit-graph &&
+   graph_read_expect 3
+'
+
+graph_git_two_modes() {
+   git -c core.commitGraph=true $1 >output
+   git -c core.commitGraph=false $1 >expect
+   test_cmp expect output
+}
+
+graph_git_behavior() {
+   MSG=$1
+   BRANCH=$2
+   COMPARE=$3
+   test_expect_success "check normal git operations: $MSG" '
+   graph_git_two_modes "log --oneline $BRANCH" &&
+   graph_git_two_modes "log --topo-order $BRANCH" &&
+   graph_git_two_modes "log --graph $COMPARE..$BRANCH" &&
+   graph_git_two_modes "branch -vv" &&
+   graph_git_two_modes "merge-base -a $BRANCH $COMPARE"
+   '
+}
+
+graph_git_behavior 'graph exists' commits/3 commits/1
+
+verify_chain_files_exist() {
+   for hash in $(cat $1/commit-graph-chain)
+   do
+   test_path_is_file $1/graph-$hash.graph
+   done
+}
+
+test_expect_success 'add more commits, and write a new base graph' '
+   git reset --hard commits/1 &&
+   for i in $(test_seq 4 5)
+   do
+   test_commit $i &&
+   g

[PATCH v2 01/11] commit-graph: document commit-graph chains

2019-05-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Add a basic description of commit-graph chains. More details about the
feature will be added as we add functionality. This introduction gives a
high-level overview to the goals of the feature and the basic layout of
commit-graph chains.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 59 
 1 file changed, 59 insertions(+)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index fb53341d5e..1dca3bd8fe 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -127,6 +127,65 @@ Design Details
   helpful for these clones, anyway. The commit-graph will not be read or
   written when shallow commits are present.
 
+Commit Graphs Chains
+
+
+Typically, repos grow with near-constant velocity (commits per day). Over time,
+the number of commits added by a fetch operation is much smaller than the
+number of commits in the full history. By creating a "chain" of commit-graphs,
+we enable fast writes of new commit data without rewriting the entire commit
+history -- at least, most of the time.
+
+## File Layout
+
+A commit-graph chain uses multiple files, and we use a fixed naming convention
+to organize these files. Each commit-graph file has a name
+`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the hex-
+valued hash stored in the footer of that file (which is a hash of the file's
+contents before that hash). For a chain of commit-graph files, a plain-text
+file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the
+hashes for the files in order from "lowest" to "highest".
+
+For example, if the `commit-graph-chain` file contains the lines
+
+```
+   {hash0}
+   {hash1}
+   {hash2}
+```
+
+then the commit-graph chain looks like the following diagram:
+
+ +---+
+ |  graph-{hash2}.graph  |
+ +---+
+ |
+ +---+
+ |   |
+ |  graph-{hash1}.graph  |
+ |   |
+ +---+
+ |
+ +---+
+ |   |
+ |   |
+ |   |
+ |  graph-{hash0}.graph  |
+ |   |
+ |   |
+ |   |
+ +---+
+
+Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of
+commits in `graph-{hash1}.graph`, and X2 be the number of commits in
+`graph-{hash2}.graph`. If a commit appears in position i in 
`graph-{hash2}.graph`,
+then we interpret this as being the commit in position (X0 + X1 + i), and that
+will be used as its "graph position". The commits in `graph-{hash2}.graph` use 
these
+positions to refer to their parents, which may be in `graph-{hash1}.graph` or
+`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by 
checking
+its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 +
+X2).
+
 Related Links
 -
 [0] https://bugs.chromium.org/p/git/issues/detail?id=8
-- 
gitgitgadget



[PATCH v2 02/11] commit-graph: prepare for commit-graph chains

2019-05-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

To prepare for a chain of commit-graph files, augment the
commit_graph struct to point to a base commit_graph. As we load
commits from the graph, we may actually want to read from a base
file according to the graph position.

The "graph position" of a commit is given by concatenating the
lexicographic commit orders from each of the commit-graph files in
the chain. This means that we must distinguish two values:

 * lexicographic index : the position within the lexicographic
   order in a single commit-graph file.

 * graph position: the posiiton within the concatenated order
   of multiple commit-graph files

Given the lexicographic index of a commit in a graph, we can
compute the graph position by adding the number of commits in
the lower-level graphs. To find the lexicographic index of
a commit, we subtract the number of commits in lower-level graphs.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 74 --
 commit-graph.h |  3 ++
 2 files changed, 69 insertions(+), 8 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7723156964..3afedcd7f5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -371,6 +371,25 @@ static int bsearch_graph(struct commit_graph *g, struct 
object_id *oid, uint32_t
g->chunk_oid_lookup, g->hash_len, pos);
 }
 
+static void load_oid_from_graph(struct commit_graph *g, int pos, struct 
object_id *oid)
+{
+   uint32_t lex_index;
+
+   if (!g)
+   BUG("NULL commit-graph");
+
+   while (pos < g->num_commits_in_base)
+   g = g->base_graph;
+
+   if (pos >= g->num_commits + g->num_commits_in_base)
+   BUG("position %d is beyond the scope of this commit-graph (%d 
local + %d base commits)",
+   pos, g->num_commits, g->num_commits_in_base);
+
+   lex_index = pos - g->num_commits_in_base;
+
+   hashcpy(oid->hash, g->chunk_oid_lookup + g->hash_len * lex_index);
+}
+
 static struct commit_list **insert_parent_or_die(struct repository *r,
 struct commit_graph *g,
 uint64_t pos,
@@ -379,10 +398,10 @@ static struct commit_list **insert_parent_or_die(struct 
repository *r,
struct commit *c;
struct object_id oid;
 
-   if (pos >= g->num_commits)
+   if (pos >= g->num_commits + g->num_commits_in_base)
die("invalid parent position %"PRIu64, pos);
 
-   hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
+   load_oid_from_graph(g, pos, &oid);
c = lookup_commit(r, &oid);
if (!c)
die(_("could not find commit %s"), oid_to_hex(&oid));
@@ -392,7 +411,14 @@ static struct commit_list **insert_parent_or_die(struct 
repository *r,
 
 static void fill_commit_graph_info(struct commit *item, struct commit_graph 
*g, uint32_t pos)
 {
-   const unsigned char *commit_data = g->chunk_commit_data + 
GRAPH_DATA_WIDTH * pos;
+   const unsigned char *commit_data;
+   uint32_t lex_index;
+
+   while (pos < g->num_commits_in_base)
+   g = g->base_graph;
+
+   lex_index = pos - g->num_commits_in_base;
+   commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
item->graph_pos = pos;
item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
@@ -405,10 +431,26 @@ static int fill_commit_in_graph(struct repository *r,
uint32_t *parent_data_ptr;
uint64_t date_low, date_high;
struct commit_list **pptr;
-   const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len 
+ 16) * pos;
+   const unsigned char *commit_data;
+   uint32_t lex_index;
 
-   item->object.parsed = 1;
+   while (pos < g->num_commits_in_base)
+   g = g->base_graph;
+
+   if (pos >= g->num_commits + g->num_commits_in_base)
+   BUG("position %d is beyond the scope of this commit-graph (%d 
local + %d base commits)",
+   pos, g->num_commits, g->num_commits_in_base);
+
+   /*
+* Store the "full" position, but then use the
+* "local" position for the rest of the calculation.
+*/
item->graph_pos = pos;
+   lex_index = pos - g->num_commits_in_base;
+
+   commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
+
+   item->object.parsed = 1;
 
item->maybe_tree = NULL;
 
@@ -452,7 +494,18 @@ static int find_commit_in_graph(struct commit *item, 
struct commit_graph *g, uin
*pos = item->graph_pos;
return 1;
} else {
-   return bsearch_graph(g, &(item->object.oid), pos);
+   struct commit_graph *cur_g = g;
+   uint32_t lex_index;
+
+   while (cur_g && !bsearch_graph(cur_g, &(item->object.oid), 
&lex_index))
+   cur_g = cur_g->base_graph;
+
+   if (cur_

[PATCH v2 10/11] commit-graph: allow cross-alternate chains

2019-05-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

In an environment like a fork network, it is helpful to have a
commit-graph chain that spans both the base repo and the fork repo. The
fork is usually a small set of data on top of the large repo, but
sometimes the fork is much larger. For example, git-for-windows/git has
almost double the number of commits as git/git because it rebases its
commits on every major version update.

To allow cross-alternate commit-graph chains, we need a few pieces:

1. When looking for a graph-{hash}.graph file, check all alternates.

2. When merging commit-graph chains, do not merge across alternates.

3. When writing a new commit-graph chain based on a commit-graph file
   in another object directory, do not allow success if the base file
   has of the name "commit-graph" instead of
   "commit-graphs/graoh-{hash}.graph".

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 40 +
 commit-graph.c   | 46 ++--
 commit-graph.h   |  1 +
 t/t5323-split-commit-graph.sh| 37 +++
 4 files changed, 114 insertions(+), 10 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index 083fff9927..b6fe8b2321 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -267,6 +267,42 @@ The merge strategy values (2 for the size multiple, 64,000 
for the maximum
 number of commits) could be extracted into config settings for full
 flexibility.
 
+## Chains across multiple object directories
+
+In a repo with alternates, we look for the `commit-graph-chain` file starting
+in the local object directory and then in each alternate. The first file that
+exists defines our chain. As we look for the `graph-{hash}` files for
+each `{hash}` in the chain file, we follow the same pattern for the host
+directories.
+
+This allows commit-graphs to be split across multiple forks in a fork network.
+The typical case is a large "base" repo with many smaller forks.
+
+As the base repo advances, it will likely update and merge its commit-graph
+chain more frequently than the forks. If a fork updates their commit-graph 
after
+the base repo, then it should "reparent" the commit-graph chain onto the new
+chain in the base repo. When reading each `graph-{hash}` file, we track
+the object directory containing it. During a write of a new commit-graph file,
+we check for any changes in the source object directory and read the
+`commit-graph-chain` file for that source and create a new file based on those
+files. During this "reparent" operation, we necessarily need to collapse all
+levels in the fork, as all of the files are invalid against the new base file.
+
+It is crucial to be careful when cleaning up "unreferenced" 
`graph-{hash}.graph`
+files in this scenario. It falls to the user to define the proper settings for
+their custom environment:
+
+ 1. When merging levels in the base repo, the unreferenced files may still be
+referenced by chains from fork repos.
+
+ 2. The expiry time should be set to a length of time such that every fork has
+time to recompute their commit-graph chain to "reparent" onto the new base
+file(s).
+
+ 3. If the commit-graph chain is updated in the base, the fork will not have
+access to the new chain until its chain is updated to reference those 
files.
+(This may change in the future [5].)
+
 Related Links
 -
 [0] https://bugs.chromium.org/p/git/issues/detail?id=8
@@ -293,3 +329,7 @@ Related Links
 
 [4] 
https://public-inbox.org/git/20180108154822.54829-1-...@jeffhostetler.com/T/#u
 A patch to remove the ahead-behind calculation from 'status'.
+
+[5] 
https://public-inbox.org/git/f27db281-abad-5043-6d71-cbb083b1c...@gmail.com/
+A discussion of a "two-dimensional graph position" that can allow reading
+multiple commit-graph chains at the same time.
diff --git a/commit-graph.c b/commit-graph.c
index e9784cd559..0d8b942e2b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -320,6 +320,9 @@ static int prepare_commit_graph_v1(struct repository *r, 
const char *obj_dir)
r->objects->commit_graph = load_commit_graph_one(graph_name);
free(graph_name);
 
+   if (r->objects->commit_graph)
+   r->objects->commit_graph->obj_dir = obj_dir;
+
return r->objects->commit_graph ? 0 : -1;
 }
 
@@ -380,8 +383,7 @@ static void prepare_commit_graph_chain(struct repository 
*r, const char *obj_dir
oids = xcalloc(st.st_size / (the_hash_algo->hexsz + 1), sizeof(struct 
object_id));
 
while (strbuf_getline_lf(&line, fp) != EOF && valid) {
-   char *graph_name;
-   struct commit_graph *g;
+   struct object_directory *odb;
 
if (get_oid_hex(line.buf, &oids[i])) {
warning(_("invalid commit-graph chain: line '%s' not a 
hash"),

[PATCH v2 00/11] [RFC] Commit-graph: Write incremental files

2019-05-22 Thread Derrick Stolee via GitGitGadget
This patch series is marked as RFC quality because it is missing some key
features and tests, but hopefully starts a concrete discussion of how the
incremental commit-graph writes can work. This version takes the design
suggestions from the earlier discussion and tries to work out most of the
concerns.

The commit-graph is a valuable performance feature for repos with large
commit histories, but suffers from the same problem as git repack: it
rewrites the entire file every time. This can be slow when there are
millions of commits, especially after we stopped reading from the
commit-graph file during a write in 43d3561 (commit-graph write: don't die
if the existing graph is corrupt).

Instead, create a "chain" of commit-graphs in the
.git/objects/info/commit-graphs folder with name graph-{hash}.graph. The
list of hashes is given by the commit-graph-chain file, and also in a "base
graph chunk" in the commit-graph format. As we read a chain, we can verify
that the hashes match the trailing hash of each commit-graph we read along
the way and each hash below a level is expected by that graph file.

When writing, we don't always want to add a new level to the stack. This
would eventually result in performance degradation, especially when
searching for a commit (before we know its graph position). We decide to
merge levels of the stack when the new commits we will write satisfy two
conditions:

 1. The expected size of the new file is more than half the size of the tip
of the stack.
 2. The new file contains more than 64,000 commits.

The first condition alone would prevent more than a logarithmic number of
levels. The second condition is a stop-gap to prevent performance issues
when another process starts reading the commit-graph stack as we are merging
a large stack of commit-graph files. The reading process could be in a state
where the new file is not ready, but the levels above the new file were
already deleted. Thus, the commits that were merged down must be parsed from
pack-files.

The performance is necessarily amortized across multiple writes, so I tested
by writing commit-graphs from the (non-rc) tags in the Linux repo. My test
included 72 tags, and wrote everything reachable from the tag using 
--stdin-commits. Here are the overall perf numbers:

git commit-graph write --stdin-commits: 8m 12s
git commit-graph write --stdin-commits --split:48s

The test using --split included at least six full collapses to the full
commit-graph. I believe the commit-graph stack had at most three levels
during this test.

Here are a few points that still need to be addressed before this is ready
for full review:

 * The merge strategy values should be extracted into config options.
   
   
 * If we have a commit-graph chain and someone writes without "--split" it
   will make a new commit-graph file and not clean up the old files.
   
   
 * We need to update 'git commit-graph verify' to understand the chains, and
   test that it catches the new problems. It would be good to have a
   '--shallow' option to only verify the tip file, as if we run that after
   every write we can have some confidence that the files at rest are still
   valid and we only need to check the smaller file. (This is the main
   reason this is a priority to the VFS for Git team.)
   
   

This is based on ds/commit-graph-write-refactor.

Thanks, -Stolee

[1] 
https://github.com/git/git/commit/43d356180556180b4ef6ac232a14498a5bb2b446
commit-graph write: don't die if the existing graph is corrupt

Derrick Stolee (11):
  commit-graph: document commit-graph chains
  commit-graph: prepare for commit-graph chains
  commit-graph: rename commit_compare to oid_compare
  commit-graph: load commit-graph chains
  commit-graph: add base graphs chunk
  commit-graph: rearrange chunk count logic
  commit-graph: write commit-graph chains
  commit-graph: add --split option to builtin
  commit-graph: merge commit-graph chains
  commit-graph: allow cross-alternate chains
  commit-graph: expire commit-graph files

 .../technical/commit-graph-format.txt |  11 +-
 Documentation/technical/commit-graph.txt  | 195 +
 builtin/commit-graph.c|  10 +-
 commit-graph.c| 734 +-
 commit-graph.h|   7 +
 t/t5318-commit-graph.sh   |   2 +-
 t/t5323-split-commit-graph.sh | 172 
 7 files changed, 1088 insertions(+), 43 deletions(-)
 create mode 100755 t/t5323-split-commit-graph.sh


base-commit: 8520d7fc7c6edd4d71582c69a873436029b6cb1b
Published-As: 
https://github.com/gitgitgadget/git/releases/tag/pr-184%2Fderrickstolee%2Fgraph%2Fincremental-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-184/derrickstolee/graph/incremental-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/184

Range-diff vs v1:

  1:  0be7713a25 <  -:  -- commit-graph: fix the_repository reference
  2:  86b18f

[PATCH] fetch-pack: send server options after command

2019-05-22 Thread Jonathan Tan
Currently, if any server options are specified during a protocol v2
fetch, server options will be sent before "command=fetch". Write server
options to the request buffer in send_fetch_request() so that the
components of the request are sent in the correct order.

The protocol documentation states that the command must come first. The
Git server implementation in serve.c (see process_request() in that
file) tolerates any order of command and capability, which is perhaps
why we haven't noticed this. This was noticed when testing against a
JGit server implementation, which follows the documentation in this
regard.

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 3f24d0c8a6..1c10f54e78 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1115,7 +1115,7 @@ static int send_fetch_request(struct fetch_negotiator 
*negotiator, int fd_out,
server_supports_v2("server-option", 1)) {
int i;
for (i = 0; i < args->server_options->nr; i++)
-   packet_write_fmt(fd_out, "server-option=%s",
+   packet_buf_write(&req_buf, "server-option=%s",
 args->server_options->items[i].string);
}
 
-- 
2.21.0.1020.gf2820cf01a-goog



Re: [PATCH 4/5] send-email: fix regression in sendemail.identity parsing

2019-05-22 Thread Johannes Schindelin
Hi Ævar,

On Fri, 17 May 2019, Ævar Arnfjörð Bjarmason wrote:

> diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
> index 61d484d1a6..890e2874c3 100755
> --- a/t/t9001-send-email.sh
> +++ b/t/t9001-send-email.sh
> @@ -1228,6 +1228,61 @@ test_expect_success $PREREQ 'sendemail.to works' '
>   grep "To: Somebody " stdout
>  '
>
> +test_expect_success $PREREQ 'setup sendemail.identity' '
> + git config --replace-all sendemail.to "defa...@example.com" &&
> + git config --replace-all sendemail.isp.to "i...@example.com" &&
> + git config --replace-all sendemail.cloud.to "cl...@example.com"
> +'
> +
> +test_expect_success $PREREQ 'sendemail.identity: reads the correct identity 
> config' '
> + git -c sendemail.identity=cloud send-email \
> + --dry-run \
> + --from="nob...@example.com" \
> + $patches >stdout &&
> + grep "To: cl...@example.com" stdout
> +'
> +
> +test_expect_success $PREREQ 'sendemail.identity: identity overrides 
> sendemail.identity' '
> + git -c sendemail.identity=cloud send-email \
> + --identity=isp \
> + --dry-run \
> + --from="nob...@example.com" \
> + $patches >stdout &&
> + grep "To: i...@example.com" stdout
> +'
> +
> +test_expect_success $PREREQ 'sendemail.identity: --no-identity clears 
> previous identity' '
> + git -c sendemail.identity=cloud send-email \
> + --no-identity \
> + --dry-run \
> + --from="nob...@example.com" \
> + $patches >stdout &&
> + grep "To: defa...@example.com" stdout
> +'
> +
> +test_expect_success $PREREQ 'sendemail.identity: bool identity variable 
> existance overrides' '
> + git -c sendemail.identity=cloud \
> + -c sendemail.xmailer=true \
> + -c sendemail.cloud.xmailer=false \
> + send-email \
> + --dry-run \
> + --from="nob...@example.com" \
> + $patches >stdout &&
> + grep "To: cl...@example.com" stdout &&
> + ! grep "X-Mailer" stdout
> +'
> +
> +test_expect_success $PREREQ 'sendemail.identity: bool variable fallback' '
> + git -c sendemail.identity=cloud \
> + -c sendemail.xmailer=false \
> + send-email \
> + --dry-run \
> + --from="nob...@example.com" \
> + $patches >stdout &&
> + grep "To: cl...@example.com" stdout &&
> + ! grep "X-Mailer" stdout
> +'
> +

These test cases all diligently use the `$PREREQ` prerequisite, but...

>  test_expect_success $PREREQ '--no-to overrides sendemail.to' '
>   git send-email \
>   --dry-run \
> @@ -1785,6 +1840,15 @@ test_expect_success '--dump-aliases must be used 
> alone' '
>   test_must_fail git send-email --dump-aliases --to=jan...@example.com -1 
> refs/heads/accounting
>  '
>
> +test_expect_success 'aliases and sendemail.identity' '
> + test_must_fail git \
> + -c sendemail.identity=cloud \
> + -c sendemail.aliasesfile=default-aliases \
> + -c sendemail.cloud.aliasesfile=cloud-aliases \
> + send-email -1 2>stderr &&
> + test_i18ngrep "cloud-aliases" stderr
> +'
> +

This one is missing it. That breaks the Windows job in our Azure Pipeline
where we leave out all of the Perl bits (to accelerate the tests somewhat).

Ciao,
Dscho

>  test_sendmail_aliases () {
>   msg="$1" && shift &&
>   expect="$@" &&
> --
> 2.21.0.1020.gf2820cf01a
>
>
>


[GSoC] Blogging with Rohit

2019-05-22 Thread Rohit Ashiwal
Hey all

Here I am with one more blog[1], where I talked about how we are
implementing `--skip` flag for git-cherry-pick. How you'll enjoy
reading this.

Thanks
Rohit

[1]: https://rashiwal.me/2019/to-pick-or-not-to-pick/
PS: comment section is now working, please leave your constructive remarks.



Re: [GSoC] Blogging with Rohit

2019-05-22 Thread Rohit Ashiwal
s/How/Hope/ 😅

> Thanks
> Rohit
> 
> [1]: https://rashiwal.me/2019/to-pick-or-not-to-pick/
> PS: comment section is now working, please leave your constructive remarks.
> 


Re: [PATCH v2] status: add an empty line when there is no hint

2019-05-22 Thread 林自均
Hi Junio,

林自均  於 2019年5月15日 週三 上午11:02寫道:
>
> Hi Junio,
>
> Junio C Hamano  於 2019年5月15日 週三 上午8:48寫道:
> >
> > 林自均  writes:
> >
> > > I was not talking about the messages in the editor session. I was
> > > talking about "git commit" without "git add" anything.
> > >
> > > For example:
> > >
> > > ```
> > > $ touch newfile.txt
> > > $ git commit
> > > On branch master
> > > Untracked files:
> > > newfile.txt
> > >
> > > nothing added to commit but untracked files present
> > > ```
> > >
> > > My current patch is trying to add an empty line between
> > > "Untracked files:" and "newfile.txt".
> >
> > I do not think that one is paged, so if you ask me, I'd say we
> > shouldn't add an extra blank there.  Is that message also reused in
> > the editor session, or do two different codepaths produce a similar
> > looking message, one for the above case direct to the terminal and
> > the other for the editor session?
>
> The messages produced in wt-status.c seem to be reused in
> both terminal and editor session. When I tried to modify the
> messages in terminal, the ones in editor session will also
> be modified accordingly.
>
> By the way, my new patch to remove extra blank line is here:
> https://github.com/gitgitgadget/git/pull/196
> I am still waiting for someone to comment "/allow johnlinp".

I've submitted the patch with the title
[PATCH 1/1] status: remove the empty line after hints
Please review if you are available. Thank you.

Best,
John Lin

>
> >
> > But again...
> >
> > >> At the same time, I think I've been happy enough with the current
> > >> output from both commands, so if you let me bikeshed freely, I'd
> > >> probably pick "let's not change anything then" ;-)
> >


git describe/contains for submodule commits

2019-05-22 Thread Jacob Keller
Hi,

I've had a few times where I was curious of when a submodule got set
to a specific commit.

I noticed that git describe has "blob" support, which outputs something like

:/path/to/file

using the revision walking machinery.

I'm curious if anyone knows if that sort of revision walk could be
expected to find the first treeish that had a submodule commit instead
of a blob.

I'm not that familiar with the revision walking, so I was hoping to
get some pointers of whre to look before I began implementing.

Ultimately, I'd like to have some sort of command like:

  git submodule contains  

and have it try to figure out the most recent commit htat has a
submodule change for which the submodule is a child of the specified
submodule commit.

I can sort of reverse engineer this through git log, but it's slow and
tedious, so I was hoping to be able to implement it into a revision
walk that did this.

Once I know the commit that introduces the submodule change, I could
feed that to git describe --contains to find the tag/version which
included the change easily enough.

Thanks,
Jake


Re: [PATCH] Make stashing nothing exit 1

2019-05-22 Thread Maksim Odnoletkov
Junio C Hamano  writes:

> Keith Smiley  writes:
> 
> > In the case there are no files to stash, but the user asked to stash, we
> > should exit 1 since the stashing failed.
> > ---
> 
> Sorry, but I fail to see why this is a good change.  Did you have
> some script that wanted the exit code from "git stash" to indicate
> if it had anything to stash and change the behaviour based on it?
> 
> Is it a big enough hassle to figure out if the "stash" command did
> something yourself that justifies forcing existing scripts that rely
> on "no-op is merely a normal exit" behaviour other people have
> written in the past several years?

The problem with current behaviour is it makes it hard to use stash in
scripts. A natural stash use case is: wrap some operation requiring a
clean working tree with a stash push-pop pair. But that doesn't work
properly when working tree is already clean - push silently does nothing
and following pop becomes unbalanced. You have to keep that in mind and
work around with something like:

if ! git diff-index --exit-code --quiet HEAD
then
git stash push
trap 'git stash pop' EXIT
fi

With this change this can be simplified to:

git stash push && trap 'git stash pop' EXIT

I don't mind keeping this new behaviour behind an option for
compatibility. Or alternatively resolve this use case by supporting
--allow-empty in stash-push. But my feeling is it is natural for 'git
stash push' to report error for no-op case because the command is
explicitly about creating new stash entries. A close analogy is 'git
commit' which errors on no-op. Contrary all commands treating no-op as a
success I'm aware of are not about creating new objects but about
querying or syncing.


Re: git describe/contains for submodule commits

2019-05-22 Thread Ævar Arnfjörð Bjarmason


On Thu, May 23 2019, Jacob Keller wrote:

> Hi,
>
> I've had a few times where I was curious of when a submodule got set
> to a specific commit.
>
> I noticed that git describe has "blob" support, which outputs something like
>
> :/path/to/file
>
> using the revision walking machinery.
>
> I'm curious if anyone knows if that sort of revision walk could be
> expected to find the first treeish that had a submodule commit instead
> of a blob.
>
> I'm not that familiar with the revision walking, so I was hoping to
> get some pointers of whre to look before I began implementing.
>
> Ultimately, I'd like to have some sort of command like:
>
>   git submodule contains  
>
> and have it try to figure out the most recent commit htat has a
> submodule change for which the submodule is a child of the specified
> submodule commit.
>
> I can sort of reverse engineer this through git log, but it's slow and
> tedious, so I was hoping to be able to implement it into a revision
> walk that did this.
>
> Once I know the commit that introduces the submodule change, I could
> feed that to git describe --contains to find the tag/version which
> included the change easily enough.
>
> Thanks,
> Jake

You can do this with --find-object, e.g. on git.git:

git log --find-object=855827c583bc30645ba427885caa40c5b81764d2

Plugging that into describe.c should be fairly straightforward.


Re: [PATCH] Make stashing nothing exit 1

2019-05-22 Thread Ævar Arnfjörð Bjarmason


On Thu, May 23 2019, Maksim Odnoletkov wrote:

> Junio C Hamano  writes:
>
>> Keith Smiley  writes:
>>
>> > In the case there are no files to stash, but the user asked to stash, we
>> > should exit 1 since the stashing failed.
>> > ---
>>
>> Sorry, but I fail to see why this is a good change.  Did you have
>> some script that wanted the exit code from "git stash" to indicate
>> if it had anything to stash and change the behaviour based on it?
>>
>> Is it a big enough hassle to figure out if the "stash" command did
>> something yourself that justifies forcing existing scripts that rely
>> on "no-op is merely a normal exit" behaviour other people have
>> written in the past several years?
>
> The problem with current behaviour is it makes it hard to use stash in
> scripts. A natural stash use case is: wrap some operation requiring a
> clean working tree with a stash push-pop pair. But that doesn't work
> properly when working tree is already clean - push silently does nothing
> and following pop becomes unbalanced. You have to keep that in mind and
> work around with something like:
>
> if ! git diff-index --exit-code --quiet HEAD
> then
>   git stash push
>   trap 'git stash pop' EXIT
> fi
>
> With this change this can be simplified to:
>
> git stash push && trap 'git stash pop' EXIT
>
> I don't mind keeping this new behaviour behind an option for
> compatibility. Or alternatively resolve this use case by supporting
> --allow-empty in stash-push. But my feeling is it is natural for 'git
> stash push' to report error for no-op case because the command is
> explicitly about creating new stash entries. A close analogy is 'git
> commit' which errors on no-op. Contrary all commands treating no-op as a
> success I'm aware of are not about creating new objects but about
> querying or syncing.

I view "stash push" more like just "push", or even a special case for
"reset --hard", in both of those cases we don't exit non-zero if there's
nothing to do, i.e. if there's nothing to push, or if "reset --hard"
ends up needing to do nothing.

On the other hand as you point out it can also be viewed as "create new
stash entry", just like "create new commit", and we error out on that.

In practice I bet there's very few scripters of "git commit" that don't
want to actually create a commit, whereas "stash push" is more likely to
be used like "reset --hard", i.e. just "wipe/save-wipe changes if
needed".

I don't mind an --exit-code for it, or even a change in the default
behavior, just pointing out that it's a bit more nuanced than just a
missing exit code, given "push", "reset" etc. prior art.


[PATCH] send-email: Add an option to suppress adding a specific email address

2019-05-22 Thread Eric W. Biederman


Make it easy to suppress sta...@vger.kernel.org.  Long story short it
is desirable to have ``Cc: sta...@vger.kernel.org'' on many bug fixes
sent to the linux kernel.  It is not always desirable to actually the
stable maintainer immediately as the patches are still being reviewed
etc.  Actually cc'd the stable maintainers in the linux kernel is not
even really necessary as they will always find the tag after the patch
has been merged in the commit body.

So I am adding yet another suppress command "suppress-addr" that will
take an email address keep that email address from being automatically
added to a destination the email will be sent to.

Signed-off-by: "Eric W. Biederman" 
---
 Documentation/git-send-email.txt |  5 +
 git-send-email.perl  | 20 +++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 1afe9fc858ea..9833d4dbd9f4 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -346,6 +346,11 @@ Default is the value of `sendemail.suppresscc` 
configuration value; if
 that is unspecified, default to 'self' if --suppress-from is
 specified, as well as 'body' if --no-signed-off-cc is specified.
 
+--suppress-addr=::
+   Specify an address that should not be automatically copied
+   on any email.
+   Default is the value of `sendemail.suppressaddr`.
+
 --[no-]suppress-from::
If this is set, do not add the From: address to the cc: list.
Default is the value of `sendemail.suppressFrom` configuration
diff --git a/git-send-email.perl b/git-send-email.perl
index 8eb63b5a2f8d..2ac0985f3f00 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -98,6 +98,7 @@ sub usage {
 --to-cmd  * Email To: via ` \$patch_path`
 --cc-cmd  * Email Cc: via ` \$patch_path`
 --suppress-cc * author, self, sob, cc, cccmd, body, 
bodycc, misc-by, all.
+--suppress-addr   * Don't automatically add the specified 
address
 --[no-]cc-cover* Email Cc: addresses in the cover letter.
 --[no-]to-cover* Email To: addresses in the cover letter.
 --[no-]signed-off-by-cc* Send to Signed-off-by: addresses. Default 
on.
@@ -237,6 +238,7 @@ sub do_edit {
 my ($identity, $aliasfiletype, @alias_files, $smtp_domain, $smtp_auth);
 my ($validate, $confirm);
 my (@suppress_cc);
+my (@suppress_addr);
 my ($auto_8bit_encoding);
 my ($compose_encoding);
 my $target_xfer_encoding = 'auto';
@@ -274,6 +276,7 @@ sub do_edit {
 "aliasfiletype" => \$aliasfiletype,
 "bcc" => \@bcclist,
 "suppresscc" => \@suppress_cc,
+"suppressaddr" => \@suppress_addr,
 "envelopesender" => \$envelope_sender,
 "confirm"   => \$confirm,
 "from" => \$sender,
@@ -360,6 +363,7 @@ sub signal_handler {
"suppress-from!" => \$suppress_from,
"no-suppress-from" => sub {$suppress_from = 0},
"suppress-cc=s" => \@suppress_cc,
+   "suppress-addr=s" => \@suppress_addr,
"signed-off-cc|signed-off-by-cc!" => \$signed_off_by_cc,
"no-signed-off-cc|no-signed-off-by-cc" => sub 
{$signed_off_by_cc = 0},
"cc-cover|cc-cover!" => \$cover_cc,
@@ -489,6 +493,16 @@ sub read_config {
delete $suppress_cc{'body'};
 }
 
+# Set specific suppress addresses
+my (%suppress_addr);
+if (@suppress_addr) {
+   foreach my $addr (@suppress_addr) {
+   my $qaddr = unquote_rfc2047($addr);
+   my $saddr = sanitize_address($qaddr);
+   $suppress_addr{$saddr} = 1;
+   }
+}
+
 # Set confirm's default value
 my $confirm_unconfigured = !defined $confirm;
 if ($confirm_unconfigured) {
@@ -1623,6 +1637,7 @@ sub process_file {
$sauthor = sanitize_address($author);
next if $suppress_cc{'author'};
next if $suppress_cc{'self'} and $sauthor eq 
$sender;
+   next if ($suppress_addr{$sauthor});
printf(__("(mbox) Adding cc: %s from line 
'%s'\n"),
$1, $_) unless $quiet;
push @cc, $1;
@@ -1642,6 +1657,7 @@ sub process_file {
next if ($suppress_cc{'self'});
} else {
next if ($suppress_cc{'cc'});
+   next if 
($suppress_addr{$saddr});
}
printf(__("(mbox) Adding cc: %s from 
line '%s'\n"),
$addr, $_) unless $quiet;
@@ -1681,7 +1697,7 @@ sub process_file {
# line 2 = subject

[PATCH] send-email: Add an option to suppress adding a specific email address

2019-05-22 Thread Eric W. Biederman


Make it easy to suppress sta...@vger.kernel.org.  Long story short it
is desirable to have ``Cc: sta...@vger.kernel.org'' on many bug fixes
sent to the linux kernel.  It is not always desirable to actually the
stable maintainer immediately as the patches are still being reviewed
etc.  Actually cc'd the stable maintainers in the linux kernel is not
even really necessary as they will always find the tag after the patch
has been merged in the commit body.

So I am adding yet another suppress command "suppress-addr" that will
take an email address keep that email address from being automatically
added to a destination the email will be sent to.

Signed-off-by: "Eric W. Biederman" 
---
 Documentation/git-send-email.txt |  5 +
 git-send-email.perl  | 20 +++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 1afe9fc858ea..9833d4dbd9f4 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -346,6 +346,11 @@ Default is the value of `sendemail.suppresscc` 
configuration value; if
 that is unspecified, default to 'self' if --suppress-from is
 specified, as well as 'body' if --no-signed-off-cc is specified.
 
+--suppress-addr=::
+   Specify an address that should not be automatically copied
+   on any email.
+   Default is the value of `sendemail.suppressaddr`.
+
 --[no-]suppress-from::
If this is set, do not add the From: address to the cc: list.
Default is the value of `sendemail.suppressFrom` configuration
diff --git a/git-send-email.perl b/git-send-email.perl
index 8eb63b5a2f8d..2ac0985f3f00 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -98,6 +98,7 @@ sub usage {
 --to-cmd  * Email To: via ` \$patch_path`
 --cc-cmd  * Email Cc: via ` \$patch_path`
 --suppress-cc * author, self, sob, cc, cccmd, body, 
bodycc, misc-by, all.
+--suppress-addr   * Don't automatically add the specified 
address
 --[no-]cc-cover* Email Cc: addresses in the cover letter.
 --[no-]to-cover* Email To: addresses in the cover letter.
 --[no-]signed-off-by-cc* Send to Signed-off-by: addresses. Default 
on.
@@ -237,6 +238,7 @@ sub do_edit {
 my ($identity, $aliasfiletype, @alias_files, $smtp_domain, $smtp_auth);
 my ($validate, $confirm);
 my (@suppress_cc);
+my (@suppress_addr);
 my ($auto_8bit_encoding);
 my ($compose_encoding);
 my $target_xfer_encoding = 'auto';
@@ -274,6 +276,7 @@ sub do_edit {
 "aliasfiletype" => \$aliasfiletype,
 "bcc" => \@bcclist,
 "suppresscc" => \@suppress_cc,
+"suppressaddr" => \@suppress_addr,
 "envelopesender" => \$envelope_sender,
 "confirm"   => \$confirm,
 "from" => \$sender,
@@ -360,6 +363,7 @@ sub signal_handler {
"suppress-from!" => \$suppress_from,
"no-suppress-from" => sub {$suppress_from = 0},
"suppress-cc=s" => \@suppress_cc,
+   "suppress-addr=s" => \@suppress_addr,
"signed-off-cc|signed-off-by-cc!" => \$signed_off_by_cc,
"no-signed-off-cc|no-signed-off-by-cc" => sub 
{$signed_off_by_cc = 0},
"cc-cover|cc-cover!" => \$cover_cc,
@@ -489,6 +493,16 @@ sub read_config {
delete $suppress_cc{'body'};
 }
 
+# Set specific suppress addresses
+my (%suppress_addr);
+if (@suppress_addr) {
+   foreach my $addr (@suppress_addr) {
+   my $qaddr = unquote_rfc2047($addr);
+   my $saddr = sanitize_address($qaddr);
+   $suppress_addr{$saddr} = 1;
+   }
+}
+
 # Set confirm's default value
 my $confirm_unconfigured = !defined $confirm;
 if ($confirm_unconfigured) {
@@ -1623,6 +1637,7 @@ sub process_file {
$sauthor = sanitize_address($author);
next if $suppress_cc{'author'};
next if $suppress_cc{'self'} and $sauthor eq 
$sender;
+   next if ($suppress_addr{$sauthor});
printf(__("(mbox) Adding cc: %s from line 
'%s'\n"),
$1, $_) unless $quiet;
push @cc, $1;
@@ -1642,6 +1657,7 @@ sub process_file {
next if ($suppress_cc{'self'});
} else {
next if ($suppress_cc{'cc'});
+   next if 
($suppress_addr{$saddr});
}
printf(__("(mbox) Adding cc: %s from 
line '%s'\n"),
$addr, $_) unless $quiet;
@@ -1681,7 +1697,7 @@ sub process_file {
# line 2 = subject

Re: [PATCH v2 09/11] commit-graph: merge commit-graph chains

2019-05-22 Thread Ævar Arnfjörð Bjarmason


On Wed, May 22 2019, Derrick Stolee via GitGitGadget wrote:

> To keep lookups fast, but also keep most incremental writes fast, create
> a strategy for merging levels of the commit-graph chain. The strategy is
> detailed in the commit-graph design document, but is summarized by these
> two conditions:
>
>   1. If the number of commits we are adding is more than half the number
>  of commits in the graph below, then merge with that graph.
>
>   2. If we are writing more than 64,000 commits into a single graph,
>  then merge with all lower graphs.
>
> The numeric values in the conditions above are currently constant, but
> can become config options in a future update.
> [...]
> +## Merge Strategy
> +
> +When writing a set of commits that do not exist in the commit-graph stack of
> +height N, we default to creating a new file at level N + 1. We then decide to
> +merge with the Nth level if one of two conditions hold:
> +
> +  1. The expected file size for level N + 1 is at least half the file size 
> for
> + level N.
> +
> +  2. Level N + 1 contains more than MAX_SPLIT_COMMITS commits (64,
> + commits).
> +
> +This decision cascades down the levels: when we merge a level we create a new
> +set of commits that then compares to the next level.
> +
> +The first condition bounds the number of levels to be logarithmic in the 
> total
> +number of commits.  The second condition bounds the total number of commits 
> in
> +a `graph-{hashN}` file and not in the `commit-graph` file, preventing
> +significant performance issues when the stack merges and another process only
> +partially reads the previous stack.
> +
> +The merge strategy values (2 for the size multiple, 64,000 for the maximum
> +number of commits) could be extracted into config settings for full
> +flexibility.

As noted this can become configurable, so it's no big deal. But is there
any reason for ths 64K limit anymore?

While with the default expiry of 0sec we can still get that race, it
seems unlikely in practice, as the "commit-graph write" process would
write a new manifest at the end, then go and unlink() the old files.

So maybe at this point we could make this even dumber with something
that behaves like gc.autoPackLimit? I.e. keep writing new graphs, and
then coalesce them all (or maybe not the "base" graph, like
gc.bigPackThreshold)?

Also: These docs refer to MAX_SPLIT_COMMITS, but in v2 it's now a
"split_strategy_max_commits" variable instead.


Re: [PATCH] send-email: Add an option to suppress adding a specific email address

2019-05-22 Thread Ævar Arnfjörð Bjarmason


On Thu, May 23 2019, Eric W. Biederman wrote:

> Make it easy to suppress sta...@vger.kernel.org.  Long story short it
> is desirable to have ``Cc: sta...@vger.kernel.org'' on many bug fixes
> sent to the linux kernel.  It is not always desirable to actually the
> stable maintainer immediately as the patches are still being reviewed
> etc.  Actually cc'd the stable maintainers in the linux kernel is not
> even really necessary as they will always find the tag after the patch
> has been merged in the commit body.
>
> So I am adding yet another suppress command "suppress-addr" that will
> take an email address keep that email address from being automatically
> added to a destination the email will be sent to.

I have a rewrite of much of the adjacent code queued in "next", can you
check if applying it to that branch makes this work as you expect?

Whether it does or not, this change should have a corresponding test
update to t/t9001-send-email.sh, so we can just run that to see if it's
doing the right thing.


[PATCH v3] doc: hint about GIT_DEBUGGER in CodingGuidelines

2019-05-22 Thread Emily Shaffer
We check for a handy environment variable GIT_DEBUGGER when running via
bin-wrappers/, but this feature is undocumented. Add a hint to how to
use it into the CodingGuidelines (which is where other useful
environment settings like DEVELOPER are documented).

You can use GIT_DEBUGGER to pick gdb by default, or you can hand it your
own debugger if you like to use something else (or if you want custom
flags for gdb). This commit documents that intent within
CodingGuidelines.

Signed-off-by: Emily Shaffer 
---

Since v2, only the commit message has changed. Removed some weak
language to make the commit message read more easily.

 Documentation/CodingGuidelines | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 32210a4386..e99af36df9 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -412,6 +412,12 @@ For C programs:
must be declared with "extern" in header files. However, function
declarations should not use "extern", as that is already the default.
 
+ - You can launch gdb around your program using the shorthand GIT_DEBUGGER.
+   Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or
+   run `GIT_DEBUGGER=my-debugger-binary my-args ./bin-wrappers/git foo` to
+   use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb"
+   ./bin-wrappers/git log` (See `wrap-for-bin.sh`.)
+
 For Perl programs:
 
  - Most of the C guidelines above apply.
-- 
2.21.0.1020.gf2820cf01a-goog



Re: [PATCH 0/1] trace2: fix tracing when NO_PTHREADS is defined

2019-05-22 Thread Jeff King
On Wed, May 22, 2019 at 09:23:39AM -0400, Jeff Hostetler wrote:

> I was wondering about that too as the proper long term solution.
> We would need (as the discussion suggests [1]) to properly
> respect/represent the pthread_key_t argument.
> 
> For now, I've guarded my usage of pthread_getspecific() in the trace2
> (similar to what index-pack does), so its not urgent that we update it.
> And I'd rather we take this simple trace2 fix now and not try to combine
> it with fixes for the pthread macros.  Especially now as we're in the RC
> cycle for 2.22.

Yeah, I think that makes sense.

> I'll make a note to revisit the pthread code after 2.22.

For fun, here's a constant-time zero-allocation implementation that I
came up with. It passes t0211 with NO_PTHREADS, but I didn't test it
beyond that.

diff --git a/thread-utils.h b/thread-utils.h
index 4961487ed9..f466215742 100644
--- a/thread-utils.h
+++ b/thread-utils.h
@@ -18,7 +18,7 @@
 #define pthread_t int
 #define pthread_mutex_t int
 #define pthread_cond_t int
-#define pthread_key_t int
+#define pthread_key_t git_pthread_key_t
 
 #define pthread_mutex_init(mutex, attr) dummy_pthread_init(mutex)
 #define pthread_mutex_lock(mutex)
@@ -31,16 +31,49 @@
 #define pthread_cond_broadcast(cond)
 #define pthread_cond_destroy(cond)
 
-#define pthread_key_create(key, attr) dummy_pthread_init(key)
-#define pthread_key_delete(key)
+#define pthread_key_create(key, destroy) git_pthread_key_create(key, destroy)
+#define pthread_key_delete(key) git_pthread_key_delete(key)
 
 #define pthread_create(thread, attr, fn, data) \
dummy_pthread_create(thread, attr, fn, data)
 #define pthread_join(thread, retval) \
dummy_pthread_join(thread, retval)
 
-#define pthread_setspecific(key, data)
-#define pthread_getspecific(key) NULL
+#define pthread_setspecific(key, data) git_pthread_setspecific(key, data)
+#define pthread_getspecific(key) git_pthread_getspecific(key)
+
+typedef struct {
+   void *data;
+   /* extra indirection because setspecific is passed key by value */
+   void **vdata;
+} git_pthread_key_t;
+
+static inline int git_pthread_key_create(git_pthread_key_t *key,
+void (*destroy)(void *))
+{
+   key->data = NULL;
+   key->vdata = &key->data;
+   /* We don't use this; alternatively we could all via atexit(). */
+   if (destroy)
+   BUG("NO_PTHREADS does not support pthread key destructors");
+   return 0;
+}
+
+static inline int git_pthread_key_delete(git_pthread_key_t key)
+{
+   /* noop */
+   return 0;
+}
+
+static inline void git_pthread_setspecific(git_pthread_key_t key, void *data)
+{
+   *(key.vdata) = data;
+}
+
+static inline void *git_pthread_getspecific(git_pthread_key_t key)
+{
+   return key.data;
+}
 
 int dummy_pthread_create(pthread_t *pthread, const void *attr,
 void *(*fn)(void *), void *data);


[PATCH v2] upload-pack: strip namespace from symref data

2019-05-22 Thread Jeff King
On Wed, May 22, 2019 at 12:33:56PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Likely nobody noticed because bug (1) means that from the client's
> > perspective, we did not report on HEAD at all. And thus it uses the
> > pre-7171d8c15f fallback code to guess the correct HEAD, which is usually
> > right. It only falls down in ambiguous cases (like the one laid out in
> > the included test).
> 
> ...because here you're talking about "the client's perspective" and "it
> uses the pre-7171d8c15f [...] code", but this should say "the
> pre-a45b5f0552 code", i.e. mention the commit that changed the *client*
> logic.

Heh. I actually found this split, too, while digging in the history, but
thought it might be adding distracting details to the commit message.
But hey, if you're gonna read them _carefully_, I guess I can be more
precise. ;)

The fallback code isn't even from a45b5f0552; it's much older than that.
We started looking at ref annotations at all in fbb074c253 (remote: make
guess_remote_head() use exact HEAD lookup if it is available,
2009-02-27). The guessing code that falls back to comes from 8434c2f1af
(Build in clone, 2008-04-27).  I didn't bother finding the true origin
in the shell script. ;)

So instead of mentioning it at the top (because the bug really is just
on the server side), I re-wrote the paragraph about the client a little
more clearly and precisely:

Likely nobody noticed because we tend to do the right thing anyway. Bug
(1) means that we said nothing about HEAD (just refs/namespace/foo/HEAD).
And so the client half of the code, from a45b5f0552 (connect: annotate
refs with their symref information in get_remote_head(), 2013-09-17),
does not annotate HEAD, and we use the fallback in guess_remote_head(),
matching refs by object id. Which is usually right. It only falls down
in ambiguous cases, like the one laid out in the included test.

Revised patch below.

-- >8 --
Subject: [PATCH] upload-pack: strip namespace from symref data

Since 7171d8c15f (upload-pack: send symbolic ref information as
capability, 2013-09-17), we've sent cloning and fetching clients special
information about which branch HEAD is pointing to, so that they don't
have to guess based on matching up commit ids.

However, this feature has never worked properly with the GIT_NAMESPACE
feature.  Because upload-pack uses head_ref_namespaced(find_symref), we
do find and report on refs/namespaces/foo/HEAD instead of the actual
HEAD of the repo. This makes sense, since the branch pointed to by the
top-level HEAD may not be advertised at all. But we do two things wrong:

  1. We report the full name refs/namespaces/foo/HEAD, instead of just
 HEAD. Meaning no client is going to bother doing anything with that
 symref, since we're not otherwise advertising it.

  2. We report the symref destination using its full name (e.g.,
 refs/namespaces/foo/refs/heads/master). That's similarly useless to
 the client, who only saw "refs/heads/master" in the advertisement.

We should be stripping the namespace prefix off of both places (which
this patch fixes).

Likely nobody noticed because we tend to do the right thing anyway. Bug
(1) means that we said nothing about HEAD (just refs/namespace/foo/HEAD).
And so the client half of the code, from a45b5f0552 (connect: annotate
refs with their symref information in get_remote_head(), 2013-09-17),
does not annotate HEAD, and we use the fallback in guess_remote_head(),
matching refs by object id. Which is usually right. It only falls down
in ambiguous cases, like the one laid out in the included test.

This also means that we don't have to worry about breaking anybody who
was putting pre-stripped names into their namespace symrefs when we fix
bug (2). Because of bug (1), nobody would have been using the symref we
advertised in the first place (not to mention that those symrefs would
have appeared broken for any non-namespaced access).

Note that we have separate fixes here for the v0 and v2 protocols. The
symref advertisement moved in v2 to be a part of the ls-refs command.
This actually gets part (1) right, since the symref annotation
piggy-backs on the existing ref advertisement, which is properly
stripped. But it still needs a fix for part (2). The included tests
cover both protocols.

Reported-by: Bryan Turner 
Signed-off-by: Jeff King 
---
 ls-refs.c|  3 ++-
 t/t5509-fetch-push-namespaces.sh | 28 
 upload-pack.c|  4 ++--
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index 0a7dbc6442..818aef70a0 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -57,7 +57,8 @@ static int send_ref(const char *refname, const struct 
object_id *oid,
if (!symref_target)
die("'%s' is a symref but it is not?", refname);
 
-   strbuf_addf(&refline, " symref-target:%s", symref_target);
+   strbuf_addf(&refline, 

Re: [PATCH] Make stashing nothing exit 1

2019-05-22 Thread Johannes Sixt
Am 23.05.19 um 01:57 schrieb Maksim Odnoletkov:
> The problem with current behaviour is it makes it hard to use stash in
> scripts. A natural stash use case is: wrap some operation requiring a
> clean working tree with a stash push-pop pair. But that doesn't work
> properly when working tree is already clean - push silently does nothing
> and following pop becomes unbalanced. You have to keep that in mind and
> work around with something like:
> 
> if ! git diff-index --exit-code --quiet HEAD
> then
>   git stash push
>   trap 'git stash pop' EXIT
> fi
> 
> With this change this can be simplified to:
> 
> git stash push && trap 'git stash pop' EXIT

In a script, shouldn't you better use 'create' + 'store' instead of 'push'?

-- Hannes


Re: [PATCH 3/4] am: drop tty requirement for --interactive

2019-05-22 Thread Johannes Schindelin
Hi Peff,

On Mon, 20 May 2019, Jeff King wrote:

> On Mon, May 20, 2019 at 08:11:13AM -0400, Jeff King wrote:
>
> > We have required that the stdin of "am --interactive" be a tty since
> > a1451104ac (git-am: interactive should fail gracefully., 2005-10-12).
> > However, this isn't strictly necessary, and makes the tool harder to
> > test (and is unlike all of our other --interactive commands).
>
> I think this is worth doing for simplicity and consistency. But as you
> might guess, my ulterior motive was making it easier to add tests.
>
> In theory we _should_ be able to use test_terminal for this, but it
> seems to be racy, because it will quickly read all input and close the
> descriptor (to give the reader EOF). But after that close, isatty() will
> no longer report it correctly. E.g., if I run this:
>
>   perl test-terminal.perl sh -c '
>   for i in 0 1 2; do
>   echo $i is $(test -t $i || echo not) a tty
>   done
>   ' 
> it _usually_ says "0 is a tty", but racily may say "not a tty". If you
> put a sleep into the beginning of the shell, then it will basically
> always lose the race and say "not".

This is just another nail in the coffin for `test-terminal.perl`, as far
as I am concerned.

In the built-in `add -i` patch series, I followed a strategy where I move
totally away from `test-terminal`, in favor of using some knobs to force
Git into thinking that we are in a terminal.

But at the same time, I *also* remove the limitation (for most cases) of
"read from /dev/tty", in favor of reading from stdin, and making things
testable, and more importantly: scriptable.

So I am *very* much in favor of this here patch.

Thanks,
Dscho

P.S.: There are even more reasons to get rid of `test-terminal`, of
course: it is an unnecessary dependency on Perl, works only when certain
Perl modules are installed (that are *not* installed on Ubuntu by default,
for example), and it requires pseudo terminals, so it will *never* work on
Windows.

> It might be possible to overcome this by making test-terminal more
> clever (i.e., is there a way for us to send an "EOF" over the pty
> without actually _closing_ it? That would behave like a real terminal,
> where you can hit ^D to generate an EOF but then type more).
>
> But barring that, this works by just avoiding it entirely. :)
>
> Curiously, my script above also reports consistently that stdout is not
> a tty, but that stderr is. I'm not sure why this is, but it no tests
> seem to care either way.
>
> -Peff
>