strange behavior of git diff-index
Hi, git diff-index is giving me incorrect information, however if I run git diff, then git diff-index again it will show the correct information. The steps are the following: $ git diff-index --name-only HEAD git appears to list all files in the project irrespective if they modified or not $ git diff $ git diff --cached will show nothing. So I don't have any modification. Now strangely if I run git diff-index, it will also show nothing which is correction behavior. my git version is 2.7.4 Any explanation? Thanks, Oussama
[ANNOUNCE] Git Rev News edition 51
Hi everyone, The 51th edition of Git Rev News is now published: https://git.github.io/rev_news/2019/05/22/edition-51/ Thanks a lot to David Pursehouse, Luca Milanesio and Denton Liu who contributed this month! Enjoy, Christian, Jakub, Markus and Gabriel.
Re: [PATCH] upload-pack: strip namespace from symref data
On Wed, May 22 2019, Jeff King wrote: > Since 7171d8c15f (upload-pack: send symbolic ref information as > capability, 2013-09-17), we've sent cloning and fetching clients special > information about which branch HEAD is pointing to, so that they don't > have to guess based on matching up commit ids. I'd add a 2nd paragraph here: The client will then use the corresponding code added in a45b5f0552 ("connect: annotate refs with their symref information in get_remote_head()", 2013-09-17) to find the intended symref. See 9907d1359c ("Merge branch 'jc/upload-pack-send-symref'", 2013-10-30) for the full set of changes related to this. ...because, see later... > However, this feature has never worked properly with the GIT_NAMESPACE > feature. Because upload-pack uses head_ref_namespaced(find_symref), we > do find and report on refs/namespaces/foo/HEAD instead of the actual > HEAD of the repo. This makes sense, since the branch pointed to by the > top-level HEAD may not be advertised at all. But we do two things wrong: > > 1. We report the full name refs/namespaces/foo/HEAD, instead of just > HEAD. Meaning no client is going to bother doing anything with that > symref, since we're not otherwise advertising it. > > 2. We report the symref destination using its full name (e.g., > refs/namespaces/foo/refs/heads/master). That's similarly useless to > the client, who only saw "refs/heads/master" in the advertisement. > > We should be stripping the namespace prefix off of both places (which > this patch fixes). > > Likely nobody noticed because bug (1) means that from the client's > perspective, we did not report on HEAD at all. And thus it uses the > pre-7171d8c15f fallback code to guess the correct HEAD, which is usually > right. It only falls down in ambiguous cases (like the one laid out in > the included test). ...because here you're talking about "the client's perspective" and "it uses the pre-7171d8c15f [...] code", but this should say "the pre-a45b5f0552 code", i.e. mention the commit that changed the *client* logic. Well, the client also "uses" the server-side code indirectly, but I think it's easier to follow along if we note what both sides are (as I dug up when reviewing this...). > This also means that we don't have to worry about breaking anybody who > was putting pre-stripped names into their namespace symrefs when we fix > bug (2). Because of bug (1), nobody would have been using the symref we > advertised in the first place (not to mention that those symrefs would > have appeared broken for any non-namespaced access). > > Note that we have separate fixes here for the v0 and v2 protocols. The > symref advertisement moved in v2 to be a part of the ls-refs command. > This actually gets part (1) right, since the symref annotation > piggy-backs on the existing ref advertisement, which is properly > stripped. But it still needs a fix for part (2). The included tests > cover both protocols. > > Reported-by: Bryan Turner > Signed-off-by: Jeff King > --- > This is the same as my earlier fix, but with the v2 bit added, and of > course tests and a commit message. Thanks (as usual) for a helpful bug > report. > > I don't know if we have a general philosophy for testing v0 versus v2. > Without specifying the protocol at all, we'd catch the former on a > regular run and the latter under a GIT_TEST_PROTOCOL_VERSION=2 run. So > we _could_ just rely on that, but since I had to do two separate fixes, > it made sense to me to include explicit tests. Yeah, we shouldn't rely on the GIT_TEST_* stuff for coverage, it should just be used to find things that fall between the cracks. > ls-refs.c| 3 ++- > t/t5509-fetch-push-namespaces.sh | 28 > upload-pack.c| 4 ++-- > 3 files changed, 32 insertions(+), 3 deletions(-) > > diff --git a/ls-refs.c b/ls-refs.c > index 0a7dbc6442..818aef70a0 100644 > --- a/ls-refs.c > +++ b/ls-refs.c > @@ -57,7 +57,8 @@ static int send_ref(const char *refname, const struct > object_id *oid, > if (!symref_target) > die("'%s' is a symref but it is not?", refname); > > - strbuf_addf(&refline, " symref-target:%s", symref_target); > + strbuf_addf(&refline, " symref-target:%s", > + strip_namespace(symref_target)); > } > > if (data->peel) { > diff --git a/t/t5509-fetch-push-namespaces.sh > b/t/t5509-fetch-push-namespaces.sh > index c88df78c0b..75cbfcc392 100755 > --- a/t/t5509-fetch-push-namespaces.sh > +++ b/t/t5509-fetch-push-namespaces.sh > @@ -124,4 +124,32 @@ test_expect_success 'try to update a hidden full ref' ' > test_must_fail git -C original push pushee-namespaced master > ' > > +test_expect_success 'set up ambiguous HEAD' ' > + git init ambiguous && > + ( > + cd ambiguous && > + git commit --allow-empty -m foo && > +
GIT by github 2.21.0 got reviewed on Software Informer
Good day! would like to inform you that your product GIT by github 2.21.0 has been reviewed by our editors and your program got "Editor's Pick Award", "100% Clean Award". You can read the review at https://git.software.informer.com/. We would be grateful if you place our award with a link to our review on your website. On our part, we can offer featuring your program in our Today's Highlight block. This block is shown in the rotator at the top of the main page and also on every page of our website in the upper left corner. You are welcome to order a free video tutorial for your program as well. Once it's finished, you can add it to your website. We also offer you to take advantage of our free storage by hosting your installation package on our servers and listing us as one of the mirror downloads for your program. There is a selection of predesigned buttons available to fit the look of your website. Please let me know if you're interested in any of these offers. We are on the list of the world's 1000 most visited websites, so this could get your program some extra exposure. Kind regards, Kasey Bloome
Re: [PATCH 0/1] trace2: fix tracing when NO_PTHREADS is defined
On 5/21/2019 5:27 PM, Jeff King wrote: On Tue, May 21, 2019 at 12:33:58PM -0700, Jeff Hostetler via GitGitGadget wrote: As Duy suggested, pthread_getspecific() just returns NULL when NO_PTHREADS is defined. And pthread_setspecific() silently does not nothing. So this problem was hidden from view. I have to wonder if we should update pthread_*specific() to call BUG() when NO_PTHREADS is defined as a way to catch unguarded usages easier or make this issue more clear. I think it should actually store the data asked for by the caller, as if we were the single thread running. We discussed this as the time of refactoring NO_PTHREADS, but there was only one caller that would have benefited. Now there are two. ;) Discussion in the subthread of this patch: https://public-inbox.org/git/20181027071003.1347-2-pclo...@gmail.com/ -Peff I was wondering about that too as the proper long term solution. We would need (as the discussion suggests [1]) to properly respect/represent the pthread_key_t argument. For now, I've guarded my usage of pthread_getspecific() in the trace2 (similar to what index-pack does), so its not urgent that we update it. And I'd rather we take this simple trace2 fix now and not try to combine it with fixes for the pthread macros. Especially now as we're in the RC cycle for 2.22. I'll make a note to revisit the pthread code after 2.22. Thanks Jeff [1] https://public-inbox.org/git/CACsJy8DLW_smOJd6aCoRcJZxQ2Lzut5US=sPadj7=fhne0u...@mail.gmail.com/
RE: [Breakage] 2.22.0-rc1 t5401-update-hooks.sh
On May 21, 2019 20:48, brian m. carlson wrote: > To: Randall S. Becker > Cc: 'Git Mailing List' > Subject: Re: [Breakage] 2.22.0-rc1 t5401-update-hooks.sh > > On 2019-05-21 at 21:47:54, Randall S. Becker wrote: > > When running the test in isolation, it passes without incident whether > > or not --verbose is used. So far, this only occurs on the first run > > through. I wanted to report it, based on the inconsistency of results. > > This is not the first time tests have acted in this fashion, and I > > realize it is difficult to do anything about it without being able to > > recreate > the situation. > > Does running git clean -dxf cause it to be reproducible? I will give it a go. Having exactly the same behaviour in t7519 subtest 19. I wonder whether there are breadcrumbs not being cleaned up. Will report back when I am able - may take a day or so. Cheers, Randall
Re: strange behavior of git diff-index
Hi Oussama, On Wed, 22 May 2019, Oussama Ghorbel wrote: > git diff-index is giving me incorrect information, however if I run git diff, > then git diff-index again it will show the correct information. > The steps are the following: > $ git diff-index --name-only HEAD > git appears to list all files in the project irrespective if they modified or > not > $ git diff > $ git diff --cached > will show nothing. So I don't have any modification. > Now strangely if I run git diff-index, it will also show nothing which is > correction behavior. > > my git version is 2.7.4 > Any explanation? The low-level `diff-index` command is meant to be used in scripts, and does specifically *not* refresh the index before running. Meaning that it could possibly mistake a file for being modified even if it is not modified, just because it is marked "modified" in the index [*1*]. Short answer: use the high-level command `git diff` that is intended for human consumption. Ciao, Johannes Footnote *1*: It is actually a bit more complicated than that: the index stores metadata such as mtime, size, uid, etc, and compares that to the metadata on disk. If there is any mismatch, or if everything matches but the mtime *also* matches the index file's itself, the file is considered not up to date, i.e. marked as modified.
Re: strange behavior of git diff-index
Hi Johannes, Thanks a lot for the clarification! Regards, Oussama On 5/22/19 4:54 PM, Johannes Schindelin wrote: > Hi Oussama, > > On Wed, 22 May 2019, Oussama Ghorbel wrote: > >> git diff-index is giving me incorrect information, however if I run git >> diff, then git diff-index again it will show the correct information. >> The steps are the following: >> $ git diff-index --name-only HEAD >> git appears to list all files in the project irrespective if they modified >> or not >> $ git diff >> $ git diff --cached >> will show nothing. So I don't have any modification. >> Now strangely if I run git diff-index, it will also show nothing which is >> correction behavior. >> >> my git version is 2.7.4 >> Any explanation? > The low-level `diff-index` command is meant to be used in scripts, and > does specifically *not* refresh the index before running. Meaning that it > could possibly mistake a file for being modified even if it is not > modified, just because it is marked "modified" in the index [*1*]. > > Short answer: use the high-level command `git diff` that is intended for > human consumption. > > Ciao, > Johannes > > Footnote *1*: It is actually a bit more complicated than that: the index > stores metadata such as mtime, size, uid, etc, and compares that to the > metadata on disk. If there is any mismatch, or if everything matches but > the mtime *also* matches the index file's itself, the file is considered > not up to date, i.e. marked as modified.
Re: Revision walking, commit dates, slop
Derrick Stolee writes: > On 5/20/2019 7:27 PM, Jakub Narebski wrote: >> Jakub Narebski writes: >>> Derrick Stolee writes: On 5/20/2019 7:02 AM, Jakub Narebski wrote: > > Are there any blockers that prevent the switch to this > "generation number v2"? >> [...] >> Using the generation number column for the corrected commit-date offsets (assuming we also guarantee the offset is strictly increasing from parent to child), these new values will be backwards- compatible _except_ for 'git commit-graph verify'. >>> >>> O.K., so the "generation number v2 (legacy)" would be incremental and >>> backward-compatibile in use (though not in generation and validation). >>> >>> Do I understand it correctly how it is calculated: >>> >>> corrected_date(C) = max(committer_date(C), >>> max_{P ∈ parents(C)}(corrected_date(P)) + 1) >> >> This should probably read >> >> offset_date(P) = committer_date(P) + gen_v2(P) >> corrected_date(C) = max(committer_date(C), >> max_{P ∈ parents(C)}(offset_date(P)) + 1) Restating it yet again: A. corrected_date(C) = max(committer_date(C), max_P(committer_date(P) + offset(P)) + 1) B. offset(C) = max(corrected_date(C) - committer_date(C), max_P(offset(P)) + 1) > The final definition needs two conditions on the offset of a commit C for > every parent P: > > 1. committer_date(C) + offset(C) > committer_date(P) + offset(P) > 2. offset(C) > offset(P) The equation (B) ensures the (2) condition, i.e offset(C) > offset(P). The equation (A) ensures that condition (1) is fulfulled, because from (B) we have corrected_date(C) <= committer_date(C) + offset(C) This from (B) and (A( we get: committer_date(C) + offset(C) >= corrected_date(C) > > committer_date(P) + offset(P) > Condition (1) will give us the performance benefits related to the > committer-date heuristic. Condition (2) will give us backwards-compatibility > with generation numbers. Well, we should check/test if performance benefits of "offset date" ("corrected date with rising offset") truly holds. Best, -- Jakub Narębski
standalone library/tool to query commit-graph?
After producing the file ".git/objects/info/commit-graph" with the command "git commit-graph write", is there a way to answer queries like "git merge-base --is-ancestor" without having a .git directory? E.g. is there a library that will operate on the "commit-graph" file all by itself? Thanks, Karl
Re: standalone library/tool to query commit-graph?
On 5/22/2019 2:49 PM, Karl Ostmo wrote: > After producing the file ".git/objects/info/commit-graph" with the > command "git commit-graph write", is there a way to answer queries > like "git merge-base --is-ancestor" without having a .git directory? > E.g. is there a library that will operate on the "commit-graph" file > all by itself? You could certainly build such a tool, assuming your merge-base parameters are full-length commit ids. If you try to start at ref names, you'll need the .git directory. I would not expect such a tool to ever exist in the Git codebase. Instead, you would need a new project, say "graph-analyzer --graph= --is-ancestor " Thanks, -Stolee
Re: Revision walking, commit dates, slop
On 5/22/2019 2:29 PM, Jakub Narebski wrote: > Derrick Stolee writes: >> On 5/20/2019 7:27 PM, Jakub Narebski wrote: > Restating it yet again: > >A. corrected_date(C) = max(committer_date(C), >max_P(committer_date(P) + offset(P)) + 1) > >B. offset(C) = max(corrected_date(C) - committer_date(C), >max_P(offset(P)) + 1) The problem with this definition is that it "defines" the corrected date, and then _adjusts_ it by updating the offset. I consider corrected_date(C) = committer_date(C) + offset(C) to be part of the definition. You could restate the definition as follows: corrected_date = max(committer_date(C) + max_P(offset(P)) + 1, max_P(corrected_date(P))) or, equivalently corrected_date = max(committer_date(C) + max_P(offset(P)) + 1, max_P(committer_date(P) + offset(P))) This definition, in a single step, satisfies the conditions below: > >> The final definition needs two conditions on the offset of a commit C for >> every parent P: >> >> 1. committer_date(C) + offset(C) > committer_date(P) + offset(P) >> 2. offset(C) > offset(P) Plus, the "+ 1" in the first step takes into account that "0" is a special offset value in the commit-graph file format meaning "not computed". > Well, we should check/test if performance benefits of "offset date" > ("corrected date with rising offset") truly holds. Yes, a full performance test will be required. I have full confidence that the monotonic offset requirement will have only positive effect. That is, it will not affect the case where committer-date was better than generation number, but will help the cases where all the committer-dates are equal. Thanks, -Stolee
[PATCH v2 07/11] commit-graph: write commit-graph chains
From: Derrick Stolee Extend write_commit_graph() to write a commit-graph chain when given the COMMIT_GRAPH_SPLIT flag. This implementation is purposefully simplistic in how it creates a new chain. The commits not already in the chain are added to a new tip commit-graph file. Much of the logic around writing a graph-{hash}.graph file and updating the commit-graph-chain file is the same as the commit-graph file case. However, there are several places where we need to do some extra logic in the split case. Track the list of graph filenames before and after the planned write. This will be more important when we start merging graph files, but it also allows us to upgrade our commit-graph file to the appropriate graph-{hash}.graph file when we upgrade to a chain of commit-graphs. Note that we use the eighth byte of the commit-graph header to store the number of base graph files. This determines the length of the base graphs chunk. A subtle change of behavior with the new logic is that we do not write a commit-graph if we our commit list is empty. This extends to the typical case, which is reflected in t5318-commit-graph.sh. Signed-off-by: Derrick Stolee --- commit-graph.c | 286 ++-- commit-graph.h | 2 + t/t5318-commit-graph.sh | 2 +- 3 files changed, 278 insertions(+), 12 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index b8a1444217..fb972c0bc2 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -300,12 +300,18 @@ static struct commit_graph *load_commit_graph_one(const char *graph_file) struct stat st; int fd; + struct commit_graph *g; int open_ok = open_commit_graph(graph_file, &fd, &st); if (!open_ok) return NULL; - return load_commit_graph_one_fd_st(fd, &st); + g = load_commit_graph_one_fd_st(fd, &st); + + if (g) + g->filename = xstrdup(graph_file); + + return g; } static int prepare_commit_graph_v1(struct repository *r, const char *obj_dir) @@ -709,8 +715,19 @@ struct write_commit_graph_context { struct progress *progress; int progress_done; uint64_t progress_cnt; + + char *base_graph_name; + int num_commit_graphs_before; + int num_commit_graphs_after; + char **commit_graph_filenames_before; + char **commit_graph_filenames_after; + char **commit_graph_hash_after; + uint32_t new_num_commits_in_base; + struct commit_graph *new_base_graph; + unsigned append:1, -report_progress:1; +report_progress:1, +split:1; }; static void write_graph_chunk_fanout(struct hashfile *f, @@ -780,6 +797,16 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, ctx->commits.nr, commit_to_sha1); + if (edge_value >= 0) + edge_value += ctx->new_num_commits_in_base; + else { + uint32_t pos; + if (find_commit_in_graph(parent->item, +ctx->new_base_graph, +&pos)) + edge_value = pos; + } + if (edge_value < 0) BUG("missing parent %s for commit %s", oid_to_hex(&parent->item->object.oid), @@ -800,6 +827,17 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len, ctx->commits.list, ctx->commits.nr, commit_to_sha1); + + if (edge_value >= 0) + edge_value += ctx->new_num_commits_in_base; + else { + uint32_t pos; + if (find_commit_in_graph(parent->item, +ctx->new_base_graph, +&pos)) + edge_value = pos; + } + if (edge_value < 0) BUG("missing parent %s for commit %s", oid_to_hex(&parent->item->object.oid), @@ -857,6 +895,16 @@ static void write_graph_chunk_extra_edges(struct hashfile *f, ctx->commits.nr, commit_to_sha1); + if (edge_value >= 0) + edge_value += ctx->new_num_commits_in_base; + else { +
[PATCH v2 05/11] commit-graph: add base graphs chunk
From: Derrick Stolee To quickly verify a commit-graph chain is valid on load, we will read from the new "Base Graphs Chunk" of each file in the chain. This will prevent accidentally loading incorrect data from manually editing the commit-graph-chain file or renaming graph-{hash}.graph files. The commit_graph struct already had an object_id struct "oid", but it was never initialized or used. Add a line to read the hash from the end of the commit-graph file and into the oid member. Signed-off-by: Derrick Stolee --- .../technical/commit-graph-format.txt | 11 -- commit-graph.c| 22 +++ commit-graph.h| 1 + 3 files changed, 32 insertions(+), 2 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 16452a0504..a4f17441ae 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -44,8 +44,9 @@ HEADER: 1-byte number (C) of "chunks" - 1-byte (reserved for later use) - Current clients should ignore this value. + 1-byte number (B) of base commit-graphs + We infer the length (H*B) of the Base Graphs chunk + from this value. CHUNK LOOKUP: @@ -92,6 +93,12 @@ CHUNK DATA: positions for the parents until reaching a value with the most-significant bit on. The other bits correspond to the position of the last parent. + Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] + This list of H-byte hashes describe a set of B commit-graph files that + form a commit-graph chain. The graph position for the ith commit in this + file's OID Lookup chunk is equal to i plus the number of commits in all + base graphs. If B is non-zero, this chunk must exist. + TRAILER: H-byte HASH-checksum of all of the above. diff --git a/commit-graph.c b/commit-graph.c index 70e44393b8..060897fff0 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -22,6 +22,7 @@ #define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ #define GRAPH_CHUNKID_EXTRAEDGES 0x45444745 /* "EDGE" */ +#define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */ #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) @@ -262,6 +263,12 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, else graph->chunk_extra_edges = data + chunk_offset; break; + + case GRAPH_CHUNKID_BASE: + if (graph->chunk_base_graphs) + chunk_repeated = 1; + else + graph->chunk_base_graphs = data + chunk_offset; } if (chunk_repeated) { @@ -280,6 +287,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, last_chunk_offset = chunk_offset; } + hashcpy(graph->oid.hash, graph->data + graph->data_len - graph->hash_len); + if (verify_commit_graph_lite(graph)) return NULL; @@ -315,8 +324,21 @@ static int add_graph_to_chain(struct commit_graph *g, { struct commit_graph *cur_g = chain; + if (n && !g->chunk_base_graphs) { + warning(_("commit-graph has no base graphs chunk")); + return 0; + } + while (n) { n--; + + if (!oideq(&oids[n], &cur_g->oid) || + !hasheq(oids[n].hash, g->chunk_base_graphs + g->hash_len * n)) { + warning(_("commit-graph chain does not match")); + return 0; + } + + cur_g = cur_g->base_graph; } diff --git a/commit-graph.h b/commit-graph.h index f9fe32ebe3..80f4917ddb 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -55,6 +55,7 @@ struct commit_graph { const unsigned char *chunk_oid_lookup; const unsigned char *chunk_commit_data; const unsigned char *chunk_extra_edges; + const unsigned char *chunk_base_graphs; }; struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st); -- gitgitgadget
[PATCH v2 04/11] commit-graph: load commit-graph chains
From: Derrick Stolee Prepare the logic for reading a chain of commit-graphs. First, look for a file at $OBJDIR/info/commit-graph. If it exists, then use that file and stop. Next, look for the chain file at $OBJDIR/info/commit-graphs/commit-graph-chain. If this file exists, then load the hash values as line-separated values in that file and load $OBJDIR/info/commit-graphs/graph-{hash[i]}.graph for each hash[i] in that file. The file is given in order, so the first hash corresponds to the "base" file and the final hash corresponds to the "tip" file. This implementation assumes that all of the graph-{hash}.graph files are in the same object directory as the commit-graph-chain file. This will be updated in a future change. This change is purposefully simple so we can isolate the different concerns. Signed-off-by: Derrick Stolee --- commit-graph.c | 98 +++--- 1 file changed, 93 insertions(+), 5 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index e2f438f6a3..70e44393b8 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -45,6 +45,19 @@ char *get_commit_graph_filename(const char *obj_dir) return xstrfmt("%s/info/commit-graph", obj_dir); } +static char *get_split_graph_filename(const char *obj_dir, + const char *oid_hex) +{ + return xstrfmt("%s/info/commit-graphs/graph-%s.graph", + obj_dir, + oid_hex); +} + +static char *get_chain_filename(const char *obj_dir) +{ + return xstrfmt("%s/info/commit-graphs/commit-graph-chain", obj_dir); +} + static uint8_t oid_version(void) { return 1; @@ -286,18 +299,93 @@ static struct commit_graph *load_commit_graph_one(const char *graph_file) return load_commit_graph_one_fd_st(fd, &st); } +static int prepare_commit_graph_v1(struct repository *r, const char *obj_dir) +{ + char *graph_name = get_commit_graph_filename(obj_dir); + r->objects->commit_graph = load_commit_graph_one(graph_name); + free(graph_name); + + return r->objects->commit_graph ? 0 : -1; +} + +static int add_graph_to_chain(struct commit_graph *g, + struct commit_graph *chain, + struct object_id *oids, + int n) +{ + struct commit_graph *cur_g = chain; + + while (n) { + n--; + cur_g = cur_g->base_graph; + } + + g->base_graph = chain; + + if (chain) + g->num_commits_in_base = chain->num_commits + chain->num_commits_in_base; + + return 1; +} + +static void prepare_commit_graph_chain(struct repository *r, const char *obj_dir) +{ + struct strbuf line = STRBUF_INIT; + struct stat st; + struct object_id *oids; + int i = 0, valid = 1; + char *chain_name = get_chain_filename(obj_dir); + FILE *fp; + + if (stat(chain_name, &st)) + return; + + if (st.st_size <= the_hash_algo->hexsz) + return; + + fp = fopen(chain_name, "r"); + free(chain_name); + + if (!fp) + return; + + oids = xcalloc(st.st_size / (the_hash_algo->hexsz + 1), sizeof(struct object_id)); + + while (strbuf_getline_lf(&line, fp) != EOF && valid) { + char *graph_name; + struct commit_graph *g; + + if (get_oid_hex(line.buf, &oids[i])) { + warning(_("invalid commit-graph chain: line '%s' not a hash"), + line.buf); + valid = 0; + break; + } + + graph_name = get_split_graph_filename(obj_dir, line.buf); + g = load_commit_graph_one(graph_name); + free(graph_name); + + if (g && add_graph_to_chain(g, r->objects->commit_graph, oids, i)) + r->objects->commit_graph = g; + else + valid = 0; + } + + free(oids); + fclose(fp); +} + static void prepare_commit_graph_one(struct repository *r, const char *obj_dir) { - char *graph_name; if (r->objects->commit_graph) return; - graph_name = get_commit_graph_filename(obj_dir); - r->objects->commit_graph = - load_commit_graph_one(graph_name); + if (!prepare_commit_graph_v1(r, obj_dir)) + return; - FREE_AND_NULL(graph_name); + prepare_commit_graph_chain(r, obj_dir); } /* -- gitgitgadget
[PATCH v2 09/11] commit-graph: merge commit-graph chains
From: Derrick Stolee When searching for a commit in a commit-graph chain of G graphs with N commits, the search takes O(G log N) time. If we always add a new tip graph with every write, the linear G term will start to dominate and slow the lookup process. To keep lookups fast, but also keep most incremental writes fast, create a strategy for merging levels of the commit-graph chain. The strategy is detailed in the commit-graph design document, but is summarized by these two conditions: 1. If the number of commits we are adding is more than half the number of commits in the graph below, then merge with that graph. 2. If we are writing more than 64,000 commits into a single graph, then merge with all lower graphs. The numeric values in the conditions above are currently constant, but can become config options in a future update. As we merge levels of the commit-graph chain, check that the commits still exist in the repository. A garbage-collection operation may have removed those commits from the object store and we do not want to persist them in the commit-graph chain. This is a non-issue if the 'git gc' process wrote a new, single-level commit-graph file. After we merge levels, the old graph-{hash}.graph files are no longer referenced by the commit-graph-chain file. We will expire these files in a future change. Signed-off-by: Derrick Stolee --- Documentation/technical/commit-graph.txt | 81 ++ commit-graph.c | 190 +++ t/t5323-split-commit-graph.sh| 13 ++ 3 files changed, 251 insertions(+), 33 deletions(-) diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index 1dca3bd8fe..083fff9927 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -186,6 +186,87 @@ positions to refer to their parents, which may be in `graph-{hash1}.graph` or its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 + X2). +Each commit-graph file (except the base, `graph-{hash0}.graph`) contains data +specifying the hashes of all files in the lower layers. In the above example, +`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains +`{hash0}` and `{hash1}`. + +## Merging commit-graph files + +If we only added a new commit-graph file on every write, we would run into a +linear search problem through many commit-graph files. Instead, we use a merge +strategy to decide when the stack should collapse some number of levels. + +The diagram below shows such a collapse. As a set of new commits are added, it +is determined by the merge strategy that the files should collapse to +`graph-{hash1}`. Thus, the new commits, the commits in `graph-{hash2}` and +the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}` +file. + + +-+ + | | + |(new commits)| + | | + +-+ + | | + +---+ +-+ + | graph-{hash2} |->| | + +---+ +-+ + | | | + +---+ +-+ + | | | | + | graph-{hash1} |->| | + | | | | + +---+ +-+ + | tmp_graphXXX + +---+ + | | + | | + | | + | graph-{hash0} | + | | + | | + | | + +---+ + +During this process, the commits to write are combined, sorted and we write the +contents to a temporary file, all while holding a `commit-graph-chain.lock` +lock-file. When the file is flushed, we rename it to `graph-{hash3}` +according to the computed `{hash3}`. Finally, we write the new chain data to +`commit-graph-chain.lock`: + +``` + {hash3} + {hash0} +``` + +We then close the lock-file. + +## Merge Strategy + +When writing a set of commits that do not exist in the commit-graph stack of +height N, we default to creating a new file at level N + 1. We then decide to +merge with the Nth level if one of two conditions hold: + + 1. The expected file size for level N + 1 is at least half the file size for + level N. + + 2. Level N + 1 contains more than MAX_SPLIT_COMMITS commits (64, + commits). + +This decision cascades down the levels: when we merge a level we create a new +set of commits that then compares to the next level. + +The first condition bounds the number of levels to be logarithmic in the t
[PATCH v2 11/11] commit-graph: expire commit-graph files
From: Derrick Stolee As we merge commit-graph files in a commit-graph chain, we should clean up the files that are no longer used. This change introduces an 'expiry_window' value to the context, which is always zero (for now). We then check the modified time of each graph-{hash}.graph file in the $OBJDIR/info/commit-graphs folder and unlink the files that are older than the expiry_window. Since this is always zero, this immediately clears all unused graph files. We will update the value to match a config setting in a future change. Signed-off-by: Derrick Stolee --- Documentation/technical/commit-graph.txt | 15 + commit-graph.c | 71 t/t5323-split-commit-graph.sh| 2 +- 3 files changed, 87 insertions(+), 1 deletion(-) diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index b6fe8b2321..4ecec54148 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -267,6 +267,21 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum number of commits) could be extracted into config settings for full flexibility. +## Deleting graph-{hash} files + +After a new tip file is written, some `graph-{hash}` files may no longer +be part of a chain. It is important to remove these files from disk, eventually. +The main reason to delay removal is that another process could read the +`commit-graph-chain` file before it is rewritten, but then look for the +`graph-{hash}` files after they are deleted. + +To allow holding old split commit-graphs for a while after they are unreferenced, +we update the modified times of the files when they become unreferenced. Then, +we scan the `$OBJDIR/info/commit-graphs/` directory for `graph-{hash}` +files whose modified times are older than a given expiry window. This window +defaults to zero, but can be changed using command-line arguments or a config +setting. + ## Chains across multiple object directories In a repo with alternates, we look for the `commit-graph-chain` file starting diff --git a/commit-graph.c b/commit-graph.c index 0d8b942e2b..69c4f83c2a 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -739,6 +739,8 @@ struct write_commit_graph_context { unsigned append:1, report_progress:1, split:1; + + time_t expire_window; }; static void write_graph_chunk_fanout(struct hashfile *f, @@ -1629,6 +1631,70 @@ static void merge_commit_graphs(struct write_commit_graph_context *ctx) deduplicate_commits(ctx); } +static void mark_commit_graphs(struct write_commit_graph_context *ctx) +{ + uint32_t i; + time_t now = time(NULL); + + for (i = ctx->num_commit_graphs_after - 1; i < ctx->num_commit_graphs_before; i++) { + struct stat st; + struct utimbuf updated_time; + + stat(ctx->commit_graph_filenames_before[i], &st); + + updated_time.actime = st.st_atime; + updated_time.modtime = now; + utime(ctx->commit_graph_filenames_before[i], &updated_time); + } +} + +static void expire_commit_graphs(struct write_commit_graph_context *ctx) +{ + struct strbuf path = STRBUF_INIT; + DIR *dir; + struct dirent *de; + size_t dirnamelen; + time_t expire_time = time(NULL) - ctx->expire_window; + + strbuf_addstr(&path, ctx->obj_dir); + strbuf_addstr(&path, "/info/commit-graphs"); + dir = opendir(path.buf); + + if (!dir) { + strbuf_release(&path); + return; + } + + strbuf_addch(&path, '/'); + dirnamelen = path.len; + while ((de = readdir(dir)) != NULL) { + struct stat st; + uint32_t i, found = 0; + + strbuf_setlen(&path, dirnamelen); + strbuf_addstr(&path, de->d_name); + + stat(path.buf, &st); + + if (st.st_mtime > expire_time) + continue; + if (path.len < 6 || strcmp(path.buf + path.len - 6, ".graph")) + continue; + + for (i = 0; i < ctx->num_commit_graphs_after; i++) { + if (!strcmp(ctx->commit_graph_filenames_after[i], + path.buf)) { + found = 1; + break; + } + } + + if (!found) + unlink(path.buf); + + } +} + int write_commit_graph(const char *obj_dir, struct string_list *pack_indexes, struct string_list *commit_hex, @@ -1741,6 +1807,11 @@ int write_commit_graph(const char *obj_dir, res = write_commit_graph_file(ctx); + if (ctx->split) { + mark_commit_graphs(ctx); + expire_commit_graphs(ctx); + } + cleanu
[PATCH v2 06/11] commit-graph: rearrange chunk count logic
From: Derrick Stolee The number of chunks in a commit-graph file can change depending on whether we need the Extra Edges Chunk. We are going to add more optional chunks, and it will be helpful to rearrange this logic around the chunk count before doing so. Specifically, we need to finalize the number of chunks before writing the commit-graph header. Further, we also need to fill out the chunk lookup table dynamically and using "num_chunks" as we add optional chunks is useful for adding optional chunks in the future. Signed-off-by: Derrick Stolee --- commit-graph.c | 35 +-- 1 file changed, 21 insertions(+), 14 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 060897fff0..b8a1444217 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1192,7 +1192,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) uint64_t chunk_offsets[5]; const unsigned hashsz = the_hash_algo->rawsz; struct strbuf progress_title = STRBUF_INIT; - int num_chunks = ctx->num_extra_edges ? 4 : 3; + int num_chunks = 3; ctx->graph_name = get_commit_graph_filename(ctx->obj_dir); if (safe_create_leading_directories(ctx->graph_name)) { @@ -1205,27 +1205,34 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) hold_lock_file_for_update(&lk, ctx->graph_name, LOCK_DIE_ON_ERROR); f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); - hashwrite_be32(f, GRAPH_SIGNATURE); - - hashwrite_u8(f, GRAPH_VERSION); - hashwrite_u8(f, oid_version()); - hashwrite_u8(f, num_chunks); - hashwrite_u8(f, 0); /* unused padding byte */ - chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT; chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP; chunk_ids[2] = GRAPH_CHUNKID_DATA; - if (ctx->num_extra_edges) - chunk_ids[3] = GRAPH_CHUNKID_EXTRAEDGES; - else - chunk_ids[3] = 0; - chunk_ids[4] = 0; + if (ctx->num_extra_edges) { + chunk_ids[num_chunks] = GRAPH_CHUNKID_EXTRAEDGES; + num_chunks++; + } + + chunk_ids[num_chunks] = 0; chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE; chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr; chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr; - chunk_offsets[4] = chunk_offsets[3] + 4 * ctx->num_extra_edges; + + num_chunks = 3; + if (ctx->num_extra_edges) { + chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] + + 4 * ctx->num_extra_edges; + num_chunks++; + } + + hashwrite_be32(f, GRAPH_SIGNATURE); + + hashwrite_u8(f, GRAPH_VERSION); + hashwrite_u8(f, oid_version()); + hashwrite_u8(f, num_chunks); + hashwrite_u8(f, 0); for (i = 0; i <= num_chunks; i++) { uint32_t chunk_write[3]; -- gitgitgadget
[PATCH v2 03/11] commit-graph: rename commit_compare to oid_compare
From: Derrick Stolee The helper function commit_compare() actually compares object_id structs, not commits. A future change to commit-graph.c will need to sort commit structs, so rename this function in advance. Signed-off-by: Derrick Stolee --- commit-graph.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 3afedcd7f5..e2f438f6a3 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -761,7 +761,7 @@ static void write_graph_chunk_extra_edges(struct hashfile *f, } } -static int commit_compare(const void *_a, const void *_b) +static int oid_compare(const void *_a, const void *_b) { const struct object_id *a = (const struct object_id *)_a; const struct object_id *b = (const struct object_id *)_b; @@ -1030,7 +1030,7 @@ static uint32_t count_distinct_commits(struct write_commit_graph_context *ctx) _("Counting distinct commits in commit graph"), ctx->oids.nr); display_progress(ctx->progress, 0); /* TODO: Measure QSORT() progress */ - QSORT(ctx->oids.list, ctx->oids.nr, commit_compare); + QSORT(ctx->oids.list, ctx->oids.nr, oid_compare); for (i = 1; i < ctx->oids.nr; i++) { display_progress(ctx->progress, i + 1); -- gitgitgadget
[PATCH v2 08/11] commit-graph: add --split option to builtin
From: Derrick Stolee Add a new "--split" option to the 'git commit-graph write' subcommand. This option allows the optional behavior of writing a commit-graph chain. The current behavior will add a tip commit-graph containing any commits that are not in the existing commit-graph or commit-graph chain. Later changes will allow merging the chain and expiring out-dated files. Add a new test script (t5323-split-commit-graph.sh) that demonstrates this behavior. Signed-off-by: Derrick Stolee --- builtin/commit-graph.c| 10 ++- t/t5323-split-commit-graph.sh | 122 ++ 2 files changed, 129 insertions(+), 3 deletions(-) create mode 100755 t/t5323-split-commit-graph.sh diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 828b1a713f..c2c07d3917 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph [--object-dir ]"), N_("git commit-graph read [--object-dir ]"), N_("git commit-graph verify [--object-dir ]"), - N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append|--split] [--reachable|--stdin-packs|--stdin-commits]"), NULL }; @@ -25,7 +25,7 @@ static const char * const builtin_commit_graph_read_usage[] = { }; static const char * const builtin_commit_graph_write_usage[] = { - N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append|--split] [--reachable|--stdin-packs|--stdin-commits]"), NULL }; @@ -35,9 +35,9 @@ static struct opts_commit_graph { int stdin_packs; int stdin_commits; int append; + int split; } opts; - static int graph_verify(int argc, const char **argv) { struct commit_graph *graph = NULL; @@ -156,6 +156,8 @@ static int graph_write(int argc, const char **argv) N_("start walk at commits listed by stdin")), OPT_BOOL(0, "append", &opts.append, N_("include all commits already in the commit-graph file")), + OPT_BOOL(0, "split", &opts.split, + N_("allow writing an incremental commit-graph file")), OPT_END(), }; @@ -169,6 +171,8 @@ static int graph_write(int argc, const char **argv) opts.obj_dir = get_object_directory(); if (opts.append) flags |= COMMIT_GRAPH_APPEND; + if (opts.split) + flags |= COMMIT_GRAPH_SPLIT; read_replace_refs = 0; diff --git a/t/t5323-split-commit-graph.sh b/t/t5323-split-commit-graph.sh new file mode 100755 index 00..96704b9f5b --- /dev/null +++ b/t/t5323-split-commit-graph.sh @@ -0,0 +1,122 @@ +#!/bin/sh + +test_description='split commit graph' +. ./test-lib.sh + +GIT_TEST_COMMIT_GRAPH=0 + +test_expect_success 'setup repo' ' + git init && + git config core.commitGraph true && + infodir=".git/objects/info" && + graphdir="$infodir/commit-graphs" && + test_oid_init +' + +graph_read_expect() { + NUM_BASE=0 + if test ! -z $2 + then + NUM_BASE=$2 + fi + cat >expect <<- EOF + header: 43475048 1 1 3 $NUM_BASE + num_commits: $1 + chunks: oid_fanout oid_lookup commit_metadata + EOF + git commit-graph read >output && + test_cmp expect output +} + +test_expect_success 'create commits and write commit-graph' ' + for i in $(test_seq 3) + do + test_commit $i && + git branch commits/$i + done && + git commit-graph write --reachable && + test_path_is_file $infodir/commit-graph && + graph_read_expect 3 +' + +graph_git_two_modes() { + git -c core.commitGraph=true $1 >output + git -c core.commitGraph=false $1 >expect + test_cmp expect output +} + +graph_git_behavior() { + MSG=$1 + BRANCH=$2 + COMPARE=$3 + test_expect_success "check normal git operations: $MSG" ' + graph_git_two_modes "log --oneline $BRANCH" && + graph_git_two_modes "log --topo-order $BRANCH" && + graph_git_two_modes "log --graph $COMPARE..$BRANCH" && + graph_git_two_modes "branch -vv" && + graph_git_two_modes "merge-base -a $BRANCH $COMPARE" + ' +} + +graph_git_behavior 'graph exists' commits/3 commits/1 + +verify_chain_files_exist() { + for hash in $(cat $1/commit-graph-chain) + do + test_path_is_file $1/graph-$hash.graph + done +} + +test_expect_success 'add more commits, and write a new base graph' ' + git reset --hard commits/1 && + for i in $(test_seq 4 5) + do + test_commit $i && + g
[PATCH v2 01/11] commit-graph: document commit-graph chains
From: Derrick Stolee Add a basic description of commit-graph chains. More details about the feature will be added as we add functionality. This introduction gives a high-level overview to the goals of the feature and the basic layout of commit-graph chains. Signed-off-by: Derrick Stolee --- Documentation/technical/commit-graph.txt | 59 1 file changed, 59 insertions(+) diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index fb53341d5e..1dca3bd8fe 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -127,6 +127,65 @@ Design Details helpful for these clones, anyway. The commit-graph will not be read or written when shallow commits are present. +Commit Graphs Chains + + +Typically, repos grow with near-constant velocity (commits per day). Over time, +the number of commits added by a fetch operation is much smaller than the +number of commits in the full history. By creating a "chain" of commit-graphs, +we enable fast writes of new commit data without rewriting the entire commit +history -- at least, most of the time. + +## File Layout + +A commit-graph chain uses multiple files, and we use a fixed naming convention +to organize these files. Each commit-graph file has a name +`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the hex- +valued hash stored in the footer of that file (which is a hash of the file's +contents before that hash). For a chain of commit-graph files, a plain-text +file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the +hashes for the files in order from "lowest" to "highest". + +For example, if the `commit-graph-chain` file contains the lines + +``` + {hash0} + {hash1} + {hash2} +``` + +then the commit-graph chain looks like the following diagram: + + +---+ + | graph-{hash2}.graph | + +---+ + | + +---+ + | | + | graph-{hash1}.graph | + | | + +---+ + | + +---+ + | | + | | + | | + | graph-{hash0}.graph | + | | + | | + | | + +---+ + +Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of +commits in `graph-{hash1}.graph`, and X2 be the number of commits in +`graph-{hash2}.graph`. If a commit appears in position i in `graph-{hash2}.graph`, +then we interpret this as being the commit in position (X0 + X1 + i), and that +will be used as its "graph position". The commits in `graph-{hash2}.graph` use these +positions to refer to their parents, which may be in `graph-{hash1}.graph` or +`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by checking +its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 + +X2). + Related Links - [0] https://bugs.chromium.org/p/git/issues/detail?id=8 -- gitgitgadget
[PATCH v2 02/11] commit-graph: prepare for commit-graph chains
From: Derrick Stolee To prepare for a chain of commit-graph files, augment the commit_graph struct to point to a base commit_graph. As we load commits from the graph, we may actually want to read from a base file according to the graph position. The "graph position" of a commit is given by concatenating the lexicographic commit orders from each of the commit-graph files in the chain. This means that we must distinguish two values: * lexicographic index : the position within the lexicographic order in a single commit-graph file. * graph position: the posiiton within the concatenated order of multiple commit-graph files Given the lexicographic index of a commit in a graph, we can compute the graph position by adding the number of commits in the lower-level graphs. To find the lexicographic index of a commit, we subtract the number of commits in lower-level graphs. Signed-off-by: Derrick Stolee --- commit-graph.c | 74 -- commit-graph.h | 3 ++ 2 files changed, 69 insertions(+), 8 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 7723156964..3afedcd7f5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -371,6 +371,25 @@ static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t g->chunk_oid_lookup, g->hash_len, pos); } +static void load_oid_from_graph(struct commit_graph *g, int pos, struct object_id *oid) +{ + uint32_t lex_index; + + if (!g) + BUG("NULL commit-graph"); + + while (pos < g->num_commits_in_base) + g = g->base_graph; + + if (pos >= g->num_commits + g->num_commits_in_base) + BUG("position %d is beyond the scope of this commit-graph (%d local + %d base commits)", + pos, g->num_commits, g->num_commits_in_base); + + lex_index = pos - g->num_commits_in_base; + + hashcpy(oid->hash, g->chunk_oid_lookup + g->hash_len * lex_index); +} + static struct commit_list **insert_parent_or_die(struct repository *r, struct commit_graph *g, uint64_t pos, @@ -379,10 +398,10 @@ static struct commit_list **insert_parent_or_die(struct repository *r, struct commit *c; struct object_id oid; - if (pos >= g->num_commits) + if (pos >= g->num_commits + g->num_commits_in_base) die("invalid parent position %"PRIu64, pos); - hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos); + load_oid_from_graph(g, pos, &oid); c = lookup_commit(r, &oid); if (!c) die(_("could not find commit %s"), oid_to_hex(&oid)); @@ -392,7 +411,14 @@ static struct commit_list **insert_parent_or_die(struct repository *r, static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos) { - const unsigned char *commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * pos; + const unsigned char *commit_data; + uint32_t lex_index; + + while (pos < g->num_commits_in_base) + g = g->base_graph; + + lex_index = pos - g->num_commits_in_base; + commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index; item->graph_pos = pos; item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; } @@ -405,10 +431,26 @@ static int fill_commit_in_graph(struct repository *r, uint32_t *parent_data_ptr; uint64_t date_low, date_high; struct commit_list **pptr; - const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos; + const unsigned char *commit_data; + uint32_t lex_index; - item->object.parsed = 1; + while (pos < g->num_commits_in_base) + g = g->base_graph; + + if (pos >= g->num_commits + g->num_commits_in_base) + BUG("position %d is beyond the scope of this commit-graph (%d local + %d base commits)", + pos, g->num_commits, g->num_commits_in_base); + + /* +* Store the "full" position, but then use the +* "local" position for the rest of the calculation. +*/ item->graph_pos = pos; + lex_index = pos - g->num_commits_in_base; + + commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index; + + item->object.parsed = 1; item->maybe_tree = NULL; @@ -452,7 +494,18 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin *pos = item->graph_pos; return 1; } else { - return bsearch_graph(g, &(item->object.oid), pos); + struct commit_graph *cur_g = g; + uint32_t lex_index; + + while (cur_g && !bsearch_graph(cur_g, &(item->object.oid), &lex_index)) + cur_g = cur_g->base_graph; + + if (cur_
[PATCH v2 10/11] commit-graph: allow cross-alternate chains
From: Derrick Stolee In an environment like a fork network, it is helpful to have a commit-graph chain that spans both the base repo and the fork repo. The fork is usually a small set of data on top of the large repo, but sometimes the fork is much larger. For example, git-for-windows/git has almost double the number of commits as git/git because it rebases its commits on every major version update. To allow cross-alternate commit-graph chains, we need a few pieces: 1. When looking for a graph-{hash}.graph file, check all alternates. 2. When merging commit-graph chains, do not merge across alternates. 3. When writing a new commit-graph chain based on a commit-graph file in another object directory, do not allow success if the base file has of the name "commit-graph" instead of "commit-graphs/graoh-{hash}.graph". Signed-off-by: Derrick Stolee --- Documentation/technical/commit-graph.txt | 40 + commit-graph.c | 46 ++-- commit-graph.h | 1 + t/t5323-split-commit-graph.sh| 37 +++ 4 files changed, 114 insertions(+), 10 deletions(-) diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index 083fff9927..b6fe8b2321 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -267,6 +267,42 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum number of commits) could be extracted into config settings for full flexibility. +## Chains across multiple object directories + +In a repo with alternates, we look for the `commit-graph-chain` file starting +in the local object directory and then in each alternate. The first file that +exists defines our chain. As we look for the `graph-{hash}` files for +each `{hash}` in the chain file, we follow the same pattern for the host +directories. + +This allows commit-graphs to be split across multiple forks in a fork network. +The typical case is a large "base" repo with many smaller forks. + +As the base repo advances, it will likely update and merge its commit-graph +chain more frequently than the forks. If a fork updates their commit-graph after +the base repo, then it should "reparent" the commit-graph chain onto the new +chain in the base repo. When reading each `graph-{hash}` file, we track +the object directory containing it. During a write of a new commit-graph file, +we check for any changes in the source object directory and read the +`commit-graph-chain` file for that source and create a new file based on those +files. During this "reparent" operation, we necessarily need to collapse all +levels in the fork, as all of the files are invalid against the new base file. + +It is crucial to be careful when cleaning up "unreferenced" `graph-{hash}.graph` +files in this scenario. It falls to the user to define the proper settings for +their custom environment: + + 1. When merging levels in the base repo, the unreferenced files may still be +referenced by chains from fork repos. + + 2. The expiry time should be set to a length of time such that every fork has +time to recompute their commit-graph chain to "reparent" onto the new base +file(s). + + 3. If the commit-graph chain is updated in the base, the fork will not have +access to the new chain until its chain is updated to reference those files. +(This may change in the future [5].) + Related Links - [0] https://bugs.chromium.org/p/git/issues/detail?id=8 @@ -293,3 +329,7 @@ Related Links [4] https://public-inbox.org/git/20180108154822.54829-1-...@jeffhostetler.com/T/#u A patch to remove the ahead-behind calculation from 'status'. + +[5] https://public-inbox.org/git/f27db281-abad-5043-6d71-cbb083b1c...@gmail.com/ +A discussion of a "two-dimensional graph position" that can allow reading +multiple commit-graph chains at the same time. diff --git a/commit-graph.c b/commit-graph.c index e9784cd559..0d8b942e2b 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -320,6 +320,9 @@ static int prepare_commit_graph_v1(struct repository *r, const char *obj_dir) r->objects->commit_graph = load_commit_graph_one(graph_name); free(graph_name); + if (r->objects->commit_graph) + r->objects->commit_graph->obj_dir = obj_dir; + return r->objects->commit_graph ? 0 : -1; } @@ -380,8 +383,7 @@ static void prepare_commit_graph_chain(struct repository *r, const char *obj_dir oids = xcalloc(st.st_size / (the_hash_algo->hexsz + 1), sizeof(struct object_id)); while (strbuf_getline_lf(&line, fp) != EOF && valid) { - char *graph_name; - struct commit_graph *g; + struct object_directory *odb; if (get_oid_hex(line.buf, &oids[i])) { warning(_("invalid commit-graph chain: line '%s' not a hash"),
[PATCH v2 00/11] [RFC] Commit-graph: Write incremental files
This patch series is marked as RFC quality because it is missing some key features and tests, but hopefully starts a concrete discussion of how the incremental commit-graph writes can work. This version takes the design suggestions from the earlier discussion and tries to work out most of the concerns. The commit-graph is a valuable performance feature for repos with large commit histories, but suffers from the same problem as git repack: it rewrites the entire file every time. This can be slow when there are millions of commits, especially after we stopped reading from the commit-graph file during a write in 43d3561 (commit-graph write: don't die if the existing graph is corrupt). Instead, create a "chain" of commit-graphs in the .git/objects/info/commit-graphs folder with name graph-{hash}.graph. The list of hashes is given by the commit-graph-chain file, and also in a "base graph chunk" in the commit-graph format. As we read a chain, we can verify that the hashes match the trailing hash of each commit-graph we read along the way and each hash below a level is expected by that graph file. When writing, we don't always want to add a new level to the stack. This would eventually result in performance degradation, especially when searching for a commit (before we know its graph position). We decide to merge levels of the stack when the new commits we will write satisfy two conditions: 1. The expected size of the new file is more than half the size of the tip of the stack. 2. The new file contains more than 64,000 commits. The first condition alone would prevent more than a logarithmic number of levels. The second condition is a stop-gap to prevent performance issues when another process starts reading the commit-graph stack as we are merging a large stack of commit-graph files. The reading process could be in a state where the new file is not ready, but the levels above the new file were already deleted. Thus, the commits that were merged down must be parsed from pack-files. The performance is necessarily amortized across multiple writes, so I tested by writing commit-graphs from the (non-rc) tags in the Linux repo. My test included 72 tags, and wrote everything reachable from the tag using --stdin-commits. Here are the overall perf numbers: git commit-graph write --stdin-commits: 8m 12s git commit-graph write --stdin-commits --split:48s The test using --split included at least six full collapses to the full commit-graph. I believe the commit-graph stack had at most three levels during this test. Here are a few points that still need to be addressed before this is ready for full review: * The merge strategy values should be extracted into config options. * If we have a commit-graph chain and someone writes without "--split" it will make a new commit-graph file and not clean up the old files. * We need to update 'git commit-graph verify' to understand the chains, and test that it catches the new problems. It would be good to have a '--shallow' option to only verify the tip file, as if we run that after every write we can have some confidence that the files at rest are still valid and we only need to check the smaller file. (This is the main reason this is a priority to the VFS for Git team.) This is based on ds/commit-graph-write-refactor. Thanks, -Stolee [1] https://github.com/git/git/commit/43d356180556180b4ef6ac232a14498a5bb2b446 commit-graph write: don't die if the existing graph is corrupt Derrick Stolee (11): commit-graph: document commit-graph chains commit-graph: prepare for commit-graph chains commit-graph: rename commit_compare to oid_compare commit-graph: load commit-graph chains commit-graph: add base graphs chunk commit-graph: rearrange chunk count logic commit-graph: write commit-graph chains commit-graph: add --split option to builtin commit-graph: merge commit-graph chains commit-graph: allow cross-alternate chains commit-graph: expire commit-graph files .../technical/commit-graph-format.txt | 11 +- Documentation/technical/commit-graph.txt | 195 + builtin/commit-graph.c| 10 +- commit-graph.c| 734 +- commit-graph.h| 7 + t/t5318-commit-graph.sh | 2 +- t/t5323-split-commit-graph.sh | 172 7 files changed, 1088 insertions(+), 43 deletions(-) create mode 100755 t/t5323-split-commit-graph.sh base-commit: 8520d7fc7c6edd4d71582c69a873436029b6cb1b Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-184%2Fderrickstolee%2Fgraph%2Fincremental-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-184/derrickstolee/graph/incremental-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/184 Range-diff vs v1: 1: 0be7713a25 < -: -- commit-graph: fix the_repository reference 2: 86b18f
[PATCH] fetch-pack: send server options after command
Currently, if any server options are specified during a protocol v2 fetch, server options will be sent before "command=fetch". Write server options to the request buffer in send_fetch_request() so that the components of the request are sent in the correct order. The protocol documentation states that the command must come first. The Git server implementation in serve.c (see process_request() in that file) tolerates any order of command and capability, which is perhaps why we haven't noticed this. This was noticed when testing against a JGit server implementation, which follows the documentation in this regard. Signed-off-by: Jonathan Tan --- fetch-pack.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fetch-pack.c b/fetch-pack.c index 3f24d0c8a6..1c10f54e78 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1115,7 +1115,7 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out, server_supports_v2("server-option", 1)) { int i; for (i = 0; i < args->server_options->nr; i++) - packet_write_fmt(fd_out, "server-option=%s", + packet_buf_write(&req_buf, "server-option=%s", args->server_options->items[i].string); } -- 2.21.0.1020.gf2820cf01a-goog
Re: [PATCH 4/5] send-email: fix regression in sendemail.identity parsing
Hi Ævar, On Fri, 17 May 2019, Ævar Arnfjörð Bjarmason wrote: > diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh > index 61d484d1a6..890e2874c3 100755 > --- a/t/t9001-send-email.sh > +++ b/t/t9001-send-email.sh > @@ -1228,6 +1228,61 @@ test_expect_success $PREREQ 'sendemail.to works' ' > grep "To: Somebody " stdout > ' > > +test_expect_success $PREREQ 'setup sendemail.identity' ' > + git config --replace-all sendemail.to "defa...@example.com" && > + git config --replace-all sendemail.isp.to "i...@example.com" && > + git config --replace-all sendemail.cloud.to "cl...@example.com" > +' > + > +test_expect_success $PREREQ 'sendemail.identity: reads the correct identity > config' ' > + git -c sendemail.identity=cloud send-email \ > + --dry-run \ > + --from="nob...@example.com" \ > + $patches >stdout && > + grep "To: cl...@example.com" stdout > +' > + > +test_expect_success $PREREQ 'sendemail.identity: identity overrides > sendemail.identity' ' > + git -c sendemail.identity=cloud send-email \ > + --identity=isp \ > + --dry-run \ > + --from="nob...@example.com" \ > + $patches >stdout && > + grep "To: i...@example.com" stdout > +' > + > +test_expect_success $PREREQ 'sendemail.identity: --no-identity clears > previous identity' ' > + git -c sendemail.identity=cloud send-email \ > + --no-identity \ > + --dry-run \ > + --from="nob...@example.com" \ > + $patches >stdout && > + grep "To: defa...@example.com" stdout > +' > + > +test_expect_success $PREREQ 'sendemail.identity: bool identity variable > existance overrides' ' > + git -c sendemail.identity=cloud \ > + -c sendemail.xmailer=true \ > + -c sendemail.cloud.xmailer=false \ > + send-email \ > + --dry-run \ > + --from="nob...@example.com" \ > + $patches >stdout && > + grep "To: cl...@example.com" stdout && > + ! grep "X-Mailer" stdout > +' > + > +test_expect_success $PREREQ 'sendemail.identity: bool variable fallback' ' > + git -c sendemail.identity=cloud \ > + -c sendemail.xmailer=false \ > + send-email \ > + --dry-run \ > + --from="nob...@example.com" \ > + $patches >stdout && > + grep "To: cl...@example.com" stdout && > + ! grep "X-Mailer" stdout > +' > + These test cases all diligently use the `$PREREQ` prerequisite, but... > test_expect_success $PREREQ '--no-to overrides sendemail.to' ' > git send-email \ > --dry-run \ > @@ -1785,6 +1840,15 @@ test_expect_success '--dump-aliases must be used > alone' ' > test_must_fail git send-email --dump-aliases --to=jan...@example.com -1 > refs/heads/accounting > ' > > +test_expect_success 'aliases and sendemail.identity' ' > + test_must_fail git \ > + -c sendemail.identity=cloud \ > + -c sendemail.aliasesfile=default-aliases \ > + -c sendemail.cloud.aliasesfile=cloud-aliases \ > + send-email -1 2>stderr && > + test_i18ngrep "cloud-aliases" stderr > +' > + This one is missing it. That breaks the Windows job in our Azure Pipeline where we leave out all of the Perl bits (to accelerate the tests somewhat). Ciao, Dscho > test_sendmail_aliases () { > msg="$1" && shift && > expect="$@" && > -- > 2.21.0.1020.gf2820cf01a > > >
[GSoC] Blogging with Rohit
Hey all Here I am with one more blog[1], where I talked about how we are implementing `--skip` flag for git-cherry-pick. How you'll enjoy reading this. Thanks Rohit [1]: https://rashiwal.me/2019/to-pick-or-not-to-pick/ PS: comment section is now working, please leave your constructive remarks.
Re: [GSoC] Blogging with Rohit
s/How/Hope/ 😅 > Thanks > Rohit > > [1]: https://rashiwal.me/2019/to-pick-or-not-to-pick/ > PS: comment section is now working, please leave your constructive remarks. >
Re: [PATCH v2] status: add an empty line when there is no hint
Hi Junio, 林自均 於 2019年5月15日 週三 上午11:02寫道: > > Hi Junio, > > Junio C Hamano 於 2019年5月15日 週三 上午8:48寫道: > > > > 林自均 writes: > > > > > I was not talking about the messages in the editor session. I was > > > talking about "git commit" without "git add" anything. > > > > > > For example: > > > > > > ``` > > > $ touch newfile.txt > > > $ git commit > > > On branch master > > > Untracked files: > > > newfile.txt > > > > > > nothing added to commit but untracked files present > > > ``` > > > > > > My current patch is trying to add an empty line between > > > "Untracked files:" and "newfile.txt". > > > > I do not think that one is paged, so if you ask me, I'd say we > > shouldn't add an extra blank there. Is that message also reused in > > the editor session, or do two different codepaths produce a similar > > looking message, one for the above case direct to the terminal and > > the other for the editor session? > > The messages produced in wt-status.c seem to be reused in > both terminal and editor session. When I tried to modify the > messages in terminal, the ones in editor session will also > be modified accordingly. > > By the way, my new patch to remove extra blank line is here: > https://github.com/gitgitgadget/git/pull/196 > I am still waiting for someone to comment "/allow johnlinp". I've submitted the patch with the title [PATCH 1/1] status: remove the empty line after hints Please review if you are available. Thank you. Best, John Lin > > > > > But again... > > > > >> At the same time, I think I've been happy enough with the current > > >> output from both commands, so if you let me bikeshed freely, I'd > > >> probably pick "let's not change anything then" ;-) > >
git describe/contains for submodule commits
Hi, I've had a few times where I was curious of when a submodule got set to a specific commit. I noticed that git describe has "blob" support, which outputs something like :/path/to/file using the revision walking machinery. I'm curious if anyone knows if that sort of revision walk could be expected to find the first treeish that had a submodule commit instead of a blob. I'm not that familiar with the revision walking, so I was hoping to get some pointers of whre to look before I began implementing. Ultimately, I'd like to have some sort of command like: git submodule contains and have it try to figure out the most recent commit htat has a submodule change for which the submodule is a child of the specified submodule commit. I can sort of reverse engineer this through git log, but it's slow and tedious, so I was hoping to be able to implement it into a revision walk that did this. Once I know the commit that introduces the submodule change, I could feed that to git describe --contains to find the tag/version which included the change easily enough. Thanks, Jake
Re: [PATCH] Make stashing nothing exit 1
Junio C Hamano writes: > Keith Smiley writes: > > > In the case there are no files to stash, but the user asked to stash, we > > should exit 1 since the stashing failed. > > --- > > Sorry, but I fail to see why this is a good change. Did you have > some script that wanted the exit code from "git stash" to indicate > if it had anything to stash and change the behaviour based on it? > > Is it a big enough hassle to figure out if the "stash" command did > something yourself that justifies forcing existing scripts that rely > on "no-op is merely a normal exit" behaviour other people have > written in the past several years? The problem with current behaviour is it makes it hard to use stash in scripts. A natural stash use case is: wrap some operation requiring a clean working tree with a stash push-pop pair. But that doesn't work properly when working tree is already clean - push silently does nothing and following pop becomes unbalanced. You have to keep that in mind and work around with something like: if ! git diff-index --exit-code --quiet HEAD then git stash push trap 'git stash pop' EXIT fi With this change this can be simplified to: git stash push && trap 'git stash pop' EXIT I don't mind keeping this new behaviour behind an option for compatibility. Or alternatively resolve this use case by supporting --allow-empty in stash-push. But my feeling is it is natural for 'git stash push' to report error for no-op case because the command is explicitly about creating new stash entries. A close analogy is 'git commit' which errors on no-op. Contrary all commands treating no-op as a success I'm aware of are not about creating new objects but about querying or syncing.
Re: git describe/contains for submodule commits
On Thu, May 23 2019, Jacob Keller wrote: > Hi, > > I've had a few times where I was curious of when a submodule got set > to a specific commit. > > I noticed that git describe has "blob" support, which outputs something like > > :/path/to/file > > using the revision walking machinery. > > I'm curious if anyone knows if that sort of revision walk could be > expected to find the first treeish that had a submodule commit instead > of a blob. > > I'm not that familiar with the revision walking, so I was hoping to > get some pointers of whre to look before I began implementing. > > Ultimately, I'd like to have some sort of command like: > > git submodule contains > > and have it try to figure out the most recent commit htat has a > submodule change for which the submodule is a child of the specified > submodule commit. > > I can sort of reverse engineer this through git log, but it's slow and > tedious, so I was hoping to be able to implement it into a revision > walk that did this. > > Once I know the commit that introduces the submodule change, I could > feed that to git describe --contains to find the tag/version which > included the change easily enough. > > Thanks, > Jake You can do this with --find-object, e.g. on git.git: git log --find-object=855827c583bc30645ba427885caa40c5b81764d2 Plugging that into describe.c should be fairly straightforward.
Re: [PATCH] Make stashing nothing exit 1
On Thu, May 23 2019, Maksim Odnoletkov wrote: > Junio C Hamano writes: > >> Keith Smiley writes: >> >> > In the case there are no files to stash, but the user asked to stash, we >> > should exit 1 since the stashing failed. >> > --- >> >> Sorry, but I fail to see why this is a good change. Did you have >> some script that wanted the exit code from "git stash" to indicate >> if it had anything to stash and change the behaviour based on it? >> >> Is it a big enough hassle to figure out if the "stash" command did >> something yourself that justifies forcing existing scripts that rely >> on "no-op is merely a normal exit" behaviour other people have >> written in the past several years? > > The problem with current behaviour is it makes it hard to use stash in > scripts. A natural stash use case is: wrap some operation requiring a > clean working tree with a stash push-pop pair. But that doesn't work > properly when working tree is already clean - push silently does nothing > and following pop becomes unbalanced. You have to keep that in mind and > work around with something like: > > if ! git diff-index --exit-code --quiet HEAD > then > git stash push > trap 'git stash pop' EXIT > fi > > With this change this can be simplified to: > > git stash push && trap 'git stash pop' EXIT > > I don't mind keeping this new behaviour behind an option for > compatibility. Or alternatively resolve this use case by supporting > --allow-empty in stash-push. But my feeling is it is natural for 'git > stash push' to report error for no-op case because the command is > explicitly about creating new stash entries. A close analogy is 'git > commit' which errors on no-op. Contrary all commands treating no-op as a > success I'm aware of are not about creating new objects but about > querying or syncing. I view "stash push" more like just "push", or even a special case for "reset --hard", in both of those cases we don't exit non-zero if there's nothing to do, i.e. if there's nothing to push, or if "reset --hard" ends up needing to do nothing. On the other hand as you point out it can also be viewed as "create new stash entry", just like "create new commit", and we error out on that. In practice I bet there's very few scripters of "git commit" that don't want to actually create a commit, whereas "stash push" is more likely to be used like "reset --hard", i.e. just "wipe/save-wipe changes if needed". I don't mind an --exit-code for it, or even a change in the default behavior, just pointing out that it's a bit more nuanced than just a missing exit code, given "push", "reset" etc. prior art.
[PATCH] send-email: Add an option to suppress adding a specific email address
Make it easy to suppress sta...@vger.kernel.org. Long story short it is desirable to have ``Cc: sta...@vger.kernel.org'' on many bug fixes sent to the linux kernel. It is not always desirable to actually the stable maintainer immediately as the patches are still being reviewed etc. Actually cc'd the stable maintainers in the linux kernel is not even really necessary as they will always find the tag after the patch has been merged in the commit body. So I am adding yet another suppress command "suppress-addr" that will take an email address keep that email address from being automatically added to a destination the email will be sent to. Signed-off-by: "Eric W. Biederman" --- Documentation/git-send-email.txt | 5 + git-send-email.perl | 20 +++- 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt index 1afe9fc858ea..9833d4dbd9f4 100644 --- a/Documentation/git-send-email.txt +++ b/Documentation/git-send-email.txt @@ -346,6 +346,11 @@ Default is the value of `sendemail.suppresscc` configuration value; if that is unspecified, default to 'self' if --suppress-from is specified, as well as 'body' if --no-signed-off-cc is specified. +--suppress-addr=:: + Specify an address that should not be automatically copied + on any email. + Default is the value of `sendemail.suppressaddr`. + --[no-]suppress-from:: If this is set, do not add the From: address to the cc: list. Default is the value of `sendemail.suppressFrom` configuration diff --git a/git-send-email.perl b/git-send-email.perl index 8eb63b5a2f8d..2ac0985f3f00 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -98,6 +98,7 @@ sub usage { --to-cmd * Email To: via ` \$patch_path` --cc-cmd * Email Cc: via ` \$patch_path` --suppress-cc * author, self, sob, cc, cccmd, body, bodycc, misc-by, all. +--suppress-addr * Don't automatically add the specified address --[no-]cc-cover* Email Cc: addresses in the cover letter. --[no-]to-cover* Email To: addresses in the cover letter. --[no-]signed-off-by-cc* Send to Signed-off-by: addresses. Default on. @@ -237,6 +238,7 @@ sub do_edit { my ($identity, $aliasfiletype, @alias_files, $smtp_domain, $smtp_auth); my ($validate, $confirm); my (@suppress_cc); +my (@suppress_addr); my ($auto_8bit_encoding); my ($compose_encoding); my $target_xfer_encoding = 'auto'; @@ -274,6 +276,7 @@ sub do_edit { "aliasfiletype" => \$aliasfiletype, "bcc" => \@bcclist, "suppresscc" => \@suppress_cc, +"suppressaddr" => \@suppress_addr, "envelopesender" => \$envelope_sender, "confirm" => \$confirm, "from" => \$sender, @@ -360,6 +363,7 @@ sub signal_handler { "suppress-from!" => \$suppress_from, "no-suppress-from" => sub {$suppress_from = 0}, "suppress-cc=s" => \@suppress_cc, + "suppress-addr=s" => \@suppress_addr, "signed-off-cc|signed-off-by-cc!" => \$signed_off_by_cc, "no-signed-off-cc|no-signed-off-by-cc" => sub {$signed_off_by_cc = 0}, "cc-cover|cc-cover!" => \$cover_cc, @@ -489,6 +493,16 @@ sub read_config { delete $suppress_cc{'body'}; } +# Set specific suppress addresses +my (%suppress_addr); +if (@suppress_addr) { + foreach my $addr (@suppress_addr) { + my $qaddr = unquote_rfc2047($addr); + my $saddr = sanitize_address($qaddr); + $suppress_addr{$saddr} = 1; + } +} + # Set confirm's default value my $confirm_unconfigured = !defined $confirm; if ($confirm_unconfigured) { @@ -1623,6 +1637,7 @@ sub process_file { $sauthor = sanitize_address($author); next if $suppress_cc{'author'}; next if $suppress_cc{'self'} and $sauthor eq $sender; + next if ($suppress_addr{$sauthor}); printf(__("(mbox) Adding cc: %s from line '%s'\n"), $1, $_) unless $quiet; push @cc, $1; @@ -1642,6 +1657,7 @@ sub process_file { next if ($suppress_cc{'self'}); } else { next if ($suppress_cc{'cc'}); + next if ($suppress_addr{$saddr}); } printf(__("(mbox) Adding cc: %s from line '%s'\n"), $addr, $_) unless $quiet; @@ -1681,7 +1697,7 @@ sub process_file { # line 2 = subject
[PATCH] send-email: Add an option to suppress adding a specific email address
Make it easy to suppress sta...@vger.kernel.org. Long story short it is desirable to have ``Cc: sta...@vger.kernel.org'' on many bug fixes sent to the linux kernel. It is not always desirable to actually the stable maintainer immediately as the patches are still being reviewed etc. Actually cc'd the stable maintainers in the linux kernel is not even really necessary as they will always find the tag after the patch has been merged in the commit body. So I am adding yet another suppress command "suppress-addr" that will take an email address keep that email address from being automatically added to a destination the email will be sent to. Signed-off-by: "Eric W. Biederman" --- Documentation/git-send-email.txt | 5 + git-send-email.perl | 20 +++- 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt index 1afe9fc858ea..9833d4dbd9f4 100644 --- a/Documentation/git-send-email.txt +++ b/Documentation/git-send-email.txt @@ -346,6 +346,11 @@ Default is the value of `sendemail.suppresscc` configuration value; if that is unspecified, default to 'self' if --suppress-from is specified, as well as 'body' if --no-signed-off-cc is specified. +--suppress-addr=:: + Specify an address that should not be automatically copied + on any email. + Default is the value of `sendemail.suppressaddr`. + --[no-]suppress-from:: If this is set, do not add the From: address to the cc: list. Default is the value of `sendemail.suppressFrom` configuration diff --git a/git-send-email.perl b/git-send-email.perl index 8eb63b5a2f8d..2ac0985f3f00 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -98,6 +98,7 @@ sub usage { --to-cmd * Email To: via ` \$patch_path` --cc-cmd * Email Cc: via ` \$patch_path` --suppress-cc * author, self, sob, cc, cccmd, body, bodycc, misc-by, all. +--suppress-addr * Don't automatically add the specified address --[no-]cc-cover* Email Cc: addresses in the cover letter. --[no-]to-cover* Email To: addresses in the cover letter. --[no-]signed-off-by-cc* Send to Signed-off-by: addresses. Default on. @@ -237,6 +238,7 @@ sub do_edit { my ($identity, $aliasfiletype, @alias_files, $smtp_domain, $smtp_auth); my ($validate, $confirm); my (@suppress_cc); +my (@suppress_addr); my ($auto_8bit_encoding); my ($compose_encoding); my $target_xfer_encoding = 'auto'; @@ -274,6 +276,7 @@ sub do_edit { "aliasfiletype" => \$aliasfiletype, "bcc" => \@bcclist, "suppresscc" => \@suppress_cc, +"suppressaddr" => \@suppress_addr, "envelopesender" => \$envelope_sender, "confirm" => \$confirm, "from" => \$sender, @@ -360,6 +363,7 @@ sub signal_handler { "suppress-from!" => \$suppress_from, "no-suppress-from" => sub {$suppress_from = 0}, "suppress-cc=s" => \@suppress_cc, + "suppress-addr=s" => \@suppress_addr, "signed-off-cc|signed-off-by-cc!" => \$signed_off_by_cc, "no-signed-off-cc|no-signed-off-by-cc" => sub {$signed_off_by_cc = 0}, "cc-cover|cc-cover!" => \$cover_cc, @@ -489,6 +493,16 @@ sub read_config { delete $suppress_cc{'body'}; } +# Set specific suppress addresses +my (%suppress_addr); +if (@suppress_addr) { + foreach my $addr (@suppress_addr) { + my $qaddr = unquote_rfc2047($addr); + my $saddr = sanitize_address($qaddr); + $suppress_addr{$saddr} = 1; + } +} + # Set confirm's default value my $confirm_unconfigured = !defined $confirm; if ($confirm_unconfigured) { @@ -1623,6 +1637,7 @@ sub process_file { $sauthor = sanitize_address($author); next if $suppress_cc{'author'}; next if $suppress_cc{'self'} and $sauthor eq $sender; + next if ($suppress_addr{$sauthor}); printf(__("(mbox) Adding cc: %s from line '%s'\n"), $1, $_) unless $quiet; push @cc, $1; @@ -1642,6 +1657,7 @@ sub process_file { next if ($suppress_cc{'self'}); } else { next if ($suppress_cc{'cc'}); + next if ($suppress_addr{$saddr}); } printf(__("(mbox) Adding cc: %s from line '%s'\n"), $addr, $_) unless $quiet; @@ -1681,7 +1697,7 @@ sub process_file { # line 2 = subject
Re: [PATCH v2 09/11] commit-graph: merge commit-graph chains
On Wed, May 22 2019, Derrick Stolee via GitGitGadget wrote: > To keep lookups fast, but also keep most incremental writes fast, create > a strategy for merging levels of the commit-graph chain. The strategy is > detailed in the commit-graph design document, but is summarized by these > two conditions: > > 1. If the number of commits we are adding is more than half the number > of commits in the graph below, then merge with that graph. > > 2. If we are writing more than 64,000 commits into a single graph, > then merge with all lower graphs. > > The numeric values in the conditions above are currently constant, but > can become config options in a future update. > [...] > +## Merge Strategy > + > +When writing a set of commits that do not exist in the commit-graph stack of > +height N, we default to creating a new file at level N + 1. We then decide to > +merge with the Nth level if one of two conditions hold: > + > + 1. The expected file size for level N + 1 is at least half the file size > for > + level N. > + > + 2. Level N + 1 contains more than MAX_SPLIT_COMMITS commits (64, > + commits). > + > +This decision cascades down the levels: when we merge a level we create a new > +set of commits that then compares to the next level. > + > +The first condition bounds the number of levels to be logarithmic in the > total > +number of commits. The second condition bounds the total number of commits > in > +a `graph-{hashN}` file and not in the `commit-graph` file, preventing > +significant performance issues when the stack merges and another process only > +partially reads the previous stack. > + > +The merge strategy values (2 for the size multiple, 64,000 for the maximum > +number of commits) could be extracted into config settings for full > +flexibility. As noted this can become configurable, so it's no big deal. But is there any reason for ths 64K limit anymore? While with the default expiry of 0sec we can still get that race, it seems unlikely in practice, as the "commit-graph write" process would write a new manifest at the end, then go and unlink() the old files. So maybe at this point we could make this even dumber with something that behaves like gc.autoPackLimit? I.e. keep writing new graphs, and then coalesce them all (or maybe not the "base" graph, like gc.bigPackThreshold)? Also: These docs refer to MAX_SPLIT_COMMITS, but in v2 it's now a "split_strategy_max_commits" variable instead.
Re: [PATCH] send-email: Add an option to suppress adding a specific email address
On Thu, May 23 2019, Eric W. Biederman wrote: > Make it easy to suppress sta...@vger.kernel.org. Long story short it > is desirable to have ``Cc: sta...@vger.kernel.org'' on many bug fixes > sent to the linux kernel. It is not always desirable to actually the > stable maintainer immediately as the patches are still being reviewed > etc. Actually cc'd the stable maintainers in the linux kernel is not > even really necessary as they will always find the tag after the patch > has been merged in the commit body. > > So I am adding yet another suppress command "suppress-addr" that will > take an email address keep that email address from being automatically > added to a destination the email will be sent to. I have a rewrite of much of the adjacent code queued in "next", can you check if applying it to that branch makes this work as you expect? Whether it does or not, this change should have a corresponding test update to t/t9001-send-email.sh, so we can just run that to see if it's doing the right thing.
[PATCH v3] doc: hint about GIT_DEBUGGER in CodingGuidelines
We check for a handy environment variable GIT_DEBUGGER when running via bin-wrappers/, but this feature is undocumented. Add a hint to how to use it into the CodingGuidelines (which is where other useful environment settings like DEVELOPER are documented). You can use GIT_DEBUGGER to pick gdb by default, or you can hand it your own debugger if you like to use something else (or if you want custom flags for gdb). This commit documents that intent within CodingGuidelines. Signed-off-by: Emily Shaffer --- Since v2, only the commit message has changed. Removed some weak language to make the commit message read more easily. Documentation/CodingGuidelines | 6 ++ 1 file changed, 6 insertions(+) diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines index 32210a4386..e99af36df9 100644 --- a/Documentation/CodingGuidelines +++ b/Documentation/CodingGuidelines @@ -412,6 +412,12 @@ For C programs: must be declared with "extern" in header files. However, function declarations should not use "extern", as that is already the default. + - You can launch gdb around your program using the shorthand GIT_DEBUGGER. + Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or + run `GIT_DEBUGGER=my-debugger-binary my-args ./bin-wrappers/git foo` to + use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb" + ./bin-wrappers/git log` (See `wrap-for-bin.sh`.) + For Perl programs: - Most of the C guidelines above apply. -- 2.21.0.1020.gf2820cf01a-goog
Re: [PATCH 0/1] trace2: fix tracing when NO_PTHREADS is defined
On Wed, May 22, 2019 at 09:23:39AM -0400, Jeff Hostetler wrote: > I was wondering about that too as the proper long term solution. > We would need (as the discussion suggests [1]) to properly > respect/represent the pthread_key_t argument. > > For now, I've guarded my usage of pthread_getspecific() in the trace2 > (similar to what index-pack does), so its not urgent that we update it. > And I'd rather we take this simple trace2 fix now and not try to combine > it with fixes for the pthread macros. Especially now as we're in the RC > cycle for 2.22. Yeah, I think that makes sense. > I'll make a note to revisit the pthread code after 2.22. For fun, here's a constant-time zero-allocation implementation that I came up with. It passes t0211 with NO_PTHREADS, but I didn't test it beyond that. diff --git a/thread-utils.h b/thread-utils.h index 4961487ed9..f466215742 100644 --- a/thread-utils.h +++ b/thread-utils.h @@ -18,7 +18,7 @@ #define pthread_t int #define pthread_mutex_t int #define pthread_cond_t int -#define pthread_key_t int +#define pthread_key_t git_pthread_key_t #define pthread_mutex_init(mutex, attr) dummy_pthread_init(mutex) #define pthread_mutex_lock(mutex) @@ -31,16 +31,49 @@ #define pthread_cond_broadcast(cond) #define pthread_cond_destroy(cond) -#define pthread_key_create(key, attr) dummy_pthread_init(key) -#define pthread_key_delete(key) +#define pthread_key_create(key, destroy) git_pthread_key_create(key, destroy) +#define pthread_key_delete(key) git_pthread_key_delete(key) #define pthread_create(thread, attr, fn, data) \ dummy_pthread_create(thread, attr, fn, data) #define pthread_join(thread, retval) \ dummy_pthread_join(thread, retval) -#define pthread_setspecific(key, data) -#define pthread_getspecific(key) NULL +#define pthread_setspecific(key, data) git_pthread_setspecific(key, data) +#define pthread_getspecific(key) git_pthread_getspecific(key) + +typedef struct { + void *data; + /* extra indirection because setspecific is passed key by value */ + void **vdata; +} git_pthread_key_t; + +static inline int git_pthread_key_create(git_pthread_key_t *key, +void (*destroy)(void *)) +{ + key->data = NULL; + key->vdata = &key->data; + /* We don't use this; alternatively we could all via atexit(). */ + if (destroy) + BUG("NO_PTHREADS does not support pthread key destructors"); + return 0; +} + +static inline int git_pthread_key_delete(git_pthread_key_t key) +{ + /* noop */ + return 0; +} + +static inline void git_pthread_setspecific(git_pthread_key_t key, void *data) +{ + *(key.vdata) = data; +} + +static inline void *git_pthread_getspecific(git_pthread_key_t key) +{ + return key.data; +} int dummy_pthread_create(pthread_t *pthread, const void *attr, void *(*fn)(void *), void *data);
[PATCH v2] upload-pack: strip namespace from symref data
On Wed, May 22, 2019 at 12:33:56PM +0200, Ævar Arnfjörð Bjarmason wrote: > > Likely nobody noticed because bug (1) means that from the client's > > perspective, we did not report on HEAD at all. And thus it uses the > > pre-7171d8c15f fallback code to guess the correct HEAD, which is usually > > right. It only falls down in ambiguous cases (like the one laid out in > > the included test). > > ...because here you're talking about "the client's perspective" and "it > uses the pre-7171d8c15f [...] code", but this should say "the > pre-a45b5f0552 code", i.e. mention the commit that changed the *client* > logic. Heh. I actually found this split, too, while digging in the history, but thought it might be adding distracting details to the commit message. But hey, if you're gonna read them _carefully_, I guess I can be more precise. ;) The fallback code isn't even from a45b5f0552; it's much older than that. We started looking at ref annotations at all in fbb074c253 (remote: make guess_remote_head() use exact HEAD lookup if it is available, 2009-02-27). The guessing code that falls back to comes from 8434c2f1af (Build in clone, 2008-04-27). I didn't bother finding the true origin in the shell script. ;) So instead of mentioning it at the top (because the bug really is just on the server side), I re-wrote the paragraph about the client a little more clearly and precisely: Likely nobody noticed because we tend to do the right thing anyway. Bug (1) means that we said nothing about HEAD (just refs/namespace/foo/HEAD). And so the client half of the code, from a45b5f0552 (connect: annotate refs with their symref information in get_remote_head(), 2013-09-17), does not annotate HEAD, and we use the fallback in guess_remote_head(), matching refs by object id. Which is usually right. It only falls down in ambiguous cases, like the one laid out in the included test. Revised patch below. -- >8 -- Subject: [PATCH] upload-pack: strip namespace from symref data Since 7171d8c15f (upload-pack: send symbolic ref information as capability, 2013-09-17), we've sent cloning and fetching clients special information about which branch HEAD is pointing to, so that they don't have to guess based on matching up commit ids. However, this feature has never worked properly with the GIT_NAMESPACE feature. Because upload-pack uses head_ref_namespaced(find_symref), we do find and report on refs/namespaces/foo/HEAD instead of the actual HEAD of the repo. This makes sense, since the branch pointed to by the top-level HEAD may not be advertised at all. But we do two things wrong: 1. We report the full name refs/namespaces/foo/HEAD, instead of just HEAD. Meaning no client is going to bother doing anything with that symref, since we're not otherwise advertising it. 2. We report the symref destination using its full name (e.g., refs/namespaces/foo/refs/heads/master). That's similarly useless to the client, who only saw "refs/heads/master" in the advertisement. We should be stripping the namespace prefix off of both places (which this patch fixes). Likely nobody noticed because we tend to do the right thing anyway. Bug (1) means that we said nothing about HEAD (just refs/namespace/foo/HEAD). And so the client half of the code, from a45b5f0552 (connect: annotate refs with their symref information in get_remote_head(), 2013-09-17), does not annotate HEAD, and we use the fallback in guess_remote_head(), matching refs by object id. Which is usually right. It only falls down in ambiguous cases, like the one laid out in the included test. This also means that we don't have to worry about breaking anybody who was putting pre-stripped names into their namespace symrefs when we fix bug (2). Because of bug (1), nobody would have been using the symref we advertised in the first place (not to mention that those symrefs would have appeared broken for any non-namespaced access). Note that we have separate fixes here for the v0 and v2 protocols. The symref advertisement moved in v2 to be a part of the ls-refs command. This actually gets part (1) right, since the symref annotation piggy-backs on the existing ref advertisement, which is properly stripped. But it still needs a fix for part (2). The included tests cover both protocols. Reported-by: Bryan Turner Signed-off-by: Jeff King --- ls-refs.c| 3 ++- t/t5509-fetch-push-namespaces.sh | 28 upload-pack.c| 4 ++-- 3 files changed, 32 insertions(+), 3 deletions(-) diff --git a/ls-refs.c b/ls-refs.c index 0a7dbc6442..818aef70a0 100644 --- a/ls-refs.c +++ b/ls-refs.c @@ -57,7 +57,8 @@ static int send_ref(const char *refname, const struct object_id *oid, if (!symref_target) die("'%s' is a symref but it is not?", refname); - strbuf_addf(&refline, " symref-target:%s", symref_target); + strbuf_addf(&refline,
Re: [PATCH] Make stashing nothing exit 1
Am 23.05.19 um 01:57 schrieb Maksim Odnoletkov: > The problem with current behaviour is it makes it hard to use stash in > scripts. A natural stash use case is: wrap some operation requiring a > clean working tree with a stash push-pop pair. But that doesn't work > properly when working tree is already clean - push silently does nothing > and following pop becomes unbalanced. You have to keep that in mind and > work around with something like: > > if ! git diff-index --exit-code --quiet HEAD > then > git stash push > trap 'git stash pop' EXIT > fi > > With this change this can be simplified to: > > git stash push && trap 'git stash pop' EXIT In a script, shouldn't you better use 'create' + 'store' instead of 'push'? -- Hannes
Re: [PATCH 3/4] am: drop tty requirement for --interactive
Hi Peff, On Mon, 20 May 2019, Jeff King wrote: > On Mon, May 20, 2019 at 08:11:13AM -0400, Jeff King wrote: > > > We have required that the stdin of "am --interactive" be a tty since > > a1451104ac (git-am: interactive should fail gracefully., 2005-10-12). > > However, this isn't strictly necessary, and makes the tool harder to > > test (and is unlike all of our other --interactive commands). > > I think this is worth doing for simplicity and consistency. But as you > might guess, my ulterior motive was making it easier to add tests. > > In theory we _should_ be able to use test_terminal for this, but it > seems to be racy, because it will quickly read all input and close the > descriptor (to give the reader EOF). But after that close, isatty() will > no longer report it correctly. E.g., if I run this: > > perl test-terminal.perl sh -c ' > for i in 0 1 2; do > echo $i is $(test -t $i || echo not) a tty > done > ' > it _usually_ says "0 is a tty", but racily may say "not a tty". If you > put a sleep into the beginning of the shell, then it will basically > always lose the race and say "not". This is just another nail in the coffin for `test-terminal.perl`, as far as I am concerned. In the built-in `add -i` patch series, I followed a strategy where I move totally away from `test-terminal`, in favor of using some knobs to force Git into thinking that we are in a terminal. But at the same time, I *also* remove the limitation (for most cases) of "read from /dev/tty", in favor of reading from stdin, and making things testable, and more importantly: scriptable. So I am *very* much in favor of this here patch. Thanks, Dscho P.S.: There are even more reasons to get rid of `test-terminal`, of course: it is an unnecessary dependency on Perl, works only when certain Perl modules are installed (that are *not* installed on Ubuntu by default, for example), and it requires pseudo terminals, so it will *never* work on Windows. > It might be possible to overcome this by making test-terminal more > clever (i.e., is there a way for us to send an "EOF" over the pty > without actually _closing_ it? That would behave like a real terminal, > where you can hit ^D to generate an EOF but then type more). > > But barring that, this works by just avoiding it entirely. :) > > Curiously, my script above also reports consistently that stdout is not > a tty, but that stderr is. I'm not sure why this is, but it no tests > seem to care either way. > > -Peff >