Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
Hi, Junio C Hamano wrote: > Usually, I refrain from merging larger topics in 'next' down to > 'master' when we get close to -rc0, but I am wondering if it is > better to merge all of them to 'master', even the ones on the larger > and possibly undercooked side, expecting that we collectively spend > effort on hunting and fixing bugs in them during the pre-release > freeze period. If we were to go that route, I'd want everybody's > buy-in and I'll promise to ignore any shiny new toys that appear on > list that are not regression fixes to topics merged to 'master' > since the end of the previous cycle to make sure people are not > distracted. Based on what I see in 'next' (midx, range-diff, etc), I quite like this idea. In private I may still have ideas for new features and hack away at them but I would avoid sending new feature topics and would focus on helping polish what's already been sent out. Let's make 2.19 easy for people to adopt. For whatever that's worth. I'm only one person; others may have their own wishes. Thanks, Jonathan
[PATCH] t2024: mark test using "checkout -p" with PERL prerequisite
Checkout with the -p switch uses the "add interactive" framework which is written in Perl. One test added in 8d7b558bae ("checkout & worktree: introduce checkout.defaultRemote", 2018-06-05) didn't declare the PERL prerequisite, breaking the test when built with NO_PERL. Reported-by: CB Bailey Signed-off-by: CB Bailey Signed-off-by: Ævar Arnfjörð Bjarmason --- On Sat, Aug 18, 2018 at 6:43 AM, CB Bailey wrote: > checkout with the -p switch uses the "add interactive" framework which > is written in Perl. Add a PERL prerequisite to skip this test when built > with NO_PERL. Thanks, and sorry about my buggy code. I didn't consider the interaction between -p and NO_PERL. Your patch works, but I think just splitting the test up is better, so we're not skipping things unrelated to "checkout -p" under NO_PERL. I added your SOB since I stole significant parts of your commit message. Junio: We'd want one patch or the other before 2.19 so that release doesn't break tests under NO_PERL. t/t2024-checkout-dwim.sh | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/t/t2024-checkout-dwim.sh b/t/t2024-checkout-dwim.sh index f79dfbbdd6..69b6774d10 100755 --- a/t/t2024-checkout-dwim.sh +++ b/t/t2024-checkout-dwim.sh @@ -86,8 +86,13 @@ test_expect_success 'checkout of branch from multiple remotes fails with advice' checkout foo 2>stderr && test_branch master && status_uno_is_clean && - test_i18ngrep ! "^hint: " stderr && - # Make sure the likes of checkout -p do not print this hint + test_i18ngrep ! "^hint: " stderr +' + +test_expect_success PERL 'checkout -p with multiple remotes does not print advice' ' + git checkout -B master && + test_might_fail git branch -D foo && + git checkout -p foo 2>stderr && test_i18ngrep ! "^hint: " stderr && status_uno_is_clean -- 2.18.0.865.gffc8e1a3cd6
Re: [PATCH/RFC] commit: add short option for --amend
On Thu, Aug 16, 2018 at 08:31:17PM +0200, Nguyễn Thái Ngọc Duy wrote: > I just realized how often I type "git ci --amend". Looking back at my > ~/.bash_history (only 10k lines) this is the second most often git > command I type which may justify a short option for it (assuming that > other people use this option often too, of course). Why not add another alias? As you're already using the ci alias, maybe cia? Personally I have the following aliases for committing: c = commit --verbose ca = commit --verbose --amend cad = commit --verbose --amend --date=now Besides the obvious g=git alias in the shell. I really like one character aliases for often used commands/subcommands. Regards Simon -- + privacy is necessary + using gnupg http://gnupg.org + public key id: 0x92FEFDB7E44C32F9
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
On Sat, Aug 18, 2018 at 12:44 AM Junio C Hamano wrote: > * cc/delta-islands (2018-08-16) 7 commits > - pack-objects: move 'layer' into 'struct packing_data' > - pack-objects: move tree_depth into 'struct packing_data' > - t5320: tests for delta islands > - repack: add delta-islands support > - pack-objects: add delta-islands support > - pack-objects: refactor code into compute_layer_order() > - Add delta-islands.{c,h} > > Lift code from GitHub to restrict delta computation so that an > object that exists in one fork is not made into a delta against > another object that does not appear in the same forked repository. > > What's the doneness of this topic? All the suggestions by Peff, you, Duy, Dscho, Ramsay and Szeder Gabor have been taken into account in the v5 that is in pu. Except the suggestion by Duy to move 2 fields from 'struct object_entry' to 'struct packing_data' (which is implemented in patches 6/7 and 7/7) the suggestions have all been about relatively small things (documentation, code modernization, regex check, translation strings, ...) So the code is very similar to the original code in https://github.com/peff/git/commits/jk/delta-islands which has been used for years by GitHub in production. FYI this has been requested from GitLab by Drupal (as well as others) see https://www.drupal.org/drupalorg/blog/developer-tools-initiative-part-5-gitlab-partnership which contains: "The timeline for Phase 2 is dependent on GitLab’s resolution of a diskspace deduplication issue, which they have committed to on our behalf: https://gitlab.com/gitlab-org/gitlab-ce/issues/23029";
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
On Fri, Aug 17 2018, Junio C Hamano wrote: > Usually, I refrain from merging larger topics in 'next' down to > 'master' when we get close to -rc0, but I am wondering if it is > better to merge all of them to 'master', even the ones on the larger > and possibly undercooked side, expecting that we collectively spend > effort on hunting and fixing bugs in them during the pre-release > freeze period. If we were to go that route, I'd want everybody's > buy-in and I'll promise to ignore any shiny new toys that appear on > list that are not regression fixes to topics merged to 'master' > since the end of the previous cycle to make sure people are not > distracted. I'm very much up for this. We have some very useful things (like range-diff and midx) sitting in "next". I think if you merge next down to master now we'll have plenty of time to smoke out bugs & regressions before 2.19.
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
On Sat, Aug 18 2018, Christian Couder wrote: > On Sat, Aug 18, 2018 at 12:44 AM Junio C Hamano wrote: > >> * cc/delta-islands (2018-08-16) 7 commits >> - pack-objects: move 'layer' into 'struct packing_data' >> - pack-objects: move tree_depth into 'struct packing_data' >> - t5320: tests for delta islands >> - repack: add delta-islands support >> - pack-objects: add delta-islands support >> - pack-objects: refactor code into compute_layer_order() >> - Add delta-islands.{c,h} >> >> Lift code from GitHub to restrict delta computation so that an >> object that exists in one fork is not made into a delta against >> another object that does not appear in the same forked repository. >> >> What's the doneness of this topic? > > All the suggestions by Peff, you, Duy, Dscho, Ramsay and Szeder Gabor > have been taken into account in the v5 that is in pu. > > Except the suggestion by Duy to move 2 fields from 'struct > object_entry' to 'struct packing_data' (which is implemented in > patches 6/7 and 7/7) the suggestions have all been about relatively > small things (documentation, code modernization, regex check, > translation strings, ...) So the code is very similar to the original > code in https://github.com/peff/git/commits/jk/delta-islands which has > been used for years by GitHub in production. > > FYI this has been requested from GitLab by Drupal (as well as others) > see > https://www.drupal.org/drupalorg/blog/developer-tools-initiative-part-5-gitlab-partnership > which contains: > > "The timeline for Phase 2 is dependent on GitLab’s resolution of a > diskspace deduplication issue, which they have committed to on our > behalf: https://gitlab.com/gitlab-org/gitlab-ce/issues/23029"; This is not a critique of the delta islands feature, just something I'm curious about. Why is Drupal blocked on something like delta-islands? The blog post mentions they have 45k projects, which can be browsed at https://cgit.drupalcode.org Almost all of those are completely independent projects, so they wouldn't benefit from delta islands, and I'd guess >98% are of them are in an obscure long tail and probably won't have even a single fork. That leaves forks of say drupal.git, which is ~150MB, the mirror on GitHub has 1500 forks: https://github.com/drupal Even if there were 5000 forks of that that would be 750G of disk space. So accounting for backups, me being off by a lot etc. let's say that's 5TB. That's relatively cheap today. Are they really just holding up their GitLab migration plans to save something on the order of that disk space, or have I missed something here? Again, not a critique of delta-islands, because it's most certainly useful for the likes of github/gitlab, but I wonder if for this particular problem it wouldn't be more straightforward of a solution for GitLab to allow anyone to push to refs/for-merge// on any repository. Then they could open a MR for an existing branch in the repo (which GitLab already supports).
Re: [GSoC][PATCH v7 09/26] stash: implement the "list" command in the builtin
On Wed, Aug 15, 2018 at 10:41 PM, Thomas Gummerer wrote: >> Subject: stash: implement the "list" command in the builtin > > Nit: The previous commit messages all have the format "stash: convert > to builtin", maybe follow the same pattern here? At first, the subject of the commit matched the pattern. I think I changed it in order to avoid any confusion with "list" as data structure. Now, I guess that "stash:" at the beginning of the commit message removes any doubt.
Re: [GSoC][PATCH v7 10/26] stash: convert show to builtin
On Wed, Aug 15, 2018 at 11:20 PM, Thomas Gummerer wrote: > '--quiet' doesn't make too much sense to use with 'git stash show', so > I'm not sure whether or not it makes sense to support it at all. But > we do promise to pass all options through to in our documentation, so > the new behaviour is what we are documenting. Indeed, I guess it doesn't make sense at all. I cannot come up with a situation where `--quiet` would be useful. I will think more about it and come back with a reply if I find any good example where it would be useful.
Re: git-bug: Distributed bug tracker embedded in git
On Sat, Aug 18 2018, Jonathan Nieder wrote: > Hi, > > Michael Muré wrote: > >> I released today git-bug, a distributed bug tracker that embeds in >> git. It use git's internal storage to store bugs information in a way >> that can be merged without conflict. You can push/pull to the normal >> git remote you are already using to interact with other people. Normal >> code and bugs are completely separated and no files are added in the >> regular branches. > > I am a bit unhappy about the namespace grab. Not for trademark > reasons: the Git trademark rules are pretty clear about this kind of > usage being okay. Instead, the unhappiness comes because a future Git > command like "git bug" to produce a bug report with appropriate > diagnostics for a bug in Git seems like a likely and useful thing to > get added to Git some day. And now the name's taken. > > Is it too late to ask if it's possible to come up with a less generic > name? Wouldn't we call such a thing "git-reportbug", or "git gitbug", with reference to Debian reportbug or perl's perlbug? Addressing the more general issue, if we're concerned with 3rd party tools usurping the core namespace trying to convince individual authors of 3rd party tools to change the names of those tools to something more unique is pissing in the wind. That's never going to make a dent in the vast amount of git-whatever tools, most of which won't be discussed as ideas on this mailing list before they're released. I think we basically have these options: 1) Accept the status quo where people do create third party tools, much of which are way too obscure to matter (e.g. I'm sure someone's created a tool/alias called range-diff before, but we didn't care). If those tools become popular enough in the wild they get own that namespace, e.g. we're not going to ship a "git-annex" or "git-lfs" ourselves implementing some unrelated features (re parallel on-list discussion; "git annex" could also be a "git commit --amend" alias). 2) #1, but hope we catch new tools early enough to convince their authors to change the names. 3) Make some structural change to git where only things we ourselves compile get to be called as "git ", and you'd need to call e.g. "git-bug" as "git ext::bug" or something. We'd need to have a large hardcoded list of older tools (lfs, annex, ...) to grandfather in if we went for this approach. I think we should just go for #1, and if we're concerned about #1 not being OK we really need something like #3, because #2 isn't going to work. > Separately from that, I'm happy to see progress being made in the > distributed bug tracker world; thanks for that! > > Thanks, > Jonathan
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
On Sat, Aug 18, 2018 at 1:34 PM Ævar Arnfjörð Bjarmason wrote: > On Sat, Aug 18 2018, Christian Couder wrote: > > FYI this has been requested from GitLab by Drupal (as well as others) > > see > > https://www.drupal.org/drupalorg/blog/developer-tools-initiative-part-5-gitlab-partnership > > which contains: > > > > "The timeline for Phase 2 is dependent on GitLab’s resolution of a > > diskspace deduplication issue, which they have committed to on our > > behalf: https://gitlab.com/gitlab-org/gitlab-ce/issues/23029"; > > This is not a critique of the delta islands feature, just something I'm > curious about. > > Why is Drupal blocked on something like delta-islands? The blog post > mentions they have 45k projects, which can be browsed at > https://cgit.drupalcode.org > > Almost all of those are completely independent projects, so they > wouldn't benefit from delta islands, and I'd guess >98% are of them are > in an obscure long tail and probably won't have even a single fork. > > That leaves forks of say drupal.git, which is ~150MB, the mirror on > GitHub has 1500 forks: https://github.com/drupal Even if there were 5000 > forks of that that would be 750G of disk space. > > So accounting for backups, me being off by a lot etc. let's say that's > 5TB. That's relatively cheap today. Are they really just holding up > their GitLab migration plans to save something on the order of that disk > space, or have I missed something here? I am not sure why. I haven't been in touch directly with Drupal people and I haven't followed all the discussions with them. When I discussed this topic with someone responsible for another significant open source project. He told me that they don't want forks at all on their self hosted GitLab instance because of the disk space and management burden that would come with them. But they would like people to fork on a separate GitLab instance like gitlab.com or github.com (but then be able to send merge/pull requests to their self hosted GitLab instance) because they know that developers like to have their own fork :-) So my guess is that Drupal people are also afraid of the possible disk space burden, even if your numbers seem to say that they shouldn't. > Again, not a critique of delta-islands, because it's most certainly > useful for the likes of github/gitlab, but I wonder if for this > particular problem it wouldn't be more straightforward of a solution for > GitLab to allow anyone to push to > refs/for-merge// on any > repository. Then they could open a MR for an existing branch in the repo > (which GitLab already supports). My guess is that people are used to forks on GitHub and they like them and want to have the same thing and same workflow on GitLab. The delta islands documentation though says that it's possible to use git namespaces along with delta islands, and if forks are indeed implemented as namespaces, it will be kind of similar in the GitLab internals as what you suggest. There are still discussions by the way in the GitLab issue referenced above about whether it's better for GitLab to use git namespaces or alternates to implement deduplication in forks (along with delta islands).
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Jonathan Nieder writes: >> --- a/sideband.c >> +++ b/sideband.c >> @@ -75,7 +75,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, >> const char *src, int n) > > Not about this patch: should the 'n' parameter be a size_t instead of > an int? It doesn't matter in practice (since the caller has an int, > it can never be more than INT_MAX) but it might make the intent > clearer. I tend to agree, but I think a separate "clean-up" patch to do so is more appropriate than rolling it into this fix. >> /* >> * Match case insensitively, so we colorize output from existing >> * servers regardless of the case that they use for their >> * messages. We only highlight the word precisely, so >> * "successful" stays uncolored. >> */ >> if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) { > > Not about this patch: should this check "&& src[len] == ':'" instead, > as discussed upthread? I originally was of an opinion that we should take only lowercase keyword followed by a colon, primarily because that is what we produce. Then "the real world need" told us that we are better off catching the keyword case-insensitively. Recalling that lesson, I am not sure I would support "let's limit to the colon, rejecting any other punctionation letter". In any case, we should make such a policy decision outside a patch like this one that is about fixing a behaviour which all users would consider as a bug regardless of the policy they support. >> @@ -100,8 +103,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, >> const char *src, int n) >> } >> } >> >> -strbuf_add(dest, src, n); >> +if (0 < n) >> +strbuf_add(dest, src, n); > > This check seems unnecessary. strbuf_add can cope fine with !n. I was primarily interested in catching negatives, and !n was a mere optimization, but you are correct to point out that negative n at this point in the codeflow is a BUG().
[PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite
This makes sure that cache-tree is consistent with the index. The main purpose is to catch potential problems by saving the index in unpack_trees() but the line in write_index() would also help spot missing invalidation in other code. Signed-off-by: Nguyễn Thái Ngọc Duy --- cache-tree.c | 78 ++ cache-tree.h | 1 + read-cache.c | 3 ++ t/test-lib.sh | 6 unpack-trees.c | 2 ++ 5 files changed, 90 insertions(+) diff --git a/cache-tree.c b/cache-tree.c index caafbff2ff..c3c206427c 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -4,6 +4,7 @@ #include "tree-walk.h" #include "cache-tree.h" #include "object-store.h" +#include "replace-object.h" #ifndef DEBUG #define DEBUG 0 @@ -732,3 +733,80 @@ int update_main_cache_tree(int flags) the_index.cache_tree = cache_tree(); return cache_tree_update(&the_index, flags); } + +static void verify_one(struct index_state *istate, + struct cache_tree *it, + struct strbuf *path) +{ + int i, pos, len = path->len; + struct strbuf tree_buf = STRBUF_INIT; + struct object_id new_oid; + + for (i = 0; i < it->subtree_nr; i++) { + strbuf_addf(path, "%s/", it->down[i]->name); + verify_one(istate, it->down[i]->cache_tree, path); + strbuf_setlen(path, len); + } + + if (it->entry_count < 0 || + /* no verification on tests (t7003) that replace trees */ + lookup_replace_object(the_repository, &it->oid) != &it->oid) + return; + + if (path->len) { + pos = index_name_pos(istate, path->buf, path->len); + pos = -pos - 1; + } else { + pos = 0; + } + + i = 0; + while (i < it->entry_count) { + struct cache_entry *ce = istate->cache[pos + i]; + const char *slash; + struct cache_tree_sub *sub = NULL; + const struct object_id *oid; + const char *name; + unsigned mode; + int entlen; + + if (ce->ce_flags & (CE_STAGEMASK | CE_INTENT_TO_ADD | CE_REMOVE)) + BUG("%s with flags 0x%x should not be in cache-tree", + ce->name, ce->ce_flags); + name = ce->name + path->len; + slash = strchr(name, '/'); + if (slash) { + entlen = slash - name; + sub = find_subtree(it, ce->name + path->len, entlen, 0); + if (!sub || sub->cache_tree->entry_count < 0) + BUG("bad subtree '%.*s'", entlen, name); + oid = &sub->cache_tree->oid; + mode = S_IFDIR; + i += sub->cache_tree->entry_count; + } else { + oid = &ce->oid; + mode = ce->ce_mode; + entlen = ce_namelen(ce) - path->len; + i++; + } + strbuf_addf(&tree_buf, "%o %.*s%c", mode, entlen, name, '\0'); + strbuf_add(&tree_buf, oid->hash, the_hash_algo->rawsz); + } + hash_object_file(tree_buf.buf, tree_buf.len, tree_type, &new_oid); + if (oidcmp(&new_oid, &it->oid)) + BUG("cache-tree for path %.*s does not match. " + "Expected %s got %s", len, path->buf, + oid_to_hex(&new_oid), oid_to_hex(&it->oid)); + strbuf_setlen(path, len); + strbuf_release(&tree_buf); +} + +void cache_tree_verify(struct index_state *istate) +{ + struct strbuf path = STRBUF_INIT; + + if (!istate->cache_tree) + return; + verify_one(istate, istate->cache_tree, &path); + strbuf_release(&path); +} diff --git a/cache-tree.h b/cache-tree.h index 9799e894f7..c1fde531f9 100644 --- a/cache-tree.h +++ b/cache-tree.h @@ -32,6 +32,7 @@ struct cache_tree *cache_tree_read(const char *buffer, unsigned long size); int cache_tree_fully_valid(struct cache_tree *); int cache_tree_update(struct index_state *, int); +void cache_tree_verify(struct index_state *); int update_main_cache_tree(int); diff --git a/read-cache.c b/read-cache.c index 5ce40f39b3..41f313bc9e 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2744,6 +2744,9 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock, int new_shared_index, ret; struct split_index *si = istate->split_index; + if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0)) + cache_tree_verify(istate); + if ((flags & SKIP_IF_UNCHANGED) && !istate->cache_changed) { if (flags & COMMIT_LOCK) rollback_lock_file(lock); diff --git a/t/test-lib.sh b/t/test-lib.sh index 78f7097746..5b50f6e2e6 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1083,6 +1083,12
[PATCH v5 0/7] Speed up unpack_trees()
v5 fixes some minor comments from round 4 and a big mistake in 5/5. Junio's scary feeling turns out true. There is a missing invalidation in keep_entry() which is not added in 6/7. 7/7 makes sure that similar problems will not slip through. I had to rebase this series on top of 'master' because 7/7 caught a bad cache-tree situation that has been fixed by Elijah in ad3762042a (read-cache: fix directory/file conflict handling in read_index_unmerged() - 2018-07-31). I believe the issue was we prime cache-tree in 'git reset --hard' even though the index has conflicts. Range-diff (before the rebase): 1: a192faf79e ! 1: ed8763726b trace.h: support nested performance tracing @@ -49,13 +49,16 @@ struct untracked_cache_dir *untracked; - uint64_t start = getnanotime(); - if (has_symlink_leading_path(path, len)) +- if (has_symlink_leading_path(path, len)) ++ trace_performance_enter(); ++ ++ if (has_symlink_leading_path(path, len)) { ++ trace_performance_leave("read directory %.*s", len, path); return dir->nr; ++ } -+ trace_performance_enter(); untracked = validate_untracked_cache(dir, len, pathspec); if (!untracked) - /* @@ dir->nr = i; } 2: 9afe7c488a = 2: 9b70652fa2 unpack-trees: add performance tracing 3: 74101edb60 ! 3: 8b3cfea623 unpack-trees: optimize walking same trees with cache-tree @@ -141,7 +141,7 @@ + + /* + * Do what unpack_callback() and unpack_nondirectories() normally -+ * do. But we walk all paths recursively in just one loop instead. ++ * do. But we walk all paths in an iterative loop instead. + * + * D/F conflicts and higher stage entries are not a concern + * because cache-tree would be invalidated and we would never 4: 9261c5920e = 4: 5af28d44ca unpack-trees: reduce malloc in cache-tree walk 5: 43fac1154f = 5: 5657c92fe9 unpack-trees: reuse (still valid) cache-tree from src_index -: -- > 6: 3b91783afc unpack-trees: add missing cache invalidation -: -- > 7: 0d5464c0dc cache-tree: verify valid cache-tree in the test suite Nguyễn Thái Ngọc Duy (7): trace.h: support nested performance tracing unpack-trees: add performance tracing unpack-trees: optimize walking same trees with cache-tree unpack-trees: reduce malloc in cache-tree walk unpack-trees: reuse (still valid) cache-tree from src_index unpack-trees: add missing cache invalidation cache-tree: verify valid cache-tree in the test suite cache-tree.c| 80 + cache-tree.h| 1 + diff-lib.c | 4 +- dir.c | 9 ++- name-hash.c | 4 +- preload-index.c | 4 +- read-cache.c| 16 +++-- t/test-lib.sh | 6 ++ trace.c | 69 -- trace.h | 15 + unpack-trees.c | 154 +++- 11 files changed, 340 insertions(+), 22 deletions(-) -- 2.18.0.1004.g6639190530
[PATCH v5 2/7] unpack-trees: add performance tracing
We're going to optimize unpack_trees() a bit in the following patches. Let's add some tracing to measure how long it takes before and after. This is the baseline ("git checkout -" on webkit.git, 275k files on worktree) performance: 0.056651714 s: read cache .git/index performance: 0.183101080 s: preload index performance: 0.008584433 s: refresh index performance: 0.633767589 s: traverse_trees performance: 0.340265448 s: check_updates performance: 0.381884638 s: cache_tree_update performance: 1.401562947 s: unpack_trees performance: 0.338687914 s: write index, changed mask = 2e performance: 0.411927922 s:traverse_trees performance: 0.23335 s:check_updates performance: 0.423697246 s: unpack_trees performance: 0.423708360 s: diff-index performance: 2.559524127 s: git command: git checkout - Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- cache-tree.c | 2 ++ unpack-trees.c | 9 - 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/cache-tree.c b/cache-tree.c index 181d5919f0..caafbff2ff 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -433,7 +433,9 @@ int cache_tree_update(struct index_state *istate, int flags) if (i) return i; + trace_performance_enter(); i = update_one(it, cache, entries, "", 0, &skip, flags); + trace_performance_leave("cache_tree_update"); if (i < 0) return i; istate->cache_changed |= CACHE_TREE_CHANGED; diff --git a/unpack-trees.c b/unpack-trees.c index f9efee0836..6d9f692ea6 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -345,6 +345,7 @@ static int check_updates(struct unpack_trees_options *o) struct checkout state = CHECKOUT_INIT; int i; + trace_performance_enter(); state.force = 1; state.quiet = 1; state.refresh_cache = 1; @@ -414,6 +415,7 @@ static int check_updates(struct unpack_trees_options *o) errs |= finish_delayed_checkout(&state); if (o->update) git_attr_set_direction(GIT_ATTR_CHECKIN, NULL); + trace_performance_leave("check_updates"); return errs != 0; } @@ -1285,6 +1287,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options if (len > MAX_UNPACK_TREES) die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES); + trace_performance_enter(); memset(&el, 0, sizeof(el)); if (!core_apply_sparse_checkout || !o->update) o->skip_sparse_checkout = 1; @@ -1357,7 +1360,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options } } - if (traverse_trees(len, t, &info) < 0) + trace_performance_enter(); + ret = traverse_trees(len, t, &info); + trace_performance_leave("traverse_trees"); + if (ret < 0) goto return_failed; } @@ -1449,6 +1455,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options o->src_index = NULL; done: + trace_performance_leave("unpack_trees"); clear_exclude_list(&el); return ret; -- 2.18.0.1004.g6639190530
[PATCH v5 5/7] unpack-trees: reuse (still valid) cache-tree from src_index
We do n-way merge by walking the source index and n trees at the same time and add merge results to a new temporary index called o->result. The merge result for any given path could be either - keep_entry(): same old index entry in o->src_index is reused - merged_entry(): either a new entry is added, or an existing one updated - deleted_entry(): one entry from o->src_index is removed For some reason [1] we keep making sure that the source index's cache-tree is still valid if used by o->result: for all those merged/deleted entries, we invalidate the same path in o->src_index, so only cache-trees covering the "keep_entry" parts remain good. Because of this, the cache-tree from o->src_index can be perfectly reused in o->result. And in fact we already rely on this logic to reuse untracked cache in edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08). Move the cache-tree to o->result before doing cache_tree_update() to reduce hashing cost. Since cache_tree_update() has risen up as one of the most expensive parts in unpack_trees() after the last few patches. This does help reduce unpack_trees() time significantly (on webkit.git): before after 0.080394752 0.051258167 s: read cache .git/index 0.216010838 0.212106298 s: preload index 0.008534301 0.280521764 s: refresh index 0.251992198 0.218160442 s: traverse_trees 0.377031383 0.374948191 s: check_updates 0.372768105 0.037040114 s: cache_tree_update 1.045887251 0.672031609 s: unpack_trees 0.314983512 0.317456290 s: write index, changed mask = 2e 0.062572653 0.038382654 s:traverse_trees 0.22544 0.42731 s:check_updates 0.073795585 0.050930053 s: unpack_trees 0.073807557 0.051099735 s: diff-index 1.938191592 1.614241153 s: git command: git checkout - [1] I'm pretty sure the reason is an oversight in 34110cd4e3 (Make 'unpack_trees()' have a separate source and destination index - 2008-03-06). That patch aims to _not_ update the source index at all. The invalidation should have been done on o->result in that patch. But then there was no cache-tree on o->result even then so it's pointless to do so. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- read-cache.c | 2 ++ unpack-trees.c | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/read-cache.c b/read-cache.c index 1c9c88c130..5ce40f39b3 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2940,6 +2940,8 @@ void move_index_extensions(struct index_state *dst, struct index_state *src) { dst->untracked = src->untracked; src->untracked = NULL; + dst->cache_tree = src->cache_tree; + src->cache_tree = NULL; } struct cache_entry *dup_cache_entry(const struct cache_entry *ce, diff --git a/unpack-trees.c b/unpack-trees.c index dbef6e1b8a..aa80b65ee1 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1576,6 +1576,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options ret = check_updates(o) ? (-2) : 0; if (o->dst_index) { + move_index_extensions(&o->result, o->src_index); if (!ret) { if (!o->result.cache_tree) o->result.cache_tree = cache_tree(); @@ -1584,7 +1585,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options WRITE_TREE_SILENT | WRITE_TREE_REPAIR); } - move_index_extensions(&o->result, o->src_index); discard_index(o->dst_index); *o->dst_index = o->result; } else { -- 2.18.0.1004.g6639190530
[PATCH v5 1/7] trace.h: support nested performance tracing
Performance measurements are listed right now as a flat list, which is fine when we measure big blocks. But when we start adding more and more measurements, some of them could be just part of a bigger measurement and a flat list gives a wrong impression that they are executed at the same level instead of nested. Add trace_performance_enter() and trace_performance_leave() to allow indent these nested measurements. For now it does not help much because the only nested thing is (lazy) name hash initialization (e.g. called in diff-index from "git status"). This will help more because I'm going to add some more tracing that's actually nested. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- diff-lib.c | 4 +-- dir.c | 9 --- name-hash.c | 4 +-- preload-index.c | 4 +-- read-cache.c| 11 trace.c | 69 - trace.h | 15 +++ 7 files changed, 96 insertions(+), 20 deletions(-) diff --git a/diff-lib.c b/diff-lib.c index 732f684a49..d5bbb7ea50 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -518,11 +518,11 @@ static int diff_cache(struct rev_info *revs, int run_diff_index(struct rev_info *revs, int cached) { struct object_array_entry *ent; - uint64_t start = getnanotime(); if (revs->pending.nr != 1) BUG("run_diff_index must be passed exactly one tree"); + trace_performance_enter(); ent = revs->pending.objects; if (diff_cache(revs, &ent->item->oid, ent->name, cached)) exit(128); @@ -531,7 +531,7 @@ int run_diff_index(struct rev_info *revs, int cached) diffcore_fix_diff_index(&revs->diffopt); diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); - trace_performance_since(start, "diff-index"); + trace_performance_leave("diff-index"); return 0; } diff --git a/dir.c b/dir.c index 32f5f72759..18b57b94cc 100644 --- a/dir.c +++ b/dir.c @@ -2263,10 +2263,13 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, const char *path, int len, const struct pathspec *pathspec) { struct untracked_cache_dir *untracked; - uint64_t start = getnanotime(); - if (has_symlink_leading_path(path, len)) + trace_performance_enter(); + + if (has_symlink_leading_path(path, len)) { + trace_performance_leave("read directory %.*s", len, path); return dir->nr; + } untracked = validate_untracked_cache(dir, len, pathspec); if (!untracked) @@ -2302,7 +2305,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, dir->nr = i; } - trace_performance_since(start, "read directory %.*s", len, path); + trace_performance_leave("read directory %.*s", len, path); if (dir->untracked) { static int force_untracked_cache = -1; static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS); diff --git a/name-hash.c b/name-hash.c index 163849831c..1fcda73cb3 100644 --- a/name-hash.c +++ b/name-hash.c @@ -578,10 +578,10 @@ static void threaded_lazy_init_name_hash( static void lazy_init_name_hash(struct index_state *istate) { - uint64_t start = getnanotime(); if (istate->name_hash_initialized) return; + trace_performance_enter(); hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr); hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr); @@ -602,7 +602,7 @@ static void lazy_init_name_hash(struct index_state *istate) } istate->name_hash_initialized = 1; - trace_performance_since(start, "initialize name hash"); + trace_performance_leave("initialize name hash"); } /* diff --git a/preload-index.c b/preload-index.c index 4d08d44874..d7f7919ba2 100644 --- a/preload-index.c +++ b/preload-index.c @@ -78,7 +78,6 @@ static void preload_index(struct index_state *index, { int threads, i, work, offset; struct thread_data data[MAX_PARALLEL]; - uint64_t start = getnanotime(); if (!core_preload_index) return; @@ -88,6 +87,7 @@ static void preload_index(struct index_state *index, threads = 2; if (threads < 2) return; + trace_performance_enter(); if (threads > MAX_PARALLEL) threads = MAX_PARALLEL; offset = 0; @@ -109,7 +109,7 @@ static void preload_index(struct index_state *index, if (pthread_join(p->pthread, NULL)) die("unable to join threaded lstat"); } - trace_performance_since(start, "preload index"); + trace_performance_leave("preload index"); } #endif diff --git a/read-cache.c b/read-cache.c index c5fabc844a..1c9c88c130 100644 --- a/read-cache.c +++ b/read-cache.c @
[PATCH v5 4/7] unpack-trees: reduce malloc in cache-tree walk
This is a micro optimization that probably only shines on repos with deep directory structure. Instead of allocating and freeing a new cache_entry in every iteration, we reuse the last one and only update the parts that are new each iteration. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- unpack-trees.c | 29 - 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/unpack-trees.c b/unpack-trees.c index 8376663b59..dbef6e1b8a 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -685,6 +685,8 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names, { struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, }; struct unpack_trees_options *o = info->data; + struct cache_entry *tree_ce = NULL; + int ce_len = 0; int i, d; if (!o->merge) @@ -699,30 +701,39 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names, * get here in the first place. */ for (i = 0; i < nr_entries; i++) { - struct cache_entry *tree_ce; - int len, rc; + int new_ce_len, len, rc; src[0] = o->src_index->cache[pos + i]; len = ce_namelen(src[0]); - tree_ce = xcalloc(1, cache_entry_size(len)); + new_ce_len = cache_entry_size(len); + + if (new_ce_len > ce_len) { + new_ce_len <<= 1; + tree_ce = xrealloc(tree_ce, new_ce_len); + memset(tree_ce, 0, new_ce_len); + ce_len = new_ce_len; + + tree_ce->ce_flags = create_ce_flags(0); + + for (d = 1; d <= nr_names; d++) + src[d] = tree_ce; + } tree_ce->ce_mode = src[0]->ce_mode; - tree_ce->ce_flags = create_ce_flags(0); tree_ce->ce_namelen = len; oidcpy(&tree_ce->oid, &src[0]->oid); memcpy(tree_ce->name, src[0]->name, len + 1); - for (d = 1; d <= nr_names; d++) - src[d] = tree_ce; - rc = call_unpack_fn((const struct cache_entry * const *)src, o); - free(tree_ce); - if (rc < 0) + if (rc < 0) { + free(tree_ce); return rc; + } mark_ce_used(src[0], o); } + free(tree_ce); if (o->debug_unpack) printf("Unpacked %d entries from %s to %s using cache-tree\n", nr_entries, -- 2.18.0.1004.g6639190530
[PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree
In order to merge one or many trees with the index, unpack-trees code walks multiple trees in parallel with the index and performs n-way merge. If we find out at start of a directory that all trees are the same (by comparing OID) and cache-tree happens to be available for that directory as well, we could avoid walking the trees because we already know what these trees contain: it's flattened in what's called "the index". The upside is of course a lot less I/O since we can potentially skip lots of trees (think subtrees). We also save CPU because we don't have to inflate and apply the deltas. The downside is of course more fragile code since the logic in some functions are now duplicated elsewhere. "checkout -" with this patch on webkit.git (275k files): baseline new 0.056651714 0.080394752 s: read cache .git/index 0.183101080 0.216010838 s: preload index 0.008584433 0.008534301 s: refresh index 0.633767589 0.251992198 s: traverse_trees 0.340265448 0.377031383 s: check_updates 0.381884638 0.372768105 s: cache_tree_update 1.401562947 1.045887251 s: unpack_trees 0.338687914 0.314983512 s: write index, changed mask = 2e 0.411927922 0.062572653 s:traverse_trees 0.23335 0.22544 s:check_updates 0.423697246 0.073795585 s: unpack_trees 0.423708360 0.073807557 s: diff-index 2.559524127 1.938191592 s: git command: git checkout - Another measurement from Ben's running "git checkout" with over 500k trees (on the whole series): baselinenew -- 0.535510167 0.556558733 s: read cache .git/index 0.3057373 0.3147105 s: initialize name hash 0.0184082 0.023558433 s: preload index 0.086910967 0.089085967 s: refresh index 7.889590767 2.191554433 s: unpack trees 0.120760833 0.131941267 s: update worktree after a merge 2.2583504 2.572663167 s: repair cache-tree 0.8916137 0.959495233 s: write index, changed mask = 28 3.405199233 0.2710663 s: unpack trees 0.000999667 0.0021554 s: update worktree after a merge 3.4063306 0.273318333 s: diff-index 16.9524923 9.462943133 s: git command: git.exe checkout This command calls unpack_trees() twice, the first time on 2way merge and the second 1way merge. In both times, "unpack trees" time is reduced to one third. Overall time reduction is not that impressive of course because index operations take a big chunk. And there's that repair cache-tree line. PS. A note about cache-tree invalidation and the use of it in this code. We do invalidate cache-tree in _source_ index when we add new entries to the (temporary) "result" index. But we also use the cache-tree from source index in this optimization. Does this mean we end up having no cache-tree in the source index to activate this optimization? The answer is twisted: the order of finding a good cache-tree and invalidating it matters. In this case we check for a good cache-tree first in all_trees_same_as_cache_tree(), then we start to merge things and potentially invalidate that same cache-tree in the process. Since cache-tree invalidation happens after the optimization kicks in, we're still good. But we may lose that cache-tree at the very first call_unpack_fn() call in traverse_by_cache_tree(). Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- unpack-trees.c | 127 + 1 file changed, 127 insertions(+) diff --git a/unpack-trees.c b/unpack-trees.c index 6d9f692ea6..8376663b59 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -635,6 +635,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid); } +static int all_trees_same_as_cache_tree(int n, unsigned long dirmask, + struct name_entry *names, + struct traverse_info *info) +{ + struct unpack_trees_options *o = info->data; + int i; + + if (!o->merge || dirmask != ((1 << n) - 1)) + return 0; + + for (i = 1; i < n; i++) + if (!are_same_oid(names, names + i)) + return 0; + + return cache_tree_matches_traversal(o->src_index->cache_tree, names, info); +} + +static int index_pos_by_traverse_info(struct name_entry *names, + struct traverse_info *info) +{ + struct unpack_trees_options *o = info->data; + int len = traverse_path_len(info, names); + char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */); + int pos; + + make_traverse_path(name, info, names); + name[len++]
[PATCH v5 6/7] unpack-trees: add missing cache invalidation
Any changes to the output index should be (confusingly) marked in the source index with invalidate_ce_path(). This is used to make sure we still have valid untracked cache and cache-tree extensions in the end. We do a pretty good job of invalidating except in two places. verify_clean_subdirectory() is part of verify_absent() and verify_absent_sparse(). The former is usually called by merged_entry() or directly in threeway_merge(). The latter is obviously used by sparse checkout. In these three call sites, only merged_entry() follows up with invalidate_ce_path(). The other two don't, but they should not trigger this ce removal because this is about D/F conflicts [1]. But let's be safe and invalidate_ce_path() here as well. The second place is keep_entry() which is also used by threeway_merge() to keep higher stage entries. In order to reuse cache-tree we need to invalidate these paths as well. It's not a problem in the past because whenever a higher stage entry is present, cache-tree will not be created [2]. Now we salvage cache-tree even when higher stage entries are present, we need more invalidation. [1] c81935348b (Fix switching to a branch with D/F when current branch has file D. - 2007-03-15) [2] This is probably too strict. We should be able to create and save cache-tree for the directories that do not have conflict entries in cache_tree_update(). And this becomes more important when cache-tree plays bigger role in terms of performance. Signed-off-by: Nguyễn Thái Ngọc Duy --- unpack-trees.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/unpack-trees.c b/unpack-trees.c index aa80b65ee1..bc43922922 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1774,6 +1774,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce, if (verify_uptodate(ce2, o)) return -1; add_entry(o, ce2, CE_REMOVE, 0); + invalidate_ce_path(ce, o); mark_ce_used(ce2, o); } cnt++; @@ -2033,6 +2034,8 @@ static int keep_entry(const struct cache_entry *ce, struct unpack_trees_options *o) { add_entry(o, ce, 0, 0); + if (ce_stage(ce)) + invalidate_ce_path(ce, o); return 1; } -- 2.18.0.1004.g6639190530
Re: [RFC/PATCH] drop vcs-svn experiment
On Fri, Aug 17, 2018 at 10:26:05PM -0700, Jonathan Nieder wrote: > > We also ship contrib/svn-fe, which builds on the vcs-svn > > work. However, it does not seem to build out of the box for > > me, as the link step misses some required libraries for > > using libgit.a. > > What libraries do you mean? It builds and runs fine for me with > > $ git diff > diff --git i/contrib/svn-fe/Makefile w/contrib/svn-fe/Makefile > index e8651aaf4b5..bd709f8d83b 100644 > --- i/contrib/svn-fe/Makefile > +++ w/contrib/svn-fe/Makefile > @@ -4,7 +4,7 @@ CC = cc > RM = rm -f > MV = mv > > -CFLAGS = -g -O2 -Wall > +CFLAGS = -g -O2 -Wall -pthread > LDFLAGS = > EXTLIBS = -lz > > which appears to be platform related, not due to some internal change > in Git. Yes, it works for me with that, too[1]. So clearly there's some system dependence. But I suspect it's broken for every system with pthreads, which is most of them. And older versions _do_ compile out of the box, even on my modern system. For completeness, here's what I dug up: - it builds fine up through v1.8.2 - after eff80a9fd9 (Allow custom "comment char", 2013-01-16), it breaks with a ton of undefined references during the link stage, including SHA1_* and some xdl_* functions. I still have no idea why, as that commit is fairly mundane, but I guess it just somehow tickles something in the linker or the way we build libgit.a. - after da011cb0e7 (contrib/svn-fe: fix Makefile, 2014-08-28), the error becomes: /usr/bin/ld: ../../libgit.a(sha1_file.o): undefined reference to symbol 'SHA1_Update@@OPENSSL_1_1_0' /usr/bin/ld: //usr/lib/x86_64-linux-gnu/libcrypto.so.1.1: error adding symbols: DSO missing from command line Presumably this did work for the author at the time. It's seems quite plausible that older versions of openssl did not exhibit this problem, and that it's system-specific. Or that it was possible to build with BLK_SHA1. - after e6b07da278 (Makefile: make DC_SHA1 the default, 2017-03-17), the openssl error goes away (naturally), but is replaced with: /usr/bin/ld: ../../libgit.a(run-command.o): undefined reference to symbol 'pthread_sigmask@@GLIBC_2.2.5' /usr/bin/ld: //lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line If I go back to 2014-era and start building with NO_OPENSSL, then even da011cb0e7 fails with: /usr/bin/ld: ../../libgit.a(run-command.o): undefined reference to symbol 'pthread_setspecific@@GLIBC_2.2.5' So again, assuming it worked back then for the author of that commit, that's something that has changed on the system, and we can't figure out through bisecting git when that became common. So that does mean it's possible that it works for some people on some systems today (though it was also probably broken for everybody for a year and a half in 2013 with nobody noticing). [1] That patch actually doesn't quite work out of the box, because we also include config.mak, and mine overrides CFLAGS. It also doesn't seem to work with USE_LIBPCRE. But those are only evidence that the Makefile is not very mature, not that people aren't using it for out-of-the-box config. > > Of course, I could be completely wrong about people using this. Maybe > > svn-fe builds are just completely broken on my system, and maybe people > > really do use testsvn::. But if so, they certainly aren't talking about > > it on the mailing list. :) > > My take: > > - svn-fe works fine and has been useful to me, though its Makefile >could likely be simplified and made more user-friendly > > - I've benefited from the test coverage of having this in-tree > > - testsvn:: is a demo and at a minimum we ought not to install it >with "make install" > > - keeping this in-tree for the benefit of just one user is excessive, >so removing it is probably the right thing Thanks, all of that sounds sensible to me. > - it would be nice if the commit removing this code from Git includes >a note to help people find its new home > > Would you mind holding off until I'm able to arrange that last bit? Not at all. This patch was mostly meant to start the discussion. Mission accomplished. ;) -Peff
Re: [GSoC][PATCH v7 12/26] stash: refactor `show_stash()` to use the diff API
On Thu, Aug 16, 2018 at 12:01 AM, Thomas Gummerer wrote: > On 08/08, Paul-Sebastian Ungureanu wrote: >> Currently, `show_stash()` uses `cmd_diff()` to generate >> the output. After this commit, the output will be generated >> using the internal API. >> >> Before this commit, `git stash show --quiet` would act like >> `git diff` and error out if the stash is not empty. Now, the >> `--quiet` option does not error out given an empty stash. > > I think this needs a bit more justification. As I mentioned in my > comment to a previous patch, I'm not sure '--quiet' makes much sense > with 'git stash show' (it will show nothing, and will always exit with > an error code, as the stash will always contain something), but if we > are supporting the same flags as 'git diff', and essentially just > forwarding them, shouldn't they keep the same behaviour as well? If we are going to support them, I guess there wouldn't be a problem if any change in behaviour is noted in documentation (but as you said, the next commit should be squashed into this). Indeed, the big question is if we should support them. I would say no considering there is no benefit. >> Signed-off-by: Paul-Sebastian Ungureanu >> --- >> builtin/stash--helper.c | 73 + >> 1 file changed, 45 insertions(+), 28 deletions(-) >> >> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c >> index 0c1efca6b..ec8c38c6f 100644 >> --- a/builtin/stash--helper.c >> +++ b/builtin/stash--helper.c >> @@ -10,6 +10,8 @@ >> #include "run-command.h" >> #include "dir.h" >> #include "rerere.h" >> +#include "revision.h" >> +#include "log-tree.h" >> >> static const char * const git_stash_helper_usage[] = { >> N_("git stash--helper list []"), >> @@ -662,56 +664,71 @@ static int git_stash_config(const char *var, const >> char *value, void *cb) >> >> static int show_stash(int argc, const char **argv, const char *prefix) >> { >> - int i, ret = 0; >> - struct child_process cp = CHILD_PROCESS_INIT; >> - struct argv_array args_refs = ARGV_ARRAY_INIT; >> + int i; >> + int flags = 0; >> struct stash_info info; >> + struct rev_info rev; >> + struct argv_array stash_args = ARGV_ARRAY_INIT; >> struct option options[] = { >> OPT_END() >> }; >> >> - argc = parse_options(argc, argv, prefix, options, >> - git_stash_helper_show_usage, >> - PARSE_OPT_KEEP_UNKNOWN); >> + init_diff_ui_defaults(); >> + git_config(git_diff_ui_config, NULL); >> >> - cp.git_cmd = 1; >> - argv_array_push(&cp.args, "diff"); >> + init_revisions(&rev, prefix); >> >> - /* Push arguments which are not options into args_refs. */ >> - for (i = 0; i < argc; ++i) { >> - if (argv[i][0] == '-') >> - argv_array_push(&cp.args, argv[i]); >> + /* Push arguments which are not options into stash_args. */ >> + for (i = 1; i < argc; ++i) { >> + if (argv[i][0] != '-') >> + argv_array_push(&stash_args, argv[i]); >> else >> - argv_array_push(&args_refs, argv[i]); >> - } >> - >> - if (get_stash_info(&info, args_refs.argc, args_refs.argv)) { >> - child_process_clear(&cp); >> - argv_array_clear(&args_refs); >> - return -1; >> + flags++; >> } >> >> /* >>* The config settings are applied only if there are not passed >>* any flags. >>*/ >> - if (cp.args.argc == 1) { >> + if (!flags) { >> git_config(git_stash_config, NULL); >> if (show_stat) >> - argv_array_push(&cp.args, "--stat"); >> + rev.diffopt.output_format |= DIFF_FORMAT_DIFFSTAT; >> + if (show_patch) { >> + rev.diffopt.output_format = ~DIFF_FORMAT_NO_OUTPUT; >> + rev.diffopt.output_format |= DIFF_FORMAT_PATCH; >> + } > > I failed to notice this in the previous patch (the problem existed > there as well), but this changes the behaviour of 'git -c > stash.showStat=false stash show '. Previously doing this would > not show anything, which is the correct behaviour, while now still > shows the diffstat. > > I think the show_stat variable is interpreted the wrong way around in > the previous patch. > > Something else I noticed now that I was playing around more with the > config options is that the parsing of the config options is not > correctly done in the previous patch. It does a 'strcmp(var, > "stash.showStat"))', but the config API makes all variables lowercase > (config options are case insensitive, and making everything lowercase > is the way to ensure that), so it should be 'strcmp(var, "stash.showstat"))', > and similar for the 'stash.showpatch' config option. > > This all sounds like it would be nice to have some tests for these > config opti
Re: [GSoC][PATCH v7 18/26] stash: convert push to builtin
On 08/08, Paul-Sebastian Ungureanu wrote: > Add stash push to the helper. > --- This (and the previous two and I think most subsequent patches) are missing your sign-off. > builtin/stash--helper.c | 209 > git-stash.sh| 6 +- > 2 files changed, 213 insertions(+), 2 deletions(-) > > diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c > index f905d3908..c26cad3d5 100644 > --- a/builtin/stash--helper.c > +++ b/builtin/stash--helper.c > @@ -23,6 +23,9 @@ static const char * const git_stash_helper_usage[] = { > N_("git stash--helper clear"), > N_("git stash--helper store [-m|--message ] [-q|--quiet] > "), > N_("git stash--helper create []"), > + N_("git stash--helper [push [-p|--patch] [-k|--[no-]keep-index] > [-q|--quiet]\n" > +" [-u|--include-untracked] [-a|--all] [-m|--message > ]\n" > +" [--] [...]]"), > NULL > }; > > @@ -71,6 +74,13 @@ static const char * const git_stash_helper_create_usage[] > = { > NULL > }; > > +static const char * const git_stash_helper_push_usage[] = { > + N_("git stash--helper [push [-p|--patch] [-k|--[no-]keep-index] > [-q|--quiet]\n" > +" [-u|--include-untracked] [-a|--all] [-m|--message > ]\n" > +" [--] [...]]"), > + NULL > +}; > + > static const char *ref_stash = "refs/stash"; > static int quiet; > static struct strbuf stash_index_path = STRBUF_INIT; > @@ -1210,6 +1220,203 @@ static int create_stash(int argc, const char **argv, > const char *prefix) > return ret < 0; > } > > +static int do_push_stash(int argc, const char **argv, const char *prefix, > + int keep_index, int patch_mode, int include_untracked, > + int quiet, const char *stash_msg) > +{ > + int ret = 0; > + struct pathspec ps; > + struct stash_info info; > + if (patch_mode && keep_index == -1) > + keep_index = 1; > + > + if (patch_mode && include_untracked) { > + fprintf_ln(stderr, "Can't use --patch and --include-untracked > or --all at the same time"); This should be marked for translation. Similar for the messages below. I noticed this in a previous patch as well, so it may be worth reviewing all the output, and checking that it's going to the right stream and is marked for translation. > + return -1; > + } > + > + parse_pathspec(&ps, 0, PATHSPEC_PREFER_FULL, prefix, argv); > + > + if (read_cache() < 0) > + die(_("index file corrupt")); > + > + if (!include_untracked && ps.nr) { > + int i; > + char *ps_matched = xcalloc(ps.nr, 1); > + > + for (i = 0; i < active_nr; ++i) { > + const struct cache_entry *ce = active_cache[i]; > + if (!ce_path_match(ce, &ps, ps_matched)) > + continue; > + } > + > + if (report_path_error(ps_matched, &ps, prefix)) { > + fprintf_ln(stderr, "Did you forget to 'git add'?"); > + return -1; > + } > + } > + > + read_cache_preload(NULL); Instead of doing a 'read_cache' before and 'read_cache_preload(NULL)' here, we could just use 'read_cache_preload(NULL)' above. 'read_cache' does return early if the index has already been read, so there's no big harm in doing this twice, but just having one call is still neater I think. It would make the command slightly slower in the error case above, but I doubt that's worth worrying about. > + if (refresh_cache(REFRESH_QUIET)) > + return -1; > + > + if (!check_changes(argv, include_untracked, prefix)) { > + fprintf_ln(stdout, "No local changes to save"); > + return 0; > + } > + > + if (!reflog_exists(ref_stash) && do_clear_stash()) { > + fprintf_ln(stderr, "Cannot initialize stash"); > + return -1; > + } > + > + if ((ret = do_create_stash(argc, argv, prefix, &stash_msg, > +include_untracked, patch_mode, &info))) > + return ret; Should this be 'return ret < 0'? 'ret == 1' means there are no changes, for which we currently get a 0 exit code. Though on second thought that can't happen, because we already have 'check_changes' above. Why do we want the 'ret' variable here? Something I notice here is that we are passing 'argc' and 'argv' around a lot. We passed that through parse-options already, and it seems to me that we're mostly left with pathspecs here, rather than 'argv'. It looks to me like we could just parse the pathspecs in the callers (which we do in some places, but maybe not in all of them) and then pass 'struct pathspec' around instead of the leftover argv, which is easier to understand, and gives all these functions a neater/easier to understand interface. Also looking at 'do_create_stash', the 'ar
Re: [GSoC][PATCH v7 15/26] stash: convert create to builtin
On Thu, Aug 16, 2018 at 1:13 AM, Thomas Gummerer wrote: > On 08/08, Paul-Sebastian Ungureanu wrote: >> Add stash create to the helper. >> >> Signed-off-by: Paul-Sebastian Ungureanu >> --- >> builtin/stash--helper.c | 406 >> git-stash.sh| 2 +- >> 2 files changed, 407 insertions(+), 1 deletion(-) >> >> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c >> index 5ff810f8c..a4e57899b 100644 >> --- a/builtin/stash--helper.c >> +++ b/builtin/stash--helper.c >> @@ -21,6 +21,7 @@ static const char * const git_stash_helper_usage[] = { >> N_("git stash--helper branch []"), >> N_("git stash--helper clear"), >> N_("git stash--helper store [-m|--message ] [-q|--quiet] >> "), >> + N_("git stash--helper create []"), >> NULL >> }; >> >> @@ -64,6 +65,11 @@ static const char * const git_stash_helper_store_usage[] >> = { >> NULL >> }; >> >> +static const char * const git_stash_helper_create_usage[] = { >> + N_("git stash--helper create []"), >> + NULL >> +}; >> + >> static const char *ref_stash = "refs/stash"; >> static int quiet; >> static struct strbuf stash_index_path = STRBUF_INIT; >> @@ -781,6 +787,404 @@ static int store_stash(int argc, const char **argv, >> const char *prefix) >> return do_store_stash(argv[0], stash_msg, quiet); >> } >> >> [...] >> >> + >> +static int do_create_stash(int argc, const char **argv, const char *prefix, >> +const char **stash_msg, int include_untracked, >> +int patch_mode, struct stash_info *info) >> +{ >> + int untracked_commit_option = 0; >> + int ret = 0; >> + int subject_len; >> + int flags; >> + const char *head_short_sha1 = NULL; >> + const char *branch_ref = NULL; >> + const char *head_subject = NULL; >> + const char *branch_name = "(no branch)"; >> + struct commit *head_commit = NULL; >> + struct commit_list *parents = NULL; >> + struct strbuf msg = STRBUF_INIT; >> + struct strbuf commit_tree_label = STRBUF_INIT; >> + struct strbuf out = STRBUF_INIT; >> + struct strbuf final_stash_msg = STRBUF_INIT; >> + >> + read_cache_preload(NULL); >> + refresh_cache(REFRESH_QUIET); >> + >> + if (!check_changes(argv, include_untracked, prefix)) { >> + ret = 1; >> + goto done; > > I wonder if we can just 'exit(0)' here, instead of returning. This > whole command is a builtin, and I *think* outside of 'libgit.a' exiting > early is fine. It does mean that we're not free'ing the memory > though, which means a leak checker would probably complain. So > dunno. It would simplify the code a little, but not sure it's worth it. Indeed, there shouldn't be any problem by calling exit(0). >> + } >> + >> + if (get_oid("HEAD", &info->b_commit)) { >> + fprintf_ln(stderr, "You do not have the initial commit yet"); >> + ret = -1; >> + goto done; >> + } else { >> + head_commit = lookup_commit(the_repository, &info->b_commit); >> + } >> + >> + branch_ref = resolve_ref_unsafe("HEAD", 0, NULL, &flags); >> + if (flags & REF_ISSYMREF) >> + branch_name = strrchr(branch_ref, '/') + 1; >> + head_short_sha1 = find_unique_abbrev(&head_commit->object.oid, >> + DEFAULT_ABBREV); >> + subject_len = find_commit_subject(get_commit_buffer(head_commit, NULL), >> + &head_subject); >> + strbuf_addf(&msg, "%s: %s %.*s\n", branch_name, head_short_sha1, >> + subject_len, head_subject); > > I think this can be written in a slightly simpler way: > > head_short_sha1 = find_unique_abbrev(&head_commit->object.oid, > DEFAULT_ABBREV); > strbuf_addf(&msg, "%s: %s", branch_name, head_short_sha1); > pp_commit_easy(CMIT_FMT_ONELINE, head_commit, &msg); > strbuf_addch(&msg, '\n'); > > The other advantage this brings is that it is consistent with other > places where we print/use the subject of a commit (e.g. in 'git reset > --hard'). Thanks for this suggestion. >> + >> + strbuf_addf(&commit_tree_label, "index on %s\n", msg.buf); >> + commit_list_insert(head_commit, &parents); >> + if (write_cache_as_tree(&info->i_tree, 0, NULL) || >> + commit_tree(commit_tree_label.buf, commit_tree_label.len, >> + &info->i_tree, parents, &info->i_commit, NULL, NULL)) { >> + fprintf_ln(stderr, "Cannot save the current index state"); > > Looks like this message is translated in the current 'git stash' > implementation, so it should be here as well. Same for the messages > below. > >> + ret = -1; >> + goto done; >> + } >> + >> + if (include_untracked && get_untracked_files(argv, 1, >> + include_untrac
Re: [GSoC][PATCH v7 19/26] stash: make push to be quiet
> Subject: stash: make push to be quiet Nit: maybe "stash: make push -q quiet"? I think the subject should at least mention the -q option. On 08/08, Paul-Sebastian Ungureanu wrote: > There is a change in behaviour with this commit. When there was > no initial commit, the shell version of stash would still display > a message. This commit makes `push` to not display any message if > `--quiet` or `-q` is specified. Yeah, not being quiet here cna be considered a bug, so this change in behaviour makes sense. Should the "No changes selected" message in 'stash_patch' also be made quiet? > --- > builtin/stash--helper.c | 41 +++-- > 1 file changed, 27 insertions(+), 14 deletions(-) > > diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c > index c26cad3d5..4fd79532c 100644 > --- a/builtin/stash--helper.c > +++ b/builtin/stash--helper.c > @@ -1079,7 +1079,7 @@ static int stash_working_tree(struct stash_info *info, > > static int do_create_stash(int argc, const char **argv, const char *prefix, > const char **stash_msg, int include_untracked, > -int patch_mode, struct stash_info *info) > +int patch_mode, struct stash_info *info, int quiet) > { > int untracked_commit_option = 0; > int ret = 0; > @@ -1105,7 +1105,8 @@ static int do_create_stash(int argc, const char **argv, > const char *prefix, > } > > if (get_oid("HEAD", &info->b_commit)) { > - fprintf_ln(stderr, "You do not have the initial commit yet"); > + if (!quiet) > + fprintf_ln(stderr, "You do not have the initial commit > yet"); > ret = -1; > goto done; > } else { > @@ -1127,7 +1128,8 @@ static int do_create_stash(int argc, const char **argv, > const char *prefix, > if (write_cache_as_tree(&info->i_tree, 0, NULL) || > commit_tree(commit_tree_label.buf, commit_tree_label.len, > &info->i_tree, parents, &info->i_commit, NULL, NULL)) { > - fprintf_ln(stderr, "Cannot save the current index state"); > + if (!quiet) > + fprintf_ln(stderr, "Cannot save the current index > state"); > ret = -1; > goto done; > } > @@ -1135,7 +1137,8 @@ static int do_create_stash(int argc, const char **argv, > const char *prefix, > if (include_untracked && get_untracked_files(argv, 1, >include_untracked, &out)) { > if (save_untracked_files(info, &msg, &out)) { > - printf_ln("Cannot save the untracked files"); > + if (!quiet) > + printf_ln("Cannot save the untracked files"); > ret = -1; > goto done; > } > @@ -1144,14 +1147,16 @@ static int do_create_stash(int argc, const char > **argv, const char *prefix, > if (patch_mode) { > ret = stash_patch(info, argv); > if (ret < 0) { > - printf_ln("Cannot save the current worktree state"); > + if (!quiet) > + printf_ln("Cannot save the current worktree > state"); > goto done; > } else if (ret > 0) { > goto done; > } > } else { > if (stash_working_tree(info, argv, prefix)) { > - printf_ln("Cannot save the current worktree state"); > + if (!quiet) > + printf_ln("Cannot save the current worktree > state"); > ret = -1; > goto done; > } > @@ -1176,7 +1181,8 @@ static int do_create_stash(int argc, const char **argv, > const char *prefix, > > if (commit_tree(*stash_msg, strlen(*stash_msg), &info->w_tree, > parents, &info->w_commit, NULL, NULL)) { > - printf_ln("Cannot record working tree state"); > + if (!quiet) > + printf_ln("Cannot record working tree state"); > ret = -1; > goto done; > } > @@ -1208,7 +1214,7 @@ static int create_stash(int argc, const char **argv, > const char *prefix) >0); > > ret = do_create_stash(argc, argv, prefix, &stash_msg, > - include_untracked, 0, &info); > + include_untracked, 0, &info, 0); > > if (!ret) > printf_ln("%s", oid_to_hex(&info.w_commit)); > @@ -1261,25 +1267,31 @@ static int do_push_stash(int argc, const char **argv, > const char *prefix, > return -1; > > if (!check_changes(argv, include_untracked, prefix)) { > - fprintf_ln(stdout, "No local changes to save"); > + if (!quiet) > +
Re: [PATCH 11/18] builtin rebase: support `--autostash` option
On Wed, Aug 8, 2018 at 5:26 PM Pratik Karki wrote: > @@ -224,13 +219,56 @@ static int read_basic_state(struct rebase_options *opts) > return 0; > } > > +static int apply_autostash(struct rebase_options *opts) > +{ > + const char *path = state_dir_path("autostash", opts); > + struct strbuf autostash = STRBUF_INIT; > + struct child_process stash_apply = CHILD_PROCESS_INIT; > + > + if (!file_exists(path)) > + return 0; > + > + if (read_one(state_dir_path("autostash", opts), &autostash)) > + return error(_("Could not read '%s'"), path); > + argv_array_pushl(&stash_apply.args, > +"stash", "apply", autostash.buf, NULL); > + stash_apply.git_cmd = 1; > + stash_apply.no_stderr = stash_apply.no_stdout = > + stash_apply.no_stdin = 1; > + if (!run_command(&stash_apply)) > + printf("Applied autostash.\n"); I think you need _() here. > + else { > + struct argv_array args = ARGV_ARRAY_INIT; > + int res = 0; > + > + argv_array_pushl(&args, > +"stash", "store", "-m", "autostash", "-q", > +autostash.buf, NULL); > + if (run_command_v_opt(args.argv, RUN_GIT_CMD)) > + res = error(_("Cannot store %s"), autostash.buf); > + argv_array_clear(&args); > + strbuf_release(&autostash); > + if (res) > + return res; > + > + fprintf(stderr, > + _("Applying autostash resulted in conflicts.\n" > + "Your changes are safe in the stash.\n" > + "You can run \"git stash pop\" or \"git stash > drop\" " > + "at any time.\n")); > + } > + > + strbuf_release(&autostash); > + return 0; > +} > + > static int finish_rebase(struct rebase_options *opts) > { > struct strbuf dir = STRBUF_INIT; > const char *argv_gc_auto[] = { "gc", "--auto", NULL }; > > delete_ref(NULL, "REBASE_HEAD", NULL, REF_NO_DEREF); > - apply_autostash(); > + apply_autostash(opts); > close_all_packs(the_repository->objects); > /* > * We ignore errors in 'gc --auto', since the -- Duy
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Junio C Hamano writes: >>> - strbuf_add(dest, src, n); >>> + if (0 < n) >>> + strbuf_add(dest, src, n); >> >> This check seems unnecessary. strbuf_add can cope fine with !n. > > I was primarily interested in catching negatives, and !n was a mere > optimization, but you are correct to point out that negative n at > this point in the codeflow is a BUG(). Actually, let's just lose the conditional. strbuf_add() would catch and issue an error message when it notices that we fed negative count anyway ;-).
Re: [PATCH v2 2/2] repack: repack promisor objects if -a or -A is set
On Thu, Aug 9, 2018 at 12:35 AM Jonathan Tan wrote: > @@ -179,6 +179,76 @@ static void prepare_pack_objects(struct child_process > *cmd, > cmd->out = -1; > } > > +/* > + * Write oid to the given struct child_process's stdin, starting it first if > + * necessary. > + */ > +static int write_oid(const struct object_id *oid, struct packed_git *pack, > +uint32_t pos, void *data) > +{ > + struct child_process *cmd = data; > + > + if (cmd->in == -1) { > + if (start_command(cmd)) > + die("Could not start pack-objects to repack promisor > objects"); _() ? (also other messages) -- Duy
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Jonathan Nieder writes: > And here are some tests to squash in. Thanks, will do.
Re: [GSoC][PATCH v7 05/26] stash: convert apply to builtin
On Wed, Aug 8, 2018 at 9:00 PM Paul-Sebastian Ungureanu wrote: > + strbuf_init(&info->revision, 0); > + if (!commit) { > + if (!ref_exists(ref_stash)) { > + free_stash_info(info); > + fprintf_ln(stderr, "No stash entries found."); Maybe _() ? -- Duy
Re: [PATCH 7/7] submodule--helper: introduce new update-module-mode helper
On Tue, Aug 14, 2018 at 12:45 AM Stefan Beller wrote: > +static int module_update_module_mode(int argc, const char **argv, const char > *prefix) > +{ > + const char *path, *update = NULL; > + int just_cloned; > + struct submodule_update_strategy update_strategy = { .type = > SM_UPDATE_CHECKOUT }; > + > + if (argc < 3 || argc > 4) > + die("submodule--helper update-module-clone expects > []"); Maybe _() ? -- Duy
Re: [GSoC][PATCH v7 20/26] stash: add tests for `git stash push -q`
On 08/08, Paul-Sebastian Ungureanu wrote: > This commit introduces more tests for the quiet option of > `git stash push`. I think this commit should be squashed into the previous one, so we have implementation and tests in one commit. That way it's easier to see during review that there are tests for the change. For more discussion on that also see [1]. [1]: https://public-inbox.org/git/20180806144726.gb97...@aiede.svl.corp.google.com/ > --- > t/t3903-stash.sh | 21 + > 1 file changed, 21 insertions(+) > > diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh > index 8d002a7f2..b78db74ae 100755 > --- a/t/t3903-stash.sh > +++ b/t/t3903-stash.sh > @@ -1064,6 +1064,27 @@ test_expect_success 'push: not in the > repository errors out' ' > test_path_is_file untracked > ' > > +test_expect_success 'push: -q is quiet with changes' ' > + >foo && > + git stash push -q >output 2>&1 && We create an untracked file here and then call 'git stash push', which will not create a new stash, as we don't use the --include-untracked option. In fact, right now this test is doing the same thing as the test below. There should be a 'git add foo' above the 'git stash push' call to test what we're claiming to test here. > + test_must_be_empty output > +' > + > +test_expect_success 'push: -q is quiet with no changes' ' > + git stash push -q >output 2>&1 && > + test_must_be_empty output > +' > + > +test_expect_success 'push: -q is quiet even if there is no initial commit' ' > + git init foo_dir && > + cd foo_dir && > + touch bar && The typical style in the test suite for creating a new file is to use '>bar', unless you care about the 'mtime' the file has. We don't seem to care about that in this test, so avoiding 'touch' would be better. > + test_must_fail git stash push -q >output 2>&1 && > + test_must_be_empty output && > + cd .. && The above should be in a subshell, i.e. ( cd foo_dir && touch bar && test_must_fail git stash push -q >output 2>&1 && test_must_be_empty output && ) then you don't have to do the 'cd ..' in the end. With the 'cd ..' in the end, if one of the commands between the 'cd foo_dir' and 'cd ..' fails, all subsequent tests will be run inside of 'foo_dir', which puts them in a different environment than they expect. That can cause all kinds of weirdness. If inside a subshell, the current working directory of the parent shell is unaffected, so we don't have to worry about cd'ing back, and subsequent tests will get the correct cwd even if things go wrong in this test. > + rm -rf foo_dir We'll want to run this cleanup to run even if the test fails. To do so, the 'test_when_finished' helper can be used. Using that, this would go at the top of the test, as 'test_when_finished rm -rf foo_dir'. Otherwise if any of the commands above fail, 'foo_dir' will not be removed, and may interfere with subsequent tests. > +' > + > test_expect_success 'untracked files are left in place when -u is not given' > ' > >file && > git add file && > -- > 2.18.0.573.g56500d98f >
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Junio C Hamano writes: > Junio C Hamano writes: > - strbuf_add(dest, src, n); + if (0 < n) + strbuf_add(dest, src, n); >>> >>> This check seems unnecessary. strbuf_add can cope fine with !n. >> >> I was primarily interested in catching negatives, and !n was a mere >> optimization, but you are correct to point out that negative n at >> this point in the codeflow is a BUG(). > > Actually, let's just lose the conditional. strbuf_add() would catch > and issue an error message when it notices that we fed negative > count anyway ;-). So I'll have this applied on top of the original topic to prevent a buggy version from escaping the lab. -- >8 -- Subject: [PATCH] sideband: do not read beyond the end of input The caller of maybe_colorize_sideband() gives a counted buffer , but the callee checked src[] as if it were a NUL terminated buffer. If src[] had all isspace() bytes in it, we would have made n negative, and then (1) made number of strncasecmp() calls to see if the remaining bytes in src[] matched keywords, reading beyond the end of the array (this actually happens even if n does not go negative), and/or (2) called strbuf_add() with negative count, most likely triggering the "you want to use way too much memory" error due to unsigned integer overflow. Fix both issues by making sure we do not go beyond &src[n]. In the longer term we may want to accept size_t as parameter for clarity (even though we know that a sideband message we are painting typically would fit on a line on a terminal and int is sufficient). Write it down as a NEEDSWORK comment. Helped-by: Jonathan Nieder Signed-off-by: Junio C Hamano --- sideband.c | 8 ++-- t/t5409-colorize-remote-messages.sh | 14 ++ 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/sideband.c b/sideband.c index 1c6bb0e25b..368647acf8 100644 --- a/sideband.c +++ b/sideband.c @@ -65,6 +65,8 @@ void list_config_color_sideband_slots(struct string_list *list, const char *pref * Optionally highlight one keyword in remote output if it appears at the start * of the line. This should be called for a single line only, which is * passed as the first N characters of the SRC array. + * + * NEEDSWORK: use "size_t n" instead for clarity. */ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) { @@ -75,7 +77,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) return; } - while (isspace(*src)) { + while (0 < n && isspace(*src)) { strbuf_addch(dest, *src); src++; n--; @@ -84,6 +86,9 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) for (i = 0; i < ARRAY_SIZE(keywords); i++) { struct keyword_entry *p = keywords + i; int len = strlen(p->keyword); + + if (n <= len) + continue; /* * Match case insensitively, so we colorize output from existing * servers regardless of the case that they use for their @@ -101,7 +106,6 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) } strbuf_add(dest, src, n); - } diff --git a/t/t5409-colorize-remote-messages.sh b/t/t5409-colorize-remote-messages.sh index eb1b8aa05d..f81b6813c0 100755 --- a/t/t5409-colorize-remote-messages.sh +++ b/t/t5409-colorize-remote-messages.sh @@ -15,6 +15,8 @@ test_expect_success 'setup' ' echo warning: warning echo prefixerror: error echo " " "error: leading space" + echo "" + echo Err exit 0 EOF echo 1 >file && @@ -44,6 +46,12 @@ test_expect_success 'whole words at line start' ' grep "prefixerror: error" decoded ' +test_expect_success 'short line' ' + git -C child -c color.remote=always push -f origin HEAD:short-line 2>output && + test_decode_color decoded && + grep "remote: Err" decoded +' + test_expect_success 'case-insensitive' ' git --git-dir child/.git -c color.remote=always push -f origin HEAD:refs/heads/case-insensitive 2>output && cat output && @@ -58,6 +66,12 @@ test_expect_success 'leading space' ' grep " error: leading space" decoded ' +test_expect_success 'spaces only' ' + git -C child -c color.remote=always push -f origin HEAD:only-space 2>output && + test_decode_color decoded && + grep "remote: " decoded +' + test_expect_success 'no coloring for redirected output' ' git --git-dir child/.git push -f origin HEAD:refs/heads/redirected-output 2>output && test_decode_color decoded && -- 2.18.0-748-gfa03cdc39b
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
On Thu, Aug 16, 2018 at 1:54 AM Matthew DeVore wrote: > diff --git a/list-objects-filter.c b/list-objects-filter.c > index a0ba78b20..8e3caf5bf 100644 > --- a/list-objects-filter.c > +++ b/list-objects-filter.c > @@ -80,6 +80,55 @@ static void *filter_blobs_none__init( > return d; > } > > +/* > + * A filter for list-objects to omit ALL trees and blobs from the traversal. > + * Can OPTIONALLY collect a list of the omitted OIDs. > + */ > +struct filter_trees_none_data { > + struct oidset *omits; > +}; > + > +static enum list_objects_filter_result filter_trees_none( > + enum list_objects_filter_situation filter_situation, > + struct object *obj, > + const char *pathname, > + const char *filename, > + void *filter_data_) > +{ > + struct filter_trees_none_data *filter_data = filter_data_; > + > + switch (filter_situation) { > + default: > + die("unknown filter_situation"); This sounds like BUG() than die() without _(). And you probably want to report filter_situation value too. > + return LOFR_ZERO; And neither BUG() or die() returns. -- Duy
Re: git-bug: Distributed bug tracker embedded in git
Michael Muré writes: > I released today git-bug, a distributed bug tracker that embeds in > git. It use git's internal storage to store bugs information in a way > that can be merged without conflict. You can push/pull to the normal > git remote you are already using to interact with other people. Normal > code and bugs are completely separated and no files are added in the > regular branches. This reminds me of a demo Scott Chacon showed us ages ago, the name of which escapes me. I guess great minds think alike, or something?
Re: [GSoC][PATCH v7 22/26] stash: convert save to builtin
On 08/08, Paul-Sebastian Ungureanu wrote: > Add stash save to the helper and delete functions which are no > longer needed (`show_help()`, `save_stash()`, `push_stash()`, > `create_stash()`, `clear_stash()`, `untracked_files()` and > `no_changes()`). > --- > builtin/stash--helper.c | 48 +++ > git-stash.sh| 311 +--- > 2 files changed, 50 insertions(+), 309 deletions(-) > > diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c > index 5c27f5dcf..f54a476e3 100644 > --- a/builtin/stash--helper.c > +++ b/builtin/stash--helper.c > @@ -26,6 +26,8 @@ static const char * const git_stash_helper_usage[] = { > N_("git stash--helper [push [-p|--patch] [-k|--[no-]keep-index] > [-q|--quiet]\n" > " [-u|--include-untracked] [-a|--all] [-m|--message > ]\n" > " [--] [...]]"), > + N_("git stash--helper save [-p|--patch] [-k|--[no-]keep-index] > [-q|--quiet]\n" > +" [-u|--include-untracked] [-a|--all] []"), > NULL > }; > > @@ -81,6 +83,12 @@ static const char * const git_stash_helper_push_usage[] = { > NULL > }; > > +static const char * const git_stash_helper_save_usage[] = { > + N_("git stash--helper save [-p|--patch] [-k|--[no-]keep-index] > [-q|--quiet]\n" > +" [-u|--include-untracked] [-a|--all] []"), > + NULL > +}; > + > static const char *ref_stash = "refs/stash"; > static int quiet; > static struct strbuf stash_index_path = STRBUF_INIT; > @@ -1445,6 +1453,44 @@ static int push_stash(int argc, const char **argv, > const char *prefix) >include_untracked, quiet, stash_msg); > } > > +static int save_stash(int argc, const char **argv, const char *prefix) > +{ > + int i; > + int keep_index = -1; > + int patch_mode = 0; > + int include_untracked = 0; > + int quiet = 0; > + char *stash_msg = NULL; > + struct strbuf alt_stash_msg = STRBUF_INIT; > + struct option options[] = { > + OPT_SET_INT('k', "keep-index", &keep_index, > + N_("keep index"), 1), > + OPT_BOOL('p', "patch", &patch_mode, > + N_("stash in patch mode")), > + OPT_BOOL('q', "quiet", &quiet, > + N_("quiet mode")), This could be OPT__QUIET again as mentioned in the previous patch as well. > + OPT_BOOL('u', "include-untracked", &include_untracked, > + N_("include untracked files in stash")), > + OPT_SET_INT('a', "all", &include_untracked, > + N_("include ignore files"), 2), > + OPT_STRING('m', "message", &stash_msg, N_("message"), > + N_("stash message")), > + OPT_END() > + }; > + > + argc = parse_options(argc, argv, prefix, options, > + git_stash_helper_save_usage, > + 0); > + > + for (i = 0; i < argc; ++i) > + strbuf_addf(&alt_stash_msg, "%s ", argv[i]); > + > + stash_msg = strbuf_detach(&alt_stash_msg, NULL); We unconditionally overwrite 'stash_msg' here, even if a '-m' parameter was given earlier, which I don't think is what we intended here. I think we started "supporting" (not erroring out on rather) the '-m' flag accidentally (my bad, sorry) in 'git stash save' rather than intentionally, and indeed it has the same behaviour as the code above. However I think we should just not add support the '-m' flag here, as it doesn't make a lot of sense to have two ways of passing a message. We also never free the memory we get back here from 'strbuf_detach'. As this is not code in 'libgit.a' that's probably fine, and we can just add an 'UNLEAK(stash_msg)' here I think. It may generally be interesting to consider using a leak checker, and see how far we can get in making this leak free. It may not be possible to make 'git stash' completely leak free, as the underlying APIs may not be, but it may be interesting to see how far we can get for the new code only. > + > + return do_push_stash(0, NULL, prefix, keep_index, patch_mode, > + include_untracked, quiet, stash_msg); > +} > + > int cmd_stash__helper(int argc, const char **argv, const char *prefix) > { > pid_t pid = getpid(); > @@ -1485,6 +1531,8 @@ int cmd_stash__helper(int argc, const char **argv, > const char *prefix) > return !!create_stash(argc, argv, prefix); > else if (!strcmp(argv[0], "push")) > return !!push_stash(argc, argv, prefix); > + else if (!strcmp(argv[0], "save")) > + return !!save_stash(argc, argv, prefix); > > usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]), > git_stash_helper_usage, options); > > [...]
Re: [GSoC][PATCH v7 23/26] stash: convert `stash--helper.c` into `stash.c`
On 08/08, Paul-Sebastian Ungureanu wrote: > The old shell script `git-stash.sh` was removed and replaced > entirely by `builtin/stash.c`. In order to do that, `create` and > `push` were adapted to work without `stash.sh`. For example, before > this commit, `git stash create` called `git stash--helper create > --message "$*"`. If it called `git stash--helper create "$@"`, then > some of these changes wouldn't have been necessary. > > This commit also removes the word `helper` since now stash is > called directly and not by a shell script. > --- > .gitignore | 1 - > Makefile | 3 +- > builtin.h| 2 +- > builtin/{stash--helper.c => stash.c} | 132 --- > git-stash.sh | 153 --- > git.c| 2 +- > 6 files changed, 74 insertions(+), 219 deletions(-) > rename builtin/{stash--helper.c => stash.c} (91%) > delete mode 100755 git-stash.sh > > diff --git a/builtin/stash--helper.c b/builtin/stash.c > similarity index 91% > rename from builtin/stash--helper.c > rename to builtin/stash.c > index f54a476e3..0ef88408a 100644 > --- a/builtin/stash--helper.c > +++ b/builtin/stash.c > > [...] > > @@ -1445,9 +1448,10 @@ static int push_stash(int argc, const char **argv, > const char *prefix) > OPT_END() > }; > > - argc = parse_options(argc, argv, prefix, options, > - git_stash_helper_push_usage, > - 0); > + if (argc) > + argc = parse_options(argc, argv, prefix, options, > + git_stash_push_usage, > + 0); This change is a bit surprising here. Why is this necessary? I thought parse_options would handle no arguments just fine? > return do_push_stash(argc, argv, prefix, keep_index, patch_mode, >include_untracked, quiet, stash_msg); > @@ -1479,7 +1483,7 @@ static int save_stash(int argc, const char **argv, > const char *prefix) > }; > > argc = parse_options(argc, argv, prefix, options, > - git_stash_helper_save_usage, > + git_stash_save_usage, >0); > > for (i = 0; i < argc; ++i) > @@ -1491,7 +1495,7 @@ static int save_stash(int argc, const char **argv, > const char *prefix) >include_untracked, quiet, stash_msg); > } > > -int cmd_stash__helper(int argc, const char **argv, const char *prefix) > +int cmd_stash(int argc, const char **argv, const char *prefix) > { > pid_t pid = getpid(); > const char *index_file; > @@ -1502,16 +1506,16 @@ int cmd_stash__helper(int argc, const char **argv, > const char *prefix) > > git_config(git_default_config, NULL); > > - argc = parse_options(argc, argv, prefix, options, > git_stash_helper_usage, > + argc = parse_options(argc, argv, prefix, options, git_stash_usage, >PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH); > > index_file = get_index_file(); > strbuf_addf(&stash_index_path, "%s.stash.%" PRIuMAX, index_file, > (uintmax_t)pid); > > - if (argc < 1) > - usage_with_options(git_stash_helper_usage, options); > - if (!strcmp(argv[0], "apply")) > + if (argc == 0) > + return !!push_stash(0, NULL, prefix); > + else if (!strcmp(argv[0], "apply")) > return !!apply_stash(argc, argv, prefix); > else if (!strcmp(argv[0], "clear")) > return !!clear_stash(argc, argv, prefix); > @@ -1533,7 +1537,13 @@ int cmd_stash__helper(int argc, const char **argv, > const char *prefix) > return !!push_stash(argc, argv, prefix); > else if (!strcmp(argv[0], "save")) > return !!save_stash(argc, argv, prefix); > + if (*argv[0] == '-') { > + struct argv_array args = ARGV_ARRAY_INIT; > + argv_array_push(&args, "push"); > + argv_array_pushv(&args, argv); > + return !!push_stash(args.argc, args.argv, prefix); > + } This is a bit different than what the current code does. Currently the rules for when a plain 'git stash' becomes 'git stash push' are the following: - If there are no arguments. - If all arguments are option arguments. - If the first argument of 'git stash' is '-p'. - If the first argument of 'git stash' is '--'. This is to avoid someone typing 'git stash -q drop' for example, and then being surprised that a new stash was created instead of an old one being dropped, which what we have above would do. For more reasoning about these aliasing rules see also the thread at [1]. [1]: https://public-inbox.org/git/20170213200950.m3bcyp52wd25p...@sigill.intra.peff.net/ > > usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]), > -
[RFC PATCH] t7508-status: fix a bogus check in a submodule-related test
Since 90e14525f2 (Add tests for the diff.ignoreSubmodules config option, 2010-08-06) the test '.gitmodules ignore=dirty suppresses submodules with untracked content' in 't7508-status.sh' contains a check that is bogus for two reasons. The commit in question added the following three lines at the beginning of that test: git config diff.ignoreSubmodules dirty && git status >output && ! test -s actual && That 'test' is problematic, because: - The output of 'git status' is saved in the file 'output', but the subsequent 'test' looks at the file 'actual'. This is the first mention of 'actual' in t7508, so that file doesn't exist at that point, and, consequently, the 'test' itself fails. However, since there is a '!' in front to flip the exit code, the command as a whole succeeds. I guess that this 'test' should look at the file 'output' instead, but... - This whole command checks that the given file is empty, i.e. that, supposedly, 'git status' produced no output. However, in this case 'git status' does produce output, and indeed it should produce the same output as already expected in the neighbouring tests or even later in the same test, all running 'git status' in similar situations. So drop that bogus check, and verify that 'git status's output matches what's otherwise expected in similar cases. Signed-off-by: SZEDER Gábor --- This is a submodules-related test, and I'm not very well versed in submodules, hence the RFC. t/t7508-status.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t7508-status.sh b/t/t7508-status.sh index e1f11293e2..4ea528785a 100755 --- a/t/t7508-status.sh +++ b/t/t7508-status.sh @@ -1183,7 +1183,7 @@ test_expect_success '--ignore-submodules=dirty suppresses submodules with untrac test_expect_success '.gitmodules ignore=dirty suppresses submodules with untracked content' ' test_config diff.ignoreSubmodules dirty && git status >output && - ! test -s actual && + test_i18ncmp expect output && git config --add -f .gitmodules submodule.subname.ignore dirty && git config --add -f .gitmodules submodule.subname.path sm && git status >output && -- 2.18.0.903.gab616d7dc6
Re: [GSoC][PATCH v7 15/26] stash: convert create to builtin
On 08/18, Paul Sebastian Ungureanu wrote: > On Thu, Aug 16, 2018 at 1:13 AM, Thomas Gummerer wrote: > > On 08/08, Paul-Sebastian Ungureanu wrote: > >> > >> [...] > >> > >> + ret = -1; > >> + goto done; > >> + } > >> + untracked_commit_option = 1; > >> + } > >> + if (patch_mode) { > >> + ret = stash_patch(info, argv); > >> + if (ret < 0) { > >> + printf_ln("Cannot save the current worktree state"); > >> + goto done; > >> + } else if (ret > 0) { > >> + goto done; > >> + } > >> + } else { > >> + if (stash_working_tree(info, argv)) { > >> + printf_ln("Cannot save the current worktree state"); > >> + ret = -1; > >> + goto done; > >> + } > >> + } > >> + > >> + if (!*stash_msg || !strlen(*stash_msg)) > >> + strbuf_addf(&final_stash_msg, "WIP on %s", msg.buf); > >> + else > >> + strbuf_addf(&final_stash_msg, "On %s: %s\n", branch_name, > >> + *stash_msg); > >> + *stash_msg = strbuf_detach(&final_stash_msg, NULL); > > > > strbuf_detach means we're taking ownership of the memory, so we'll > > have to free it afterwards. Looking at this we may not even want to > > re-use the 'stash_msg' variable here, but instead introduce another > > variable for it, so it doesn't have a dual meaning in this function. > > > >> + > >> + /* > >> + * `parents` will be empty after calling `commit_tree()`, so there is > >> + * no need to call `free_commit_list()` > >> + */ > >> + parents = NULL; > >> + if (untracked_commit_option) > >> + commit_list_insert(lookup_commit(the_repository, > >> &info->u_commit), &parents); > >> + commit_list_insert(lookup_commit(the_repository, &info->i_commit), > >> &parents); > >> + commit_list_insert(head_commit, &parents); > >> + > >> + if (commit_tree(*stash_msg, strlen(*stash_msg), &info->w_tree, > >> + parents, &info->w_commit, NULL, NULL)) { > >> + printf_ln("Cannot record working tree state"); > >> + ret = -1; > >> + goto done; > >> + } > >> + > >> +done: > >> + strbuf_release(&commit_tree_label); > >> + strbuf_release(&msg); > >> + strbuf_release(&out); > >> + strbuf_release(&final_stash_msg); > >> + return ret; > >> +} > >> + > >> +static int create_stash(int argc, const char **argv, const char *prefix) > >> +{ > >> + int include_untracked = 0; > >> + int ret = 0; > >> + const char *stash_msg = NULL; > >> + struct stash_info info; > >> + struct option options[] = { > >> + OPT_BOOL('u', "include-untracked", &include_untracked, > >> + N_("include untracked files in stash")), > >> + OPT_STRING('m', "message", &stash_msg, N_("message"), > >> + N_("stash message")), > >> + OPT_END() > >> + }; > >> + > >> + argc = parse_options(argc, argv, prefix, options, > >> + git_stash_helper_create_usage, > >> + 0); > >> + > >> + ret = do_create_stash(argc, argv, prefix, &stash_msg, > >> + include_untracked, 0, &info); > > > > stash_msg doesn't have to be passed as a pointer to a pointer here, as > > we never need the modified value after this function returns. I think > > just passing 'stash_msg' here instead of '&stash_msg' will make > > 'do_create_stash' slightly easier to read. > > That's right, but `do_create_stash()` is also called by > `do_push_stash()`, which will need the modified value. Ah right, I didn't read that far yet when leaving this comment :) Reading the original push_stash again though, do we actually need the modified value in 'do_push_stash()'? The original lines are: create_stash -m "$stash_msg" -u "$untracked" -- "$@" store_stash -m "$stash_msg" -q $w_commit || die "$(gettext "Cannot save the current status")" '$stash_msg' gets passed in to 'create_stash()', but is the 'stash_msg' variable inside of 'create_stash()' is local and only the local copy is modified, so 'store_stash()' would still get the original. Or am I missing something?
Re: git-bug: Distributed bug tracker embedded in git
Hi, Ævar Arnfjörð Bjarmason wrote: > On Sat, Aug 18 2018, Jonathan Nieder wrote: >> Michael Muré wrote: >>> I released today git-bug, a distributed bug tracker [...] >> I am a bit unhappy about the namespace grab. Not for trademark >> reasons: the Git trademark rules are pretty clear about this kind of >> usage being okay. Instead, the unhappiness comes because a future Git >> command like "git bug" to produce a bug report with appropriate >> diagnostics for a bug in Git seems like a likely and useful thing to >> get added to Git some day. And now the name's taken. >> >> Is it too late to ask if it's possible to come up with a less generic >> name? > > Wouldn't we call such a thing "git-reportbug", or "git gitbug", with > reference to Debian reportbug or perl's perlbug? I hope you're kidding about "git gitbug". [...] > 1) Accept the status quo where people do create third party tools, much >of which are way too obscure to matter (e.g. I'm sure someone's >created a tool/alias called range-diff before, but we didn't >care). > >If those tools become popular enough in the wild they get own that >namespace, e.g. we're not going to ship a "git-annex" or "git-lfs" >ourselves implementing some unrelated features That's fair. Let me spell out my thinking a little more. This framework would lead me to rephrase my question to Michael a different way. Instead of saying that I'm not happy with the namespace grab, I should say something more severe: Don't be surprised if Git itself makes a "git bug" command in the future, and be prepared to rename. Is that preferable, in your opinion? I still think it's a reasonable thing for me to ask about, if only to save Michael some trouble later. Thanks, Jonathan
Re: [PATCHv3 0/5] Simple fixes to t7406
Hi, On Mon, Aug 13, 2018 at 1:28 PM SZEDER Gábor wrote: > On Tue, Aug 7, 2018 at 6:49 PM Elijah Newren wrote: > > > Since folks like to notice other problems with t7406 while reading my > > patches, here's a challenge: > > > > Find something *else* wrong with t7406 that neither I nor any of the > > reviewers so far have caught that could be fixed. > > Well, I'd hate to be that guy... but since those who already > commented on previous rounds are not explicitly excluded from the > challenge, let's see. > > - There are still a few command substitutions running git commands, > where the exit status of that command is ignored; just look for the > '[^=]$(' pattern in the test script. > > (Is not noticing those cases considered as "flubbing"?) Hmm, borderline. > - The 'compare_head' helper function defined in this test script looks > very similar to the generally available 'test_cmp_rev' function, > which has the benefit to provide some visible output on failure > (though, IMO, not a particularly useful output, because the diff of > two OIDs is not very informative, but at least it's something as > opposed to the silence of 'test $this = $that"). > > Now, since 'compare_head' always compares the same two revisions, > namely 'master' and HEAD, replacing 'compare_head' with an > appropriate 'test_cmp_rev' call would result in repeating 'master' > and 'HEAD' arguments all over the test script. I'm not sure whether > that's good or bad. Anyway, I think that 'compare_head' could be > turned into a wrapper around 'test_cmp_rev'. Ooh, that does sound better. > > - You get bonus points if that thing is in the context region for > > one of my five patches. > > - Extra bonus points if the thing needing fixing was on a line I > > changed. > > - You win outright if it's something big enough that I give up and > > request to just have my series merged as-is and punt your > > suggested fixes down the road to someone else. > > Well, there's always the indentation of the commands run in subshells, > which doesn't conform to our coding style... > > Gah, now you made me that guy ;) I read this on Monday and got a really good laugh. I meant to fix it up, but fell asleep too soon the first couple nights...and now this series is in next anyway and there are a couple other git things that have my attention. You have pointed out a couple additional nice fixups, like you always do, but I think at this point I'm just going to declare you the winner and label these as #leftoverbits. Thanks for always thoroughly looking over the testcase patches and your constant work to improve the testsuite.
Re: [GSoC][PATCH v7 16/26] stash: replace spawning a "read-tree" process
On 08/08, Paul-Sebastian Ungureanu wrote: > Instead of spawning a child process, make use of `reset_tree()` > function already implemented in `stash-helper.c`. > --- > builtin/stash--helper.c | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c > index a4e57899b..887b78d05 100644 > --- a/builtin/stash--helper.c > +++ b/builtin/stash--helper.c > @@ -984,21 +984,18 @@ static int stash_patch(struct stash_info *info, const > char **argv) > static int stash_working_tree(struct stash_info *info, const char **argv) > { > int ret = 0; > - struct child_process cp0 = CHILD_PROCESS_INIT; > struct child_process cp1 = CHILD_PROCESS_INIT; > struct child_process cp2 = CHILD_PROCESS_INIT; > struct child_process cp3 = CHILD_PROCESS_INIT; > struct strbuf out1 = STRBUF_INIT; > struct strbuf out3 = STRBUF_INIT; > > - cp0.git_cmd = 1; > - argv_array_push(&cp0.args, "read-tree"); > - argv_array_pushf(&cp0.args, "--index-output=%s", stash_index_path.buf); > - argv_array_pushl(&cp0.args, "-m", oid_to_hex(&info->i_tree), NULL); > - if (run_command(&cp0)) { > + set_alternate_index_output(stash_index_path.buf); > + if (reset_tree(&info->i_tree, 0, 0)) { > ret = -1; > goto done; > } > + set_alternate_index_output(".git/index"); I think this second 'set_alternate_index_output()' should be 'set_alternate_index_output(NULL)', which has slightly different semantics than setting it to '.git/index'. Having it set means that the index is written unconditionally even if it is not set. Also the index file could be something other than ".git/index", if the GIT_INDEX_FILE environment variable is set, so it should be replaced with 'get_index_file()' if we were to keep this. I was also wondering if we could avoid writing a temporary index to disk, in 'stash_working_tree', but I don't see an easy way for doing that. > > cp1.git_cmd = 1; > argv_array_pushl(&cp1.args, "diff-index", "--name-only", "-z", > -- > 2.18.0.573.g56500d98f >
Re: [PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite
On Sat, Aug 18, 2018 at 7:41 AM Nguyễn Thái Ngọc Duy wrote: ... > diff --git a/read-cache.c b/read-cache.c > index 5ce40f39b3..41f313bc9e 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -2744,6 +2744,9 @@ int write_locked_index(struct index_state *istate, > struct lock_file *lock, > int new_shared_index, ret; > struct split_index *si = istate->split_index; > > + if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0)) > + cache_tree_verify(istate); > + > if ((flags & SKIP_IF_UNCHANGED) && !istate->cache_changed) { > if (flags & COMMIT_LOCK) > rollback_lock_file(lock); > diff --git a/t/test-lib.sh b/t/test-lib.sh > index 78f7097746..5b50f6e2e6 100644 > --- a/t/test-lib.sh > +++ b/t/test-lib.sh > @@ -1083,6 +1083,12 @@ else > test_set_prereq C_LOCALE_OUTPUT > fi > > +if test -z "$GIT_TEST_CHECK_CACHE_TREE" > +then > + GIT_TEST_CHECK_CACHE_TREE=true > + export GIT_TEST_CHECK_CACHE_TREE > +fi > + > test_lazy_prereq PIPE ' > # test whether the filesystem supports FIFOs > test_have_prereq !MINGW,!CYGWIN && > diff --git a/unpack-trees.c b/unpack-trees.c > index bc43922922..3394540842 100644 > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -1578,6 +1578,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, > struct unpack_trees_options > if (o->dst_index) { > move_index_extensions(&o->result, o->src_index); > if (!ret) { > + if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0)) > + cache_tree_verify(&o->result); > if (!o->result.cache_tree) > o->result.cache_tree = cache_tree(); > if (!cache_tree_fully_valid(o->result.cache_tree)) > -- > 2.18.0.1004.g6639190530 Should documentation of GIT_TEST_CHECK_CACHE_TREE be added in t/README, int the "Running tests with special setups" section?
Re: git-bug: Distributed bug tracker embedded in git
On Sat, Aug 18 2018, Jonathan Nieder wrote: > Hi, > > Ævar Arnfjörð Bjarmason wrote: >> On Sat, Aug 18 2018, Jonathan Nieder wrote: >>> Michael Muré wrote: > I released today git-bug, a distributed bug tracker > [...] >>> I am a bit unhappy about the namespace grab. Not for trademark >>> reasons: the Git trademark rules are pretty clear about this kind of >>> usage being okay. Instead, the unhappiness comes because a future Git >>> command like "git bug" to produce a bug report with appropriate >>> diagnostics for a bug in Git seems like a likely and useful thing to >>> get added to Git some day. And now the name's taken. >>> >>> Is it too late to ask if it's possible to come up with a less generic >>> name? >> >> Wouldn't we call such a thing "git-reportbug", or "git gitbug", with >> reference to Debian reportbug or perl's perlbug? > > I hope you're kidding about "git gitbug". It sounds a bit silly, but such a tool is going to be rarely used enough that we probably don't want to squat a 3 letter command to invoke it. > [...] >> 1) Accept the status quo where people do create third party tools, much >>of which are way too obscure to matter (e.g. I'm sure someone's >>created a tool/alias called range-diff before, but we didn't >>care). >> >>If those tools become popular enough in the wild they get own that >>namespace, e.g. we're not going to ship a "git-annex" or "git-lfs" >>ourselves implementing some unrelated features > > That's fair. Let me spell out my thinking a little more. > > This framework would lead me to rephrase my question to Michael a > different way. Instead of saying that I'm not happy with the > namespace grab, I should say something more severe: > > Don't be surprised if Git itself makes a "git bug" command in the > future, and be prepared to rename. > > Is that preferable, in your opinion? We're not going to make some blanket policy that doesn't recognize the difference between say git-lfs and git-tool_nobody_has_ever_heard_of, and then decide that it would be just as reasonable for us to ship a new git-lfs ourselves (which would do something different) as it were for us to ship git-tool_nobody_has_ever_heard_of. The reason I can drop a "git-whatever" in my $PATH and invoke it as "git whatever" is just a historical accident of how git was implemented. But because that feature has been exposed since the very beginning it's become an implicit API. There's thousands of git-whatever tools, and people do use these. The likes of git-lfs and git-annex are used a *lot* more than some builtins we ship. So we don't get to say "you never asked us about git-annex, we're using that name now" without considering how widely used it is. It's us who decided to expose the API of seamlessly integrating 3rd party tools.
Re: [PATCH v5 0/7] Speed up unpack_trees()
On Sat, Aug 18, 2018 at 7:41 AM Nguyễn Thái Ngọc Duy wrote: > > v5 fixes some minor comments from round 4 and a big mistake in 5/5. > Junio's scary feeling turns out true. There is a missing invalidation > in keep_entry() which is not added in 6/7. 7/7 makes sure that similar I'm having trouble parsing this. Did you mean "...which is now added..."? Also, if 6/7 represents a fix to the "big mistake in 5/5", why is 6/7 separate from 5/7 instead of squashed in? > problems will not slip through. > > I had to rebase this series on top of 'master' because 7/7 caught a > bad cache-tree situation that has been fixed by Elijah in ad3762042a Cool, glad that helped. ... > Nguyễn Thái Ngọc Duy (7): > trace.h: support nested performance tracing > unpack-trees: add performance tracing > unpack-trees: optimize walking same trees with cache-tree > unpack-trees: reduce malloc in cache-tree walk > unpack-trees: reuse (still valid) cache-tree from src_index > unpack-trees: add missing cache invalidation > cache-tree: verify valid cache-tree in the test suite I read through the new series and only had one small comment. I'm not up to speed on cache-tree stuff, still, so don't feel qualified to give an Ack on it.
Re: [GSoC][PATCH v7 17/26] stash: avoid spawning a "diff-index" process
On 08/08, Paul-Sebastian Ungureanu wrote: > This commits replaces spawning `diff-index` child process by using > the already existing `diff` API I think this should be squashed into the previous commit. It's easier to review a commit that replaces all the 'run_command'/'pipe_command' calls in one function, rather than doing it call by call, especially if they interact with eachother. E.g. I was going to suggest replacing the 'write_tree' call as well, but reading ahead in the series I see that that's already being done :) While replacing all the calls of one type with the internal API call is probably easiest for writing the patches, at least I would find it easier to review replacing the run-command API calls in one codepath at a time. > --- > builtin/stash--helper.c | 56 ++--- > 1 file changed, 42 insertions(+), 14 deletions(-) > > diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c > index 887b78d05..f905d3908 100644 > --- a/builtin/stash--helper.c > +++ b/builtin/stash--helper.c > @@ -12,6 +12,7 @@ > #include "rerere.h" > #include "revision.h" > #include "log-tree.h" > +#include "diffcore.h" > > static const char * const git_stash_helper_usage[] = { > N_("git stash--helper list []"), > @@ -297,6 +298,18 @@ static int reset_head(const char *prefix) > return run_command(&cp); > } > > +static void add_diff_to_buf(struct diff_queue_struct *q, > + struct diff_options *options, > + void *data) > +{ > + int i; > + for (i = 0; i < q->nr; i++) { > + struct diff_filepair *p = q->queue[i]; > + strbuf_addstr(data, p->one->path); > + strbuf_addch(data, '\n'); What about filenames that include a '\n'? I think this in combination with removing the '-z' flag from the 'update-index' call will break with filenames that have a LF in them. This should be a '\0' instead of a '\n', and we should still be using the '-z' flag in 'update-index'. > + } > +} > + > static int get_newly_staged(struct strbuf *out, struct object_id *c_tree) > { > struct child_process cp = CHILD_PROCESS_INIT; > @@ -981,14 +994,16 @@ static int stash_patch(struct stash_info *info, const > char **argv) > return ret; > } > > -static int stash_working_tree(struct stash_info *info, const char **argv) > +static int stash_working_tree(struct stash_info *info, > + const char **argv, const char *prefix) > { > int ret = 0; > - struct child_process cp1 = CHILD_PROCESS_INIT; > struct child_process cp2 = CHILD_PROCESS_INIT; > struct child_process cp3 = CHILD_PROCESS_INIT; > - struct strbuf out1 = STRBUF_INIT; > struct strbuf out3 = STRBUF_INIT; We're left with cp{2,3} and out3 here, which is a bit weird. Then again renaming them in this patch adds more churn, making it harder to review. Maybe instead of numbering them it would be better to name them after the child process they are calling? e.g. 'cp1' could become 'cp_di', and so on? > + struct argv_array args = ARGV_ARRAY_INIT; > + struct strbuf diff_output = STRBUF_INIT; > + struct rev_info rev; > > set_alternate_index_output(stash_index_path.buf); > if (reset_tree(&info->i_tree, 0, 0)) { > @@ -997,26 +1012,36 @@ static int stash_working_tree(struct stash_info *info, > const char **argv) > } > set_alternate_index_output(".git/index"); > > - cp1.git_cmd = 1; > - argv_array_pushl(&cp1.args, "diff-index", "--name-only", "-z", > - "HEAD", "--", NULL); > + argv_array_push(&args, "dummy"); Not being familiar with the setup_revisions code, I had to dig a bit to figure out why this makes sense. This is a dummy replacement for argv[0] in normal operation. In retrospect it's kind of obvious, but maybe call "dummy" "fake_argv0" instead, to help nudge future readers in the right direction? > if (argv) > - argv_array_pushv(&cp1.args, argv); > - argv_array_pushf(&cp1.env_array, "GIT_INDEX_FILE=%s", > - stash_index_path.buf); > + argv_array_pushv(&args, argv); > + git_config(git_diff_basic_config, NULL); > + init_revisions(&rev, prefix); > + args.argc = setup_revisions(args.argc, args.argv, &rev, NULL); > + > + rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK; > + rev.diffopt.format_callback = add_diff_to_buf; > + rev.diffopt.format_callback_data = &diff_output; > + > + if (read_cache_preload(&rev.diffopt.pathspec) < 0) { > + ret = -1; > + goto done; > + } > > - if (pipe_command(&cp1, NULL, 0, &out1, 0, NULL, 0)) { > + add_pending_object(&rev, parse_object(the_repository, &info->b_commit), > ""); > + if (run_diff_index(&rev, 0)) { > ret = -1; > goto done; > } > > cp2.git_cmd = 1; > - argv_array_pushl(&cp2.args, "update-index"
Re: git-bug: Distributed bug tracker embedded in git
Ævar Arnfjörð Bjarmason wrote: > The reason I can drop a "git-whatever" in my $PATH and invoke it as "git > whatever" is just a historical accident of how git was implemented. No. This is a very deliberate design decision, to allow people to prototype new Git commands (and to create the kind of ecosystem that allows commands to be implemented outside Git. [...] > So we don't get to say "you never asked us about git-annex, we're using > that name now" without considering how widely used it is. It's us who > decided to expose the API of seamlessly integrating 3rd party tools. I think we're talking past each other. I haven't proposed any blanket policy. I'm saying that "git bug" is a bad name for this tool: - it's hard to find with search engines - it conflicts with some likely good future changes to Git - it assumes that no one else will have some other refinement of the Git bugtracker concept, that it is the only "git bug" tool It's a namespace grab. There's nothing stopping someone from naming a command "bug", either, but that doesn't make it a good idea. (I'm not saying that was the intent --- that's just the effect.) Meanwhile it looks like a neat tool, and I'm very supportive of the idea. But you certainly still have not convinced me that the name is a good idea, or that I shouldn't be bringing this up. I'm not sure *what* you're trying to convince me of, actually. Jonathan
Re: [GSoC][PATCH v7 21/26] stash: replace spawning `git ls-files` child process
On 08/08, Paul-Sebastian Ungureanu wrote: > This commit replaces spawning `git ls-files` child process with > API calls to get the untracked files. > --- > builtin/stash--helper.c | 49 +++-- > 1 file changed, 32 insertions(+), 17 deletions(-) > > diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c > index 4fd79532c..5c27f5dcf 100644 > --- a/builtin/stash--helper.c > +++ b/builtin/stash--helper.c > @@ -813,27 +813,42 @@ static int store_stash(int argc, const char **argv, > const char *prefix) > /* > * `out` will be filled with the names of untracked files. The return value > is: > * > - * < 0 if there was a bug (any arg given outside the repo will be detected > - * by `setup_revision()`) > * = 0 if there are not any untracked files > * > 0 if there are untracked files > */ > -static int get_untracked_files(const char **argv, int line_term, > +static int get_untracked_files(const char **argv, const char *prefix, > int include_untracked, struct strbuf *out) > { > - struct child_process cp = CHILD_PROCESS_INIT; > - cp.git_cmd = 1; > - argv_array_pushl(&cp.args, "ls-files", "-o", NULL); > - if (line_term) > - argv_array_push(&cp.args, "-z"); > + int max_len; > + int i; > + char *seen; > + struct dir_struct dir; > + struct pathspec pathspec; > + > + memset(&dir, 0, sizeof(dir)); > if (include_untracked != 2) > - argv_array_push(&cp.args, "--exclude-standard"); > - argv_array_push(&cp.args, "--"); > - if (argv) > - argv_array_pushv(&cp.args, argv); > + setup_standard_excludes(&dir); > > - if (pipe_command(&cp, NULL, 0, out, 0, NULL, 0)) > - return -1; > + parse_pathspec(&pathspec, 0, > +PATHSPEC_PREFER_FULL, > +prefix, argv); > + seen = xcalloc(pathspec.nr, 1); > + > + max_len = fill_directory(&dir, the_repository->index, &pathspec); > + for (i = 0; i < dir.nr; i++) { > + struct dir_entry *ent = dir.entries[i]; > + if (!dir_path_match(ent, &pathspec, max_len, seen)) { > + free(ent); > + continue; > + } > + strbuf_addf(out, "%s\n", ent->name); As mentioned in my comments about the 'diff-index' replacement, this '\n' should probably be '\0', and we should keep the '-z' flag in 'git update-index', in case somebody has a '\n' in their filenames. While creating such a file is probably not a good idea anyway, we should still be able to handle it (and have been before this, so we shouldn't break it). > + free(ent); > + } > + > + free(dir.entries); > + free(dir.ignored); > + clear_directory(&dir); > + free(seen); > return out->len; > } > > @@ -888,7 +903,7 @@ static int check_changes(const char **argv, int > include_untracked, > goto done; > } > > - if (include_untracked && get_untracked_files(argv, 0, > + if (include_untracked && get_untracked_files(argv, prefix, >include_untracked, &out)) > ret = 1; > > @@ -908,7 +923,7 @@ static int save_untracked_files(struct stash_info *info, > struct strbuf *msg, > struct child_process cp2 = CHILD_PROCESS_INIT; > > cp.git_cmd = 1; > - argv_array_pushl(&cp.args, "update-index", "-z", "--add", > + argv_array_pushl(&cp.args, "update-index", "--add", >"--remove", "--stdin", NULL); > argv_array_pushf(&cp.env_array, "GIT_INDEX_FILE=%s", >stash_index_path.buf); > @@ -1134,7 +1149,7 @@ static int do_create_stash(int argc, const char **argv, > const char *prefix, > goto done; > } > > - if (include_untracked && get_untracked_files(argv, 1, > + if (include_untracked && get_untracked_files(argv, prefix, >include_untracked, &out)) { > if (save_untracked_files(info, &msg, &out)) { > if (!quiet) > -- > 2.18.0.573.g56500d98f >
Re: git-bug: Distributed bug tracker embedded in git
On Sat, Aug 18 2018, Jonathan Nieder wrote: > Ævar Arnfjörð Bjarmason wrote: > >> The reason I can drop a "git-whatever" in my $PATH and invoke it as "git >> whatever" is just a historical accident of how git was implemented. > > No. This is a very deliberate design decision, to allow people to > prototype new Git commands (and to create the kind of ecosystem that > allows commands to be implemented outside Git. > > [...] >> So we don't get to say "you never asked us about git-annex, we're using >> that name now" without considering how widely used it is. It's us who >> decided to expose the API of seamlessly integrating 3rd party tools. > > I think we're talking past each other. I haven't proposed any blanket > policy. I'm saying that "git bug" is a bad name for this tool: > > - it's hard to find with search engines > - it conflicts with some likely good future changes to Git > - it assumes that no one else will have some other refinement of the >Git bugtracker concept, that it is the only "git bug" tool > > It's a namespace grab. There's nothing stopping someone from naming a > command "bug", either, but that doesn't make it a good idea. (I'm not > saying that was the intent --- that's just the effect.) > > Meanwhile it looks like a neat tool, and I'm very supportive of the > idea. But you certainly still have not convinced me that the name is > a good idea, or that I shouldn't be bringing this up. > > I'm not sure *what* you're trying to convince me of, actually. I'm not saying the git-bug name is a good idea, or that it isn't. I don't care about this particular case when it comes to naming. I'm just pointing out in the more general case that if someone comes up with a badly named git-xyz it doesn't scale to try to point this out to them before git-xyz is widely deployed. So we must either let it go (solution #1), or come up with some API-level solution that makes it a non-issue (my #3).
Re: git-bug: Distributed bug tracker embedded in git
Ævar Arnfjörð Bjarmason wrote: > I'm just pointing out in the more general case that if someone comes up > with a badly named git-xyz it doesn't scale to try to point this out to > them before git-xyz is widely deployed. > > So we must either let it go (solution #1), or come up with some > API-level solution that makes it a non-issue (my #3). How about solution #4: live and let live when it comes down to it, but act like actual people and talk to avoid negative consequences? Some social problems don't have technical solutions. Talking scales, in strange ways. For example, people are able to look at the list archive, people are able to spread thoughts they have found interesting, and so on. Jonathan
Re: [GSoC][PATCH v7 24/26] stash: optimize `get_untracked_files()` and `check_changes()`
On 08/08, Paul-Sebastian Ungureanu wrote: > This commits introduces a optimization by avoiding calling the > same functions again. For example, `git stash push -u` > would call at some points the following functions: > > * `check_changes()` > * `do_create_stash()`, which calls: `check_changes()` and > `get_untracked_files()` > > Note that `check_changes()` also calls `get_untracked_files()`. > So, `check_changes()` is called 2 times and `get_untracked_files()` > 3 times. By checking at the beginning of the function if we already > performed a check, we can avoid making useless calls. While I can see that this may give us some performance gains, what's being described above sounds like we should look into why we are making these duplicate calls in the first place, rather than trying to return early from them. I feel like the duplicate calls are mostly a remnant from the way the shell script was written, but not inherent to the design of 'git stash'. For example 'check_changes' could be called from 'create_stash' directly, so we don't have to call it in 'do_create_stash', in the process removing the duplicate call from the 'git stash push' codepath. That would provide the same improvements, and keep the code cleaner, rather than introducing more special cases for these functions. I haven't looked into the 'get_untracked_files()' call chain yet, but I imagine we can do something similar for that.
Re: git-bug: Distributed bug tracker embedded in git
(cc-ing Elijah Newren for the points about merging) Hi again, To avoid the other thread shadowing more important things: Michael Muré wrote: > Someone suggested in the Hacker News thread [0] to post it here as well. Thanks to Ævar for that. [...] > git-bug use as identifier the hash of the first commit in the chain > of commit of the bug. Clever! I like this approach to the naming problem. [...] > Git doesn't provide a low-level command to rebase a branch onto > another without touching the index. Thanks for pointing this out. There's been some recent work to make Git's merge code (also used for cherry-pick) less reliant on the index and worktree. See https://crbug.com/git/12 for some references. There's also been some heavy refactoring of "git rebase" code to be in C and be able to make use of library functions instead of being a shell script. That's all to say that we're in a pretty good place to consider introducing commands like git cherry-pick --onto= In absence of that kind of thing, you can run commands that need to touch the index (but not the working tree) by setting the GIT_INDEX environment variable to point to a temporary index file. > I'd love to have some feedback from you. Contribution are also very > much welcomed. Can you say more about the federation model it intends to support? For example, do you imagine - having multiple copies of a git bugs repo that automatically fetch updates from each other - having explicit "pull request" synchronization moments when the owners of one copy of a bug tracker push or request a fetch of changes that have been happening on another - individual contributors using an offline copy of the bug tracker and pushing push/pull mostly to synchronize with a single centralized copy - something else? Thanks, Jonathan
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
On Sat, Aug 18, 2018 at 09:16:28AM -0700, Junio C Hamano wrote: > -- >8 -- > Subject: [PATCH] sideband: do not read beyond the end of input > > The caller of maybe_colorize_sideband() gives a counted buffer > , but the callee checked src[] as if it were a NUL terminated > buffer. If src[] had all isspace() bytes in it, we would have made > n negative, and then > > (1) made number of strncasecmp() calls to see if the remaining > bytes in src[] matched keywords, reading beyond the end of the > array (this actually happens even if n does not go negative), > and/or > > (2) called strbuf_add() with negative count, most likely triggering > the "you want to use way too much memory" error due to unsigned > integer overflow. > > Fix both issues by making sure we do not go beyond &src[n]. Thanks. I've been sporadically seeing "fatal: you want to use way too much memory" the last few days while running 'next', and finally managed to catch a reproducible case. This patch definitely fixes it. > In the longer term we may want to accept size_t as parameter for > clarity (even though we know that a sideband message we are painting > typically would fit on a line on a terminal and int is sufficient). > Write it down as a NEEDSWORK comment. This "typically" made me nervous about what happens in the non-typical case, but I think we can say something even stronger: the length comes from a pktline, so the maximum is less than 16 bits. I wondered if we might ever call this on the accumulated string from multiple sidebands, but it doesn't look like it. -Peff
Re: git-bug: Distributed bug tracker embedded in git
Here was my reasoning for the naming choice: - I need something meaningful - I need something that encompass the idea and features of a bug tracker because the narrower ideas and actions will be in sub commands - other projects already used other words, in particular "issue" - it kind of sounds and looks good You say that it's a namespace grab and I understand that, but in the other hand, there is not that much freedom when choosing a name. Sorry if I'm stepping on someone's toe :-| 2018-08-19 0:50 GMT+02:00 Jonathan Nieder : > (cc-ing Elijah Newren for the points about merging) > Hi again, > > To avoid the other thread shadowing more important things: > > Michael Muré wrote: > >> Someone suggested in the Hacker News thread [0] to post it here as well. > > Thanks to Ævar for that. > > [...] >> git-bug use as identifier the hash of the first commit in the chain >> of commit of the bug. > > Clever! I like this approach to the naming problem. > > [...] >> Git doesn't provide a low-level command to rebase a branch onto >> another without touching the index. > > Thanks for pointing this out. There's been some recent work to make > Git's merge code (also used for cherry-pick) less reliant on the index > and worktree. See https://crbug.com/git/12 for some references. > There's also been some heavy refactoring of "git rebase" code to be in > C and be able to make use of library functions instead of being a > shell script. > > That's all to say that we're in a pretty good place to consider > introducing commands like > > git cherry-pick --onto= > > In absence of that kind of thing, you can run commands that need to > touch the index (but not the working tree) by setting the GIT_INDEX > environment variable to point to a temporary index file. > >> I'd love to have some feedback from you. Contribution are also very >> much welcomed. > > Can you say more about the federation model it intends to support? > For example, do you imagine > > - having multiple copies of a git bugs repo that automatically fetch > updates from each other > > - having explicit "pull request" synchronization moments when the > owners of one copy of a bug tracker push or request a fetch of > changes that have been happening on another > > - individual contributors using an offline copy of the bug tracker > and pushing push/pull mostly to synchronize with a single > centralized copy > > - something else? > > Thanks, > Jonathan -- Michael
Re: git-bug: Distributed bug tracker embedded in git
> There's been some recent work to make > Git's merge code (also used for cherry-pick) less reliant on the index > and worktree. Yes please ! In the mean time, someone suggested another trick [0]. > Can you say more about the federation model it intends to support? My goal is to have a workflow similar as what git does, to be versatile and leave to the users the choice of the topology they want to use. Obviously, it will be most of the time a single remote where they collaborate. As a bug tracker is a different workflow than regular code, there will be some tooling to help. For instance, automatic push/pull will help make it easier to use and more "out of the way". In the data model, each commit in the linear chain link to an array of new edit operations. That means that each commit is strictly independent from the others. When you get updates for a bug and you need to merge them, you will simply rebase your new commits on top of the linear chain. This design has several properties: - the merge happen on the user repo where git-bug is installed and can validate new data - the remote used doesn't have to be aware of git-bug - when pushing an update of a ref, the remote will make sure that it's fast forward, that is, no previous edit operations has been removed. It ensure that the history is append only. So for now, collaboration is based on push/pull to whatever remote you want, as git does, with the exception of the Web UI. The goal here is to have it running locally for each user but also to make it a public interface for users that don't have write access to the repo, much like any bug tracker has. In the future, it could be possible to have more fancy features like a federated forge with ActivityPub, but that's way outside of the scope of the project for now. [0]: https://github.com/MichaelMure/git-bug/issues/15 2018-08-19 2:45 GMT+02:00 Michael Muré : > Here was my reasoning for the naming choice: > > - I need something meaningful > - I need something that encompass the idea and features of a bug > tracker because the narrower ideas and actions will be in sub commands > - other projects already used other words, in particular "issue" > - it kind of sounds and looks good > > You say that it's a namespace grab and I understand that, but in the > other hand, there is not that much freedom when choosing a name. Sorry > if I'm stepping on someone's toe :-| > > 2018-08-19 0:50 GMT+02:00 Jonathan Nieder : >> (cc-ing Elijah Newren for the points about merging) >> Hi again, >> >> To avoid the other thread shadowing more important things: >> >> Michael Muré wrote: >> >>> Someone suggested in the Hacker News thread [0] to post it here as well. >> >> Thanks to Ævar for that. >> >> [...] >>> git-bug use as identifier the hash of the first commit in the chain >>> of commit of the bug. >> >> Clever! I like this approach to the naming problem. >> >> [...] >>> Git doesn't provide a low-level command to rebase a branch onto >>> another without touching the index. >> >> Thanks for pointing this out. There's been some recent work to make >> Git's merge code (also used for cherry-pick) less reliant on the index >> and worktree. See https://crbug.com/git/12 for some references. >> There's also been some heavy refactoring of "git rebase" code to be in >> C and be able to make use of library functions instead of being a >> shell script. >> >> That's all to say that we're in a pretty good place to consider >> introducing commands like >> >> git cherry-pick --onto= >> >> In absence of that kind of thing, you can run commands that need to >> touch the index (but not the working tree) by setting the GIT_INDEX >> environment variable to point to a temporary index file. >> >>> I'd love to have some feedback from you. Contribution are also very >>> much welcomed. >> >> Can you say more about the federation model it intends to support? >> For example, do you imagine >> >> - having multiple copies of a git bugs repo that automatically fetch >> updates from each other >> >> - having explicit "pull request" synchronization moments when the >> owners of one copy of a bug tracker push or request a fetch of >> changes that have been happening on another >> >> - individual contributors using an offline copy of the bug tracker >> and pushing push/pull mostly to synchronize with a single >> centralized copy >> >> - something else? >> >> Thanks, >> Jonathan > > > > -- > Michael -- Michael
Re: git-bug: Distributed bug tracker embedded in git
(cc-ing Scott) Hi Junio, Junio C Hamano wrote: > Michael Muré writes: >> I released today git-bug, a distributed bug tracker that embeds in >> git. It use git's internal storage to store bugs information in a way >> that can be merged without conflict. You can push/pull to the normal >> git remote you are already using to interact with other people. Normal >> code and bugs are completely separated and no files are added in the >> regular branches. > > This reminds me of a demo Scott Chacon showed us ages ago, the name > of which escapes me. I guess great minds think alike, or something? I believe you're thinking of TicGit[1]. Some other related work is listed at [2]. Most of these projects have gone quiet: - ditz[3] - git-issues[4] - cil[5] - Bugs Everywhere[6] - milli by Steve Kemp, which I haven't found a copy of - simple defects[7] - kipling[8] http://www.cs.unb.ca/~bremner/blog/posts/git-issue-trackers/ gives a nice overview, though it's rather old. This area seems to have gone mostly quiet since 2014, so it's nice to see new work. Thanks, Jonathan [1] https://github.com/jeffWelling/ticgit [2] https://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools#Bug.2Fissue_trackers.2C_etc. [3] https://github.com/jashmenn/ditz [4] https://github.com/duplys/git-issues [5] https://github.com/chilts/cil [6] http://bugseverywhere.org/ [7] https://syncwith.us/sd/, https://gitorious.org/prophet/sd [8] https://gitorious.org/kipling/mainline
Re: [PATCH v3] checkout: optimize "git checkout -b "
On Fri, Aug 17, 2018 at 5:41 AM Ben Peart wrote: > On 8/16/2018 2:37 PM, Duy Nguyen wrote: > > On Thu, Aug 16, 2018 at 8:27 PM Ben Peart wrote: > >> > >> From: Ben Peart > >> > >> Skip merging the commit, updating the index and working directory if and > >> only if we are creating a new branch via "git checkout -b ." > >> Any other checkout options will still go through the former code path. > >> > >> If sparse_checkout is on, require the user to manually opt in to this > >> optimzed behavior by setting the config setting checkout.optimizeNewBranch > >> to true as we will no longer update the skip-worktree bit in the index, nor > >> add/remove files in the working directory to reflect the current sparse > >> checkout settings. > >> > >> For comparison, running "git checkout -b " on a large repo > >> takes: > >> > >> 14.6 seconds - without this patch > >> 0.3 seconds - with this patch > > > > I still don't think we should do this. If you want lightning fast > > branch creation, just use 'git branch'. From the timing breakdown you > > shown in the other thread it looks like sparse checkout still takes > > seconds, which could be optimized (or even excluded, I mentioned this > > too). And split index (or something similar if you can't use it) would > > give you saving across the board. There is still one idea Elijah gave > > me that should further lower traverse_trees() cost. > > > > We have investigated some of these already - split index ended up > slowing things down more than it sped them up do to the higher compute > costs. Sparse checkout we've already optimized significantly - limiting > the patterns we accept so that we can do the lookup via a hashmap > instead of the robust pattern matching. We will continue to look for > other optimizations and appreciate any and all ideas! > > In the end, this optimization makes a huge performance improvement by > avoiding doing a lot of work that isn't necessary. Taking a command > from 14+ seconds to sub-second is just too much of a win for us to ignore. > > > But anyway, it's not my call. I'll stop here. It's even less of my call, but since things seem to be stuck in what-should-we-do state (as per Junio's comments on this patch in the last two "What's cooking" emails), and since Ben and Duy obviously have opposite opinions on Ben's patch, let's see if I might be able to help at all. Here's my summary and my findings: == The pain == - For repositories with a really large number of entries (500K as Ben says), some operations are much slower than it feels like they should be. - This does not seem to be GFVS-specific in any way, I can duplicate slowness with a simple git-bomb[1]-like repo that has a sparse checkout pattern ignoring the "bomb" side. (It has 1M+1 entries in the index, and .git/info/sparse-checkout ignores the 1M so the working copy only has 1 entry). The timings on my repo for "git checkout -b $NEWBRANCH" are almost exactly double what Ben reports he gets on their repo. [1] https://kate.io/blog/making-your-own-exploding-git-repos/ == Short term solutions == - Alternative git commands exist today to do a fast checkout of a new branch in a huge repo. I also get sub-second timings in my even-bigger repo with this: git branch $NEWBRANCH && git symbolic-ref HEAD $NEWBRANCH But I do understand that wrapping this into a script or executable (git-fast-new-branch?) and asking users to use it is a usability problem and an uphill battle. (Sidenote: this isn't quite the same operation; it's missing a reflog update. The -m option to symbolic-ref doesn't seem to help; I guess the fact that HEAD's sha1sum is not changing is viewed as not an update? However, that could also be scripted around.) - Ben's patch successfully drops the time for "git checkout -b $NEWBRANCH" from 26+ seconds (in my cooked-up testcase) to sub-second (in fact, under .1 seconds for me). That's a _huge_ win. == unpack_trees optimization notes == - Ben's patch is extremely focused. It only affects "git checkout -b $NEWBRANCH". If someone runs "git branch $NEWBRANCH && git checkout $NEWBRANCH", they get the old 26+ second timing. They also suddenly get the really long timing if they add any other flags or checkout a commit that differs in only a single entry in the entire tree. It would be nice if we did general optimization for all issues rather than just special casing such narrow cases. - However, optimizing unpack_trees is hard. It's really easy to get lost trying to look at the code. Time has been spent trying to optimizing it. Ben really likes the speedup factors of 2-3 that Duy has produced. But he's pessimistic we'll find enough to bridge the gap for this case. And he's worried about breaking unrelated stuff due to the complexity of unpack_trees. - Duy is pretty sure we can optimize unpack_trees in at least one more way. I've tried looking through the code and think there are others, but then again I'm known to get lost and confused in unpack_trees. ==
Re: git-bug: Distributed bug tracker embedded in git
On Sat, Aug 18, 2018 at 3:50 PM Jonathan Nieder wrote: > (cc-ing Elijah Newren for the points about merging) > [...] > > Git doesn't provide a low-level command to rebase a branch onto > > another without touching the index. > > Thanks for pointing this out. There's been some recent work to make > Git's merge code (also used for cherry-pick) less reliant on the index > and worktree. See https://crbug.com/git/12 for some references. > There's also been some heavy refactoring of "git rebase" code to be in > C and be able to make use of library functions instead of being a > shell script. > > That's all to say that we're in a pretty good place to consider > introducing commands like > > git cherry-pick --onto= Yes, indeed, after the merge refactoring/rewriting stuff is complete, this is one thing already on my list that I wanted to do with it. Another thing I'd like to investigate with it is how much "in-memory" merges could speed up interactive rebases, as suggested by Dscho[1]. Once we do "in-memory" merges for interactive rebases for performance reasons, we're pretty close to having a rebase-without-touching-index-or-worktree that we can make accessible to other scripts like git-bug. However, we have to have a pretty good answer about what to do when we hit conflicts. [1] https://public-inbox.org/git/nycvar.qro.7.76.6.180616000...@tvgsbejvaqbjf.bet/
Re: Git Project Leadership Committee
On Thu, Aug 16, 2018 at 3:43 PM Jeff King wrote: > - we should avoid anyone who is affiliated with a company that already >has a member on the committee. So nobody from Google and nobody from >GitHub. I would extend that to Microsoft, too, given a certain >impending acquisition. I'd expect anybody who is affiliated with a >company to recuse themselves from decisions that directly affect that >company (which is what we've done so far). That might make it hard for some of us to nominate others, since as far as I can tell (e.g. looking at shortlog -sne output) few git contributors use work email addresses to do so and we don't necessarily know employers of other contributors. > So here are the nominations I came up with. If you'd like to nominate > somebody else (or yourself!), please do. If you have opinions, let me > know (public or private, as you prefer). > > - Christian Couder > - Ævar Arnfjörð Bjarmason I think either of them would be excellent choices.
Re: git-bug: Distributed bug tracker embedded in git
[+cc Stefan Monnier] Jonathan Nieder writes: > (cc-ing Scott) [...] > I believe you're thinking of TicGit[1]. > > Some other related work is listed at [2]. Most of these projects have > gone quiet: > > - ditz[3] > - git-issues[4] > - cil[5] > - Bugs Everywhere[6] > - milli by Steve Kemp, which I haven't found a copy of > - simple defects[7] > - kipling[8] To add to that list: There's also BuGit [1,2], though it too seems to have gone quiet. [1]: https://gitlab.com/monnier/bugit [2]: https://public-inbox.org/git/jwva8psr6vr.fsf-monnier+gmane.comp.version-control@gnu.org/ -- Kyle
Re: Git Project Leadership Committee
On Sat, Aug 18, 2018 at 07:32:38PM -0700, Elijah Newren wrote: > On Thu, Aug 16, 2018 at 3:43 PM Jeff King wrote: > > - we should avoid anyone who is affiliated with a company that already > >has a member on the committee. So nobody from Google and nobody from > >GitHub. I would extend that to Microsoft, too, given a certain > >impending acquisition. I'd expect anybody who is affiliated with a > >company to recuse themselves from decisions that directly affect that > >company (which is what we've done so far). > > That might make it hard for some of us to nominate others, since as > far as I can tell (e.g. looking at shortlog -sne output) few git > contributors use work email addresses to do so and we don't > necessarily know employers of other contributors. I would say to err on the side of nominating, and the candidate can provide the information. -Peff
Re: git-bug: Distributed bug tracker embedded in git
(+ git-dit authors) Kyle Meyer wrote: > Jonathan Nieder writes: >> I believe you're thinking of TicGit[1]. >> >> Some other related work is listed at [2]. Most of these projects have >> gone quiet: >> >> - ditz[3] >> - git-issues[4] >> - cil[5] >> - Bugs Everywhere[6] >> - milli by Steve Kemp, which I haven't found a copy of >> - simple defects[7] >> - kipling[8] > > To add to that list: There's also BuGit [1,2], though it too seems to > have gone quiet. I just found a good list on a non Git-specific wiki[3] that is more helpful (more up to date and has more discussion) than the list on the Git wiki. It might be a good place to coordinate and compare notes. git-dit[4] in particular seems to have very similar goals and a similar data model. In my ideal world there may be a path forward that involves working more closely together. Thanks, Jonathan > [1]: https://gitlab.com/monnier/bugit > [2]: > https://public-inbox.org/git/jwva8psr6vr.fsf-monnier+gmane.comp.version-control@gnu.org/ [3] https://dist-bugs.branchable.com/software/ [4] https://github.com/neithernut/git-dit
Re: [PATCH v5 0/7] Speed up unpack_trees()
On Sun, Aug 19, 2018 at 12:01 AM Elijah Newren wrote: > > On Sat, Aug 18, 2018 at 7:41 AM Nguyễn Thái Ngọc Duy > wrote: > > > > v5 fixes some minor comments from round 4 and a big mistake in 5/5. > > Junio's scary feeling turns out true. There is a missing invalidation > > in keep_entry() which is not added in 6/7. 7/7 makes sure that similar > > I'm having trouble parsing this. Did you mean "...which is now > added..."? Oops. Yes. > Also, if 6/7 represents a fix to the "big mistake in 5/5", > why is 6/7 separate from 5/7 instead of squashed in? I felt that was cramming up too much in the commit message. But if it's the right thing to do, I'll reroll and combine 5/7 and 6/7 . -- Duy