[PATCH v2] gpg-interface.c: detect and reject multiple signatures on commits
GnuPG supports creating signatures consisting of multiple signature packets. If such a signature is verified, it outputs all the status messages for each signature separately. However, git currently does not account for such a scenario and gets terribly confused over getting multiple *SIG statuses.

For example, if a malicious party alters a signed commit and appends a new untrusted signature, git is going to ignore the original bad signature and report an untrusted commit instead. However, %GK and %GS format strings may still expand to the data corresponding to the original signature, potentially tricking scripts into trusting the malicious commit.

Given that the use of multiple signatures is quite rare, that git does not support creating them without jumping through a few hoops, and that supporting them properly would require extensive API improvements, it seems reasonable to just reject them for the moment.

Signed-off-by: Michał Górny
---
 gpg-interface.c          | 41
 t/t7510-signed-commit.sh | 26 +
 2 files changed, 59 insertions(+), 8 deletions(-)

Changes in v2:
* used generic 'flags' instead of boolean field,
* added test case for git-verify-commit & git-show.

diff --git a/gpg-interface.c b/gpg-interface.c
index 09ddfbc26..35c25106a 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -21,24 +21,29 @@ void signature_check_clear(struct signature_check *sigc)
 	FREE_AND_NULL(sigc->key);
 }
 
+/* An exclusive status -- only one of them can appear in output */
+#define GPG_STATUS_EXCLUSIVE (1<<0)
+
 static struct {
 	char result;
 	const char *check;
+	unsigned int flags;
 } sigcheck_gpg_status[] = {
-	{ 'G', "\n[GNUPG:] GOODSIG " },
-	{ 'B', "\n[GNUPG:] BADSIG " },
-	{ 'U', "\n[GNUPG:] TRUST_NEVER" },
-	{ 'U', "\n[GNUPG:] TRUST_UNDEFINED" },
-	{ 'E', "\n[GNUPG:] ERRSIG "},
-	{ 'X', "\n[GNUPG:] EXPSIG "},
-	{ 'Y', "\n[GNUPG:] EXPKEYSIG "},
-	{ 'R', "\n[GNUPG:] REVKEYSIG "},
+	{ 'G', "\n[GNUPG:] GOODSIG ", GPG_STATUS_EXCLUSIVE },
+	{ 'B', "\n[GNUPG:] BADSIG ", GPG_STATUS_EXCLUSIVE },
+	{ 'U', "\n[GNUPG:] TRUST_NEVER", 0 },
+	{ 'U', "\n[GNUPG:] TRUST_UNDEFINED", 0 },
+	{ 'E', "\n[GNUPG:] ERRSIG ", GPG_STATUS_EXCLUSIVE },
+	{ 'X', "\n[GNUPG:] EXPSIG ", GPG_STATUS_EXCLUSIVE },
+	{ 'Y', "\n[GNUPG:] EXPKEYSIG ", GPG_STATUS_EXCLUSIVE },
+	{ 'R', "\n[GNUPG:] REVKEYSIG ", GPG_STATUS_EXCLUSIVE },
 };
 
 static void parse_gpg_output(struct signature_check *sigc)
 {
 	const char *buf = sigc->gpg_status;
 	int i;
+	int had_exclusive_status = 0;
 
 	/* Iterate over all search strings */
 	for (i = 0; i < ARRAY_SIZE(sigcheck_gpg_status); i++) {
@@ -50,6 +55,10 @@ static void parse_gpg_output(struct signature_check *sigc)
 				continue;
 			found += strlen(sigcheck_gpg_status[i].check);
 		}
+
+		if (sigcheck_gpg_status[i].flags & GPG_STATUS_EXCLUSIVE)
+			had_exclusive_status++;
+
 		sigc->result = sigcheck_gpg_status[i].result;
 		/* The trust messages are not followed by key/signer information */
 		if (sigc->result != 'U') {
@@ -62,6 +71,22 @@ static void parse_gpg_output(struct signature_check *sigc)
 			}
 		}
 	}
+
+	/*
+	 * GOODSIG, BADSIG etc. can occur only once for each signature.
+	 * Therefore, if we had more than one then we're dealing with multiple
+	 * signatures. We don't support them currently, and they're rather
+	 * hard to create, so something is likely fishy and we should reject
+	 * them altogether.
+	 */
+	if (had_exclusive_status > 1) {
+		sigc->result = 'E';
+		/* Clear partial data to avoid confusion */
+		if (sigc->signer)
+			FREE_AND_NULL(sigc->signer);
+		if (sigc->key)
+			FREE_AND_NULL(sigc->key);
+	}
 }
 
 int check_signature(const char *payload, size_t plen, const char *signature,
diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
index 6e2015ed9..51fb92a72 100755
--- a/t/t7510-signed-commit.sh
+++ b/t/t7510-signed-commit.sh
@@ -227,4 +227,30 @@ test_expect_success GPG 'log.showsignature behaves like --show-signature' '
 	grep "gpg: Good signature" actual
 '
 
+test_expect_success GPG 'detect fudged commit with double signature' '
+	sed -e "/gpgsig/,/END PGP/d" forged1 >double-base &&
+	sed -n -e "/gpgsig/,/END PGP/p" forged1 | \
+		sed -e "s/^gpgsig//;s/^ //" | gpg --dearmor >double-sig1.sig &&
+	gpg -o double-sig2.sig -u 29472784 --detach-sign double-base &&
+	cat double-sig1.sig double-sig2.sig | gpg --enarmor >double-combined.asc &&
+	sed -e "s/^\(-.*\)ARMORED FILE/\1SIGNATURE/;1s/^/gpg
[PATCH] gpg-interface.c: Fix potentially freeing NULL values
Fix signature_check_clear() to free only values that are non-NULL. This especially applies to 'key' and 'signer' members that can be NULL during normal operations, depending on exact GnuPG output. While at it, also allow other members to be NULL to make the function easier to use, even if there is no real need to account for that right now.

Signed-off-by: Michał Górny
---
 gpg-interface.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/gpg-interface.c b/gpg-interface.c
index 35c25106a..9aedaf464 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -15,9 +15,14 @@ static const char *gpg_program = "gpg";
 
 void signature_check_clear(struct signature_check *sigc)
 {
-	FREE_AND_NULL(sigc->payload);
-	FREE_AND_NULL(sigc->gpg_output);
-	FREE_AND_NULL(sigc->gpg_status);
-	FREE_AND_NULL(sigc->signer);
-	FREE_AND_NULL(sigc->key);
+	if (sigc->payload)
+		FREE_AND_NULL(sigc->payload);
+	if (sigc->gpg_output)
+		FREE_AND_NULL(sigc->gpg_output);
+	if (sigc->gpg_status)
+		FREE_AND_NULL(sigc->gpg_status);
+	if (sigc->signer)
+		FREE_AND_NULL(sigc->signer);
+	if (sigc->key)
+		FREE_AND_NULL(sigc->key);
 }
-- 
2.18.0
Re: [PATCH] gpg-interface.c: Fix potentially freeing NULL values
On Fri, Aug 17, 2018 at 5:17 AM Michał Górny wrote: > Fix signature_check_clear() to free only values that are non-NULL. This > especially applies to 'key' and 'signer' members that can be NULL during > normal operations, depending on exact GnuPG output. While at it, also > allow other members to be NULL to make the function easier to use, > even if there is no real need to account for that right now. free(NULL) is valid behavior[1] and much of the Git codebase relies upon it. Did you run into a case where it misbehaved? [1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/free.html > Signed-off-by: Michał Górny > --- > diff --git a/gpg-interface.c b/gpg-interface.c > index 35c25106a..9aedaf464 100644 > --- a/gpg-interface.c > +++ b/gpg-interface.c > @@ -15,9 +15,14 @@ static const char *gpg_program = "gpg"; > void signature_check_clear(struct signature_check *sigc) > { > - FREE_AND_NULL(sigc->payload); > - FREE_AND_NULL(sigc->gpg_output); > - FREE_AND_NULL(sigc->gpg_status); > - FREE_AND_NULL(sigc->signer); > - FREE_AND_NULL(sigc->key); > + if (sigc->payload) > + FREE_AND_NULL(sigc->payload); > + if (sigc->gpg_output) > + FREE_AND_NULL(sigc->gpg_output); > + if (sigc->gpg_status) > + FREE_AND_NULL(sigc->gpg_status); > + if (sigc->signer) > + FREE_AND_NULL(sigc->signer); > + if (sigc->key) > + FREE_AND_NULL(sigc->key); > }
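[Editorial aside: for readers following along, the FREE_AND_NULL() idiom discussed above is a thin wrapper around free() plus a NULL assignment, and free(NULL) is defined to be a no-op. Below is a minimal standalone sketch assuming Git's git-compat-util.h definition of the macro; the surrounding struct and functions are illustrative, not Git code.]

```c
#include <stdlib.h>

/* Sketch of Git's FREE_AND_NULL() macro as defined in git-compat-util.h. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Illustrative stand-in for the signature_check members discussed above. */
struct example_check {
	char *signer;
	char *key;
};

static void example_clear(struct example_check *c)
{
	/* No NULL checks needed: free(NULL) is a no-op per the C standard. */
	FREE_AND_NULL(c->signer);
	FREE_AND_NULL(c->key);
}

int main(void)
{
	struct example_check c = { NULL, NULL };

	example_clear(&c);	/* safe even though both members are NULL */
	return 0;
}
```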
Re: [PATCH] gpg-interface.c: Fix potentially freeing NULL values
On Fri, 2018-08-17 at 05:28 -0400, Eric Sunshine wrote: > On Fri, Aug 17, 2018 at 5:17 AM Michał Górny wrote: > > Fix signature_check_clear() to free only values that are non-NULL. This > > especially applies to 'key' and 'signer' members that can be NULL during > > normal operations, depending on exact GnuPG output. While at it, also > > allow other members to be NULL to make the function easier to use, > > even if there is no real need to account for that right now. > > free(NULL) is valid behavior[1] and much of the Git codebase relies upon it. > > Did you run into a case where it misbehaved? Nope. I was actually wondering if it's expected, so I did a quick grep to check whether git is checking pointers for non-NULL before free()ing, and found at least one: blame.c-static void drop_origin_blob(struct blame_origin *o) blame.c-{ blame.c-if (o->file.ptr) { blame.c:FREE_AND_NULL(o->file.ptr); blame.c-} blame.c-} So I wrongly presumed it might be desirable. If it's not, that's fine by me. > > [1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/free.html > > > Signed-off-by: Michał Górny > > --- > > diff --git a/gpg-interface.c b/gpg-interface.c > > index 35c25106a..9aedaf464 100644 > > --- a/gpg-interface.c > > +++ b/gpg-interface.c > > @@ -15,9 +15,14 @@ static const char *gpg_program = "gpg"; > > void signature_check_clear(struct signature_check *sigc) > > { > > - FREE_AND_NULL(sigc->payload); > > - FREE_AND_NULL(sigc->gpg_output); > > - FREE_AND_NULL(sigc->gpg_status); > > - FREE_AND_NULL(sigc->signer); > > - FREE_AND_NULL(sigc->key); > > + if (sigc->payload) > > + FREE_AND_NULL(sigc->payload); > > + if (sigc->gpg_output) > > + FREE_AND_NULL(sigc->gpg_output); > > + if (sigc->gpg_status) > > + FREE_AND_NULL(sigc->gpg_status); > > + if (sigc->signer) > > + FREE_AND_NULL(sigc->signer); > > + if (sigc->key) > > + FREE_AND_NULL(sigc->key); > > } -- Best regards, Michał Górny signature.asc Description: This is a digitally signed message part
Re: [PATCH v3] checkout: optimize "git checkout -b <new_branch>"
On 8/16/2018 2:37 PM, Duy Nguyen wrote: On Thu, Aug 16, 2018 at 8:27 PM Ben Peart wrote: From: Ben Peart Skip merging the commit, updating the index and working directory if and only if we are creating a new branch via "git checkout -b <new_branch>." Any other checkout options will still go through the former code path. If sparse_checkout is on, require the user to manually opt in to this optimized behavior by setting the config setting checkout.optimizeNewBranch to true, as we will no longer update the skip-worktree bit in the index, nor add/remove files in the working directory to reflect the current sparse checkout settings. For comparison, running "git checkout -b <new_branch>" on a large repo takes: 14.6 seconds - without this patch 0.3 seconds - with this patch I still don't think we should do this. If you want lightning fast branch creation, just use 'git branch'. From the timing breakdown you showed in the other thread it looks like sparse checkout still takes seconds, which could be optimized (or even excluded, I mentioned this too). And split index (or something similar if you can't use it) would give you savings across the board. There is still one idea Elijah gave me that should further lower traverse_trees() cost. We have investigated some of these already - split index ended up slowing things down more than it sped them up due to the higher compute costs. Sparse checkout we've already optimized significantly - limiting the patterns we accept so that we can do the lookup via a hashmap instead of the robust pattern matching. We will continue to look for other optimizations and appreciate any and all ideas! In the end, this optimization makes a huge performance improvement by avoiding doing a lot of work that isn't necessary. Taking a command from 14+ seconds to sub-second is just too much of a win for us to ignore. But anyway, it's not my call. I'll stop here.
[PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
Change the few conditional uses of FREE_AND_NULL(x) to be unconditional. As noted in the standard[1] free(NULL) is perfectly valid, so we might as well leave this check up to the C library. 1. http://pubs.opengroup.org/onlinepubs/9699919799/functions/free.html Signed-off-by: Ævar Arnfjörð Bjarmason --- Let's do the opposite of this instead. blame.c | 4 +--- branch.c| 4 +--- http.c | 4 +--- tree-diff.c | 4 +--- 4 files changed, 4 insertions(+), 12 deletions(-) diff --git a/blame.c b/blame.c index 58a7036847..b22a95de7b 100644 --- a/blame.c +++ b/blame.c @@ -334,9 +334,7 @@ static void fill_origin_blob(struct diff_options *opt, static void drop_origin_blob(struct blame_origin *o) { - if (o->file.ptr) { - FREE_AND_NULL(o->file.ptr); - } + FREE_AND_NULL(o->file.ptr); } /* diff --git a/branch.c b/branch.c index ecd710d730..776f55fc66 100644 --- a/branch.c +++ b/branch.c @@ -25,9 +25,7 @@ static int find_tracked_branch(struct remote *remote, void *priv) tracking->remote = remote->name; } else { free(tracking->spec.src); - if (tracking->src) { - FREE_AND_NULL(tracking->src); - } + FREE_AND_NULL(tracking->src); } tracking->spec.src = NULL; } diff --git a/http.c b/http.c index b4bfbceaeb..4162860ee3 100644 --- a/http.c +++ b/http.c @@ -2418,9 +2418,7 @@ void release_http_object_request(struct http_object_request *freq) close(freq->localfile); freq->localfile = -1; } - if (freq->url != NULL) { - FREE_AND_NULL(freq->url); - } + FREE_AND_NULL(freq->url); if (freq->slot != NULL) { freq->slot->callback_func = NULL; freq->slot->callback_data = NULL; diff --git a/tree-diff.c b/tree-diff.c index fe2e466ac1..553bc0e63a 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -557,9 +557,7 @@ struct combine_diff_path *diff_tree_paths( * free pre-allocated last element, if any * (see path_appendnew() for details about why) */ - if (p->next) { - FREE_AND_NULL(p->next); - } + FREE_AND_NULL(p->next); return p; } -- 2.18.0.865.gffc8e1a3cd6
[RFC] git send-email hashbang
I recently contributed patches on this mailing list for the first time and used `git format-patch` and `git send-email` for the first time. I had a hard time making `git send-email` work on my Mac, because the OSX-bundled perl was missing the Net::SMTP::SSL module. So I did `cpan -f Net::SMTP::SSL` (I'm using gmail with smtps/ssl), which asked me some questions (to set up cpan; I'm not really using perl usually), and installed the module. Still, `git send-email` wasn't able to find the module. Actually, during the setup of cpan, I was asked this:

--
Warning: You do not have write permission for Perl library directories. To install modules, you need to configure a local Perl library directory or escalate your privileges. CPAN can help you by bootstrapping the local::lib module or by configuring itself to use 'sudo' (if available). You may also resolve this problem manually if you need to customize your setup. What approach do you want? (Choose 'local::lib', 'sudo' or 'manual')
--

I naturally chose the default ('local::lib'), but it still didn't work. So I chose not to use the system-bundled perl and installed my own perl with homebrew, installed the Net::SMTP::SSL module ... but it still didn't work. I looked at the send-email script, changed the hashbang to use /usr/local/bin/perl instead of /usr/bin/perl, and it worked! Then I wondered what had happened, and I discovered that using the bundled cpan's "sudo" approach works, but I'm not very satisfied that I need to be root to make this script work. I also found several stackoverflow questions, gists and other discussions with people having this exact problem (on osx), with various solutions (mostly not working, using `sudo cpan` or whatever).

It seems strange to me that the script doesn't use "the perl I use in my environment"; that is, I would have thought the `git-send-email.pl` script had `#!/usr/bin/env perl` as its hashbang. Then I read that some environments (namely busybox) don't bundle `/usr/bin/env`, so I understood this might not be portable. I think there is a solution involving a combination of /bin/sh as the hashbang and then executing perl, probably with the `-x` flag (see `perldoc perlrun`). Is it worth proposing a solution for this problem? Thanks!
Re: [GSoC][PATCH v6 11/20] rebase -i: rewrite complete_action() in C
Hi Alban The interdiff from v5 to v6 looks good, I think the changes you have made the the other patches in this series are fine, I've just got a couple of small comments below about this one. Best Wishes Phillip On 10/08/2018 17:51, Alban Gruin wrote: > > This rewrites complete_action() from shell to C. > > A new mode is added to rebase--helper (`--complete-action`), as well as > a new flag (`--autosquash`). > > Finally, complete_action() is stripped from git-rebase--interactive.sh. > > The original complete_action() would return the code 2 when the todo > list contained no actions. This was a special case for rebase -i and > -p; git-rebase.sh would then apply the autostash, delete the state > directory, and die with the message "Nothing to do". This cleanup is > rewritten in C instead of returning 2. As rebase -i no longer returns > 2, the comment describing this behaviour in git-rebase.sh is updated to > reflect this change. > > The message "Nothing to do" is now printed with error(), and so becomes > "error: nothing to do". Some tests in t3404 check this value, so they > are updated to fit this change. > > The first check might seem useless as we write "noop" to the todo list > if it is empty. Actually, the todo list might contain commented > commands (ie. empty commits). In this case, complete_action() won’t > write "noop", and will abort without starting the editor. > > Signed-off-by: Alban Gruin > --- > builtin/rebase--helper.c | 12 +++- > git-rebase--interactive.sh| 53 ++ > git-rebase.sh | 2 +- > sequencer.c | 102 ++ > sequencer.h | 4 ++ > t/t3404-rebase-interactive.sh | 2 +- > 6 files changed, 122 insertions(+), 53 deletions(-) > > diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c > index bed3dd2b95..01b958 100644 > --- a/builtin/rebase--helper.c > +++ b/builtin/rebase--helper.c > @@ -13,13 +13,13 @@ static const char * const builtin_rebase_helper_usage[] = > { > int cmd_rebase__helper(int argc, const char **argv, const char *prefix) > { > struct replay_opts opts = REPLAY_OPTS_INIT; > - unsigned flags = 0, keep_empty = 0, rebase_merges = 0; > + unsigned flags = 0, keep_empty = 0, rebase_merges = 0, autosquash = 0; > int abbreviate_commands = 0, rebase_cousins = -1; > enum { > CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS, > CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS, REARRANGE_SQUASH, > ADD_EXEC, APPEND_TODO_HELP, EDIT_TODO, PREPARE_BRANCH, > - CHECKOUT_ONTO > + CHECKOUT_ONTO, COMPLETE_ACTION > } command = 0; > struct option options[] = { > OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")), > @@ -29,6 +29,8 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > OPT_BOOL(0, "rebase-merges", &rebase_merges, N_("rebase merge > commits")), > OPT_BOOL(0, "rebase-cousins", &rebase_cousins, >N_("keep original branch points of cousins")), > + OPT_BOOL(0, "autosquash", &autosquash, > + N_("move commits that begin with squash!/fixup!")), > OPT__VERBOSE(&opts.verbose, N_("be verbose")), > OPT_CMDMODE(0, "continue", &command, N_("continue rebase"), > CONTINUE), > @@ -57,6 +59,8 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > N_("prepare the branch to be rebased"), > PREPARE_BRANCH), > OPT_CMDMODE(0, "checkout-onto", &command, > N_("checkout a commit"), CHECKOUT_ONTO), > + OPT_CMDMODE(0, "complete-action", &command, > + N_("complete the action"), COMPLETE_ACTION), > OPT_END() > }; > > @@ -110,5 +114,9 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > return 
!!prepare_branch_to_be_rebased(&opts, argv[1]); > if (command == CHECKOUT_ONTO && argc == 4) > return !!checkout_onto(&opts, argv[1], argv[2], argv[3]); > + if (command == COMPLETE_ACTION && argc == 6) > + return !!complete_action(&opts, flags, argv[1], argv[2], > argv[3], > + argv[4], argv[5], autosquash); > + > usage_with_options(builtin_rebase_helper_usage, options); > } > diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh > index b68f108f28..59dc4888a6 100644 > --- a/git-rebase--interactive.sh > +++ b/git-rebase--interactive.sh > @@ -127,54 +127,6 @@ init_revisions_and_shortrevisions () { > fi > } > > -complete_action() { > - test -s "$todo" || echo noop >> "$todo" > - test -z "$autosquash" || git rebase--helper --rearrange-squash || exit > - test -n "$cmd" && git rebase--he
Re: [RFC] git send-email hashbang
On Fri, Aug 17 2018, Samuel Maftoul wrote: > I recently contributed for the first time patches on this maillist and > used for the first time `git format-patch` and `git send-email`. > I had hard times making `git send-email` work on my mac, because the > OSX bundled perl was missing the Net::SMTP::SSL module. > So I did `cpan -f Net::SMTP::SSL` (I'm using gmail with smtps/ssl) > which asked me some questions (to setup cpan, I'm not really using > perl usually), and installed the module. > Still `git send-email` wasn't able to find the module. > Actually, during the setup of cpan, I have been asked this: > > -- > Warning: You do not have write permission for Perl library directories. > > To install modules, you need to configure a local Perl library directory or > escalate your privileges. CPAN can help you by bootstrapping the local::lib > module or by configuring itself to use 'sudo' (if available). You may also > resolve this problem manually if you need to customize your setup. > > What approach do you want? (Choose 'local::lib', 'sudo' or 'manual') > -- > > I have naturally choosed the default ('local::lib'), but it still didn't > worked. > > So I choose to not use the system bundled perl and installed my own > perl with homebrew, installed the Net::SMTP::SSL module ... but still > , it didn't worked. > I looked at the send-email script, changed the hashbang to use > /usr/local/bin/perl instead of /usr/bin/perl and it worked ! > > Then I wondered what happened, and I discovered that using the bundled > cpan's "sudo" approach works, but I'm not very satisfied that I need > to be root to make this script work. > I also found several stackoverflow questions, gists and other > discussiond with people having this exact problem (on osx) with some > different solution (mostly not working, using `sudo cpan` or > whatever). Yeah this experience sucks. > It seems strange to me that the script doesn't uses "the perl I use in > my environment", that is, I would have thought the `git-send-email.pl` > script had `#!/usr/bin/env perl` as hashbang. > Then, I read that some environment (namely busybox) don't bundle > `/usr/bin/env`, so I understood this might not be portable. > I think there is a solution involving using a combination of /bin/sh > as hashbang and there executing perl with probably the `-x` flag (see > `perldoc perlrun`). > Is it worth proposing a solution for this problem ? The reason not to use the "perl" in the env is because you just get the other side of this problem. I.e. I install "git" on some linux distro, but I also do perl development so I install a perlbrew version of it into my ~/ which doesn't know how to do ssl or whatever else. Now send-email, "git add -p" and the like will break because the perl I have doesn't have the required modules etc. This is why we pick a perl at compile-time, just as we link to libraries etc. at compile-time. But perhaps this trade-off isn't the right one to make on OSX.
Re: [GSoC][PATCH v6 15/20] rebase -i: rewrite write_basic_state() in C
On 10/08/2018 17:51, Alban Gruin wrote: > This rewrites write_basic_state() from git-rebase.sh in C. This is the > first step in the conversion of init_basic_state(), hence the mode in > rebase--helper.c is called INIT_BASIC_STATE. init_basic_state() will be > converted in the next commit. > > The part of read_strategy_opts() that parses the stategy options is > moved to a new function to allow its use in rebase--helper.c. > > Finally, the call to write_basic_state() is removed from > git-rebase--interactive.sh, replaced by a call to `--init-basic-state`. > > Signed-off-by: Alban Gruin > --- > No changes since v5. > > builtin/rebase--helper.c | 28 +- > git-rebase--interactive.sh | 7 +++- > sequencer.c| 77 -- > sequencer.h| 4 ++ > 4 files changed, 102 insertions(+), 14 deletions(-) > > diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c > index 0716bbfd78..63c5086e42 100644 > --- a/builtin/rebase--helper.c > +++ b/builtin/rebase--helper.c > @@ -5,6 +5,8 @@ > #include "sequencer.h" > #include "rebase-interactive.h" > #include "argv-array.h" > +#include "rerere.h" > +#include "alias.h" > > static GIT_PATH_FUNC(path_squash_onto, "rebase-merge/squash-onto") > > @@ -53,11 +55,12 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > unsigned flags = 0, keep_empty = 0, rebase_merges = 0, autosquash = 0; > int abbreviate_commands = 0, rebase_cousins = -1, ret; > const char *head_hash = NULL, *onto = NULL, *restrict_revision = NULL, > - *squash_onto = NULL, *upstream = NULL; > + *squash_onto = NULL, *upstream = NULL, *head_name = NULL; > + char *raw_strategies = NULL; > enum { > CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS, > CHECK_TODO_LIST, REARRANGE_SQUASH, ADD_EXEC, EDIT_TODO, > PREPARE_BRANCH, > - COMPLETE_ACTION > + COMPLETE_ACTION, INIT_BASIC_STATE > } command = 0; > struct option options[] = { > OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")), > @@ -69,6 +72,7 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) >N_("keep original branch points of cousins")), > OPT_BOOL(0, "autosquash", &autosquash, >N_("move commits that begin with squash!/fixup!")), > + OPT_BOOL(0, "signoff", &opts.signoff, N_("sign commits")), > OPT__VERBOSE(&opts.verbose, N_("be verbose")), > OPT_CMDMODE(0, "continue", &command, N_("continue rebase"), > CONTINUE), > @@ -93,6 +97,8 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > N_("prepare the branch to be rebased"), > PREPARE_BRANCH), > OPT_CMDMODE(0, "complete-action", &command, > N_("complete the action"), COMPLETE_ACTION), > + OPT_CMDMODE(0, "init-basic-state", &command, > + N_("initialise the rebase state"), > INIT_BASIC_STATE), > OPT_STRING(0, "onto", &onto, N_("onto"), N_("onto")), > OPT_STRING(0, "restrict-revision", &restrict_revision, > N_("restrict-revision"), N_("restrict revision")), > @@ -100,6 +106,14 @@ int cmd_rebase__helper(int argc, const char **argv, > const char *prefix) > N_("squash onto")), > OPT_STRING(0, "upstream", &upstream, N_("upstream"), > N_("the upstream commit")), > + OPT_STRING(0, "head-name", &head_name, N_("head-name"), > N_("head name")), > + OPT_STRING('S', "gpg-sign", &opts.gpg_sign, N_("gpg-sign"), > +N_("GPG-sign commits")), > + OPT_STRING(0, "strategy", &opts.strategy, N_("strategy"), > +N_("rebase strategy")), > + OPT_STRING(0, "strategy-opts", &raw_strategies, > N_("strategy-opts"), > +N_("strategy options")), > + OPT_RERERE_AUTOUPDATE(&opts.allow_rerere_auto), > OPT_END() > }; > > @@ -176,6 +190,16 @@ int 
cmd_rebase__helper(int argc, const char **argv, > const char *prefix) > free(shortrevisions); > return !!ret; > } > + if (command == INIT_BASIC_STATE) { > + if (raw_strategies) > + parse_strategy_opts(&opts, raw_strategies); > + > + ret = get_revision_ranges(upstream, onto, &head_hash, NULL, > NULL); > + if (ret) > + return ret; > + > + return !!write_basic_state(&opts, head_name, onto, head_hash); > + } > > usage_with_options(builtin_rebase_helper_usage, options); > } > diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh > index 08e9a
Re: [PATCH/RFC] commit: add short option for --amend
On Fri, Aug 17, 2018 at 8:47 AM Jonathan Nieder wrote: > > Nguyễn Thái Ngọc Duy wrote: > > > --- a/builtin/commit.c > > +++ b/builtin/commit.c > > @@ -1489,7 +1489,7 @@ int cmd_commit(int argc, const char **argv, const > > char *prefix) > > STATUS_FORMAT_LONG), > > OPT_BOOL('z', "null", &s.null_termination, > >N_("terminate entries with NUL")), > > - OPT_BOOL(0, "amend", &amend, N_("amend previous commit")), > > + OPT_BOOL('j', "amend", &amend, N_("amend previous commit")), > [...] > > Thoughts? > > I'm not a fan. I would have trouble remembering what the short option > name means, and it matches the common --jobs option for parallelism > that many commands use. "git commit --am" works today already and > doesn't run into those problems. The alternative is -A or -M which may be easier associated with --amend. That "--am" also would break the moment somebody adds --amsomething. > I'm sympathetic to the goal of saving typing, but I'm more sympathetic > to the goal of making user support easier, which is what makes me end > up there. > > That said, I've been looking recently at Mercurial's "hg evolve" > extension[1] and I wouldn't be against a well thought out new command > (e.g. "git amend") that does the equivalent of "git commit --amend" > with some related features. So I think there are some paths forward > that involve abbreviating. I'm not opposed to a new command like this, but I don't think it should stop us from adding short options. -- Duy
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 3:05 PM Ævar Arnfjörð Bjarmason wrote: > > Change the few conditional uses of FREE_AND_NULL(x) to be > unconditional. As noted in the standard[1] free(NULL) is perfectly > valid, so we might as well leave this check up to the C library. I'm not trying to make you work more on this. But out of curiosity would coccinelle help catch this pattern? Szeder's recent work on running cocci automatically would help catch all future code like this if we could write an spatch. -- Duy
Re: non-smooth progress indication for git fsck and git gc
On Thu, Aug 16, 2018 at 11:08 PM Jeff King wrote: > > On Thu, Aug 16, 2018 at 04:55:56PM -0400, Jeff King wrote: > > > > * We spend the majority of the ~30s on this: > > > > > > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79 > > > > This is hashing the actual packfile. This is potentially quite long, > > especially if you have a ton of big objects. > > > > I wonder if we need to do this as a separate step anyway, though. Our > > verification is based on index-pack these days, which means it's going > > to walk over the whole content as part of the "Indexing objects" step to > > expand base objects and mark deltas for later. Could we feed this hash > > as part of that walk over the data? It's not going to save us 30s, but > > it's likely to be more efficient. And it would fold the effort naturally > > into the existing progress meter. > > Actually, I take it back. That's the nice, modern way we do it in > git-verify-pack. But git-fsck uses the ancient "just walk over all of > the idx entries method". It at least sorts in pack order, which is good, > but: > > - it's not multi-threaded, like index-pack/verify-pack > > - the index-pack way is actually more efficient than pack-ordering for > the delta-base cache, because it actually walks the delta-graph in > the optimal order > I actually tried to make git-fsck use index-pack --verify at one point. The only thing that stopped it from working was index-pack automatically wrote the newer index version if I remember correctly, and that would fail the final hash check. fsck performance was not a big deal so I dropped it. Just saying it should be possible, if someone's interested in that direction. -- Duy
Re: [PATCH] completion: include PARSE_OPT_HIDDEN in completion output
On Thu, Aug 16, 2018 at 8:42 PM Ævar Arnfjörð Bjarmason wrote: > > The PARSE_OPT_HIDDEN is, per the documentation of the "option" struct > in option parse-options.h, only supposed to affect -h output, not > completion. That's what the PARSE_OPT_NOCOMPLETE flag is supposed to > be for. > > Since 2e29dca66a ("completion: use __gitcomp_builtin in _git_commit", > 2018-02-09) we've been using e.g. "git commit --git-completion-helper" > to get the bash completion for the git-commit command. Due to > PARSE_OPT_HIDDEN this excluded e.g. --allow-empty and > --allow-empty-message. > > Now, this wasn't a behavior change in that commit. Before that we had > a hardcoded list of options, removed in 2e29dca66a ("completion: use > __gitcomp_builtin in _git_commit", 2018-02-09). This list didn't > contain those two options. > > But as noted in the follow-up discussion to c9b5fde759 ("Add option to > git-commit to allow empty log messages", 2010-04-06) in > https://public-inbox.org/git/20100406055530.ge3...@coredump.intra.peff.net/ > the motivation for PARSE_OPT_HIDDEN was to keep the "git commit -h" > output to a minimum, not to hide it from completion. > > I think it makes sense to exclude options like these from -h output, > but for the completion the user is usually not trying to complete "git > commit --", Actually I do :) > but e.g. "git commit --allo", and because of this behavior we don't show > these options at all there. And it would be great if these options do not show up at -- but do when with --a. We already do something similar to this with --no-. So this could be another option. -- Duy
Re: What's cooking in git.git (Aug 2018, #03; Wed, 15)
On Thu, Aug 16, 2018 at 1:01 AM Junio C Hamano wrote: > * bp/checkout-new-branch-optim (2018-07-31) 1 commit > - checkout: optimize "git checkout -b " > > "git checkout -b newbranch [HEAD]" should not have to do as much as > checking out a commit different from HEAD. An attempt is made to > optimize this special case. > > So... what is the status of this thing? Is the other "optimize > unpack-trees" effort turning out to be a safer and less hacky way > to achieve similar gain and this no longer is needed? I still dislike the fact that this depends on config keys to tweak behavior and believe there are still rooms for general optimizations. But the feeling I have so far is I'm doing all the work for Microsoft in that direction. So I give up. It's up to you and Ben to decide. -- Duy
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 04:36:13PM +0200, Duy Nguyen wrote: > On Fri, Aug 17, 2018 at 3:05 PM Ævar Arnfjörð Bjarmason > wrote: > > > > Change the few conditional uses of FREE_AND_NULL(x) to be > > unconditional. As noted in the standard[1] free(NULL) is perfectly > > valid, so we might as well leave this check up to the C library. > > I'm not trying to make you work more on this. But out of curiosity > would coccinelle help catch this pattern? Szeder's recent work on > running cocci automatically would help catch all future code like this > if we could write an spatch. Just fyi this seems to do the trick. Although I'm nowhere good at coccinelle to say if we should include this (or something like it) -- 8< -- diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci index 4490069df9..f8e018d104 100644 --- a/contrib/coccinelle/free.cocci +++ b/contrib/coccinelle/free.cocci @@ -16,3 +16,9 @@ expression E; - free(E); + FREE_AND_NULL(E); - E = NULL; + +@@ +expression E; +@@ +- if (E) { FREE_AND_NULL(E); } ++ FREE_AND_NULL(E); -- 8< -- -- Duy
Re: Contributor Summit planning
On Mon, Aug 13, 2018 at 10:42 PM Stefan Beller wrote: > Ævar specifically pointed out that we might want to hear from you and Duy > if you want to attend a conference and if so how we can make that happen > (by choosing location/time/setting appropriately) IIUC. Since my name shows up... I'm with Elijah on the travel resistance thing (and am probably even lazier than him). I guess I will remain the mystery in the git circle. -- Duy
Re: [PATCH] config.txt: clarify core.checkStat = minimal
Duy Nguyen writes: > Perfect. I could wrap it in a patch, but I feel you should take > authorship for that one. I'll leave it to you to create this commit. OK, here is what I ended up with. An extra paragraph was taken from the old commit you referrred to, which is probably the only remaining part from your contribution, so the attribution has been demoted to "Helped-by", but your initiative still is appreciated very much. -- >8 -- Subject: [PATCH] config.txt: clarify core.checkStat The description of this key does not really tell what the 'minimal' mode checks and does not check. The description for the 'default' mode is not much better and just says 'all fields', which is unclear and is not even correct (e.g. we do not look at 'atime'). Spell out what are and what are not checked under the 'minimal' mode relative to the 'default' mode to help those who want to decide if they want to use the 'minimal' mode, also taking information about this mode from the commit message of c08e4d5b5c (Enable minimal stat checking - 2013-01-22). Helped-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- Documentation/config.txt | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9..933d719137 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -449,10 +449,20 @@ core.untrackedCache:: See linkgit:git-update-index[1]. `keep` by default. core.checkStat:: - Determines which stat fields to match between the index - and work tree. The user can set this to 'default' or - 'minimal'. Default (or explicitly 'default'), is to check - all fields, including the sub-second part of mtime and ctime. + When missing or is set to `default`, many fields in the stat + structure are checked to detect if a file has been modified + since Git looked at it. When this configuration variable is + set to `minimal`, sub-second part of mtime and ctime, the + uid and gid of the owner of the file, the inode number (and + the device number, if Git was compiled to use it), are + excluded from the check among these fields, leaving only the + whole-second part of mtime (and ctime, if `core.trustCtime` + is set) and the filesize to be checked. ++ +There are implementations of Git that do not leave usable values in +some fields (e.g. JGit); by excluding these fields from the +comparison, the `minimal` mode may help interoperability when the +same repository is used by these other systems at the same time. core.quotePath:: Commands that output paths (e.g. 'ls-files', 'diff'), will -- 2.18.0-666-g63749b2dea
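[Editorial aside: to make the 'default' vs. 'minimal' distinction above concrete, here is a hedged, self-contained C sketch of the comparison. The struct layout, field names and helper are illustrative assumptions, not Git's actual match_stat_data() implementation.]

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Illustrative cached stat data; field names are assumptions, not Git's. */
struct cached_stat {
	unsigned int mtime_sec;
	unsigned int ctime_sec;
	unsigned int uid, gid;
	unsigned int ino, dev;
	unsigned long size;
};

static bool example_stat_match(const struct cached_stat *c,
			       const struct stat *st,
			       bool minimal, bool trust_ctime)
{
	/* Checked in both modes: whole-second mtime (and ctime) and size. */
	if (c->mtime_sec != (unsigned int)st->st_mtime)
		return false;
	if (trust_ctime && c->ctime_sec != (unsigned int)st->st_ctime)
		return false;
	if (c->size != (unsigned long)st->st_size)
		return false;

	if (minimal)
		return true;	/* 'minimal' mode stops here */

	/* 'default' mode additionally checks owner, inode and device
	 * numbers (and, in Git, the sub-second mtime/ctime parts). */
	if (c->uid != st->st_uid || c->gid != st->st_gid)
		return false;
	if (c->ino != (unsigned int)st->st_ino ||
	    c->dev != (unsigned int)st->st_dev)
		return false;
	return true;
}

int main(int argc, char **argv)
{
	struct stat st;
	struct cached_stat cached;

	if (argc < 2 || stat(argv[1], &st))
		return 1;

	/* Pretend the cached copy was taken from this very stat() call. */
	cached.mtime_sec = (unsigned int)st.st_mtime;
	cached.ctime_sec = (unsigned int)st.st_ctime;
	cached.uid = st.st_uid;
	cached.gid = st.st_gid;
	cached.ino = (unsigned int)st.st_ino;
	cached.dev = (unsigned int)st.st_dev;
	cached.size = (unsigned long)st.st_size;

	return example_stat_match(&cached, &st, true, false) ? 0 : 1;
}
```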
Re: [PATCH/RFC] commit: add short option for --amend
On Fri, Aug 17, 2018 at 04:33:30PM +0200, Duy Nguyen wrote: > On Fri, Aug 17, 2018 at 8:47 AM Jonathan Nieder wrote: > > > > Nguyễn Thái Ngọc Duy wrote: > > > > > --- a/builtin/commit.c > > > +++ b/builtin/commit.c > > > @@ -1489,7 +1489,7 @@ int cmd_commit(int argc, const char **argv, const > > > char *prefix) > > > STATUS_FORMAT_LONG), > > > OPT_BOOL('z', "null", &s.null_termination, > > >N_("terminate entries with NUL")), > > > - OPT_BOOL(0, "amend", &amend, N_("amend previous commit")), > > > + OPT_BOOL('j', "amend", &amend, N_("amend previous commit")), > > [...] > > > Thoughts? > > > > I'm not a fan. I would have trouble remembering what the short option > > name means, and it matches the common --jobs option for parallelism > > that many commands use. "git commit --am" works today already and > > doesn't run into those problems. > > The alternative is -A or -M which may be easier associated with > --amend. That "--am" also would break the moment somebody adds > --amsomething. I think "-A" has been considered as possibility for matching "commit -a" / "add -A" in the past, but I had trouble finding past discussion (searching for "A" in the mailing list is not very productive). It was mentioned in 3ba1f11426 (git-add --all: add all files, 2008-07-19), but that was quite a while ago. Not necessarily a blocker, but something to consider. Like Jonathan, I do find "-j" a little non-intuitive, but I agree that most of the intuitive ones are taken. :) -Peff
Re: [PATCH] config.txt: clarify core.checkStat = minimal
On Fri, Aug 17, 2018 at 5:26 PM Junio C Hamano wrote: > -- >8 -- > Subject: [PATCH] config.txt: clarify core.checkStat > > The description of this key does not really tell what the 'minimal' > mode checks and does not check. The description for the 'default' > mode is not much better and just says 'all fields', which is unclear > and is not even correct (e.g. we do not look at 'atime'). > > Spell out what are and what are not checked under the 'minimal' mode > relative to the 'default' mode to help those who want to decide if > they want to use the 'minimal' mode, also taking information about > this mode from the commit message of c08e4d5b5c (Enable minimal stat > checking - 2013-01-22). Looking good. This does make me want to adjust $GIT_DIR/index format to optionally not store extra fields if we know we're not going to use them. But that's a topic for another day. > Helped-by: Nguyễn Thái Ngọc Duy > Signed-off-by: Junio C Hamano > --- > Documentation/config.txt | 18 ++ > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index ab641bf5a9..933d719137 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -449,10 +449,20 @@ core.untrackedCache:: > See linkgit:git-update-index[1]. `keep` by default. > > core.checkStat:: > - Determines which stat fields to match between the index > - and work tree. The user can set this to 'default' or > - 'minimal'. Default (or explicitly 'default'), is to check > - all fields, including the sub-second part of mtime and ctime. > + When missing or is set to `default`, many fields in the stat > + structure are checked to detect if a file has been modified > + since Git looked at it. When this configuration variable is > + set to `minimal`, sub-second part of mtime and ctime, the > + uid and gid of the owner of the file, the inode number (and > + the device number, if Git was compiled to use it), are > + excluded from the check among these fields, leaving only the > + whole-second part of mtime (and ctime, if `core.trustCtime` > + is set) and the filesize to be checked. > ++ > +There are implementations of Git that do not leave usable values in > +some fields (e.g. JGit); by excluding these fields from the > +comparison, the `minimal` mode may help interoperability when the > +same repository is used by these other systems at the same time. > > core.quotePath:: > Commands that output paths (e.g. 'ls-files', 'diff'), will > -- > 2.18.0-666-g63749b2dea > -- Duy
Re: [PATCH/RFC] commit: add short option for --amend
On Fri, Aug 17, 2018 at 5:26 PM Jeff King wrote: > > On Fri, Aug 17, 2018 at 04:33:30PM +0200, Duy Nguyen wrote: > > > On Fri, Aug 17, 2018 at 8:47 AM Jonathan Nieder wrote: > > > > > > Nguyễn Thái Ngọc Duy wrote: > > > > > > > --- a/builtin/commit.c > > > > +++ b/builtin/commit.c > > > > @@ -1489,7 +1489,7 @@ int cmd_commit(int argc, const char **argv, const > > > > char *prefix) > > > > STATUS_FORMAT_LONG), > > > > OPT_BOOL('z', "null", &s.null_termination, > > > >N_("terminate entries with NUL")), > > > > - OPT_BOOL(0, "amend", &amend, N_("amend previous commit")), > > > > + OPT_BOOL('j', "amend", &amend, N_("amend previous > > > > commit")), > > > [...] > > > > Thoughts? > > > > > > I'm not a fan. I would have trouble remembering what the short option > > > name means, and it matches the common --jobs option for parallelism > > > that many commands use. "git commit --am" works today already and > > > doesn't run into those problems. > > > > The alternative is -A or -M which may be easier associated with > > --amend. That "--am" also would break the moment somebody adds > > --amsomething. > > I think "-A" has been considered as possibility for matching "commit -a" > / "add -A" in the past, but I had trouble finding past discussion > (searching for "A" in the mailing list is not very productive). It was > mentioned in 3ba1f11426 (git-add --all: add all files, 2008-07-19), but > that was quite a while ago. > > Not necessarily a blocker, but something to consider. > > Like Jonathan, I do find "-j" a little non-intuitive, but I agree that > most of the intuitive ones are taken. :) Oh well. Maybe next time we'll be more careful with adding short options. Consider this patch dropped. -- Duy
Re: [PATCH/RFC] commit: add short option for --amend
Duy Nguyen writes: > The alternative is -A or -M which may be easier associated with > --amend. I would be confused to mistake that "git commit -A $args" would do something similar to "git add -A && git commit $args". I do my fair share of amends during the day, and I've never felt the need for a short-hand, but perhaps that is just me. I am wondering if "-E" (stands for 'edit', not 'A or M are out, and the next letter in amend is E') is understandable and memorable enough---after all, it is "editing" an existing commit, and the edit is done in a big and different way than the existing "commit --edit". But perhaps that reasoning is a bit too cute. I dunno.
[PATCH v5] clone: report duplicate entries on case-insensitive filesystems
Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. In the case that we can't rely on filesystem (via inode number) to do this check, fall back to fspathcmp() which is not perfect but should not give false positives. This patch is tested with vim-colorschemes and Sublime-Gitignore repositories on a JFS partition with case insensitive support on Linux. Signed-off-by: Nguyễn Thái Ngọc Duy --- v5 respects core.checkStat and sorts the output case-insensitively. I still don't trust magic st_ino zero, or core.checkStat being zero on Windows, so the #if condition still remains but it covers smallest area possible and I tested it by manually make it "#if 1" The fallback with fspathcmp() is only done when inode can't be trusted because strcmp is more expensive and when fspathcmp() learns more about real world in the future, it could become even more expensive. The output sorting is the result of Sublime-Gitignore repo being reported recently. It's not perfect but it should help seeing the groups in normal case. builtin/clone.c | 1 + cache.h | 1 + entry.c | 31 +++ t/t5601-clone.sh | 8 +++- unpack-trees.c | 35 +++ unpack-trees.h | 1 + 6 files changed, 76 insertions(+), 1 deletion(-) diff --git a/builtin/clone.c b/builtin/clone.c index 5c439f1394..0702b0e9d0 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) memset(&opts, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = &the_index; diff --git a/cache.h b/cache.h index 8b447652a7..6d6138f4f1 100644 --- a/cache.h +++ b/cache.h @@ -1455,6 +1455,7 @@ struct checkout { unsigned force:1, quiet:1, not_new:1, +clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..8766e27255 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,34 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_colliding_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i, trust_ino = check_stat; + +#if defined(GIT_WINDOWS_NATIVE) + trust_ino = 0; +#endif + + ce->ce_flags |= CE_MATCHED; + + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup == ce) + break; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) || + (!trust_ino && !fspathcmp(ce->name, dup->name))) { + dup->ce_flags |= CE_MATCHED; + break; + } + } +} + /* * Write the contents from ce out to the working tree. 
* @@ -455,6 +483,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_colliding_entries(state, ce, &st); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 0b62037744..f2eb73bc74 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' git hash-object -w -t tree --stdin) && c=$(git commit-tree -m bogus $t) && git update-ref refs/heads/bogus $c && - git clone -b bogus . bogus + git clone -b bogus . bogus 2>warning ) ' +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' + grep X icasefs/warning && + grep x icasefs/warning && + test_i18ngrep "the following paths have collided" icasefs/warning +' + partial_clone () { SERVER="$1" && URL="$2" && diff --git a/unpack-trees.c b/unpack-trees.c index cd06
Re: Syncing HEAD
Jeff King writes: > So all of this really implies to me that you want to be able to say > "take this symref on the other side and update this one on the local > side". I.e., some way to tell a refspec "don't update the value, update > the symref destination". ... > ... > git fetch origin ~HEAD:refs/remotes/origin/HEAD We need to be a bit careful here. You can define the meaning of the above sanely if you know that refmap refs/heads/*:refs/remotes/origin/* is in effect for the remote to read "My HEAD points at refs/heads/frotz" and interpret it as "In order to match, I need to make my refs/remotes/origin/HEAD to point at refs/remotes/origin/frotz". Also, what should the above form of "git fetch" write in FETCH_HEAD? Should "git pull origin ~HEAD:refs/remotes/origin/HEAD" run the fetch and then merge it (which may have value of refs/remotes/origin/frotz) to the current branch? Should the underlying fetch be also fetching the frotz branch from them at the same time, or do we attempt to merge a possibly stale 'frotz' (which might not even have been there, the last time we fetched from them)?
Re: [PATCH] gpg-interface.c: detect and reject multiple signatures on commits
Jonathan Nieder writes: > Michał Górny wrote: > >> GnuPG supports creating signatures consisting of multiple signature >> packets. If such a signature is verified, it outputs all the status >> messages for each signature separately. However, git currently does not >> account for such scenario and gets terribly confused over getting >> multiple *SIG statuses. >> >> For example, if a malicious party alters a signed commit and appends >> a new untrusted signature, git is going to ignore the original bad >> signature and report untrusted commit instead. However, %GK and %GS >> format strings may still expand to the data corresponding >> to the original signature, potentially tricking the scripts into >> trusting the malicious commit. >> >> Given that the use of multiple signatures is quite rare, git does not >> support creating them without jumping through a few hoops, and finally >> supporting them properly would require extensive API improvement, it >> seems reasonable to just reject them at the moment. >> --- > > Thanks for the clear analysis and fix. > > May we have your sign-off? See > https://www.kernel.org/pub/software/scm/git/docs/SubmittingPatches.html#sign-off > (or the equivalent section of your local copy of > Documentation/SubmittingPatches) for what this means. I do not see the original message you are writing response to on public-inbox archive. As long as an update with sign-off will be sent to the git@vger.kernel.org list, that is OK. >> gpg-interface.c | 38 ++ >> 1 file changed, 30 insertions(+), 8 deletions(-) >> >> diff --git a/gpg-interface.c b/gpg-interface.c >> index 09ddfbc26..4e03aec15 100644 >> --- a/gpg-interface.c >> +++ b/gpg-interface.c >> @@ -24,21 +24,23 @@ void signature_check_clear(struct signature_check *sigc) >> static struct { >> char result; >> const char *check; >> +int is_status; >> } sigcheck_gpg_status[] = { >> -{ 'G', "\n[GNUPG:] GOODSIG " }, >> -{ 'B', "\n[GNUPG:] BADSIG " }, >> -{ 'U', "\n[GNUPG:] TRUST_NEVER" }, >> -{ 'U', "\n[GNUPG:] TRUST_UNDEFINED" }, >> -{ 'E', "\n[GNUPG:] ERRSIG "}, >> -{ 'X', "\n[GNUPG:] EXPSIG "}, >> -{ 'Y', "\n[GNUPG:] EXPKEYSIG "}, >> -{ 'R', "\n[GNUPG:] REVKEYSIG "}, >> +{ 'G', "\n[GNUPG:] GOODSIG ", 1 }, >> +{ 'B', "\n[GNUPG:] BADSIG ", 1 }, >> +{ 'U', "\n[GNUPG:] TRUST_NEVER", 0 }, >> +{ 'U', "\n[GNUPG:] TRUST_UNDEFINED", 0 }, >> +{ 'E', "\n[GNUPG:] ERRSIG ", 1}, >> +{ 'X', "\n[GNUPG:] EXPSIG ", 1}, >> +{ 'Y', "\n[GNUPG:] EXPKEYSIG ", 1}, >> +{ 'R', "\n[GNUPG:] REVKEYSIG ", 1}, >> }; > > nit: I wonder if making is_status into a flag field (like 'option' in > git.c's cmd_struct) and having an explicit SIGNATURE_STATUS value to > put there would make this easier to read. > > It's not clear to me that the name is_status or SIGNATURE_STATUS > captures what this field represents. Aren't these all sigcheck > statuses? Can you describe briefly what distinguishes the cases where > this should be 0 versus 1? Good suggestion. 
>> static void parse_gpg_output(struct signature_check *sigc) >> { >> const char *buf = sigc->gpg_status; >> int i; >> +int had_status = 0; >> >> /* Iterate over all search strings */ >> for (i = 0; i < ARRAY_SIZE(sigcheck_gpg_status); i++) { >> @@ -50,6 +52,10 @@ static void parse_gpg_output(struct signature_check *sigc) >> continue; >> found += strlen(sigcheck_gpg_status[i].check); >> } >> + >> +if (sigcheck_gpg_status[i].is_status) >> +had_status++; >> + >> sigc->result = sigcheck_gpg_status[i].result; >> /* The trust messages are not followed by key/signer >> information */ >> if (sigc->result != 'U') { >> @@ -62,6 +68,22 @@ static void parse_gpg_output(struct signature_check *sigc) >> } >> } >> } >> + >> +/* >> + * GOODSIG, BADSIG etc. can occur only once for each signature. >> + * Therefore, if we had more than one then we're dealing with multiple >> + * signatures. We don't support them currently, and they're rather >> + * hard to create, so something is likely fishy and we should reject >> + * them altogether. >> + */ >> +if (had_status > 1) { >> +sigc->result = 'E'; >> +/* Clear partial data to avoid confusion */ >> +if (sigc->signer) >> +FREE_AND_NULL(sigc->signer); >> +if (sigc->key) >> +FREE_AND_NULL(sigc->key); >> +} > > Makes sense to me. I was wondering if we have to revamp the loop altogether. The current code runs through the list of all the possible "status" lines, and find the first occurrence for each type in the buffer that has GPG output. Second and subsequent occurrence of the same type, if existed, will not be noticed by the original loop structure
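[Editorial aside: for what it's worth, a line-oriented rescan of the kind hinted at above could look roughly like the following sketch. It only counts the "exclusive" statuses so a doubled GOODSIG/BADSIG/... output (i.e. multiple signature packets) can be detected; the table layout mirrors the patch earlier in the thread, but the helper itself and the sample data are hypothetical, not part of the proposed change.]

```c
#include <stdio.h>
#include <string.h>

#define GPG_STATUS_EXCLUSIVE (1 << 0)

/* Mirrors the shape of sigcheck_gpg_status[] from the patch above. */
struct status_entry {
	char result;
	const char *prefix;	/* e.g. "[GNUPG:] GOODSIG " */
	unsigned int flags;
};

/*
 * Hypothetical helper: walk the GnuPG status buffer line by line and
 * count how many exclusive status lines appear. A result greater than
 * one means the signature consists of multiple signature packets.
 */
static int count_exclusive_statuses(const char *gpg_status,
				    const struct status_entry *table,
				    size_t table_len)
{
	const char *line = gpg_status;
	int seen = 0;
	size_t i;

	while (line && *line) {
		for (i = 0; i < table_len; i++) {
			if ((table[i].flags & GPG_STATUS_EXCLUSIVE) &&
			    !strncmp(line, table[i].prefix,
				     strlen(table[i].prefix))) {
				seen++;
				break;
			}
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return seen;
}

int main(void)
{
	static const struct status_entry table[] = {
		{ 'G', "[GNUPG:] GOODSIG ", GPG_STATUS_EXCLUSIVE },
		{ 'B', "[GNUPG:] BADSIG ", GPG_STATUS_EXCLUSIVE },
		{ 'U', "[GNUPG:] TRUST_NEVER", 0 },
	};
	/* Made-up sample: two GOODSIG lines, as a doubled signature would produce. */
	const char *status =
		"[GNUPG:] GOODSIG AAAA first\n"
		"[GNUPG:] GOODSIG BBBB second\n";

	printf("%d\n", count_exclusive_statuses(status, table,
						sizeof(table) / sizeof(table[0])));
	return 0;
}
```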
Re: Syncing HEAD
On Fri, Aug 17, 2018 at 09:28:59AM -0700, Junio C Hamano wrote: > Jeff King writes: > > > So all of this really implies to me that you want to be able to say > > "take this symref on the other side and update this one on the local > > side". I.e., some way to tell a refspec "don't update the value, update > > the symref destination". ... > > ... > > git fetch origin ~HEAD:refs/remotes/origin/HEAD > > We need to be a bit careful here. > > You can define the meaning of the above sanely if you know that > refmap refs/heads/*:refs/remotes/origin/* is in effect for the > remote to read "My HEAD points at refs/heads/frotz" and interpret it > as "In order to match, I need to make my refs/remotes/origin/HEAD to > point at refs/remotes/origin/frotz". Good point. I was thinking too much about the symlink itself and not its destination. You need some way of mapping that destination, as well. For Christian's case, it is really the "refs/*:refs/*" mapping that he would want to emulate (because he'd be doing ~HEAD:HEAD). In some cases, like that one, you could infer the mapping from the HEAD:HEAD itself (X on the remote becomes X locally). But that does not work for the refs/remotes case. You might infer from "~HEAD:refs/remotes/origin/HEAD" that "X becomes refs/remotes/origin/X", but it is actually "refs/heads/X becomes ...". So yeah, this really does need pairing with the overall ref mapping. I think that's doable even for the example I gave above, because we could find those refspecs in the config. But: git fetch git://... ~HEAD:HEAD does not have that information. We may or may not have fetched the pointed-to ref previously. What if this _required_ that the symref destination from the other side also be something that we are fetching, and was otherwise an error? That would avoid any config trickery. It does mean that "git fetch origin ~HEAD:HEAD" does not work. But I think you'd generally want to pair it with a fetch anyway. I.e., Either two configured refspecs, or if a one-off fetch from a URL, fetching the branches and the symref at the same time. I suppose the one case it would not support is "I want to fetch HEAD as a symref and whatever branch it points to, but I do not yet know what that branch is". Or, I suppose, "I do not want to update any branches, but just update my notion of HEAD" (i.e., what "remote set-head -a" currently does). > Also, what should the above form of "git fetch" write in FETCH_HEAD? > Should "git pull origin ~HEAD:refs/remotes/origin/HEAD" run the fetch > and then merge it (which may have value of refs/remotes/origin/frotz) > to the current branch? Should the underlying fetch be also fetching > the frotz branch from them at the same time, or do we attempt to merge > a possibly stale 'frotz' (which might not even have been there, the > last time we fetched from them)? I'd be tempted to say that a symref fetch writes nothing into FETCH_HEAD at all. Which would make a bare "git pull origin ~HEAD" an error. I'm sure there are cases it _could_ do something useful, but there are so many where it doesn't make sense. -Peff
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
Duy Nguyen writes: > Just fyi this seems to do the trick. Although I'm nowhere good at > coccinelle to say if we should include this (or something like it) > > -- 8< -- > diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci > index 4490069df9..f8e018d104 100644 > --- a/contrib/coccinelle/free.cocci > +++ b/contrib/coccinelle/free.cocci > @@ -16,3 +16,9 @@ expression E; > - free(E); > + FREE_AND_NULL(E); > - E = NULL; > + > +@@ > +expression E; > +@@ > +- if (E) { FREE_AND_NULL(E); } > ++ FREE_AND_NULL(E); It is a bit sad that - if (E) FREE_AND_NULL(E); is not sufficient to catch it. Shouldn't we be doing the same for regular free(E) as well? IOW, like the attached patch. contrib/coccinelle/free.cocci | 24 1 file changed, 24 insertions(+) diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci index 4490069df9..f748bcfe30 100644 --- a/contrib/coccinelle/free.cocci +++ b/contrib/coccinelle/free.cocci @@ -16,3 +16,27 @@ expression E; - free(E); + FREE_AND_NULL(E); - E = NULL; + +@@ +expression E; +@@ +- if (E) + FREE_AND_NULL(E); + +@@ +expression E; +@@ +- if (E) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (!E) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (E) { FREE_AND_NULL(E); } ++ FREE_AND_NULL(E);
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
Junio C Hamano writes: > It is a bit sad that > > - if (E) > FREE_AND_NULL(E); > > is not sufficient to catch it. Shouldn't we be doing the same for > regular free(E) as well? IOW, like the attached patch. > ... And revised even more to also spell "E" as "E != NULL" (and "!E" as "E == NULL"), which seems to make a difference, which is even more sad. I do not want to wonder if I have to also add "NULL == E" and other variants, so I'll stop here. contrib/coccinelle/free.cocci | 60 +++ 1 file changed, 60 insertions(+) diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci index 4490069df9..29ca98796f 100644 --- a/contrib/coccinelle/free.cocci +++ b/contrib/coccinelle/free.cocci @@ -16,3 +16,63 @@ expression E; - free(E); + FREE_AND_NULL(E); - E = NULL; + +@@ +expression E; +@@ +- if (E) + FREE_AND_NULL(E); + +@@ +expression E; +@@ +- if (E != NULL) + free(E); + +@@ +expression E; +@@ +- if (E == NULL) + free(E); + +@@ +expression E; +@@ +- if (E != NULL) + FREE_AND_NULL(E); + +@@ +expression E; +@@ +- if (E) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (!E) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (E) { FREE_AND_NULL(E); } ++ FREE_AND_NULL(E); + +@@ +expression E; +@@ +- if (E != NULL) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (E == NULL) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (E != NULL) { FREE_AND_NULL(E); } ++ FREE_AND_NULL(E);
Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
Nguyễn Thái Ngọc Duy writes: > I still don't trust magic st_ino zero, or core.checkStat being zero > on Windows, so the #if condition still remains but it covers smallest > area possible and I tested it by manually make it "#if 1" > > The fallback with fspathcmp() is only done when inode can't be > trusted because strcmp is more expensive and when fspathcmp() learns > more about real world in the future, it could become even more > expensive. > > The output sorting is the result of Sublime-Gitignore repo being > reported recently. It's not perfect but it should help seeing the > groups in normal case. Looks small and safe. > + > + if (o->clone) { > + struct string_list list = STRING_LIST_INIT_NODUP; > + int i; > + > + for (i = 0; i < index->cache_nr; i++) { > + struct cache_entry *ce = index->cache[i]; > + > + if (!(ce->ce_flags & CE_MATCHED)) > + continue; > + > + string_list_append(&list, ce->name); > + ce->ce_flags &= ~CE_MATCHED; > + } > + > + list.cmp = fspathcmp; > + string_list_sort(&list); > + > + if (list.nr) > + warning(_("the following paths have collided (e.g. > case-sensitive paths\n" > + "on a case-insensitive filesystem) and only > one from the same\n" > + "colliding group is in the working tree:\n")); > + > + for (i = 0; i < list.nr; i++) > + fprintf(stderr, " '%s'\n", list.items[i].string); > + > + string_list_clear(&list, 0); I would have written the "sort, show warning, and list" all inside "if (list.nr)" block, leaving list-clear outside, which would have made the logic a bit cleaner. The reader does not have to bother thinking "ah, when list.nr==0, this is a no-op anyway" to skip them if written that way. I highly suspect that the above was written in that way to reduce the indentation level, but the right way to reduce the indentation level, if it bothers readers too much, is to make the whole thing inside the above if (o->clone) into a dedicated helper function "void report_collided_checkout(void)", I would think.
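For what it is worth, the restructuring suggested above might look roughly like the sketch below (reusing the string_list/CE_MATCHED code quoted from the patch; this is not the actual follow-up patch): the whole block moves out of check_updates() into a helper, the sort, the warning and the listing are all guarded by the list.nr check, and only the string_list_clear() stays outside.

/* Sketch of the suggested helper; it would live next to check_updates(). */
static void report_collided_checkout(struct index_state *index)
{
	struct string_list list = STRING_LIST_INIT_NODUP;
	int i;

	for (i = 0; i < index->cache_nr; i++) {
		struct cache_entry *ce = index->cache[i];

		if (!(ce->ce_flags & CE_MATCHED))
			continue;

		string_list_append(&list, ce->name);
		ce->ce_flags &= ~CE_MATCHED;
	}

	if (list.nr) {
		list.cmp = fspathcmp;
		string_list_sort(&list);

		warning(_("the following paths have collided (e.g. case-sensitive paths\n"
			  "on a case-insensitive filesystem) and only one from the same\n"
			  "colliding group is in the working tree:\n"));

		for (i = 0; i < list.nr; i++)
			fprintf(stderr, "  '%s'\n", list.items[i].string);
	}

	string_list_clear(&list, 0);
}

check_updates() would then simply call report_collided_checkout(index) under its existing if (o->clone) check.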
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 10:07:36AM -0700, Junio C Hamano wrote: > Junio C Hamano writes: > > > It is a bit sad that > > > > - if (E) > > FREE_AND_NULL(E); > > > > is not sufficient to catch it. Shouldn't we be doing the same for > > regular free(E) as well? IOW, like the attached patch. > > ... > > And revised even more to also spell "E" as "E != NULL" (and "!E" as > "E == NULL"), which seems to make a difference, which is even more > sad. I do not want to wonder if I have to also add "NULL == E" and > other variants, so I'll stop here. I think it makes sense that these are all distinct if you're using coccinelle to do stylistic transformations between them (e.g., enforcing curly braces even around one-liners). I wonder if there is a way to "relax" a pattern where these semantically equivalent cases can all be covered automatically. I don't know enough about the tool to say. I guess one way to do it would be to normalize the style in one rule (e.g., always "!E" instead of "E == NULL"), and then you only have to write the FREE_AND_NULL rule for the normalized form. For a single case like this, the end result is about the same number of rules, but in the long term it saves us work when we have a similar transformation. -Peff
Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
On Fri, Aug 17, 2018 at 12:36 AM Junio C Hamano wrote: > > Andrei Rybak writes: > > > On 14/08/18 13:47, SZEDER Gábor wrote: > >> ... both > >> invocations produce empty 'pack{a,b}.objects' files, and the > >> subsequent 'test_cmp' happily finds those two empty files identical. > > > > Is test_cmp ever used for empty files? Would it make sense for > > test_cmp to issue warning when an empty file is being compared? > > Typically test_cmp is used to compare the actual output from a > dubious command being tested with the expected output from a > procedure that is known not to be broken (e.g. a run of 'echo', or a > 'cat' of here-doc), so at least one side would not be empty. > > The test done here is an odd case---it compares output from two > equally dubious processes that are being tested and sees if their > results match. > > That said, since we have test_must_be_empty, we could forbid feeding > empty files to test_cmp, after telling everybody that a test that > expects an empty output must use test_must_be_empty. I do not think > it is a terrible idea. I thought so as well, and I've looked into it; in fact this patch was one of the skeletons that fell out of our test suite "while at it". However, I had to change my mind about it, and now I don't think we should go all the way and forbid that. See, we have quite a few tests that extract repetitive common tasks into helper functions, which sometimes includes preparing the expected results and running 'test_cmp', e.g. something like this (oversimplified) example: check_cmd () { git cmd $1 >actual && echo "$2" >expect && test_cmp expect actual } check_cmd --fooFOO check_cmd --no-foo "" Completely forbidding feeding empty files to 'test_cmp' would require us to add a bit of logic to cases like this to call 'test_cmp' or 'test_must_be_empty' based on the content of the second parameter. Arguably it's only a tiny bit of logic, as only a single if statement is needed, but following our coding style it would take 7 lines instead of only 2: if test -n "$2" then echo "$2" >expect && test_cmp expect actual else test_must_be_empty actual fi I don't think it's worth it in cases like this.
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 01:39:51PM -0400, Jeff King wrote: > > I wonder if there is a way to "relax" a pattern where these semantically > > equivalent cases can all be covered automatically. I don't know enough > > about the tool to say. > > Hmm. They seem to call these "standard isomorphisms": > > http://coccinelle.lip6.fr/standard.iso.html > > but I'm not sure of the correct way to use them (e.g., if we want to > apply them for matching but not actually transform the code, though I am > not actually opposed to transforming the code, too). Hmph, I should really pause before hitting 'send'. Last message, I promise. :) I do not see an option to include a list an arbitrary set of isomorphisms, but the standard.iso list should be used by default. I wonder if you simply need to write your case in the normalized version they use there (which I think is "X == NULL"), and the others would be taken care of. -Peff
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 01:33:08PM -0400, Jeff King wrote: > > And revised even more to also spell "E" as "E != NULL" (and "!E" as > > "E == NULL"), which seems to make a difference, which is even more > > sad. I do not want to wonder if I have to also add "NULL == E" and > > other variants, so I'll stop here. > > I think it makes sense that these are all distinct if you're using > coccinelle to do stylistic transformations between them (e.g., enforcing > curly braces even around one-liners). > > I wonder if there is a way to "relax" a pattern where these semantically > equivalent cases can all be covered automatically. I don't know enough > about the tool to say. Hmm. They seem to call these "standard isomorphisms": http://coccinelle.lip6.fr/standard.iso.html but I'm not sure of the correct way to use them (e.g., if we want to apply them for matching but not actually transform the code, though I am not actually opposed to transforming the code, too). -Peff
Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
On Fri, Aug 17, 2018 at 10:20:36AM -0700, Junio C Hamano wrote: > I highly suspect that the above was written in that way to reduce > the indentation level, but the right way to reduce the indentation > level, if it bothers readers too much, is to make the whole thing > inside the above if (o->clone) into a dedicated helper function > "void report_collided_checkout(void)", I would think. I read my mind. I thought of separating into a helper function too, but was not happy that the clearing CE_MATCHED in preparation for this test is in check_updates(), but the cleaning up CE_MATCHED() is in the helper function. So here is the version that separates _both_ phases into helper functions. -- 8< -- Subject: [PATCH v6] clone: report duplicate entries on case-insensitive filesystems Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. In the case that we can't rely on filesystem (via inode number) to do this check, fall back to fspathcmp() which is not perfect but should not give false positives. This patch is tested with vim-colorschemes and Sublime-Gitignore repositories on a JFS partition with case insensitive support on Linux. Signed-off-by: Nguyễn Thái Ngọc Duy --- builtin/clone.c | 1 + cache.h | 1 + entry.c | 31 +++ t/t5601-clone.sh | 8 +++- unpack-trees.c | 47 +++ unpack-trees.h | 1 + 6 files changed, 88 insertions(+), 1 deletion(-) diff --git a/builtin/clone.c b/builtin/clone.c index 5c439f1394..0702b0e9d0 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) memset(&opts, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = &the_index; diff --git a/cache.h b/cache.h index 8b447652a7..6d6138f4f1 100644 --- a/cache.h +++ b/cache.h @@ -1455,6 +1455,7 @@ struct checkout { unsigned force:1, quiet:1, not_new:1, +clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..8766e27255 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,34 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_colliding_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i, trust_ino = check_stat; + +#if defined(GIT_WINDOWS_NATIVE) + trust_ino = 0; +#endif + + ce->ce_flags |= CE_MATCHED; + + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup == ce) + break; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) || + (!trust_ino && !fspathcmp(ce->name, dup->name))) { + dup->ce_flags |= CE_MATCHED; + break; + } + } +} + /* * Write the contents from ce out to the working tree. 
* @@ -455,6 +483,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_colliding_entries(state, ce, &st); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 0b62037744..f2eb73bc74 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' git hash-object -w -t tree --stdin) && c=$(git commit-tree -m bogus $t) && git update-ref refs/heads/bogus $c && - git clone -b bogus . bogus + git clone -b bogus . bogus 2>warning ) ' +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' + grep X icasefs/warning && + grep x icasefs/warning && + test_i18ngrep "the following paths have
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 7:33 PM Jeff King wrote: > > On Fri, Aug 17, 2018 at 10:07:36AM -0700, Junio C Hamano wrote: > > > Junio C Hamano writes: > > > > > It is a bit sad that > > > > > > - if (E) > > > FREE_AND_NULL(E); > > > > > > is not sufficient to catch it. Shouldn't we be doing the same for > > > regular free(E) as well? IOW, like the attached patch. > > > ... > > > > And revised even more to also spell "E" as "E != NULL" (and "!E" as > > "E == NULL"), which seems to make a difference, which is even more > > sad. I do not want to wonder if I have to also add "NULL == E" and > > other variants, so I'll stop here. > > I think it makes sense that these are all distinct if you're using > coccinelle to do stylistic transformations between them (e.g., enforcing > curly braces even around one-liners). Googling a bit shows a kernel patch [1]. Assuming that it works (I didn't check if it made it to linux.git) it would simplify our rules a bit. [1] https://patchwork.kernel.org/patch/5167641/ -- Duy
Re: [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Han-Wen Nienhuys writes: > +/* > + * Optionally highlight one keyword in remote output if it appears at the > start > + * of the line. This should be called for a single line only, which is > + * passed as the first N characters of the SRC array. > + */ > +static void maybe_colorize_sideband(struct strbuf *dest, const char *src, > int n) > +{ > + int i; > + > + if (!want_color_stderr(use_sideband_colors())) { > + strbuf_add(dest, src, n); > + return; > + } > + > + while (isspace(*src)) { > + strbuf_addch(dest, *src); > + src++; > + n--; > + } This loop can run out of bytes in src in search of non-space before n gets to zero or negative, and when that happens ... > + for (i = 0; i < ARRAY_SIZE(keywords); i++) { > + struct keyword_entry *p = keywords + i; > + int len = strlen(p->keyword); > + /* > + * Match case insensitively, so we colorize output from existing > + * servers regardless of the case that they use for their > + * messages. We only highlight the word precisely, so > + * "successful" stays uncolored. > + */ > + if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) { ... these access src[] beyond the end of what the caller intended to show us, and also ... > + strbuf_addstr(dest, p->color); > + strbuf_add(dest, src, len); > + strbuf_addstr(dest, GIT_COLOR_RESET); > + n -= len; > + src += len; > + break; > + } > + } > + > + strbuf_add(dest, src, n); ... this will now try to add 0 or negative number of bytes. > + > +} > + Perhaps this will help (not really tested). The second hunk is an unrelated style clean-up. sideband.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/sideband.c b/sideband.c index 1c6bb0e25b..d99a559a44 100644 --- a/sideband.c +++ b/sideband.c @@ -75,11 +75,13 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) return; } - while (isspace(*src)) { + while (isspace(*src) && n) { strbuf_addch(dest, *src); src++; n--; } + if (!n) + return; for (i = 0; i < ARRAY_SIZE(keywords); i++) { struct keyword_entry *p = keywords + i; @@ -101,7 +103,6 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) } strbuf_add(dest, src, n); - }
Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Junio C Hamano writes: > This loop can run out of bytes in src in search of non-space before > n gets to zero or negative, and when that happens ... > >> +for (i = 0; i < ARRAY_SIZE(keywords); i++) { >> +struct keyword_entry *p = keywords + i; >> +int len = strlen(p->keyword); >> +/* >> + * Match case insensitively, so we colorize output from existing >> + * servers regardless of the case that they use for their >> + * messages. We only highlight the word precisely, so >> + * "successful" stays uncolored. >> + */ >> +if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) { > > ... these access src[] beyond the end of what the caller intended to > show us, and also ... Actually, leaving when !n before this loop is insufficient. src[] may have 2 bytes "in" remaining, and we may be trying to see if it begins with "info", for example, and using strncasecmp() with len==4 would of course read beyond the end of src[]. -- >8 -- Subject: sideband: do not read beyond the end of input The caller of maybe_colorize_sideband() gives a counted buffer , but the callee checked *src as if it were a NUL terminated buffer. If src[] had all isspace() bytes in it, we would have made n negative, and then (1) called number of strncasecmp() to see if the remaining bytes in src[] matched keywords, reading beyond the end of the array, and/or (2) called strbuf_add() with negative count, most likely triggering the "you want to use way too much memory" error due to unsigned integer overflow. Signed-off-by: Junio C Hamano --- sideband.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sideband.c b/sideband.c index 1c6bb0e25b..372039247f 100644 --- a/sideband.c +++ b/sideband.c @@ -75,7 +75,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) return; } - while (isspace(*src)) { + while (0 < n && isspace(*src)) { strbuf_addch(dest, *src); src++; n--; @@ -84,6 +84,9 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) for (i = 0; i < ARRAY_SIZE(keywords); i++) { struct keyword_entry *p = keywords + i; int len = strlen(p->keyword); + + if (n <= len) + continue; /* * Match case insensitively, so we colorize output from existing * servers regardless of the case that they use for their @@ -100,8 +103,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) } } - strbuf_add(dest, src, n); - + if (0 < n) + strbuf_add(dest, src, n); }
[RFC/PATCH] drop vcs-svn experiment
The code in vcs-svn was started in 2010 as an attempt to build a remote-helper for interacting with svn repositories (as opposed to git-svn). However, we never got as far as shipping a mature remote helper, and the last substantive commit was e99d012a6bc in 2012. We do have a git-remote-testsvn, and it is even installed as part of "make install". But given the name, it seems unlikely to be used by anybody (you'd have to explicitly "git clone testsvn::$url", and there have been zero mentions of that on the mailing list since 2013, and even that includes the phrase "you might need to hack a bit to get it working properly"[1]). We also ship contrib/svn-fe, which builds on the vcs-svn work. However, it does not seem to build out of the box for me, as the link step misses some required libraries for using libgit.a. Curiously, the original build breakage bisects for me to eff80a9fd9 (Allow custom "comment char", 2013-01-16), which seems unrelated. There was an attempt to fix it in da011cb0e7 (contrib/svn-fe: fix Makefile, 2014-08-28), but on my system that only switches the error message. So it seems like the result is not really usable by anybody in practice. It would be wonderful if somebody wanted to pick up the topic again, and potentially it's worth carrying around for that reason. But the flip side is that people doing a tree-wide operations have to deal with this code. And you can see the list with (replace "HEAD" with this commit as appropriate): { echo "--" git diff-tree --diff-filter=D -r --name-only HEAD^ HEAD } | git log e99d012a6bc.. --stdin which shows 58 times somebody had to deal with the code, generally due to a compile or test failure, or a tree-wide style fix or API change. Let's drop it and let anybody who wants to pick it up do so by resurrecting it from the git history. [1] https://public-inbox.org/git/calkwk0mphzkfzfkkpzkfaus3yvc9nfydbfnt+5jqyvkipk3...@mail.gmail.com/ Signed-off-by: Jeff King --- Of course, I could be completely wrong about people using this. Maybe svn-fe builds are just completely broken on my system, and maybe people really do use testsvn::. But if so, they certainly aren't talking about it on the mailing list. :) I'm cc-ing Jonathan as the only currently-active developer who seems to have put significant work into this code. Maybe you have a more informed opinion. 
.gitignore | 1 - Makefile | 22 -- contrib/svn-fe/.gitignore | 4 - contrib/svn-fe/Makefile| 105 --- contrib/svn-fe/svn-fe.c| 18 -- contrib/svn-fe/svn-fe.txt | 71 - contrib/svn-fe/svnrdump_sim.py | 68 - remote-testsvn.c | 337 t/helper/test-line-buffer.c| 81 - t/helper/test-svn-fe.c | 52 t/t9020-remote-svn.sh | 89 -- vcs-svn/LICENSE| 32 -- vcs-svn/fast_export.c | 365 -- vcs-svn/fast_export.h | 34 --- vcs-svn/line_buffer.c | 126 vcs-svn/line_buffer.h | 30 -- vcs-svn/line_buffer.txt| 77 - vcs-svn/sliding_window.c | 79 - vcs-svn/sliding_window.h | 18 -- vcs-svn/svndiff.c | 309 --- vcs-svn/svndiff.h | 10 - vcs-svn/svndump.c | 540 - vcs-svn/svndump.h | 10 - 23 files changed, 2478 deletions(-) delete mode 100644 contrib/svn-fe/.gitignore delete mode 100644 contrib/svn-fe/Makefile delete mode 100644 contrib/svn-fe/svn-fe.c delete mode 100644 contrib/svn-fe/svn-fe.txt delete mode 100755 contrib/svn-fe/svnrdump_sim.py delete mode 100644 remote-testsvn.c delete mode 100644 t/helper/test-line-buffer.c delete mode 100644 t/helper/test-svn-fe.c delete mode 100755 t/t9020-remote-svn.sh delete mode 100644 vcs-svn/LICENSE delete mode 100644 vcs-svn/fast_export.c delete mode 100644 vcs-svn/fast_export.h delete mode 100644 vcs-svn/line_buffer.c delete mode 100644 vcs-svn/line_buffer.h delete mode 100644 vcs-svn/line_buffer.txt delete mode 100644 vcs-svn/sliding_window.c delete mode 100644 vcs-svn/sliding_window.h delete mode 100644 vcs-svn/svndiff.c delete mode 100644 vcs-svn/svndiff.h delete mode 100644 vcs-svn/svndump.c delete mode 100644 vcs-svn/svndump.h diff --git a/.gitignore b/.gitignore index 3524803da5..fe022dbeb0 100644 --- a/.gitignore +++ b/.gitignore @@ -131,7 +131,6 @@ /git-remote-ext /git-remote-testgit /git-remote-testpy -/git-remote-testsvn /git-repack /git-replace /git-request-pull diff --git a/Makefile b/Makefile index e3364a42a5..8a890e28e9 100644 --- a/Makefile +++ b/Makefile @@ -695,7 +695,6 @@ PROGRAM_OBJS += http-backend.o PROGRAM_OBJS += imap-send.o PROGRAM_OBJS += sh-i18n--envsubst.o PROGRAM_OBJS += shell.o -PROGRAM_OBJS += remote-testsvn.o # Binary suffix, set to .exe for Windows builds X = @@ -743,10 +742,8 @@ TEST_BUILTINS_OBJS += test-write-cache.o
Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
On 17/08/18 19:39, SZEDER Gábor wrote: > > See, we have quite a few tests that extract repetitive common tasks > into helper functions, which sometimes includes preparing the expected > results and running 'test_cmp', e.g. something like this > (oversimplified) example: > > check_cmd () { > git cmd $1 >actual && > echo "$2" >expect && > test_cmp expect actual > } > > check_cmd --fooFOO > check_cmd --no-foo "" I've only had time to look into this from t0001 up to t0008-ignores.sh, where test_check_ignore does this. If these helper functions need to allow comparing empty files -- how about adding special variation of cmp functions for cases like this: test_cmp_allow_empty and test_i18ncmp_allow_empty? I think it would be a good trade-off to allow these helper functions to skip checking emptiness of arguments for test_cmp. Such patch will require only s/test_cmp/&_allow_empty/ for these helper functions and it will help catch cases as bogus test in t5310. I'll try something like the following on the weekend: test_cmp() { if test "$1" != - && ! test -s "$1" then echo >&4 "error: trying to compare empty file '$1'" return 1 fi if test "$2" != - && ! test -s "$2" then echo >&4 "error: trying to compare empty file '$2'" return 1 fi test_cmp_allow_empty "$@" } test_cmp_allow_empty() { $GIT_TEST_CMP "$@" } (I'm not sure about redirections in test lib functions. The two if's would probably be in a separate function to be re-used by test_i18ncmp.)
Re: [RFC/PATCH] drop vcs-svn experiment
Hi Jeff, Jeff King wrote: > .gitignore | 1 - > Makefile | 22 -- > contrib/svn-fe/.gitignore | 4 - > contrib/svn-fe/Makefile| 105 --- > contrib/svn-fe/svn-fe.c| 18 -- > contrib/svn-fe/svn-fe.txt | 71 - > contrib/svn-fe/svnrdump_sim.py | 68 - > remote-testsvn.c | 337 > t/helper/test-line-buffer.c| 81 - > t/helper/test-svn-fe.c | 52 > t/t9020-remote-svn.sh | 89 -- Doesn't t/t9010-svn-fe.sh also need to be removed? It uses the test-svn-fe helper which is removed. The Fedora git-svn package has included git-remote-testsvn for years now but no one has ever filed any bug reports about it. I looked at whether it should be packaged last year. I came to the conclusion that while it could be used outside of the test suite it was doubtful it actually was. -- Todd ~~ No one ever went broke underestimating the taste of the American public. -- H. L. Mencken
Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
On Fri, Aug 17, 2018 at 06:16:45PM +0200, Nguyễn Thái Ngọc Duy wrote: The whole patch looks good to me. (I was just sending a different version, but your version is better :-) One minor remark, should the line warning: the following paths have collided start with a capital letter: Warning: the following paths have collided > Paths that only differ in case work fine in a case-sensitive > filesystems, but if those repos are cloned in a case-insensitive one, > you'll get problems. The first thing to notice is "git status" will > never be clean with no indication what exactly is "dirty". > > This patch helps the situation a bit by pointing out the problem at > clone time. Even though this patch talks about case sensitivity, the > patch makes no assumption about folding rules by the filesystem. It > simply observes that if an entry has been already checked out at clone > time when we're about to write a new path, some folding rules are > behind this. > > In the case that we can't rely on filesystem (via inode number) to do > this check, fall back to fspathcmp() which is not perfect but should > not give false positives. > > This patch is tested with vim-colorschemes and Sublime-Gitignore > repositories on a JFS partition with case insensitive support on > Linux. Now even tested under Mac OS/HFS+ [] > ' > > +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file > detection' ' My ambition is to run the test under Windows (both CYGWIN and native) next week, so that we can remove !MINGW and !CYGWIN
Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
Andrei Rybak writes: > I think it would be a good trade-off to allow these helper functions to skip > checking emptiness of arguments for test_cmp. Such patch will require only > s/test_cmp/&_allow_empty/ for these helper functions and it will help catch > cases as bogus test in t5310. > > I'll try something like the following on the weekend: > > test_cmp() { Style: SP before and after (). > if test "$1" != - && ! test -s "$1" > then > echo >&4 "error: trying to compare empty file '$1'" > return 1 > fi > if test "$2" != - && ! test -s "$2" > then > echo >&4 "error: trying to compare empty file '$2'" > return 1 > fi > test_cmp_allow_empty "$@" > } I actually think the above gives way too confusing output, when the actual output is empty and we are expecting some output. I.e. : >expect && git cmd >actual && test_cmp expect actual The tester wants to hear from test_cmp "your 'git cmd' produced some output when we are expecting none" as the primary message. We are trying to find bugs in "git" under development, and diagnosing iffy tests is secondary. But with your change, the first thing that is checked is if 'expect' is an empty file and that is what we get complaints about, without even looking at what is in 'actual'. It's the same story, and it is even worse than the above, when we are expecting some output but the command produced no output, i.e. echo Everything up-to-date. >expect && git cmd >actual && test_cmp expect actual We should get a complaint from test_cmp that 'actual' does not match the string we were expecting (and even better, we show how they are different by running them thru "diff -u"), not that 'actual' is an empty file. > test_cmp_allow_empty() { > $GIT_TEST_CMP "$@" > } > > (I'm not sure about redirections in test lib functions. The two if's would > probably be in a separate function to be re-used by test_i18ncmp.)
Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
On Fri, Aug 17, 2018 at 9:27 PM Andrei Rybak wrote: > > On 17/08/18 19:39, SZEDER Gábor wrote: > > > > See, we have quite a few tests that extract repetitive common tasks > > into helper functions, which sometimes includes preparing the expected > > results and running 'test_cmp', e.g. something like this > > (oversimplified) example: > > > > check_cmd () { > > git cmd $1 >actual && > > echo "$2" >expect && > > test_cmp expect actual > > } > > > > check_cmd --fooFOO > > check_cmd --no-foo "" > > I've only had time to look into this from t0001 up to t0008-ignores.sh, where > test_check_ignore does this. If these helper functions need to allow comparing > empty files -- how about adding special variation of cmp functions for cases > like this: test_cmp_allow_empty and test_i18ncmp_allow_empty? > > I think it would be a good trade-off to allow these helper functions to skip > checking emptiness of arguments for test_cmp. Such patch will require only > s/test_cmp/&_allow_empty/ for these helper functions and it will help catch > cases as bogus test in t5310. > > I'll try something like the following on the weekend: > > test_cmp() { > if test "$1" != - && ! test -s "$1" > then > echo >&4 "error: trying to compare empty file '$1'" > return 1 > fi > if test "$2" != - && ! test -s "$2" > then > echo >&4 "error: trying to compare empty file '$2'" > return 1 > fi Yeah, these conditions are what I instrumented 'test_cmp' with (except I used 'error "bug in test script ..."') to find callsites that could be converted to 'test_must_be_empty', that's how I found the bug fixed in this patch. However, it resulted in a lot of errors from the cases mentioned in my previous email. Then I reached out to Bash and tried this: test_cmp() { if test -n "$BASH_VERSION" && test "${FUNCNAME[1]}" = "test_eval_inner_" && test "$1" != "-" && test ! -s "$1" then error "bug in test script: using test_cmp to check empty file; use test_must_be_empty instead" fi $GIT_TEST_CMP "$@" } i.e. to limit the check callsites where 'test_cmp' is called directly from within a test_expect_{success,failure} block. This is better, almost all errors could be converted to 'test_must_be_empty' stright away; I have the patches almost ready. There are, however, a few parametric cases that choke on this: where we run 'test_cmp' in a loop, e.g. 'cvs update (-p)' in t9400, and I think there was a case where the 'test_expect_success' block is within a function. > test_cmp_allow_empty "$@" > } > > test_cmp_allow_empty() { > $GIT_TEST_CMP "$@" > } > > (I'm not sure about redirections in test lib functions. The two if's would > probably be in a separate function to be re-used by test_i18ncmp.)
[PATCH 2/3] range-diff: make use of different output indicators
This change itself only changes the internal communication and should have no visible effect to the user. We instruct the diff code that produces the inner diffs to use other markers instead of the usual markers for new, old and context lines. Signed-off-by: Stefan Beller --- range-diff.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/range-diff.c b/range-diff.c index b6b9abac266..a906d25f139 100644 --- a/range-diff.c +++ b/range-diff.c @@ -38,6 +38,14 @@ static int read_patches(const char *range, struct string_list *list) argv_array_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges", "--reverse", "--date-order", "--decorate=no", + /* +* Choose indicators that are not used anywhere +* else in diffs, but still look reasonable +* (e.g. will not be confusing when debugging) +*/ + "--output-indicator-new=>", + "--output-indicator-old=<", + "--output-indicator-context=#", "--no-abbrev-commit", range, NULL); cp.out = -1; @@ -108,8 +116,18 @@ static int read_patches(const char *range, struct string_list *list) * we are not interested. */ continue; - else + else if (line.buf[0] == '>') { + strbuf_addch(&buf, '+'); + strbuf_add(&buf, line.buf + 1, line.len - 1); + } else if (line.buf[0] == '<') { + strbuf_addch(&buf, '-'); + strbuf_add(&buf, line.buf + 1, line.len - 1); + } else if (line.buf[0] == '#') { + strbuf_addch(&buf, ' '); + strbuf_add(&buf, line.buf + 1, line.len - 1); + } else { strbuf_addbuf(&buf, &line); + } strbuf_addch(&buf, '\n'); util->diffsize++; -- 2.18.0.265.g16de1b435c9.dirty
[PATCH 1/3] diff.c: add --output-indicator-{new, old, context}
This will prove useful in range-diff in a later patch as we will be able to differentiate between adding a new file (that line is starting with +++ and then the file name) and regular new lines. It could also be useful for experimentation in new patch formats, i.e. we could teach git to emit moved lines with lines other than +/-. Signed-off-by: Stefan Beller --- diff.c | 21 ++--- diff.h | 5 + 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/diff.c b/diff.c index c5c7739ce34..03486c35b75 100644 --- a/diff.c +++ b/diff.c @@ -1281,7 +1281,9 @@ static void emit_diff_symbol_from_struct(struct diff_options *o, else if (c == '-') set = diff_get_color_opt(o, DIFF_FILE_OLD); } - emit_line_ws_markup(o, set_sign, set, reset, ' ', line, len, + emit_line_ws_markup(o, set_sign, set, reset, + o->output_indicators[OUTPUT_INDICATOR_CONTEXT], + line, len, flags & (DIFF_SYMBOL_CONTENT_WS_MASK), 0); break; case DIFF_SYMBOL_PLUS: @@ -1324,7 +1326,9 @@ static void emit_diff_symbol_from_struct(struct diff_options *o, set = diff_get_color_opt(o, DIFF_CONTEXT_BOLD); flags &= ~DIFF_SYMBOL_CONTENT_WS_MASK; } - emit_line_ws_markup(o, set_sign, set, reset, '+', line, len, + emit_line_ws_markup(o, set_sign, set, reset, + o->output_indicators[OUTPUT_INDICATOR_NEW], + line, len, flags & DIFF_SYMBOL_CONTENT_WS_MASK, flags & DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF); break; @@ -1367,7 +1371,9 @@ static void emit_diff_symbol_from_struct(struct diff_options *o, else set = diff_get_color_opt(o, DIFF_CONTEXT_DIM); } - emit_line_ws_markup(o, set_sign, set, reset, '-', line, len, + emit_line_ws_markup(o, set_sign, set, reset, + o->output_indicators[OUTPUT_INDICATOR_OLD], + line, len, flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0); break; case DIFF_SYMBOL_WORDS_PORCELAIN: @@ -4382,6 +4388,9 @@ void diff_setup(struct diff_options *options) options->file = stdout; + options->output_indicators[OUTPUT_INDICATOR_NEW] = '+'; + options->output_indicators[OUTPUT_INDICATOR_OLD] = '-'; + options->output_indicators[OUTPUT_INDICATOR_CONTEXT] = ' '; options->abbrev = DEFAULT_ABBREV; options->line_termination = '\n'; options->break_opt = -1; @@ -4869,6 +4878,12 @@ int diff_opt_parse(struct diff_options *options, options->output_format |= DIFF_FORMAT_DIFFSTAT; } else if (!strcmp(arg, "--no-compact-summary")) options->flags.stat_with_summary = 0; + else if (skip_prefix(arg, "--output-indicator-new=", &arg)) + options->output_indicators[OUTPUT_INDICATOR_NEW] = arg[0]; + else if (skip_prefix(arg, "--output-indicator-old=", &arg)) + options->output_indicators[OUTPUT_INDICATOR_OLD] = arg[0]; + else if (skip_prefix(arg, "--output-indicator-context=", &arg)) + options->output_indicators[OUTPUT_INDICATOR_CONTEXT] = arg[0]; /* renames options */ else if (starts_with(arg, "-B") || diff --git a/diff.h b/diff.h index e1e54256c18..d7edc64454a 100644 --- a/diff.h +++ b/diff.h @@ -194,6 +194,11 @@ struct diff_options { FILE *file; int close_file; +#define OUTPUT_INDICATOR_NEW 0 +#define OUTPUT_INDICATOR_OLD 1 +#define OUTPUT_INDICATOR_CONTEXT 2 + char output_indicators[3]; + struct pathspec pathspec; pathchange_fn_t pathchange; change_fn_t change; -- 2.18.0.265.g16de1b435c9.dirty
[PATCH 0/3] Better colors in range-diff
This improves the colors of range-diff; see the last patch for details. It is also available via git fetch https://github.com/stefanbeller/git sb/range-diff-better-colors Thanks, Stefan Stefan Beller (3): diff.c: add --output-indicator-{new, old, context} range-diff: make use of different output indicators range-diff: indent special lines as context diff.c| 21 ++--- diff.h| 5 + range-diff.c | 22 +- t/t3206-range-diff.sh | 12 ++-- 4 files changed, 50 insertions(+), 10 deletions(-) -- 2.18.0.265.g16de1b435c9.dirty
[PATCH 3/3] range-diff: indent special lines as context
The range-diff coloring is a bit fuzzy when it comes to special lines of a diff, such as indicating new and old files with +++ and ---, as it would pickup the first character and interpret it for its coloring, which seems annoying as in regular diffs, these lines are colored bold via DIFF_METAINFO. By indenting these lines by a white space, they will be treated as context which is much more useful, an example [1] on the range diff series itself: [...] + diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt + new file mode 100644 + --- /dev/null + +++ b/Documentation/git-range-diff.txt +@@ ++git-range-diff(1) [...] + diff --git a/Makefile b/Makefile --- a/Makefile +++ b/Makefile [...] The first lines that introduce the new file for the man page will have the '+' sign colored and the rest of the line will be bold. The later lines that indicate a change to the Makefile will be treated as context both in the outer and inner diff, such that those lines stay regular color. [1] ./git-range-diff pr-1/dscho/branch-diff-v3...pr-1/dscho/branch-diff-v4 These tags are found at https://github.com/gitgitgadget/git Signed-off-by: Stefan Beller --- range-diff.c | 2 ++ t/t3206-range-diff.sh | 12 ++-- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/range-diff.c b/range-diff.c index a906d25f139..3e9b9844012 100644 --- a/range-diff.c +++ b/range-diff.c @@ -90,6 +90,7 @@ static int read_patches(const char *range, struct string_list *list) strbuf_addch(&buf, '\n'); if (!util->diff_offset) util->diff_offset = buf.len; + strbuf_addch(&buf, ' '); strbuf_addbuf(&buf, &line); } else if (in_header) { if (starts_with(line.buf, "Author: ")) { @@ -126,6 +127,7 @@ static int read_patches(const char *range, struct string_list *list) strbuf_addch(&buf, ' '); strbuf_add(&buf, line.buf + 1, line.len - 1); } else { + strbuf_addch(&buf, ' '); strbuf_addbuf(&buf, &line); } diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh index 7dc7c80a1de..c94f9bf5ee1 100755 --- a/t/t3206-range-diff.sh +++ b/t/t3206-range-diff.sh @@ -133,9 +133,9 @@ test_expect_success 'changed message' ' Z +Also a silly comment here! + - Zdiff --git a/file b/file - Z--- a/file - Z+++ b/file + Z diff --git a/file b/file + Z --- a/file + Z +++ b/file 3: 147e64e = 3: b9cb956 s/11/B/ 4: a63e992 = 4: 8add5f1 s/12/B/ EOF @@ -152,9 +152,9 @@ test_expect_success 'dual-coloring' ' : :+Also a silly comment here! :+ - : diff --git a/file b/file - : --- a/file - : +++ b/file + : diff --git a/file b/file + : --- a/file + : +++ b/file :3: 0559556 ! 3: b9cb956 s/11/B/ :@@ -10,7 +10,7 @@ : 9 -- 2.18.0.265.g16de1b435c9.dirty
[PATCH 0/6] reuse on-disk deltas for fetches with bitmaps
This series more aggressively reuses on-disk deltas to serve fetches when reachability bitmaps tell us a more complete picture of what the client has. That saves server CPU and results in smaller packs. See the final patch for numbers and more discussion. It's a resurrection of this very old series: https://public-inbox.org/git/20140326072215.ga31...@sigill.intra.peff.net/ The core idea is good, but it got left as "I should dig into this more to see if we can do even better". In fact, I _did_ do some of that digging, as you can see in the thread, so I'm mildly embarrassed not to have reposted it before now. We've been using that original at GitHub since 2014, which I think helps demonstrate the correctness of the approach (and the numbers here and in that thread show that performance is generally a net win over the status quo). It's rebased on top of the current master, since the original made some assumptions about struct object_entry that are no longer true post-v2.18 (due to the struct-shrinking exercise). So I fixed that and a few other rough edges. But that also means you're not getting code with 4 years of production testing behind it. :) The other really ugly thing in the original was the way it leaked object_entry structs (though in practice that didn't really matter since we needed them until the end of the program anyway). This version fixes that. [1/6]: t/perf: factor boilerplate out of test_perf [2/6]: t/perf: factor out percent calculations [3/6]: t/perf: add infrastructure for measuring sizes [4/6]: t/perf: add perf tests for fetches from a bitmapped server [5/6]: pack-bitmap: save "have" bitmap from walk [6/6]: pack-objects: reuse on-disk deltas for thin "have" objects builtin/pack-objects.c | 28 +++ pack-bitmap.c | 23 +- pack-bitmap.h | 7 +++ pack-objects.c | 19 pack-objects.h | 20 +++- t/perf/README | 25 ++ t/perf/aggregate.perl | 69 ++-- t/perf/p5311-pack-bitmaps-fetch.sh | 45 ++ t/perf/perf-lib.sh | 74 +++--- 9 files changed, 258 insertions(+), 52 deletions(-) create mode 100755 t/perf/p5311-pack-bitmaps-fetch.sh -Peff
[PATCH 1/6] t/perf: factor boilerplate out of test_perf
About half of test_perf() is boilerplate preparing to run _any_ test, and the other half is specifically running a timing test. Let's split it into two functions, so that we can reuse the boilerplate in future commits. Signed-off-by: Jeff King --- Best viewed with "-w". t/perf/perf-lib.sh | 61 ++ 1 file changed, 35 insertions(+), 26 deletions(-) diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh index e4c343a6b7..a54be09516 100644 --- a/t/perf/perf-lib.sh +++ b/t/perf/perf-lib.sh @@ -179,8 +179,8 @@ exit $ret' >&3 2>&4 return "$eval_ret" } - -test_perf () { +test_wrapper_ () { + test_wrapper_func_=$1; shift test_start_ test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq= test "$#" = 2 || @@ -191,35 +191,44 @@ test_perf () { base=$(basename "$0" .sh) echo "$test_count" >>"$perf_results_dir"/$base.subtests echo "$1" >"$perf_results_dir"/$base.$test_count.descr - if test -z "$verbose"; then - printf "%s" "perf $test_count - $1:" - else - echo "perf $test_count - $1:" - fi - for i in $(test_seq 1 $GIT_PERF_REPEAT_COUNT); do - say >&3 "running: $2" - if test_run_perf_ "$2" - then - if test -z "$verbose"; then - printf " %s" "$i" - else - echo "* timing run $i/$GIT_PERF_REPEAT_COUNT:" - fi + base="$perf_results_dir"/"$perf_results_prefix$(basename "$0" .sh)"."$test_count" + "$test_wrapper_func_" "$@" + fi + + test_finish_ +} + +test_perf_ () { + if test -z "$verbose"; then + printf "%s" "perf $test_count - $1:" + else + echo "perf $test_count - $1:" + fi + for i in $(test_seq 1 $GIT_PERF_REPEAT_COUNT); do + say >&3 "running: $2" + if test_run_perf_ "$2" + then + if test -z "$verbose"; then + printf " %s" "$i" else - test -z "$verbose" && echo - test_failure_ "$@" - break + echo "* timing run $i/$GIT_PERF_REPEAT_COUNT:" fi - done - if test -z "$verbose"; then - echo " ok" else - test_ok_ "$1" + test -z "$verbose" && echo + test_failure_ "$@" + break fi - base="$perf_results_dir"/"$perf_results_prefix$(basename "$0" .sh)"."$test_count" - "$TEST_DIRECTORY"/perf/min_time.perl test_time.* >"$base".times + done + if test -z "$verbose"; then + echo " ok" + else + test_ok_ "$1" fi - test_finish_ + "$TEST_DIRECTORY"/perf/min_time.perl test_time.* >"$base".times +} + +test_perf () { + test_wrapper_ test_perf_ "$@" } # We extend test_done to print timings at the end (./run disables this -- 2.18.0.1205.g3878b1e64a
[PATCH 2/6] t/perf: factor out percent calculations
This will let us reuse the code when we add new values to aggregate besides times. Signed-off-by: Jeff King --- t/perf/aggregate.perl | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl index bc865160e7..3181b087ab 100755 --- a/t/perf/aggregate.perl +++ b/t/perf/aggregate.perl @@ -19,21 +19,24 @@ sub get_times { return ($rt, $4, $5); } +sub relative_change { + my ($r, $firstr) = @_; + if ($firstr > 0) { + return sprintf "%+.1f%%", 100.0*($r-$firstr)/$firstr; + } elsif ($r == 0) { + return "="; + } else { + return "+inf"; + } +} + sub format_times { my ($r, $u, $s, $firstr) = @_; if (!defined $r) { return ""; } my $out = sprintf "%.2f(%.2f+%.2f)", $r, $u, $s; - if (defined $firstr) { - if ($firstr > 0) { - $out .= sprintf " %+.1f%%", 100.0*($r-$firstr)/$firstr; - } elsif ($r == 0) { - $out .= " ="; - } else { - $out .= " +inf"; - } - } + $out .= ' ' . relative_change($r, $firstr) if defined $firstr; return $out; } -- 2.18.0.1205.g3878b1e64a
[PATCH 3/6] t/perf: add infrastructure for measuring sizes
The main objective of scripts in the perf framework is to run "test_perf", which measures the time it takes to run some operation. However, it can also be interesting to see the change in the output size of certain operations. This patch introduces test_size, which records a single numeric output from the test and shows it in the aggregated output (with pretty printing and relative size comparison). Signed-off-by: Jeff King --- t/perf/README | 25 ++ t/perf/aggregate.perl | 48 ++- t/perf/perf-lib.sh| 13 3 files changed, 81 insertions(+), 5 deletions(-) diff --git a/t/perf/README b/t/perf/README index 21321a0f36..be12090c38 100644 --- a/t/perf/README +++ b/t/perf/README @@ -168,3 +168,28 @@ that While we have tried to make sure that it can cope with embedded whitespace and other special characters, it will not work with multi-line data. + +Rather than tracking the performance by run-time as `test_perf` does, you +may also track output size by using `test_size`. The stdout of the +function should be a single numeric value, which will be captured and +shown in the aggregated output. For example: + + test_perf 'time foo' ' + ./foo >foo.out + ' + + test_size 'output size' + wc -c ; return undef if not defined $line; close $fh or die "cannot close $name: $!"; - $line =~ /^(?:(\d+):)?(\d+):(\d+(?:\.\d+)?) (\d+(?:\.\d+)?) (\d+(?:\.\d+)?)$/ - or die "bad input line: $line"; - my $rt = ((defined $1 ? $1 : 0.0)*60+$2)*60+$3; - return ($rt, $4, $5); + # times + if ($line =~ /^(?:(\d+):)?(\d+):(\d+(?:\.\d+)?) (\d+(?:\.\d+)?) (\d+(?:\.\d+)?)$/) { + my $rt = ((defined $1 ? $1 : 0.0)*60+$2)*60+$3; + return ($rt, $4, $5); + # size + } elsif ($line =~ /^\d+$/) { + return $&; + } else { + die "bad input line: $line"; + } } sub relative_change { @@ -32,9 +38,15 @@ sub relative_change { sub format_times { my ($r, $u, $s, $firstr) = @_; + # no value means we did not finish the test if (!defined $r) { return ""; } + # a single value means we have a size, not times + if (!defined $u) { + return format_size($r, $firstr); + } + # otherwise, we have real/user/system times my $out = sprintf "%.2f(%.2f+%.2f)", $r, $u, $s; $out .= ' ' . relative_change($r, $firstr) if defined $firstr; return $out; @@ -54,6 +66,25 @@ sub usage { exit(1); } +sub human_size { + my $n = shift; + my @units = ('', qw(K M G)); + while ($n > 900 && @units > 1) { + $n /= 1000; + shift @units; + } + return $n unless length $units[0]; + return sprintf '%.1f%s', $n, $units[0]; +} + +sub format_size { + my ($size, $first) = @_; + # match the width of a time: 0.00(0.00+0.00) + my $out = sprintf '%15s', human_size($size); + $out .= ' ' . 
relative_change($size, $first) if defined $first; + return $out; +} + my (@dirs, %dirnames, %dirabbrevs, %prefixes, @tests, $codespeed, $sortby, $subsection, $reponame); @@ -184,7 +215,14 @@ sub print_default_results { my $firstr; for my $i (0..$#dirs) { my $d = $dirs[$i]; - $times{$prefixes{$d}.$t} = [get_times("$resultsdir/$prefixes{$d}$t.times")]; + my $base = "$resultsdir/$prefixes{$d}$t"; + $times{$prefixes{$d}.$t} = []; + foreach my $type (qw(times size)) { + if (-e "$base.$type") { + $times{$prefixes{$d}.$t} = [get_times("$base.$type")]; + last; + } + } my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}}; my $w = length format_times($r,$u,$s,$firstr); $colwidth[$i] = $w if $w > $colwidth[$i]; diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh index a54be09516..11d1922cf5 100644 --- a/t/perf/perf-lib.sh +++ b/t/perf/perf-lib.sh @@ -231,6 +231,19 @@ test_perf () { test_wrapper_ test_perf_ "$@" } +test_size_ () { + say >&3 "running: $2" + if test_eval_ "$2" 3>"$base".size; then + test_ok_ "$1" + else + test_failure_ "$@" + fi +} + +test_size () { + test_wrapper_ test_size_ "$@" +} + # We extend test_done to print timings at the end (./run disables this # and does it after running everything) test_at_end_hook_ () { -- 2.18.0.1205.g3878b1e64a
[PATCH 4/6] t/perf: add perf tests for fetches from a bitmapped server
A server with bitmapped packs can serve a clone very quickly. However, fetches are not necessarily made any faster, because we spend a lot less time in object traversal (which is what bitmaps help with) and more time finding deltas (because we may have to throw out on-disk deltas if the client does not have the base). As a first step to making this faster, this patch introduces a new perf script to measure fetches into a repo of various ages from a fully-bitmapped server. We separately measure the work done by the server (in pack-objects) and that done by the client (in index-pack). Furthermore, we measure the size of the resulting pack. Breaking it down like this (instead of just doing a regular "git fetch") lets us see how much each side benefits from any changes. And since we know the pack size, if we estimate the network speed, then one could calculate a complete wall-clock time for the operation (though the script does not do this automatically). Signed-off-by: Jeff King --- t/perf/p5311-pack-bitmaps-fetch.sh | 45 ++ 1 file changed, 45 insertions(+) create mode 100755 t/perf/p5311-pack-bitmaps-fetch.sh diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh new file mode 100755 index 00..b04575951f --- /dev/null +++ b/t/perf/p5311-pack-bitmaps-fetch.sh @@ -0,0 +1,45 @@ +#!/bin/sh + +test_description='performance of fetches from bitmapped packs' +. ./perf-lib.sh + +test_perf_default_repo + +test_expect_success 'create bitmapped server repo' ' + git config pack.writebitmaps true && + git config pack.writebitmaphashcache true && + git repack -ad +' + +# simulate a fetch from a repository that last fetched N days ago, for +# various values of N. We do so by following the first-parent chain, +# and assume the first entry in the chain that is N days older than the current +# HEAD is where the HEAD would have been then. +for days in 1 2 4 8 16 32 64 128; do + title=$(printf '%10s' "($days days)") + test_expect_success "setup revs from $days days ago" ' + now=$(git log -1 --format=%ct HEAD) && + then=$(($now - ($days * 86400))) && + tip=$(git rev-list -1 --first-parent --until=$then HEAD) && + { + echo HEAD && + echo ^$tip + } >revs + ' + + test_perf "server $title" ' + git pack-objects --stdout --revs \ +--thin --delta-base-offset \ +tmp.pack + ' + + test_size "size $title" ' + wc -c
[PATCH 5/6] pack-bitmap: save "have" bitmap from walk
When we do a bitmap walk, we save the result, which represents (WANTs & ~HAVEs); i.e., every object we care about visiting in our walk. However, we throw away the haves bitmap, which can sometimes be useful, too. Save it and provide an access function so code which has performed a walk can query it. A few notes on the accessor interface: - the bitmap code calls these "haves" because it grew out of the want/have negotiation for fetches. But really, these are simply the objects that would be flagged UNINTERESTING in a regular traversal. Let's use that more universal nomenclature for the external module interface. We may want to change the internal naming inside the bitmap code, but that's outside the scope of this patch. - it still uses a bare "sha1" rather than "oid". That's true of all of the bitmap code. And in this particular instance, our caller in pack-objects is dealing with the bare sha1 that comes from a packed REF_DELTA (we're pointing directly to the mmap'd pack on disk). That's something we'll have to deal with as we transition to a new hash, but we can wait and see how the caller ends up being fixed and adjust this interface accordingly. Signed-off-by: Jeff King --- Funny story: the earlier version of this series called it bitmap_have(). That caused a bug later when somebody tried to build on it, thinking it was "does the bitmap have this object in the result". Oops. Hence the more descriptive name. pack-bitmap.c | 23 ++- pack-bitmap.h | 7 +++ 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index f0a1937a1c..76fd93a3de 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -86,6 +86,9 @@ struct bitmap_index { /* Bitmap result of the last performed walk */ struct bitmap *result; + /* "have" bitmap from the last performed walk */ + struct bitmap *haves; + /* Version of the bitmap index */ unsigned int version; @@ -759,8 +762,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) bitmap_and_not(wants_bitmap, haves_bitmap); bitmap_git->result = wants_bitmap; + bitmap_git->haves = haves_bitmap; - bitmap_free(haves_bitmap); return bitmap_git; cleanup: @@ -1114,5 +1117,23 @@ void free_bitmap_index(struct bitmap_index *b) free(b->ext_index.objects); free(b->ext_index.hashes); bitmap_free(b->result); + bitmap_free(b->haves); free(b); } + +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git, +const unsigned char *sha1) +{ + int pos; + + if (!bitmap_git) + return 0; /* no bitmap loaded */ + if (!bitmap_git->haves) + return 0; /* walk had no "haves" */ + + pos = bitmap_position_packfile(bitmap_git, sha1); + if (pos < 0) + return 0; + + return bitmap_get(bitmap_git->haves, pos); +} diff --git a/pack-bitmap.h b/pack-bitmap.h index 4555907dee..02a60ce670 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping khash_sha1 *reused_bitmaps, int show_progress); void free_bitmap_index(struct bitmap_index *); +/* + * After a traversal has been performed on the bitmap_index, this can be + * queried to see if a particular object was reachable from any of the + * objects flagged as UNINTERESTING. + */ +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned char *sha1); + void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, -- 2.18.0.1205.g3878b1e64a
[PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects
When we serve a fetch, we pass the "wants" and "haves" from the fetch negotiation to pack-objects. That tells us not only which objects we need to send, but we also use the boundary commits as "preferred bases": their trees and blobs are candidates for delta bases, both for reusing on-disk deltas and for finding new ones.

However, this misses some opportunities. Modulo some special cases like shallow or partial clones, we know that every object reachable from the "haves" could be a preferred base. We don't use them all for two reasons:

  1. It's expensive to traverse the whole history and enumerate all of the objects the other side has.

  2. The delta search is expensive, so we want to keep the number of candidate bases sane. The boundary commits are the most likely to work.

When we have reachability bitmaps, though, reason 1 no longer applies. We can efficiently compute the set of reachable objects on the other side (and in fact already did so as part of the bitmap set-difference to get the list of interesting objects). And using this set conveniently covers the shallow and partial cases, since we have to disable the use of bitmaps for those anyway.

The second reason argues against using these bases in the search for new deltas. But there's one case where we can use this information for free: when we have an existing on-disk delta that we're considering reusing, we can do so if we know the other side has the base object. This in fact saves time during the delta search, because it's one less delta we have to compute.

And that's exactly what this patch does: when we're considering whether to reuse an on-disk delta, if bitmaps tell us the other side has the object (and we're making a thin-pack), then we reuse it.

Here are the results on p5311 using linux.git, which simulates a client fetching after `N` days since their last fetch:

  Test                          origin              HEAD
  -------------------------------------------------------------------------
  5311.3:  server   (1 days)     0.27(0.27+0.04)     0.12(0.09+0.03)  -55.6%
  5311.4:  size     (1 days)                0.9M              237.0K  -73.7%
  5311.5:  client   (1 days)     0.04(0.05+0.00)     0.10(0.10+0.00) +150.0%
  5311.7:  server   (2 days)     0.34(0.42+0.04)     0.13(0.10+0.03)  -61.8%
  5311.8:  size     (2 days)                1.5M              347.7K  -76.5%
  5311.9:  client   (2 days)     0.07(0.08+0.00)     0.16(0.15+0.01) +128.6%
  5311.11: server   (4 days)     0.56(0.77+0.08)     0.13(0.10+0.02)  -76.8%
  5311.12: size     (4 days)                2.8M              566.6K  -79.8%
  5311.13: client   (4 days)     0.13(0.15+0.00)     0.34(0.31+0.02) +161.5%
  5311.15: server   (8 days)     0.97(1.39+0.11)     0.30(0.25+0.05)  -69.1%
  5311.16: size     (8 days)                4.3M                1.0M  -76.0%
  5311.17: client   (8 days)     0.20(0.22+0.01)     0.53(0.52+0.01) +165.0%
  5311.19: server  (16 days)     1.52(2.51+0.12)     0.30(0.26+0.03)  -80.3%
  5311.20: size    (16 days)                8.0M                2.0M  -74.5%
  5311.21: client  (16 days)     0.40(0.47+0.03)     1.01(0.98+0.04) +152.5%
  5311.23: server  (32 days)     2.40(4.44+0.20)     0.31(0.26+0.04)  -87.1%
  5311.24: size    (32 days)               14.1M                4.1M  -70.9%
  5311.25: client  (32 days)     0.70(0.90+0.03)     1.81(1.75+0.06) +158.6%
  5311.27: server  (64 days)    11.76(26.57+0.29)    0.55(0.50+0.08)  -95.3%
  5311.28: size    (64 days)               89.4M               47.4M  -47.0%
  5311.29: client  (64 days)     5.71(9.31+0.27)    15.20(15.20+0.32) +166.2%
  5311.31: server (128 days)    16.15(36.87+0.40)    0.91(0.82+0.14)  -94.4%
  5311.32: size   (128 days)              134.8M              100.4M  -25.5%
  5311.33: client (128 days)     9.42(16.86+0.49)   25.34(25.80+0.46) +169.0%

In all cases we save CPU time on the server (sometimes significant) and the resulting pack is smaller. We do spend more CPU time on the client side, because it has to reconstruct more deltas. But that's the right tradeoff to make, since clients tend to outnumber servers.
It just means the thin pack mechanism is doing its job. From the user's perspective, the end-to-end time of the operation will generally be faster. E.g., in the 128-day case, we saved 15s on the server at a cost of 16s on the client. Since the resulting pack is 34MB smaller, this is a net win if the network speed is less than 270Mbit/s. And that's actually the worst case. The 64-day case saves just over 11s at a cost of just under 11s. So it's a slight win at any network speed, and the 40MB saved is pure bonus. That trend continues for the smaller fetches.

The implementation itself is mostly straightforward, with the new logic going into check_object(). But there are two tricky bits.

The first is that check_object() needs access to the relevant information (the thin flag and bitmap result). We can do this by pushing these into program-lifetime globals.

The second is that the rest of the code assumes that any reused delta will point to another "struct object_entry" as its base. But by definition, we don't have such an entry!
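The 270Mbit/s break-even figure follows directly from the rounded numbers in the paragraph above; here is a standalone sanity check of that arithmetic (illustrative only, not git code):

#include <stdio.h>

int main(void)
{
        double server_saved = 15.0;  /* seconds saved in pack-objects */
        double client_cost = 16.0;   /* extra seconds spent in index-pack */
        double bytes_saved = 34e6;   /* the pack is ~34MB smaller */

        /* the smaller pack must recoup roughly one second of extra CPU */
        double net_cpu_cost = client_cost - server_saved;
        double break_even = bytes_saved / net_cpu_cost;  /* bytes per second */

        printf("break-even link speed: ~%.0f Mbit/s\n", break_even * 8 / 1e6);
        return 0;
}

Anything slower than that, and the smaller pack wins outright.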
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore wrote: > > Teach list-objects the "tree:0" filter which allows for filtering > out all tree and blob objects (unless other objects are explicitly > specified by the user). The purpose of this patch is to allow smaller > partial clones. > > The name of this filter - tree:0 - does not explicitly specify that > it also filters out all blobs, but this should not cause much confusion > because blobs are not at all useful without the trees that refer to > them. > > I also consider only:commits as a name, but this is inaccurate because > it suggests that annotated tags are omitted, but actually they are > included. Speaking of tag objects, it is possible to tag anything, including blobs. Would a blob that is tagged (hence reachable without a tree) be not filtered by tree:0 (or in the future any deeper depth) ? I found this series a good read, despite my unfamiliarity of the partial cloning. One situation where I scratched my head for a second were previous patches that use "test_line_count = 0 rev_list_err" whereas using test_must_be_empty would be an equally good choice (I am more used to the latter than the former) Thanks, Stefan
Re: [PATCH 0/3] Better colors in range-diff
Stefan Beller writes: > This improves colors of the range-diff, see last patch for details. How does this relate to your other "color with range-diff" topic that is still in flight? This supersedes it? Builds on it? Something else? Thanks.
git-bug: Distributed bug tracker embedded in git
Hi everyone, today I released git-bug, a distributed bug tracker that embeds in git. It uses git's internal storage to store bug information in a way that can be merged without conflict. You can push/pull to the normal git remote you are already using to interact with other people. Normal code and bugs are completely separated and no files are added in the regular branches. Someone suggested in the Hacker News thread [0] to post it here as well. The project is here [1]. It's an all-in-one binary that is picked up by git as a porcelain command. It features a set of CLI commands for simple interaction, an interactive terminal UI and a rich web UI. For more information about the internal design, please read this document [2]. In short, bugs are stored as a series of edit operations stored in git blobs and assembled in a linear chain of commits. This allows conflict-free merges and keeps the regular branches free of bug data. Media embedding is also possible but not yet finished. I'd love to have some feedback from you. Contributions are also very much welcome. Best regards, [0]: https://news.ycombinator.com/item?id=17782121 [1]: https://github.com/MichaelMure/git-bug [2]: https://github.com/MichaelMure/git-bug/blob/master/doc/model.md -- Michael Muré
Re: [PATCH 0/3] Better colors in range-diff
On Fri, Aug 17, 2018 at 3:04 PM Junio C Hamano wrote: > > Stefan Beller writes: > > > This improves colors of the range-diff, see last patch for details. > > How does this relate to your other "color with range-diff" topic > that is still in flight? This supersedes it? Builds on it? > Something else? It builds on top. Sorry about missing the obvious, Stefan
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
On Fri, Aug 17, 2018 at 2:42 PM Stefan Beller wrote: > > On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore wrote: > > > > Teach list-objects the "tree:0" filter which allows for filtering > > out all tree and blob objects (unless other objects are explicitly > > specified by the user). The purpose of this patch is to allow smaller > > partial clones. > > > > The name of this filter - tree:0 - does not explicitly specify that > > it also filters out all blobs, but this should not cause much confusion > > because blobs are not at all useful without the trees that refer to > > them. > > > > I also consider only:commits as a name, but this is inaccurate because > > it suggests that annotated tags are omitted, but actually they are > > included. > > Speaking of tag objects, it is possible to tag anything, including blobs. > Would a blob that is tagged (hence reachable without a tree) be not > filtered by tree:0 (or in the future any deeper depth) ? I think so. If I try to fetch a tagged tree or blob, it should fetch that object itself, since I'm referring to it explicitly in the git pack-objects arguments (I mention fetch since git rev-list apparently doesn't support specifying non-commits on the command line). This is similar to how I can fetch a commit that would otherwise be filtered *if* I specify it explicitly (rather than a child commit). If you're fetching a tagged tree, then for depth=0, it will fetch the given tree only, and not fetch any referents of an explicitly-given tree. For depth=1, it will fetch all direct referents. If you're fetching a commit, then for depth=0, you will not get any tree objects, and for depth=1, you'll get only the root tree object and none of its referrents. So the commit itself is like a "layer" in the depth count. > > I found this series a good read, despite my unfamiliarity of the > partial cloning. > > One situation where I scratched my head for a second were previous patches > that use "test_line_count = 0 rev_list_err" whereas using test_must_be_empty > would be an equally good choice (I am more used to the latter than the former) Done. Here is an interdiff (sorry, the tab characters are not maintained by my mail client): diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index a1b93c72c..7e2f7ff26 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -200,14 +200,14 @@ test_expect_success 'missing tree objects with --missing=allow-promisor and --ex git -C repo config extensions.partialclone "arbitrary string" && git -C repo rev-list --missing=allow-promisor --objects HEAD >objs 2>rev_list_err && - test_line_count = 0 rev_list_err && + test_must_be_empty rev_list_err && # 3 commits, 3 blobs, and 1 tree test_line_count = 7 objs && # Do the same for --exclude-promisor-objects, but with all trees gone. 
promise_and_delete $(git -C repo rev-parse baz^{tree}) && git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err && - test_line_count = 0 rev_list_err && + test_must_be_empty rev_list_err && # 3 commits, no blobs or trees test_line_count = 3 objs ' @@ -226,7 +226,7 @@ test_expect_success 'missing non-root tree object and rev-list' ' git -C repo config extensions.partialclone "arbitrary string" && git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err && - test_line_count = 0 rev_list_err && + test_must_be_empty rev_list_err && # 1 commit and 1 tree test_line_count = 2 objs ' diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh index 30bf1c73e..27040d73a 100755 --- a/t/t6112-rev-list-filters-objects.sh +++ b/t/t6112-rev-list-filters-objects.sh @@ -206,11 +206,11 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre test_cmp expected missing_objs && # do not complain when a missing tree cannot be parsed - test_line_count = 0 rev_list_err && + test_must_be_empty rev_list_err && git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err && ! grep $TREE objs && - test_line_count = 0 rev_list_err + test_must_be_empty rev_list_err ' # Test tree:0 filter. > > Thanks, > Stefan
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
On Fri, Aug 17, 2018 at 3:20 PM Matthew DeVore wrote: > > On Fri, Aug 17, 2018 at 2:42 PM Stefan Beller wrote: > > > > On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore wrote: > > > > > > Teach list-objects the "tree:0" filter which allows for filtering > > > out all tree and blob objects (unless other objects are explicitly > > > specified by the user). The purpose of this patch is to allow smaller > > > partial clones. > > > > > > The name of this filter - tree:0 - does not explicitly specify that > > > it also filters out all blobs, but this should not cause much confusion > > > because blobs are not at all useful without the trees that refer to > > > them. > > > > > > I also consider only:commits as a name, but this is inaccurate because > > > it suggests that annotated tags are omitted, but actually they are > > > included. > > > > Speaking of tag objects, it is possible to tag anything, including blobs. > > Would a blob that is tagged (hence reachable without a tree) be not > > filtered by tree:0 (or in the future any deeper depth) ? > I think so. If I try to fetch a tagged tree or blob, it should fetch > that object itself, since I'm referring to it explicitly in the git > pack-objects arguments (I mention fetch since git rev-list apparently > doesn't support specifying non-commits on the command line). This is > similar to how I can fetch a commit that would otherwise be filtered > *if* I specify it explicitly (rather than a child commit). > > If you're fetching a tagged tree, then for depth=0, it will fetch the > given tree only, and not fetch any referents of an explicitly-given > tree. For depth=1, it will fetch all direct referents. > > If you're fetching a commit, then for depth=0, you will not get any > tree objects, and for depth=1, you'll get only the root tree object > and none of its referrents. So the commit itself is like a "layer" in > the depth count. That seems smart. Thanks! > > > > > I found this series a good read, despite my unfamiliarity of the > > partial cloning. > > > > One situation where I scratched my head for a second were previous patches > > that use "test_line_count = 0 rev_list_err" whereas using > > test_must_be_empty > > would be an equally good choice (I am more used to the latter than the > > former) > > Done. Here is an interdiff (sorry, the tab characters are not > maintained by my mail client): heh. Thanks for switching the style; I should have emphasized that (after reflection) I found them equally good, I am used to one over the other more. So if that is the only issue brought up, I would not even ask for a resend. Thanks, Stefan
Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk
> diff --git a/pack-bitmap.h b/pack-bitmap.h > index 4555907dee..02a60ce670 100644 > --- a/pack-bitmap.h > +++ b/pack-bitmap.h > @@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct > packing_data *mapping > khash_sha1 *reused_bitmaps, int show_progress); > void free_bitmap_index(struct bitmap_index *); > > +/* > + * After a traversal has been performed on the bitmap_index, this can be > + * queried to see if a particular object was reachable from any of the > + * objects flagged as UNINTERESTING. If the traversal has not been performed, we pretend the object was not reachable? Is this a good API design, as it can be used when you do not have done all preparations? similarly to prepare_bitmap_walk we could have if (!bitmap_git->result) BUG("failed to perform bitmap walk before querying"); > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned > char *sha1); You seem to have rebased it to master resolving conflicts only. ;-) Do we want to talk about object ids here instead? (This is what I get to think about when reviewing this series "bottom up". I use "git log -w -p master..HEAD" after applying the patches, probably I should also use --reverse, such that I get to see the commit message before the code for each commit and yet only need to scroll in one direction.)
What's cooking in git.git (Aug 2018, #04; Fri, 17)
Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'. The ones marked with '.' do not appear in any of the integration branches, but I am still holding onto them. Quite a many topics have graduated to 'master', and also a handful of topics have entered 'next'. I am planning to tag -rc0 over the weekend, and some topics that are in 'next' and marked for 'master' in this issue of "What's cooking" report may be reclassified to cook in 'next' during the pre-release period when that happens. Usually, I refrain from merging larger topics in 'next' down to 'master' when we get close to -rc0, but I am wondering if it is better to merge all of them to 'master', even the ones on the larger and possibly undercooked side, expecting that we collectively spend effort on hunting and fixing bugs in them during the pre-release freeze period. If we were to go that route, I'd want everybody's buy-in and I'll promise to ignore any shiny new toys that appear on list that are not regression fixes to topics merged to 'master' since the end of the previous cycle to make sure people are not distracted. You can find the changes described here in the integration branches of the repositories listed at http://git-blame.blogspot.com/p/git-public-repositories.html -- [Graduated to "master"] * ab/fetch-nego (2018-08-01) 3 commits (merged to 'next' on 2018-08-08 at 87662bb344) + fetch doc: cross-link two new negotiation options + negotiator: unknown fetch.negotiationAlgorithm should error out + Merge branch 'jt/fetch-nego-tip' into ab/fetch-nego Update to a few other topics around 'git fetch'. * ab/fsck-transfer-updates (2018-07-27) 10 commits (merged to 'next' on 2018-08-08 at d92085269f) + fsck: test and document unknown fsck. values + fsck: add stress tests for fsck.skipList + fsck: test & document {fetch,receive}.fsck.* config fallback + fetch: implement fetch.fsck.* + transfer.fsckObjects tests: untangle confusing setup + config doc: elaborate on fetch.fsckObjects security + config doc: elaborate on what transfer.fsckObjects does + config doc: unify the description of fsck.* and receive.fsck.* + config doc: don't describe *.fetchObjects twice + receive.fsck. tests: remove dead code The test performed at the receiving end of "git push" to prevent bad objects from entering repository can be customized via receive.fsck.* configuration variables; we now have gained a counterpart to do the same on the "git fetch" side, with fetch.fsck.* configuration variables. * ab/sha1dc (2018-08-02) 1 commit (merged to 'next' on 2018-08-08 at 920c190941) + sha1dc: update from upstream AIX portability update for the SHA1DC hash, imported from upstream. * ab/test-must-be-empty (2018-07-30) 1 commit (merged to 'next' on 2018-08-08 at 06599ebd1f) + tests: make use of the test_must_be_empty function Test updates. * ar/t4150-am-scissors-test-fix (2018-08-06) 1 commit (merged to 'next' on 2018-08-08 at e639183205) + t4150: fix broken test for am --scissors Test fix. * en/abort-df-conflict-fixes (2018-07-31) 2 commits (merged to 'next' on 2018-08-08 at a19cad0bb7) + read-cache: fix directory/file conflict handling in read_index_unmerged() + t1015: demonstrate directory/file conflict recovery failures "git merge --abort" etc. did not clean things up properly when there were conflicted entries in the index in certain order that are involved in D/F conflicts. This has been corrected. 
* en/t3031-title-fix (2018-08-06) 1 commit (merged to 'next' on 2018-08-08 at 3913b03884) + t3031: update test description to mention desired behavior Test fix. * es/rebase-i-author-script-fix (2018-07-31) 4 commits (merged to 'next' on 2018-08-08 at 6b34261b72) + sequencer: don't die() on bogus user-edited timestamp + sequencer: fix "rebase -i --root" corrupting author header timestamp + sequencer: fix "rebase -i --root" corrupting author header timezone + sequencer: fix "rebase -i --root" corrupting author header (this branch is used by pw/rebase-i-author-script-fix.) The "author-script" file "git rebase -i" creates got broken when we started to move the command away from shell script, which is getting fixed now. * es/want-color-fd-defensive (2018-08-03) 1 commit (merged to 'next' on 2018-08-08 at a11d90d26f) + color: protect against out-of-bounds reads and writes Futureproofing a helper function that can easily be misused. * hn/config-in-code-comment (2018-08-06) 1 commit (merged to 'next' on 2018-08-08 at 1fae946a0f) + config: document git config getter return value Header update. * jk/diff-rendered-docs (2018-08-06) 1 commit (merged to 'next' on 2018-08-08 at fe6e1b4dbe) + add a script to diff rendered documentation The end result of documentation update has been made to be inspected more easily to help developers. * jk
Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk
On Fri, Aug 17, 2018 at 03:39:29PM -0700, Stefan Beller wrote: > > diff --git a/pack-bitmap.h b/pack-bitmap.h > > index 4555907dee..02a60ce670 100644 > > --- a/pack-bitmap.h > > +++ b/pack-bitmap.h > > @@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, > > struct packing_data *mapping > > khash_sha1 *reused_bitmaps, int show_progress); > > void free_bitmap_index(struct bitmap_index *); > > > > +/* > > + * After a traversal has been performed on the bitmap_index, this can be > > + * queried to see if a particular object was reachable from any of the > > + * objects flagged as UNINTERESTING. > > If the traversal has not been performed, we pretend the > object was not reachable? If the traversal hasn't been performed, the results are not defined (though I suspect yeah, it happens to say "no"). > Is this a good API design, as it can be used when you do not > have done all preparations? similarly to prepare_bitmap_walk > we could have > > if (!bitmap_git->result) > BUG("failed to perform bitmap walk before querying"); That seems like a reasonable precaution. > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned > > char *sha1); > > You seem to have rebased it to master resolving conflicts only. ;-) > Do we want to talk about object ids here instead? See the discussion in the commit message. -Peff
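For concreteness, folding the suggested precaution into the accessor from patch 5/6 would presumably look something like this (a sketch against the code quoted earlier, not necessarily the final version):

int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
                                     const unsigned char *sha1)
{
        int pos;

        if (!bitmap_git)
                return 0; /* no bitmap loaded */
        if (!bitmap_git->result)
                BUG("failed to perform bitmap walk before querying");
        if (!bitmap_git->haves)
                return 0; /* walk had no "haves" */

        pos = bitmap_position_packfile(bitmap_git, sha1);
        if (pos < 0)
                return 0;

        return bitmap_get(bitmap_git->haves, pos);
}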
Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects
On Fri, Aug 17, 2018 at 2:06 PM Jeff King wrote: > > When we serve a fetch, we pass the "wants" and "haves" from > the fetch negotiation to pack-objects. That tells us not > only which objects we need to send, but we also use the > boundary commits as "preferred bases": their trees and blobs > are candidates for delta bases, both for reusing on-disk > deltas and for finding new ones. > > However, this misses some opportunities. Modulo some special > cases like shallow or partial clones, we know that every > object reachable from the "haves" could be a preferred base. > We don't use them all for two reasons: s/all/at all/ ? > The first is that check_object() needs access to the > relevant information (the thin flag and bitmap result). We > can do this by pushing these into program-lifetime globals. I discussed internally if extending the fetch protocol to include submodule packs would be a good idea, as then you can get all the superproject+submodule updates via one connection. This gives some benefits, such as a more consistent view from the superproject as well as already knowing the have/wants for the submodule. With this background story, moving things into globals makes me sad, but I guess we can flip this decision once we actually move towards "submodule packs in the main connection". > > The second is that the rest of the code assumes that any > reused delta will point to another "struct object_entry" as > its base. But by definition, we don't have such an entry! I got lost here by the definition (which def?). The delta that we look up from the bitmap, doesn't may not be in the pack, but it could be based off of an object the client already has in its object store and for that there is no struct object_entry in memory. Is that correct? > So taking all of those options into account, what I ended up > with is a separate list of "external bases" that are not > part of the main packing list. Each delta entry that points > to an external base has a single-bit flag to do so; we have a > little breathing room in the bitfield section of > object_entry. > > This lets us limit the change primarily to the oe_delta() > and oe_set_delta_ext() functions. And as a bonus, most of > the rest of the code does not consider these dummy entries > at all, saving both runtime CPU and code complexity. Thanks, Stefan
Re: git-bug: Distributed bug tracker embedded in git
I really like this idea. I've often wanted an integrated bug database like this. My solution has always been to have a subrepo storing bug reports and comments in .txt files and then using bash porcelain scripts to make a git-like interface. I think I like this better. My only nit is Go. That makes me sad. Someone should re-implement this in C or Rust. //t??
Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects
On Fri, Aug 17, 2018 at 03:57:18PM -0700, Stefan Beller wrote: > On Fri, Aug 17, 2018 at 2:06 PM Jeff King wrote: > > > > When we serve a fetch, we pass the "wants" and "haves" from > > the fetch negotiation to pack-objects. That tells us not > > only which objects we need to send, but we also use the > > boundary commits as "preferred bases": their trees and blobs > > are candidates for delta bases, both for reusing on-disk > > deltas and for finding new ones. > > > > However, this misses some opportunities. Modulo some special > > cases like shallow or partial clones, we know that every > > object reachable from the "haves" could be a preferred base. > > We don't use them all for two reasons: > > s/all/at all/ ? No, I meant "we don't use all of them". As in the Pokemon "gotta catch 'em all" slogan. ;) Probably writing out "all of them" is better. > > The first is that check_object() needs access to the > > relevant information (the thin flag and bitmap result). We > > can do this by pushing these into program-lifetime globals. > > I discussed internally if extending the fetch protocol to include > submodule packs would be a good idea, as then you can get all > the superproject+submodule updates via one connection. This > gives some benefits, such as a more consistent view from the > superproject as well as already knowing the have/wants for > the submodule. > > With this background story, moving things into globals > makes me sad, but I guess we can flip this decision once > we actually move towards "submodule packs in the > main connection". I don't think it significantly changes the existing code, which is already relying on a ton of globals (most notably to_pack). The first step in doing multiple packs in the same process is going to be to shove all of that into a "struct pack_objects_context" or similar, and these can just follow the rest. > > > > The second is that the rest of the code assumes that any > > reused delta will point to another "struct object_entry" as > > its base. But by definition, we don't have such an entry! > > I got lost here by the definition (which def?). > > The delta that we look up from the bitmap, doesn't may > not be in the pack, but it could be based off of an object > the client already has in its object store and for that > there is no struct object_entry in memory. > > Is that correct? Right, we are interested in objects that we _couldn't_ find a struct for. I agree this could be more clear. -Peff
[PATCH] t2024: mark test using "checkout -p" with PERL prerequisite
checkout with the -p switch uses the "add interactive" framework which is written in Perl. Add a PERL prerequisite to skip this test when built with NO_PERL. Signed-off-by: CB Bailey --- t/t2024-checkout-dwim.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t2024-checkout-dwim.sh b/t/t2024-checkout-dwim.sh index f79dfbbdd6..0e512c3066 100755 --- a/t/t2024-checkout-dwim.sh +++ b/t/t2024-checkout-dwim.sh @@ -75,7 +75,7 @@ test_expect_success 'checkout of branch from multiple remotes fails #1' ' test_branch master ' -test_expect_success 'checkout of branch from multiple remotes fails with advice' ' +test_expect_success PERL 'checkout of branch from multiple remotes fails with advice' ' git checkout -B master && test_might_fail git branch -D foo && test_must_fail git checkout foo 2>stderr && -- 2.14.3 (Apple Git-98)
Re: [RFC/PATCH] drop vcs-svn experiment
Hi, Jeff King wrote: > The code in vcs-svn was started in 2010 as an attempt to > build a remote-helper for interacting with svn repositories > (as opposed to git-svn). However, we never got as far as > shipping a mature remote helper, and the last substantive > commit was e99d012a6bc in 2012. I do use svn-fe occasionally, and have done so in the past few years. That said, it's probably not worth keeping this in tree just for me. > We do have a git-remote-testsvn, and it is even installed as > part of "make install". At a minimum, we should stop doing that. [...] > We also ship contrib/svn-fe, which builds on the vcs-svn > work. However, it does not seem to build out of the box for > me, as the link step misses some required libraries for > using libgit.a. What libraries do you mean? It builds and runs fine for me with $ git diff diff --git i/contrib/svn-fe/Makefile w/contrib/svn-fe/Makefile index e8651aaf4b5..bd709f8d83b 100644 --- i/contrib/svn-fe/Makefile +++ w/contrib/svn-fe/Makefile @@ -4,7 +4,7 @@ CC = cc RM = rm -f MV = mv -CFLAGS = -g -O2 -Wall +CFLAGS = -g -O2 -Wall -pthread LDFLAGS = EXTLIBS = -lz which appears to be platform related, not due to some internal change in Git. [...] > Of course, I could be completely wrong about people using this. Maybe > svn-fe builds are just completely broken on my system, and maybe people > really do use testsvn::. But if so, they certainly aren't talking about > it on the mailing list. :) My take: - svn-fe works fine and has been useful to me, though its Makefile could likely be simplified and made more user-friendly - I've benefited from the test coverage of having this in-tree - testsvn:: is a demo and at a minimum we ought not to install it with "make install" - keeping this in-tree for the benefit of just one user is excessive, so removing it is probably the right thing - it would be nice if the commit removing this code from Git includes a note to help people find its new home Would you mind holding off until I'm able to arrange that last bit? Thanks, Jonathan
Re: git-bug: Distributed bug tracker embedded in git
Hi, Michael Muré wrote: > I released today git-bug, a distributed bug tracker that embeds in > git. It use git's internal storage to store bugs information in a way > that can be merged without conflict. You can push/pull to the normal > git remote you are already using to interact with other people. Normal > code and bugs are completely separated and no files are added in the > regular branches. I am a bit unhappy about the namespace grab. Not for trademark reasons: the Git trademark rules are pretty clear about this kind of usage being okay. Instead, the unhappiness comes because a future Git command like "git bug" to produce a bug report with appropriate diagnostics for a bug in Git seems like a likely and useful thing to get added to Git some day. And now the name's taken. Is it too late to ask if it's possible to come up with a less generic name? Separately from that, I'm happy to see progress being made in the distributed bug tracker world; thanks for that! Thanks, Jonathan
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
(-cc: my @google.com email) Hi, Junio C Hamano wrote: > Subject: sideband: do not read beyond the end of input > > The caller of maybe_colorize_sideband() gives a counted buffer > , but the callee checked *src as if it were a NUL terminated > buffer. If src[] had all isspace() bytes in it, we would have made > n negative, and then (1) called number of strncasecmp() to see if > the remaining bytes in src[] matched keywords, reading beyond the > end of the array, and/or (2) called strbuf_add() with negative > count, most likely triggering the "you want to use way too much > memory" error due to unsigned integer overflow. > > Signed-off-by: Junio C Hamano > --- > sideband.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) This indeed avoids the "you want to use way too much memory" error when I apply it. > --- a/sideband.c > +++ b/sideband.c > @@ -75,7 +75,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, > const char *src, int n) Not about this patch: should the 'n' parameter be a size_t instead of an int? It doesn't matter in practice (since the caller has an int, it can never be more than INT_MAX) but it might make the intent clearer. Based on inspecting the caller, using an int seems fine. > return; > } > > - while (isspace(*src)) { > + while (0 < n && isspace(*src)) { Yes, we need to check 'n && isspace(*src)' to avoid overflowing the buffer if it consists entirely of spaces. > strbuf_addch(dest, *src); > src++; > n--; > @@ -84,6 +84,9 @@ static void maybe_colorize_sideband(struct strbuf *dest, > const char *src, int n) > for (i = 0; i < ARRAY_SIZE(keywords); i++) { > struct keyword_entry *p = keywords + i; > int len = strlen(p->keyword); > + > + if (n <= len) > + continue; Using <= instead of < since we look at the character after the word as well. Good. > /* >* Match case insensitively, so we colorize output from existing >* servers regardless of the case that they use for their >* messages. We only highlight the word precisely, so >* "successful" stays uncolored. >*/ > if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) { Not about this patch: should this check "&& src[len] == ':'" instead, as discussed upthread? > @@ -100,8 +103,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, > const char *src, int n) > } > } > > - strbuf_add(dest, src, n); > + if (0 < n) > + strbuf_add(dest, src, n); This check seems unnecessary. strbuf_add can cope fine with !n. Should we put assert(n >= 0); or even if (n < 0) BUG(); instead, since the earlier part of the fix guarantees that n >= 0? Thanks for the careful work. With or without such a change, Reviewed-by: Jonathan Nieder Thanks.
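The "way too much memory" symptom described in the patch is the usual signed-to-unsigned trap: strbuf_add() takes a size_t count, so a negative int wraps around to an enormous value. A tiny standalone illustration (not git code):

#include <stdio.h>

int main(void)
{
        int n = -1;        /* what the old loop could leave behind */
        size_t len = n;    /* implicit conversion wraps around */

        /* prints 18446744073709551615 on a typical 64-bit system */
        printf("requested length: %zu\n", len);
        return 0;
}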
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
On Sat, Aug 18, 2018 at 12:44 AM Junio C Hamano wrote: > * nd/unpack-trees-with-cache-tree (2018-08-13) 5 commits > - unpack-trees: reuse (still valid) cache-tree from src_index > - unpack-trees: reduce malloc in cache-tree walk > - unpack-trees: optimize walking same trees with cache-tree > - unpack-trees: add performance tracing > - trace.h: support nested performance tracing > > The unpack_trees() API used in checking out a branch and merging > walks one or more trees along with the index. When the cache-tree > in the index tells us that we are walking a tree whose flattened > contents is known (i.e. matches a span in the index), as linearly > scanning a span in the index is much more efficient than having to > open tree objects recursively and listing their entries, the walk > can be optimized, which is done in this topic. > > Will merge to and cook in 'next'. Please hold. I was going to address your "The last step feels a bit scary" comment by auditing and adding some cache-tree validation. I'm seeing some bad cache-tree when running more validation through the test suite. Maybe it's just bugs in my validation code, but better be safe than sorry. -- Duy
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Junio C Hamano wrote: > Subject: sideband: do not read beyond the end of input > > The caller of maybe_colorize_sideband() gives a counted buffer > , but the callee checked *src as if it were a NUL terminated > buffer. If src[] had all isspace() bytes in it, we would have made > n negative, and then (1) called number of strncasecmp() to see if > the remaining bytes in src[] matched keywords, reading beyond the > end of the array, and/or (2) called strbuf_add() with negative > count, most likely triggering the "you want to use way too much > memory" error due to unsigned integer overflow. > > Signed-off-by: Junio C Hamano > --- > sideband.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) And here are some tests to squash in. A sideband line consisting entirely of spaces causes us to overflow our buffer and end up with a negative number of remaining characters, ultimately resulting in the message fatal: you want to use way too much memory when parsing it in order to add color. We also forget to limit a strncasecmp and isalnum looking for keywords to color in to the memory region passed in. Fortunately all callers put a delimiter character (\0, \r, or \n) after the end of the region so this does not cause trouble in practice. Add a test for futureproofing to protect the new bounds checking code in case that ever changes. Signed-off-by: Jonathan Nieder --- Thanks again, Jonathan t/t5409-colorize-remote-messages.sh | 14 ++ 1 file changed, 14 insertions(+) diff --git a/t/t5409-colorize-remote-messages.sh b/t/t5409-colorize-remote-messages.sh index eb1b8aa05df..f81b6813c03 100755 --- a/t/t5409-colorize-remote-messages.sh +++ b/t/t5409-colorize-remote-messages.sh @@ -15,6 +15,8 @@ test_expect_success 'setup' ' echo warning: warning echo prefixerror: error echo " " "error: leading space" + echo "" + echo Err exit 0 EOF echo 1 >file && @@ -44,6 +46,12 @@ test_expect_success 'whole words at line start' ' grep "prefixerror: error" decoded ' +test_expect_success 'short line' ' + git -C child -c color.remote=always push -f origin HEAD:short-line 2>output && + test_decode_color decoded && + grep "remote: Err" decoded +' + test_expect_success 'case-insensitive' ' git --git-dir child/.git -c color.remote=always push -f origin HEAD:refs/heads/case-insensitive 2>output && cat output && @@ -58,6 +66,12 @@ test_expect_success 'leading space' ' grep " error: leading space" decoded ' +test_expect_success 'spaces only' ' + git -C child -c color.remote=always push -f origin HEAD:only-space 2>output && + test_decode_color decoded && + grep "remote: " decoded +' + test_expect_success 'no coloring for redirected output' ' git --git-dir child/.git push -f origin HEAD:refs/heads/redirected-output 2>output && test_decode_color decoded && -- 2.18.0.1017.ga543ac7ca45