[PATCH v2] gpg-interface.c: detect and reject multiple signatures on commits
GnuPG supports creating signatures consisting of multiple signature packets. If such a signature is verified, it outputs all the status messages for each signature separately. However, git currently does not account for such a scenario and gets terribly confused over getting multiple *SIG statuses.

For example, if a malicious party alters a signed commit and appends a new untrusted signature, git is going to ignore the original bad signature and report an untrusted commit instead. However, %GK and %GS format strings may still expand to the data corresponding to the original signature, potentially tricking scripts into trusting the malicious commit.

Given that the use of multiple signatures is quite rare, that git does not support creating them without jumping through a few hoops, and that supporting them properly would require extensive API improvements, it seems reasonable to just reject them for the moment.

Signed-off-by: Michał Górny
---
 gpg-interface.c          | 41
 t/t7510-signed-commit.sh | 26 +
 2 files changed, 59 insertions(+), 8 deletions(-)

Changes in v2:
* used generic 'flags' instead of boolean field,
* added test case for git-verify-commit & git-show.

diff --git a/gpg-interface.c b/gpg-interface.c
index 09ddfbc26..35c25106a 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -21,24 +21,29 @@ void signature_check_clear(struct signature_check *sigc)
 	FREE_AND_NULL(sigc->key);
 }
 
+/* An exclusive status -- only one of them can appear in output */
+#define GPG_STATUS_EXCLUSIVE (1<<0)
+
 static struct {
 	char result;
 	const char *check;
+	unsigned int flags;
 } sigcheck_gpg_status[] = {
-	{ 'G', "\n[GNUPG:] GOODSIG " },
-	{ 'B', "\n[GNUPG:] BADSIG " },
-	{ 'U', "\n[GNUPG:] TRUST_NEVER" },
-	{ 'U', "\n[GNUPG:] TRUST_UNDEFINED" },
-	{ 'E', "\n[GNUPG:] ERRSIG "},
-	{ 'X', "\n[GNUPG:] EXPSIG "},
-	{ 'Y', "\n[GNUPG:] EXPKEYSIG "},
-	{ 'R', "\n[GNUPG:] REVKEYSIG "},
+	{ 'G', "\n[GNUPG:] GOODSIG ", GPG_STATUS_EXCLUSIVE },
+	{ 'B', "\n[GNUPG:] BADSIG ", GPG_STATUS_EXCLUSIVE },
+	{ 'U', "\n[GNUPG:] TRUST_NEVER", 0 },
+	{ 'U', "\n[GNUPG:] TRUST_UNDEFINED", 0 },
+	{ 'E', "\n[GNUPG:] ERRSIG ", GPG_STATUS_EXCLUSIVE },
+	{ 'X', "\n[GNUPG:] EXPSIG ", GPG_STATUS_EXCLUSIVE },
+	{ 'Y', "\n[GNUPG:] EXPKEYSIG ", GPG_STATUS_EXCLUSIVE },
+	{ 'R', "\n[GNUPG:] REVKEYSIG ", GPG_STATUS_EXCLUSIVE },
 };
 
 static void parse_gpg_output(struct signature_check *sigc)
 {
 	const char *buf = sigc->gpg_status;
 	int i;
+	int had_exclusive_status = 0;
 
 	/* Iterate over all search strings */
 	for (i = 0; i < ARRAY_SIZE(sigcheck_gpg_status); i++) {
@@ -50,6 +55,10 @@ static void parse_gpg_output(struct signature_check *sigc)
 				continue;
 			found += strlen(sigcheck_gpg_status[i].check);
 		}
+
+		if (sigcheck_gpg_status[i].flags & GPG_STATUS_EXCLUSIVE)
+			had_exclusive_status++;
+
 		sigc->result = sigcheck_gpg_status[i].result;
 		/* The trust messages are not followed by key/signer information */
 		if (sigc->result != 'U') {
@@ -62,6 +71,22 @@ static void parse_gpg_output(struct signature_check *sigc)
 			}
 		}
 	}
+
+	/*
+	 * GOODSIG, BADSIG etc. can occur only once for each signature.
+	 * Therefore, if we had more than one then we're dealing with multiple
+	 * signatures. We don't support them currently, and they're rather
+	 * hard to create, so something is likely fishy and we should reject
+	 * them altogether.
+	 */
+	if (had_exclusive_status > 1) {
+		sigc->result = 'E';
+		/* Clear partial data to avoid confusion */
+		if (sigc->signer)
+			FREE_AND_NULL(sigc->signer);
+		if (sigc->key)
+			FREE_AND_NULL(sigc->key);
+	}
 }
 
 int check_signature(const char *payload, size_t plen, const char *signature,
diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
index 6e2015ed9..51fb92a72 100755
--- a/t/t7510-signed-commit.sh
+++ b/t/t7510-signed-commit.sh
@@ -227,4 +227,30 @@ test_expect_success GPG 'log.showsignature behaves like --show-signature' '
 	grep "gpg: Good signature" actual
 '
 
+test_expect_success GPG 'detect fudged commit with double signature' '
+	sed -e "/gpgsig/,/END PGP/d" forged1 >double-base &&
+	sed -n -e "/gpgsig/,/END PGP/p" forged1 | \
+		sed -e "s/^gpgsig//;s/^ //" | gpg --dearmor >double-sig1.sig &&
+	gpg -o double-sig2.sig -u 29472784 --detach-sign double-base &&
+	cat double-sig1.sig double-sig2.sig | gpg --enarmor >double-combined.asc &&
+	sed -e "s/^\(-.*\)ARMORED FILE/\1SIGNATURE/;1s/^/gpg
[PATCH] gpg-interface.c: Fix potentially freeing NULL values
Fix signature_check_clear() to free only values that are non-NULL. This especially applies to 'key' and 'signer' members that can be NULL during normal operations, depending on exact GnuPG output. While at it, also allow other members to be NULL to make the function easier to use, even if there is no real need to account for that right now.

Signed-off-by: Michał Górny
---
 gpg-interface.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/gpg-interface.c b/gpg-interface.c
index 35c25106a..9aedaf464 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -15,9 +15,14 @@ static const char *gpg_program = "gpg";
 
 void signature_check_clear(struct signature_check *sigc)
 {
-	FREE_AND_NULL(sigc->payload);
-	FREE_AND_NULL(sigc->gpg_output);
-	FREE_AND_NULL(sigc->gpg_status);
-	FREE_AND_NULL(sigc->signer);
-	FREE_AND_NULL(sigc->key);
+	if (sigc->payload)
+		FREE_AND_NULL(sigc->payload);
+	if (sigc->gpg_output)
+		FREE_AND_NULL(sigc->gpg_output);
+	if (sigc->gpg_status)
+		FREE_AND_NULL(sigc->gpg_status);
+	if (sigc->signer)
+		FREE_AND_NULL(sigc->signer);
+	if (sigc->key)
+		FREE_AND_NULL(sigc->key);
 }
-- 
2.18.0
Re: [PATCH] gpg-interface.c: Fix potentially freeing NULL values
On Fri, Aug 17, 2018 at 5:17 AM Michał Górny wrote: > Fix signature_check_clear() to free only values that are non-NULL. This > especially applies to 'key' and 'signer' members that can be NULL during > normal operations, depending on exact GnuPG output. While at it, also > allow other members to be NULL to make the function easier to use, > even if there is no real need to account for that right now. free(NULL) is valid behavior[1] and much of the Git codebase relies upon it. Did you run into a case where it misbehaved? [1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/free.html > Signed-off-by: Michał Górny > --- > diff --git a/gpg-interface.c b/gpg-interface.c > index 35c25106a..9aedaf464 100644 > --- a/gpg-interface.c > +++ b/gpg-interface.c > @@ -15,9 +15,14 @@ static const char *gpg_program = "gpg"; > void signature_check_clear(struct signature_check *sigc) > { > - FREE_AND_NULL(sigc->payload); > - FREE_AND_NULL(sigc->gpg_output); > - FREE_AND_NULL(sigc->gpg_status); > - FREE_AND_NULL(sigc->signer); > - FREE_AND_NULL(sigc->key); > + if (sigc->payload) > + FREE_AND_NULL(sigc->payload); > + if (sigc->gpg_output) > + FREE_AND_NULL(sigc->gpg_output); > + if (sigc->gpg_status) > + FREE_AND_NULL(sigc->gpg_status); > + if (sigc->signer) > + FREE_AND_NULL(sigc->signer); > + if (sigc->key) > + FREE_AND_NULL(sigc->key); > }
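[Editorial aside: for readers following along, the FREE_AND_NULL() idiom discussed above is a thin wrapper around free() plus a NULL assignment, and free(NULL) is defined to be a no-op. Below is a minimal standalone sketch assuming Git's git-compat-util.h definition of the macro; the surrounding struct and functions are illustrative, not Git code.]

```c
#include <stdlib.h>

/* Sketch of Git's FREE_AND_NULL() macro as defined in git-compat-util.h. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Illustrative stand-in for the signature_check members discussed above. */
struct example_check {
	char *signer;
	char *key;
};

static void example_clear(struct example_check *c)
{
	/* No NULL checks needed: free(NULL) is a no-op per the C standard. */
	FREE_AND_NULL(c->signer);
	FREE_AND_NULL(c->key);
}

int main(void)
{
	struct example_check c = { NULL, NULL };

	example_clear(&c);	/* safe even though both members are NULL */
	return 0;
}
```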
Re: [PATCH] gpg-interface.c: Fix potentially freeing NULL values
On Fri, 2018-08-17 at 05:28 -0400, Eric Sunshine wrote: > On Fri, Aug 17, 2018 at 5:17 AM Michał Górny wrote: > > Fix signature_check_clear() to free only values that are non-NULL. This > > especially applies to 'key' and 'signer' members that can be NULL during > > normal operations, depending on exact GnuPG output. While at it, also > > allow other members to be NULL to make the function easier to use, > > even if there is no real need to account for that right now. > > free(NULL) is valid behavior[1] and much of the Git codebase relies upon it. > > Did you run into a case where it misbehaved? Nope. I was actually wondering if it's expected, so I did a quick grep to check whether git is checking pointers for non-NULL before free()ing, and found at least one: blame.c-static void drop_origin_blob(struct blame_origin *o) blame.c-{ blame.c-if (o->file.ptr) { blame.c:FREE_AND_NULL(o->file.ptr); blame.c-} blame.c-} So I wrongly presumed it might be desirable. If it's not, that's fine by me. > > [1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/free.html > > > Signed-off-by: Michał Górny > > --- > > diff --git a/gpg-interface.c b/gpg-interface.c > > index 35c25106a..9aedaf464 100644 > > --- a/gpg-interface.c > > +++ b/gpg-interface.c > > @@ -15,9 +15,14 @@ static const char *gpg_program = "gpg"; > > void signature_check_clear(struct signature_check *sigc) > > { > > - FREE_AND_NULL(sigc->payload); > > - FREE_AND_NULL(sigc->gpg_output); > > - FREE_AND_NULL(sigc->gpg_status); > > - FREE_AND_NULL(sigc->signer); > > - FREE_AND_NULL(sigc->key); > > + if (sigc->payload) > > + FREE_AND_NULL(sigc->payload); > > + if (sigc->gpg_output) > > + FREE_AND_NULL(sigc->gpg_output); > > + if (sigc->gpg_status) > > + FREE_AND_NULL(sigc->gpg_status); > > + if (sigc->signer) > > + FREE_AND_NULL(sigc->signer); > > + if (sigc->key) > > + FREE_AND_NULL(sigc->key); > > } -- Best regards, Michał Górny signature.asc Description: This is a digitally signed message part
Re: [PATCH v3] checkout: optimize "git checkout -b <new_branch>"
On 8/16/2018 2:37 PM, Duy Nguyen wrote: On Thu, Aug 16, 2018 at 8:27 PM Ben Peart wrote: From: Ben Peart Skip merging the commit, updating the index and working directory if and only if we are creating a new branch via "git checkout -b <new_branch>." Any other checkout options will still go through the former code path. If sparse_checkout is on, require the user to manually opt in to this optimized behavior by setting the config setting checkout.optimizeNewBranch to true, as we will no longer update the skip-worktree bit in the index, nor add/remove files in the working directory to reflect the current sparse checkout settings. For comparison, running "git checkout -b <new_branch>" on a large repo takes: 14.6 seconds - without this patch 0.3 seconds - with this patch I still don't think we should do this. If you want lightning fast branch creation, just use 'git branch'. From the timing breakdown you showed in the other thread it looks like sparse checkout still takes seconds, which could be optimized (or even excluded, I mentioned this too). And split index (or something similar if you can't use it) would give you savings across the board. There is still one idea Elijah gave me that should further lower traverse_trees() cost. We have investigated some of these already - split index ended up slowing things down more than it sped them up due to the higher compute costs. Sparse checkout we've already optimized significantly - limiting the patterns we accept so that we can do the lookup via a hashmap instead of the robust pattern matching. We will continue to look for other optimizations and appreciate any and all ideas! In the end, this optimization makes a huge performance improvement by avoiding doing a lot of work that isn't necessary. Taking a command from 14+ seconds to sub-second is just too much of a win for us to ignore. But anyway, it's not my call. I'll stop here.
[PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
Change the few conditional uses of FREE_AND_NULL(x) to be unconditional. As noted in the standard[1] free(NULL) is perfectly valid, so we might as well leave this check up to the C library. 1. http://pubs.opengroup.org/onlinepubs/9699919799/functions/free.html Signed-off-by: Ævar Arnfjörð Bjarmason --- Let's do the opposite of this instead. blame.c | 4 +--- branch.c| 4 +--- http.c | 4 +--- tree-diff.c | 4 +--- 4 files changed, 4 insertions(+), 12 deletions(-) diff --git a/blame.c b/blame.c index 58a7036847..b22a95de7b 100644 --- a/blame.c +++ b/blame.c @@ -334,9 +334,7 @@ static void fill_origin_blob(struct diff_options *opt, static void drop_origin_blob(struct blame_origin *o) { - if (o->file.ptr) { - FREE_AND_NULL(o->file.ptr); - } + FREE_AND_NULL(o->file.ptr); } /* diff --git a/branch.c b/branch.c index ecd710d730..776f55fc66 100644 --- a/branch.c +++ b/branch.c @@ -25,9 +25,7 @@ static int find_tracked_branch(struct remote *remote, void *priv) tracking->remote = remote->name; } else { free(tracking->spec.src); - if (tracking->src) { - FREE_AND_NULL(tracking->src); - } + FREE_AND_NULL(tracking->src); } tracking->spec.src = NULL; } diff --git a/http.c b/http.c index b4bfbceaeb..4162860ee3 100644 --- a/http.c +++ b/http.c @@ -2418,9 +2418,7 @@ void release_http_object_request(struct http_object_request *freq) close(freq->localfile); freq->localfile = -1; } - if (freq->url != NULL) { - FREE_AND_NULL(freq->url); - } + FREE_AND_NULL(freq->url); if (freq->slot != NULL) { freq->slot->callback_func = NULL; freq->slot->callback_data = NULL; diff --git a/tree-diff.c b/tree-diff.c index fe2e466ac1..553bc0e63a 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -557,9 +557,7 @@ struct combine_diff_path *diff_tree_paths( * free pre-allocated last element, if any * (see path_appendnew() for details about why) */ - if (p->next) { - FREE_AND_NULL(p->next); - } + FREE_AND_NULL(p->next); return p; } -- 2.18.0.865.gffc8e1a3cd6
[RFC] git send-email hashbang
I recently contributed patches on this mailing list for the first time and used `git format-patch` and `git send-email` for the first time. I had a hard time making `git send-email` work on my Mac, because the OSX-bundled perl was missing the Net::SMTP::SSL module. So I did `cpan -f Net::SMTP::SSL` (I'm using gmail with smtps/ssl), which asked me some questions (to set up cpan; I'm not really using perl usually), and installed the module. Still, `git send-email` wasn't able to find the module. Actually, during the setup of cpan, I was asked this:

--
Warning: You do not have write permission for Perl library directories. To install modules, you need to configure a local Perl library directory or escalate your privileges. CPAN can help you by bootstrapping the local::lib module or by configuring itself to use 'sudo' (if available). You may also resolve this problem manually if you need to customize your setup. What approach do you want? (Choose 'local::lib', 'sudo' or 'manual')
--

I naturally chose the default ('local::lib'), but it still didn't work. So I chose not to use the system-bundled perl and installed my own perl with homebrew, installed the Net::SMTP::SSL module ... but it still didn't work. I looked at the send-email script, changed the hashbang to use /usr/local/bin/perl instead of /usr/bin/perl, and it worked! Then I wondered what had happened, and I discovered that using the bundled cpan's "sudo" approach works, but I'm not very satisfied that I need to be root to make this script work. I also found several stackoverflow questions, gists and other discussions with people having this exact problem (on osx), with various solutions (mostly not working, using `sudo cpan` or whatever).

It seems strange to me that the script doesn't use "the perl I use in my environment"; that is, I would have thought the `git-send-email.pl` script had `#!/usr/bin/env perl` as its hashbang. Then I read that some environments (namely busybox) don't bundle `/usr/bin/env`, so I understood this might not be portable. I think there is a solution involving a combination of /bin/sh as the hashbang and then executing perl, probably with the `-x` flag (see `perldoc perlrun`). Is it worth proposing a solution for this problem? Thanks!
Re: [GSoC][PATCH v6 11/20] rebase -i: rewrite complete_action() in C
Hi Alban The interdiff from v5 to v6 looks good, I think the changes you have made the the other patches in this series are fine, I've just got a couple of small comments below about this one. Best Wishes Phillip On 10/08/2018 17:51, Alban Gruin wrote: > > This rewrites complete_action() from shell to C. > > A new mode is added to rebase--helper (`--complete-action`), as well as > a new flag (`--autosquash`). > > Finally, complete_action() is stripped from git-rebase--interactive.sh. > > The original complete_action() would return the code 2 when the todo > list contained no actions. This was a special case for rebase -i and > -p; git-rebase.sh would then apply the autostash, delete the state > directory, and die with the message "Nothing to do". This cleanup is > rewritten in C instead of returning 2. As rebase -i no longer returns > 2, the comment describing this behaviour in git-rebase.sh is updated to > reflect this change. > > The message "Nothing to do" is now printed with error(), and so becomes > "error: nothing to do". Some tests in t3404 check this value, so they > are updated to fit this change. > > The first check might seem useless as we write "noop" to the todo list > if it is empty. Actually, the todo list might contain commented > commands (ie. empty commits). In this case, complete_action() won’t > write "noop", and will abort without starting the editor. > > Signed-off-by: Alban Gruin > --- > builtin/rebase--helper.c | 12 +++- > git-rebase--interactive.sh| 53 ++ > git-rebase.sh | 2 +- > sequencer.c | 102 ++ > sequencer.h | 4 ++ > t/t3404-rebase-interactive.sh | 2 +- > 6 files changed, 122 insertions(+), 53 deletions(-) > > diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c > index bed3dd2b95..01b958 100644 > --- a/builtin/rebase--helper.c > +++ b/builtin/rebase--helper.c > @@ -13,13 +13,13 @@ static const char * const builtin_rebase_helper_usage[] = > { > int cmd_rebase__helper(int argc, const char **argv, const char *prefix) > { > struct replay_opts opts = REPLAY_OPTS_INIT; > - unsigned flags = 0, keep_empty = 0, rebase_merges = 0; > + unsigned flags = 0, keep_empty = 0, rebase_merges = 0, autosquash = 0; > int abbreviate_commands = 0, rebase_cousins = -1; > enum { > CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS, > CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS, REARRANGE_SQUASH, > ADD_EXEC, APPEND_TODO_HELP, EDIT_TODO, PREPARE_BRANCH, > - CHECKOUT_ONTO > + CHECKOUT_ONTO, COMPLETE_ACTION > } command = 0; > struct option options[] = { > OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")), > @@ -29,6 +29,8 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > OPT_BOOL(0, "rebase-merges", &rebase_merges, N_("rebase merge > commits")), > OPT_BOOL(0, "rebase-cousins", &rebase_cousins, >N_("keep original branch points of cousins")), > + OPT_BOOL(0, "autosquash", &autosquash, > + N_("move commits that begin with squash!/fixup!")), > OPT__VERBOSE(&opts.verbose, N_("be verbose")), > OPT_CMDMODE(0, "continue", &command, N_("continue rebase"), > CONTINUE), > @@ -57,6 +59,8 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > N_("prepare the branch to be rebased"), > PREPARE_BRANCH), > OPT_CMDMODE(0, "checkout-onto", &command, > N_("checkout a commit"), CHECKOUT_ONTO), > + OPT_CMDMODE(0, "complete-action", &command, > + N_("complete the action"), COMPLETE_ACTION), > OPT_END() > }; > > @@ -110,5 +114,9 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > return 
!!prepare_branch_to_be_rebased(&opts, argv[1]); > if (command == CHECKOUT_ONTO && argc == 4) > return !!checkout_onto(&opts, argv[1], argv[2], argv[3]); > + if (command == COMPLETE_ACTION && argc == 6) > + return !!complete_action(&opts, flags, argv[1], argv[2], > argv[3], > + argv[4], argv[5], autosquash); > + > usage_with_options(builtin_rebase_helper_usage, options); > } > diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh > index b68f108f28..59dc4888a6 100644 > --- a/git-rebase--interactive.sh > +++ b/git-rebase--interactive.sh > @@ -127,54 +127,6 @@ init_revisions_and_shortrevisions () { > fi > } > > -complete_action() { > - test -s "$todo" || echo noop >> "$todo" > - test -z "$autosquash" || git rebase--helper --rearrange-squash || exit > - test -n "$cmd" && git rebase--he
Re: [RFC] git send-email hashbang
On Fri, Aug 17 2018, Samuel Maftoul wrote: > I recently contributed for the first time patches on this maillist and > used for the first time `git format-patch` and `git send-email`. > I had hard times making `git send-email` work on my mac, because the > OSX bundled perl was missing the Net::SMTP::SSL module. > So I did `cpan -f Net::SMTP::SSL` (I'm using gmail with smtps/ssl) > which asked me some questions (to setup cpan, I'm not really using > perl usually), and installed the module. > Still `git send-email` wasn't able to find the module. > Actually, during the setup of cpan, I have been asked this: > > -- > Warning: You do not have write permission for Perl library directories. > > To install modules, you need to configure a local Perl library directory or > escalate your privileges. CPAN can help you by bootstrapping the local::lib > module or by configuring itself to use 'sudo' (if available). You may also > resolve this problem manually if you need to customize your setup. > > What approach do you want? (Choose 'local::lib', 'sudo' or 'manual') > -- > > I have naturally choosed the default ('local::lib'), but it still didn't > worked. > > So I choose to not use the system bundled perl and installed my own > perl with homebrew, installed the Net::SMTP::SSL module ... but still > , it didn't worked. > I looked at the send-email script, changed the hashbang to use > /usr/local/bin/perl instead of /usr/bin/perl and it worked ! > > Then I wondered what happened, and I discovered that using the bundled > cpan's "sudo" approach works, but I'm not very satisfied that I need > to be root to make this script work. > I also found several stackoverflow questions, gists and other > discussiond with people having this exact problem (on osx) with some > different solution (mostly not working, using `sudo cpan` or > whatever). Yeah this experience sucks. > It seems strange to me that the script doesn't uses "the perl I use in > my environment", that is, I would have thought the `git-send-email.pl` > script had `#!/usr/bin/env perl` as hashbang. > Then, I read that some environment (namely busybox) don't bundle > `/usr/bin/env`, so I understood this might not be portable. > I think there is a solution involving using a combination of /bin/sh > as hashbang and there executing perl with probably the `-x` flag (see > `perldoc perlrun`). > Is it worth proposing a solution for this problem ? The reason not to use the "perl" in the env is because you just get the other side of this problem. I.e. I install "git" on some linux distro, but I also do perl development so I install a perlbrew version of it into my ~/ which doesn't know how to do ssl or whatever else. Now send-email, "git add -p" and the like will break because the perl I have doesn't have the required modules etc. This is why we pick a perl at compile-time, just as we link to libraries etc. at compile-time. But perhaps this trade-off isn't the right one to make on OSX.
Re: [GSoC][PATCH v6 15/20] rebase -i: rewrite write_basic_state() in C
On 10/08/2018 17:51, Alban Gruin wrote: > This rewrites write_basic_state() from git-rebase.sh in C. This is the > first step in the conversion of init_basic_state(), hence the mode in > rebase--helper.c is called INIT_BASIC_STATE. init_basic_state() will be > converted in the next commit. > > The part of read_strategy_opts() that parses the stategy options is > moved to a new function to allow its use in rebase--helper.c. > > Finally, the call to write_basic_state() is removed from > git-rebase--interactive.sh, replaced by a call to `--init-basic-state`. > > Signed-off-by: Alban Gruin > --- > No changes since v5. > > builtin/rebase--helper.c | 28 +- > git-rebase--interactive.sh | 7 +++- > sequencer.c| 77 -- > sequencer.h| 4 ++ > 4 files changed, 102 insertions(+), 14 deletions(-) > > diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c > index 0716bbfd78..63c5086e42 100644 > --- a/builtin/rebase--helper.c > +++ b/builtin/rebase--helper.c > @@ -5,6 +5,8 @@ > #include "sequencer.h" > #include "rebase-interactive.h" > #include "argv-array.h" > +#include "rerere.h" > +#include "alias.h" > > static GIT_PATH_FUNC(path_squash_onto, "rebase-merge/squash-onto") > > @@ -53,11 +55,12 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > unsigned flags = 0, keep_empty = 0, rebase_merges = 0, autosquash = 0; > int abbreviate_commands = 0, rebase_cousins = -1, ret; > const char *head_hash = NULL, *onto = NULL, *restrict_revision = NULL, > - *squash_onto = NULL, *upstream = NULL; > + *squash_onto = NULL, *upstream = NULL, *head_name = NULL; > + char *raw_strategies = NULL; > enum { > CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_OIDS, EXPAND_OIDS, > CHECK_TODO_LIST, REARRANGE_SQUASH, ADD_EXEC, EDIT_TODO, > PREPARE_BRANCH, > - COMPLETE_ACTION > + COMPLETE_ACTION, INIT_BASIC_STATE > } command = 0; > struct option options[] = { > OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")), > @@ -69,6 +72,7 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) >N_("keep original branch points of cousins")), > OPT_BOOL(0, "autosquash", &autosquash, >N_("move commits that begin with squash!/fixup!")), > + OPT_BOOL(0, "signoff", &opts.signoff, N_("sign commits")), > OPT__VERBOSE(&opts.verbose, N_("be verbose")), > OPT_CMDMODE(0, "continue", &command, N_("continue rebase"), > CONTINUE), > @@ -93,6 +97,8 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > N_("prepare the branch to be rebased"), > PREPARE_BRANCH), > OPT_CMDMODE(0, "complete-action", &command, > N_("complete the action"), COMPLETE_ACTION), > + OPT_CMDMODE(0, "init-basic-state", &command, > + N_("initialise the rebase state"), > INIT_BASIC_STATE), > OPT_STRING(0, "onto", &onto, N_("onto"), N_("onto")), > OPT_STRING(0, "restrict-revision", &restrict_revision, > N_("restrict-revision"), N_("restrict revision")), > @@ -100,6 +106,14 @@ int cmd_rebase__helper(int argc, const char **argv, > const char *prefix) > N_("squash onto")), > OPT_STRING(0, "upstream", &upstream, N_("upstream"), > N_("the upstream commit")), > + OPT_STRING(0, "head-name", &head_name, N_("head-name"), > N_("head name")), > + OPT_STRING('S', "gpg-sign", &opts.gpg_sign, N_("gpg-sign"), > +N_("GPG-sign commits")), > + OPT_STRING(0, "strategy", &opts.strategy, N_("strategy"), > +N_("rebase strategy")), > + OPT_STRING(0, "strategy-opts", &raw_strategies, > N_("strategy-opts"), > +N_("strategy options")), > + OPT_RERERE_AUTOUPDATE(&opts.allow_rerere_auto), > OPT_END() > }; > > @@ -176,6 +190,16 @@ int 
cmd_rebase__helper(int argc, const char **argv, > const char *prefix) > free(shortrevisions); > return !!ret; > } > + if (command == INIT_BASIC_STATE) { > + if (raw_strategies) > + parse_strategy_opts(&opts, raw_strategies); > + > + ret = get_revision_ranges(upstream, onto, &head_hash, NULL, > NULL); > + if (ret) > + return ret; > + > + return !!write_basic_state(&opts, head_name, onto, head_hash); > + } > > usage_with_options(builtin_rebase_helper_usage, options); > } > diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh > index 08e9a
Re: [PATCH/RFC] commit: add short option for --amend
On Fri, Aug 17, 2018 at 8:47 AM Jonathan Nieder wrote: > > Nguyễn Thái Ngọc Duy wrote: > > > --- a/builtin/commit.c > > +++ b/builtin/commit.c > > @@ -1489,7 +1489,7 @@ int cmd_commit(int argc, const char **argv, const > > char *prefix) > > STATUS_FORMAT_LONG), > > OPT_BOOL('z', "null", &s.null_termination, > >N_("terminate entries with NUL")), > > - OPT_BOOL(0, "amend", &amend, N_("amend previous commit")), > > + OPT_BOOL('j', "amend", &amend, N_("amend previous commit")), > [...] > > Thoughts? > > I'm not a fan. I would have trouble remembering what the short option > name means, and it matches the common --jobs option for parallelism > that many commands use. "git commit --am" works today already and > doesn't run into those problems. The alternative is -A or -M which may be easier associated with --amend. That "--am" also would break the moment somebody adds --amsomething. > I'm sympathetic to the goal of saving typing, but I'm more sympathetic > to the goal of making user support easier, which is what makes me end > up there. > > That said, I've been looking recently at Mercurial's "hg evolve" > extension[1] and I wouldn't be against a well thought out new command > (e.g. "git amend") that does the equivalent of "git commit --amend" > with some related features. So I think there are some paths forward > that involve abbreviating. I'm not opposed to a new command like this, but I don't think it should stop us from adding short options. -- Duy
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 3:05 PM Ævar Arnfjörð Bjarmason wrote: > > Change the few conditional uses of FREE_AND_NULL(x) to be > unconditional. As noted in the standard[1] free(NULL) is perfectly > valid, so we might as well leave this check up to the C library. I'm not trying to make you work more on this. But out of curiosity would coccinelle help catch this pattern? Szeder's recent work on running cocci automatically would help catch all future code like this if we could write an spatch. -- Duy
Re: non-smooth progress indication for git fsck and git gc
On Thu, Aug 16, 2018 at 11:08 PM Jeff King wrote: > > On Thu, Aug 16, 2018 at 04:55:56PM -0400, Jeff King wrote: > > > > * We spend the majority of the ~30s on this: > > > > > > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79 > > > > This is hashing the actual packfile. This is potentially quite long, > > especially if you have a ton of big objects. > > > > I wonder if we need to do this as a separate step anyway, though. Our > > verification is based on index-pack these days, which means it's going > > to walk over the whole content as part of the "Indexing objects" step to > > expand base objects and mark deltas for later. Could we feed this hash > > as part of that walk over the data? It's not going to save us 30s, but > > it's likely to be more efficient. And it would fold the effort naturally > > into the existing progress meter. > > Actually, I take it back. That's the nice, modern way we do it in > git-verify-pack. But git-fsck uses the ancient "just walk over all of > the idx entries method". It at least sorts in pack order, which is good, > but: > > - it's not multi-threaded, like index-pack/verify-pack > > - the index-pack way is actually more efficient than pack-ordering for > the delta-base cache, because it actually walks the delta-graph in > the optimal order > I actually tried to make git-fsck use index-pack --verify at one point. The only thing that stopped it from working was index-pack automatically wrote the newer index version if I remember correctly, and that would fail the final hash check. fsck performance was not a big deal so I dropped it. Just saying it should be possible, if someone's interested in that direction. -- Duy
Re: [PATCH] completion: include PARSE_OPT_HIDDEN in completion output
On Thu, Aug 16, 2018 at 8:42 PM Ævar Arnfjörð Bjarmason wrote: > > The PARSE_OPT_HIDDEN is, per the documentation of the "option" struct > in option parse-options.h, only supposed to affect -h output, not > completion. That's what the PARSE_OPT_NOCOMPLETE flag is supposed to > be for. > > Since 2e29dca66a ("completion: use __gitcomp_builtin in _git_commit", > 2018-02-09) we've been using e.g. "git commit --git-completion-helper" > to get the bash completion for the git-commit command. Due to > PARSE_OPT_HIDDEN this excluded e.g. --allow-empty and > --allow-empty-message. > > Now, this wasn't a behavior change in that commit. Before that we had > a hardcoded list of options, removed in 2e29dca66a ("completion: use > __gitcomp_builtin in _git_commit", 2018-02-09). This list didn't > contain those two options. > > But as noted in the follow-up discussion to c9b5fde759 ("Add option to > git-commit to allow empty log messages", 2010-04-06) in > https://public-inbox.org/git/20100406055530.ge3...@coredump.intra.peff.net/ > the motivation for PARSE_OPT_HIDDEN was to keep the "git commit -h" > output to a minimum, not to hide it from completion. > > I think it makes sense to exclude options like these from -h output, > but for the completion the user is usually not trying to complete "git > commit --", Actually I do :) > but e.g. "git commit --allo", and because of this behavior we don't show > these options at all there. And it would be great if these options do not show up at -- but do when with --a. We already do something similar to this with --no-. So this could be another option. -- Duy
Re: What's cooking in git.git (Aug 2018, #03; Wed, 15)
On Thu, Aug 16, 2018 at 1:01 AM Junio C Hamano wrote: > * bp/checkout-new-branch-optim (2018-07-31) 1 commit > - checkout: optimize "git checkout -b " > > "git checkout -b newbranch [HEAD]" should not have to do as much as > checking out a commit different from HEAD. An attempt is made to > optimize this special case. > > So... what is the status of this thing? Is the other "optimize > unpack-trees" effort turning out to be a safer and less hacky way > to achieve similar gain and this no longer is needed? I still dislike the fact that this depends on config keys to tweak behavior and believe there are still rooms for general optimizations. But the feeling I have so far is I'm doing all the work for Microsoft in that direction. So I give up. It's up to you and Ben to decide. -- Duy
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 04:36:13PM +0200, Duy Nguyen wrote: > On Fri, Aug 17, 2018 at 3:05 PM Ævar Arnfjörð Bjarmason > wrote: > > > > Change the few conditional uses of FREE_AND_NULL(x) to be > > unconditional. As noted in the standard[1] free(NULL) is perfectly > > valid, so we might as well leave this check up to the C library. > > I'm not trying to make you work more on this. But out of curiosity > would coccinelle help catch this pattern? Szeder's recent work on > running cocci automatically would help catch all future code like this > if we could write an spatch. Just fyi this seems to do the trick. Although I'm nowhere good at coccinelle to say if we should include this (or something like it) -- 8< -- diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci index 4490069df9..f8e018d104 100644 --- a/contrib/coccinelle/free.cocci +++ b/contrib/coccinelle/free.cocci @@ -16,3 +16,9 @@ expression E; - free(E); + FREE_AND_NULL(E); - E = NULL; + +@@ +expression E; +@@ +- if (E) { FREE_AND_NULL(E); } ++ FREE_AND_NULL(E); -- 8< -- -- Duy
Re: Contributor Summit planning
On Mon, Aug 13, 2018 at 10:42 PM Stefan Beller wrote: > Ævar specifically pointed out that we might want to hear from you and Duy > if you want to attend a conference and if so how we can make that happen > (by choosing location/time/setting appropriately) IIUC. Since my name shows up... I'm with Elijah on the travel resistance thing (and am probably even lazier than him). I guess I will remain the mystery in the git circle. -- Duy
Re: [PATCH] config.txt: clarify core.checkStat = minimal
Duy Nguyen writes: > Perfect. I could wrap it in a patch, but I feel you should take > authorship for that one. I'll leave it to you to create this commit. OK, here is what I ended up with. An extra paragraph was taken from the old commit you referrred to, which is probably the only remaining part from your contribution, so the attribution has been demoted to "Helped-by", but your initiative still is appreciated very much. -- >8 -- Subject: [PATCH] config.txt: clarify core.checkStat The description of this key does not really tell what the 'minimal' mode checks and does not check. The description for the 'default' mode is not much better and just says 'all fields', which is unclear and is not even correct (e.g. we do not look at 'atime'). Spell out what are and what are not checked under the 'minimal' mode relative to the 'default' mode to help those who want to decide if they want to use the 'minimal' mode, also taking information about this mode from the commit message of c08e4d5b5c (Enable minimal stat checking - 2013-01-22). Helped-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- Documentation/config.txt | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9..933d719137 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -449,10 +449,20 @@ core.untrackedCache:: See linkgit:git-update-index[1]. `keep` by default. core.checkStat:: - Determines which stat fields to match between the index - and work tree. The user can set this to 'default' or - 'minimal'. Default (or explicitly 'default'), is to check - all fields, including the sub-second part of mtime and ctime. + When missing or is set to `default`, many fields in the stat + structure are checked to detect if a file has been modified + since Git looked at it. When this configuration variable is + set to `minimal`, sub-second part of mtime and ctime, the + uid and gid of the owner of the file, the inode number (and + the device number, if Git was compiled to use it), are + excluded from the check among these fields, leaving only the + whole-second part of mtime (and ctime, if `core.trustCtime` + is set) and the filesize to be checked. ++ +There are implementations of Git that do not leave usable values in +some fields (e.g. JGit); by excluding these fields from the +comparison, the `minimal` mode may help interoperability when the +same repository is used by these other systems at the same time. core.quotePath:: Commands that output paths (e.g. 'ls-files', 'diff'), will -- 2.18.0-666-g63749b2dea
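[Editorial aside: to make the 'default' vs. 'minimal' distinction above concrete, here is a hedged, self-contained C sketch of the comparison. The struct layout, field names and helper are illustrative assumptions, not Git's actual match_stat_data() implementation.]

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Illustrative cached stat data; field names are assumptions, not Git's. */
struct cached_stat {
	unsigned int mtime_sec;
	unsigned int ctime_sec;
	unsigned int uid, gid;
	unsigned int ino, dev;
	unsigned long size;
};

static bool example_stat_match(const struct cached_stat *c,
			       const struct stat *st,
			       bool minimal, bool trust_ctime)
{
	/* Checked in both modes: whole-second mtime (and ctime) and size. */
	if (c->mtime_sec != (unsigned int)st->st_mtime)
		return false;
	if (trust_ctime && c->ctime_sec != (unsigned int)st->st_ctime)
		return false;
	if (c->size != (unsigned long)st->st_size)
		return false;

	if (minimal)
		return true;	/* 'minimal' mode stops here */

	/* 'default' mode additionally checks owner, inode and device
	 * numbers (and, in Git, the sub-second mtime/ctime parts). */
	if (c->uid != st->st_uid || c->gid != st->st_gid)
		return false;
	if (c->ino != (unsigned int)st->st_ino ||
	    c->dev != (unsigned int)st->st_dev)
		return false;
	return true;
}

int main(int argc, char **argv)
{
	struct stat st;
	struct cached_stat cached;

	if (argc < 2 || stat(argv[1], &st))
		return 1;

	/* Pretend the cached copy was taken from this very stat() call. */
	cached.mtime_sec = (unsigned int)st.st_mtime;
	cached.ctime_sec = (unsigned int)st.st_ctime;
	cached.uid = st.st_uid;
	cached.gid = st.st_gid;
	cached.ino = (unsigned int)st.st_ino;
	cached.dev = (unsigned int)st.st_dev;
	cached.size = (unsigned long)st.st_size;

	return example_stat_match(&cached, &st, true, false) ? 0 : 1;
}
```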
Re: [PATCH/RFC] commit: add short option for --amend
On Fri, Aug 17, 2018 at 04:33:30PM +0200, Duy Nguyen wrote: > On Fri, Aug 17, 2018 at 8:47 AM Jonathan Nieder wrote: > > > > Nguyễn Thái Ngọc Duy wrote: > > > > > --- a/builtin/commit.c > > > +++ b/builtin/commit.c > > > @@ -1489,7 +1489,7 @@ int cmd_commit(int argc, const char **argv, const > > > char *prefix) > > > STATUS_FORMAT_LONG), > > > OPT_BOOL('z', "null", &s.null_termination, > > >N_("terminate entries with NUL")), > > > - OPT_BOOL(0, "amend", &amend, N_("amend previous commit")), > > > + OPT_BOOL('j', "amend", &amend, N_("amend previous commit")), > > [...] > > > Thoughts? > > > > I'm not a fan. I would have trouble remembering what the short option > > name means, and it matches the common --jobs option for parallelism > > that many commands use. "git commit --am" works today already and > > doesn't run into those problems. > > The alternative is -A or -M which may be easier associated with > --amend. That "--am" also would break the moment somebody adds > --amsomething. I think "-A" has been considered as possibility for matching "commit -a" / "add -A" in the past, but I had trouble finding past discussion (searching for "A" in the mailing list is not very productive). It was mentioned in 3ba1f11426 (git-add --all: add all files, 2008-07-19), but that was quite a while ago. Not necessarily a blocker, but something to consider. Like Jonathan, I do find "-j" a little non-intuitive, but I agree that most of the intuitive ones are taken. :) -Peff
Re: [PATCH] config.txt: clarify core.checkStat = minimal
On Fri, Aug 17, 2018 at 5:26 PM Junio C Hamano wrote: > -- >8 -- > Subject: [PATCH] config.txt: clarify core.checkStat > > The description of this key does not really tell what the 'minimal' > mode checks and does not check. The description for the 'default' > mode is not much better and just says 'all fields', which is unclear > and is not even correct (e.g. we do not look at 'atime'). > > Spell out what are and what are not checked under the 'minimal' mode > relative to the 'default' mode to help those who want to decide if > they want to use the 'minimal' mode, also taking information about > this mode from the commit message of c08e4d5b5c (Enable minimal stat > checking - 2013-01-22). Looking good. This does make me want to adjust $GIT_DIR/index format to optionally not store extra fields if we know we're not going to use them. But that's a topic for another day. > Helped-by: Nguyễn Thái Ngọc Duy > Signed-off-by: Junio C Hamano > --- > Documentation/config.txt | 18 ++ > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index ab641bf5a9..933d719137 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -449,10 +449,20 @@ core.untrackedCache:: > See linkgit:git-update-index[1]. `keep` by default. > > core.checkStat:: > - Determines which stat fields to match between the index > - and work tree. The user can set this to 'default' or > - 'minimal'. Default (or explicitly 'default'), is to check > - all fields, including the sub-second part of mtime and ctime. > + When missing or is set to `default`, many fields in the stat > + structure are checked to detect if a file has been modified > + since Git looked at it. When this configuration variable is > + set to `minimal`, sub-second part of mtime and ctime, the > + uid and gid of the owner of the file, the inode number (and > + the device number, if Git was compiled to use it), are > + excluded from the check among these fields, leaving only the > + whole-second part of mtime (and ctime, if `core.trustCtime` > + is set) and the filesize to be checked. > ++ > +There are implementations of Git that do not leave usable values in > +some fields (e.g. JGit); by excluding these fields from the > +comparison, the `minimal` mode may help interoperability when the > +same repository is used by these other systems at the same time. > > core.quotePath:: > Commands that output paths (e.g. 'ls-files', 'diff'), will > -- > 2.18.0-666-g63749b2dea > -- Duy
Re: [PATCH/RFC] commit: add short option for --amend
On Fri, Aug 17, 2018 at 5:26 PM Jeff King wrote: > > On Fri, Aug 17, 2018 at 04:33:30PM +0200, Duy Nguyen wrote: > > > On Fri, Aug 17, 2018 at 8:47 AM Jonathan Nieder wrote: > > > > > > Nguyễn Thái Ngọc Duy wrote: > > > > > > > --- a/builtin/commit.c > > > > +++ b/builtin/commit.c > > > > @@ -1489,7 +1489,7 @@ int cmd_commit(int argc, const char **argv, const > > > > char *prefix) > > > > STATUS_FORMAT_LONG), > > > > OPT_BOOL('z', "null", &s.null_termination, > > > >N_("terminate entries with NUL")), > > > > - OPT_BOOL(0, "amend", &amend, N_("amend previous commit")), > > > > + OPT_BOOL('j', "amend", &amend, N_("amend previous > > > > commit")), > > > [...] > > > > Thoughts? > > > > > > I'm not a fan. I would have trouble remembering what the short option > > > name means, and it matches the common --jobs option for parallelism > > > that many commands use. "git commit --am" works today already and > > > doesn't run into those problems. > > > > The alternative is -A or -M which may be easier associated with > > --amend. That "--am" also would break the moment somebody adds > > --amsomething. > > I think "-A" has been considered as possibility for matching "commit -a" > / "add -A" in the past, but I had trouble finding past discussion > (searching for "A" in the mailing list is not very productive). It was > mentioned in 3ba1f11426 (git-add --all: add all files, 2008-07-19), but > that was quite a while ago. > > Not necessarily a blocker, but something to consider. > > Like Jonathan, I do find "-j" a little non-intuitive, but I agree that > most of the intuitive ones are taken. :) Oh well. Maybe next time we'll be more careful with adding short options. Consider this patch dropped. -- Duy
Re: [PATCH/RFC] commit: add short option for --amend
Duy Nguyen writes: > The alternative is -A or -M which may be easier associated with > --amend. I would be confused to mistake that "git commit -A $args" would do something similar to "git add -A && git commit $args". I do my fair share of amends during the day, and I've never felt the need for a short-hand, but perhaps that is just me. I am wondering if "-E" (stands for 'edit', not 'A or M are out, and the next letter in amend is E') is understandable and memorable enough---after all, it is "editing" an existing commit, and the edit is done in a big and different way than the existing "commit --edit". But perhaps that reasoning is a bit too cute. I dunno.
[PATCH v5] clone: report duplicate entries on case-insensitive filesystems
Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. In the case that we can't rely on filesystem (via inode number) to do this check, fall back to fspathcmp() which is not perfect but should not give false positives. This patch is tested with vim-colorschemes and Sublime-Gitignore repositories on a JFS partition with case insensitive support on Linux. Signed-off-by: Nguyễn Thái Ngọc Duy --- v5 respects core.checkStat and sorts the output case-insensitively. I still don't trust magic st_ino zero, or core.checkStat being zero on Windows, so the #if condition still remains but it covers smallest area possible and I tested it by manually make it "#if 1" The fallback with fspathcmp() is only done when inode can't be trusted because strcmp is more expensive and when fspathcmp() learns more about real world in the future, it could become even more expensive. The output sorting is the result of Sublime-Gitignore repo being reported recently. It's not perfect but it should help seeing the groups in normal case. builtin/clone.c | 1 + cache.h | 1 + entry.c | 31 +++ t/t5601-clone.sh | 8 +++- unpack-trees.c | 35 +++ unpack-trees.h | 1 + 6 files changed, 76 insertions(+), 1 deletion(-) diff --git a/builtin/clone.c b/builtin/clone.c index 5c439f1394..0702b0e9d0 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) memset(&opts, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = &the_index; diff --git a/cache.h b/cache.h index 8b447652a7..6d6138f4f1 100644 --- a/cache.h +++ b/cache.h @@ -1455,6 +1455,7 @@ struct checkout { unsigned force:1, quiet:1, not_new:1, +clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..8766e27255 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,34 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_colliding_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i, trust_ino = check_stat; + +#if defined(GIT_WINDOWS_NATIVE) + trust_ino = 0; +#endif + + ce->ce_flags |= CE_MATCHED; + + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup == ce) + break; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) || + (!trust_ino && !fspathcmp(ce->name, dup->name))) { + dup->ce_flags |= CE_MATCHED; + break; + } + } +} + /* * Write the contents from ce out to the working tree. 
* @@ -455,6 +483,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_colliding_entries(state, ce, &st); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 0b62037744..f2eb73bc74 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' git hash-object -w -t tree --stdin) && c=$(git commit-tree -m bogus $t) && git update-ref refs/heads/bogus $c && - git clone -b bogus . bogus + git clone -b bogus . bogus 2>warning ) ' +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' + grep X icasefs/warning && + grep x icasefs/warning && + test_i18ngrep "the following paths have collided" icasefs/warning +' + partial_clone () { SERVER="$1" && URL="$2" && diff --git a/unpack-trees.c b/unpack-trees.c index cd06
Re: Syncing HEAD
Jeff King writes: > So all of this really implies to me that you want to be able to say > "take this symref on the other side and update this one on the local > side". I.e., some way to tell a refspec "don't update the value, update > the symref destination". ... > ... > git fetch origin ~HEAD:refs/remotes/origin/HEAD We need to be a bit careful here. You can define the meaning of the above sanely if you know that refmap refs/heads/*:refs/remotes/origin/* is in effect for the remote to read "My HEAD points at refs/heads/frotz" and interpret it as "In order to match, I need to make my refs/remotes/origin/HEAD to point at refs/remotes/origin/frotz". Also, what should the above form of "git fetch" write in FETCH_HEAD? Should "git pull origin ~HEAD:refs/remotes/origin/HEAD" run the fetch and then merge it (which may have value of refs/remotes/origin/frotz) to the current branch? Should the underlying fetch be also fetching the frotz branch from them at the same time, or do we attempt to merge a possibly stale 'frotz' (which might not even have been there, the last time we fetched from them)?
Re: [PATCH] gpg-interface.c: detect and reject multiple signatures on commits
Jonathan Nieder writes: > Michał Górny wrote: > >> GnuPG supports creating signatures consisting of multiple signature >> packets. If such a signature is verified, it outputs all the status >> messages for each signature separately. However, git currently does not >> account for such scenario and gets terribly confused over getting >> multiple *SIG statuses. >> >> For example, if a malicious party alters a signed commit and appends >> a new untrusted signature, git is going to ignore the original bad >> signature and report untrusted commit instead. However, %GK and %GS >> format strings may still expand to the data corresponding >> to the original signature, potentially tricking the scripts into >> trusting the malicious commit. >> >> Given that the use of multiple signatures is quite rare, git does not >> support creating them without jumping through a few hoops, and finally >> supporting them properly would require extensive API improvement, it >> seems reasonable to just reject them at the moment. >> --- > > Thanks for the clear analysis and fix. > > May we have your sign-off? See > https://www.kernel.org/pub/software/scm/git/docs/SubmittingPatches.html#sign-off > (or the equivalent section of your local copy of > Documentation/SubmittingPatches) for what this means. I do not see the original message you are writing response to on public-inbox archive. As long as an update with sign-off will be sent to the git@vger.kernel.org list, that is OK. >> gpg-interface.c | 38 ++ >> 1 file changed, 30 insertions(+), 8 deletions(-) >> >> diff --git a/gpg-interface.c b/gpg-interface.c >> index 09ddfbc26..4e03aec15 100644 >> --- a/gpg-interface.c >> +++ b/gpg-interface.c >> @@ -24,21 +24,23 @@ void signature_check_clear(struct signature_check *sigc) >> static struct { >> char result; >> const char *check; >> +int is_status; >> } sigcheck_gpg_status[] = { >> -{ 'G', "\n[GNUPG:] GOODSIG " }, >> -{ 'B', "\n[GNUPG:] BADSIG " }, >> -{ 'U', "\n[GNUPG:] TRUST_NEVER" }, >> -{ 'U', "\n[GNUPG:] TRUST_UNDEFINED" }, >> -{ 'E', "\n[GNUPG:] ERRSIG "}, >> -{ 'X', "\n[GNUPG:] EXPSIG "}, >> -{ 'Y', "\n[GNUPG:] EXPKEYSIG "}, >> -{ 'R', "\n[GNUPG:] REVKEYSIG "}, >> +{ 'G', "\n[GNUPG:] GOODSIG ", 1 }, >> +{ 'B', "\n[GNUPG:] BADSIG ", 1 }, >> +{ 'U', "\n[GNUPG:] TRUST_NEVER", 0 }, >> +{ 'U', "\n[GNUPG:] TRUST_UNDEFINED", 0 }, >> +{ 'E', "\n[GNUPG:] ERRSIG ", 1}, >> +{ 'X', "\n[GNUPG:] EXPSIG ", 1}, >> +{ 'Y', "\n[GNUPG:] EXPKEYSIG ", 1}, >> +{ 'R', "\n[GNUPG:] REVKEYSIG ", 1}, >> }; > > nit: I wonder if making is_status into a flag field (like 'option' in > git.c's cmd_struct) and having an explicit SIGNATURE_STATUS value to > put there would make this easier to read. > > It's not clear to me that the name is_status or SIGNATURE_STATUS > captures what this field represents. Aren't these all sigcheck > statuses? Can you describe briefly what distinguishes the cases where > this should be 0 versus 1? Good suggestion. 
>> static void parse_gpg_output(struct signature_check *sigc) >> { >> const char *buf = sigc->gpg_status; >> int i; >> +int had_status = 0; >> >> /* Iterate over all search strings */ >> for (i = 0; i < ARRAY_SIZE(sigcheck_gpg_status); i++) { >> @@ -50,6 +52,10 @@ static void parse_gpg_output(struct signature_check *sigc) >> continue; >> found += strlen(sigcheck_gpg_status[i].check); >> } >> + >> +if (sigcheck_gpg_status[i].is_status) >> +had_status++; >> + >> sigc->result = sigcheck_gpg_status[i].result; >> /* The trust messages are not followed by key/signer >> information */ >> if (sigc->result != 'U') { >> @@ -62,6 +68,22 @@ static void parse_gpg_output(struct signature_check *sigc) >> } >> } >> } >> + >> +/* >> + * GOODSIG, BADSIG etc. can occur only once for each signature. >> + * Therefore, if we had more than one then we're dealing with multiple >> + * signatures. We don't support them currently, and they're rather >> + * hard to create, so something is likely fishy and we should reject >> + * them altogether. >> + */ >> +if (had_status > 1) { >> +sigc->result = 'E'; >> +/* Clear partial data to avoid confusion */ >> +if (sigc->signer) >> +FREE_AND_NULL(sigc->signer); >> +if (sigc->key) >> +FREE_AND_NULL(sigc->key); >> +} > > Makes sense to me. I was wondering if we have to revamp the loop altogether. The current code runs through the list of all the possible "status" lines, and find the first occurrence for each type in the buffer that has GPG output. Second and subsequent occurrence of the same type, if existed, will not be noticed by the original loop structure
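[Editorial aside: for what it's worth, a line-oriented rescan of the kind hinted at above could look roughly like the following sketch. It only counts the "exclusive" statuses so a doubled GOODSIG/BADSIG/... output (i.e. multiple signature packets) can be detected; the table layout mirrors the patch earlier in the thread, but the helper itself and the sample data are hypothetical, not part of the proposed change.]

```c
#include <stdio.h>
#include <string.h>

#define GPG_STATUS_EXCLUSIVE (1 << 0)

/* Mirrors the shape of sigcheck_gpg_status[] from the patch above. */
struct status_entry {
	char result;
	const char *prefix;	/* e.g. "[GNUPG:] GOODSIG " */
	unsigned int flags;
};

/*
 * Hypothetical helper: walk the GnuPG status buffer line by line and
 * count how many exclusive status lines appear. A result greater than
 * one means the signature consists of multiple signature packets.
 */
static int count_exclusive_statuses(const char *gpg_status,
				    const struct status_entry *table,
				    size_t table_len)
{
	const char *line = gpg_status;
	int seen = 0;
	size_t i;

	while (line && *line) {
		for (i = 0; i < table_len; i++) {
			if ((table[i].flags & GPG_STATUS_EXCLUSIVE) &&
			    !strncmp(line, table[i].prefix,
				     strlen(table[i].prefix))) {
				seen++;
				break;
			}
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return seen;
}

int main(void)
{
	static const struct status_entry table[] = {
		{ 'G', "[GNUPG:] GOODSIG ", GPG_STATUS_EXCLUSIVE },
		{ 'B', "[GNUPG:] BADSIG ", GPG_STATUS_EXCLUSIVE },
		{ 'U', "[GNUPG:] TRUST_NEVER", 0 },
	};
	/* Made-up sample: two GOODSIG lines, as a doubled signature would produce. */
	const char *status =
		"[GNUPG:] GOODSIG AAAA first\n"
		"[GNUPG:] GOODSIG BBBB second\n";

	printf("%d\n", count_exclusive_statuses(status, table,
						sizeof(table) / sizeof(table[0])));
	return 0;
}
```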
Re: Syncing HEAD
On Fri, Aug 17, 2018 at 09:28:59AM -0700, Junio C Hamano wrote: > Jeff King writes: > > > So all of this really implies to me that you want to be able to say > > "take this symref on the other side and update this one on the local > > side". I.e., some way to tell a refspec "don't update the value, update > > the symref destination". ... > > ... > > git fetch origin ~HEAD:refs/remotes/origin/HEAD > > We need to be a bit careful here. > > You can define the meaning of the above sanely if you know that > refmap refs/heads/*:refs/remotes/origin/* is in effect for the > remote to read "My HEAD points at refs/heads/frotz" and interpret it > as "In order to match, I need to make my refs/remotes/origin/HEAD to > point at refs/remotes/origin/frotz". Good point. I was thinking too much about the symlink itself and not its destination. You need some way of mapping that destination, as well. For Christian's case, it is really the "refs/*:refs/*" mapping that he would want to emulate (because he'd be doing ~HEAD:HEAD). In some cases, like that one, you could infer the mapping from the HEAD:HEAD itself (X on the remote becomes X locally). But that does not work for the refs/remotes case. You might infer from "~HEAD:refs/remotes/origin/HEAD" that "X becomes refs/remotes/origin/X", but it is actually "refs/heads/X becomes ...". So yeah, this really does need pairing with the overall ref mapping. I think that's doable even for the example I gave above, because we could find those refspecs in the config. But: git fetch git://... ~HEAD:HEAD does not have that information. We may or may not have fetched the pointed-to ref previously. What if this _required_ that the symref destination from the other side also be something that we are fetching, and was otherwise an error? That would avoid any config trickery. It does mean that "git fetch origin ~HEAD:HEAD" does not work. But I think you'd generally want to pair it with a fetch anyway. I.e., Either two configured refspecs, or if a one-off fetch from a URL, fetching the branches and the symref at the same time. I suppose the one case it would not support is "I want to fetch HEAD as a symref and whatever branch it points to, but I do not yet know what that branch is". Or, I suppose, "I do not want to update any branches, but just update my notion of HEAD" (i.e., what "remote set-head -a" currently does). > Also, what should the above form of "git fetch" write in FETCH_HEAD? > Should "git pull origin ~HEAD:refs/remotes/origin/HEAD" run the fetch > and then merge it (which may have value of refs/remotes/origin/frotz) > to the current branch? Should the underlying fetch be also fetching > the frotz branch from them at the same time, or do we attempt to merge > a possibly stale 'frotz' (which might not even have been there, the > last time we fetched from them)? I'd be tempted to say that a symref fetch writes nothing into FETCH_HEAD at all. Which would make a bare "git pull origin ~HEAD" an error. I'm sure there are cases it _could_ do something useful, but there are so many where it doesn't make sense. -Peff
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
Duy Nguyen writes: > Just fyi this seems to do the trick. Although I'm nowhere good at > coccinelle to say if we should include this (or something like it) > > -- 8< -- > diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci > index 4490069df9..f8e018d104 100644 > --- a/contrib/coccinelle/free.cocci > +++ b/contrib/coccinelle/free.cocci > @@ -16,3 +16,9 @@ expression E; > - free(E); > + FREE_AND_NULL(E); > - E = NULL; > + > +@@ > +expression E; > +@@ > +- if (E) { FREE_AND_NULL(E); } > ++ FREE_AND_NULL(E); It is a bit sad that - if (E) FREE_AND_NULL(E); is not sufficient to catch it. Shouldn't we be doing the same for regular free(E) as well? IOW, like the attached patch. contrib/coccinelle/free.cocci | 24 1 file changed, 24 insertions(+) diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci index 4490069df9..f748bcfe30 100644 --- a/contrib/coccinelle/free.cocci +++ b/contrib/coccinelle/free.cocci @@ -16,3 +16,27 @@ expression E; - free(E); + FREE_AND_NULL(E); - E = NULL; + +@@ +expression E; +@@ +- if (E) + FREE_AND_NULL(E); + +@@ +expression E; +@@ +- if (E) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (!E) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (E) { FREE_AND_NULL(E); } ++ FREE_AND_NULL(E);
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
Junio C Hamano writes: > It is a bit sad that > > - if (E) > FREE_AND_NULL(E); > > is not sufficient to catch it. Shouldn't we be doing the same for > regular free(E) as well? IOW, like the attached patch. > ... And revised even more to also spell "E" as "E != NULL" (and "!E" as "E == NULL"), which seems to make a difference, which is even more sad. I do not want to wonder if I have to also add "NULL == E" and other variants, so I'll stop here. contrib/coccinelle/free.cocci | 60 +++ 1 file changed, 60 insertions(+) diff --git a/contrib/coccinelle/free.cocci b/contrib/coccinelle/free.cocci index 4490069df9..29ca98796f 100644 --- a/contrib/coccinelle/free.cocci +++ b/contrib/coccinelle/free.cocci @@ -16,3 +16,63 @@ expression E; - free(E); + FREE_AND_NULL(E); - E = NULL; + +@@ +expression E; +@@ +- if (E) + FREE_AND_NULL(E); + +@@ +expression E; +@@ +- if (E != NULL) + free(E); + +@@ +expression E; +@@ +- if (E == NULL) + free(E); + +@@ +expression E; +@@ +- if (E != NULL) + FREE_AND_NULL(E); + +@@ +expression E; +@@ +- if (E) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (!E) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (E) { FREE_AND_NULL(E); } ++ FREE_AND_NULL(E); + +@@ +expression E; +@@ +- if (E != NULL) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (E == NULL) { free(E); } ++ free(E); + +@@ +expression E; +@@ +- if (E != NULL) { FREE_AND_NULL(E); } ++ FREE_AND_NULL(E);
Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
Nguyễn Thái Ngọc Duy writes: > I still don't trust magic st_ino zero, or core.checkStat being zero > on Windows, so the #if condition still remains but it covers smallest > area possible and I tested it by manually make it "#if 1" > > The fallback with fspathcmp() is only done when inode can't be > trusted because strcmp is more expensive and when fspathcmp() learns > more about real world in the future, it could become even more > expensive. > > The output sorting is the result of Sublime-Gitignore repo being > reported recently. It's not perfect but it should help seeing the > groups in normal case. Looks small and safe. > + > + if (o->clone) { > + struct string_list list = STRING_LIST_INIT_NODUP; > + int i; > + > + for (i = 0; i < index->cache_nr; i++) { > + struct cache_entry *ce = index->cache[i]; > + > + if (!(ce->ce_flags & CE_MATCHED)) > + continue; > + > + string_list_append(&list, ce->name); > + ce->ce_flags &= ~CE_MATCHED; > + } > + > + list.cmp = fspathcmp; > + string_list_sort(&list); > + > + if (list.nr) > + warning(_("the following paths have collided (e.g. > case-sensitive paths\n" > + "on a case-insensitive filesystem) and only > one from the same\n" > + "colliding group is in the working tree:\n")); > + > + for (i = 0; i < list.nr; i++) > + fprintf(stderr, " '%s'\n", list.items[i].string); > + > + string_list_clear(&list, 0); I would have written the "sort, show warning, and list" all inside "if (list.nr)" block, leaving list-clear outside, which would have made the logic a bit cleaner. The reader does not have to bother thinking "ah, when list.nr==0, this is a no-op anyway" to skip them if written that way. I highly suspect that the above was written in that way to reduce the indentation level, but the right way to reduce the indentation level, if it bothers readers too much, is to make the whole thing inside the above if (o->clone) into a dedicated helper function "void report_collided_checkout(void)", I would think.
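For what it is worth, the restructuring suggested above might look roughly like the sketch below (reusing the string_list/CE_MATCHED code quoted from the patch; this is not the actual follow-up patch): the whole block moves out of check_updates() into a helper, the sort, the warning and the listing are all guarded by the list.nr check, and only the string_list_clear() stays outside.

/* Sketch of the suggested helper; it would live next to check_updates(). */
static void report_collided_checkout(struct index_state *index)
{
	struct string_list list = STRING_LIST_INIT_NODUP;
	int i;

	for (i = 0; i < index->cache_nr; i++) {
		struct cache_entry *ce = index->cache[i];

		if (!(ce->ce_flags & CE_MATCHED))
			continue;

		string_list_append(&list, ce->name);
		ce->ce_flags &= ~CE_MATCHED;
	}

	if (list.nr) {
		list.cmp = fspathcmp;
		string_list_sort(&list);

		warning(_("the following paths have collided (e.g. case-sensitive paths\n"
			  "on a case-insensitive filesystem) and only one from the same\n"
			  "colliding group is in the working tree:\n"));

		for (i = 0; i < list.nr; i++)
			fprintf(stderr, "  '%s'\n", list.items[i].string);
	}

	string_list_clear(&list, 0);
}

check_updates() would then simply call report_collided_checkout(index) under its existing if (o->clone) check.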
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 10:07:36AM -0700, Junio C Hamano wrote: > Junio C Hamano writes: > > > It is a bit sad that > > > > - if (E) > > FREE_AND_NULL(E); > > > > is not sufficient to catch it. Shouldn't we be doing the same for > > regular free(E) as well? IOW, like the attached patch. > > ... > > And revised even more to also spell "E" as "E != NULL" (and "!E" as > "E == NULL"), which seems to make a difference, which is even more > sad. I do not want to wonder if I have to also add "NULL == E" and > other variants, so I'll stop here. I think it makes sense that these are all distinct if you're using coccinelle to do stylistic transformations between them (e.g., enforcing curly braces even around one-liners). I wonder if there is a way to "relax" a pattern where these semantically equivalent cases can all be covered automatically. I don't know enough about the tool to say. I guess one way to do it would be to normalize the style in one rule (e.g., always "!E" instead of "E == NULL"), and then you only have to write the FREE_AND_NULL rule for the normalized form. For a single case like this, the end result is about the same number of rules, but in the long term it saves us work when we have a similar transformation. -Peff
Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
On Fri, Aug 17, 2018 at 12:36 AM Junio C Hamano wrote: > > Andrei Rybak writes: > > > On 14/08/18 13:47, SZEDER Gábor wrote: > >> ... both > >> invocations produce empty 'pack{a,b}.objects' files, and the > >> subsequent 'test_cmp' happily finds those two empty files identical. > > > > Is test_cmp ever used for empty files? Would it make sense for > > test_cmp to issue warning when an empty file is being compared? > > Typically test_cmp is used to compare the actual output from a > dubious command being tested with the expected output from a > procedure that is known not to be broken (e.g. a run of 'echo', or a > 'cat' of here-doc), so at least one side would not be empty. > > The test done here is an odd case---it compares output from two > equally dubious processes that are being tested and sees if their > results match. > > That said, since we have test_must_be_empty, we could forbid feeding > empty files to test_cmp, after telling everybody that a test that > expects an empty output must use test_must_be_empty. I do not think > it is a terrible idea. I thought so as well, and I've looked into it; in fact this patch was one of the skeletons that fell out of our test suite "while at it". However, I had to change my mind about it, and now I don't think we should go all the way and forbid that. See, we have quite a few tests that extract repetitive common tasks into helper functions, which sometimes includes preparing the expected results and running 'test_cmp', e.g. something like this (oversimplified) example: check_cmd () { git cmd $1 >actual && echo "$2" >expect && test_cmp expect actual } check_cmd --fooFOO check_cmd --no-foo "" Completely forbidding feeding empty files to 'test_cmp' would require us to add a bit of logic to cases like this to call 'test_cmp' or 'test_must_be_empty' based on the content of the second parameter. Arguably it's only a tiny bit of logic, as only a single if statement is needed, but following our coding style it would take 7 lines instead of only 2: if test -n "$2" then echo "$2" >expect && test_cmp expect actual else test_must_be_empty actual fi I don't think it's worth it in cases like this.
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 01:39:51PM -0400, Jeff King wrote: > > I wonder if there is a way to "relax" a pattern where these semantically > > equivalent cases can all be covered automatically. I don't know enough > > about the tool to say. > > Hmm. They seem to call these "standard isomorphisms": > > http://coccinelle.lip6.fr/standard.iso.html > > but I'm not sure of the correct way to use them (e.g., if we want to > apply them for matching but not actually transform the code, though I am > not actually opposed to transforming the code, too). Hmph, I should really pause before hitting 'send'. Last message, I promise. :) I do not see an option to include a list an arbitrary set of isomorphisms, but the standard.iso list should be used by default. I wonder if you simply need to write your case in the normalized version they use there (which I think is "X == NULL"), and the others would be taken care of. -Peff
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 01:33:08PM -0400, Jeff King wrote: > > And revised even more to also spell "E" as "E != NULL" (and "!E" as > > "E == NULL"), which seems to make a difference, which is even more > > sad. I do not want to wonder if I have to also add "NULL == E" and > > other variants, so I'll stop here. > > I think it makes sense that these are all distinct if you're using > coccinelle to do stylistic transformations between them (e.g., enforcing > curly braces even around one-liners). > > I wonder if there is a way to "relax" a pattern where these semantically > equivalent cases can all be covered automatically. I don't know enough > about the tool to say. Hmm. They seem to call these "standard isomorphisms": http://coccinelle.lip6.fr/standard.iso.html but I'm not sure of the correct way to use them (e.g., if we want to apply them for matching but not actually transform the code, though I am not actually opposed to transforming the code, too). -Peff
Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
On Fri, Aug 17, 2018 at 10:20:36AM -0700, Junio C Hamano wrote: > I highly suspect that the above was written in that way to reduce > the indentation level, but the right way to reduce the indentation > level, if it bothers readers too much, is to make the whole thing > inside the above if (o->clone) into a dedicated helper function > "void report_collided_checkout(void)", I would think. I read my mind. I thought of separating into a helper function too, but was not happy that the clearing CE_MATCHED in preparation for this test is in check_updates(), but the cleaning up CE_MATCHED() is in the helper function. So here is the version that separates _both_ phases into helper functions. -- 8< -- Subject: [PATCH v6] clone: report duplicate entries on case-insensitive filesystems Paths that only differ in case work fine in a case-sensitive filesystems, but if those repos are cloned in a case-insensitive one, you'll get problems. The first thing to notice is "git status" will never be clean with no indication what exactly is "dirty". This patch helps the situation a bit by pointing out the problem at clone time. Even though this patch talks about case sensitivity, the patch makes no assumption about folding rules by the filesystem. It simply observes that if an entry has been already checked out at clone time when we're about to write a new path, some folding rules are behind this. In the case that we can't rely on filesystem (via inode number) to do this check, fall back to fspathcmp() which is not perfect but should not give false positives. This patch is tested with vim-colorschemes and Sublime-Gitignore repositories on a JFS partition with case insensitive support on Linux. Signed-off-by: Nguyễn Thái Ngọc Duy --- builtin/clone.c | 1 + cache.h | 1 + entry.c | 31 +++ t/t5601-clone.sh | 8 +++- unpack-trees.c | 47 +++ unpack-trees.h | 1 + 6 files changed, 88 insertions(+), 1 deletion(-) diff --git a/builtin/clone.c b/builtin/clone.c index 5c439f1394..0702b0e9d0 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -747,6 +747,7 @@ static int checkout(int submodule_progress) memset(&opts, 0, sizeof opts); opts.update = 1; opts.merge = 1; + opts.clone = 1; opts.fn = oneway_merge; opts.verbose_update = (option_verbosity >= 0); opts.src_index = &the_index; diff --git a/cache.h b/cache.h index 8b447652a7..6d6138f4f1 100644 --- a/cache.h +++ b/cache.h @@ -1455,6 +1455,7 @@ struct checkout { unsigned force:1, quiet:1, not_new:1, +clone:1, refresh_cache:1; }; #define CHECKOUT_INIT { NULL, "" } diff --git a/entry.c b/entry.c index b5d1d3cf23..8766e27255 100644 --- a/entry.c +++ b/entry.c @@ -399,6 +399,34 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) return lstat(path, st); } +static void mark_colliding_entries(const struct checkout *state, + struct cache_entry *ce, struct stat *st) +{ + int i, trust_ino = check_stat; + +#if defined(GIT_WINDOWS_NATIVE) + trust_ino = 0; +#endif + + ce->ce_flags |= CE_MATCHED; + + for (i = 0; i < state->istate->cache_nr; i++) { + struct cache_entry *dup = state->istate->cache[i]; + + if (dup == ce) + break; + + if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) + continue; + + if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) || + (!trust_ino && !fspathcmp(ce->name, dup->name))) { + dup->ce_flags |= CE_MATCHED; + break; + } + } +} + /* * Write the contents from ce out to the working tree. 
* @@ -455,6 +483,9 @@ int checkout_entry(struct cache_entry *ce, return -1; } + if (state->clone) + mark_colliding_entries(state, ce, &st); + /* * We unlink the old file, to get the new one with the * right permissions (including umask, which is nasty diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 0b62037744..f2eb73bc74 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -624,10 +624,16 @@ test_expect_success 'clone on case-insensitive fs' ' git hash-object -w -t tree --stdin) && c=$(git commit-tree -m bogus $t) && git update-ref refs/heads/bogus $c && - git clone -b bogus . bogus + git clone -b bogus . bogus 2>warning ) ' +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' ' + grep X icasefs/warning && + grep x icasefs/warning && + test_i18ngrep "the following paths have
Re: [PATCH] refactor various if (x) FREE_AND_NULL(x) to just FREE_AND_NULL(x)
On Fri, Aug 17, 2018 at 7:33 PM Jeff King wrote: > > On Fri, Aug 17, 2018 at 10:07:36AM -0700, Junio C Hamano wrote: > > > Junio C Hamano writes: > > > > > It is a bit sad that > > > > > > - if (E) > > > FREE_AND_NULL(E); > > > > > > is not sufficient to catch it. Shouldn't we be doing the same for > > > regular free(E) as well? IOW, like the attached patch. > > > ... > > > > And revised even more to also spell "E" as "E != NULL" (and "!E" as > > "E == NULL"), which seems to make a difference, which is even more > > sad. I do not want to wonder if I have to also add "NULL == E" and > > other variants, so I'll stop here. > > I think it makes sense that these are all distinct if you're using > coccinelle to do stylistic transformations between them (e.g., enforcing > curly braces even around one-liners). Googling a bit shows a kernel patch [1]. Assuming that it works (I didn't check if it made it to linux.git) it would simplify our rules a bit. [1] https://patchwork.kernel.org/patch/5167641/ -- Duy
Re: [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Han-Wen Nienhuys writes: > +/* > + * Optionally highlight one keyword in remote output if it appears at the > start > + * of the line. This should be called for a single line only, which is > + * passed as the first N characters of the SRC array. > + */ > +static void maybe_colorize_sideband(struct strbuf *dest, const char *src, > int n) > +{ > + int i; > + > + if (!want_color_stderr(use_sideband_colors())) { > + strbuf_add(dest, src, n); > + return; > + } > + > + while (isspace(*src)) { > + strbuf_addch(dest, *src); > + src++; > + n--; > + } This loop can run out of bytes in src in search of non-space before n gets to zero or negative, and when that happens ... > + for (i = 0; i < ARRAY_SIZE(keywords); i++) { > + struct keyword_entry *p = keywords + i; > + int len = strlen(p->keyword); > + /* > + * Match case insensitively, so we colorize output from existing > + * servers regardless of the case that they use for their > + * messages. We only highlight the word precisely, so > + * "successful" stays uncolored. > + */ > + if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) { ... these access src[] beyond the end of what the caller intended to show us, and also ... > + strbuf_addstr(dest, p->color); > + strbuf_add(dest, src, len); > + strbuf_addstr(dest, GIT_COLOR_RESET); > + n -= len; > + src += len; > + break; > + } > + } > + > + strbuf_add(dest, src, n); ... this will now try to add 0 or negative number of bytes. > + > +} > + Perhaps this will help (not really tested). The second hunk is an unrelated style clean-up. sideband.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/sideband.c b/sideband.c index 1c6bb0e25b..d99a559a44 100644 --- a/sideband.c +++ b/sideband.c @@ -75,11 +75,13 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) return; } - while (isspace(*src)) { + while (isspace(*src) && n) { strbuf_addch(dest, *src); src++; n--; } + if (!n) + return; for (i = 0; i < ARRAY_SIZE(keywords); i++) { struct keyword_entry *p = keywords + i; @@ -101,7 +103,6 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) } strbuf_add(dest, src, n); - }
Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Junio C Hamano writes: > This loop can run out of bytes in src in search of non-space before > n gets to zero or negative, and when that happens ... > >> +for (i = 0; i < ARRAY_SIZE(keywords); i++) { >> +struct keyword_entry *p = keywords + i; >> +int len = strlen(p->keyword); >> +/* >> + * Match case insensitively, so we colorize output from existing >> + * servers regardless of the case that they use for their >> + * messages. We only highlight the word precisely, so >> + * "successful" stays uncolored. >> + */ >> +if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) { > > ... these access src[] beyond the end of what the caller intended to > show us, and also ... Actually, leaving when !n before this loop is insufficient. src[] may have 2 bytes "in" remaining, and we may be trying to see if it begins with "info", for example, and using strncasecmp() with len==4 would of course read beyond the end of src[]. -- >8 -- Subject: sideband: do not read beyond the end of input The caller of maybe_colorize_sideband() gives a counted buffer , but the callee checked *src as if it were a NUL terminated buffer. If src[] had all isspace() bytes in it, we would have made n negative, and then (1) called number of strncasecmp() to see if the remaining bytes in src[] matched keywords, reading beyond the end of the array, and/or (2) called strbuf_add() with negative count, most likely triggering the "you want to use way too much memory" error due to unsigned integer overflow. Signed-off-by: Junio C Hamano --- sideband.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sideband.c b/sideband.c index 1c6bb0e25b..372039247f 100644 --- a/sideband.c +++ b/sideband.c @@ -75,7 +75,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) return; } - while (isspace(*src)) { + while (0 < n && isspace(*src)) { strbuf_addch(dest, *src); src++; n--; @@ -84,6 +84,9 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) for (i = 0; i < ARRAY_SIZE(keywords); i++) { struct keyword_entry *p = keywords + i; int len = strlen(p->keyword); + + if (n <= len) + continue; /* * Match case insensitively, so we colorize output from existing * servers regardless of the case that they use for their @@ -100,8 +103,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) } } - strbuf_add(dest, src, n); - + if (0 < n) + strbuf_add(dest, src, n); }
[RFC/PATCH] drop vcs-svn experiment
The code in vcs-svn was started in 2010 as an attempt to build a remote-helper for interacting with svn repositories (as opposed to git-svn). However, we never got as far as shipping a mature remote helper, and the last substantive commit was e99d012a6bc in 2012. We do have a git-remote-testsvn, and it is even installed as part of "make install". But given the name, it seems unlikely to be used by anybody (you'd have to explicitly "git clone testsvn::$url", and there have been zero mentions of that on the mailing list since 2013, and even that includes the phrase "you might need to hack a bit to get it working properly"[1]). We also ship contrib/svn-fe, which builds on the vcs-svn work. However, it does not seem to build out of the box for me, as the link step misses some required libraries for using libgit.a. Curiously, the original build breakage bisects for me to eff80a9fd9 (Allow custom "comment char", 2013-01-16), which seems unrelated. There was an attempt to fix it in da011cb0e7 (contrib/svn-fe: fix Makefile, 2014-08-28), but on my system that only switches the error message. So it seems like the result is not really usable by anybody in practice. It would be wonderful if somebody wanted to pick up the topic again, and potentially it's worth carrying around for that reason. But the flip side is that people doing a tree-wide operations have to deal with this code. And you can see the list with (replace "HEAD" with this commit as appropriate): { echo "--" git diff-tree --diff-filter=D -r --name-only HEAD^ HEAD } | git log e99d012a6bc.. --stdin which shows 58 times somebody had to deal with the code, generally due to a compile or test failure, or a tree-wide style fix or API change. Let's drop it and let anybody who wants to pick it up do so by resurrecting it from the git history. [1] https://public-inbox.org/git/calkwk0mphzkfzfkkpzkfaus3yvc9nfydbfnt+5jqyvkipk3...@mail.gmail.com/ Signed-off-by: Jeff King --- Of course, I could be completely wrong about people using this. Maybe svn-fe builds are just completely broken on my system, and maybe people really do use testsvn::. But if so, they certainly aren't talking about it on the mailing list. :) I'm cc-ing Jonathan as the only currently-active developer who seems to have put significant work into this code. Maybe you have a more informed opinion. 
.gitignore | 1 - Makefile | 22 -- contrib/svn-fe/.gitignore | 4 - contrib/svn-fe/Makefile| 105 --- contrib/svn-fe/svn-fe.c| 18 -- contrib/svn-fe/svn-fe.txt | 71 - contrib/svn-fe/svnrdump_sim.py | 68 - remote-testsvn.c | 337 t/helper/test-line-buffer.c| 81 - t/helper/test-svn-fe.c | 52 t/t9020-remote-svn.sh | 89 -- vcs-svn/LICENSE| 32 -- vcs-svn/fast_export.c | 365 -- vcs-svn/fast_export.h | 34 --- vcs-svn/line_buffer.c | 126 vcs-svn/line_buffer.h | 30 -- vcs-svn/line_buffer.txt| 77 - vcs-svn/sliding_window.c | 79 - vcs-svn/sliding_window.h | 18 -- vcs-svn/svndiff.c | 309 --- vcs-svn/svndiff.h | 10 - vcs-svn/svndump.c | 540 - vcs-svn/svndump.h | 10 - 23 files changed, 2478 deletions(-) delete mode 100644 contrib/svn-fe/.gitignore delete mode 100644 contrib/svn-fe/Makefile delete mode 100644 contrib/svn-fe/svn-fe.c delete mode 100644 contrib/svn-fe/svn-fe.txt delete mode 100755 contrib/svn-fe/svnrdump_sim.py delete mode 100644 remote-testsvn.c delete mode 100644 t/helper/test-line-buffer.c delete mode 100644 t/helper/test-svn-fe.c delete mode 100755 t/t9020-remote-svn.sh delete mode 100644 vcs-svn/LICENSE delete mode 100644 vcs-svn/fast_export.c delete mode 100644 vcs-svn/fast_export.h delete mode 100644 vcs-svn/line_buffer.c delete mode 100644 vcs-svn/line_buffer.h delete mode 100644 vcs-svn/line_buffer.txt delete mode 100644 vcs-svn/sliding_window.c delete mode 100644 vcs-svn/sliding_window.h delete mode 100644 vcs-svn/svndiff.c delete mode 100644 vcs-svn/svndiff.h delete mode 100644 vcs-svn/svndump.c delete mode 100644 vcs-svn/svndump.h diff --git a/.gitignore b/.gitignore index 3524803da5..fe022dbeb0 100644 --- a/.gitignore +++ b/.gitignore @@ -131,7 +131,6 @@ /git-remote-ext /git-remote-testgit /git-remote-testpy -/git-remote-testsvn /git-repack /git-replace /git-request-pull diff --git a/Makefile b/Makefile index e3364a42a5..8a890e28e9 100644 --- a/Makefile +++ b/Makefile @@ -695,7 +695,6 @@ PROGRAM_OBJS += http-backend.o PROGRAM_OBJS += imap-send.o PROGRAM_OBJS += sh-i18n--envsubst.o PROGRAM_OBJS += shell.o -PROGRAM_OBJS += remote-testsvn.o # Binary suffix, set to .exe for Windows builds X = @@ -743,10 +742,8 @@ TEST_BUILTINS_OBJS += test-write-cache.o
Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
On 17/08/18 19:39, SZEDER Gábor wrote: > > See, we have quite a few tests that extract repetitive common tasks > into helper functions, which sometimes includes preparing the expected > results and running 'test_cmp', e.g. something like this > (oversimplified) example: > > check_cmd () { > git cmd $1 >actual && > echo "$2" >expect && > test_cmp expect actual > } > > check_cmd --fooFOO > check_cmd --no-foo "" I've only had time to look into this from t0001 up to t0008-ignores.sh, where test_check_ignore does this. If these helper functions need to allow comparing empty files -- how about adding special variation of cmp functions for cases like this: test_cmp_allow_empty and test_i18ncmp_allow_empty? I think it would be a good trade-off to allow these helper functions to skip checking emptiness of arguments for test_cmp. Such patch will require only s/test_cmp/&_allow_empty/ for these helper functions and it will help catch cases as bogus test in t5310. I'll try something like the following on the weekend: test_cmp() { if test "$1" != - && ! test -s "$1" then echo >&4 "error: trying to compare empty file '$1'" return 1 fi if test "$2" != - && ! test -s "$2" then echo >&4 "error: trying to compare empty file '$2'" return 1 fi test_cmp_allow_empty "$@" } test_cmp_allow_empty() { $GIT_TEST_CMP "$@" } (I'm not sure about redirections in test lib functions. The two if's would probably be in a separate function to be re-used by test_i18ncmp.)
Re: [RFC/PATCH] drop vcs-svn experiment
Hi Jeff, Jeff King wrote: > .gitignore | 1 - > Makefile | 22 -- > contrib/svn-fe/.gitignore | 4 - > contrib/svn-fe/Makefile| 105 --- > contrib/svn-fe/svn-fe.c| 18 -- > contrib/svn-fe/svn-fe.txt | 71 - > contrib/svn-fe/svnrdump_sim.py | 68 - > remote-testsvn.c | 337 > t/helper/test-line-buffer.c| 81 - > t/helper/test-svn-fe.c | 52 > t/t9020-remote-svn.sh | 89 -- Doesn't t/t9010-svn-fe.sh also need to be removed? It uses the test-svn-fe helper which is removed. The Fedora git-svn package has included git-remote-testsvn for years now but no one has ever filed any bug reports about it. I looked at whether it should be packaged last year. I came to the conclusion that while it could be used outside of the test suite it was doubtful it actually was. -- Todd ~~ No one ever went broke underestimating the taste of the American public. -- H. L. Mencken
Re: [PATCH v5] clone: report duplicate entries on case-insensitive filesystems
On Fri, Aug 17, 2018 at 06:16:45PM +0200, Nguyễn Thái Ngọc Duy wrote: The whole patch looks good to me. (I was just sending a different version, but your version is better :-) One minor remark, should the line warning: the following paths have collided start with a capital letter: Warning: the following paths have collided > Paths that only differ in case work fine in a case-sensitive > filesystems, but if those repos are cloned in a case-insensitive one, > you'll get problems. The first thing to notice is "git status" will > never be clean with no indication what exactly is "dirty". > > This patch helps the situation a bit by pointing out the problem at > clone time. Even though this patch talks about case sensitivity, the > patch makes no assumption about folding rules by the filesystem. It > simply observes that if an entry has been already checked out at clone > time when we're about to write a new path, some folding rules are > behind this. > > In the case that we can't rely on filesystem (via inode number) to do > this check, fall back to fspathcmp() which is not perfect but should > not give false positives. > > This patch is tested with vim-colorschemes and Sublime-Gitignore > repositories on a JFS partition with case insensitive support on > Linux. Now even tested under Mac OS/HFS+ [] > ' > > +test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file > detection' ' My ambition is to run the test under Windows (both CYGWIN and native) next week, so that we can remove !MINGW and !CYGWIN
Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
Andrei Rybak writes: > I think it would be a good trade-off to allow these helper functions to skip > checking emptiness of arguments for test_cmp. Such patch will require only > s/test_cmp/&_allow_empty/ for these helper functions and it will help catch > cases as bogus test in t5310. > > I'll try something like the following on the weekend: > > test_cmp() { Style: SP before and after (). > if test "$1" != - && ! test -s "$1" > then > echo >&4 "error: trying to compare empty file '$1'" > return 1 > fi > if test "$2" != - && ! test -s "$2" > then > echo >&4 "error: trying to compare empty file '$2'" > return 1 > fi > test_cmp_allow_empty "$@" > } I actually think the above gives way too confusing output, when the actual output is empty and we are expecting some output. I.e. : >expect && git cmd >actual && test_cmp expect actual The tester wants to hear from test_cmp "your 'git cmd' produced some output when we are expecting none" as the primary message. We are trying to find bugs in "git" under development, and diagnosing iffy tests is secondary. But with your change, the first thing that is checked is if 'expect' is an empty file and that is what we get complaints about, without even looking at what is in 'actual'. It's the same story, and it is even worse than the above, when we are expecting some output but the command produced no output, i.e. echo Everything up-to-date. >expect && git cmd >actual && test_cmp expect actual We should get a complaint from test_cmp that 'actual' does not match the string we were expecting (and even better, we show how they are different by running them thru "diff -u"), not that 'actual' is an empty file. > test_cmp_allow_empty() { > $GIT_TEST_CMP "$@" > } > > (I'm not sure about redirections in test lib functions. The two if's would > probably be in a separate function to be re-used by test_i18ncmp.)
Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
On Fri, Aug 17, 2018 at 9:27 PM Andrei Rybak wrote: > > On 17/08/18 19:39, SZEDER Gábor wrote: > > > > See, we have quite a few tests that extract repetitive common tasks > > into helper functions, which sometimes includes preparing the expected > > results and running 'test_cmp', e.g. something like this > > (oversimplified) example: > > > > check_cmd () { > > git cmd $1 >actual && > > echo "$2" >expect && > > test_cmp expect actual > > } > > > > check_cmd --fooFOO > > check_cmd --no-foo "" > > I've only had time to look into this from t0001 up to t0008-ignores.sh, where > test_check_ignore does this. If these helper functions need to allow comparing > empty files -- how about adding special variation of cmp functions for cases > like this: test_cmp_allow_empty and test_i18ncmp_allow_empty? > > I think it would be a good trade-off to allow these helper functions to skip > checking emptiness of arguments for test_cmp. Such patch will require only > s/test_cmp/&_allow_empty/ for these helper functions and it will help catch > cases as bogus test in t5310. > > I'll try something like the following on the weekend: > > test_cmp() { > if test "$1" != - && ! test -s "$1" > then > echo >&4 "error: trying to compare empty file '$1'" > return 1 > fi > if test "$2" != - && ! test -s "$2" > then > echo >&4 "error: trying to compare empty file '$2'" > return 1 > fi Yeah, these conditions are what I instrumented 'test_cmp' with (except I used 'error "bug in test script ..."') to find callsites that could be converted to 'test_must_be_empty', that's how I found the bug fixed in this patch. However, it resulted in a lot of errors from the cases mentioned in my previous email. Then I reached out to Bash and tried this: test_cmp() { if test -n "$BASH_VERSION" && test "${FUNCNAME[1]}" = "test_eval_inner_" && test "$1" != "-" && test ! -s "$1" then error "bug in test script: using test_cmp to check empty file; use test_must_be_empty instead" fi $GIT_TEST_CMP "$@" } i.e. to limit the check callsites where 'test_cmp' is called directly from within a test_expect_{success,failure} block. This is better, almost all errors could be converted to 'test_must_be_empty' stright away; I have the patches almost ready. There are, however, a few parametric cases that choke on this: where we run 'test_cmp' in a loop, e.g. 'cvs update (-p)' in t9400, and I think there was a case where the 'test_expect_success' block is within a function. > test_cmp_allow_empty "$@" > } > > test_cmp_allow_empty() { > $GIT_TEST_CMP "$@" > } > > (I'm not sure about redirections in test lib functions. The two if's would > probably be in a separate function to be re-used by test_i18ncmp.)
[PATCH 2/3] range-diff: make use of different output indicators
This change itself only changes the internal communication and should have no visible effect to the user. We instruct the diff code that produces the inner diffs to use other markers instead of the usual markers for new, old and context lines. Signed-off-by: Stefan Beller --- range-diff.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/range-diff.c b/range-diff.c index b6b9abac266..a906d25f139 100644 --- a/range-diff.c +++ b/range-diff.c @@ -38,6 +38,14 @@ static int read_patches(const char *range, struct string_list *list) argv_array_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges", "--reverse", "--date-order", "--decorate=no", + /* +* Choose indicators that are not used anywhere +* else in diffs, but still look reasonable +* (e.g. will not be confusing when debugging) +*/ + "--output-indicator-new=>", + "--output-indicator-old=<", + "--output-indicator-context=#", "--no-abbrev-commit", range, NULL); cp.out = -1; @@ -108,8 +116,18 @@ static int read_patches(const char *range, struct string_list *list) * we are not interested. */ continue; - else + else if (line.buf[0] == '>') { + strbuf_addch(&buf, '+'); + strbuf_add(&buf, line.buf + 1, line.len - 1); + } else if (line.buf[0] == '<') { + strbuf_addch(&buf, '-'); + strbuf_add(&buf, line.buf + 1, line.len - 1); + } else if (line.buf[0] == '#') { + strbuf_addch(&buf, ' '); + strbuf_add(&buf, line.buf + 1, line.len - 1); + } else { strbuf_addbuf(&buf, &line); + } strbuf_addch(&buf, '\n'); util->diffsize++; -- 2.18.0.265.g16de1b435c9.dirty
[PATCH 1/3] diff.c: add --output-indicator-{new, old, context}
This will prove useful in range-diff in a later patch as we will be able to differentiate between adding a new file (that line is starting with +++ and then the file name) and regular new lines. It could also be useful for experimentation in new patch formats, i.e. we could teach git to emit moved lines with lines other than +/-. Signed-off-by: Stefan Beller --- diff.c | 21 ++--- diff.h | 5 + 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/diff.c b/diff.c index c5c7739ce34..03486c35b75 100644 --- a/diff.c +++ b/diff.c @@ -1281,7 +1281,9 @@ static void emit_diff_symbol_from_struct(struct diff_options *o, else if (c == '-') set = diff_get_color_opt(o, DIFF_FILE_OLD); } - emit_line_ws_markup(o, set_sign, set, reset, ' ', line, len, + emit_line_ws_markup(o, set_sign, set, reset, + o->output_indicators[OUTPUT_INDICATOR_CONTEXT], + line, len, flags & (DIFF_SYMBOL_CONTENT_WS_MASK), 0); break; case DIFF_SYMBOL_PLUS: @@ -1324,7 +1326,9 @@ static void emit_diff_symbol_from_struct(struct diff_options *o, set = diff_get_color_opt(o, DIFF_CONTEXT_BOLD); flags &= ~DIFF_SYMBOL_CONTENT_WS_MASK; } - emit_line_ws_markup(o, set_sign, set, reset, '+', line, len, + emit_line_ws_markup(o, set_sign, set, reset, + o->output_indicators[OUTPUT_INDICATOR_NEW], + line, len, flags & DIFF_SYMBOL_CONTENT_WS_MASK, flags & DIFF_SYMBOL_CONTENT_BLANK_LINE_EOF); break; @@ -1367,7 +1371,9 @@ static void emit_diff_symbol_from_struct(struct diff_options *o, else set = diff_get_color_opt(o, DIFF_CONTEXT_DIM); } - emit_line_ws_markup(o, set_sign, set, reset, '-', line, len, + emit_line_ws_markup(o, set_sign, set, reset, + o->output_indicators[OUTPUT_INDICATOR_OLD], + line, len, flags & DIFF_SYMBOL_CONTENT_WS_MASK, 0); break; case DIFF_SYMBOL_WORDS_PORCELAIN: @@ -4382,6 +4388,9 @@ void diff_setup(struct diff_options *options) options->file = stdout; + options->output_indicators[OUTPUT_INDICATOR_NEW] = '+'; + options->output_indicators[OUTPUT_INDICATOR_OLD] = '-'; + options->output_indicators[OUTPUT_INDICATOR_CONTEXT] = ' '; options->abbrev = DEFAULT_ABBREV; options->line_termination = '\n'; options->break_opt = -1; @@ -4869,6 +4878,12 @@ int diff_opt_parse(struct diff_options *options, options->output_format |= DIFF_FORMAT_DIFFSTAT; } else if (!strcmp(arg, "--no-compact-summary")) options->flags.stat_with_summary = 0; + else if (skip_prefix(arg, "--output-indicator-new=", &arg)) + options->output_indicators[OUTPUT_INDICATOR_NEW] = arg[0]; + else if (skip_prefix(arg, "--output-indicator-old=", &arg)) + options->output_indicators[OUTPUT_INDICATOR_OLD] = arg[0]; + else if (skip_prefix(arg, "--output-indicator-context=", &arg)) + options->output_indicators[OUTPUT_INDICATOR_CONTEXT] = arg[0]; /* renames options */ else if (starts_with(arg, "-B") || diff --git a/diff.h b/diff.h index e1e54256c18..d7edc64454a 100644 --- a/diff.h +++ b/diff.h @@ -194,6 +194,11 @@ struct diff_options { FILE *file; int close_file; +#define OUTPUT_INDICATOR_NEW 0 +#define OUTPUT_INDICATOR_OLD 1 +#define OUTPUT_INDICATOR_CONTEXT 2 + char output_indicators[3]; + struct pathspec pathspec; pathchange_fn_t pathchange; change_fn_t change; -- 2.18.0.265.g16de1b435c9.dirty
[PATCH 0/3] Better colors in range-diff
This improves the colors of range-diff; see the last patch for details. It is also available via git fetch https://github.com/stefanbeller/git sb/range-diff-better-colors Thanks, Stefan Stefan Beller (3): diff.c: add --output-indicator-{new, old, context} range-diff: make use of different output indicators range-diff: indent special lines as context diff.c| 21 ++--- diff.h| 5 + range-diff.c | 22 +- t/t3206-range-diff.sh | 12 ++-- 4 files changed, 50 insertions(+), 10 deletions(-) -- 2.18.0.265.g16de1b435c9.dirty
[PATCH 3/3] range-diff: indent special lines as context
The range-diff coloring is a bit fuzzy when it comes to special lines of a diff, such as indicating new and old files with +++ and ---, as it would pickup the first character and interpret it for its coloring, which seems annoying as in regular diffs, these lines are colored bold via DIFF_METAINFO. By indenting these lines by a white space, they will be treated as context which is much more useful, an example [1] on the range diff series itself: [...] + diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt + new file mode 100644 + --- /dev/null + +++ b/Documentation/git-range-diff.txt +@@ ++git-range-diff(1) [...] + diff --git a/Makefile b/Makefile --- a/Makefile +++ b/Makefile [...] The first lines that introduce the new file for the man page will have the '+' sign colored and the rest of the line will be bold. The later lines that indicate a change to the Makefile will be treated as context both in the outer and inner diff, such that those lines stay regular color. [1] ./git-range-diff pr-1/dscho/branch-diff-v3...pr-1/dscho/branch-diff-v4 These tags are found at https://github.com/gitgitgadget/git Signed-off-by: Stefan Beller --- range-diff.c | 2 ++ t/t3206-range-diff.sh | 12 ++-- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/range-diff.c b/range-diff.c index a906d25f139..3e9b9844012 100644 --- a/range-diff.c +++ b/range-diff.c @@ -90,6 +90,7 @@ static int read_patches(const char *range, struct string_list *list) strbuf_addch(&buf, '\n'); if (!util->diff_offset) util->diff_offset = buf.len; + strbuf_addch(&buf, ' '); strbuf_addbuf(&buf, &line); } else if (in_header) { if (starts_with(line.buf, "Author: ")) { @@ -126,6 +127,7 @@ static int read_patches(const char *range, struct string_list *list) strbuf_addch(&buf, ' '); strbuf_add(&buf, line.buf + 1, line.len - 1); } else { + strbuf_addch(&buf, ' '); strbuf_addbuf(&buf, &line); } diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh index 7dc7c80a1de..c94f9bf5ee1 100755 --- a/t/t3206-range-diff.sh +++ b/t/t3206-range-diff.sh @@ -133,9 +133,9 @@ test_expect_success 'changed message' ' Z +Also a silly comment here! + - Zdiff --git a/file b/file - Z--- a/file - Z+++ b/file + Z diff --git a/file b/file + Z --- a/file + Z +++ b/file 3: 147e64e = 3: b9cb956 s/11/B/ 4: a63e992 = 4: 8add5f1 s/12/B/ EOF @@ -152,9 +152,9 @@ test_expect_success 'dual-coloring' ' : :+Also a silly comment here! :+ - : diff --git a/file b/file - : --- a/file - : +++ b/file + : diff --git a/file b/file + : --- a/file + : +++ b/file :3: 0559556 ! 3: b9cb956 s/11/B/ :@@ -10,7 +10,7 @@ : 9 -- 2.18.0.265.g16de1b435c9.dirty
[PATCH 0/6] reuse on-disk deltas for fetches with bitmaps
This series more aggressively reuses on-disk deltas to serve fetches when reachability bitmaps tell us a more complete picture of what the client has. That saves server CPU and results in smaller packs. See the final patch for numbers and more discussion. It's a resurrection of this very old series: https://public-inbox.org/git/20140326072215.ga31...@sigill.intra.peff.net/ The core idea is good, but it got left as "I should dig into this more to see if we can do even better". In fact, I _did_ do some of that digging, as you can see in the thread, so I'm mildly embarrassed not to have reposted it before now. We've been using that original at GitHub since 2014, which I think helps demonstrate the correctness of the approach (and the numbers here and in that thread show that performance is generally a net win over the status quo). It's rebased on top of the current master, since the original made some assumptions about struct object_entry that are no longer true post-v2.18 (due to the struct-shrinking exercise). So I fixed that and a few other rough edges. But that also means you're not getting code with 4 years of production testing behind it. :) The other really ugly thing in the original was the way it leaked object_entry structs (though in practice that didn't really matter since we needed them until the end of the program anyway). This version fixes that. [1/6]: t/perf: factor boilerplate out of test_perf [2/6]: t/perf: factor out percent calculations [3/6]: t/perf: add infrastructure for measuring sizes [4/6]: t/perf: add perf tests for fetches from a bitmapped server [5/6]: pack-bitmap: save "have" bitmap from walk [6/6]: pack-objects: reuse on-disk deltas for thin "have" objects builtin/pack-objects.c | 28 +++ pack-bitmap.c | 23 +- pack-bitmap.h | 7 +++ pack-objects.c | 19 pack-objects.h | 20 +++- t/perf/README | 25 ++ t/perf/aggregate.perl | 69 ++-- t/perf/p5311-pack-bitmaps-fetch.sh | 45 ++ t/perf/perf-lib.sh | 74 +++--- 9 files changed, 258 insertions(+), 52 deletions(-) create mode 100755 t/perf/p5311-pack-bitmaps-fetch.sh -Peff
[PATCH 1/6] t/perf: factor boilerplate out of test_perf
About half of test_perf() is boilerplate preparing to run _any_ test, and the other half is specifically running a timing test. Let's split it into two functions, so that we can reuse the boilerplate in future commits. Signed-off-by: Jeff King --- Best viewed with "-w". t/perf/perf-lib.sh | 61 ++ 1 file changed, 35 insertions(+), 26 deletions(-) diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh index e4c343a6b7..a54be09516 100644 --- a/t/perf/perf-lib.sh +++ b/t/perf/perf-lib.sh @@ -179,8 +179,8 @@ exit $ret' >&3 2>&4 return "$eval_ret" } - -test_perf () { +test_wrapper_ () { + test_wrapper_func_=$1; shift test_start_ test "$#" = 3 && { test_prereq=$1; shift; } || test_prereq= test "$#" = 2 || @@ -191,35 +191,44 @@ test_perf () { base=$(basename "$0" .sh) echo "$test_count" >>"$perf_results_dir"/$base.subtests echo "$1" >"$perf_results_dir"/$base.$test_count.descr - if test -z "$verbose"; then - printf "%s" "perf $test_count - $1:" - else - echo "perf $test_count - $1:" - fi - for i in $(test_seq 1 $GIT_PERF_REPEAT_COUNT); do - say >&3 "running: $2" - if test_run_perf_ "$2" - then - if test -z "$verbose"; then - printf " %s" "$i" - else - echo "* timing run $i/$GIT_PERF_REPEAT_COUNT:" - fi + base="$perf_results_dir"/"$perf_results_prefix$(basename "$0" .sh)"."$test_count" + "$test_wrapper_func_" "$@" + fi + + test_finish_ +} + +test_perf_ () { + if test -z "$verbose"; then + printf "%s" "perf $test_count - $1:" + else + echo "perf $test_count - $1:" + fi + for i in $(test_seq 1 $GIT_PERF_REPEAT_COUNT); do + say >&3 "running: $2" + if test_run_perf_ "$2" + then + if test -z "$verbose"; then + printf " %s" "$i" else - test -z "$verbose" && echo - test_failure_ "$@" - break + echo "* timing run $i/$GIT_PERF_REPEAT_COUNT:" fi - done - if test -z "$verbose"; then - echo " ok" else - test_ok_ "$1" + test -z "$verbose" && echo + test_failure_ "$@" + break fi - base="$perf_results_dir"/"$perf_results_prefix$(basename "$0" .sh)"."$test_count" - "$TEST_DIRECTORY"/perf/min_time.perl test_time.* >"$base".times + done + if test -z "$verbose"; then + echo " ok" + else + test_ok_ "$1" fi - test_finish_ + "$TEST_DIRECTORY"/perf/min_time.perl test_time.* >"$base".times +} + +test_perf () { + test_wrapper_ test_perf_ "$@" } # We extend test_done to print timings at the end (./run disables this -- 2.18.0.1205.g3878b1e64a
[PATCH 2/6] t/perf: factor out percent calculations
This will let us reuse the code when we add new values to aggregate besides times. Signed-off-by: Jeff King --- t/perf/aggregate.perl | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl index bc865160e7..3181b087ab 100755 --- a/t/perf/aggregate.perl +++ b/t/perf/aggregate.perl @@ -19,21 +19,24 @@ sub get_times { return ($rt, $4, $5); } +sub relative_change { + my ($r, $firstr) = @_; + if ($firstr > 0) { + return sprintf "%+.1f%%", 100.0*($r-$firstr)/$firstr; + } elsif ($r == 0) { + return "="; + } else { + return "+inf"; + } +} + sub format_times { my ($r, $u, $s, $firstr) = @_; if (!defined $r) { return ""; } my $out = sprintf "%.2f(%.2f+%.2f)", $r, $u, $s; - if (defined $firstr) { - if ($firstr > 0) { - $out .= sprintf " %+.1f%%", 100.0*($r-$firstr)/$firstr; - } elsif ($r == 0) { - $out .= " ="; - } else { - $out .= " +inf"; - } - } + $out .= ' ' . relative_change($r, $firstr) if defined $firstr; return $out; } -- 2.18.0.1205.g3878b1e64a
[PATCH 3/6] t/perf: add infrastructure for measuring sizes
The main objective of scripts in the perf framework is to run "test_perf", which measures the time it takes to run some operation. However, it can also be interesting to see the change in the output size of certain operations. This patch introduces test_size, which records a single numeric output from the test and shows it in the aggregated output (with pretty printing and relative size comparison). Signed-off-by: Jeff King --- t/perf/README | 25 ++ t/perf/aggregate.perl | 48 ++- t/perf/perf-lib.sh| 13 3 files changed, 81 insertions(+), 5 deletions(-) diff --git a/t/perf/README b/t/perf/README index 21321a0f36..be12090c38 100644 --- a/t/perf/README +++ b/t/perf/README @@ -168,3 +168,28 @@ that While we have tried to make sure that it can cope with embedded whitespace and other special characters, it will not work with multi-line data. + +Rather than tracking the performance by run-time as `test_perf` does, you +may also track output size by using `test_size`. The stdout of the +function should be a single numeric value, which will be captured and +shown in the aggregated output. For example: + + test_perf 'time foo' ' + ./foo >foo.out + ' + + test_size 'output size' + wc -c ; return undef if not defined $line; close $fh or die "cannot close $name: $!"; - $line =~ /^(?:(\d+):)?(\d+):(\d+(?:\.\d+)?) (\d+(?:\.\d+)?) (\d+(?:\.\d+)?)$/ - or die "bad input line: $line"; - my $rt = ((defined $1 ? $1 : 0.0)*60+$2)*60+$3; - return ($rt, $4, $5); + # times + if ($line =~ /^(?:(\d+):)?(\d+):(\d+(?:\.\d+)?) (\d+(?:\.\d+)?) (\d+(?:\.\d+)?)$/) { + my $rt = ((defined $1 ? $1 : 0.0)*60+$2)*60+$3; + return ($rt, $4, $5); + # size + } elsif ($line =~ /^\d+$/) { + return $&; + } else { + die "bad input line: $line"; + } } sub relative_change { @@ -32,9 +38,15 @@ sub relative_change { sub format_times { my ($r, $u, $s, $firstr) = @_; + # no value means we did not finish the test if (!defined $r) { return ""; } + # a single value means we have a size, not times + if (!defined $u) { + return format_size($r, $firstr); + } + # otherwise, we have real/user/system times my $out = sprintf "%.2f(%.2f+%.2f)", $r, $u, $s; $out .= ' ' . relative_change($r, $firstr) if defined $firstr; return $out; @@ -54,6 +66,25 @@ sub usage { exit(1); } +sub human_size { + my $n = shift; + my @units = ('', qw(K M G)); + while ($n > 900 && @units > 1) { + $n /= 1000; + shift @units; + } + return $n unless length $units[0]; + return sprintf '%.1f%s', $n, $units[0]; +} + +sub format_size { + my ($size, $first) = @_; + # match the width of a time: 0.00(0.00+0.00) + my $out = sprintf '%15s', human_size($size); + $out .= ' ' . 
relative_change($size, $first) if defined $first; + return $out; +} + my (@dirs, %dirnames, %dirabbrevs, %prefixes, @tests, $codespeed, $sortby, $subsection, $reponame); @@ -184,7 +215,14 @@ sub print_default_results { my $firstr; for my $i (0..$#dirs) { my $d = $dirs[$i]; - $times{$prefixes{$d}.$t} = [get_times("$resultsdir/$prefixes{$d}$t.times")]; + my $base = "$resultsdir/$prefixes{$d}$t"; + $times{$prefixes{$d}.$t} = []; + foreach my $type (qw(times size)) { + if (-e "$base.$type") { + $times{$prefixes{$d}.$t} = [get_times("$base.$type")]; + last; + } + } my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}}; my $w = length format_times($r,$u,$s,$firstr); $colwidth[$i] = $w if $w > $colwidth[$i]; diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh index a54be09516..11d1922cf5 100644 --- a/t/perf/perf-lib.sh +++ b/t/perf/perf-lib.sh @@ -231,6 +231,19 @@ test_perf () { test_wrapper_ test_perf_ "$@" } +test_size_ () { + say >&3 "running: $2" + if test_eval_ "$2" 3>"$base".size; then + test_ok_ "$1" + else + test_failure_ "$@" + fi +} + +test_size () { + test_wrapper_ test_size_ "$@" +} + # We extend test_done to print timings at the end (./run disables this # and does it after running everything) test_at_end_hook_ () { -- 2.18.0.1205.g3878b1e64a
[PATCH 4/6] t/perf: add perf tests for fetches from a bitmapped server
A server with bitmapped packs can serve a clone very quickly. However, fetches are not necessarily made any faster, because we spend a lot less time in object traversal (which is what bitmaps help with) and more time finding deltas (because we may have to throw out on-disk deltas if the client does not have the base). As a first step to making this faster, this patch introduces a new perf script to measure fetches into a repo of various ages from a fully-bitmapped server. We separately measure the work done by the server (in pack-objects) and that done by the client (in index-pack). Furthermore, we measure the size of the resulting pack. Breaking it down like this (instead of just doing a regular "git fetch") lets us see how much each side benefits from any changes. And since we know the pack size, if we estimate the network speed, then one could calculate a complete wall-clock time for the operation (though the script does not do this automatically). Signed-off-by: Jeff King --- t/perf/p5311-pack-bitmaps-fetch.sh | 45 ++ 1 file changed, 45 insertions(+) create mode 100755 t/perf/p5311-pack-bitmaps-fetch.sh diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh new file mode 100755 index 00..b04575951f --- /dev/null +++ b/t/perf/p5311-pack-bitmaps-fetch.sh @@ -0,0 +1,45 @@ +#!/bin/sh + +test_description='performance of fetches from bitmapped packs' +. ./perf-lib.sh + +test_perf_default_repo + +test_expect_success 'create bitmapped server repo' ' + git config pack.writebitmaps true && + git config pack.writebitmaphashcache true && + git repack -ad +' + +# simulate a fetch from a repository that last fetched N days ago, for +# various values of N. We do so by following the first-parent chain, +# and assume the first entry in the chain that is N days older than the current +# HEAD is where the HEAD would have been then. +for days in 1 2 4 8 16 32 64 128; do + title=$(printf '%10s' "($days days)") + test_expect_success "setup revs from $days days ago" ' + now=$(git log -1 --format=%ct HEAD) && + then=$(($now - ($days * 86400))) && + tip=$(git rev-list -1 --first-parent --until=$then HEAD) && + { + echo HEAD && + echo ^$tip + } >revs + ' + + test_perf "server $title" ' + git pack-objects --stdout --revs \ +--thin --delta-base-offset \ +tmp.pack + ' + + test_size "size $title" ' + wc -c
[PATCH 5/6] pack-bitmap: save "have" bitmap from walk
When we do a bitmap walk, we save the result, which represents (WANTs & ~HAVEs); i.e., every object we care about visiting in our walk. However, we throw away the haves bitmap, which can sometimes be useful, too. Save it and provide an access function so code which has performed a walk can query it. A few notes on the accessor interface: - the bitmap code calls these "haves" because it grew out of the want/have negotiation for fetches. But really, these are simply the objects that would be flagged UNINTERESTING in a regular traversal. Let's use that more universal nomenclature for the external module interface. We may want to change the internal naming inside the bitmap code, but that's outside the scope of this patch. - it still uses a bare "sha1" rather than "oid". That's true of all of the bitmap code. And in this particular instance, our caller in pack-objects is dealing with the bare sha1 that comes from a packed REF_DELTA (we're pointing directly to the mmap'd pack on disk). That's something we'll have to deal with as we transition to a new hash, but we can wait and see how the caller ends up being fixed and adjust this interface accordingly. Signed-off-by: Jeff King --- Funny story: the earlier version of this series called it bitmap_have(). That caused a bug later when somebody tried to build on it, thinking it was "does the bitmap have this object in the result". Oops. Hence the more descriptive name. pack-bitmap.c | 23 ++- pack-bitmap.h | 7 +++ 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index f0a1937a1c..76fd93a3de 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -86,6 +86,9 @@ struct bitmap_index { /* Bitmap result of the last performed walk */ struct bitmap *result; + /* "have" bitmap from the last performed walk */ + struct bitmap *haves; + /* Version of the bitmap index */ unsigned int version; @@ -759,8 +762,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) bitmap_and_not(wants_bitmap, haves_bitmap); bitmap_git->result = wants_bitmap; + bitmap_git->haves = haves_bitmap; - bitmap_free(haves_bitmap); return bitmap_git; cleanup: @@ -1114,5 +1117,23 @@ void free_bitmap_index(struct bitmap_index *b) free(b->ext_index.objects); free(b->ext_index.hashes); bitmap_free(b->result); + bitmap_free(b->haves); free(b); } + +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git, +const unsigned char *sha1) +{ + int pos; + + if (!bitmap_git) + return 0; /* no bitmap loaded */ + if (!bitmap_git->haves) + return 0; /* walk had no "haves" */ + + pos = bitmap_position_packfile(bitmap_git, sha1); + if (pos < 0) + return 0; + + return bitmap_get(bitmap_git->haves, pos); +} diff --git a/pack-bitmap.h b/pack-bitmap.h index 4555907dee..02a60ce670 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping khash_sha1 *reused_bitmaps, int show_progress); void free_bitmap_index(struct bitmap_index *); +/* + * After a traversal has been performed on the bitmap_index, this can be + * queried to see if a particular object was reachable from any of the + * objects flagged as UNINTERESTING. + */ +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned char *sha1); + void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, -- 2.18.0.1205.g3878b1e64a
[PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects
When we serve a fetch, we pass the "wants" and "haves" from the fetch negotiation to pack-objects. That tells us not only which objects we need to send, but we also use the boundary commits as "preferred bases": their trees and blobs are candidates for delta bases, both for reusing on-disk deltas and for finding new ones.

However, this misses some opportunities. Modulo some special cases like shallow or partial clones, we know that every object reachable from the "haves" could be a preferred base. We don't use them all for two reasons:

  1. It's expensive to traverse the whole history and enumerate all of the objects the other side has.

  2. The delta search is expensive, so we want to keep the number of candidate bases sane. The boundary commits are the most likely to work.

When we have reachability bitmaps, though, reason 1 no longer applies. We can efficiently compute the set of reachable objects on the other side (and in fact already did so as part of the bitmap set-difference to get the list of interesting objects). And using this set conveniently covers the shallow and partial cases, since we have to disable the use of bitmaps for those anyway.

The second reason argues against using these bases in the search for new deltas. But there's one case where we can use this information for free: when we have an existing on-disk delta that we're considering reusing, we can do so if we know the other side has the base object. This in fact saves time during the delta search, because it's one less delta we have to compute.

And that's exactly what this patch does: when we're considering whether to reuse an on-disk delta, if bitmaps tell us the other side has the object (and we're making a thin-pack), then we reuse it.

Here are the results on p5311 using linux.git, which simulates a client fetching after `N` days since their last fetch:

  Test                          origin              HEAD
  -------------------------------------------------------------------------
  5311.3:  server   (1 days)     0.27(0.27+0.04)     0.12(0.09+0.03)  -55.6%
  5311.4:  size     (1 days)                0.9M              237.0K  -73.7%
  5311.5:  client   (1 days)     0.04(0.05+0.00)     0.10(0.10+0.00) +150.0%
  5311.7:  server   (2 days)     0.34(0.42+0.04)     0.13(0.10+0.03)  -61.8%
  5311.8:  size     (2 days)                1.5M              347.7K  -76.5%
  5311.9:  client   (2 days)     0.07(0.08+0.00)     0.16(0.15+0.01) +128.6%
  5311.11: server   (4 days)     0.56(0.77+0.08)     0.13(0.10+0.02)  -76.8%
  5311.12: size     (4 days)                2.8M              566.6K  -79.8%
  5311.13: client   (4 days)     0.13(0.15+0.00)     0.34(0.31+0.02) +161.5%
  5311.15: server   (8 days)     0.97(1.39+0.11)     0.30(0.25+0.05)  -69.1%
  5311.16: size     (8 days)                4.3M                1.0M  -76.0%
  5311.17: client   (8 days)     0.20(0.22+0.01)     0.53(0.52+0.01) +165.0%
  5311.19: server  (16 days)     1.52(2.51+0.12)     0.30(0.26+0.03)  -80.3%
  5311.20: size    (16 days)                8.0M                2.0M  -74.5%
  5311.21: client  (16 days)     0.40(0.47+0.03)     1.01(0.98+0.04) +152.5%
  5311.23: server  (32 days)     2.40(4.44+0.20)     0.31(0.26+0.04)  -87.1%
  5311.24: size    (32 days)               14.1M                4.1M  -70.9%
  5311.25: client  (32 days)     0.70(0.90+0.03)     1.81(1.75+0.06) +158.6%
  5311.27: server  (64 days)    11.76(26.57+0.29)    0.55(0.50+0.08)  -95.3%
  5311.28: size    (64 days)               89.4M               47.4M  -47.0%
  5311.29: client  (64 days)     5.71(9.31+0.27)    15.20(15.20+0.32) +166.2%
  5311.31: server (128 days)    16.15(36.87+0.40)    0.91(0.82+0.14)  -94.4%
  5311.32: size   (128 days)              134.8M              100.4M  -25.5%
  5311.33: client (128 days)     9.42(16.86+0.49)   25.34(25.80+0.46) +169.0%

In all cases we save CPU time on the server (sometimes significant) and the resulting pack is smaller. We do spend more CPU time on the client side, because it has to reconstruct more deltas. But that's the right tradeoff to make, since clients tend to outnumber servers.
It just means the thin pack mechanism is doing its job. From the user's perspective, the end-to-end time of the operation will generally be faster. E.g., in the 128-day case, we saved 15s on the server at a cost of 16s on the client. Since the resulting pack is 34MB smaller, this is a net win if the network speed is less than 270Mbit/s. And that's actually the worst case. The 64-day case saves just over 11s at a cost of just under 11s. So it's a slight win at any network speed, and the 40MB saved is pure bonus. That trend continues for the smaller fetches.

The implementation itself is mostly straightforward, with the new logic going into check_object(). But there are two tricky bits.

The first is that check_object() needs access to the relevant information (the thin flag and bitmap result). We can do this by pushing these into program-lifetime globals.

The second is that the rest of the code assumes that any reused delta will point to another "struct object_entry" as its base. But by definition, we don't have such an entry!
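The 270Mbit/s break-even figure follows directly from the rounded numbers in the paragraph above; here is a standalone sanity check of that arithmetic (illustrative only, not git code):

#include <stdio.h>

int main(void)
{
        double server_saved = 15.0;  /* seconds saved in pack-objects */
        double client_cost = 16.0;   /* extra seconds spent in index-pack */
        double bytes_saved = 34e6;   /* the pack is ~34MB smaller */

        /* the smaller pack must recoup roughly one second of extra CPU */
        double net_cpu_cost = client_cost - server_saved;
        double break_even = bytes_saved / net_cpu_cost;  /* bytes per second */

        printf("break-even link speed: ~%.0f Mbit/s\n", break_even * 8 / 1e6);
        return 0;
}

Anything slower than that, and the smaller pack wins outright.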
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore wrote: > > Teach list-objects the "tree:0" filter which allows for filtering > out all tree and blob objects (unless other objects are explicitly > specified by the user). The purpose of this patch is to allow smaller > partial clones. > > The name of this filter - tree:0 - does not explicitly specify that > it also filters out all blobs, but this should not cause much confusion > because blobs are not at all useful without the trees that refer to > them. > > I also consider only:commits as a name, but this is inaccurate because > it suggests that annotated tags are omitted, but actually they are > included. Speaking of tag objects, it is possible to tag anything, including blobs. Would a blob that is tagged (hence reachable without a tree) be not filtered by tree:0 (or in the future any deeper depth) ? I found this series a good read, despite my unfamiliarity of the partial cloning. One situation where I scratched my head for a second were previous patches that use "test_line_count = 0 rev_list_err" whereas using test_must_be_empty would be an equally good choice (I am more used to the latter than the former) Thanks, Stefan
Re: [PATCH 0/3] Better colors in range-diff
Stefan Beller writes: > This improves colors of the range-diff, see last patch for details. How does this relate to your other "color with range-diff" topic that is still in flight? This supersedes it? Builds on it? Something else? Thanks.
git-bug: Distributed bug tracker embedded in git
Hi everyone, today I released git-bug, a distributed bug tracker that embeds in git. It uses git's internal storage to store bug information in a way that can be merged without conflict. You can push/pull to the normal git remote you are already using to interact with other people. Normal code and bugs are completely separated and no files are added in the regular branches. Someone suggested in the Hacker News thread [0] to post it here as well. The project is here [1]. It's an all-in-one binary that is picked up by git as a porcelain command. It features a set of CLI commands for simple interaction, an interactive terminal UI and a rich web UI. For more information about the internal design, please read this document [2]. In short, bugs are stored as a series of edit operations stored in git blobs and assembled in a linear chain of commits. This allows conflict-free merges and keeps the regular branches free of bug data. Media embedding is also possible but not yet finished. I'd love to have some feedback from you. Contributions are also very much welcome. Best regards, [0]: https://news.ycombinator.com/item?id=17782121 [1]: https://github.com/MichaelMure/git-bug [2]: https://github.com/MichaelMure/git-bug/blob/master/doc/model.md -- Michael Muré
Re: [PATCH 0/3] Better colors in range-diff
On Fri, Aug 17, 2018 at 3:04 PM Junio C Hamano wrote: > > Stefan Beller writes: > > > This improves colors of the range-diff, see last patch for details. > > How does this relate to your other "color with range-diff" topic > that is still in flight? This supersedes it? Builds on it? > Something else? It builds on top. Sorry about missing the obvious, Stefan
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
On Fri, Aug 17, 2018 at 2:42 PM Stefan Beller wrote: > > On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore wrote: > > > > Teach list-objects the "tree:0" filter which allows for filtering > > out all tree and blob objects (unless other objects are explicitly > > specified by the user). The purpose of this patch is to allow smaller > > partial clones. > > > > The name of this filter - tree:0 - does not explicitly specify that > > it also filters out all blobs, but this should not cause much confusion > > because blobs are not at all useful without the trees that refer to > > them. > > > > I also consider only:commits as a name, but this is inaccurate because > > it suggests that annotated tags are omitted, but actually they are > > included. > > Speaking of tag objects, it is possible to tag anything, including blobs. > Would a blob that is tagged (hence reachable without a tree) be not > filtered by tree:0 (or in the future any deeper depth) ? I think so. If I try to fetch a tagged tree or blob, it should fetch that object itself, since I'm referring to it explicitly in the git pack-objects arguments (I mention fetch since git rev-list apparently doesn't support specifying non-commits on the command line). This is similar to how I can fetch a commit that would otherwise be filtered *if* I specify it explicitly (rather than a child commit). If you're fetching a tagged tree, then for depth=0, it will fetch the given tree only, and not fetch any referents of an explicitly-given tree. For depth=1, it will fetch all direct referents. If you're fetching a commit, then for depth=0, you will not get any tree objects, and for depth=1, you'll get only the root tree object and none of its referrents. So the commit itself is like a "layer" in the depth count. > > I found this series a good read, despite my unfamiliarity of the > partial cloning. > > One situation where I scratched my head for a second were previous patches > that use "test_line_count = 0 rev_list_err" whereas using test_must_be_empty > would be an equally good choice (I am more used to the latter than the former) Done. Here is an interdiff (sorry, the tab characters are not maintained by my mail client): diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index a1b93c72c..7e2f7ff26 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -200,14 +200,14 @@ test_expect_success 'missing tree objects with --missing=allow-promisor and --ex git -C repo config extensions.partialclone "arbitrary string" && git -C repo rev-list --missing=allow-promisor --objects HEAD >objs 2>rev_list_err && - test_line_count = 0 rev_list_err && + test_must_be_empty rev_list_err && # 3 commits, 3 blobs, and 1 tree test_line_count = 7 objs && # Do the same for --exclude-promisor-objects, but with all trees gone. 
promise_and_delete $(git -C repo rev-parse baz^{tree}) && git -C repo rev-list --exclude-promisor-objects --objects HEAD >objs 2>rev_list_err && - test_line_count = 0 rev_list_err && + test_must_be_empty rev_list_err && # 3 commits, no blobs or trees test_line_count = 3 objs ' @@ -226,7 +226,7 @@ test_expect_success 'missing non-root tree object and rev-list' ' git -C repo config extensions.partialclone "arbitrary string" && git -C repo rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err && - test_line_count = 0 rev_list_err && + test_must_be_empty rev_list_err && # 1 commit and 1 tree test_line_count = 2 objs ' diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh index 30bf1c73e..27040d73a 100755 --- a/t/t6112-rev-list-filters-objects.sh +++ b/t/t6112-rev-list-filters-objects.sh @@ -206,11 +206,11 @@ test_expect_success 'rev-list W/ --missing=print and --missing=allow-any for tre test_cmp expected missing_objs && # do not complain when a missing tree cannot be parsed - test_line_count = 0 rev_list_err && + test_must_be_empty rev_list_err && git -C r3 rev-list --missing=allow-any --objects HEAD >objs 2>rev_list_err && ! grep $TREE objs && - test_line_count = 0 rev_list_err + test_must_be_empty rev_list_err ' # Test tree:0 filter. > > Thanks, > Stefan
Re: [PATCH v6 6/6] list-objects-filter: implement filter tree:0
On Fri, Aug 17, 2018 at 3:20 PM Matthew DeVore wrote: > > On Fri, Aug 17, 2018 at 2:42 PM Stefan Beller wrote: > > > > On Wed, Aug 15, 2018 at 4:23 PM Matthew DeVore wrote: > > > > > > Teach list-objects the "tree:0" filter which allows for filtering > > > out all tree and blob objects (unless other objects are explicitly > > > specified by the user). The purpose of this patch is to allow smaller > > > partial clones. > > > > > > The name of this filter - tree:0 - does not explicitly specify that > > > it also filters out all blobs, but this should not cause much confusion > > > because blobs are not at all useful without the trees that refer to > > > them. > > > > > > I also consider only:commits as a name, but this is inaccurate because > > > it suggests that annotated tags are omitted, but actually they are > > > included. > > > > Speaking of tag objects, it is possible to tag anything, including blobs. > > Would a blob that is tagged (hence reachable without a tree) be not > > filtered by tree:0 (or in the future any deeper depth) ? > I think so. If I try to fetch a tagged tree or blob, it should fetch > that object itself, since I'm referring to it explicitly in the git > pack-objects arguments (I mention fetch since git rev-list apparently > doesn't support specifying non-commits on the command line). This is > similar to how I can fetch a commit that would otherwise be filtered > *if* I specify it explicitly (rather than a child commit). > > If you're fetching a tagged tree, then for depth=0, it will fetch the > given tree only, and not fetch any referents of an explicitly-given > tree. For depth=1, it will fetch all direct referents. > > If you're fetching a commit, then for depth=0, you will not get any > tree objects, and for depth=1, you'll get only the root tree object > and none of its referrents. So the commit itself is like a "layer" in > the depth count. That seems smart. Thanks! > > > > > I found this series a good read, despite my unfamiliarity of the > > partial cloning. > > > > One situation where I scratched my head for a second were previous patches > > that use "test_line_count = 0 rev_list_err" whereas using > > test_must_be_empty > > would be an equally good choice (I am more used to the latter than the > > former) > > Done. Here is an interdiff (sorry, the tab characters are not > maintained by my mail client): heh. Thanks for switching the style; I should have emphasized that (after reflection) I found them equally good, I am used to one over the other more. So if that is the only issue brought up, I would not even ask for a resend. Thanks, Stefan
Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk
> diff --git a/pack-bitmap.h b/pack-bitmap.h > index 4555907dee..02a60ce670 100644 > --- a/pack-bitmap.h > +++ b/pack-bitmap.h > @@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct > packing_data *mapping > khash_sha1 *reused_bitmaps, int show_progress); > void free_bitmap_index(struct bitmap_index *); > > +/* > + * After a traversal has been performed on the bitmap_index, this can be > + * queried to see if a particular object was reachable from any of the > + * objects flagged as UNINTERESTING. If the traversal has not been performed, we pretend the object was not reachable? Is this a good API design, as it can be used when you do not have done all preparations? similarly to prepare_bitmap_walk we could have if (!bitmap_git->result) BUG("failed to perform bitmap walk before querying"); > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned > char *sha1); You seem to have rebased it to master resolving conflicts only. ;-) Do we want to talk about object ids here instead? (This is what I get to think about when reviewing this series "bottom up". I use "git log -w -p master..HEAD" after applying the patches, probably I should also use --reverse, such that I get to see the commit message before the code for each commit and yet only need to scroll in one direction.)
What's cooking in git.git (Aug 2018, #04; Fri, 17)
Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'. The ones marked with '.' do not appear in any of the integration branches, but I am still holding onto them. Quite a many topics have graduated to 'master', and also a handful of topics have entered 'next'. I am planning to tag -rc0 over the weekend, and some topics that are in 'next' and marked for 'master' in this issue of "What's cooking" report may be reclassified to cook in 'next' during the pre-release period when that happens. Usually, I refrain from merging larger topics in 'next' down to 'master' when we get close to -rc0, but I am wondering if it is better to merge all of them to 'master', even the ones on the larger and possibly undercooked side, expecting that we collectively spend effort on hunting and fixing bugs in them during the pre-release freeze period. If we were to go that route, I'd want everybody's buy-in and I'll promise to ignore any shiny new toys that appear on list that are not regression fixes to topics merged to 'master' since the end of the previous cycle to make sure people are not distracted. You can find the changes described here in the integration branches of the repositories listed at http://git-blame.blogspot.com/p/git-public-repositories.html -- [Graduated to "master"] * ab/fetch-nego (2018-08-01) 3 commits (merged to 'next' on 2018-08-08 at 87662bb344) + fetch doc: cross-link two new negotiation options + negotiator: unknown fetch.negotiationAlgorithm should error out + Merge branch 'jt/fetch-nego-tip' into ab/fetch-nego Update to a few other topics around 'git fetch'. * ab/fsck-transfer-updates (2018-07-27) 10 commits (merged to 'next' on 2018-08-08 at d92085269f) + fsck: test and document unknown fsck. values + fsck: add stress tests for fsck.skipList + fsck: test & document {fetch,receive}.fsck.* config fallback + fetch: implement fetch.fsck.* + transfer.fsckObjects tests: untangle confusing setup + config doc: elaborate on fetch.fsckObjects security + config doc: elaborate on what transfer.fsckObjects does + config doc: unify the description of fsck.* and receive.fsck.* + config doc: don't describe *.fetchObjects twice + receive.fsck. tests: remove dead code The test performed at the receiving end of "git push" to prevent bad objects from entering repository can be customized via receive.fsck.* configuration variables; we now have gained a counterpart to do the same on the "git fetch" side, with fetch.fsck.* configuration variables. * ab/sha1dc (2018-08-02) 1 commit (merged to 'next' on 2018-08-08 at 920c190941) + sha1dc: update from upstream AIX portability update for the SHA1DC hash, imported from upstream. * ab/test-must-be-empty (2018-07-30) 1 commit (merged to 'next' on 2018-08-08 at 06599ebd1f) + tests: make use of the test_must_be_empty function Test updates. * ar/t4150-am-scissors-test-fix (2018-08-06) 1 commit (merged to 'next' on 2018-08-08 at e639183205) + t4150: fix broken test for am --scissors Test fix. * en/abort-df-conflict-fixes (2018-07-31) 2 commits (merged to 'next' on 2018-08-08 at a19cad0bb7) + read-cache: fix directory/file conflict handling in read_index_unmerged() + t1015: demonstrate directory/file conflict recovery failures "git merge --abort" etc. did not clean things up properly when there were conflicted entries in the index in certain order that are involved in D/F conflicts. This has been corrected. 
* en/t3031-title-fix (2018-08-06) 1 commit (merged to 'next' on 2018-08-08 at 3913b03884) + t3031: update test description to mention desired behavior Test fix. * es/rebase-i-author-script-fix (2018-07-31) 4 commits (merged to 'next' on 2018-08-08 at 6b34261b72) + sequencer: don't die() on bogus user-edited timestamp + sequencer: fix "rebase -i --root" corrupting author header timestamp + sequencer: fix "rebase -i --root" corrupting author header timezone + sequencer: fix "rebase -i --root" corrupting author header (this branch is used by pw/rebase-i-author-script-fix.) The "author-script" file "git rebase -i" creates got broken when we started to move the command away from shell script, which is getting fixed now. * es/want-color-fd-defensive (2018-08-03) 1 commit (merged to 'next' on 2018-08-08 at a11d90d26f) + color: protect against out-of-bounds reads and writes Futureproofing a helper function that can easily be misused. * hn/config-in-code-comment (2018-08-06) 1 commit (merged to 'next' on 2018-08-08 at 1fae946a0f) + config: document git config getter return value Header update. * jk/diff-rendered-docs (2018-08-06) 1 commit (merged to 'next' on 2018-08-08 at fe6e1b4dbe) + add a script to diff rendered documentation The end result of documentation update has been made to be inspected more easily to help developers. * jk
Re: [PATCH 5/6] pack-bitmap: save "have" bitmap from walk
On Fri, Aug 17, 2018 at 03:39:29PM -0700, Stefan Beller wrote: > > diff --git a/pack-bitmap.h b/pack-bitmap.h > > index 4555907dee..02a60ce670 100644 > > --- a/pack-bitmap.h > > +++ b/pack-bitmap.h > > @@ -50,6 +50,13 @@ int rebuild_existing_bitmaps(struct bitmap_index *, > > struct packing_data *mapping > > khash_sha1 *reused_bitmaps, int show_progress); > > void free_bitmap_index(struct bitmap_index *); > > > > +/* > > + * After a traversal has been performed on the bitmap_index, this can be > > + * queried to see if a particular object was reachable from any of the > > + * objects flagged as UNINTERESTING. > > If the traversal has not been performed, we pretend the > object was not reachable? If the traversal hasn't been performed, the results are not defined (though I suspect yeah, it happens to say "no"). > Is this a good API design, as it can be used when you do not > have done all preparations? similarly to prepare_bitmap_walk > we could have > > if (!bitmap_git->result) > BUG("failed to perform bitmap walk before querying"); That seems like a reasonable precaution. > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *, const unsigned > > char *sha1); > > You seem to have rebased it to master resolving conflicts only. ;-) > Do we want to talk about object ids here instead? See the discussion in the commit message. -Peff
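For concreteness, folding the suggested precaution into the accessor from patch 5/6 would presumably look something like this (a sketch against the code quoted earlier, not necessarily the final version):

int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
                                     const unsigned char *sha1)
{
        int pos;

        if (!bitmap_git)
                return 0; /* no bitmap loaded */
        if (!bitmap_git->result)
                BUG("failed to perform bitmap walk before querying");
        if (!bitmap_git->haves)
                return 0; /* walk had no "haves" */

        pos = bitmap_position_packfile(bitmap_git, sha1);
        if (pos < 0)
                return 0;

        return bitmap_get(bitmap_git->haves, pos);
}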
Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects
On Fri, Aug 17, 2018 at 2:06 PM Jeff King wrote: > > When we serve a fetch, we pass the "wants" and "haves" from > the fetch negotiation to pack-objects. That tells us not > only which objects we need to send, but we also use the > boundary commits as "preferred bases": their trees and blobs > are candidates for delta bases, both for reusing on-disk > deltas and for finding new ones. > > However, this misses some opportunities. Modulo some special > cases like shallow or partial clones, we know that every > object reachable from the "haves" could be a preferred base. > We don't use them all for two reasons: s/all/at all/ ? > The first is that check_object() needs access to the > relevant information (the thin flag and bitmap result). We > can do this by pushing these into program-lifetime globals. I discussed internally if extending the fetch protocol to include submodule packs would be a good idea, as then you can get all the superproject+submodule updates via one connection. This gives some benefits, such as a more consistent view from the superproject as well as already knowing the have/wants for the submodule. With this background story, moving things into globals makes me sad, but I guess we can flip this decision once we actually move towards "submodule packs in the main connection". > > The second is that the rest of the code assumes that any > reused delta will point to another "struct object_entry" as > its base. But by definition, we don't have such an entry! I got lost here by the definition (which def?). The delta that we look up from the bitmap, doesn't may not be in the pack, but it could be based off of an object the client already has in its object store and for that there is no struct object_entry in memory. Is that correct? > So taking all of those options into account, what I ended up > with is a separate list of "external bases" that are not > part of the main packing list. Each delta entry that points > to an external base has a single-bit flag to do so; we have a > little breathing room in the bitfield section of > object_entry. > > This lets us limit the change primarily to the oe_delta() > and oe_set_delta_ext() functions. And as a bonus, most of > the rest of the code does not consider these dummy entries > at all, saving both runtime CPU and code complexity. Thanks, Stefan
Re: git-bug: Distributed bug tracker embedded in git
I really like this idea. I've often wanted an integrated bug database like this. My solution has always been to have a subrepo storing bug reports and comments in .txt files and then using bash porcelain scripts to make a git-like interface. I think I like this better. My only nit is Go. That makes me sad. Someone should re-implement this in C or Rust. //t??
Re: [PATCH 6/6] pack-objects: reuse on-disk deltas for thin "have" objects
On Fri, Aug 17, 2018 at 03:57:18PM -0700, Stefan Beller wrote: > On Fri, Aug 17, 2018 at 2:06 PM Jeff King wrote: > > > > When we serve a fetch, we pass the "wants" and "haves" from > > the fetch negotiation to pack-objects. That tells us not > > only which objects we need to send, but we also use the > > boundary commits as "preferred bases": their trees and blobs > > are candidates for delta bases, both for reusing on-disk > > deltas and for finding new ones. > > > > However, this misses some opportunities. Modulo some special > > cases like shallow or partial clones, we know that every > > object reachable from the "haves" could be a preferred base. > > We don't use them all for two reasons: > > s/all/at all/ ? No, I meant "we don't use all of them". As in the Pokemon "gotta catch 'em all" slogan. ;) Probably writing out "all of them" is better. > > The first is that check_object() needs access to the > > relevant information (the thin flag and bitmap result). We > > can do this by pushing these into program-lifetime globals. > > I discussed internally if extending the fetch protocol to include > submodule packs would be a good idea, as then you can get all > the superproject+submodule updates via one connection. This > gives some benefits, such as a more consistent view from the > superproject as well as already knowing the have/wants for > the submodule. > > With this background story, moving things into globals > makes me sad, but I guess we can flip this decision once > we actually move towards "submodule packs in the > main connection". I don't think it significantly changes the existing code, which is already relying on a ton of globals (most notably to_pack). The first step in doing multiple packs in the same process is going to be to shove all of that into a "struct pack_objects_context" or similar, and these can just follow the rest. > > > > The second is that the rest of the code assumes that any > > reused delta will point to another "struct object_entry" as > > its base. But by definition, we don't have such an entry! > > I got lost here by the definition (which def?). > > The delta that we look up from the bitmap, doesn't may > not be in the pack, but it could be based off of an object > the client already has in its object store and for that > there is no struct object_entry in memory. > > Is that correct? Right, we are interested in objects that we _couldn't_ find a struct for. I agree this could be more clear. -Peff
[PATCH] t2024: mark test using "checkout -p" with PERL prerequisite
checkout with the -p switch uses the "add interactive" framework which is written in Perl. Add a PERL prerequisite to skip this test when built with NO_PERL. Signed-off-by: CB Bailey --- t/t2024-checkout-dwim.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t2024-checkout-dwim.sh b/t/t2024-checkout-dwim.sh index f79dfbbdd6..0e512c3066 100755 --- a/t/t2024-checkout-dwim.sh +++ b/t/t2024-checkout-dwim.sh @@ -75,7 +75,7 @@ test_expect_success 'checkout of branch from multiple remotes fails #1' ' test_branch master ' -test_expect_success 'checkout of branch from multiple remotes fails with advice' ' +test_expect_success PERL 'checkout of branch from multiple remotes fails with advice' ' git checkout -B master && test_might_fail git branch -D foo && test_must_fail git checkout foo 2>stderr && -- 2.14.3 (Apple Git-98)
Re: [RFC/PATCH] drop vcs-svn experiment
Hi, Jeff King wrote: > The code in vcs-svn was started in 2010 as an attempt to > build a remote-helper for interacting with svn repositories > (as opposed to git-svn). However, we never got as far as > shipping a mature remote helper, and the last substantive > commit was e99d012a6bc in 2012. I do use svn-fe occasionally, and have done so in the past few years. That said, it's probably not worth keeping this in tree just for me. > We do have a git-remote-testsvn, and it is even installed as > part of "make install". At a minimum, we should stop doing that. [...] > We also ship contrib/svn-fe, which builds on the vcs-svn > work. However, it does not seem to build out of the box for > me, as the link step misses some required libraries for > using libgit.a. What libraries do you mean? It builds and runs fine for me with $ git diff diff --git i/contrib/svn-fe/Makefile w/contrib/svn-fe/Makefile index e8651aaf4b5..bd709f8d83b 100644 --- i/contrib/svn-fe/Makefile +++ w/contrib/svn-fe/Makefile @@ -4,7 +4,7 @@ CC = cc RM = rm -f MV = mv -CFLAGS = -g -O2 -Wall +CFLAGS = -g -O2 -Wall -pthread LDFLAGS = EXTLIBS = -lz which appears to be platform related, not due to some internal change in Git. [...] > Of course, I could be completely wrong about people using this. Maybe > svn-fe builds are just completely broken on my system, and maybe people > really do use testsvn::. But if so, they certainly aren't talking about > it on the mailing list. :) My take: - svn-fe works fine and has been useful to me, though its Makefile could likely be simplified and made more user-friendly - I've benefited from the test coverage of having this in-tree - testsvn:: is a demo and at a minimum we ought not to install it with "make install" - keeping this in-tree for the benefit of just one user is excessive, so removing it is probably the right thing - it would be nice if the commit removing this code from Git includes a note to help people find its new home Would you mind holding off until I'm able to arrange that last bit? Thanks, Jonathan
Re: git-bug: Distributed bug tracker embedded in git
Hi, Michael Muré wrote: > I released today git-bug, a distributed bug tracker that embeds in > git. It use git's internal storage to store bugs information in a way > that can be merged without conflict. You can push/pull to the normal > git remote you are already using to interact with other people. Normal > code and bugs are completely separated and no files are added in the > regular branches. I am a bit unhappy about the namespace grab. Not for trademark reasons: the Git trademark rules are pretty clear about this kind of usage being okay. Instead, the unhappiness comes because a future Git command like "git bug" to produce a bug report with appropriate diagnostics for a bug in Git seems like a likely and useful thing to get added to Git some day. And now the name's taken. Is it too late to ask if it's possible to come up with a less generic name? Separately from that, I'm happy to see progress being made in the distributed bug tracker world; thanks for that! Thanks, Jonathan
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
(-cc: my @google.com email) Hi, Junio C Hamano wrote: > Subject: sideband: do not read beyond the end of input > > The caller of maybe_colorize_sideband() gives a counted buffer > , but the callee checked *src as if it were a NUL terminated > buffer. If src[] had all isspace() bytes in it, we would have made > n negative, and then (1) called number of strncasecmp() to see if > the remaining bytes in src[] matched keywords, reading beyond the > end of the array, and/or (2) called strbuf_add() with negative > count, most likely triggering the "you want to use way too much > memory" error due to unsigned integer overflow. > > Signed-off-by: Junio C Hamano > --- > sideband.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) This indeed avoids the "you want to use way too much memory" error when I apply it. > --- a/sideband.c > +++ b/sideband.c > @@ -75,7 +75,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, > const char *src, int n) Not about this patch: should the 'n' parameter be a size_t instead of an int? It doesn't matter in practice (since the caller has an int, it can never be more than INT_MAX) but it might make the intent clearer. Based on inspecting the caller, using an int seems fine. > return; > } > > - while (isspace(*src)) { > + while (0 < n && isspace(*src)) { Yes, we need to check 'n && isspace(*src)' to avoid overflowing the buffer if it consists entirely of spaces. > strbuf_addch(dest, *src); > src++; > n--; > @@ -84,6 +84,9 @@ static void maybe_colorize_sideband(struct strbuf *dest, > const char *src, int n) > for (i = 0; i < ARRAY_SIZE(keywords); i++) { > struct keyword_entry *p = keywords + i; > int len = strlen(p->keyword); > + > + if (n <= len) > + continue; Using <= instead of < since we look at the character after the word as well. Good. > /* >* Match case insensitively, so we colorize output from existing >* servers regardless of the case that they use for their >* messages. We only highlight the word precisely, so >* "successful" stays uncolored. >*/ > if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) { Not about this patch: should this check "&& src[len] == ':'" instead, as discussed upthread? > @@ -100,8 +103,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, > const char *src, int n) > } > } > > - strbuf_add(dest, src, n); > + if (0 < n) > + strbuf_add(dest, src, n); This check seems unnecessary. strbuf_add can cope fine with !n. Should we put assert(n >= 0); or even if (n < 0) BUG(); instead, since the earlier part of the fix guarantees that n >= 0? Thanks for the careful work. With or without such a change, Reviewed-by: Jonathan Nieder Thanks.
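The "way too much memory" symptom described in the patch is the usual signed-to-unsigned trap: strbuf_add() takes a size_t count, so a negative int wraps around to an enormous value. A tiny standalone illustration (not git code):

#include <stdio.h>

int main(void)
{
        int n = -1;        /* what the old loop could leave behind */
        size_t len = n;    /* implicit conversion wraps around */

        /* prints 18446744073709551615 on a typical 64-bit system */
        printf("requested length: %zu\n", len);
        return 0;
}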
Re: What's cooking in git.git (Aug 2018, #04; Fri, 17)
On Sat, Aug 18, 2018 at 12:44 AM Junio C Hamano wrote: > * nd/unpack-trees-with-cache-tree (2018-08-13) 5 commits > - unpack-trees: reuse (still valid) cache-tree from src_index > - unpack-trees: reduce malloc in cache-tree walk > - unpack-trees: optimize walking same trees with cache-tree > - unpack-trees: add performance tracing > - trace.h: support nested performance tracing > > The unpack_trees() API used in checking out a branch and merging > walks one or more trees along with the index. When the cache-tree > in the index tells us that we are walking a tree whose flattened > contents is known (i.e. matches a span in the index), as linearly > scanning a span in the index is much more efficient than having to > open tree objects recursively and listing their entries, the walk > can be optimized, which is done in this topic. > > Will merge to and cook in 'next'. Please hold. I was going to address your "The last step feels a bit scary" comment by auditing and adding some cache-tree validation. I'm seeing some bad cache-tree when running more validation through the test suite. Maybe it's just bugs in my validation code, but better be safe than sorry. -- Duy
Re: Re* [PATCH v7 1/1] sideband: highlight keywords in remote sideband output
Junio C Hamano wrote: > Subject: sideband: do not read beyond the end of input > > The caller of maybe_colorize_sideband() gives a counted buffer > , but the callee checked *src as if it were a NUL terminated > buffer. If src[] had all isspace() bytes in it, we would have made > n negative, and then (1) called number of strncasecmp() to see if > the remaining bytes in src[] matched keywords, reading beyond the > end of the array, and/or (2) called strbuf_add() with negative > count, most likely triggering the "you want to use way too much > memory" error due to unsigned integer overflow. > > Signed-off-by: Junio C Hamano > --- > sideband.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) And here are some tests to squash in. A sideband line consisting entirely of spaces causes us to overflow our buffer and end up with a negative number of remaining characters, ultimately resulting in the message fatal: you want to use way too much memory when parsing it in order to add color. We also forget to limit a strncasecmp and isalnum looking for keywords to color in to the memory region passed in. Fortunately all callers put a delimiter character (\0, \r, or \n) after the end of the region so this does not cause trouble in practice. Add a test for futureproofing to protect the new bounds checking code in case that ever changes. Signed-off-by: Jonathan Nieder --- Thanks again, Jonathan t/t5409-colorize-remote-messages.sh | 14 ++ 1 file changed, 14 insertions(+) diff --git a/t/t5409-colorize-remote-messages.sh b/t/t5409-colorize-remote-messages.sh index eb1b8aa05df..f81b6813c03 100755 --- a/t/t5409-colorize-remote-messages.sh +++ b/t/t5409-colorize-remote-messages.sh @@ -15,6 +15,8 @@ test_expect_success 'setup' ' echo warning: warning echo prefixerror: error echo " " "error: leading space" + echo "" + echo Err exit 0 EOF echo 1 >file && @@ -44,6 +46,12 @@ test_expect_success 'whole words at line start' ' grep "prefixerror: error" decoded ' +test_expect_success 'short line' ' + git -C child -c color.remote=always push -f origin HEAD:short-line 2>output && + test_decode_color decoded && + grep "remote: Err" decoded +' + test_expect_success 'case-insensitive' ' git --git-dir child/.git -c color.remote=always push -f origin HEAD:refs/heads/case-insensitive 2>output && cat output && @@ -58,6 +66,12 @@ test_expect_success 'leading space' ' grep " error: leading space" decoded ' +test_expect_success 'spaces only' ' + git -C child -c color.remote=always push -f origin HEAD:only-space 2>output && + test_decode_color decoded && + grep "remote: " decoded +' + test_expect_success 'no coloring for redirected output' ' git --git-dir child/.git push -f origin HEAD:refs/heads/redirected-output 2>output && test_decode_color decoded && -- 2.18.0.1017.ga543ac7ca45