Re: [PATCH v5 2/3] Makefile: add Perl runtime prefix support
On Mon, Jan 08 2018, Dan Jacques jotted:

Thanks, applied this on top of next and it works for me, i.e. install to /tmp/git and move to /tmp/git2 = works for me. Comments below.

> Enabling RUNTIME_PREFIX_PERL overrides the system-specific Perl script
> installation path generated by MakeMaker to force installation into a
> platform-neutral location, "/share/perl5".

Not generated by MakeMaker anymore :)

From 3/3 (to not send 2 e-mails):

> +# it. This is intentionally separate from RUNTIME_PREFIX so that notably Windows
> +# can hard-code Perl library paths while still enabling RUNTIME_PREFIX
> +# resolution.

Maybe we covered this in previous submissions, but refresh my memory: why is the *_PERL define still needed? Reading this explanation doesn't make sense to me, but I'm probably missing something.

If we have a system where we have some Perl library paths on the system we want to use, then they'll still be in @INC after our 'use lib'-ing, so we'll find libraries there.

The only reason I can think of for doing this for C and not for Perl would be if you for some reason want to have a git in /tmp/git but then use a different version of the Git.pm from some system install, but I can't imagine why. Or there's another option...

> +	# GIT_EXEC_PATH is supplied by `git` or the test suite. Otherwise, resolve
> +	# against the runtime path of this script.
> +	require FindBin;
> +	require File::Spec;
> +	(my $prefix = $ENV{GIT_EXEC_PATH} || $FindBin::Bin) =~ s=${gitexecdir_relative}$==;

So why are we falling back on $FindBin::Bin? Just so you can do e.g. /tmp/git2/libexec/git-core/git-svn like you can do /tmp/git2/libexec/git-core/git-status, i.e. will this never be false if invoked via "git"? I don't mind it, just wondering if I'm missing something and we need to use the fallback path in some "normal" codepath.

> +	return File::Spec->catdir($prefix, $relpath);

I think you initially got some version of this from me (or not), so this is probably my fault, but reading this again I think this would be better as just:

	return $prefix . '@@PATHSEP@@' . $relpath;

I.e. right after this we split on @@PATHSEP@@, and that clearly works (as opposed to using File::Spec->splitpath) since we've used it forever. Better just to use the same idiom on both ends to not leave the reader wondering why we can split paths one way, but need to join them another way.
Re: [RFC PATCH 00/18] Multi-pack index (MIDX)
On Sun, Jan 07, 2018 at 07:08:54PM -0500, Derrick Stolee wrote:

> > (Not a critique of this, just a (stupid) question)
> >
> > What's the practical use-case for this feature? Since it doesn't help
> > with --abbrev=40 the speedup is all in the part that ensures we don't
> > show an ambiguous SHA-1.
>
> The point of including the --abbrev=40 is to point out that object lookups
> do not get slower with the MIDX feature. Using these "git log" options is a
> good way to balance object lookups and abbreviations with object parsing and
> diff machinery. And while the public data shape I shared did not show a
> difference, our private testing of the Windows repository did show a
> valuable improvement when isolating to object lookups and ignoring
> abbreviation calculations.

Just to make sure I'm parsing this correctly: normal lookups do get faster when you have a single index, given the right setup?

I'm curious what that setup looked like. Is it just tons and tons of packs? Is it ones where the packs do not follow the mru patterns very well?

I think it's worth thinking a bit about, because...

> > If something cares about both throughput and e.g. is saving the
> > abbreviated SHA-1s isn't it better off picking some arbitrary size
> > (e.g. --abbrev=20), after all the default abbreviation is going to show
> > something as small as possible, which may soon become ambiguous after the
> > next commit.
>
> Unfortunately, with the way the abbreviation algorithms work, using
> --abbrev=20 will have similar performance problems because you still need to
> inspect all packfiles to ensure there isn't a collision in the first 20 hex
> characters.

...if what we primarily care about speeding up is abbreviations, is it crazy to consider disabling the disambiguation step entirely?

The results of find_unique_abbrev are already a bit of a probability game. They're guaranteed at the moment of generation, but as more objects are added, ambiguities may be introduced. Likewise, what's unambiguous for you may not be for somebody else you're communicating with, if they have their own clone.

Since we now scale the default abbreviation with the size of the repo, that gives us a bounded and pretty reasonable probability that we won't hit a collision at all[1]. I.e., what if we did something like this:

diff --git a/sha1_name.c b/sha1_name.c
index 611c7d24dd..04c661ba85 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -600,6 +600,15 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 	if (len == GIT_SHA1_HEXSZ || !len)
 		return GIT_SHA1_HEXSZ;
 
+	/*
+	 * A default length of 10 implies a repository big enough that it's
+	 * getting expensive to double check the ambiguity of each object,
+	 * and the chance that any particular object of interest has a
+	 * collision is low.
+	 */
+	if (len >= 10)
+		return len;
+
 	mad.init_len = len;
 	mad.cur_len = len;
 	mad.hex = hex;

If I repack my linux.git with "--max-pack-size=64m", I get 67 packs. The patch above drops "git log --oneline --raw" on the resulting repo from ~150s to ~30s. With a single pack, it goes from ~33s to ~29s. Less impressive, but there's still some benefit.

There may be other reasons to want MIDX or something like it, but I just wonder if we can do this much simpler thing to cover the abbreviation case. I guess the question is whether somebody is going to be annoyed in the off chance that they hit a collision.
-Peff [1] I'd have to check the numbers, but IIRC we've set the scaling so that the chance of having a _single_ collision in the repository is less than 50%, and rounding to the conservative side (since each hex char gives us 4 bits). And indeed, "git log --oneline --raw" on linux.git does not seem to have any collisions at its default of 12 characters, at least in my clone. We could also consider switching core.disambiguate to "commit", which makes even a collision less likely to annoy the user.
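The "less than 50%" claim in [1] is easy to sanity-check with the usual birthday approximation, P ~= 1 - exp(-n^2 / 2^(4*hexchars + 1)), where n is a rough object count. For example (slow on a big repository, since it walks every reachable object; the awk program is just a back-of-the-envelope calculator):

  n=$(git rev-list --objects --all | wc -l) &&
  awk -v n="$n" -v hex=12 'BEGIN {
  	p = 1 - exp(-(n * n) / (2 * 2^(4 * hex)))
  	printf("%d objects, ~%.1f%% chance of some collision at %d hex chars\n",
  	       n, 100 * p, hex)
  }'

For a linux.git-sized repository (a few million objects) that works out to a few percent at 12 characters, which is consistent with not seeing any collisions in practice.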
rebase preserve-merges: incorrect merge commits
Hello,

I think that the rebase preserve-merges algorithm needs further improvement. Probably, you already know it. I was playing with rebasing the linux kernel tree and found that rebase preserve-merges works in a counterintuitive way while handling merge commits. I don't want to discuss the arising merge conflicts, which can be eliminated with the help of rerere trained on the original branch. A major issue here is with silent auto-merging: some merge commits were silently rebased with different content, and it breaks the build :)

I crafted a toy example to demonstrate the issue: https://github.com/matwey/example-git-rebase-pm

Here I have the commit range v0.1..v0.2, then I return to v0.1 and introduce a new commit abc-0.1 with a new file on top of v0.1. Then I want to rebase v0.1..v0.2 onto the new abc-0.1. It is obvious that, as I only introduced a single new file, the difference between the old v0.2 and the new v0.2 should contain only this new file. However, it actually contains other changes.

Reproduce as follows:

git checkout v0.2
git branch abc-0.2
git checkout v0.1
git branch abc-0.1
git checkout abc-0.1
vim LICENSE
git add LICENSE
git commit -a
git rebase --preserve-merges --onto abc-0.1 v0.1 abc-0.2

Actual result:

diff --git a/LICENSE b/LICENSE
deleted file mode 100644
index 297edb3..000
--- a/LICENSE
+++ /dev/null
@@ -1,2 +0,0 @@
-Hello, world!
-
diff --git a/mod.c b/mod.c
index e6fa107..e339597 100644
--- a/mod.c
+++ b/mod.c
@@ -2,5 +2,5 @@
 #include "mod.h"
 
 int mod_fun(char* buf, int len) {
-	return -calc_fun(buf, len);
+	return -calc_fun(buf, len, 0);
 }

Expected result:

diff --git a/LICENSE b/LICENSE
deleted file mode 100644
index 297edb3..000
--- a/LICENSE
+++ /dev/null
@@ -1,2 +0,0 @@
-Hello, world!
-

As far as I understand, the root cause of this is that when the new merge commit is created by rebase, it is done simply by git merge $new_parents, without taking into account the actual state of the initial merge commit. This is why all manually introduced changes in merge commits are lost at the rebase stage.

I think that the following improvement of the merge-commit recreation stage would be possible:

1) Let us have a merge commit M in the source tree:

-- A --\
        M --
-- B --/

And we want to rebase M to the new tree where the new parents A' and B' already exist:

-- A'
-- B'

2) Then I suggest to perform it in three steps. First, let's take all possible differences between A and M, B and M and so on, and then try to cherry-pick the diffs on A' and B' respectively. Here (AM)' and (BM)' are temporary commits which are very close in their states (in the ideal case, equal):

-- A' -- (AM)'
-- B' -- (BM)'

3) Then we do git merge (AM)' (BM)' and ask the user to resolve the conflicts, if any. Here we obtain the following tree, where M" is produced by git merge:

-- A' -- (AM)' --\
                  M"
-- B' -- (BM)' --/

4) Then we do a merge between A' and B', but reset the state to be the same as in M" before the new merge commit is created:

-- A' -- (AM)' --\
    \             \
     \             M"
      M' (== M")  /
     /           /
-- B' -- (BM)' --/

5) Then (AM)', (BM)' and M" are discarded from the tree.

Probably, I missed something important or incorrectly read git-rebase--interactive.sh

-- 
With best regards,
Matwey V. Kornilov
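P.S. A rough, untested shell sketch of steps 2)-5) above, with A, B, M, Aprime and Bprime standing in for the commits A, B, M, A' and B' of the example (the tmp-* branches are the (AM)'/(BM)' commits and are deleted at the end):

  git checkout -b tmp-AM Aprime &&
  git diff A M | git apply -3 &&
  git commit -am "(AM)'" &&
  git checkout -b tmp-BM Bprime &&
  git diff B M | git apply -3 &&
  git commit -am "(BM)'" &&
  git merge tmp-AM &&                 # resolve conflicts here; the result is M"
  tree=$(git write-tree) &&           # remember the resolved tree of M"
  Mprime=$(git commit-tree "$tree" -p Aprime -p Bprime -m "M'") &&
  git update-ref refs/heads/result "$Mprime" &&
  git checkout result &&
  git branch -D tmp-AM tmp-BM         # step 5: drop (AM)' and (BM)'; M" is never kept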
Re: [RFC PATCH 00/18] Multi-pack index (MIDX)
On Mon, Jan 08, 2018 at 05:20:29AM -0500, Jeff King wrote:

> I.e., what if we did something like this:
>
> diff --git a/sha1_name.c b/sha1_name.c
> index 611c7d24dd..04c661ba85 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -600,6 +600,15 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
>  	if (len == GIT_SHA1_HEXSZ || !len)
>  		return GIT_SHA1_HEXSZ;
>
> +	/*
> +	 * A default length of 10 implies a repository big enough that it's
> +	 * getting expensive to double check the ambiguity of each object,
> +	 * and the chance that any particular object of interest has a
> +	 * collision is low.
> +	 */
> +	if (len >= 10)
> +		return len;
> +

Oops, this really needs to terminate the string in addition to returning the length (so it was always printing 40 characters in most cases). The correct patch is below, but it performs the same.

diff --git a/sha1_name.c b/sha1_name.c
index 611c7d24dd..5921298a80 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -600,6 +600,17 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 	if (len == GIT_SHA1_HEXSZ || !len)
 		return GIT_SHA1_HEXSZ;
 
+	/*
+	 * A default length of 10 implies a repository big enough that it's
+	 * getting expensive to double check the ambiguity of each object,
+	 * and the chance that any particular object of interest has a
+	 * collision is low.
+	 */
+	if (len >= 10) {
+		hex[len] = 0;
+		return len;
+	}
+
 	mad.init_len = len;
 	mad.cur_len = len;
 	mad.hex = hex;
Re: [PATCH v2 3/3] send-email: add test for Linux's get_maintainer.pl
Eric Sunshine writes:

> On Fri, Jan 5, 2018 at 1:36 PM, Matthieu Moy wrote:
>> From: Alex Bennée
>>
>> We had a regression that broke Linux's get_maintainer.pl. Using
>> Mail::Address to parse email addresses fixed it, but let's protect
>> against future regressions.
>>
>> Patch-edited-by: Matthieu Moy
>> Signed-off-by: Alex Bennée
>> Signed-off-by: Matthieu Moy
>> ---
>> diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
>> @@ -172,6 +172,26 @@ test_expect_success $PREREQ 'cc trailer with various syntax' '
>> +test_expect_success $PREREQ 'setup fake get_maintainer.pl script for cc trailer' "
>> +	write_script expected-cc-script.sh <<-EOF &&
>> +	echo 'One Person (supporter:THIS (FOO/bar))'
>> +	echo 'Two Person (maintainer:THIS THING)'
>> +	echo 'Third List (moderated list:THIS THING (FOO/bar))'
>> +	echo ' (moderated list:FOR THING)'
>> +	echo 'f...@example.com (open list:FOR THING (FOO/bar))'
>> +	echo 's...@example.com (open list)'
>> +	EOF
>> +	chmod +x expected-cc-script.sh
>> +"
>> +
>> +test_expect_success $PREREQ 'cc trailer with get_maintainer.pl output' '
>> +	clean_fake_sendmail &&
>> +	git send-email -1 --to=recipi...@example.com \
>> +		--cc-cmd="./expected-cc-script.sh" \
>> +		--smtp-server="$(pwd)/fake.sendmail" &&
>
> Aside from the unnecessary (thus noisy) quotes around the --cc-cmd

Indeed, removed.

> value, my one concern is that someone may come along and want to
> "normalize" it to --cc-cmd="$(pwd)/expected-cc-script.sh" for
> consistency with the following --smtp-server line. This worry is
> compounded by the commit message not explaining why these two lines
> differ (one using "./" and one using "$(pwd)/").

Added a note in the commit message.

> An alternative would be to insert a cleanup/modernization
> patch before this one which changes all the "$(pwd)/" to "./",

For --smtp-server, doing so results in a failing test. I didn't investigate why.

> although you'd still want to explain why that's being done (to wit:
> because --cc-cmd behavior with spaces is not well defined). Or,
> perhaps this isn't an issue and my worry is not justified (after all,
> the test will break if someone changes the "./" to "$(pwd)/").

Also, the existing code is written like this: --cc-cmd is always relative, --smtp-server is always absolute, including when they're used in the same command:

test_suppress_self () {
	[...]
	git send-email --from="$1 <$2>" \
		--to=nob...@example.com \
		--cc-cmd=./cccmd-sed \
		--suppress-cc=self \
		--smtp-server="$(pwd)/fake.sendmail" \

Thanks for your careful review,

-- 
Matthieu Moy
https://matthieu-moy.fr/
[PATCH v3 1/3] send-email: add and use a local copy of Mail::Address
We used to have two versions of the email parsing code. Our parse_mailboxes (in Git.pm), and Mail::Address which we used if installed. Unfortunately, both versions have different sets of bugs, and changing the behavior of git depending on whether Mail::Address is installed was a bad idea. A first attempt to solve this was cc90750 (send-email: don't use Mail::Address, even if available, 2017-08-23), but it turns out our parse_mailboxes is too buggy for some uses. For example the lack of nested comments support breaks get_maintainer.pl in the Linux kernel tree: https://public-inbox.org/git/20171116154814.23785-1-alex.ben...@linaro.org/ This patch goes the other way: use Mail::Address anyway, but have a local copy from CPAN as a fallback, when the system one is not available. The duplicated script is small (276 lines of code) and stable in time. Maintaining the local copy should not be an issue, and will certainly be less burden than maintaining our own parse_mailboxes. Another option would be to consider Mail::Address as a hard dependency, but it's easy enough to save the trouble of extra-dependency to the end user or packager. Signed-off-by: Matthieu Moy --- No change since v2. git-send-email.perl | 3 +- perl/Git/FromCPAN/Mail/Address.pm | 276 ++ perl/Git/Mail/Address.pm | 24 3 files changed, 302 insertions(+), 1 deletion(-) create mode 100644 perl/Git/FromCPAN/Mail/Address.pm create mode 100755 perl/Git/Mail/Address.pm diff --git a/git-send-email.perl b/git-send-email.perl index edcc6d3..340b5c8 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -30,6 +30,7 @@ use Error qw(:try); use Cwd qw(abs_path cwd); use Git; use Git::I18N; +use Git::Mail::Address; Getopt::Long::Configure qw/ pass_through /; @@ -489,7 +490,7 @@ my ($repoauthor, $repocommitter); ($repocommitter) = Git::ident_person(@repo, 'committer'); sub parse_address_line { - return Git::parse_mailboxes($_[0]); + return map { $_->format } Mail::Address->parse($_[0]); } sub split_addrs { diff --git a/perl/Git/FromCPAN/Mail/Address.pm b/perl/Git/FromCPAN/Mail/Address.pm new file mode 100644 index 000..13b2ff7 --- /dev/null +++ b/perl/Git/FromCPAN/Mail/Address.pm @@ -0,0 +1,276 @@ +# Copyrights 1995-2017 by [Mark Overmeer ]. +# For other contributors see ChangeLog. +# See the manual pages for details on the licensing terms. +# Pod stripped from pm file by OODoc 2.02. +package Mail::Address; +use vars '$VERSION'; +$VERSION = '2.19'; + +use strict; + +use Carp; + +# use locale; removed in version 1.78, because it causes taint problems + +sub Version { our $VERSION } + + + +# given a comment, attempt to extract a person's name +sub _extract_name +{ # This function can be called as method as well +my $self = @_ && ref $_[0] ? shift : undef; + +local $_ = shift +or return ''; + +# Using encodings, too hard. See Mail::Message::Field::Full. +return '' if m/\=\?.*?\?\=/; + +# trim whitespace +s/^\s+//; +s/\s+$//; +s/\s+/ /; + +# Disregard numeric names (e.g. 123456.1...@compuserve.com) +return "" if /^[\d ]+$/; + +s/^\((.*)\)$/$1/; # remove outermost parenthesis +s/^"(.*)"$/$1/; # remove outer quotation marks +s/\(.*?\)//g; # remove minimal embedded comments +s/\\//g; # remove all escapes +s/^"(.*)"$/$1/; # remove internal quotation marks +s/^([^\s]+) ?, ?(.*)$/$2 $1/; # reverse "Last, First M." if applicable +s/,.*//; + +# Change casing only when the name contains only upper or only +# lower cased characters. 
+unless( m/[A-Z]/ && m/[a-z]/ ) +{ # Set the case of the name to first char upper rest lower +s/\b(\w+)/\L\u$1/igo; # Upcase first letter on name +s/\bMc(\w)/Mc\u$1/igo; # Scottish names such as 'McLeod' +s/\bo'(\w)/O'\u$1/igo; # Irish names such as 'O'Malley, O'Reilly' +s/\b(x*(ix)?v*(iv)?i*)\b/\U$1/igo; # Roman numerals, eg 'Level III Support' +} + +# some cleanup +s/\[[^\]]*\]//g; +s/(^[\s'"]+|[\s'"]+$)//g; +s/\s{2,}/ /g; + +$_; +} + +sub _tokenise +{ local $_ = join ',', @_; +my (@words,$snippet,$field); + +s/\A\s+//; +s/[\r\n]+/ /g; + +while ($_ ne '') +{ $field = ''; +if(s/^\s*\(/(/ )# (...) +{ my $depth = 0; + + PAREN: while(s/^(\(([^\(\)\\]|\\.)*)//) +{ $field .= $1; +$depth++; +while(s/^(([^\(\)\\]|\\.)*\)\s*)//) +{ $field .= $1; +last PAREN unless --$depth; + $field .= $1 if s/^(([^\(\)\\]|\\.)+)//; +} +} + +carp "Unmatched () '$field' '$_'" +if $depth; + +$field =~ s/\s+\Z//; +push @words, $field; + +next; +} + +if( s/^("(?:[^"\\]+|\\.)*")\s*// # "..." + || s/^(\[(?:[^\]\\]+|\\.
[PATCH v3 2/3] Remove now useless email-address parsing code
We now use Mail::Address unconditionaly, hence parse_mailboxes is now dead code. Remove it and its tests. Signed-off-by: Matthieu Moy --- No change since v2. perl/Git.pm | 71 t/t9000-addresses.sh | 27 t/t9000/test.pl | 67 - 3 files changed, 165 deletions(-) delete mode 100755 t/t9000-addresses.sh delete mode 100755 t/t9000/test.pl diff --git a/perl/Git.pm b/perl/Git.pm index ffa09ac..65e6b32 100644 --- a/perl/Git.pm +++ b/perl/Git.pm @@ -880,77 +880,6 @@ sub ident_person { return "$ident[0] <$ident[1]>"; } -=item parse_mailboxes - -Return an array of mailboxes extracted from a string. - -=cut - -# Very close to Mail::Address's parser, but we still have minor -# differences in some cases (see t9000 for examples). -sub parse_mailboxes { - my $re_comment = qr/\((?:[^)]*)\)/; - my $re_quote = qr/"(?:[^\"\\]|\\.)*"/; - my $re_word = qr/(?:[^]["\s()<>:;@\\,.]|\\.)+/; - - # divide the string in tokens of the above form - my $re_token = qr/(?:$re_quote|$re_word|$re_comment|\S)/; - my @tokens = map { $_ =~ /\s*($re_token)\s*/g } @_; - my $end_of_addr_seen = 0; - - # add a delimiter to simplify treatment for the last mailbox - push @tokens, ","; - - my (@addr_list, @phrase, @address, @comment, @buffer) = (); - foreach my $token (@tokens) { - if ($token =~ /^[,;]$/) { - # if buffer still contains undeterminated strings - # append it at the end of @address or @phrase - if ($end_of_addr_seen) { - push @phrase, @buffer; - } else { - push @address, @buffer; - } - - my $str_phrase = join ' ', @phrase; - my $str_address = join '', @address; - my $str_comment = join ' ', @comment; - - # quote are necessary if phrase contains - # special characters - if ($str_phrase =~ /[][()<>:;@\\,.\000-\037\177]/) { - $str_phrase =~ s/(^|[^\\])"/$1/g; - $str_phrase = qq["$str_phrase"]; - } - - # add "<>" around the address if necessary - if ($str_address ne "" && $str_phrase ne "") { - $str_address = qq[<$str_address>]; - } - - my $str_mailbox = "$str_phrase $str_address $str_comment"; - $str_mailbox =~ s/^\s*|\s*$//g; - push @addr_list, $str_mailbox if ($str_mailbox); - - @phrase = @address = @comment = @buffer = (); - $end_of_addr_seen = 0; - } elsif ($token =~ /^\(/) { - push @comment, $token; - } elsif ($token eq "<") { - push @phrase, (splice @address), (splice @buffer); - } elsif ($token eq ">") { - $end_of_addr_seen = 1; - push @address, (splice @buffer); - } elsif ($token eq "@" && !$end_of_addr_seen) { - push @address, (splice @buffer), "@"; - } else { - push @buffer, $token; - } - } - - return @addr_list; -} - =item hash_object ( TYPE, FILENAME ) Compute the SHA1 object id of the given C considering it is diff --git a/t/t9000-addresses.sh b/t/t9000-addresses.sh deleted file mode 100755 index a1ebef6..000 --- a/t/t9000-addresses.sh +++ /dev/null @@ -1,27 +0,0 @@ -#!/bin/sh - -test_description='compare address parsing with and without Mail::Address' -. ./test-lib.sh - -if ! 
test_have_prereq PERL; then - skip_all='skipping perl interface tests, perl not available' - test_done -fi - -perl -MTest::More -e 0 2>/dev/null || { - skip_all="Perl Test::More unavailable, skipping test" - test_done -} - -perl -MMail::Address -e 0 2>/dev/null || { - skip_all="Perl Mail::Address unavailable, skipping test" - test_done -} - -test_external_has_tap=1 - -test_external_without_stderr \ - 'Perl address parsing function' \ - perl "$TEST_DIRECTORY"/t9000/test.pl - -test_done diff --git a/t/t9000/test.pl b/t/t9000/test.pl deleted file mode 100755 index dfeaa9c..000 --- a/t/t9000/test.pl +++ /dev/null @@ -1,67 +0,0 @@ -#!/usr/bin/perl -use lib (split(/:/, $ENV{GITPERLLIB})); - -use 5.008; -use warnings; -use strict; - -use Test::More qw(no_plan); -use Mail::Address; - -BEGIN { use_ok('Git') } - -my @success_list = (q[Jane], - q[j...@example.com], - q[], - q[Jane ], - q[Jane Doe ], - q["Jane" ], - q["Doe, Jane" ], - q[
[PATCH v3 3/3] send-email: add test for Linux's get_maintainer.pl
From: Alex Bennée We had a regression that broke Linux's get_maintainer.pl. Using Mail::Address to parse email addresses fixed it, but let's protect against future regressions. Note that we need --cc-cmd to be relative because this option doesn't accept spaces in script names (probably to allow --cc-cmd="executable --option"), while --smtp-server needs to be absolute. Patch-edited-by: Matthieu Moy Signed-off-by: Alex Bennée Signed-off-by: Matthieu Moy --- Change since v2: * Mention relative Vs absolute path in commit message. * Remove useless "chmod +x" * Remove useless double quotes t/t9001-send-email.sh | 19 +++ 1 file changed, 19 insertions(+) diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh index 4d261c2..a06e5d7 100755 --- a/t/t9001-send-email.sh +++ b/t/t9001-send-email.sh @@ -172,6 +172,25 @@ test_expect_success $PREREQ 'cc trailer with various syntax' ' test_cmp expected-cc commandline1 ' +test_expect_success $PREREQ 'setup fake get_maintainer.pl script for cc trailer' " + write_script expected-cc-script.sh <<-EOF + echo 'One Person (supporter:THIS (FOO/bar))' + echo 'Two Person (maintainer:THIS THING)' + echo 'Third List (moderated list:THIS THING (FOO/bar))' + echo ' (moderated list:FOR THING)' + echo 'f...@example.com (open list:FOR THING (FOO/bar))' + echo 's...@example.com (open list)' + EOF +" + +test_expect_success $PREREQ 'cc trailer with get_maintainer.pl output' ' + clean_fake_sendmail && + git send-email -1 --to=recipi...@example.com \ + --cc-cmd=./expected-cc-script.sh \ + --smtp-server="$(pwd)/fake.sendmail" && + test_cmp expected-cc commandline1 +' + test_expect_success $PREREQ 'setup expect' " cat >expected-show-all-headers <<\EOF 0001-Second.patch -- 2.7.4
Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer wrote:

> @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, const char *path)
>  		split_index->base = xcalloc(1, sizeof(*split_index->base));
>
>  		base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> -		base_path = git_path("sharedindex.%s", base_sha1_hex);
> +		base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);

Personally I prefer the repo_git_path() from v2 (sorry I was away and could not comment anything). The thing is, git_path() and friends could do some path translation underneath to support multiple worktrees. Think of the given path here as a "virtual path" that may be translated to something else, not exactly <gitdir> + "/" + "sharedindex.%s".

But in practice, we're not breaking the relationship between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, so doing the manual path transformation here is fine.

What about the other git_path() calls in this file? With the patch applied I still get:

~/w/git/temp $ git grep git_path read-cache.c
read-cache.c:	shared_index_path = git_path("%s", de->d_name);
read-cache.c:	temp = mks_tempfile(git_path("sharedindex_XX"));
read-cache.c:	git_path("sharedindex.%s", sha1_to_hex(si->base->sha1)));
read-cache.c:	const char *shared_index = git_path("sharedindex.%s",

I suppose submodule has not triggered any of these code paths yet. Not sure if we should deal with them now or wait until later.

Perhaps if we add a "struct repository *" pointer inside index_state, we could retrieve back the_repository (or others) and call repo_git_path() everywhere without changing the index API too much. I don't know. I like the 'struct repository' concept but couldn't follow its development, so I don't know if this is what it should become.

-- 
Duy
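P.S. For anyone unfamiliar with the path translation being referred to, a quick way to see it from the command line (the "../wt" worktree name is arbitrary; run it in any test repository):

  git worktree add ../wt &&
  git -C ../wt rev-parse --git-path index --git-path objects
  # the first result points into .git/worktrees/wt/, the second into the
  # shared .git/objects; git_path() decides this per "virtual path"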
Re: [RFC PATCH 00/18] Multi-pack index (MIDX)
On Mon, Jan 08 2018, Derrick Stolee jotted: > On 1/7/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote: >> If something cares about both throughput and e.g. is saving the >> abbreviated SHA-1s isn't it better off picking some arbitrary size >> (e.g. --abbrev=20), after all the default abbreviation is going to show >> something as small as possible, which may soon become ambigous after the >> next commit. > > Unfortunately, with the way the abbreviation algorithms work, using > --abbrev=20 will have similar performance problems because you still > need to inspect all packfiles to ensure there isn't a collision in the > first 20 hex characters. I meant (but forgot to write) that this would be some new mode, e.g. --abbrev=20 --no-abbrev-check which would just perform a substr() of the 40 character SHA-1. It might be interesting to add that for reasons completely unrelated to your series. Thanks for answering the rest.
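To be concrete about what such a mode would (and would not) do: the hypothetical --no-abbrev-check is nothing more than a substring of the full hash, i.e. roughly what you can already fake today with:

  git log --format=%H | cut -c1-20

with no guarantee that those 20 characters are actually unambiguous in your repository.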
Re: Errors and other unpleasant things found by Cppcheck
From: "Friedrich Spee von Langenfeld" Hi, I analyzed the GitHub repository with Cppcheck. The resulting XML file is attached. Please open it in Cppcheck to view it comfortably. Especially the bunch of errors could be of interest to you. Hi, Thanks for the submission. The list prefers that useful information is in plain text so as to avoid opening file types that may hide undesirable effects. Was your analysis part of an organised scan, or a personal insight? It would help to know the background. The project does have a number of known and accepted cases of 'unitialised variables' and known memory leaks which are acceptable in those cases. If you picked out the few key issues that you feel should be addressed then a patch can be considered, e.g. the suggestion of the wildmatch macro (L263) that depends on the order of evaluation of side effects. -- Philip
Re: [PATCH v3 1/3] send-email: add and use a local copy of Mail::Address
Matthieu Moy writes: > We used to have two versions of the email parsing code. Our > parse_mailboxes (in Git.pm), and Mail::Address which we used if > installed. Unfortunately, both versions have different sets of bugs, and > changing the behavior of git depending on whether Mail::Address is > installed was a bad idea. > > A first attempt to solve this was cc90750 (send-email: don't use > Mail::Address, even if available, 2017-08-23), but it turns out our > parse_mailboxes is too buggy for some uses. For example the lack of > nested comments support breaks get_maintainer.pl in the Linux kernel > tree: > > https://public-inbox.org/git/20171116154814.23785-1-alex.ben...@linaro.org/ > > This patch goes the other way: use Mail::Address anyway, but have a > local copy from CPAN as a fallback, when the system one is not > available. > > The duplicated script is small (276 lines of code) and stable in time. > Maintaining the local copy should not be an issue, and will certainly be > less burden than maintaining our own parse_mailboxes. > > Another option would be to consider Mail::Address as a hard dependency, > but it's easy enough to save the trouble of extra-dependency to the end > user or packager. > > Signed-off-by: Matthieu Moy Reviewed-by: Alex Bennée > --- > No change since v2. > > git-send-email.perl | 3 +- > perl/Git/FromCPAN/Mail/Address.pm | 276 > ++ > perl/Git/Mail/Address.pm | 24 > 3 files changed, 302 insertions(+), 1 deletion(-) > create mode 100644 perl/Git/FromCPAN/Mail/Address.pm > create mode 100755 perl/Git/Mail/Address.pm > > diff --git a/git-send-email.perl b/git-send-email.perl > index edcc6d3..340b5c8 100755 > --- a/git-send-email.perl > +++ b/git-send-email.perl > @@ -30,6 +30,7 @@ use Error qw(:try); > use Cwd qw(abs_path cwd); > use Git; > use Git::I18N; > +use Git::Mail::Address; > > Getopt::Long::Configure qw/ pass_through /; > > @@ -489,7 +490,7 @@ my ($repoauthor, $repocommitter); > ($repocommitter) = Git::ident_person(@repo, 'committer'); > > sub parse_address_line { > - return Git::parse_mailboxes($_[0]); > + return map { $_->format } Mail::Address->parse($_[0]); > } > > sub split_addrs { > diff --git a/perl/Git/FromCPAN/Mail/Address.pm > b/perl/Git/FromCPAN/Mail/Address.pm > new file mode 100644 > index 000..13b2ff7 > --- /dev/null > +++ b/perl/Git/FromCPAN/Mail/Address.pm > @@ -0,0 +1,276 @@ > +# Copyrights 1995-2017 by [Mark Overmeer ]. > +# For other contributors see ChangeLog. > +# See the manual pages for details on the licensing terms. > +# Pod stripped from pm file by OODoc 2.02. > +package Mail::Address; > +use vars '$VERSION'; > +$VERSION = '2.19'; > + > +use strict; > + > +use Carp; > + > +# use locale; removed in version 1.78, because it causes taint problems > + > +sub Version { our $VERSION } > + > + > + > +# given a comment, attempt to extract a person's name > +sub _extract_name > +{ # This function can be called as method as well > +my $self = @_ && ref $_[0] ? shift : undef; > + > +local $_ = shift > +or return ''; > + > +# Using encodings, too hard. See Mail::Message::Field::Full. > +return '' if m/\=\?.*?\?\=/; > + > +# trim whitespace > +s/^\s+//; > +s/\s+$//; > +s/\s+/ /; > + > +# Disregard numeric names (e.g. 
123456.1...@compuserve.com) > +return "" if /^[\d ]+$/; > + > +s/^\((.*)\)$/$1/; # remove outermost parenthesis > +s/^"(.*)"$/$1/; # remove outer quotation marks > +s/\(.*?\)//g; # remove minimal embedded comments > +s/\\//g; # remove all escapes > +s/^"(.*)"$/$1/; # remove internal quotation marks > +s/^([^\s]+) ?, ?(.*)$/$2 $1/; # reverse "Last, First M." if applicable > +s/,.*//; > + > +# Change casing only when the name contains only upper or only > +# lower cased characters. > +unless( m/[A-Z]/ && m/[a-z]/ ) > +{ # Set the case of the name to first char upper rest lower > +s/\b(\w+)/\L\u$1/igo; # Upcase first letter on name > +s/\bMc(\w)/Mc\u$1/igo; # Scottish names such as 'McLeod' > +s/\bo'(\w)/O'\u$1/igo; # Irish names such as 'O'Malley, O'Reilly' > +s/\b(x*(ix)?v*(iv)?i*)\b/\U$1/igo; # Roman numerals, eg 'Level III > Support' > +} > + > +# some cleanup > +s/\[[^\]]*\]//g; > +s/(^[\s'"]+|[\s'"]+$)//g; > +s/\s{2,}/ /g; > + > +$_; > +} > + > +sub _tokenise > +{ local $_ = join ',', @_; > +my (@words,$snippet,$field); > + > +s/\A\s+//; > +s/[\r\n]+/ /g; > + > +while ($_ ne '') > +{ $field = ''; > +if(s/^\s*\(/(/ )# (...) > +{ my $depth = 0; > + > + PAREN: while(s/^(\(([^\(\)\\]|\\.)*)//) > +{ $field .= $1; > +$depth++; > +while(s/^(([^\(\)\\]|\\.)*\)\s*)//) > +{ $field .= $1; > +last PAREN unless --$depth; > + $field .= $1 if
Re: [PATCH v3 2/3] Remove now useless email-address parsing code
Matthieu Moy writes: > We now use Mail::Address unconditionaly, hence parse_mailboxes is now > dead code. Remove it and its tests. > > Signed-off-by: Matthieu Moy Reviewed-by: Alex Bennée > --- > No change since v2. > > perl/Git.pm | 71 > > t/t9000-addresses.sh | 27 > t/t9000/test.pl | 67 - > 3 files changed, 165 deletions(-) > delete mode 100755 t/t9000-addresses.sh > delete mode 100755 t/t9000/test.pl > > diff --git a/perl/Git.pm b/perl/Git.pm > index ffa09ac..65e6b32 100644 > --- a/perl/Git.pm > +++ b/perl/Git.pm > @@ -880,77 +880,6 @@ sub ident_person { > return "$ident[0] <$ident[1]>"; > } > > -=item parse_mailboxes > - > -Return an array of mailboxes extracted from a string. > - > -=cut > - > -# Very close to Mail::Address's parser, but we still have minor > -# differences in some cases (see t9000 for examples). > -sub parse_mailboxes { > - my $re_comment = qr/\((?:[^)]*)\)/; > - my $re_quote = qr/"(?:[^\"\\]|\\.)*"/; > - my $re_word = qr/(?:[^]["\s()<>:;@\\,.]|\\.)+/; > - > - # divide the string in tokens of the above form > - my $re_token = qr/(?:$re_quote|$re_word|$re_comment|\S)/; > - my @tokens = map { $_ =~ /\s*($re_token)\s*/g } @_; > - my $end_of_addr_seen = 0; > - > - # add a delimiter to simplify treatment for the last mailbox > - push @tokens, ","; > - > - my (@addr_list, @phrase, @address, @comment, @buffer) = (); > - foreach my $token (@tokens) { > - if ($token =~ /^[,;]$/) { > - # if buffer still contains undeterminated strings > - # append it at the end of @address or @phrase > - if ($end_of_addr_seen) { > - push @phrase, @buffer; > - } else { > - push @address, @buffer; > - } > - > - my $str_phrase = join ' ', @phrase; > - my $str_address = join '', @address; > - my $str_comment = join ' ', @comment; > - > - # quote are necessary if phrase contains > - # special characters > - if ($str_phrase =~ /[][()<>:;@\\,.\000-\037\177]/) { > - $str_phrase =~ s/(^|[^\\])"/$1/g; > - $str_phrase = qq["$str_phrase"]; > - } > - > - # add "<>" around the address if necessary > - if ($str_address ne "" && $str_phrase ne "") { > - $str_address = qq[<$str_address>]; > - } > - > - my $str_mailbox = "$str_phrase $str_address > $str_comment"; > - $str_mailbox =~ s/^\s*|\s*$//g; > - push @addr_list, $str_mailbox if ($str_mailbox); > - > - @phrase = @address = @comment = @buffer = (); > - $end_of_addr_seen = 0; > - } elsif ($token =~ /^\(/) { > - push @comment, $token; > - } elsif ($token eq "<") { > - push @phrase, (splice @address), (splice @buffer); > - } elsif ($token eq ">") { > - $end_of_addr_seen = 1; > - push @address, (splice @buffer); > - } elsif ($token eq "@" && !$end_of_addr_seen) { > - push @address, (splice @buffer), "@"; > - } else { > - push @buffer, $token; > - } > - } > - > - return @addr_list; > -} > - > =item hash_object ( TYPE, FILENAME ) > > Compute the SHA1 object id of the given C considering it is > diff --git a/t/t9000-addresses.sh b/t/t9000-addresses.sh > deleted file mode 100755 > index a1ebef6..000 > --- a/t/t9000-addresses.sh > +++ /dev/null > @@ -1,27 +0,0 @@ > -#!/bin/sh > - > -test_description='compare address parsing with and without Mail::Address' > -. ./test-lib.sh > - > -if ! 
test_have_prereq PERL; then > - skip_all='skipping perl interface tests, perl not available' > - test_done > -fi > - > -perl -MTest::More -e 0 2>/dev/null || { > - skip_all="Perl Test::More unavailable, skipping test" > - test_done > -} > - > -perl -MMail::Address -e 0 2>/dev/null || { > - skip_all="Perl Mail::Address unavailable, skipping test" > - test_done > -} > - > -test_external_has_tap=1 > - > -test_external_without_stderr \ > - 'Perl address parsing function' \ > - perl "$TEST_DIRECTORY"/t9000/test.pl > - > -test_done > diff --git a/t/t9000/test.pl b/t/t9000/test.pl > deleted file mode 100755 > index dfeaa9c..000 > --- a/t/t9000/test.pl > +++ /dev/null > @@ -1,67 +0,0 @@ > -#!/usr/bin/perl > -use lib (split(/:/, $ENV{GITPERLLIB})); > - > -use 5.008; > -use warnings; > -use strict; > - > -use Test::More qw(no_plan);
Re: [PATCH v4 8/7] wildmatch test: skip file creation tests on Windows proper
Hi Duy, On Sun, 7 Jan 2018, Duy Nguyen wrote: > On Sat, Jan 6, 2018 at 7:51 PM, Johannes Schindelin > wrote: > > Nobody likes to run tests that take too > > long. And look at this: > > > > ... > > ok 1511 - ipathmatch: match 'Z' '[Z-y]' > > ok 1512 - ipathmatch(ls): match '[Z-y]' 'Z' > > # still have 84 known breakage(s) > > # failed 52 among remaining 1428 test(s) > > 1..1512 > > > > real5m51.432s > > user0m33.986s > > sys 2m13.162s > > > > Yep. It takes *over eight minutes*. > > I suppose this is because the sheer number of test cases adds a lot of > shell overhead on Windows. I wonder if it's better to rewrite this > test in C instead. We start to do some more unit testing here and > there and kind of abuse the sh-based test framework for this. Having a > proper unit test framework would be good anyway since it's sometimes > hard to create a specific scenario with high level commands. I agree that it would make a ton of sense to use a proper, portable test framework written in pure, portable C. However, this ship has long sailed, hasn't it? Of course, we do have useful stuff in t/helper/. And we have precedent for more sensible bulk testing that does not require sh to generate or provide lists of test data, see e.g. the basename/dirname tests. And we could organize t/helper/ better, including a refactoring to have a single binary rather than tons and tons of binaries that all link to libgit.a and take a lot of space (even with LZMA compression, the current test artifacts take about 90 megabyte!). If I had the time I would write this up as a valuable GSoC project. Not that Junio cares. But I do. Ciao, Dscho
Re: [RFC PATCH 00/18] Multi-pack index (MIDX)
On Mon, Jan 08 2018, Jeff King jotted: > On Mon, Jan 08, 2018 at 05:20:29AM -0500, Jeff King wrote: > >> I.e., what if we did something like this: >> >> diff --git a/sha1_name.c b/sha1_name.c >> index 611c7d24dd..04c661ba85 100644 >> --- a/sha1_name.c >> +++ b/sha1_name.c >> @@ -600,6 +600,15 @@ int find_unique_abbrev_r(char *hex, const unsigned char >> *sha1, int len) >> if (len == GIT_SHA1_HEXSZ || !len) >> return GIT_SHA1_HEXSZ; >> >> +/* >> + * A default length of 10 implies a repository big enough that it's >> + * getting expensive to double check the ambiguity of each object, >> + * and the chance that any particular object of interest has a >> + * collision is low. >> + */ >> +if (len >= 10) >> +return len; >> + > > Oops, this really needs to terminate the string in addition to returning > the length (so it was always printing 40 characters in most cases). The > correct patch is below, but it performs the same. > > diff --git a/sha1_name.c b/sha1_name.c > index 611c7d24dd..5921298a80 100644 > --- a/sha1_name.c > +++ b/sha1_name.c > @@ -600,6 +600,17 @@ int find_unique_abbrev_r(char *hex, const unsigned char > *sha1, int len) > if (len == GIT_SHA1_HEXSZ || !len) > return GIT_SHA1_HEXSZ; > > + /* > + * A default length of 10 implies a repository big enough that it's > + * getting expensive to double check the ambiguity of each object, > + * and the chance that any particular object of interest has a > + * collision is low. > + */ > + if (len >= 10) { > + hex[len] = 0; > + return len; > + } > + > mad.init_len = len; > mad.cur_len = len; > mad.hex = hex; That looks much more sensible, leaving aside other potential benefits of MIDX. Given the argument Linus made in e6c587c733 ("abbrev: auto size the default abbreviation", 2016-09-30) maybe we should add a small integer to the length for good measure, i.e. something like: if (len >= 10) { int extra = 2; /* or just 1? or maybe 0 ... */ hex[len + extra] = 0; return len + extra; } I tried running: git log --pretty=format:%h --abbrev=7 | perl -nE 'chomp; say length'|sort|uniq -c|sort -nr On several large repos, which forces something like the disambiguation we had before Linus's patch, on e.g. David Turner's 2015-04-03-1M-git.git test repo it's: 952858 7 44541 8 2861 9 168 10 17 11 2 12 And the default abbreviation picks 12. I haven't yet found a case where it's wrong, but if we wanted to be extra safe we could just add a byte or two to the SHA-1.
Re: [PATCH v4 8/7] wildmatch test: skip file creation tests on Windows proper
Hi Ævar, On Sat, 6 Jan 2018, Ævar Arnfjörð Bjarmason wrote: > Can you please provide me with the output of the test under -v -x -d > from github.com:avar/git.git wildmatch-refactor-8 branch? With -v -x -i: -- snip -- [...] expecting success: printf '%s' '?a?b' >expect && git --glob-pathspecs ls-files -z -- '\??\?b' >actual.raw 2>actual.err && tr -d '\0' actual && >expect.err && test_cmp expect.err actual.err && test_cmp expect actual ++ printf %s '?a?b' ++ git --glob-pathspecs ls-files -z -- '\??\?b' + test_eval_ret_=128 + want_trace + test t = t + test t = t + set +x error: last command exited with $?=128 not ok 734 - wildmatch(ls): match '\??\?b' '?a?b' # # printf '%s' '?a?b' >expect && # git --glob-pathspecs ls-files -z # -- '\??\?b' >actual.raw # 2>actual.err && # # tr -d '\0' actual && # >expect.err && # test_cmp expect.err actual.err && # test_cmp expect actual # real2m9.127s user0m10.026s sys 1m0.617s -- snap -- and -- snip -- $ cat ./trash\ directory.t3070-wildmatch/actual.err fatal: Invalid path '/??': No such file or directory -- snap -- As to the speed: -- snip -- # still have 144 known breakage(s) # failed 28 among remaining 1746 test(s) 1..1890 real5m55.162s user0m26.396s sys 2m34.152s -- snap -- ... seems to be in the same ballpark. You are just leaning way too heavily on Unix shell scripting. FWIW the breakages are: -- snip -- not ok 734 - wildmatch(ls): match '\??\?b' '?a?b' # # printf '%s' '?a?b' >expect && # git --glob-pathspecs ls-files -z # -- '\??\?b' >actual.raw # 2>actual.err && # # tr -d '\0' actual && # >expect.err && # test_cmp expect.err actual.err && # test_cmp expect actual # ok 735 - iwildmatch: match '?a?b' '\??\?b' not ok 736 - iwildmatch(ls): match '\??\?b' '?a?b' # # printf '%s' '?a?b' >expect && # git --glob-pathspecs # --icase-pathspecs ls-files -z -- # '\??\?b' >actual.raw 2>actual.err # && # # tr -d '\0' actual && # >expect.err && # test_cmp expect.err actual.err && # test_cmp expect actual # ok 737 - pathmatch: match '?a?b' '\??\?b' not ok 738 - pathmatch(ls): match '\??\?b' '?a?b' # # printf '%s' '?a?b' >expect && # git ls-files -z -- '\??\?b' # >actual.raw 2>actual.err && # # tr -d '\0' actual && # >expect.err && # test_cmp expect.err actual.err && # test_cmp expect actual # ok 739 - ipathmatch: match '?a?b' '\??\?b' not ok 740 - ipathmatch(ls): match '\??\?b' '?a?b' # # printf '%s' '?a?b' >expect && # git --icase-pathspecs ls-files -z # -- '\??\?b' >actual.raw # 2>actual.err && # # tr -d '\0' actual && # >expect.err && # test_cmp expect.err actual.err && # test_cmp expect actual # ok 741 - cleanup after previous file test ok 742 - setup wildtest file test for abc ok 743 - wildmatch: match 'abc' '\a\b\c' not ok 744 - wildmatch(ls): match '\a\b\c' 'abc' # # printf '%s' 'abc' >expect && # git --glob-pathspecs ls-files -z # -- '\a\b\c' >actual.raw # 2>actual.err && # # tr -d '\0' actual && # >expect.err && # test_cmp expect.err actual.err && # test_cmp expect actual # ok 745 - iwildmatch: match 'abc' '\a\b\c' not ok 746 - iwildmatch(ls): match '\a\b\c' 'abc' # # printf '%s' 'abc' >expect && # git --glob-pathspecs # --icase-pathspecs ls-files -z -- # '\a\b\c' >actual.raw 2>actual.err # && # # tr -d '\0' actual && # >expect.err && # test_cmp expect.err actual.err && # test_cmp expect actual # ok 747 - pathmatch: match 'abc' '\a\b\c' not ok 748 - pathmatch(ls): match '\a\b\
Re: [RFC PATCH 00/18] Multi-pack index (MIDX)
Hi Peff, On Mon, 8 Jan 2018, Jeff King wrote: > On Sun, Jan 07, 2018 at 07:08:54PM -0500, Derrick Stolee wrote: > > > > (Not a critique of this, just a (stupid) question) > > > > > > What's the practical use-case for this feature? Since it doesn't > > > help with --abbrev=40 the speedup is all in the part that ensures we > > > don't show an ambiguous SHA-1. > > > > The point of including the --abbrev=40 is to point out that object > > lookups do not get slower with the MIDX feature. Using these "git log" > > options is a good way to balance object lookups and abbreviations with > > object parsing and diff machinery. And while the public data shape I > > shared did not show a difference, our private testing of the Windows > > repository did show a valuable improvement when isolating to object > > lookups and ignoring abbreviation calculations. > > Just to make sure I'm parsing this correctly: normal lookups do get > faster when you have a single index, given the right setup? > > I'm curious what that setup looked like. Is it just tons and tons of > packs? Is it ones where the packs do not follow the mru patterns very > well? > > I think it's worth thinking a bit about, because... > > > > If something cares about both throughput and e.g. is saving the > > > abbreviated SHA-1s isn't it better off picking some arbitrary size > > > (e.g. --abbrev=20), after all the default abbreviation is going to show > > > something as small as possible, which may soon become ambigous after the > > > next commit. > > > > Unfortunately, with the way the abbreviation algorithms work, using > > --abbrev=20 will have similar performance problems because you still need to > > inspect all packfiles to ensure there isn't a collision in the first 20 hex > > characters. > > ...if what we primarily care about speeding up is abbreviations, is it > crazy to consider disabling the disambiguation step entirely? Not crazy. But it would break stuff. Because... > The results of find_unique_abbrev are already a bit of a probability > game. They're guaranteed at the moment of generation, but as more > objects are added, ambiguities may be introduced. Likewise, what's > unambiguous for you may not be for somebody else you're communicating > with, if they have their own clone. ... this is only a probability game in the long term, when you consider new objects to enter from *somewhere*. But in purely local settings, when we expect no new objects to be introduced, we do use known-unambiguous abbreviations. Take the interactive rebase for example. It generates todo lists with abbreviated commit names, for readability (and it is *really* important to keep this readable). As we expect new objects to be introduced by the interactive rebase, we convert that todo list to unabbreviated commit names before executing the interactive rebase. Your idea (to not care about unambiguous abbreviations) would break that. Ciao, Dscho
Re: [RFC PATCH 00/18] Multi-pack index (MIDX)
On 1/8/2018 5:20 AM, Jeff King wrote: On Sun, Jan 07, 2018 at 07:08:54PM -0500, Derrick Stolee wrote: (Not a critique of this, just a (stupid) question) What's the practical use-case for this feature? Since it doesn't help with --abbrev=40 the speedup is all in the part that ensures we don't show an ambiguous SHA-1. The point of including the --abbrev=40 is to point out that object lookups do not get slower with the MIDX feature. Using these "git log" options is a good way to balance object lookups and abbreviations with object parsing and diff machinery. And while the public data shape I shared did not show a difference, our private testing of the Windows repository did show a valuable improvement when isolating to object lookups and ignoring abbreviation calculations. Just to make sure I'm parsing this correctly: normal lookups do get faster when you have a single index, given the right setup? I'm curious what that setup looked like. Is it just tons and tons of packs? Is it ones where the packs do not follow the mru patterns very well? The way I repacked the Linux repo creates an artificially good set of packs for the MRU cache. When the packfiles are partitioned instead by the time the objects were pushed to a remote, the MRU cache performs poorly. Improving these object lookups are a primary reason for the MIDX feature, and almost all commands improve because of it. 'git log' is just the simplest to use for demonstration. I think it's worth thinking a bit about, because... If something cares about both throughput and e.g. is saving the abbreviated SHA-1s isn't it better off picking some arbitrary size (e.g. --abbrev=20), after all the default abbreviation is going to show something as small as possible, which may soon become ambigous after the next commit. Unfortunately, with the way the abbreviation algorithms work, using --abbrev=20 will have similar performance problems because you still need to inspect all packfiles to ensure there isn't a collision in the first 20 hex characters. ...if what we primarily care about speeding up is abbreviations, is it crazy to consider disabling the disambiguation step entirely? The results of find_unique_abbrev are already a bit of a probability game. They're guaranteed at the moment of generation, but as more objects are added, ambiguities may be introduced. Likewise, what's unambiguous for you may not be for somebody else you're communicating with, if they have their own clone. Since we now scale the default abbreviation with the size of the repo, that gives us a bounded and pretty reasonable probability that we won't hit a collision at all[1]. I.e., what if we did something like this: diff --git a/sha1_name.c b/sha1_name.c index 611c7d24dd..04c661ba85 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -600,6 +600,15 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len) if (len == GIT_SHA1_HEXSZ || !len) return GIT_SHA1_HEXSZ; + /* +* A default length of 10 implies a repository big enough that it's +* getting expensive to double check the ambiguity of each object, +* and the chance that any particular object of interest has a +* collision is low. +*/ + if (len >= 10) + return len; + mad.init_len = len; mad.cur_len = len; mad.hex = hex; If I repack my linux.git with "--max-pack-size=64m", I get 67 packs. The patch above drops "git log --oneline --raw" on the resulting repo from ~150s to ~30s. With a single pack, it goes from ~33s ~29s. Less impressive, but there's still some benefit. 
There may be other reasons to want MIDX or something like it, but I just wonder if we can do this much simpler thing to cover the abbreviation case. I guess the question is whether somebody is going to be annoyed in the off chance that they hit a collision.

Not only are users going to be annoyed when they hit collisions after copy-pasting an abbreviated hash, there are also a large number of tools that people build that use abbreviated hashes (either for presenting to users or because they didn't turn off abbreviations).

Abbreviations cause performance issues in other commands, too (like 'fetch'!), so whatever short-circuit you put in, it would need to be global. A flag on one builtin would not suffice.

> -Peff
>
> [1] I'd have to check the numbers, but IIRC we've set the scaling so that
>     the chance of having a _single_ collision in the repository is less
>     than 50%, and rounding to the conservative side (since each hex char
>     gives us 4 bits). And indeed, "git log --oneline --raw" on linux.git
>     does not seem to have any collisions at its default of 12 characters,
>     at least in my clone. We could also consider switching
>     core.disambiguate to "commit", which makes even a collision less
>     likely to annoy the user.
Re: Possible bug report: git checkout tag problem
Hi Myles, On Mon, 8 Jan 2018, Myles Fong wrote: > Brief description: > When two tags are pointing to the same commit, e.g. tagA and tagB, if I > do `git checkout tagA` then `git checkout tagB`, and then `git status`, > it shows `HEAD detached at tagA` > > Expected behaviour: > I'm expecting it to show `HEAD detached at tagB`, though I understand > this makes no difference for the repo code, but still a bit confusing > for me. The problem here is that Git understands something different from what you intended: if you say `git checkout `, Git is mostly interested in the revision, i.e. the commit. Only if that parameter refers to a local branch name (which is the only type of ref Git expects to advance via the worktree) does it switch to a named branch. Otherwise it switches to what I like to call "unnamed branch" (and which for historical reasons is called "detached HEAD" by Git, something that is unlikely to be understood without explicit explanation). Now, as a convenience, Git tries to name the revision when it is on such an unnamed branch. If a tag points to it, it uses the name of that tag to describe it. If *multiple* tags point to it, it uses the newest one. That's why you see what you see. It is intended behavior... Ciao, Johannes
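P.S. The report boils down to this sequence (any repository with at least one commit will do; the tag names are taken from the original report):

  git tag tagA &&
  git tag tagB &&
  git checkout tagA &&
  git checkout tagB &&
  git status    # reportedly still says "HEAD detached at tagA"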
Re: rebase preserve-merges: incorrect merge commits
Hi Matwey, On Mon, 8 Jan 2018, Matwey V. Kornilov wrote: > I think that rebase preserve-merges algorithm needs further > improvements. Probably, you already know it. Yes. preserve-merges is a fundamentally flawed design. Please have a look here: https://github.com/git/git/pull/447 Since we are in a feature freeze in preparation for v2.16.0, I will submit these patch series shortly after v2.16.0 is released. > As far as I understand the root cause of this that when new merge > commit is created by rebase it is done simply by git merge > $new_parents without taking into account any actual state of the > initial merge commit. Indeed. preserve-merges does not allow commits to be reordered. (Actually, it *does* allow it, but then fails to handle it correctly.) We even have test cases that mark this as "known breakage". But really, I do not think it is worth trying to fix the broken design. Better to go with the new recreate-merges. (I am biased, of course, because I invented recreate-merges. But then, I also invented preserve-merges, so ...) Ciao, Johannes
Re: [PATCH v3 0/5] Add --no-ahead-behind to status
On 1/8/2018 1:37 AM, Jeff King wrote: On Fri, Jan 05, 2018 at 11:46:24AM -0500, Jeff Hostetler wrote: I'm mildly negative on this "level 2" config. If influencing the porcelain via config creates compatibility headaches, then why would we allow it here? And if it doesn't, then why do we need to protect against it? This seems to exist in a funny middle ground that cannot decide whether it is bad or not. It's like we're inserting a foot-gun, but putting it just far enough out of reach that we can blame the user when they shoot themselves with it. Is there a compelling use case for this? From the previous discussion, this is the strawman I came up with: [...] I kinda trying to serve 2 masters here. As we discussed earlier, we don't want config options to change porcelain formats, hence the true/false thing only affecting non-porcelain formats. On the other hand, VS and git.exe are on different release schedules. Normally, I'd just have VS emit a "git status --no-ahead-behind --porcelain=v2" and be done, but that requires that git.exe gets updated before VS. We do control some of that, but if VS gets updated first, that causes an error, whereas "git -c status.aheadbehind= status --porcelain=v2" does not. It is respected if/when git is updated and ignored until then. Likewise, if they update git first, we can tell them to set a config setting on the repo and inherit it for porcelain v2 output without VS knowing about it. Sorry, if that's too much detail. OK, so my strawman was totally off-base. :) That explanation is interesting. I do think you're facing a real problem with moving the versions in lockstep. But shoe-horning the feature into config like this seems like a pretty ugly solution: 1. Then we're stuck with this weird foot-gun config option forever. 2. It only solves the problem for this one option. Is there a more general solution? The more general solutions I can come up with are: 1. Is there a way for a caller to query Git to say "do you understand --no-ahead-behind?". You can ask "git version", but parsing version numbers is problematic. We don't have any kind of "feature flags" output, and I'm not sure we'd want to get down to the level of specific command-line options there. One thing you can do is speculatively run "status --no-ahead-behind", and if it returns 129, try again without as a fallback. That incurs a failed invocation for the fallback case, but it's quick (we fail before looking at any data) and you can cache it for the duration of a VS session. 2. There could be some way to tell the option parser that the next flag is optional. E.g.: git status --optional=--no-ahead-behind That would be pretty easy to implement globally in parse-options.c. It knows when an option is unrecognized, so it could just treat that as a silent noop rather than barfing. Of course, that doesn't solve your problem today. It wouldn't be safe to start using "--optional" until it had been in several released versions. And I have a feeling it may not be sufficient without further feedback to the caller. For this flag, the caller is happy to say "do this if you know how, but otherwise I will cope". But there are probably flag where it would need to know whether it had any effect or not. So this whole direction is probably crazy. Of the two, I think (1) is not so bad. It is OK with me if we omit the last commit in the patch series (that does the experimental =2 extension) and I'll deal with this separately (maybe differently) in the gvfs fork. That would be my preference. Thanks. 
-Peff Interesting ideas, but probably overkill for now. I'll pull it out of my next version and deal with it differently in our gvfs fork. Thanks, Jeff
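For reference, the probe in option (1) boils down to something like this on the command line; 129 is the exit code git's option parser uses for an unknown option, and the caching of the result would live in the caller (e.g. per VS session). This is only a sketch, not anything VS actually runs:

# Speculatively ask for the cheap form; on an older git the unknown option
# makes it exit 129 before touching any data, so fall back to the plain call.
out=$(git status --no-ahead-behind --porcelain=v2 --branch 2>/dev/null)
if test $? = 129
then
	out=$(git status --porcelain=v2 --branch)
fi
printf '%s\n' "$out"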
Re: [PATCH v3 0/7] convert: add support for different encodings
> On 07 Jan 2018, at 10:38, Torsten Bögershausen wrote: > > On Sat, Jan 06, 2018 at 01:48:01AM +0100, lars.schnei...@autodesk.com wrote: >> From: Lars Schneider >> >> Hi, >> >> Patches 1-5 and 6 are helper functions and preparation. >> Patch 6 is the actual change. >> >> I am still torn between "checkout-encoding" and "working-tree-encoding" >> as attribute name. I am happy to hear arguments for/against one or the >> other. > > checkout-encoding is probably misleading, as it is even the checkin-encoding. Yeah, I start to think the same. > What is wrong with working-tree-encoding ? > I think the 2 "-". > > What was wrong with workingtree-encoding ? Yeah, the two dashes are a minor annoyance. However, consider this: $ git grep 'working tree' -- '*.txt' | wc -l 570 $ git grep 'working-tree' -- '*.txt' | wc -l 6 $ git grep 'workingtree' -- '*.txt' | wc -l 0 $ git grep 'working tree' -- po | wc -l 704 $ git grep 'working-tree' -- po | wc -l 0 $ git grep 'workingtree' -- po | wc -l 0 I think "working tree" is a pretty established term that endusers might be able to understand. Therefore, I would like to go with "working-tree-encoding" as it was written that way at least 6 times in the Git tree before. Would that work for you? > Or > workdir-encoding ? Although I like the shortness, the term "workdir" might already be occupied [1]. Could that cause confusion? [1] 4f01748d51 (contrib/workdir: add a simple script to create a working directory, 2007-03-27) >> >> * Removed unnecessary NUL assignment in xstrdup_tolower() (Torsten) >> >> * Set "git config core.eol lf" to made the test run on Windows (Dscho) >> >> * Made BOM arrays static (Ramsay) > > > Some comments: > > I would like to have the CRLF conversion a little bit more strict - > many users tend to set core.autocrlf=true or write "* text=auto" > in the .gitattributes. > Reading all the effort about BOM markers and UTF-16LE, I think there > should ne some effort to make the line endings round trip. > Therefore I changed convert.c to demand that the "text" attribute > is set to enable CRLF conversions. > (If I had submitted the patch, I would have demanded > "text eol=lf" or "text eol=crlf", but the test case t0028 indicates > that there is a demand to produce line endings as configured in core.eol) But wouldn't that be inconvenient for the users? E.g. if I add a UTF-16 file on Windows with CRLF then it would be nice if Git would automatically convert the line endings to LF on Linux, no? IOW: Why should we handle text files that have a defined checkout-encoding differently compared to UTF-8 encoded text files? Wouldn't that be unexpected to the user? Thanks, Lars > > Anyway, I rebased it onto git.git/master, changed the docu, and pushed it to > https://github.com/tboegi/git/tree/180107-0935-For-lars-schneider-encode-V3B > > Here is a inter-diff against your version: > > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > index 1bc03e69c..b8d9f91c8 100644 > --- a/Documentation/gitattributes.txt > +++ b/Documentation/gitattributes.txt > @@ -281,7 +281,7 @@ interpreted as binary and consequently built-in Git text > processing > tools (e.g. 'git diff') as well as most Git web front ends do not > visualize the content. > > -In these cases you can teach Git the encoding of a file in the working > +In these cases you can tell Git the encoding of a file in the working Oops. I meant to change that already. Thanks! > directory with the `checkout-encoding` attribute. 
If a file with this > attributes is added to Git, then Git reencodes the content from the > specified encoding to UTF-8 and stores the result in its internal data > @@ -308,17 +308,20 @@ Use the `checkout-encoding` attribute only if you > cannot store a file in > UTF-8 encoding and if you want Git to be able to process the content as > text. > > +Note that when `checkout-encoding` is defined, by default the line > +endings are not converted. `text=auto` and core.autocrlf are ignored. > +Set the `text` attribute to enable CRLF conversions. > + > Use the following attributes if your '*.txt' files are UTF-16 encoded > -with byte order mark (BOM) and you want Git to perform automatic line > -ending conversion based on your platform. > +with byte order mark (BOM). > > > -*.txttext checkout-encoding=UTF-16 > +*.txtcheckout-encoding=UTF-16 > > > Use the following attributes if your '*.txt' files are UTF-16 little > -endian encoded without BOM and you want Git to use Windows line endings > -in the working directory. > +endian encoded without BOM and you want Git to use LF in the repo and > +CRLF in the working directory. > > > *.txtcheckout-encoding=UTF-16LE text eol=CRLF > diff --git a/convert.c b/convert.c > index 13f766d2a..1e29f515e 100644 > --- a/convert.c >
Re: rebase preserve-merges: incorrect merge commits
2018-01-08 16:56 GMT+03:00 Johannes Schindelin : > Hi Matwey, > > On Mon, 8 Jan 2018, Matwey V. Kornilov wrote: > >> I think that rebase preserve-merges algorithm needs further >> improvements. Probably, you already know it. > > Yes. preserve-merges is a fundamentally flawed design. > > Please have a look here: > > https://github.com/git/git/pull/447 > > Since we are in a feature freeze in preparation for v2.16.0, I will > submit these patch series shortly after v2.16.0 is released. > >> As far as I understand the root cause of this that when new merge >> commit is created by rebase it is done simply by git merge >> $new_parents without taking into account any actual state of the >> initial merge commit. > > Indeed. preserve-merges does not allow commits to be reordered. (Actually, > it *does* allow it, but then fails to handle it correctly.) We even have > test cases that mark this as "known breakage". > > But really, I do not think it is worth trying to fix the broken design. > Better to go with the new recreate-merges. (I am biased, of course, > because I invented recreate-merges. But then, I also invented > preserve-merges, so ...) Well. I just checked --recreate-merges=no-rebase-cousins from the PR and found that it produces the same wrong result in my test example. The topology is reproduced correctly, but merge-commit content is broken. I did git rebase --recreate-merges=no-rebase-cousins --onto abc-0.1 v0.1 abc-0.2 > > Ciao, > Johannes > -- With best regards, Matwey V. Kornilov
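The test example itself is not shown in this thread, but a made-up history of the same shape reproduces the class of problem being described (this assumes a git built from the PR above, since --recreate-merges is not in any released version; all names are invented):

git init demo && cd demo
echo base >file && git add file && git commit -m base
git tag v0.1

# a side branch that will be merged into the series being rebased
git checkout -b feature v0.1
echo feature >feature.txt && git add feature.txt && git commit -m feature

# the branch to rebase: a merge whose tree was amended by hand
# (an "evil" merge -- the tweak to "file" exists only in the merge commit)
git checkout -b abc-0.2 v0.1
git merge --no-ff --no-edit feature
echo manual-tweak >>file
git commit --amend -a --no-edit

# the new base to transplant onto
git checkout -b abc-0.1 v0.1
echo hotfix >hotfix.txt && git add hotfix.txt && git commit -m hotfix

# recreate the topology; the hand-made change to "file" is silently dropped
git checkout abc-0.2
git rebase --recreate-merges=no-rebase-cousins --onto abc-0.1 v0.1 abc-0.2
cat file   # "manual-tweak" is no longer there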
Re: [BUG] v2.16.0-rc0 seg faults when git bisect skip
Hi all, Thank you guys for insightful help. I just read the code and now I understand what you guys are saying. Yeah, I can say the fix is "spot on". But, to be honest, it's hard to see why you need "if (p)" at the first glance. So, how about this fix, instead? I also found a bug in show_list(). I'm attaching a patch as well. -- yashi From d149a1348e94ea0246a10181751ce3bf9ba48198 Mon Sep 17 00:00:00 2001 From: Yasushi SHOJI Date: Mon, 8 Jan 2018 22:31:10 +0900 Subject: [PATCH 1/2] bisect: debug: convert struct object to object_id The commit f2fd0760f62e79609fef7bfd7ecebb002e8e4ced converted struct object to object_id but a debug function show_list(), which is ifdef'ed to noop, in bisect.c wasn't. So fix it. --- bisect.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/bisect.c b/bisect.c index 0fca17c02..fb3e44903 100644 --- a/bisect.c +++ b/bisect.c @@ -132,7 +132,7 @@ static void show_list(const char *debug, int counted, int nr, unsigned flags = commit->object.flags; enum object_type type; unsigned long size; - char *buf = read_sha1_file(commit->object.sha1, &type, &size); + char *buf = read_sha1_file(commit->object.oid.hash, &type, &size); const char *subject_start; int subject_len; @@ -144,10 +144,10 @@ static void show_list(const char *debug, int counted, int nr, fprintf(stderr, "%3d", weight(p)); else fprintf(stderr, "---"); - fprintf(stderr, " %.*s", 8, sha1_to_hex(commit->object.sha1)); + fprintf(stderr, " %.*s", 8, sha1_to_hex(commit->object.oid.hash)); for (pp = commit->parents; pp; pp = pp->next) fprintf(stderr, " %.*s", 8, -sha1_to_hex(pp->item->object.sha1)); +sha1_to_hex(pp->item->object.oid.hash)); subject_len = find_commit_subject(buf, &subject_start); if (subject_len) -- 2.15.1 From 2ec8823de3bd0ca8253ddbd07f58b66afcfabe96 Mon Sep 17 00:00:00 2001 From: Yasushi SHOJI Date: Mon, 8 Jan 2018 22:35:12 +0900 Subject: [PATCH 2/2] bisect: handle empty list in best_bisection_sorted() In 7c117184d7 ("bisect: fix off-by-one error in `best_bisection_sorted()`", 2017-11-05) the more careful logic dealing with freeing p->next in 50e62a8e70 ("rev-list: implement --bisect-all", 2007-10-22) was removed. This function, and also all other functions below find_bisection() to be complete, will be called with an empty commit list if all commits are _uninteresting_. So the functions must be prepared for it. This commit, instead of restoring the check, moves the clean-up code into the loop. To do that this commit changes the non-last-case branch in the loop to a last-case branch, and clean-up unused trailing elements there. We could, on the other hand, do a big "if list" condition at the very beginning. But, that doesn't sound right for the current code base. No other functions does that but all blocks in the functions are tolerant to an empty list. By the way, as you can tell from the non-last-case branch we had in the loop, this fix shouldn't cause a pipeline stall on every iteration on modern processors with a branch predictor. 
Signed-off-by: Yasushi ShOJI --- bisect.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/bisect.c b/bisect.c index fb3e44903..cec4a537f 100644 --- a/bisect.c +++ b/bisect.c @@ -218,7 +218,7 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n cnt++; } QSORT(array, cnt, compare_commit_dist); - for (p = list, i = 0; i < cnt; i++) { + for (p = list, i = 0; i < cnt; i++, p = p->next) { struct object *obj = &(array[i].commit->object); strbuf_reset(&buf); @@ -226,11 +226,13 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n add_name_decoration(DECORATION_NONE, buf.buf, obj); p->item = array[i].commit; - if (i < cnt - 1) - p = p->next; + + if (i == cnt) { + /* clean-up unused elements if any */ + free_commit_list(p->next); + p->next = NULL; + } } - free_commit_list(p->next); - p->next = NULL; strbuf_release(&buf); free(array); return list; -- 2.15.1
Re: rebase preserve-merges: incorrect merge commits
2018-01-08 17:42 GMT+03:00 Matwey V. Kornilov : > 2018-01-08 16:56 GMT+03:00 Johannes Schindelin : >> Hi Matwey, >> >> On Mon, 8 Jan 2018, Matwey V. Kornilov wrote: >> >>> I think that rebase preserve-merges algorithm needs further >>> improvements. Probably, you already know it. >> >> Yes. preserve-merges is a fundamentally flawed design. >> >> Please have a look here: >> >> https://github.com/git/git/pull/447 >> >> Since we are in a feature freeze in preparation for v2.16.0, I will >> submit these patch series shortly after v2.16.0 is released. >> >>> As far as I understand the root cause of this that when new merge >>> commit is created by rebase it is done simply by git merge >>> $new_parents without taking into account any actual state of the >>> initial merge commit. >> >> Indeed. preserve-merges does not allow commits to be reordered. (Actually, >> it *does* allow it, but then fails to handle it correctly.) We even have >> test cases that mark this as "known breakage". >> >> But really, I do not think it is worth trying to fix the broken design. >> Better to go with the new recreate-merges. (I am biased, of course, >> because I invented recreate-merges. But then, I also invented >> preserve-merges, so ...) > > Well. I just checked --recreate-merges=no-rebase-cousins from the PR > and found that it produces the same wrong result in my test example. > The topology is reproduced correctly, but merge-commit content is > broken. > I did git rebase --recreate-merges=no-rebase-cousins --onto abc-0.1 v0.1 > abc-0.2 Indeed, exactly as you still say in the documentation: "Merge conflict resolutions or manual amendments to merge commits are not preserved." My initial point is that they have to be preserved. Probably in recreate-merges, if preserve-merges is discontinued. > >> >> Ciao, >> Johannes >> > > > > -- > With best regards, > Matwey V. Kornilov -- With best regards, Matwey V. Kornilov
Re: [BUG] v2.16.0-rc0 seg faults when git bisect skip
Hi, On Mon, Jan 8, 2018 at 3:45 PM, Yasushi SHOJI wrote: > Hi all, > > Thank you guys for insightful help. I just read the code and now I understand > what you guys are saying. Yeah, I can say the fix is "spot on". > > But, to be honest, it's hard to see why you need "if (p)" at the first > glance. > So, how about this fix, instead? +for (p = list, i = 0; i < cnt; i++, p = p->next) { Here "i" can reach "cnt - 1" at most, so ... +if (i == cnt) { +/* clean-up unused elements if any */ +free_commit_list(p->next); +p->next = NULL; +} ... "i == cnt" is always false above. I think it should be "i == cnt - 1". And with your code one can wonder why the cleanup is part of the loop. > I also found a bug in show_list(). I'm attaching a patch as well. Could you send it inline as explained in Documentation/SubmittingPatches? Thanks, Christian.
[PATCH v4 0/4] Add --no-ahead-behind to status
From: Jeff Hostetler This is version 4 of my patch series to avoid expensive ahead/behind calculations in status. This version removes the last commit containing the experimental config setting. And removes the undefined return values for the nr_ahead/nr_behind arguments as discussed on the mailing list. This version does not address "git branch -vv", but that requires passing data across the for-each-ref barrier and that seemed beyond the scope of this task.

Jeff Hostetler (4):
  stat_tracking_info: return +1 when branches not equal
  status: add --[no-]ahead-behind to status and commit for V2 format.
  status: update short status to respect --no-ahead-behind
  status: support --no-ahead-behind in long format

 Documentation/config.txt     |  6
 Documentation/git-status.txt |  5 +++
 builtin/checkout.c           |  2 +-
 builtin/commit.c             | 18 ++-
 ref-filter.c                 |  8 ++---
 remote.c                     | 50 --
 remote.h                     | 12 ++--
 t/t6040-tracking-info.sh     | 73
 t/t7064-wtstatus-pv2.sh      | 69 +
 wt-status.c                  | 41 +
 wt-status.h                  |  2 ++
 11 files changed, 250 insertions(+), 36 deletions(-)

-- 2.9.3
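For quick reference, the knobs added by the series (details in the individual patches below) look roughly like this from the command line:

# porcelain callers ask for it explicitly; config never changes porcelain output
git status --porcelain=v2 --branch --no-ahead-behind

# the long and short formats accept the same option ...
git status --no-ahead-behind
git status -s -b --no-ahead-behind

# ... or the config setting, which only affects the non-porcelain formats
git -c status.aheadBehind=false status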
[PATCH v4 1/4] stat_tracking_info: return +1 when branches not equal
From: Jeff Hostetler Extend stat_tracking_info() to return +1 when branches are not equal and to take a new "enum ahead_behind_flags" argument to allow skipping the (possibly expensive) ahead/behind computation. This will be used in the next commit to allow "git status" to avoid full ahead/behind calculations for performance reasons. Signed-off-by: Jeff Hostetler --- ref-filter.c | 8 remote.c | 34 +- remote.h | 8 +++- wt-status.c | 6 -- 4 files changed, 36 insertions(+), 20 deletions(-) diff --git a/ref-filter.c b/ref-filter.c index e728b15..23bcdc4 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -1238,8 +1238,8 @@ static void fill_remote_ref_details(struct used_atom *atom, const char *refname, if (atom->u.remote_ref.option == RR_REF) *s = show_ref(&atom->u.remote_ref.refname, refname); else if (atom->u.remote_ref.option == RR_TRACK) { - if (stat_tracking_info(branch, &num_ours, - &num_theirs, NULL)) { + if (stat_tracking_info(branch, &num_ours, &num_theirs, + NULL, AHEAD_BEHIND_FULL) < 0) { *s = xstrdup(msgs.gone); } else if (!num_ours && !num_theirs) *s = ""; @@ -1256,8 +1256,8 @@ static void fill_remote_ref_details(struct used_atom *atom, const char *refname, free((void *)to_free); } } else if (atom->u.remote_ref.option == RR_TRACKSHORT) { - if (stat_tracking_info(branch, &num_ours, - &num_theirs, NULL)) + if (stat_tracking_info(branch, &num_ours, &num_theirs, + NULL, AHEAD_BEHIND_FULL) < 0) return; if (!num_ours && !num_theirs) diff --git a/remote.c b/remote.c index b220f0d..23177f5 100644 --- a/remote.c +++ b/remote.c @@ -1977,16 +1977,23 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid) } /* - * Compare a branch with its upstream, and save their differences (number - * of commits) in *num_ours and *num_theirs. The name of the upstream branch - * (or NULL if no upstream is defined) is returned via *upstream_name, if it - * is not itself NULL. + * Lookup the upstream branch for the given branch and if present, optionally + * compute the commit ahead/behind values for the pair. + * + * If abf is AHEAD_BEHIND_FULL, compute the full ahead/behind and return the + * counts in *num_ours and *num_theirs. If abf is AHEAD_BEHIND_QUICK, skip + * the (potentially expensive) a/b computation (*num_ours and *num_theirs are + * set to zero). + * + * The name of the upstream branch (or NULL if no upstream is defined) is + * returned via *upstream_name, if it is not itself NULL. * * Returns -1 if num_ours and num_theirs could not be filled in (e.g., no - * upstream defined, or ref does not exist), 0 otherwise. + * upstream defined, or ref does not exist). Returns 0 if the commits are + * identical. Returns 1 if commits are different. */ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs, - const char **upstream_name) + const char **upstream_name, enum ahead_behind_flags abf) { struct object_id oid; struct commit *ours, *theirs; @@ -2014,11 +2021,13 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs, if (!ours) return -1; + *num_theirs = *num_ours = 0; + /* are we the same? */ - if (theirs == ours) { - *num_theirs = *num_ours = 0; + if (theirs == ours) return 0; - } + if (abf == AHEAD_BEHIND_QUICK) + return 1; /* Run "rev-list --left-right ours...theirs" internally... */ argv_array_push(&argv, ""); /* ignored */ @@ -2034,8 +2043,6 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs, die("revision walk setup failed"); /* ... and count the commits on each side. 
*/ - *num_ours = 0; - *num_theirs = 0; while (1) { struct commit *c = get_revision(&revs); if (!c) @@ -2051,7 +2058,7 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs, clear_commit_marks(theirs, ALL_REV_FLAGS); argv_array_clear(&argv); - return 0; + return 1; } /* @@ -2064,7 +2071,8 @@ int format_tracking_info(struct branch *branch, struct strbuf *sb) char *base; int upstream_is_gone = 0; - if (stat_tracking_info(branch, &ours, &theirs, &full_base) < 0) { + if (stat_tracking_info(branch, &ours, &theirs, &full_base, + AHEAD_BEHIND_FULL) < 0) { if (!full_base)
[PATCH v4 3/4] status: update short status to respect --no-ahead-behind
From: Jeff Hostetler Teach "git status --short --branch" to respect "--no-ahead-behind" parameter to skip computing ahead/behind counts for the branch and its upstream and just report '[different]'. Short status also respect the "status.aheadBehind" config setting. Signed-off-by: Jeff Hostetler --- t/t6040-tracking-info.sh | 26 ++ wt-status.c | 11 +++ 2 files changed, 33 insertions(+), 4 deletions(-) diff --git a/t/t6040-tracking-info.sh b/t/t6040-tracking-info.sh index 8f17fd9..053dff3 100755 --- a/t/t6040-tracking-info.sh +++ b/t/t6040-tracking-info.sh @@ -147,6 +147,32 @@ test_expect_success 'status -s -b (diverged from upstream)' ' ' cat >expect <<\EOF +## b1...origin/master [different] +EOF + +test_expect_success 'status -s -b --no-ahead-behind (diverged from upstream)' ' + ( + cd test && + git checkout b1 >/dev/null && + git status -s -b --no-ahead-behind | head -1 + ) >actual && + test_i18ncmp expect actual +' + +cat >expect <<\EOF +## b1...origin/master [different] +EOF + +test_expect_success 'status.aheadbehind=false status -s -b (diverged from upstream)' ' + ( + cd test && + git checkout b1 >/dev/null && + git -c status.aheadbehind=false status -s -b | head -1 + ) >actual && + test_i18ncmp expect actual +' + +cat >expect <<\EOF ## b5...brokenbase [gone] EOF diff --git a/wt-status.c b/wt-status.c index 3fcab57..a4d3470 100644 --- a/wt-status.c +++ b/wt-status.c @@ -1766,7 +1766,7 @@ static void wt_shortstatus_print_tracking(struct wt_status *s) const char *base; char *short_base; const char *branch_name; - int num_ours, num_theirs; + int num_ours, num_theirs, sti; int upstream_is_gone = 0; color_fprintf(s->fp, color(WT_STATUS_HEADER, s), "## "); @@ -1792,8 +1792,9 @@ static void wt_shortstatus_print_tracking(struct wt_status *s) color_fprintf(s->fp, branch_color_local, "%s", branch_name); - if (stat_tracking_info(branch, &num_ours, &num_theirs, &base, - AHEAD_BEHIND_FULL) < 0) { + sti = stat_tracking_info(branch, &num_ours, &num_theirs, &base, +s->ahead_behind_flags); + if (sti < 0) { if (!base) goto conclude; @@ -1805,12 +1806,14 @@ static void wt_shortstatus_print_tracking(struct wt_status *s) color_fprintf(s->fp, branch_color_remote, "%s", short_base); free(short_base); - if (!upstream_is_gone && !num_ours && !num_theirs) + if (!upstream_is_gone && !sti) goto conclude; color_fprintf(s->fp, header_color, " ["); if (upstream_is_gone) { color_fprintf(s->fp, header_color, LABEL(N_("gone"))); + } else if (s->ahead_behind_flags == AHEAD_BEHIND_QUICK) { + color_fprintf(s->fp, header_color, LABEL(N_("different"))); } else if (!num_ours) { color_fprintf(s->fp, header_color, LABEL(N_("behind "))); color_fprintf(s->fp, branch_color_remote, "%d", num_theirs); -- 2.9.3
[PATCH v4 2/4] status: add --[no-]ahead-behind to status and commit for V2 format.
From: Jeff Hostetler Teach "git status" and "git commit" to accept "--no-ahead-behind" and "--ahead-behind" arguments to request quick or full ahead/behind reporting. When "--no-ahead-behind" is given, the existing porcelain V2 line "branch.ab x y" is replaced with a new "branch equal eq|neq" line. This indicates that the branch and its upstream are or are not equal without the expense of computing the full ahead/behind values. Added "status.aheadBehind" config setting. This is only used by non-porcelain format for backward-compatibility. Signed-off-by: Jeff Hostetler --- Documentation/config.txt | 6 Documentation/git-status.txt | 5 builtin/commit.c | 18 +++- remote.c | 2 ++ remote.h | 5 ++-- t/t7064-wtstatus-pv2.sh | 69 wt-status.c | 30 +-- wt-status.h | 2 ++ 8 files changed, 125 insertions(+), 12 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 9593bfa..affb0d6 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -3035,6 +3035,12 @@ status.branch:: Set to true to enable --branch by default in linkgit:git-status[1]. The option --no-branch takes precedence over this variable. +status.aheadBehind:: + Set to true to enable --ahead-behind and to false to enable + --no-ahead-behind by default in linkgit:git-status[1] for + non-porcelain formats. This setting is ignored by porcelain + formats for backwards compatibility. + status.displayCommentPrefix:: If set to true, linkgit:git-status[1] will insert a comment prefix before each output line (starting with diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt index 9f3a78a..603bf40 100644 --- a/Documentation/git-status.txt +++ b/Documentation/git-status.txt @@ -111,6 +111,11 @@ configuration variable documented in linkgit:git-config[1]. without options are equivalent to 'always' and 'never' respectively. +--ahead-behind:: +--no-ahead-behind:: + Display or do not display detailed ahead/behind counts for the + branch relative to its upstream branch. Defaults to true. + ...:: See the 'pathspec' entry in linkgit:gitglossary[7]. 
diff --git a/builtin/commit.c b/builtin/commit.c index be370f6..416fe2c 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1109,9 +1109,11 @@ static const char *read_commit_message(const char *name) static struct status_deferred_config { enum wt_status_format status_format; int show_branch; + enum ahead_behind_flags ahead_behind; } status_deferred_config = { STATUS_FORMAT_UNSPECIFIED, - -1 /* unspecified */ + -1, /* unspecified */ + AHEAD_BEHIND_UNSPECIFIED, }; static void finalize_deferred_config(struct wt_status *s) @@ -1137,6 +1139,12 @@ static void finalize_deferred_config(struct wt_status *s) s->show_branch = status_deferred_config.show_branch; if (s->show_branch < 0) s->show_branch = 0; + + if (use_deferred_config && + s->ahead_behind_flags == AHEAD_BEHIND_UNSPECIFIED) + s->ahead_behind_flags = status_deferred_config.ahead_behind; + if (s->ahead_behind_flags == AHEAD_BEHIND_UNSPECIFIED) + s->ahead_behind_flags = AHEAD_BEHIND_FULL; } static int parse_and_validate_options(int argc, const char *argv[], @@ -1297,6 +1305,10 @@ static int git_status_config(const char *k, const char *v, void *cb) status_deferred_config.show_branch = git_config_bool(k, v); return 0; } + if (!strcmp(k, "status.aheadbehind")) { + status_deferred_config.ahead_behind = git_config_bool(k, v); + return 0; + } if (!strcmp(k, "status.showstash")) { s->show_stash = git_config_bool(k, v); return 0; @@ -1351,6 +1363,8 @@ int cmd_status(int argc, const char **argv, const char *prefix) N_("show branch information")), OPT_BOOL(0, "show-stash", &s.show_stash, N_("show stash information")), + OPT_BOOL(0, "ahead-behind", &s.ahead_behind_flags, +N_("compute full ahead/behind values")), { OPTION_CALLBACK, 0, "porcelain", &status_format, N_("version"), N_("machine-readable output"), PARSE_OPT_OPTARG, opt_parse_porcelain }, @@ -1628,6 +1642,8 @@ int cmd_commit(int argc, const char **argv, const char *prefix) OPT_SET_INT(0, "short", &status_format, N_("show status concisely"), STATUS_FORMAT_SHORT), OPT_BOOL(0, "branch", &s.show_branch, N_("show branch information")), + OPT_BOOL(0, "ahead-behind", &s.ahead_behind_flags,
[PATCH v4 4/4] status: support --no-ahead-behind in long format
From: Jeff Hostetler Teach long (normal) status format to respect the --no-ahead-behind parameter and skip the possibly expensive ahead/behind computation between the branch and the upstream. Long status also respects "status.aheadBehind" config setting. Signed-off-by: Jeff Hostetler --- builtin/checkout.c | 2 +- remote.c | 18 +- remote.h | 3 ++- t/t6040-tracking-info.sh | 47 +++ wt-status.c | 2 +- 5 files changed, 64 insertions(+), 8 deletions(-) diff --git a/builtin/checkout.c b/builtin/checkout.c index fc4f8fd..655dac2 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -605,7 +605,7 @@ static void report_tracking(struct branch_info *new) struct strbuf sb = STRBUF_INIT; struct branch *branch = branch_get(new->name); - if (!format_tracking_info(branch, &sb)) + if (!format_tracking_info(branch, &sb, AHEAD_BEHIND_FULL)) return; fputs(sb.buf, stdout); strbuf_release(&sb); diff --git a/remote.c b/remote.c index 2486afb..e668091 100644 --- a/remote.c +++ b/remote.c @@ -2066,15 +2066,16 @@ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs, /* * Return true when there is anything to report, otherwise false. */ -int format_tracking_info(struct branch *branch, struct strbuf *sb) +int format_tracking_info(struct branch *branch, struct strbuf *sb, +enum ahead_behind_flags abf) { - int ours, theirs; + int ours, theirs, sti; const char *full_base; char *base; int upstream_is_gone = 0; - if (stat_tracking_info(branch, &ours, &theirs, &full_base, - AHEAD_BEHIND_FULL) < 0) { + sti = stat_tracking_info(branch, &ours, &theirs, &full_base, abf); + if (sti < 0) { if (!full_base) return 0; upstream_is_gone = 1; @@ -2088,10 +2089,17 @@ int format_tracking_info(struct branch *branch, struct strbuf *sb) if (advice_status_hints) strbuf_addstr(sb, _(" (use \"git branch --unset-upstream\" to fixup)\n")); - } else if (!ours && !theirs) { + } else if (!sti) { strbuf_addf(sb, _("Your branch is up to date with '%s'.\n"), base); + } else if (abf == AHEAD_BEHIND_QUICK) { + strbuf_addf(sb, + _("Your branch and '%s' refer to different commits.\n"), + base); + if (advice_status_hints) + strbuf_addf(sb, _(" (use \"%s\" for details)\n"), + "git status --ahead-behind"); } else if (!theirs) { strbuf_addf(sb, Q_("Your branch is ahead of '%s' by %d commit.\n", diff --git a/remote.h b/remote.h index 27feb63..b2fa5cc 100644 --- a/remote.h +++ b/remote.h @@ -265,7 +265,8 @@ enum ahead_behind_flags { /* Reporting of tracking info */ int stat_tracking_info(struct branch *branch, int *num_ours, int *num_theirs, const char **upstream_name, enum ahead_behind_flags abf); -int format_tracking_info(struct branch *branch, struct strbuf *sb); +int format_tracking_info(struct branch *branch, struct strbuf *sb, +enum ahead_behind_flags abf); struct ref *get_local_heads(void); /* diff --git a/t/t6040-tracking-info.sh b/t/t6040-tracking-info.sh index 053dff3..febf63f 100755 --- a/t/t6040-tracking-info.sh +++ b/t/t6040-tracking-info.sh @@ -173,6 +173,53 @@ test_expect_success 'status.aheadbehind=false status -s -b (diverged from upstre ' cat >expect <<\EOF +On branch b1 +Your branch and 'origin/master' have diverged, +and have 1 and 1 different commits each, respectively. 
+EOF + +test_expect_success 'status --long --branch' ' + ( + cd test && + git checkout b1 >/dev/null && + git status --long -b | head -3 + ) >actual && + test_i18ncmp expect actual +' + +test_expect_success 'status --long --branch' ' + ( + cd test && + git checkout b1 >/dev/null && + git -c status.aheadbehind=true status --long -b | head -3 + ) >actual && + test_i18ncmp expect actual +' + +cat >expect <<\EOF +On branch b1 +Your branch and 'origin/master' refer to different commits. +EOF + +test_expect_success 'status --long --branch --no-ahead-behind' ' + ( + cd test && + git checkout b1 >/dev/null && + git status --long -b --no-ahead-behind | head -2 + ) >actual && + test_i18ncmp expect actual +' + +test_expect_success 'status.aheadbehind=false status --long --branch' ' + ( + cd test && + git checkout b1
Re: rebase preserve-merges: incorrect merge commits
Hi, On Mon, 8 Jan 2018, Matwey V. Kornilov wrote: > 2018-01-08 17:42 GMT+03:00 Matwey V. Kornilov : > > 2018-01-08 16:56 GMT+03:00 Johannes Schindelin : > >> Hi Matwey, > >> > >> On Mon, 8 Jan 2018, Matwey V. Kornilov wrote: > >> > >>> I think that rebase preserve-merges algorithm needs further > >>> improvements. Probably, you already know it. > >> > >> Yes. preserve-merges is a fundamentally flawed design. > >> > >> Please have a look here: > >> > >> https://github.com/git/git/pull/447 > >> > >> Since we are in a feature freeze in preparation for v2.16.0, I will > >> submit these patch series shortly after v2.16.0 is released. > >> > >>> As far as I understand the root cause of this that when new merge > >>> commit is created by rebase it is done simply by git merge > >>> $new_parents without taking into account any actual state of the > >>> initial merge commit. > >> > >> Indeed. preserve-merges does not allow commits to be reordered. (Actually, > >> it *does* allow it, but then fails to handle it correctly.) We even have > >> test cases that mark this as "known breakage". > >> > >> But really, I do not think it is worth trying to fix the broken design. > >> Better to go with the new recreate-merges. (I am biased, of course, > >> because I invented recreate-merges. But then, I also invented > >> preserve-merges, so ...) > > > > Well. I just checked --recreate-merges=no-rebase-cousins from the PR > > and found that it produces the same wrong result in my test example. > > The topology is reproduced correctly, but merge-commit content is > > broken. > > I did git rebase --recreate-merges=no-rebase-cousins --onto abc-0.1 v0.1 > > abc-0.2 > > Indeed, exactly as you still say in the documentation: "Merge conflict > resolutions or manual amendments to merge commits are not preserved." > My initial point is that they have to be preserved. Probably in > recreate-merges, if preserve-merges is discontinued. Ah, but that is consistent with how non-merge-preserving rebase works: the `pick` commands *also* do not record merge conflict resolution... Ciao, Johannes
git-p4 + watchman - watching the p4 repo?
Hi! I could be wrong about this, but when I tried mixing watchman with git-p4, I found that on "git p4 submit" it ended up watching the p4 repo, which seems a bit pointless (and was also very slow). $ [create git-p4 clone of some p4 repo] $ : >bar $ git add bar && git commit -m 'adding bar' $ git p4 submit --origin HEAD^ --shelve Perforce checkout for depot path //depot/ located at /tmp/p4/cli/ Synchronizing p4 checkout... ... - file(s) up-to-date. Applying 4ce4057 change //depot/bar#1 - opened for edit Adding '/tmp/p4/cli' to watchman's watch list. Is there any way to stop it doing this? Thanks! Luke
[PATCH] travis-ci: build Git during the 'script' phase
Ever since we started building and testing Git on Travis CI (522354d70 (Add Travis CI support, 2015-11-27)), we build Git in the 'before_script' phase and run the test suite in the 'script' phase (except in the later introduced 32 bit Linux and Windows build jobs, where we build in the 'script' phase'). Contrarily, the Travis CI practice is to build and test in the 'script' phase; indeed Travis CI's default build command for the 'script' phase of C/C++ projects is: ./configure && make && make test The reason why Travis CI does it this way and why it's a better approach than ours lies in how unsuccessful build jobs are categorized. After something went wrong in a build job, its state can be: - 'failed', if a command in the 'script' phase returned an error. This is indicated by a red 'X' on the Travis CI web interface. - 'errored', if a command in the 'before_install', 'install', or 'before_script' phase returned an error, or the build job exceeded the time limit. This is shown as a red '!' on the web interface. This makes it easier, both for humans looking at the Travis CI web interface and for automated tools querying the Travis CI API, to decide when an unsuccessful build is our responsibility requiring human attention, i.e. when a build job 'failed' because of a compiler error or a test failure, and when it's caused by something beyond our control and might be fixed by restarting the build job, e.g. when a build job 'errored' because a dependency couldn't be installed due to a temporary network error or because the OSX build job exceeded its time limit. The drawback of building Git in the 'before_script' phase is that one has to check the trace log of all 'errored' build jobs, too, to see what caused the error, as it might have been caused by a compiler error. This requires additional clicks and page loads on the web interface and additional complexity and API requests in automated tools. Therefore, move building Git from the 'before_script' phase to the 'script' phase, updating the script's name accordingly as well. 'ci/run-builds.sh' now becomes basically empty, remove it. Several of our build job configurations override our default 'before_script' to do nothing; with this change our default 'before_script' won't do anything, either, so remove those overriding directives as well. Signed-off-by: SZEDER Gábor --- A verbose commit message for such a change... but I don't know why we started with building Git in the 'before_script' phase. 522354d70 doesn't tell, and I couldn't find anything relevant in the mailing list archives. Whatever the reasons might have been, I think the above justifies the change. Should go on top of 'sg/travis-check-untracked' in 'next'. 
.travis.yml | 7 +-- ci/{run-tests.sh => run-build-and-tests.sh} | 4 +++- ci/run-build.sh | 8 3 files changed, 4 insertions(+), 15 deletions(-) rename ci/{run-tests.sh => run-build-and-tests.sh} (80%) delete mode 100755 ci/run-build.sh diff --git a/.travis.yml b/.travis.yml index 4684b3f4f..5f5ee4f3b 100644 --- a/.travis.yml +++ b/.travis.yml @@ -33,7 +33,6 @@ matrix: compiler: addons: before_install: - before_script: script: - > test "$TRAVIS_REPO_SLUG" != "git/git" || @@ -46,7 +45,6 @@ matrix: services: - docker before_install: - before_script: script: ci/run-linux32-docker.sh - env: jobname=StaticAnalysis os: linux @@ -56,7 +54,6 @@ matrix: packages: - coccinelle before_install: - before_script: script: ci/run-static-analysis.sh after_failure: - env: jobname=Documentation @@ -68,13 +65,11 @@ matrix: - asciidoc - xmlto before_install: - before_script: script: ci/test-documentation.sh after_failure: before_install: ci/install-dependencies.sh -before_script: ci/run-build.sh -script: ci/run-tests.sh +script: ci/run-build-and-tests.sh after_failure: ci/print-test-failures.sh notifications: diff --git a/ci/run-tests.sh b/ci/run-build-and-tests.sh similarity index 80% rename from ci/run-tests.sh rename to ci/run-build-and-tests.sh index 22355f009..d3a094603 100755 --- a/ci/run-tests.sh +++ b/ci/run-build-and-tests.sh @@ -1,11 +1,13 @@ #!/bin/sh # -# Test Git +# Build and test Git # . ${0%/*}/lib-travisci.sh ln -s $HOME/travis-cache/.prove t/.prove + +make --jobs=2 make --quiet test check_unignored_build_artifacts diff --git a/ci/run-build.sh b/ci/run-build.sh deleted file mode 100755 index 4f940d103..0 --- a/ci/run-build.sh +++ /dev/null @@ -1,8 +0,0 @@ -#!/bin/sh -# -# Build Git -# - -. ${0%/*}/lib-travisci.sh - -make --jobs=2 -- 2.16.0.rc1.67.g706959270
Re: rebase preserve-merges: incorrect merge commits
2018-01-08 19:32 GMT+03:00 Johannes Schindelin : > Hi, > > On Mon, 8 Jan 2018, Matwey V. Kornilov wrote: > >> 2018-01-08 17:42 GMT+03:00 Matwey V. Kornilov : >> > 2018-01-08 16:56 GMT+03:00 Johannes Schindelin >> > : >> >> Hi Matwey, >> >> >> >> On Mon, 8 Jan 2018, Matwey V. Kornilov wrote: >> >> >> >>> I think that rebase preserve-merges algorithm needs further >> >>> improvements. Probably, you already know it. >> >> >> >> Yes. preserve-merges is a fundamentally flawed design. >> >> >> >> Please have a look here: >> >> >> >> https://github.com/git/git/pull/447 >> >> >> >> Since we are in a feature freeze in preparation for v2.16.0, I will >> >> submit these patch series shortly after v2.16.0 is released. >> >> >> >>> As far as I understand the root cause of this that when new merge >> >>> commit is created by rebase it is done simply by git merge >> >>> $new_parents without taking into account any actual state of the >> >>> initial merge commit. >> >> >> >> Indeed. preserve-merges does not allow commits to be reordered. (Actually, >> >> it *does* allow it, but then fails to handle it correctly.) We even have >> >> test cases that mark this as "known breakage". >> >> >> >> But really, I do not think it is worth trying to fix the broken design. >> >> Better to go with the new recreate-merges. (I am biased, of course, >> >> because I invented recreate-merges. But then, I also invented >> >> preserve-merges, so ...) >> > >> > Well. I just checked --recreate-merges=no-rebase-cousins from the PR >> > and found that it produces the same wrong result in my test example. >> > The topology is reproduced correctly, but merge-commit content is >> > broken. >> > I did git rebase --recreate-merges=no-rebase-cousins --onto abc-0.1 v0.1 >> > abc-0.2 >> >> Indeed, exactly as you still say in the documentation: "Merge conflict >> resolutions or manual amendments to merge commits are not preserved." >> My initial point is that they have to be preserved. Probably in >> recreate-merges, if preserve-merges is discontinued. > > Ah, but that is consistent with how non-merge-preserving rebase works: the > `pick` commands *also* do not record merge conflict resolution... > I am sorry, didn't get it. When I do non-merge-preserving rebase --interactive there is no way to `pick' merge-commit at all. > Ciao, > Johannes -- With best regards, Matwey V. Kornilov
RE: git-p4 + watchman - watching the p4 repo?
I haven't used perforce so am unfamiliar with any behaviors specific to that but the logic to have git automatically tell watchman to start watching repos is just a convenience feature. Feel free to remove/disable/modify it in the fsmonitor-watchman integration script: if ($retry > 0 and $o->{error} and $o->{error} =~ m/unable to resolve root .* directory (.*) is not watched/) { print STDERR "Adding '$git_work_tree' to watchman's watch list.\n"; $retry--; qx/watchman watch "$git_work_tree"/; Ben > -Original Message- > From: Luke Diamand [mailto:l...@diamand.org] > Sent: Monday, January 8, 2018 12:15 PM > To: Git Users > Cc: Alex Vandiver ; Ben Peart > > Subject: git-p4 + watchman - watching the p4 repo? > > Hi! > > I could be wrong about this, but I when I tried mixing watchman with git-p4, I > found that on "git p4 submit" it ended up watching the p4 repo, which seems > a bit pointless (and was also very slow). > > $ [create git-p4 clone of some p4 repo] > $ : >bar > $ git add bar && git commit -m 'adding bar' > $ git p4 submit --origin HEAD^ --shelve > Perforce checkout for depot path //depot/ located at /tmp/p4/cli/ > Synchronizing p4 checkout... > ... - file(s) up-to-date. > Applying 4ce4057 change > //depot/bar#1 - opened for edit > Adding '/tmp/p4/cli' to watchman's watch list. > > Is there any way to stop it doing this? > > Thanks! > Luke
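Until the hook grows a knob for this, the stray watch can also simply be dropped again with watchman's own CLI (the path below is the one from Luke's transcript):

# see what watchman is watching right now ...
watchman watch-list

# ... and stop watching the perforce client workspace
watchman watch-del /tmp/p4/cli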
RE: Request for Assist on Limits for Tests
On January 7, 2018 4:18 PM, brian m. Carlson wrote: > On Sun, Jan 07, 2018 at 03:57:59PM -0500, Randall S. Becker wrote: > > I'm looking for a proper (i.e. not sneaky) way to detect the platform > > I am on during testing so that some tests can be modified/skipped > > other than using the standard set of dependencies. In particular, the > > maximum path on current NonStop platforms is 8-bit 2048 bytes. It > > appears that there are some tests - at least from my preliminary > > "guessing" - that are beyond that limit once all of the path segments > > are put together. I would rather have something in git that specifies > > a path size limit so nothing exceeds it, but that may be wishing. > > The way we usually skip tests automatically is with a test prerequisite. > You might look at t/test-lib.sh for the test_set_prereq and test_lazy_prereq > calls and synthesize one (maybe LONG_PATHS) that meets your needs. You > can then annotate those tests with the appropriate prerequisite. > > I expect that for long paths, you will hit a lot of the same issues as occur on > Windows, where PATH_MAX may be very small. It might be valuable to > expose this information as a build option and then set an appropriate > variable in t/test-lib.sh. Where I am, at this point: I have PATH_MAX defined in Makefile as optional, and it can be specified as a number in config.mak.uname. If provided, it adds -DPATH_MAX to BASIC_CFLAGS, which will ensure consistency with limits.h (if the values are different, at least c99 warns about it). I've also got it into GIT-BUILD-OPTIONS, if defined. From there it seems straight-forward to use it in test scripts using standard shell scripting, however, I can't find a good model/function for what would be a prerequisite check consistent with existing git test methods - you know, clarity. One approach I have been pursuing is to use test_set_prereq if PATH_MAX is defined, and add a new method like test_missing_prereq_eval that would take PATH_MAX and an expression, like -le 2048, to cause a test to be skipped if the variable is defined but the evaluation fails. I'm still noodling through trying to make that work, and if anyone has a better idea (please have a better idea!!), please please suggest it. Cheers, Randall -- Brief whoami: NonStop&UNIX developer since approximately UNIX(421664400)/NonStop(2112884442) -- In my real life, I talk too much.
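For what it's worth, the existing lazy-prereq machinery already covers this shape of check; a sketch, assuming PATH_MAX really does arrive via GIT-BUILD-OPTIONS as described above (the prerequisite name and the 2048 threshold are only examples):

# treat "PATH_MAX not defined" as "no interesting limit on this platform"
test_lazy_prereq LONG_PATHS '
	test -z "$PATH_MAX" || test "$PATH_MAX" -gt 2048
'

# then the affected tests are simply annotated with the prerequisite
test_expect_success LONG_PATHS 'setup deeply nested directories' '
	... existing test body ...
'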
Re: [PATCH v3 0/7] convert: add support for different encodings
On Mon, Jan 08, 2018 at 03:38:48PM +0100, Lars Schneider wrote: [] > > Some comments: > > > > I would like to have the CRLF conversion a little bit more strict - > > many users tend to set core.autocrlf=true or write "* text=auto" > > in the .gitattributes. > > Reading all the effort about BOM markers and UTF-16LE, I think there > > should ne some effort to make the line endings round trip. > > Therefore I changed convert.c to demand that the "text" attribute > > is set to enable CRLF conversions. > > (If I had submitted the patch, I would have demanded > > "text eol=lf" or "text eol=crlf", but the test case t0028 indicates > > that there is a demand to produce line endings as configured in core.eol) > > But wouldn't that be inconvenient for the users? E.g. if I add a UTF-16 > file on Windows with CRLF then it would be nice if Git would automatically > convert the line endings to LF on Linux, no? > > IOW: Why should we handle text files that have a defined checkout-encoding > differently compared to UTF-8 encoded text files? Wouldn't that be unexpected > to the user? > > Thanks, > Lars The problem is when user A has core.autocrlf=true and user B has core.autocrlf=false: the line endings then don't show up as expected for user B. Having said that, in all shortness, you convinced me: If text=auto, we care about line endings. If text, we care about line endings. If the .gitattributes don't say anything about text, we don't convert eol. (In other words: we don't look at core.autocrlf when checkout-encoding is defined.) A new branch is pushed to github/tboegi
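Spelled out as attributes, the stricter rule comes down to roughly this (still using the v3 "checkout-encoding" spelling; whether it becomes "working-tree-encoding" is the separate naming question above):

# re-encode to/from UTF-16 only; line endings untouched because "text" is unset
echo '*.txt checkout-encoding=UTF-16' >>.gitattributes

# re-encode and normalize: LF in the repository, CRLF in the working tree
echo '*.vcproj checkout-encoding=UTF-16LE text eol=crlf' >>.gitattributes

# ordinary UTF-8 text keeps the usual text=auto / core.autocrlf behaviour
echo '*.c text=auto' >>.gitattributes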
Re: [PATCH 1/8] Doc/gitsubmodules: split a sentence for better readability
On Sat, Jan 6, 2018 at 4:29 PM, Eric Sunshine wrote: > On Sat, Jan 6, 2018 at 1:46 PM, Kaartic Sivaraam > wrote: >> Signed-off-by: Kaartic Sivaraam >> --- >> diff --git a/Documentation/gitsubmodules.txt >> b/Documentation/gitsubmodules.txt >> @@ -36,8 +36,8 @@ The `gitlink` entry contains the object name of the commit >> that the >> The section `submodule.foo.*` in the `.gitmodules` file gives additional >> -hints to Gits porcelain layer such as where to obtain the submodule via >> -the `submodule.foo.url` setting. >> +hints to Gits porcelain layer. For example, the `submodule.foo.url` >> +setting specifies where to obtain the submodule. > > I don't find the original difficult to read (aside, perhaps, from the > missing comma before "such as"), so I don't feel strongly about this > change. Seconded. I am neutral to this change, but as you were keen enough to come up with the patch, I see no reason to reject it. Anyway, let's read on! Thanks, Stefan > > However, since you're touching this, you could apply the s/Gits/Git's/ fix.
Re: [PATCH 2/8] Doc/gitsubmodules: clearly specify advantage of submodule
On Sat, Jan 6, 2018 at 10:46 AM, Kaartic Sivaraam wrote: > Signed-off-by: Kaartic Sivaraam > --- > Documentation/gitsubmodules.txt | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt > index bf46b0fb5..cb795c6b6 100644 > --- a/Documentation/gitsubmodules.txt > +++ b/Documentation/gitsubmodules.txt > @@ -57,7 +57,7 @@ Submodules can be used for at least two different use cases: > * Size of the git repository: >In its current form Git scales up poorly for large repositories > containing >content that is not compressed by delta computation between trees. > - However you can also use submodules to e.g. hold large binary assets > + Therefore you can use submodules to hold large binary assets If this improves readability by a lot, I'd be all for it. But this use case is just exemplary. There are also cases of submodules that do not contain big files, but e.g. have a lengthy history with lots of small files. So I don't know, as I would want to keep emphasized that this is just an example. >and these repositories are then shallowly cloned such that you do not >have a large history locally. > * Transfer size: > -- > 2.16.0.rc0.223.g4a4ac8367 >
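For what it's worth, the shallow-clone side of this already has direct command-line support (the URL and the submodule path below are made up):

# clone a superproject while keeping every submodule's history shallow
git clone --recurse-submodules --shallow-submodules https://example.com/super.git

# or, in an existing clone, fetch only the tip of one large-assets submodule
git submodule update --init --depth 1 assets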
Re: [PATCH 3/8] Doc/gitsubmodules: specify how submodules help in reduced size
On Sat, Jan 6, 2018 at 10:46 AM, Kaartic Sivaraam wrote: > Signed-off-by: Kaartic Sivaraam > --- > Documentation/gitsubmodules.txt | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt > index cb795c6b6..3f73983d5 100644 > --- a/Documentation/gitsubmodules.txt > +++ b/Documentation/gitsubmodules.txt > @@ -63,6 +63,9 @@ Submodules can be used for at least two different use cases: > * Transfer size: >In its current form Git requires the whole working tree present. It >does not allow partial trees to be transferred in fetch or clone. > + If you have your project as multiple repositories tied together as > + submodules in a superproject, you can avoid fetching the working > + trees of the repositories you are not interested in. You do not fetch a working tree, but a whole repository? > * Access control: >By restricting user access to submodules, this can be used to implement >read/write policies for different users. > -- > 2.16.0.rc0.223.g4a4ac8367 >
Re: [PATCH 4/8] Doc/gitsubmodules: avoid abbreviations
On Sat, Jan 6, 2018 at 10:46 AM, Kaartic Sivaraam wrote: > Signed-off-by: Kaartic Sivaraam > --- > Documentation/gitsubmodules.txt | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt > index 3f73983d5..e3c798d2a 100644 > --- a/Documentation/gitsubmodules.txt > +++ b/Documentation/gitsubmodules.txt > @@ -76,9 +76,9 @@ The configuration of submodules > Submodule operations can be configured using the following mechanisms > (from highest to lowest precedence): > > - * The command line for those commands that support taking submodule specs. ++ The command line for those commands that support taking submodules as part of their pathspecs[1]. ++ ++[1] pathspec is an official term according to `man gitglossary`. Maybe? > - Most commands have a boolean flag '--recurse-submodules' whether to > - recurse into submodules. Examples are `ls-files` or `checkout`. > + * The command line for those commands that support taking submodule > + specifications. Most commands have a boolean flag '--recurse-submodules > + whether to recurse into submodules. Examples are `ls-files` or `checkout`. > Some commands take enums, such as `fetch` and `push`, where you can > specify how submodules are affected. > > -- > 2.16.0.rc0.223.g4a4ac8367 >
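To make the boolean-flag vs. enum distinction concrete, the commands the paragraph mentions look roughly like this in use (the branch name is made up):

# boolean --recurse-submodules on commands that walk the tree
git ls-files --recurse-submodules
git checkout --recurse-submodules topic

# fetch and push take an enum rather than a plain on/off switch
git fetch --recurse-submodules=on-demand
git push --recurse-submodules=check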
Re: [PATCH v3 3/3] send-email: add test for Linux's get_maintainer.pl
Matthieu Moy writes: > From: Alex Bennée > > We had a regression that broke Linux's get_maintainer.pl. Using > Mail::Address to parse email addresses fixed it, but let's protect > against future regressions. > > Note that we need --cc-cmd to be relative because this option doesn't > accept spaces in script names (probably to allow --cc-cmd="executable > --option"), while --smtp-server needs to be absolute. > > Patch-edited-by: Matthieu Moy > Signed-off-by: Alex Bennée > Signed-off-by: Matthieu Moy > --- > Change since v2: > > * Mention relative Vs absolute path in commit message. > > * Remove useless "chmod +x" > > * Remove useless double quotes > > t/t9001-send-email.sh | 19 +++ > 1 file changed, 19 insertions(+) Thanks, both. "while --smtp-server needs to be ..." gave a "Huh?" for somebody who weren't familiar with the discussion that led to the addition of that note (i.e. "unlike a near-by test that passes a full-path $(pwd)/fake.endmail to --smtp-server" would have helped); it is not a big deal, though. Let's merge this to 'next' and try to merge in the first batch post the release. Thanks. > diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh > index 4d261c2..a06e5d7 100755 > --- a/t/t9001-send-email.sh > +++ b/t/t9001-send-email.sh > @@ -172,6 +172,25 @@ test_expect_success $PREREQ 'cc trailer with various > syntax' ' > test_cmp expected-cc commandline1 > ' > > +test_expect_success $PREREQ 'setup fake get_maintainer.pl script for cc > trailer' " > + write_script expected-cc-script.sh <<-EOF > + echo 'One Person (supporter:THIS (FOO/bar))' > + echo 'Two Person (maintainer:THIS THING)' > + echo 'Third List (moderated list:THIS THING > (FOO/bar))' > + echo ' (moderated list:FOR THING)' > + echo 'f...@example.com (open list:FOR THING (FOO/bar))' > + echo 's...@example.com (open list)' > + EOF > +" > + > +test_expect_success $PREREQ 'cc trailer with get_maintainer.pl output' ' > + clean_fake_sendmail && > + git send-email -1 --to=recipi...@example.com \ > + --cc-cmd=./expected-cc-script.sh \ > + --smtp-server="$(pwd)/fake.sendmail" && > + test_cmp expected-cc commandline1 > +' > + > test_expect_success $PREREQ 'setup expect' " > cat >expected-show-all-headers <<\EOF > 0001-Second.patch
Re: [PATCH 5/8] Doc/gitsubmodules: use "Git directory" consistently
On Sat, Jan 6, 2018 at 4:39 PM, Eric Sunshine wrote: > On Sat, Jan 6, 2018 at 1:46 PM, Kaartic Sivaraam > wrote: >> Signed-off-by: Kaartic Sivaraam >> --- >> diff --git a/Documentation/gitsubmodules.txt >> b/Documentation/gitsubmodules.txt >> @@ -113,7 +113,7 @@ obtain the submodule from is configured here for example. >> This file mainly serves as the mapping between name and path in >> -the superproject, such that the submodule's git directory can be >> +the superproject, such that the submodule's Git directory can be >> located. > > There are two more instances of this capitalization inconsistency > later in the file. This patch probably ought to address all of them. Thanks for fixing the capitalization!
Re: [PATCH 6/8] Doc/gitsubmodules: improve readability of certain lines
On Sat, Jan 6, 2018 at 10:46 AM, Kaartic Sivaraam wrote: > Signed-off-by: Kaartic Sivaraam > --- > Documentation/gitsubmodules.txt | 19 ++- > 1 file changed, 10 insertions(+), 9 deletions(-) > > diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt > index 745a3838e..339fb73db 100644 > --- a/Documentation/gitsubmodules.txt > +++ b/Documentation/gitsubmodules.txt > @@ -76,9 +76,10 @@ The configuration of submodules > Submodule operations can be configured using the following mechanisms > (from highest to lowest precedence): > > - * The command line for those commands that support taking submodule > - specifications. Most commands have a boolean flag '--recurse-submodules > - whether to recurse into submodules. Examples are `ls-files` or `checkout`. > + * The command line arguments of those commands that support taking submodule > + specifications. Most commands have a boolean flag '--recurse-submodules' > + which specify whether they should recurse into submodules. Examples are > + `ls-files` or `checkout`. > Some commands take enums, such as `fetch` and `push`, where you can > specify how submodules are affected. > > @@ -90,8 +91,8 @@ Submodule operations can be configured using the following > mechanisms > For example an effect from the submodule's `.gitignore` file > would be observed when you run `git status --ignore-submodules=none` in > the superproject. This collects information from the submodule's working > -directory by running `status` in the submodule, which does pay attention > -to its `.gitignore` file. > +directory by running `status` in the submodule while paying attention > +to the `.gitignore` file of the submodule. Both are grammatically correct and expressive, thanks! > + Extra spurious line? > The submodule's `$GIT_DIR/config` file would come into play when running > `git push --recurse-submodules=check` in the superproject, as this would > @@ -107,13 +108,13 @@ If the submodule is not yet initialized, then the > configuration > inside the submodule does not exist yet, so configuration where to > obtain the submodule from is configured here for example. > > - * the `.gitmodules` file inside the superproject. Additionally to the > - required mapping between submodule's name and path, a project usually > + * The `.gitmodules` file inside the superproject. Additionally, if mapping > + is required between a submodule's name and its path, a project usually This changes meaning, originally it tries to say: * it requires mapping path <-> names. * but there can be more. whereas the new lines are: * mapping is optional * there can be more. > uses this file to suggest defaults for the upstream collection > of repositories. > + > -This file mainly serves as the mapping between name and path in > -the superproject, such that the submodule's Git directory can be > +This file mainly serves as the mapping between the name and path of > submodules > +in the superproject, such that the submodule's Git directory can be > located. makes sense! Thanks, Stefan > + > If the submodule has never been initialized, this is the only place > -- > 2.16.0.rc0.223.g4a4ac8367 >
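As a concrete picture of that name<->path mapping (submodule name and URL are made up), the entries the superproject carries in `.gitmodules` boil down to:

# what "git submodule add" records: the name->path mapping plus a suggested URL
git config -f .gitmodules submodule.mylib.path lib/mylib
git config -f .gitmodules submodule.mylib.url https://example.com/mylib.git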
Re: [PATCH v4 8/7] wildmatch test: skip file creation tests on Windows proper
Ævar Arnfjörð Bjarmason writes: > I agree that there should be some prerequisite to skip this on Windows > by default since 6 minutes is clearly excessive (although I think you'll > find it runs a bit faster in the branch above), but that should be > something like: > > test_lazy_prereq EXPENSIVE_ON_WINDOWS ' > test -n "$GIT_TEST_LONG" || test_have_prereq !MINGW,!CYGWIN > ' > > As the long runtime is not inherent to the test, but to excessive > slowness caused by certain operations on certain platforms, or maybe it > should be EXPENSIVE_ON_SLOW_FS or EXPENSIVE_IF_FORKING_IS_SLOW or if > we'd like to generalize it. We already do skip overly long tests everywhere unless explicitly asked to run them, and the above sounds like a good way to go.
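i.e. with such a prerequisite in place, the expensive tests would simply be annotated with it; the test below is only a stand-in for the real wildmatch file-creation tests, and $file_under_test is an invented name:

test_expect_success EXPENSIVE_ON_WINDOWS 'create file for wildmatch test' '
	>"$file_under_test"
'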
Re: [PATCH] upload-pack: fix some sparse warnings
On 01/05, Ramsay Jones wrote: > > Signed-off-by: Ramsay Jones > --- > > Hi Brandon, > > If you need to re-roll your 'bw/protocol-v2' branch, could you please > squash this (or something like it) into the relevant patches. The first > hunk would go in commit 6ec1105192, "upload-pack: convert to a builtin", > 2018-01-02), whereas the second hunk would go to commit b3f3749a24, > "upload-pack: factor out processing lines", 2018-01-02). Thanks for finding these, I'll make sure I include them in the relevant commits. > > The sparse warnings were: > >SP upload-pack.c >upload-pack.c:783:43: error: incompatible types for operation (<=) >upload-pack.c:783:43:left side has type int *depth >upload-pack.c:783:43:right side has type int >upload-pack.c:783:43: error: incorrect type in conditional >upload-pack.c:783:43:got bad type >upload-pack.c:1389:5: warning: symbol 'cmd_upload_pack' was not declared. > Should it be static? > > [Note that the line numbers are off-by-one.] > > I note, in passing, that strtol() returns a 'long int' but *depth is > an 'int' (yes, the original code was like that too ;-) ). > > Should cmd_upload_pack(), along with the #include "builtin.h", be moved > to builtin/upload-pack.c? Junio mentioned something similar when I sent out the WIP series, I must have forgotten to do that before sending this out. I'll make that change :) > > Also, I note that packet_read_with_status(), introduced in commit 4570421c3, > is not called outside of pkt-line.c; does this symbol need to be extern? I thought it might, but you're right it doesn't look like it needs to. I'll change that so its not exported. > > Thanks! > > ATB, > Ramsay Jones > > > upload-pack.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/upload-pack.c b/upload-pack.c > index 8002f1f20..6271245e2 100644 > --- a/upload-pack.c > +++ b/upload-pack.c > @@ -1,4 +1,5 @@ > #include "cache.h" > +#include "builtin.h" > #include "config.h" > #include "refs.h" > #include "pkt-line.h" > @@ -780,7 +781,7 @@ static int process_deepen(const char *line, int *depth) > if (skip_prefix(line, "deepen ", &arg)) { > char *end = NULL; > *depth = strtol(arg, &end, 0); > - if (!end || *end || depth <= 0) > + if (!end || *end || *depth <= 0) > die("Invalid deepen: %s", line); > return 1; > } > -- > 2.15.0 -- Brandon Williams
Re: [PATCH 7/8] Doc/git-submodule: improve readability and grammar of a sentence
On Sat, Jan 6, 2018 at 10:46 AM, Kaartic Sivaraam wrote: > Signed-off-by: Kaartic Sivaraam > --- > Documentation/git-submodule.txt | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt > index ff612001d..befbccde6 100644 > --- a/Documentation/git-submodule.txt > +++ b/Documentation/git-submodule.txt > @@ -132,9 +132,9 @@ expects by cloning missing submodules and updating the > working tree of > the submodules. The "updating" can be done in several ways depending > on command line options and the value of `submodule..update` > configuration variable. The command line option takes precedence over > -the configuration variable. if neither is given, a checkout is performed. > -update procedures supported both from the command line as well as setting > -`submodule..update`: > +the configuration variable. If neither is given, a checkout is performed. > +The update procedures supported both from the command line as well as > +through the `submodule..update` configuration are: Makes sense! Thanks, Stefan > > checkout;; the commit recorded in the superproject will be > checked out in the submodule on a detached HEAD. > -- > 2.16.0.rc0.223.g4a4ac8367 >
Re: [PATCH 0/8] Doc/submodules: a few updates
On Sat, Jan 6, 2018 at 10:46 AM, Kaartic Sivaraam wrote: > These are just a few improvements that I thought would make the documentation > related to submodules a little better in various way such as readability, > consistency etc., These were things I noticed while reading thise documents. > > Sorry, for the highly granular patches. I did the commits as and when I was > reading them and tried to keep them focused to one particular change by > rebasing > them as needed. In case they need some change, let me know. While small patches are really appreciated for code (bisect, automated testing, and the general difficulty to reason about code, as a very small change may affect the whole code base), I am not sure if they benefit in documentation. Documentation is a rather local human readable thing, so by changing one sentence we don't affect the understanding of documentation at a completely unrelated place. Also it helps to read more than just sentence fragments, i.e. I tried looking at the whole paragraph for review. May I suggest to squash them all and resend as one patch? > > I based these patches on top of 'master'. I am not aware of other submodule patches affecting documentation in master..pu, so this should be easy to merge. > > Apart from the changes, I saw a few things that needed > improvement/clarification > but wasn't able to do that myself due to my limited knowledge of submodules. > They > are listed below. I'll add in patches for them if they are correctly > clarified. > > > 1. > > man gitsubmodules > >· The configuration file $GIT_DIR/config in the superproject. > Typical configuration at this place is controlling if a submodule is >recursed into at all via the active flag for example. > >If the submodule is not yet initialized, then the configuration > inside the submodule does not exist yet, so configuration where to >obtain the submodule from is configured here for example. > > What's the "active flag" mentioned above? Also I find the phrase "is recursed > into at all" > to be a little slippery. How could it be improved? There are multiple ways to indicate if a submodule is "active", i.e. if Git is supposed to pay attention. Historically we had to set the submodule..url flag in the config, but last year Brandon added submodule.active as well as submodule..active which supersede the .url flag. (See is_submodule_active() in submodule.c to see the definitive answer to "should Git pay attention?") https://github.com/git/git/blob/master/submodule.c#L224 I wonder if this indicates a lack of documentation when the active flags were introduced. They are found in 'man git config', but maybe we need to spell them out explicitly in the submodule related docs. > 2. > > man git submodule > >update >... > >checkout > > >If --force is specified, the submodule will be checked out > (using git checkout --force if appropriate), even if the commit >specified in the index of the containing repository already > matches the commit checked out in the submodule. > > I'm not sure this is conveying all the information it should be conveying. > It seems to making the user wonder, "How at all does 'git submodule update > --force' > differs from 'git submodule update'?" also "using git checkout --force if > appropriate" > seems to be invoking all sorts confusion as "appropriate" is superfluous. When "submodule update" is invoked with the `--force` flag, that flag is passed on to the 'checkout' operation. If you do not give the --force, then the checkout will also be done without --force. 
> > How could these confusions be clarified? I tried giving an alternative snippet above; not sure how else to put it.
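To make that precedence concrete, here is a rough sketch of the decision order. This is not the actual is_submodule_active() code from submodule.c; lookup_bool(), lookup_string() and pathspec_match() are hypothetical stand-ins for git's real config and pathspec machinery:

#include <stdio.h>

/* Hypothetical helpers; the real code goes through git's config API. */
extern int lookup_bool(const char *key, int *value);   /* returns 0 if the key is set */
extern const char *lookup_string(const char *key);     /* NULL if the key is unset */
extern int pathspec_match(const char *pathspec, const char *path);

int submodule_active_sketch(const char *name, const char *path)
{
        char key[256];
        int flag;
        const char *pathspec;

        /* 1. An explicit submodule.<name>.active setting is the definitive answer. */
        snprintf(key, sizeof(key), "submodule.%s.active", name);
        if (!lookup_bool(key, &flag))
                return flag;

        /* 2. Otherwise the submodule.active pathspecs decide. */
        pathspec = lookup_string("submodule.active");
        if (pathspec)
                return pathspec_match(pathspec, path);

        /* 3. Historical fallback: a configured submodule.<name>.url means "pay attention". */
        snprintf(key, sizeof(key), "submodule.%s.url", name);
        return lookup_string(key) != NULL;
}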
Re: [PATCHv3 1/4] color.h: document and modernize header
Stefan Beller writes: > /* > - * Set the color buffer (which must be COLOR_MAXLEN bytes) > - * to the raw color bytes; this is useful for initializing > - * default color variables. > + * NEEDSWWORK: document this function or refactor grep.c to stop using this > + * function. > */ > -void color_set(char *dst, const char *color_bytes); > +extern void color_set(char *dst, const char *color_bytes); The original that is removed by the patch documents the function well enough; as long as the NEEDSWORK comment is followed through in a later step in the series, it's alright, though ;-) > -int git_config_colorbool(const char *var, const char *value); > -int want_color(int var); > -int color_parse(const char *value, char *dst); > -int color_parse_mem(const char *value, int len, char *dst); > +/* > + * Parse a config option, which can be a boolean or one of > + * "never", "auto", "always". Return a constant of > + * GIT_COLOR_NEVER for "never" or negative boolean, > + * GIT_COLOR_ALWAYS for "always" or a positive boolean, > + * and GIT_COLOR_AUTO for "auto". > + */ > +extern int git_config_colorbool(const char *var, const char *value); "never" and "always" not being part of usual boolean vocabulary makes it a bit awkward to explain. > +/* > + * Output the formatted string in the specified color (and then reset to > normal > + * color so subsequent output is uncolored). Omits the color encapsulation if > + * `color` is NULL. The `color_fprintf_ln` prints a new line after resetting > + * the color. The `color_print_strbuf` prints the given pre-formatted strbuf > + * instead, up to its first NUL character. > + */ Obviously, it does not have to be part of this step nor series, but the above observation makes us realize that color_print_strbuf() would probably be an unreasonably narrow interface. It is not too much to ask the caller to dereference and pass only the .buf component of the strbuf to an alternative helper that takes "const char *" and by doing so would allow us to let other callers that do not have a strbuf but just a plain string use it, too. Looks good.
Re: GSoC 2018 Org applications. Deadline = January 23, 2018 at 18:00 (CET)
"Christian Couder" wrote: > Hi, > > On Fri, Jan 5, 2018 at 12:18 PM, Johannes Schindelin > wrote: >> Hi, >> >> On Fri, 5 Jan 2018, Matthieu Moy wrote: >> >>> If we go for it, we need: >>> >>> * Admins >>> >>> * Potential mentors >> >> Count me in as a potential mentor. > > I am ok to be admin and mentor. Cool :-) In case you missed it: there's an iCal/Google calendar link on the timeline page: https://developers.google.com/open-source/gsoc/timeline If you use an electronic calendar, it's nice to add it to make sure you never miss a deadline. The URL is the same every year, it's how I got a reminder that the application was open. -- Matthieu Moy https://matthieu-moy.fr/
Re: [PATCH v5 2/3] Makefile: add Perl runtime prefix support
On Mon, 08 Jan 2018, Ævar Arnfjörð Bjarmason replied: > Thanks, applied this on top of next and it works for me, i.e. install to > /tmp/git and move to /tmp/git2 = works for me. Comments below. Good to hear! I've run this through a few machines at my disposal, but the more hands on the better. >> Enabling RUNTIME_PREFIX_PERL overrides the system-specific Perl script >> installation path generated by MakeMaker to force installation into a >> platform-neutral location, "/share/perl5". > > Not generated by MakeMaker anymore :) Hah, good catch! I'll update the commit message. >>+# it. This is intentionally separate from RUNTIME_PREFIX so that notably >>Windows >>+# can hard-code Perl library paths while still enabling RUNTIME_PREFIX >>+# resolution. > > Maybe we covered this in previous submissions, but refresh my memory, > why is the *_PERL define still needed? Reading this explanation doesn't > make sense to me, but I'm probably missing something. > > If we have a system where we have some perl library paths on the system > we want to use, then they'll still be in @INC after our 'use lib'-ing, > so we'll find libraries there. > > The only reason I can think of for doing this for C and not for Perl > would be if you for some reason want to have a git in /tmp/git but then > use a different version of the Git.pm from some system install, but I > can't imagine why. The reason is entirely due to the way Git-for-Windows is structured. In Git-for-Windows, Git binaries are run directly from Windows, meaning that they require RUNTIME_PREFIX resolution. However, Perl scripts are run from a MinGW universe, within which filesystem paths are fixed. Therefore, Windows Perl scripts don't require runtime prefix resolution. This makes sense because they are clearly functional right now without this patch enabled :) However, we don't have the luxury of running Perl in a separate universe on other OSes, so this patch is necessary for them. I created a separate option because I wanted to ensure that I don't change anything fundamental in Windows, which currently relies on runtime prefix resolution. On all other operating systems, Perl and binary runtime prefix resolution is disabled by default, so if this patch set does end up having bugs or edge cases in the Perl runtime prefix code, it won't impact anybody's current builds. I can foresee a future where Windows maintainers decide that RUNTIME_PREFIX_PERL is fine for Windows and merge the two options; however, I didn't want to force that decision in the initial implementation. > > + # GIT_EXEC_PATH is supplied by `git` or the test suite. Otherwise, > > resolve > + # against the runtime path of this script. > + require FindBin; > + require File::Spec; > + (my $prefix = $ENV{GIT_EXEC_PATH} || $FindBin::Bin) =~ > s=${gitexecdir_relative}$==; > > So why are we falling back on $FindBin::Bin? Just so you can do > e.g. /tmp/git2/libexec/git-core/git-svn like you can do > /tmp/git2/libexec/git-core/git-status, i.e. will this never be false if > invoked via "git"? > > I don't mind it, just wondering if I'm missing something and we need to > use the fallback path in some "normal" codepath. Yep, exactly. The ability to directly invoke Perl scripts is currently functional in non-runtime-prefix builds, so enabling it in runtime-prefix builds seemed appropriate. I have found this useful for testing. 
However, since GIT_EXEC_PATH is probably going to be the common path, I'll scoop the FindBin code (including the "require" statement) into a conditional in v6 and use it only when GIT_EXEC_PATH is empty. > > + return File::Spec->catdir($prefix, $relpath); > > I think you initially got some version of this from me (or not), so this > is probably my fault, but reading this again I think this would be > better as just: > > return $prefix . '@@PATHSEP@@' . $relpath; > > I.e. right after this we split on @@PATHSEP@@, and that clearly works > (as opposed to using File::Spec->splitpath) since we've used it > forever. > > Better just to use the same idiom on both ends to not leave the reader > wondering why we can split paths one way, but need to join them another > way. PATHSEP is the path separator (":"), as opposed to the filesystem separator ("/"). We split on PATHSEP below b/c we need to "use lib" as an array, but it may be a ":"-delimited string.
Re: [RFC PATCH 01/18] docs: Multi-Pack Index (MIDX) Design Notes
On Sun, 7 Jan 2018 13:14:42 -0500 Derrick Stolee wrote: > +Design Details > +-- > + > +- The MIDX file refers only to packfiles in the same directory > + as the MIDX file. > + > +- A special file, 'midx-head', stores the hash of the latest > + MIDX file so we can load the file without performing a dirstat. > + This file is especially important with incremental MIDX files, > + pointing to the newest file. I presume that the actual MIDX files are named by hash? (You might have written this somewhere that I somehow missed.) Also, I notice that in the "Future Work" section, the possibility of multiple MIDX files is raised. Could this 'midx-head' file be allowed to store multiple such files? That way, we avoid a bit of file format churn (in that we won't need to define a new "chunk" in the future). > +- If a packfile exists in the pack directory but is not referenced > + by the MIDX file, then the packfile is loaded into the packed_git > + list and Git can access the objects as usual. This behavior is > + necessary since other tools could add packfiles to the pack > + directory without notifying Git. > + > +- The MIDX file should be only a supplemental structure. If a > + user downgrades or disables the `core.midx` config setting, > + then the existing .idx and .pack files should be sufficient > + to operate correctly. Let me try to summarize: so, at this point, there are no backwards-incompatible changes to the repo disk format. Unupdated code paths (and old versions of Git) can just read the .idx and .pack files, as always. Updated code paths will look at the .midx and .idx files, and will sort them as follows: - .midx files go into a data structure - .idx files not referenced by any .midx files go into the existing packed_git data structure A writer can either merely write a new packfile (like old versions of Git) or write a packfile and update the .midx file, and everything above will still work. In the event that a writer deletes an existing packfile referenced by a .midx (for example, old versions of Git during a repack), we will lose the advantages of the .midx file - we will detect that the .midx no longer works when attempting to read an object given its information, but in this case, we can recover by dropping the .midx file and loading all the .idx files it references that still exist. As a reviewer, I think this is a very good approach, and this does make things easier to review (as opposed to, say, an approach where a lot of the code must be aware of .midx files).
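To make the summary above concrete, the reader side could look roughly like the sketch below. This is not code from the series; load_midx(), midx_references_pack() and add_packed_git_idx() are hypothetical stand-ins for whatever the actual patches provide:

struct midx;

extern struct midx *load_midx(const char *pack_dir);                   /* hypothetical: NULL if no midx-head */
extern int midx_references_pack(struct midx *m, const char *idx_file); /* hypothetical */
extern void add_packed_git_idx(const char *idx_file);                  /* hypothetical: classic .idx loading */

void prepare_packs_sketch(const char *pack_dir, const char **idx_files, int nr)
{
        struct midx *m = load_midx(pack_dir);
        int i;

        for (i = 0; i < nr; i++) {
                /* .idx files covered by the MIDX are served through it... */
                if (m && midx_references_pack(m, idx_files[i]))
                        continue;
                /* ...everything else falls back to the existing packed_git list. */
                add_packed_git_idx(idx_files[i]);
        }
}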
Re: [PATCHv3 2/4] builtin/blame: dim uninteresting metadata
Stefan Beller writes: > +color.blame.repeatedMeta:: > + Use the customized color for the part of git-blame output that > + is repeated meta information per line (such as commit id, > + author name, date and timezone). Defaults to dark gray. > + > ... "Dark gray on default background" may alleviate worries from those who prefer black ink on white paper display by hinting that both foreground and background colors can be configured. Do we want to make this overridable from the command line, i.e. --color-repeated-meta=gray? > +#define OUTPUT_COLOR_LINE 02000 The name of the macro implies that this is (or at least can be) a lot more generic UI request than merely "paint the same metadata on adjacent lines in a different color". > + OPT_BIT(0, "color-lines", &output_option, N_("color redundant > metadata from previous line differently"), OUTPUT_COLOR_LINE), Should this eventually become "--color=" that is more usual and generic, possibly defaulting to "auto" in the future, I wonder? > diff --git a/color.h b/color.h > index 2e768a10c6..2df2f86698 100644 > --- a/color.h > +++ b/color.h > @@ -30,6 +30,7 @@ struct strbuf; > #define GIT_COLOR_BLUE "\033[34m" > #define GIT_COLOR_MAGENTA "\033[35m" > #define GIT_COLOR_CYAN "\033[36m" > +#define GIT_COLOR_DARK "\033[1;30m" > #define GIT_COLOR_BOLD_RED "\033[1;31m" > #define GIT_COLOR_BOLD_GREEN "\033[1;32m" > #define GIT_COLOR_BOLD_YELLOW "\033[1;33m" How about using CYAN just like "diff" output uses it to paint the least interesting lines? That way we will keep the "uninteresting is cyan" consistency for the default settings without adding a new color to the palette.
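For what it's worth, assuming the variable name from the quoted hunk survives review, a user who shares the CYAN preference could presumably opt into it with a plain config setting (illustrative only):

[color "blame"]
        repeatedMeta = cyan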
Re: rebase preserve-merges: incorrect merge commits
Hi Matwey, On Mon, 8 Jan 2018, Matwey V. Kornilov wrote: > 2018-01-08 19:32 GMT+03:00 Johannes Schindelin : > > > > On Mon, 8 Jan 2018, Matwey V. Kornilov wrote: > > > >> 2018-01-08 17:42 GMT+03:00 Matwey V. Kornilov : > >> > 2018-01-08 16:56 GMT+03:00 Johannes Schindelin > >> > : > >> >> Hi Matwey, > >> >> > >> >> On Mon, 8 Jan 2018, Matwey V. Kornilov wrote: > >> >> > >> >>> I think that rebase preserve-merges algorithm needs further > >> >>> improvements. Probably, you already know it. > >> >> > >> >> Yes. preserve-merges is a fundamentally flawed design. > >> >> > >> >> Please have a look here: > >> >> > >> >> https://github.com/git/git/pull/447 > >> >> > >> >> Since we are in a feature freeze in preparation for v2.16.0, I will > >> >> submit these patch series shortly after v2.16.0 is released. > >> >> > >> >>> As far as I understand the root cause of this that when new merge > >> >>> commit is created by rebase it is done simply by git merge > >> >>> $new_parents without taking into account any actual state of the > >> >>> initial merge commit. > >> >> > >> >> Indeed. preserve-merges does not allow commits to be reordered. > >> >> (Actually, > >> >> it *does* allow it, but then fails to handle it correctly.) We even have > >> >> test cases that mark this as "known breakage". > >> >> > >> >> But really, I do not think it is worth trying to fix the broken design. > >> >> Better to go with the new recreate-merges. (I am biased, of course, > >> >> because I invented recreate-merges. But then, I also invented > >> >> preserve-merges, so ...) > >> > > >> > Well. I just checked --recreate-merges=no-rebase-cousins from the PR > >> > and found that it produces the same wrong result in my test example. > >> > The topology is reproduced correctly, but merge-commit content is > >> > broken. > >> > I did git rebase --recreate-merges=no-rebase-cousins --onto abc-0.1 v0.1 > >> > abc-0.2 > >> > >> Indeed, exactly as you still say in the documentation: "Merge conflict > >> resolutions or manual amendments to merge commits are not preserved." > >> My initial point is that they have to be preserved. Probably in > >> recreate-merges, if preserve-merges is discontinued. > > > > Ah, but that is consistent with how non-merge-preserving rebase works: the > > `pick` commands *also* do not record merge conflict resolution... > > > > I am sorry, didn't get it. When I do non-merge-preserving rebase > --interactive there is no way to `pick' merge-commit at all. Right, but you can `pick` commits and you can get merge conflicts. And you need to resolve those merge conflicts and those merge conflict resolutions are not preserved for future interactive rebases, unless you use `rerere` (in which case it also extends to `pick`ing merge commits in merge-preserving mode). Ciao, Johannes
Re: [PATCH 2/4] builtin/blame: dim uninteresting metadata
On Sat, Jan 6, 2018 at 12:26 AM, Eric Sunshine wrote: > On Thu, Jan 4, 2018 at 5:10 PM, Stefan Beller wrote: >> On Thu, Dec 28, 2017 at 2:29 PM, Eric Sunshine >> wrote: >>> On Thu, Dec 28, 2017 at 4:03 PM, Stefan Beller wrote: +static inline void colors_unset(const char **use_color, const char **reset_color) +{ + *use_color = ""; + *reset_color = ""; +} + +static inline void colors_set(const char **use_color, const char **reset_color) +{ + *use_color = repeated_meta_color; + *reset_color = GIT_COLOR_RESET; +} >>> >>> I'm not convinced that this colors_unset() / colors_set() / >>> setup_line_color() abstraction is buying much. With this abstraction, >>> I found the code more difficult to reason about than if the colors >>> were just set/unset manually in the code which needs the colors. I >>> *could* perhaps imagine setup_line_color() existing as a separate >>> function since it is slightly more complex than the other two, but as >>> it has only a single caller through all patches, even that may not be >>> sufficient to warrant its existence. >> >> Have you viewed this patch in context of the following patch? >> Originally I was convinced an abstraction was not needed, but >> as the next patch shows, a helper per field seems handy. > > I did take the other patch into consideration when making the > observation, and I still found the code more difficult to reason about > than if these minor bits of code had merely been inlined into the > callers. I neglected to mention previously that part of the problem > may be that these function names do not do a good job of conveying > what is being done, thus I repeatedly had to consult the function > implementations while reading callers in order to understand what was > going on. I asked the question before rewriting and resending, and now I agree that we do not want to have these small helpers. Thanks, Stefan
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
On 1/8/2018 10:48 AM, Jeff Hostetler wrote: From: Jeff Hostetler This is version 4 of my patch series to avoid expensive ahead/behind calculations in status. This version removes the last commit containing the experimental config setting. And removes the undefined return values for the nr_ahead/nr_behind arguments as discussed on the mailing list. While I like the simplicity of just turning this off completely, I do wonder if we could come up with a better user experience. For example, could we somehow limit the time spent computing the before/after and if it exceeds that limit, drop back to saying they are "different" rather than computing the exact number of commits before/after. I was thinking about something similar to the logic we use today about whether to start reporting progress on other long commands. That would mean you could still get the ahead/behind values if you aren't that far behind but would only get "different" if that calculation gets too expensive (which implies the actual value isn't going to be all that helpful anyway). This version does not address "git branch -vv", but that requires passing data across the for-each-ref barrier and that seemed beyond the scope of this task. Jeff Hostetler (4): stat_tracking_info: return +1 when branches not equal status: add --[no-]ahead-behind to status and commit for V2 format. status: update short status to respect --no-ahead-behind status: support --no-ahead-behind in long format Documentation/config.txt | 6 Documentation/git-status.txt | 5 +++ builtin/checkout.c | 2 +- builtin/commit.c | 18 ++- ref-filter.c | 8 ++--- remote.c | 50 -- remote.h | 12 ++-- t/t6040-tracking-info.sh | 73 t/t7064-wtstatus-pv2.sh | 69 + wt-status.c | 41 + wt-status.h | 2 ++ 11 files changed, 250 insertions(+), 36 deletions(-)
Re: [PATCHv3 4/4] builtin/blame: highlight recently changed lines
Stefan Beller writes: > +static struct color_field { > + timestamp_t hop; > + char col[COLOR_MAXLEN]; > +} *colorfield; > +static int colorfield_nr, colorfield_alloc; > + > +static void parse_color_fields(const char *s) > +{ > + struct string_list l = STRING_LIST_INIT_DUP; > + struct string_list_item *item; > + enum { EXPECT_DATE, EXPECT_COLOR } next = EXPECT_COLOR; > + > + /* Ideally this would be stripped and split at the same time? */ > + string_list_split(&l, s, ',', -1); > + ALLOC_GROW(colorfield, colorfield_nr + 1, colorfield_alloc); > + > + for_each_string_list_item(item, &l) { > + switch (next) { > + case EXPECT_DATE: > + colorfield[colorfield_nr].hop = > approxidate(item->string); > + next = EXPECT_COLOR; > + colorfield_nr++; > + ALLOC_GROW(colorfield, colorfield_nr + 1, > colorfield_alloc); > + break; This should make sure cf[i].hop is monotonically increasing to avoid end-user mistakes, I would think (what's 'hop' by the way?). > + case EXPECT_COLOR: > + if (color_parse(item->string, > colorfield[colorfield_nr].col)) > + die(_("expecting a color: %s"), item->string); When you have a typo in one of your configuration files, say "[color "blame"] highlightrecent = 1,week,blue,...", you'd want to see a bit more than just "expecting a color: week" to help you diagnose and resolve the issue. Giving the name of the variable and the file the wrong definition was found in would be needed, givin that this is called from the config callback git_blame_config() below. > + next = EXPECT_DATE; > + break; > + } > + } > + > + if (next == EXPECT_COLOR) > + die (_("must end with a color")); Same here. > OPT_BIT(0, "color-lines", &output_option, N_("color redundant > metadata from previous line differently"), OUTPUT_COLOR_LINE), > OPT_BIT(0, "color-fields", &output_option, N_("color redundant > metadata fields from previous line differently"), OUTPUT_COLOR_FIELDS), > + OPT_BIT(0, "heated-lines", &output_option, N_("color lines by > age"), OUTPUT_HEATED_LINES), These options may be useful while having fun experimenting, but my gut feeling is that these are too fine-grained for end-users to tweak per invocation basis (which is what command line options are for). But perhaps I am biased (as anybody else), as personally I find anything beyond step 2/4 uninteresting, and not adding too many of these options is consistent with that viewpoint ;-) In any case, thanks for a fun read.
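A minimal sketch of the monotonicity check suggested above, reusing struct color_field from the quoted hunk and git's usual die()/_() helpers, and assuming the dates are meant to be listed from oldest to newest; this is not part of the actual patch:

static void verify_colorfields_sketch(const struct color_field *cf, int nr)
{
        int i;

        /* Reject configurations whose dates are not strictly increasing. */
        for (i = 1; i < nr; i++)
                if (cf[i - 1].hop >= cf[i].hop)
                        die(_("color fields: dates must be given in increasing order"));
}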
Re: [PATCH v5 2/3] Makefile: add Perl runtime prefix support
On Mon, Jan 08 2018, Dan Jacques jotted: > On Mon, 08 Jan 2018, Ævar Arnfjörð Bjarmason replied: >>>+# it. This is intentionally separate from RUNTIME_PREFIX so that notably >>>Windows >>>+# can hard-code Perl library paths while still enabling RUNTIME_PREFIX >>>+# resolution. >> >> Maybe we covered this in previous submissions, but refresh my memory, >> why is the *_PERL define still needed? Reading this explanation doesn't >> make sense to me, but I'm probably missing something. >> >> If we have a system where we have some perl library paths on the system >> we want to use, then they'll still be in @INC after our 'use lib'-ing, >> so we'll find libraries there. >> >> The only reason I can think of for doing this for C and not for Perl >> would be if you for some reason want to have a git in /tmp/git but then >> use a different version of the Git.pm from some system install, but I >> can't imagine why. > > The reason is entirely due to the way Git-for-Windows is structured. In > Git-for-Windows, Git binaries are run directly from Windows, meaning that > they require RUNTIME_PREFIX resolution. However, Perl scripts are run from > a MinGW universe, within which filesystem paths are fixed. Therefore, > Windows Perl scripts don't require a runtime prefix resolution. > > This makes sense because they are clearly functional right now without this > patch enabled :) However, we don't have the luxury of running Perl in a > separate universe on other OSes, so this patch is necessary for them. > > I created a separate option because I wanted to ensure that I don't change > anything fundamental in Windows, which currently relies on runtime prefix > resoultion. On all other operating systems, Perl and binary runtime prefix > resolution is disabled by default, so if this patch set does end up having > bugs or edge cases in the Perl runtime prefix code, it won't inpact anybody's > current builds. > > I can foresee a future where Windows maintainers decide that > PERL_RUNTIME_PREFIX is fine for Windows and merge the two options; however, > I didn't want to force that decision in the initial implementation. Makes sense, well not really, But that's not your fault, but Windows's. I do think you're being overly conservative here, the perl change is no more invasive than the C changes (less so actually), and from anyone who's not on Windows it makes sense to be able to enable this with just RUNTIME_PREFIX=YesPlease, and have NO_RUNTIME_PREFIX_PERL=NotNeededHere for Windows, if someone ends up needing it. We usually hide stuff you might want in general, but isn't needed on one special snowflake platform behind NO_*, not the other way around. Maybe others disagre... >> > + # GIT_EXEC_PATH is supplied by `git` or the test suite. Otherwise, >> > resolve >> > + # against the runtime path of this script. >> > + require FindBin; >> > + require File::Spec; >> > + (my $prefix = $ENV{GIT_EXEC_PATH} || $FindBin::Bin) =~ >> > s=${gitexecdir_relative}$==; >> >> So why are we falling back on $FindBin::Bin? Just so you can do >> e.g. /tmp/git2/libexec/git-core/git-svn like you can do >> /tmp/git2/libexec/git-core/git-status, i.e. will this never be false if >> invoked via "git"? >> >> I don't mind it, just wondering if I'm missing something and we need to >> use the fallback path in some "normal" codepath. > > Yep, exactly. The ability to directly invoke Perl scripts is currently > functional in non-runtime-prefix builds, so enabling it in runtime-prefix > builds seemed appropriate. I have found this useful for testing. 
> > However, since GIT_EXEC_PATH is probably going to be the common path, > I'll scoop the FindBin code (including the "require" statement) into a > conditional in v6 and use it only when GIT_EXEC_PATH is empty. Both make sense. >> > + return File::Spec->catdir($prefix, $relpath); >> >> I think you initially got some version of this from me (or not), so this >> is probably my fault, but reading this again I think this would be >> better as just: >> >> return $prefix . '@@PATHSEP@@' . $relpath; >> >> I.e. right after this we split on @@PATHSEP@@, and that clearly works >> (as opposed to using File::Spec->splitpath) since we've used it >> forever. >> >> Better just to use the same idiom on both ends to not leave the reader >> wondering why we can split paths one way, but need to join them another >> way. > > PATHSEP is the path separator (":"), as opposed to the filesystem separator > ("/"). We split on PATHSEP below b/c we need to "use lib" as an array, but > it may be a ":"-delimited string. Yes, silly me. Nevermind.
Re: [PATCH v4 0/4] Add --no-ahead-behind to status
On 1/8/2018 2:49 PM, Ben Peart wrote: On 1/8/2018 10:48 AM, Jeff Hostetler wrote: From: Jeff Hostetler This is version 4 of my patch series to avoid expensive ahead/behind calculations in status. This version removes the last commit containing the experimental config setting. And removes the undefined return values for the nr_ahead/nr_behind arguments as discussed on the mailing list. While I like the simplicity of just turning this off completely, I do wonder if we could come up with a better user experience. For example, could we somehow limit the time spent computing the before/after and if it exceeds that limit, drop back to saying they are "different" rather than computing the exact number of commits before/after. I was thinking about something similar to the logic we use today about whether to start reporting progress on other long commands. That would mean you could still get the ahead/behind values if you aren't that far behind but would only get "different" if that calculation gets too expensive (which implies the actual value isn't going to be all that helpful anyway). After an off-line conversation with the others I'm going to look into a version that is limited to n commits rather than being completely on or off. I think if you are, for example, less than 100 ahead/behind then those numbers have value; if you are past n, then they have much less value. I'd rather do it by a fixed limit than by time, to ensure that output is predictable based on graph shape and not on system load. Jeff
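A rough sketch of what such a count-limited report could look like. This is not the actual stat_tracking_info() code; count_ahead_behind_capped() is a hypothetical helper that stops walking once either count exceeds the limit and returns non-zero in that case, and the cutoff is illustrative rather than a real config knob:

#include <stdio.h>

struct commit;

#define AHEAD_BEHIND_LIMIT 100

extern int count_ahead_behind_capped(struct commit *ours, struct commit *theirs,
                                     int limit, int *num_ours, int *num_theirs);

void report_tracking_sketch(struct commit *ours, struct commit *theirs)
{
        int num_ours, num_theirs;

        if (count_ahead_behind_capped(ours, theirs, AHEAD_BEHIND_LIMIT,
                                      &num_ours, &num_theirs))
                /* The walk was cut short: report divergence without exact counts. */
                printf("Your branch and its upstream have diverged.\n");
        else
                printf("ahead %d, behind %d\n", num_ours, num_theirs);
}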
Re: [PATCHv3 4/4] builtin/blame: highlight recently changed lines
On Mon, Jan 8, 2018 at 11:56 AM, Junio C Hamano wrote: > Stefan Beller writes: > >> +static struct color_field { >> + timestamp_t hop; >> + char col[COLOR_MAXLEN]; >> +} *colorfield; >> +static int colorfield_nr, colorfield_alloc; >> + >> +static void parse_color_fields(const char *s) >> +{ >> + struct string_list l = STRING_LIST_INIT_DUP; >> + struct string_list_item *item; >> + enum { EXPECT_DATE, EXPECT_COLOR } next = EXPECT_COLOR; >> + >> + /* Ideally this would be stripped and split at the same time? */ >> + string_list_split(&l, s, ',', -1); >> + ALLOC_GROW(colorfield, colorfield_nr + 1, colorfield_alloc); >> + >> + for_each_string_list_item(item, &l) { >> + switch (next) { >> + case EXPECT_DATE: >> + colorfield[colorfield_nr].hop = >> approxidate(item->string); >> + next = EXPECT_COLOR; >> + colorfield_nr++; >> + ALLOC_GROW(colorfield, colorfield_nr + 1, >> colorfield_alloc); >> + break; > > This should make sure cf[i].hop is monotonically increasing to avoid > end-user mistakes, I would think (what's 'hop' by the way?). > >> + case EXPECT_COLOR: >> + if (color_parse(item->string, >> colorfield[colorfield_nr].col)) >> + die(_("expecting a color: %s"), item->string); > > When you have a typo in one of your configuration files, say "[color > "blame"] highlightrecent = 1,week,blue,...", you'd want to see a bit > more than just "expecting a color: week" to help you diagnose and > resolve the issue. Giving the name of the variable and the file the > wrong definition was found in would be needed, givin that this is > called from the config callback git_blame_config() below. > >> + next = EXPECT_DATE; >> + break; >> + } >> + } >> + >> + if (next == EXPECT_COLOR) >> + die (_("must end with a color")); > > Same here. > >> OPT_BIT(0, "color-lines", &output_option, N_("color redundant >> metadata from previous line differently"), OUTPUT_COLOR_LINE), >> OPT_BIT(0, "color-fields", &output_option, N_("color redundant >> metadata fields from previous line differently"), OUTPUT_COLOR_FIELDS), >> + OPT_BIT(0, "heated-lines", &output_option, N_("color lines by >> age"), OUTPUT_HEATED_LINES), > > These options may be useful while having fun experimenting, but my > gut feeling is that these are too fine-grained for end-users to > tweak per invocation basis (which is what command line options are > for). > > But perhaps I am biased (as anybody else), as personally I find > anything beyond step 2/4 uninteresting, and not adding too many of > these options is consistent with that viewpoint ;-) See, I find 2 and 3 uninteresting and just did it 'because someone else hinted at that is what they want'. Maybe I was a bad listener. 4 (maybe with 2 in combination) would be all I need as that allows me to quickly judge the trustworthiness of code (old code is better, just like most liquors? ;) > In any case, thanks for a fun read. Thanks, I'll reread the comments and see if I can remove some options to make it useful for upstream consumption. Thanks, Stefan
upstreaming https://github.com/cgwalters/git-evtag ?
Hi, so quite a while ago I wrote this: https://github.com/cgwalters/git-evtag Since I last posted about this on the list here, of course shattered.io happened. It also looks like there was a node.js implementation written. Any interest in having this in core git?
Re: [PATCH v5 2/3] Makefile: add Perl runtime prefix support
Hi, On Mon, 8 Jan 2018, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Jan 08 2018, Dan Jacques jotted: > > From 3/3 (not not send 2 e-mails): > > >+# it. This is intentionally separate from RUNTIME_PREFIX so that notably > >Windows > >+# can hard-code Perl library paths while still enabling RUNTIME_PREFIX > >+# resolution. > > Maybe we covered this in previous submissions, but refresh my memory, > why is the *_PERL define still needed? Reading this explanation doesn't > make sense to me, but I'm probably missing something. If the reason is to accommodate Windows, I think it'd make more sense to change the way Git for Windows handles this, and use the same relative paths (if possible, that is, see the GITPERLLIB problems I mentioned elsewhere and which necessitated https://github.com/git-for-windows/git/commit/3b2f716bd8). BTW I managed to run your `runtime-prefix` branch through VSTS builds on Windows, macOS and Linux and they all pass the test suite. (Including the RUNTIME_PREFIX_PERL=YesPlease setting you added for Travis CI testing.) What do you think? Should we just fold the RUNTIME_PREFIX_PERL handling into RUNTIME_PREFIX and be done with that part? Ciao, Johannes
Re: [PATCH 2/6] fsmonitor: Stop inline'ing mark_fsmonitor_valid / _invalid
On 1/4/2018 5:27 PM, Johannes Schindelin wrote: Hi Alex, On Tue, 2 Jan 2018, Alex Vandiver wrote: These were inline'd when they were first introduced, presumably as an optimization for cases when they were called in tight loops. This complicates using these functions, as untracked_cache_invalidate_path is defined in dir.h. Leave the inline'ing up to the compiler's decision, for ease of use. I'm fine with these not being inline. I was attempting to minimize the performance impact of the fsmonitor code when it was not even turned on. Inlining these functions allowed it to be kept to a simple test but I suspect (especially with modern optimizing compilers) that the overhead of calling a function to do that test is negligible. As a compromise, you could leave the rather simple mark_fsmonitor_valid() as inlined function. It should be by far the more-called function, anyway. Ciao, Johannes
Re: [PATCH v5 2/3] Makefile: add Perl runtime prefix support
Hi, On Mon, 8 Jan 2018, Dan Jacques wrote: > On Mon, 08 Jan 2018, Ævar Arnfjörð Bjarmason replied: > > >>+# it. This is intentionally separate from RUNTIME_PREFIX so that > >>notably Windows +# can hard-code Perl library paths while still > >>enabling RUNTIME_PREFIX +# resolution. > > > > Maybe we covered this in previous submissions, but refresh my memory, > > why is the *_PERL define still needed? Reading this explanation > > doesn't make sense to me, but I'm probably missing something. > > > > If we have a system where we have some perl library paths on the > > system we want to use, then they'll still be in @INC after our 'use > > lib'-ing, so we'll find libraries there. > > > > The only reason I can think of for doing this for C and not for Perl > > would be if you for some reason want to have a git in /tmp/git but > > then use a different version of the Git.pm from some system install, > > but I can't imagine why. > > The reason is entirely due to the way Git-for-Windows is structured. In > Git-for-Windows, Git binaries are run directly from Windows, meaning > that they require RUNTIME_PREFIX resolution. However, Perl scripts are > run from a MinGW universe, within which filesystem paths are fixed. > Therefore, Windows Perl scripts don't require a runtime prefix > resolution. As I mentioned in the mail I just finished and sent (I started it hours ago, but then got busy with other things while the builds were running): I am totally cool with changing this on Windows, too. Should simplify things, right? Ciao, Johannes
Re: [PATCH 4/6] fsmonitor: Make output of test-dump-fsmonitor more concise
On 1/4/2018 5:33 PM, Johannes Schindelin wrote: Hi Alex, On Tue, 2 Jan 2018, Alex Vandiver wrote: Rather than display one very long line, summarize the contents of that line. The tests do not currently rely on any content except the first line ("no fsmonitor" / "fsmonitor last update"). The more interesting part would be the entries with outdated ("invalid") information. I thought that this information was pretty useful for debugging. Maybe we could still keep at least that part, or at least trigger outputting it via a command-line flag? During the development and testing of fsmonitor, I found the '+-' to be helpful (especially since it is in index order). I could touch a file and verify that it showed up as invalid and that it was the file I expected by its placement in the index. I'd hate to have to add options to a test program for more/less output. I do like your additions of the time since updated and the final counts. I prefer more information rather than less in my test tools - how about this? diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c index 5d61b0d621..8503da288d 100644 --- a/t/helper/test-dump-fsmonitor.c +++ b/t/helper/test-dump-fsmonitor.c @@ -20,11 +20,13 @@ int cmd_main(int ac, const char **av) (uintmax_t)istate->fsmonitor_last_update, (now - istate->fsmonitor_last_update)/1.0e9); - for (i = 0; i < istate->cache_nr; i++) + for (i = 0; i < istate->cache_nr; i++) { + printf((istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) ? "+" : "-"); if (istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) valid++; + } - printf(" valid: %d\n", valid); + printf("\n valid: %d\n", valid); printf(" invalid: %d\n", istate->cache_nr - valid); return 0; Ciao, Johannes
Re: upstreaming https://github.com/cgwalters/git-evtag ?
Hi, On Mon, 8 Jan 2018, Colin Walters wrote: > Hi, so quite a while ago I wrote this: > https://github.com/cgwalters/git-evtag For the benefit of readers who prefer to stay in their mail readers: git-evtag git-evtag can be used as a replacement for git-tag -s. It will generate a strong checksum (called Git-EVTag-v0-SHA512) over the commit, tree, and blobs it references (and recursively over submodules). A primary rationale for this is that the underlying SHA1 algorithm of git is under increasing threat. Further, a goal here is to create a checksum that covers the entire source of a single revision as a replacement for tarballs + checksums. > Since I last posted about this on the list here, of course > shattered.io happened. It also looks > like there was a node.js implementation written. > > Any interest in having this in core git? I have no opinion, I was just curious what this otherwise undescribed thing was about. Ciao, Johannes
Re: [RFC PATCH 01/18] docs: Multi-Pack Index (MIDX) Design Notes
On 1/8/2018 2:32 PM, Jonathan Tan wrote: On Sun, 7 Jan 2018 13:14:42 -0500 Derrick Stolee wrote: +Design Details +-- + +- The MIDX file refers only to packfiles in the same directory + as the MIDX file. + +- A special file, 'midx-head', stores the hash of the latest + MIDX file so we can load the file without performing a dirstat. + This file is especially important with incremental MIDX files, + pointing to the newest file. I presume that the actual MIDX files are named by hash? (You might have written this somewhere that I somehow missed.) Also, I notice that in the "Future Work" section, the possibility of multiple MIDX files is raised. Could this 'midx-head' file be allowed to store multiple such files? That way, we avoid a bit of file format churn (in that we won't need to define a new "chunk" in the future). I hadn't considered this idea, and I like it. I'm not sure this is a robust solution, since isolated MIDX files don't contain information that they could use other MIDX files, or what order they should be in. I think the "order" of incremental MIDX files is important in a few ways (such as the "stable object order" idea). I will revisit this idea when I come back with the incremental MIDX feature. For now, the only reference to "number of base MIDX files" is in one byte of the MIDX header. We should consider changing that byte for this patch. +- If a packfile exists in the pack directory but is not referenced + by the MIDX file, then the packfile is loaded into the packed_git + list and Git can access the objects as usual. This behavior is + necessary since other tools could add packfiles to the pack + directory without notifying Git. + +- The MIDX file should be only a supplemental structure. If a + user downgrades or disables the `core.midx` config setting, + then the existing .idx and .pack files should be sufficient + to operate correctly. Let me try to summarize: so, at this point, there are no backwards-incompatible changes to the repo disk format. Unupdated code paths (and old versions of Git) can just read the .idx and .pack files, as always. Updated code paths will look at the .midx and .idx files, and will sort them as follows: - .midx files go into a data structure - .idx files not referenced by any .midx files go into the existing packed_git data structure A writer can either merely write a new packfile (like old versions of Git) or write a packfile and update the .midx file, and everything above will still work. In the event that a writer deletes an existing packfile referenced by a .midx (for example, old versions of Git during a repack), we will lose the advantages of the .midx file - we will detect that the .midx no longer works when attempting to read an object given its information, but in this case, we can recover by dropping the .midx file and loading all the .idx files it references that still exist. As a reviewer, I think this is a very good approach, and this does make things easier to review (as opposed to, say, an approach where a lot of the code must be aware of .midx files). Thanks! That is certainly the idea. If you know about MIDX, then you can benefit from it. If you do not, then you have all the same data available to you do to your work. Having a MIDX file will not break other tools (libgit2, JGit, etc.). One thing I'd like to determine before this patch goes to v1 is how much we should make the other packfile-aware commands also midx-aware. 
My gut reaction right now is to have git-repack call 'git midx --clear' if core.midx=true and a packfile was deleted. However, this could easily be changed with 'git midx --clear' followed by 'git midx --write --update-head' if midx-head exists. Thanks, -Stolee
[PATCH] bisect: avoid NULL pointer dereference
7c117184d7 (bisect: fix off-by-one error in `best_bisection_sorted()`) fixed an off-by-one error, plugged a memory leak and removed a NULL check. However, the pointer p *is* actually NULL if an empty list is passed to the function. Let's add the check back for safety. Bisecting nothing doesn't make too much sense, but that's no excuse for crashing. Found with GCC's -Wnull-dereference.

Signed-off-by: Rene Scharfe
---
 bisect.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/bisect.c b/bisect.c
index 0fca17c02b..2f3008b078 100644
--- a/bisect.c
+++ b/bisect.c
@@ -229,8 +229,10 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
         if (i < cnt - 1)
                 p = p->next;
 }
-        free_commit_list(p->next);
-        p->next = NULL;
+        if (p) {
+                free_commit_list(p->next);
+                p->next = NULL;
+        }
 strbuf_release(&buf);
 free(array);
 return list;
--
2.15.1
Re: [PATCH 3/3] merge-recursive: Avoid incorporating uncommitted changes in a merge
Elijah Newren writes: > diff --git a/merge-recursive.c b/merge-recursive.c > index 2ecf495cc2..780f81a8bd 100644 > --- a/merge-recursive.c > +++ b/merge-recursive.c > @@ -1952,6 +1952,13 @@ int merge_trees(struct merge_options *o, > } > > if (oid_eq(&common->object.oid, &merge->object.oid)) { > + struct strbuf sb = STRBUF_INIT; > + > + if (index_has_changes(&sb)) { > + err(o, _("Dirty index: cannot merge (dirty: %s)"), > + sb.buf); > + return 0; > + } > output(o, 0, _("Already up to date!")); > *result = head; > return 1; I haven't come up with an addition to the test suite, but I suspect this change is conceptually wrong. What if a call to this function is made during a recursive, inner merge? Perhaps something like this is needed? merge-recursive.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/merge-recursive.c b/merge-recursive.c index 780f81a8bd..0fc580d8ca 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1954,7 +1954,7 @@ int merge_trees(struct merge_options *o, if (oid_eq(&common->object.oid, &merge->object.oid)) { struct strbuf sb = STRBUF_INIT; - if (index_has_changes(&sb)) { + if (!o->call_depth && index_has_changes(&sb)) { err(o, _("Dirty index: cannot merge (dirty: %s)"), sb.buf); return 0;
Re: upstreaming https://github.com/cgwalters/git-evtag ?
Hi, I personally like the idea of git-evtags, but I feel that push certificates (together with becoming hash-algorithm agnostic) could provide the same functionality with less code. To me, a git evtag is basically a signed tag + a data structure similar to a push certificate embedded in it. I wonder if, with the current tooling in git, this could be done as a custom command... Cheers! -Santiago. On Mon, Jan 08, 2018 at 03:12:00PM -0500, Colin Walters wrote: > Hi, so quite a while ago I wrote this: > https://github.com/cgwalters/git-evtag > > Since I last posted about this on the list here, of course > shattered.io happened. It also looks > like there was a node.js implementation written. > > Any interest in having this in core git?
Re: upstreaming https://github.com/cgwalters/git-evtag ?
On Mon, Jan 8, 2018, at 3:40 PM, Santiago Torres wrote: > Hi, > > I personally like the idea of git-evtags, but I feel that they could be > made so that push certificates (and being hash-algorithm agnostic) > should provide the same functionality with less code. What's a "push certificate"? (I really tried to find it in Google, even going to page 4 where one can start to see tumbleweeds going by... I'm fairly certain you're not talking about something related to iOS notifications)
Re: [PATCH] bisect: avoid NULL pointer dereference
Hi René, On Mon, 8 Jan 2018, René Scharfe wrote: > 7c117184d7 (bisect: fix off-by-one error in `best_bisection_sorted()`) > fixed an off-by-one error, plugged a memory leak and removed a NULL > check. However, the pointer p *is* actually NULL if an empty list is > passed to the function. Let's add the check back for safety. Bisecting > nothing doesn't make too much sense, but that's no excuse for crashing. > > Found with GCC's -Wnull-dereference. > > Signed-off-by: Rene Scharfe > --- > bisect.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/bisect.c b/bisect.c > index 0fca17c02b..2f3008b078 100644 > --- a/bisect.c > +++ b/bisect.c > @@ -229,8 +229,10 @@ static struct commit_list *best_bisection_sorted(struct > commit_list *list, int n > if (i < cnt - 1) > p = p->next; > } > - free_commit_list(p->next); > - p->next = NULL; > + if (p) { > + free_commit_list(p->next); > + p->next = NULL; > + } > strbuf_release(&buf); > free(array); > return list; Isn't this identical to https://public-inbox.org/git/20180103184852.27271-1-ava...@gmail.com/ ? Ciao, Dscho
Re: upstreaming https://github.com/cgwalters/git-evtag ?
On Mon, Jan 8, 2018 at 12:40 PM, Santiago Torres wrote: > Hi, > > I personally like the idea of git-evtags, but I feel that they could be > made so that push certificates (and being hash-algorithm agnostic) > should provide the same functionality with less code. > > To me, a git evtag is basically a signed tag + a data structure similar > to a push certificate embedded in it. I wonder if, with the current > tooling in git, this could be done as a custom command... In that case, why not migrate Git to a new hash function instead of adding a very niche fixup? See Documentation/technical/hash-function-transition.txt for how to do it. Personally I'd dislike to include ev-tags as it might send a signal of "papering over sha1 issues instead of fixing it". push certificates are somewhat underdocumented, see the git-push man page, which contains --[no-]signed, --signed=(true|false|if-asked) GPG-sign the push request to update refs on the receiving side, to allow it to be checked by the hooks and/or be logged. If false or --no-signed, no signing will be attempted. If true or --signed, the push will fail if the server does not support signed pushes. If set to if-asked, sign if and only if the server supports signed pushes. The push will also fail if the actual call to gpg --sign fails. See git- receive-pack(1) for the details on the receiving end. Going to receive-pack(1), there is an excerpt: When accepting a signed push (see git-push(1)), the signed push certificate is stored in a blob and an environment variable GIT_PUSH_CERT can be consulted for its object name. See the description of post-receive hook for an example. In addition, the certificate is verified using GPG and the result is exported with the following environment variables: ... Stefan
Re: upstreaming https://github.com/cgwalters/git-evtag ?
Yeah, I see where you're coming from. I don't think push certificates have caught on yet... You can read up on them in [1], and also in Documentation/git-push:147. There's also another PR trying to add a sample hook for signed pushes in [2]. The basic idea is to push a signed data structure containing the relevant git reference information as a git object, to prevent a server/MITM from moving references around. Cheers! -Santiago. [1] https://public-inbox.org/git/1408485987-3590-1-git-send-email-gits...@pobox.com/ [2] https://public-inbox.org/git/20171202091248.6037-1-r...@shikherverma.com/ On Mon, Jan 08, 2018 at 03:42:33PM -0500, Colin Walters wrote: > > > On Mon, Jan 8, 2018, at 3:40 PM, Santiago Torres wrote: > > Hi, > > > > I personally like the idea of git-evtags, but I feel that they could be > > made so that push certificates (and being hash-algorithm agnostic) > > should provide the same functionality with less code. > > What's a "push certificate"? (I really tried to find it in Google, > even going to page 4 where one can start to see tumbleweeds > going by... I'm fairly certain you're not talking about something related > to iOS notifications)
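For readers who have never seen one: the certificate sent by 'git push --signed' is a small text blob, roughly of the following shape (reconstructed from memory of the receive-pack documentation, so double-check the exact field layout there; all values below are made up):

certificate version 0.1
pusher A U Thor <author@example.com> 1515444900 -0500
pushee https://example.com/project.git
nonce 1515444900-0123456789abcdef0123456789abcdef01234567

<old-object-id> <new-object-id> refs/heads/master

followed by a GPG signature over those lines; the whole thing is what ends up in the blob that GIT_PUSH_CERT points at on the receiving side.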
Re: upstreaming https://github.com/cgwalters/git-evtag ?
> Personally I'd dislike to include ev-tags as it might send a signal > of "papering over sha1 issues instead of fixing it". +1. I probably didn't convey it well, but this is what I was hoping for. I think git has enough building blocks to provide something akin to git evtags already. Thanks, -Santiago.
Re: [PATCH 6/6] fsmonitor: Use fsmonitor data in `git diff`
On 1/5/2018 5:22 PM, Junio C Hamano wrote: Johannes Schindelin writes: diff --git a/diff-lib.c b/diff-lib.c index 8104603a3..13ff00d81 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -95,6 +95,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option) diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/"); + if (!(option & DIFF_SKIP_FSMONITOR)) + refresh_fsmonitor(&the_index); + if (diff_unmerged_stage < 0) diff_unmerged_stage = 2; I read over this hunk five times, and only now am I able to wrap my head around this: if we do *not* want to skip the fsmonitor data, we refresh the fsmonitor data in the index. That feels a bit like an unneeded double negation. Speaking for myself, I would prefore `DIFF_IGNORE_FSMONITOR` instead, it would feel less like a double negation then. But I am not a native speaker, so I might be wrong. I do find the logic a bit convoluted with double negative. It's great to see more use of the fsmonitor data. Thanks for doing this! I agree with the sentiment that the logic as written is confusing. I'll also point out that DIFF_IGNORE_FSMONITOR would be more consistent with the similar CE_MATCH_IGNORE_FSMONITOR flag and logic. I'm also confused why we would not want to use the fsmonitor data in the 'add' case. When would you ever need to add a file that had not been modified?
cherry-picking fails after making a directory a submodule
I have a situation where I have switched a directory from being a subdirectory to being a submodule. I then try to cherry-pick a commit from a taskbranch that was made before the switch to the master branch. The commit touches a file outside the subdirectory/submodule. Yet "git cherry-pick" fails with this error message: > error: could not apply 78c403e... Add a project feature > hint: after resolving the conflicts, mark the corrected paths > hint: with 'git add ' or 'git rm ' > hint: and commit the result with 'git commit' I can resolve the situation by running "git add libfoo && git cherry-pick --continue". The generated commit contains no changes to "libfoo". I don't understand why I need to manually add libfoo, as the commit I'm cherry-picking doesn't touch anything in libfoo. The script below can reproduce the issue. Tested with git 2.15.1, 2.14.0 and 2.8.0, all with the same result. Is this a bug in "git cherry-pick"? -- cut here for cherry-across-submodule -- #!/bin/sh # # This script creates a simple repo, where the "libfoo" directory # initially is a normal directory, but later becomes a git submodule. # It then tries to cherry-pick a commit (that doesn't touch libfoo) # that was created before the conversion to master (after the # conversion). This fails for unclear reasons. # I've tested this with the following git versions: # - 2.8.0 # - 2.14.0 # - 2.15.1 # # They all behave the same # export PATH=/usr/local/git-2.15.1/bin:$PATH set -e -x git --version # Refuse to run if this directory already exists, to prevent data loss. mkdir cherry-across-submodule-root cd cherry-across-submodule-root mkdir root (cd root && git init --bare libfoo.git) (cd root && git init --bare project.git) mkdir workspace (cd workspace && git clone ../root/libfoo) (cd workspace && git clone ../root/project) proj_commit () { (cd workspace/project && printf "$1\n" >> $2 && git add $2 && git commit -m"$3") } foo_commit () { (cd workspace/libfoo && printf "$1\n" >> $2 && git add $2 && git commit -m"$3") } both_commit () { foo_commit "$1" $2 "$3" proj_commit "$1" libfoo/$2 "Imported libfoo: $3" } proj_commit "This is a project" README "Started the project" mkdir workspace/project/libfoo both_commit "This is a library" README "Started the library" both_commit "all:\n\ttouch libfoo.a" Makefile "Build something" proj_commit "all:\n\tmake -C libfoo" Makefile "Build libfoo" proj_commit "ceder" AUTHORS "I made this" both_commit "GPL" "COPYING" "Add license info" (cd workspace/libfoo && git push) (cd workspace/project && git push) (cd workspace/project && git checkout -b task-1) proj_commit "int feature() { return 17; }" feature.c "Add a project feature" (cd workspace/project && git push -u origin task-1) assert_clean() { (cd workspace/project && [ -z "`git status --porcelain`" ] ) } # Cherrypicking task-1 to task-2 works fine. (cd workspace/project && git checkout -b task-2 master && git cherry-pick task-1) assert_clean (cd workspace/project && git checkout master && git rm -r libfoo && git submodule add -b master ../libfoo.git libfoo && git commit -m"Made libfoo a submodule") assert_clean # Now suddenly cherrypicking fails? I get this message from the # cherry-pick command: # error: could not apply 78c403e... 
Add a project feature # hint: after resolving the conflicts, mark the corrected paths # hint: with 'git add ' or 'git rm ' # hint: and commit the result with 'git commit' (cd workspace/project && git checkout -b task-3 master && git cherry-pick task-1) # At this point, "git status --porcelain" prints two lines: # A feature.c # AU libfoo assert_clean -- cut here for cherry-across-submodule -- /ceder
Re: [PATCH v3 5/7] convert_to_git(): safe_crlf/checksafe becomes int conv_flags
lars.schnei...@autodesk.com writes: > diff --git a/sha1_file.c b/sha1_file.c > index afe4b90f6e..dcb02e9ffd 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -75,14 +75,14 @@ static struct cached_object *find_cached_object(const > unsigned char *sha1) > } > > > -static enum safe_crlf get_safe_crlf(unsigned flags) > +static int get_conv_flags(unsigned flags) > { > if (flags & HASH_RENORMALIZE) > - return SAFE_CRLF_RENORMALIZE; > + return CONV_EOL_RENORMALIZE; > else if (flags & HASH_WRITE_OBJECT) > - return safe_crlf; > + return global_conv_flags_eol | CONV_WRITE_OBJECT; This macro has not yet been introduced at this point (it appears in 6/7 if I am not mistaken).
Re: cherry-picking fails after making a directory a submodule
On Mon, Jan 8, 2018 at 1:08 PM, Per Cederqvist wrote: > I have a situation where I have switched a directory from being a > subdirectory to being a submodule. I then try to cherry-pick a commit > from a taskbranch that was made before the switch to the master > branch. The commit touches a file outside the subdirectory/submodule. > Yet "git cherry-pick" fails with this error message: > >> error: could not apply 78c403e... Add a project feature >> hint: after resolving the conflicts, mark the corrected paths >> hint: with 'git add ' or 'git rm ' >> hint: and commit the result with 'git commit' > > I can resolve the situation by running "git add libfoo && git > cherry-pick --continue". The generated commit contains no changes to > "libfoo". > > I don't understand why I need to manually add libfoo, as the commit > I'm cherry-picking doesn't touch anything in libfoo. > > The script below can reproduce the issue. Tested with git 2.15.1, > 2.14.0 and 2.8.0, all with the same result. > > Is this a bug in "git cherry-pick"? Could you please test with github.com/git/git/commit/c641ca67072946f95f87e7b21f13f3d4e73701e3 included? (See its parent commit for the test.) From my cursory read, your issue is the one addressed in that commit.
Re: [PATCH] bisect: avoid NULL pointer dereference
Hello Dscho, Am 08.01.2018 um 21:45 schrieb Johannes Schindelin: > Isn't this identical to > https://public-inbox.org/git/20180103184852.27271-1-ava...@gmail.com/ ? yes, indeed, thanks. So here's an upvote for Ævar's patch then. :) (I should have sent it earlier, but was not fully convinced it could be triggered in the wild.) René
Re: [PATCH v5 2/3] Makefile: add Perl runtime prefix support
Johannes Schindelin writes: > On Mon, 8 Jan 2018, Dan Jacques wrote: > >> On Mon, 08 Jan 2018, Ævar Arnfjörð Bjarmason replied: >> >> >>+# it. This is intentionally separate from RUNTIME_PREFIX so that >> >>notably Windows +# can hard-code Perl library paths while still >> >>enabling RUNTIME_PREFIX +# resolution. >> > >> > Maybe we covered this in previous submissions, but refresh my memory, >> > why is the *_PERL define still needed? Reading this explanation >> > doesn't make sense to me, but I'm probably missing something. >> > >> > If we have a system where we have some perl library paths on the >> > system we want to use, then they'll still be in @INC after our 'use >> > lib'-ing, so we'll find libraries there. >> > >> > The only reason I can think of for doing this for C and not for Perl >> > would be if you for some reason want to have a git in /tmp/git but >> > then use a different version of the Git.pm from some system install, >> > but I can't imagine why. >> >> The reason is entirely due to the way Git-for-Windows is structured. In >> Git-for-Windows, Git binaries are run directly from Windows, meaning >> that they require RUNTIME_PREFIX resolution. However, Perl scripts are >> run from a MinGW universe, within which filesystem paths are fixed. >> Therefore, Windows Perl scripts don't require a runtime prefix >> resolution. > > As I mentioned in the mail I just finished and sent (I started it hours > ago, but then got busy with other things while the builds were running): I > am totally cool with changing this on Windows, too. Should simplify > things, right? Wonderful to see that you two are in agreement. Will look forward to see a simplified solution in a later round ;-)
Re: [PATCH v5 2/3] Makefile: add Perl runtime prefix support
On 2018-01-08 20:27, Johannes Schindelin wrote: > > Maybe we covered this in previous submissions, but refresh my memory, > > why is the *_PERL define still needed? Reading this explanation doesn't > > make sense to me, but I'm probably missing something. > > If the reason is to accommodate Windows, I think it'd make more sense to > change the way Git for Windows handles this, and use the same relative > paths (if possible, that is, see the GITPERLLIB problems I mentioned > elsewhere and which necessitated > https://github.com/git-for-windows/git/commit/3b2f716bd8). > (...) > What do you think? Should we just fold the RUNTIME_PREFIX_PERL handling > into RUNTIME_PREFIX and be done with that part? > (...) > As I mentioned in the mail I just finished and sent (I started it hours > ago, but then got busy with other things while the builds were running): I > am totally cool with changing this on Windows, too. Should simplify > things, right? No objections here. I see it as adding slightly more risk to this patch's potential impact on Windows builds, but if Git-for-Windows is okay with that, I'll go ahead and fold RUNTIME_PREFIX_PERL into RUNTIME_PREFIX for simplicity's sake. I'll add a "NO_RUNTIME_PREFIX_PERL" flag as per avarab@'s suggestion as a potential mitigation if a problem does end up arising in Windows builds, with a note that NO_RUNTIME_PREFIX_PERL can be deleted if everything seems to be working. What do you think? > BTW I managed to run your `runtime-prefix` branch through VSTS builds on > Windows, macOS and Linux and they all pass the test suite. (Including the > RUNTIME_PREFIX_PERL=YesPlease setting you added for Travis CI testing.) Great news, thanks for doing this!
Re: [RFC PATCH 01/18] docs: Multi-Pack Index (MIDX) Design Notes
On Mon, 8 Jan 2018 15:35:59 -0500 Derrick Stolee wrote: > Thanks! That is certainly the idea. If you know about MIDX, then you can > benefit from it. If you do not, then you have all the same data > available to you do to your work. Having a MIDX file will not break > other tools (libgit2, JGit, etc.). > > One thing I'd like to determine before this patch goes to v1 is how much > we should make the other packfile-aware commands also midx-aware. My gut > reaction right now is to have git-repack call 'git midx --clear' if > core.midx=true and a packfile was deleted. However, this could easily be > changed with 'git midx --clear' followed by 'git midx --write > --update-head' if midx-head exists. My opinion is that these are sufficient: (a) functionality to create a .midx file from scratch (deleting any existing ones) (b) either: - update of packfile.c to read (one or more) midx files (like in patch 18), and possibly usage in a benchmark, or - any other usage of midx file (e.g. abbreviations, like in patch 17) In general, a way to create them (so that they can be used from a cronjob, etc.), and a way to consume them to show that the new way works and is indeed faster. This functionality in itself might be sufficient to merge in - this would already be useful in situations where it is undesirable for repacks to happen often, and we can "bridge" the intervals between repacks using a cronjob that periodically generates .midx files in order to keep up the object lookup performance. In particular, I think that it is fine to omit a more sophisticated "repack" for now - the .midx mechanism must tolerate packs referenced by .midx files suddenly disappearing anyway, and in this way, at least we can demonstrate that the .midx mechanism still works in this case.
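(If the RFC's 'git midx' builtin lands with the options mentioned in this thread, the cronjob idea could be as small as the sketch below. The command and its options are the RFC's proposal, not anything released, and the repository path is made up.)

#!/bin/sh
# Periodic MIDX refresh between repacks, run from cron.
cd /srv/repos/big.git || exit 1
# Drop the old midx-head, then write a fresh .midx covering the current packs.
git midx --clear
git midx --write --update-head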
Re: [PATCH] travis-ci: build Git during the 'script' phase
SZEDER Gábor writes: > The reason why Travis CI does it this way and why it's a better > approach than ours lies in how unsuccessful build jobs are > categorized. ... > ... > This makes it easier, both for humans looking at the Travis CI web > interface and for automated tools querying the Travis CI API,... > ... > A verbose commit message for such a change... but I don't know why we > started with building Git in the 'before_script' phase. Thanks for writing it up clearly. TBH, I didn't even realize that there were meaningful distinctions between the two cases after seeing that sometimes our tests were failing and sometimes erroring ;-) > Should go on top of 'sg/travis-check-untracked' in 'next'.
Re: cherry-picking fails after making a directory a submodule
On Mon, Jan 8, 2018 at 10:46 PM, Stefan Beller wrote: > On Mon, Jan 8, 2018 at 1:08 PM, Per Cederqvist wrote: >> I have a situation where I have switched a directory from being a >> subdirectory to being a submodule. I then try to cherry-pick a commit >> from a taskbranch that was made before the switch to the master >> branch. The commit touches a file outside the subdirectory/submodule. >> Yet "git cherry-pick" fails with this error message: >> >>> error: could not apply 78c403e... Add a project feature >>> hint: after resolving the conflicts, mark the corrected paths >>> hint: with 'git add ' or 'git rm ' >>> hint: and commit the result with 'git commit' >> >> I can resolve the situation by running "git add libfoo && git >> cherry-pick --continue". The generated commit contains no changes to >> "libfoo". >> >> I don't understand why I need to manually add libfoo, as the commit >> I'm cherry-picking doesn't touch anything in libfoo. >> >> The script below can reproduce the issue. Tested with git 2.15.1, >> 2.14.0 and 2.8.0, all with the same result. >> >> Is this a bug in "git cherry-pick"? > > Could you please test with > github.com/git/git/commit/c641ca67072946f95f87e7b21f13f3d4e73701e3 > included? (See its parent commit, for the test) > > From my cursory read that commit is the issue addressed in that commit. Thanks! I can confirm that applying the changes to merge-recursive.c from that commit fixes the issue in 2.15.1. /ceder
Re: [PATCH] bisect: avoid NULL pointer dereference
René Scharfe writes: > 7c117184d7 (bisect: fix off-by-one error in `best_bisection_sorted()`) > fixed an off-by-one error, plugged a memory leak and removed a NULL > check. However, the pointer p *is* actually NULL if an empty list is > passed to the function. Let's add the check back for safety. Bisecting > nothing doesn't make too much sense, but that's no excuse for crashing. > > Found with GCC's -Wnull-dereference. > > Signed-off-by: Rene Scharfe > --- Thanks. I think this is the same as 2e9fdc79 ("bisect: fix a regression causing a segfault", 2018-01-03) but the log we see here explains what goes wrong much better than the other one ;-) > bisect.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/bisect.c b/bisect.c > index 0fca17c02b..2f3008b078 100644 > --- a/bisect.c > +++ b/bisect.c > @@ -229,8 +229,10 @@ static struct commit_list *best_bisection_sorted(struct > commit_list *list, int n > if (i < cnt - 1) > p = p->next; > } > - free_commit_list(p->next); > - p->next = NULL; > + if (p) { > + free_commit_list(p->next); > + p->next = NULL; > + } > strbuf_release(&buf); > free(array); > return list;
Re: [PATCH v5 2/3] Makefile: add Perl runtime prefix support
On Mon, Jan 08 2018, Dan Jacques jotted: > On 2018-01-08 20:27, Johannes Schindelin wrote: > >> > Maybe we covered this in previous submissions, but refresh my memory, >> > why is the *_PERL define still needed? Reading this explanation doesn't >> > make sense to me, but I'm probably missing something. >> >> If the reason is to accommodate Windows, I think it'd make more sense to >> change the way Git for Windows handles this, and use the same relative >> paths (if possible, that is, see the GITPERLLIB problems I mentioned >> elsewhere and which necessitated >> https://github.com/git-for-windows/git/commit/3b2f716bd8). >> (...) >> What do you think? Should we just fold the RUNTIME_PREFIX_PERL handling >> into RUNTIME_PREFIX and be done with that part? >> (...) >> As I mentioned in the mail I just finished and sent (I started it hours >> ago, but then got busy with other things while the builds were running): I >> am totally cool with changing this on Windows, too. Should simplify >> things, right? > > No objections here. I see it as adding slightly more risk to this patch's > potential impact on Windows builds, but if Git-for-Windows is okay with that, > I'll go ahead and fold RUNTIME_PREFIX_PERL into RUNTIME_PREFIX for > simplicity's sake. > > I'll add a "NO_RUNTIME_PREFIX_PERL" flag as per avarab@'s suggestion as a > potential mitigation if a problem does end up arising in Windows builds, > with a note that NO_RUNTIME_PREFIX_PERL can be deleted if everything seems > to be working. What do you think? To be clear, I meant that if it's determined by you/others that an opt-out on Windows is needed I think it makes sense to make it a NO_* flag, but if there's a solution where we can just turn it on for everything then ideally we'd just have RUNTIME_PREFIX=YesPlease.
Re: [PATCH v5 2/3] Makefile: add Perl runtime prefix support
On Mon, 08 Jan 2018, Ævar Arnfjörð Bjarmason wrote: > > I'll add a "NO_RUNTIME_PREFIX_PERL" flag as per avarab@'s suggestion as a > > potential mitigation if a problem does end up arising in Windows builds, > > with a note that NO_RUNTIME_PREFIX_PERL can be deleted if everything seems > > to be working. What do you think? > > To be clear, I meant that if it's determined by you/others that an > opt-out on Windows is needed I think it makes sense to make it a NO_* > flag, but if there's a solution where we can just turn it on for > everything then ideally we'd just have RUNTIME_PREFIX=YesPlease. Oh that's fair. Okay, we'll go all in and just have RUNTIME_PREFIX.
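(For readers following along, the end state agreed on here should reduce a relocatable build to something like the sketch below once the series is applied; the prefix paths are illustrative.)

# Build with runtime-prefix resolution, install, then relocate the tree.
make RUNTIME_PREFIX=YesPlease prefix=/tmp/git install
mv /tmp/git /tmp/git2
# Both the C binaries and the Perl scripts should now find their support
# files relative to where they are invoked from.
/tmp/git2/bin/git --version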
Re: [PATCH] travis-ci: build Git during the 'script' phase
> On 08 Jan 2018, at 23:07, Junio C Hamano wrote: > > SZEDER Gábor writes: > >> The reason why Travis CI does it this way and why it's a better >> approach than ours lies in how unsuccessful build jobs are >> categorized. ... >> ... >> This makes it easier, both for humans looking at the Travis CI web >> interface and for automated tools querying the Travis CI API,... >> ... >> A verbose commit message for such a change... but I don't know why we >> started with building Git in the 'before_script' phase. > > Thanks for writing it up clearly. TBH, I didn't even realize that > there were meaningful distinctions between the two cases after > seeing that sometimes our tests were failing and sometimes erroring > ;-) I understand the reasons for the proposed patch. However, I did this intentionally back then. Here is my reason: If `make` is successful, then I am not interested in its output. Look at this run: https://travis-ci.org/szeder/git/jobs/324271623 You have to scroll down 1,406 lines to get to the test result output (this is usually the interesting part). If this is a valid argument for you, would it be an option to pipe the verbose `make` output to a file and only print it in case of error (we do something similar for the tests already). - Lars
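(A minimal version of that suggestion, as an untested sketch; the log file path and the job count are assumptions, not an actual .travis.yml change.)

# In the 'script' phase: keep the build quiet on success, dump it on failure.
make --jobs=2 >/tmp/git-build.log 2>&1 || {
	cat /tmp/git-build.log
	exit 1
}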
Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
On 01/08, Duy Nguyen wrote: > On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer wrote: > > @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, > > const char *path) > > split_index->base = xcalloc(1, sizeof(*split_index->base)); > > > > base_sha1_hex = sha1_to_hex(split_index->base_sha1); > > - base_path = git_path("sharedindex.%s", base_sha1_hex); > > + base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex); > > Personally I prefer the repo_git_path() from v2 (sorry I was away and > could not comment anything). It felt slightly nicer to me as well, but it threw up a bunch of questions about how worktrees will fit together with struct repository. As I don't feel very strongly about this either way I decided to go with what Brandon suggested as an alternative, which allows us to defer that decision. I'd be happy to revert this to the way I had it in v2, but I don't want to drag the discussion on too long either, as this series does fix some real regressions. > The thing is, git_path() and friends > could do some path translation underneath to support multiple > worktrees. Think of the given path here as a "virtual path" that may > be translated to something else, not exactly + "/" + > "sharedindex.%s". But in practice, we're not breaking the relationship > between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing > manual path transformation here is fine. > > What about the other git_path() in this file? With patch applied I still get > > > ~/w/git/temp $ git grep git_path read-cache.c > read-cache.c: shared_index_path = git_path("%s", de->d_name); > read-cache.c: temp = mks_tempfile(git_path("sharedindex_XX")); > read-cache.c: git_path("sharedindex.%s", > sha1_to_hex(si->base->sha1))); > read-cache.c: const char *shared_index = git_path("sharedindex.%s", Good point, I hadn't looked at these, I only looked at the current test failures. I'm going to be away for the rest of the week, but I'll have a look at them when I come back. > I suppose submodule has not triggered any of these code paths yet. Not > sure if we should deal with them now or wait until later. > > Perhaps if we add a "struct repository *" pointer inside index_state, > we could retrieve back the_repository (or others) and call > repo_git_path() everywhere without changing index api too much. I > don't know. I like the 'struct repository' concept but couldn't > follow its development so I don't if this is what it should become. Interesting. I didn't follow the development of struct repository too closely either, so I'm not sure. Brandon might have more of an opinion on that? :) > -- > Duy
Re: [PATCH v3 5/7] convert_to_git(): safe_crlf/checksafe becomes int conv_flags
> On 08 Jan 2018, at 22:28, Junio C Hamano wrote: > > lars.schnei...@autodesk.com writes: > >> diff --git a/sha1_file.c b/sha1_file.c >> index afe4b90f6e..dcb02e9ffd 100644 >> --- a/sha1_file.c >> +++ b/sha1_file.c >> @@ -75,14 +75,14 @@ static struct cached_object *find_cached_object(const >> unsigned char *sha1) >> } >> >> >> -static enum safe_crlf get_safe_crlf(unsigned flags) >> +static int get_conv_flags(unsigned flags) >> { >> if (flags & HASH_RENORMALIZE) >> -return SAFE_CRLF_RENORMALIZE; >> +return CONV_EOL_RENORMALIZE; >> else if (flags & HASH_WRITE_OBJECT) >> -return safe_crlf; >> +return global_conv_flags_eol | CONV_WRITE_OBJECT; > > This macro has not yet introduced at this point (it appears in 6/7 > if I am not mistaken). Nice catch. I'll fix that in the next iteration. Is it OK if I send the next iteration soon or would you prefer it if I wait until after 2.16 release? Plus, is it ok to keep the base of the series or would you prefer it if I rebase it to the latest master (because of a minor conflict)? Thanks, Lars
Re: [PATCH v3 5/7] convert_to_git(): safe_crlf/checksafe becomes int conv_flags
Lars Schneider writes: > Nice catch. I'll fix that in the next iteration. > > Is it OK if I send the next iteration soon or would you prefer > it if I wait until after 2.16 release? > > Plus, is it ok to keep the base of the series or would you prefer > it if I rebase it to the latest master (because of a minor conflict)? I do not see this topic as a fix for a grave bug that needs to go to an older maintenance track; it is rather a new feature, isn't it? So a rebased series that cleanly applies on top of 2.16 final would be a reasonable way to go forward. Thanks.
Re: [PATCH] oidset: don't return value from oidset_init
Thomas Gummerer writes: > c3a9ad3117 ("oidset: add iterator methods to oidset", 2017-11-21) > introduced a 'oidset_init()' function in oidset.h, which has void as > return type, but returns an expression. > ... > diff --git a/oidset.h b/oidset.h > index 783abceccd..40ec5f87fe 100644 > --- a/oidset.h > +++ b/oidset.h > @@ -27,7 +27,7 @@ struct oidset { > > static inline void oidset_init(struct oidset *set, size_t initial_size) > { > - return oidmap_init(&set->map, initial_size); > + oidmap_init(&set->map, initial_size); > } Makes sense. Perhaps "inline" hides this from error-checking compilers, I wonder?
Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
On 01/08, Duy Nguyen wrote: > On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer wrote: > > @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, > > const char *path) > > split_index->base = xcalloc(1, sizeof(*split_index->base)); > > > > base_sha1_hex = sha1_to_hex(split_index->base_sha1); > > - base_path = git_path("sharedindex.%s", base_sha1_hex); > > + base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex); > > Personally I prefer the repo_git_path() from v2 (sorry I was away and > could not comment anything). The thing is, git_path() and friends > could do some path translation underneath to support multiple > worktrees. Think of the given path here as a "virtual path" that may > be translated to something else, not exactly + "/" + > "sharedindex.%s". But in practice, we're not breaking the relationship > between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing > manual path transformation here is fine. My biggest complaint about v2 is that we still don't quite know the best way to integrate worktrees and struct repository yet so I was very reluctant to start having them interact in the way v2 was using them together. I'm very much in favor of this version (v3) as each worktree can explicitly provide their gitdir to be used to determine where to read the shared index file without having to replicate a struct repository for each. > > What about the other git_path() in this file? With patch applied I still get > > > ~/w/git/temp $ git grep git_path read-cache.c > read-cache.c: shared_index_path = git_path("%s", de->d_name); > read-cache.c: temp = mks_tempfile(git_path("sharedindex_XX")); > read-cache.c: git_path("sharedindex.%s", > sha1_to_hex(si->base->sha1))); > read-cache.c: const char *shared_index = git_path("sharedindex.%s", > > I suppose submodule has not triggered any of these code paths yet. Not > sure if we should deal with them now or wait until later. > > Perhaps if we add a "struct repository *" pointer inside index_state, > we could retrieve back the_repository (or others) and call > repo_git_path() everywhere without changing index api too much. I > don't know. I like the 'struct repository' concept but couldn't > follow its development so I don't if this is what it should become. I'm not too keen on having an index_state struct contain a back pointer to a repository struct. I do think that we may want to have worktree structs contain a back pointer to the struct repository they correspond to. That way there is only one instance of the repository (and object-store once that gets integrated) yet multiple worktrees. -- Brandon Williams
Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos
On Tue, Jan 9, 2018 at 6:38 AM, Brandon Williams wrote: > On 01/08, Duy Nguyen wrote: >> On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer wrote: >> > @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, >> > const char *path) >> > split_index->base = xcalloc(1, sizeof(*split_index->base)); >> > >> > base_sha1_hex = sha1_to_hex(split_index->base_sha1); >> > - base_path = git_path("sharedindex.%s", base_sha1_hex); >> > + base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex); >> >> Personally I prefer the repo_git_path() from v2 (sorry I was away and >> could not comment anything). The thing is, git_path() and friends >> could do some path translation underneath to support multiple >> worktrees. Think of the given path here as a "virtual path" that may >> be translated to something else, not exactly + "/" + >> "sharedindex.%s". But in practice, we're not breaking the relationship >> between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing >> manual path transformation here is fine. > > My biggest complaint about v2 is that we still don't quite know the best > way to integrate worktrees and struct repository yet so I was very > reluctant to start having them interact in the way v2 was using them > together. This will be on my todo list (after I have finished with nd/worktree-move which has been on 'pu' for too long). I'm ok with v3 then. > I'm very much in favor of this version (v3) as each worktree > can explicitly provide their gitdir to be used to determine where to > read the shared index file without having to replicate a struct > repository for each. Isn't that the end goal though (I vaguely recall this discussion when 'struct repository' was an idea), that we will pass struct repository around [1] rather than relying on global objects. [1] or in this case struct worktree-something because the index belongs more to a worktree than the object/refs database. >> What about the other git_path() in this file? With patch applied I still get >> >> > ~/w/git/temp $ git grep git_path read-cache.c >> read-cache.c: shared_index_path = git_path("%s", de->d_name); >> read-cache.c: temp = mks_tempfile(git_path("sharedindex_XX")); >> read-cache.c: git_path("sharedindex.%s", >> sha1_to_hex(si->base->sha1))); >> read-cache.c: const char *shared_index = git_path("sharedindex.%s", >> >> I suppose submodule has not triggered any of these code paths yet. Not >> sure if we should deal with them now or wait until later. >> >> Perhaps if we add a "struct repository *" pointer inside index_state, >> we could retrieve back the_repository (or others) and call >> repo_git_path() everywhere without changing index api too much. I >> don't know. I like the 'struct repository' concept but couldn't >> follow its development so I don't if this is what it should become. > > I'm not too keen on having an index_state struct contain a back pointer > to a repository struct. I do think that we may want to have worktree > structs contain a back pointer to the struct repository they correspond > to. That way there is only one instance of the repository (and > object-store once that gets integrated) yet multiple worktrees. Yeah back pointer from worktree struct makes sense. -- Duy
Re: upstreaming https://github.com/cgwalters/git-evtag ?
On Mon, Jan 8, 2018, at 3:49 PM, Stefan Beller wrote: > On Mon, Jan 8, 2018 at 12:40 PM, Santiago Torres wrote: > > Hi, > > > > I personally like the idea of git-evtags, but I feel that they could be > > made so that push certificates (and being hash-algorithm agnostic) > > should provide the same functionality with less code. > > > > To me, a git evtag is basically a signed tag + a data structure similar > > to a push certificate embedded in it. I wonder if, with the current > > tooling in git, this could be done as a custom command... > > In that case, why not migrate Git to a new hash function instead > of adding a very niche fixup? Every day, for many years I find it maddening and really ridiculous that the Fedora package build process insists I provide it a tarball instead of being able to just fetch a signed git tag. Now while I haven't fought the battle to teach Fedora to actually use this, I think I have a pretty strong argument that git-evtag very clearly fulfills the same role that a signed tarball does. In particular, how a single checksum covers the entire source - no hash tree involved. The way that the evtag is "horizontal" across the source while the git tree is "vertical" around history means they're complementary. > See Documentation/technical/hash-function-transition.txt > for how to do it. evtag took me a day or two to write initially and doesn't impose any requirements on users except a small additional bit of software. In contrast, working on hash-function-transition.txt? That seems like it'd easily consume many person-months of work. And that plan only exists post-shatter.io, whereas git-evtag long predates both. > Personally I'd dislike to include ev-tags as it might send a signal > of "papering over sha1 issues instead of fixing it". I don't agree. I think it's pretty clear that a hash function transition would be a huge amount of work - not least because of course there are now at least two widely used implementations of git in C, plus https://www.eclipse.org/jgit/ plus... > push certificates are somewhat underdocumented, see the Why not call them "git signed pushes"? Junio's post even says "the signed push". And I just looked at this a little bit more but I'm not sure I see how this covers the same goal as evtags; it seems more about stopping someone from MITM my push to github.com, and not about ensuring integrity from someone pulling from github.com (and not wanting to fully trust github).
Re: [PATCH v3 5/7] convert_to_git(): safe_crlf/checksafe becomes int conv_flags
On Mon, Jan 08, 2018 at 11:47:20PM +0100, Lars Schneider wrote: > > > On 08 Jan 2018, at 22:28, Junio C Hamano wrote: > > > > lars.schnei...@autodesk.com writes: > > > >> diff --git a/sha1_file.c b/sha1_file.c > >> index afe4b90f6e..dcb02e9ffd 100644 > >> --- a/sha1_file.c > >> +++ b/sha1_file.c > >> @@ -75,14 +75,14 @@ static struct cached_object *find_cached_object(const > >> unsigned char *sha1) > >> } > >> > >> > >> -static enum safe_crlf get_safe_crlf(unsigned flags) > >> +static int get_conv_flags(unsigned flags) > >> { > >>if (flags & HASH_RENORMALIZE) > >> - return SAFE_CRLF_RENORMALIZE; > >> + return CONV_EOL_RENORMALIZE; > >>else if (flags & HASH_WRITE_OBJECT) > >> - return safe_crlf; > >> + return global_conv_flags_eol | CONV_WRITE_OBJECT; > > > > This macro has not yet introduced at this point (it appears in 6/7 > > if I am not mistaken). > > Nice catch. I'll fix that in the next iteration. > > Is it OK if I send the next iteration soon or would you prefer > it if I wait until after 2.16 release? > > Plus, is it ok to keep the base of the series or would you prefer > it if I rebase it to the latest master (because of a minor conflict)? > > Thanks, > Lars I noticed the missing macro as well, while doing the rebase to git.git/master, but forgot to mention it here on the list. Lars, if you want, please have a look here: https://github.com/tboegi/git/tree/180108-1858-For-lars-schneider-encode-V3C
Re: [RFC PATCH 00/18] Multi-pack index (MIDX)
On Mon, Jan 08, 2018 at 02:43:00PM +0100, Johannes Schindelin wrote: > Take the interactive rebase for example. It generates todo lists with > abbreviated commit names, for readability (and it is *really* important to > keep this readable). As we expect new objects to be introduced by the > interactive rebase, we convert that todo list to unabbreviated commit > names before executing the interactive rebase. > > Your idea (to not care about unambiguous abbreviations) would break that. I think that could be easily worked around for rebase by asking git to check ambiguity during the conversion. Speed is much less of a problem there, because we're doing a relatively small number of abbreviations (compared to "git log --oneline --raw", which is abbreviating almost every object in the repository). But I agree it's a potential problem for other scripts that we might not have control over. I hadn't really intended this to be the default behavior (my patch was just trying to show the direction). But it does make for a pretty awful interface if callers have to opt into it manually ("git log --oneline --no-really-go-fast"). I am a bit curious if there's a bounded probability that people would find acceptable for Git to give an ambiguous abbreviation. We already accept 1 in 2^160, of course. But would, e.g., one in a million be OK? -Peff
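(To put rough numbers on that "bounded probability": this is back-of-the-envelope only, not the exact heuristic git's abbreviation code uses. With n objects already in the repository, a single new abbreviation of k hex digits collides with probability roughly

\[
P_{\text{collision}} \approx \frac{n}{16^{k}},
\qquad\text{e.g. } n = 10^{7},\ k = 10:\quad
\frac{10^{7}}{2^{40}} \approx 9 \times 10^{-6},
\]

so with ten hex digits and on the order of ten million objects, any one printed abbreviation is ambiguous with odds around one in a hundred thousand, which is the kind of figure the question above is asking about.)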