Re: [PATCH v2 01/10] t/test-lib: teach --chain-lint to detect broken &&-chains in subshells
On Wed, Jul 11, 2018 at 5:37 PM Junio C Hamano wrote: > As with the previous "transform the script and feed the result to > shell" approach, this risks to force us into writing our tests in a > subset of valid shell language, which is the primary reason why I > was not enthused when I saw the previous round. The worst part of > it is that the subset is not strictly defined based on the shell > language syntax or features (e.g. we allow this and that feature but > not that other feature) but "whatever that does not cause the linter > script to trigger false positives". Some observations perhaps worth considering: The linter is happy (no false positives) with the 13000+ existing tests (though, of course, not all of them use subshells). Those tests, written over many years, vary quite wildly in style and implementation approach, so the "subset" of shell language accepted by the linter is quite broad. The original --chain-lint series (jk/test-chain-lint) had to make some changes, such as wrapping code in a {...} block[1], merely to pacify the linter. v2 of the subshell linter required no such changes. The subshell linter was crafted to be on par with the existing --chain-lint in terms of strictness (and looseness), so the subshell linter is not more strict than the existing implementation. (For instance, one can escape the strict &&-chain requirement in the existing --chain-lint by wrapping code in a {...} block. The subshell linter intentionally allows that escape, as well.) And, perhaps most important: We're not tied indefinitely to the "subset" implemented by the current linter. If it is indeed found to be too strict or limiting, it can always be loosened or retired altogether. Thanks for the feedback. [1]: bfe998fc9b (t0050: appease --chain-lint, 2015-03-20)
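To illustrate the kind of breakage the subshell linter is after, here is a minimal sketch (test title and command names are made up):

    test_expect_success 'subshell with a broken &&-chain' '
    	(
    		cd subdir &&
    		do_one_thing
    		do_another_thing
    	) &&
    	check_result
    '

The missing && after "do_one_thing" means a failure of that command would go unnoticed, because only the exit status of the last command in the subshell is propagated; that is exactly what the new linter flags. And as mentioned above, the {...} escape hatch still works: code wrapped in such a block is deliberately left alone by both the existing linter and the subshell linter.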
[PATCH] sequencer.c: terminate the last line of author-script properly
It looks like write_author_script() intends to write out a file in Bourne shell syntax, but it doesn't put a closing single quote on the last line. This patch makes .git/rebase-merge/author-script actually parsable by sh(1) by adding a single quote and a linefeed to terminate the line properly. Signed-off-by: Akinori MUSHA --- sequencer.c | 1 + t/t3404-rebase-interactive.sh | 13 + 2 files changed, 14 insertions(+) diff --git a/sequencer.c b/sequencer.c index 4034c0461..5f32b6df1 100644 --- a/sequencer.c +++ b/sequencer.c @@ -651,6 +651,7 @@ static int write_author_script(const char *message) strbuf_addch(&buf, *(message++)); else strbuf_addf(&buf, "'%c'", *(message++)); + strbuf_addstr(&buf, "'\n"); res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1); strbuf_release(&buf); return res; diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh index 352a52e59..345b103eb 100755 --- a/t/t3404-rebase-interactive.sh +++ b/t/t3404-rebase-interactive.sh @@ -75,6 +75,19 @@ test_expect_success 'rebase --keep-empty' ' test_line_count = 6 actual ' +test_expect_success 'rebase -i writes out .git/rebase-merge/author-script in "edit" that sh(1) can parse' ' + test_when_finished "git rebase --abort ||:" && + git checkout master && + set_fake_editor && + FAKE_LINES="edit 1" git rebase -i HEAD^ && + test -f .git/rebase-merge/author-script && + unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE && + eval "$(cat .git/rebase-merge/author-script)" && + test "$(git show --quiet --pretty=format:%an)" = "$GIT_AUTHOR_NAME" && + test "$(git show --quiet --pretty=format:%ae)" = "$GIT_AUTHOR_EMAIL" && + test "$(git show --quiet --date=raw --pretty=format:@%ad)" = "$GIT_AUTHOR_DATE" +' + test_expect_success 'rebase -i with the exec command' ' git checkout master && ( -- 2.18.0 -- Akinori MUSHA / https://akinori.org/
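For illustration, with hypothetical author data the file written before this fix looks roughly like this:

    GIT_AUTHOR_NAME='A U Thor'
    GIT_AUTHOR_EMAIL='author@example.com'
    GIT_AUTHOR_DATE='@1234567890 +0000

i.e. the last line is left without its closing single quote (and without a newline), so sourcing the file fails. With the fix, the GIT_AUTHOR_DATE line is terminated properly and `eval "$(cat .git/rebase-merge/author-script)"`, as used in the new test above, works as expected.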
[PATCH v3 2/3] t/lib-httpd: add the strip_access_log() helper function
Four tests in three httpd-related test scripts check the contents of Apache's 'access.log', and they all do so by running 'sed' with the exact same script consisting of four s/// commands to strip uninteresting log fields and to vertically align the requested URLs. Extract this into a common helper function 'strip_access_log' in 'lib-httpd.sh', and use it in all of those tests. Signed-off-by: SZEDER Gábor --- t/lib-httpd.sh | 9 + t/t5541-http-push-smart.sh | 14 ++ t/t5551-http-fetch-smart.sh | 7 +-- t/t5561-http-backend.sh | 7 +-- 4 files changed, 13 insertions(+), 24 deletions(-) diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh index 435a37465a..b6788fea57 100644 --- a/t/lib-httpd.sh +++ b/t/lib-httpd.sh @@ -287,3 +287,12 @@ expect_askpass() { test_cmp "$TRASH_DIRECTORY/askpass-expect" \ "$TRASH_DIRECTORY/askpass-query" } + +strip_access_log() { + sed -e " + s/^.* \"// + s/\"// + s/ [1-9][0-9]*\$// + s/^GET /GET / + " "$HTTPD_ROOT_PATH"/access.log +} diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh index 5c9ca19401..83390f80cc 100755 --- a/t/t5541-http-push-smart.sh +++ b/t/t5541-http-push-smart.sh @@ -47,12 +47,7 @@ test_expect_success 'no empty path components' ' cd "$ROOT_PATH" && git clone $HTTPD_URL/smart/test_repo.git/ test_repo_clone && - sed -e " - s/^.* \"// - s/\"// - s/ [1-9][0-9]*\$// - s/^GET /GET / - " >act <"$HTTPD_ROOT_PATH"/access.log && + strip_access_log >act && test_cmp exp act ' @@ -134,12 +129,7 @@ GET /smart/test_repo.git/info/refs?service=git-receive-pack HTTP/1.1 200 POST /smart/test_repo.git/git-receive-pack HTTP/1.1 200 EOF test_expect_success 'used receive-pack service' ' - sed -e " - s/^.* \"// - s/\"// - s/ [1-9][0-9]*\$// - s/^GET /GET / - " >act <"$HTTPD_ROOT_PATH"/access.log && + strip_access_log >act && test_cmp exp act ' diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 913089b144..c8b6ec493b 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -103,12 +103,7 @@ GET /smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1 200 POST /smart/repo.git/git-upload-pack HTTP/1.1 200 EOF test_expect_success 'used upload-pack service' ' - sed -e " - s/^.* \"// - s/\"// - s/ [1-9][0-9]*\$// - s/^GET /GET / - " >act <"$HTTPD_ROOT_PATH"/access.log && + strip_access_log >act && test_cmp exp act ' diff --git a/t/t5561-http-backend.sh b/t/t5561-http-backend.sh index 84a955770a..4248b5a06d 100755 --- a/t/t5561-http-backend.sh +++ b/t/t5561-http-backend.sh @@ -129,12 +129,7 @@ GET /smart/repo.git/info/refs?service=git-receive-pack HTTP/1.1 403 - POST /smart/repo.git/git-receive-pack HTTP/1.1 403 - EOF test_expect_success 'server request log matches test results' ' - sed -e " - s/^.* \"// - s/\"// - s/ [1-9][0-9]*\$// - s/^GET /GET / - " >act <"$HTTPD_ROOT_PATH"/access.log && + strip_access_log >act && test_cmp exp act ' -- 2.18.0.305.g66e9e0a543
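To illustrate what the helper does, a made-up raw entry from Apache's access.log roughly like

    127.0.0.1 - - [11/Jul/2018:00:00:00 +0000] "GET /smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1" 200 415

is reduced by the four s/// commands to

    GET /smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1 200

i.e. everything up to and including the opening double quote is dropped, the closing quote and the trailing response size are removed, and the last substitution widens "GET " by a space so the requested URLs line up vertically with the POST entries.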
[PATCH v3 3/3] t/lib-httpd: avoid occasional failures when checking access.log
The last test of 't5561-http-backend.sh', 'server request log matches test results' may fail occasionally, because the order of entries in Apache's access log doesn't match the order of requests sent in the previous tests, although all the right requests are there. I saw it fail on Travis CI five times in the span of about half a year, when the order of two subsequent requests was flipped, and could trigger the failure with a modified Git. However, I was unable to trigger it with stock Git on my machine. Three tests in 't5541-http-push-smart.sh' and 't5551-http-fetch-smart.sh' check requests in the log the same way, so they might be prone to a similar occasional failure as well. When a test sends a HTTP request, it can continue execution after 'git-http-backend' fulfilled that request, but Apache writes the corresponding access log entry only after 'git-http-backend' exited. Some time inevitably passes between fulfilling the request and writing the log entry, and, under unfavourable circumstances, enough time might pass for the subsequent request to be sent and fulfilled by a different Apache thread or process, and then Apache writes access log entries racily. This effect can be exacerbated by adding a bit of variable delay after the request is fulfilled but before 'git-http-backend' exits, e.g. like this: diff --git a/http-backend.c b/http-backend.c index f3dc218b2..bbf4c125b 100644 --- a/http-backend.c +++ b/http-backend.c @@ -709,5 +709,7 @@ int cmd_main(int argc, const char **argv) max_request_buffer); cmd->imp(&hdr, cmd_arg); + if (getpid() % 2) + sleep(1); return 0; } This delay considerably increases the chances of log entries being written out of order, and in turn makes t5561's last test fail almost every time. Alas, it doesn't seem to be enough to trigger a similar failure in t5541 and t5551. So, since we can't just rely on the order of access log entries always corresponding to the order of requests, make checking the access log more deterministic by sorting (simply lexicographically) both the stripped access log entries and the expected entries before the comparison with 'test_cmp'. This way the order of log entries won't matter and occasional out-of-order entries won't trigger a test failure, but the comparison will still notice any unexpected or missing log entries. OTOH, this sorting will make it harder to identify which test an unexpected log entry came from or which test's request went missing. Therefore, in case of an error include the comparison of the unsorted log entries in the test output as well. And since all this should be performed in four tests in three test scripts, put this into a new helper function 'check_access_log' in 't/lib-httpd.sh'. Signed-off-by: SZEDER Gábor --- t/lib-httpd.sh | 12 t/t5541-http-push-smart.sh | 6 ++ t/t5551-http-fetch-smart.sh | 3 +-- t/t5561-http-backend.sh | 3 +-- 4 files changed, 16 insertions(+), 8 deletions(-) diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh index b6788fea57..7f060aebd0 100644 --- a/t/lib-httpd.sh +++ b/t/lib-httpd.sh @@ -296,3 +296,15 @@ strip_access_log() { s/^GET /GET / " "$HTTPD_ROOT_PATH"/access.log } + +# Requires one argument: the name of a file containing the expected stripped +# access log entries. +check_access_log() { + sort "$1" >"$1".sorted && + strip_access_log >access.log.stripped && + sort access.log.stripped >access.log.sorted && + if ! test_cmp "$1".sorted access.log.sorted + then + test_cmp "$1" access.log.stripped + fi +} diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh index 83390f80cc..a9836e4af0 100755 --- a/t/t5541-http-push-smart.sh +++ b/t/t5541-http-push-smart.sh @@ -47,8 +47,7 @@ test_expect_success 'no empty path components' ' cd "$ROOT_PATH" && git clone $HTTPD_URL/smart/test_repo.git/ test_repo_clone && - strip_access_log >act && - test_cmp exp act + check_access_log exp ' test_expect_success 'clone remote repository' ' @@ -129,8 +128,7 @@ GET /smart/test_repo.git/info/refs?service=git-receive-pack HTTP/1.1 200 POST /smart/test_repo.git/git-receive-pack HTTP/1.1 200 EOF test_expect_success 'used receive-pack service' ' - strip_access_log >act && - test_cmp exp act + check_access_log exp ' test_http_push_nonff "$HTTPD_DOCUMENT_ROOT_PATH"/test_repo.git \ diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index c8b6ec493b..3aab44bdcb 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -103,8 +103,7 @@ GET /smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1 200 POST /smart/repo.git/git-upload-pack HTTP/1.1 200 EOF test_expect_success 'used upload-pack service' ' - strip_access_log >act && - test_cmp exp act + check_access_log
[PATCH v3 1/3] t5541: clean up truncating access log
In the second test of 't5541-http-push-smart.sh', 'no empty path components' we truncate Apache's access log by running: echo >.../access.log There are two issues with this approach: - This doesn't leave an empty file behind, like a proper truncation would, but a file with a lone newline in it. Consequently, a later test checking the log's contents must consider this improper truncation and include an empty line in the expected content. - This truncation is done in the middle of the test, because, quoting the in-code comment, "we do this [truncation] before the actual comparison to ensure the log is cleared" even when subsequent 'test_cmp' fails. Alas, this is not quite robust enough, as it is conceivable that 'git clone' fails after already having sent a request, in which case the access log would not be truncated and would leave stray log entries behind. Since there is no need for that newline at all, drop the 'echo' from the truncation and adjust the expected content accordingly. Furthermore, make sure that the truncation is performed no matter whether and how 'git clone' fails unexpectedly by specifying it as a 'test_when_finished' command. Signed-off-by: SZEDER Gábor --- t/t5541-http-push-smart.sh | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh index a2af693068..5c9ca19401 100755 --- a/t/t5541-http-push-smart.sh +++ b/t/t5541-http-push-smart.sh @@ -38,6 +38,10 @@ GET /smart/test_repo.git/info/refs?service=git-upload-pack HTTP/1.1 200 POST /smart/test_repo.git/git-upload-pack HTTP/1.1 200 EOF test_expect_success 'no empty path components' ' + # Clear the log, so that it does not affect the "used receive-pack + # service" test which reads the log too. + test_when_finished ">\"$HTTPD_ROOT_PATH\"/access.log" && + # In the URL, add a trailing slash, and see if git appends yet another # slash. cd "$ROOT_PATH" && @@ -49,13 +53,6 @@ test_expect_success 'no empty path components' ' s/ [1-9][0-9]*\$// s/^GET /GET / " >act <"$HTTPD_ROOT_PATH"/access.log && - - # Clear the log, so that it does not affect the "used receive-pack - # service" test which reads the log too. - # - # We do this before the actual comparison to ensure the log is cleared. - echo > "$HTTPD_ROOT_PATH"/access.log && - test_cmp exp act ' @@ -124,7 +121,6 @@ test_expect_success 'rejected update prints status' ' rm -f "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update" cat >exp <
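As a reminder of why 'test_when_finished' is the right tool here: commands registered with it run when the test finishes, whether the body succeeded or failed. A minimal sketch (command names made up):

    test_expect_success 'example' '
    	test_when_finished "clean_up_the_log" &&
    	step_that_writes_the_log &&
    	step_that_might_fail
    '

Here "clean_up_the_log" is run even if one of the later steps fails, which is exactly the guarantee we want for truncating access.log.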
[PATCH v3 0/3] Fix occasional test failures in http tests
To recap: 't5561-http-backend.sh' is prone to occasional failures; luckily it's not 'git-http-backend's fault, but the test script is a bit racy; patch 3/3's commit message discusses the details at length. Changes since v2: - Use 'test_when_finished' to clear the access log in the first patch. (and necessary adjustments to the subsequent patches because of conflicts) Interdiff included below. (I wanted to include range-diff, but it didn't make any sense) SZEDER Gábor (3): t5541: clean up truncating access log t/lib-httpd: add the strip_access_log() helper function t/lib-httpd: avoid occasional failures when checking access.log t/lib-httpd.sh | 21 + t/t5541-http-push-smart.sh | 28 ++-- t/t5551-http-fetch-smart.sh | 8 +--- t/t5561-http-backend.sh | 8 +--- 4 files changed, 29 insertions(+), 36 deletions(-) -- 2.18.0.305.g66e9e0a543 diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh index a481e3989a..a9836e4af0 100755 --- a/t/t5541-http-push-smart.sh +++ b/t/t5541-http-push-smart.sh @@ -38,6 +38,10 @@ GET /smart/test_repo.git/info/refs?service=git-upload-pack HTTP/1.1 200 POST /smart/test_repo.git/git-upload-pack HTTP/1.1 200 EOF test_expect_success 'no empty path components' ' + # Clear the log, so that it does not affect the "used receive-pack + # service" test which reads the log too. + test_when_finished ">\"$HTTPD_ROOT_PATH\"/access.log" && + # In the URL, add a trailing slash, and see if git appends yet another # slash. cd "$ROOT_PATH" && @@ -46,12 +50,6 @@ test_expect_success 'no empty path components' ' check_access_log exp ' -test_expect_success 'clear access log' ' - # Clear the log, so that it does not affect the "used receive-pack - # service" test which reads the log too. - >"$HTTPD_ROOT_PATH"/access.log -' - test_expect_success 'clone remote repository' ' rm -rf test_repo_clone && git clone $HTTPD_URL/smart/test_repo.git test_repo_clone &&
> Signed-off-by: Eric Sunshine > --- > t/t5300-pack-object.sh | 2 +- > t/t5302-pack-index.sh | 2 +- > t/t5401-update-hooks.sh| 4 ++-- > t/t5500-fetch-pack.sh | 2 +- > t/t5505-remote.sh | 2 +- > t/t5512-ls-remote.sh | 4 ++-- > t/t5516-fetch-push.sh | 10 +- > t/t5517-push-mirror.sh | 10 +- > t/t5526-fetch-submodules.sh| 2 +- > t/t5531-deep-submodule-push.sh | 2 +- > t/t5543-atomic-push.sh | 2 +- > t/t5601-clone.sh | 2 +- > t/t5605-clone-local.sh | 2 +- > t/t5801-remote-helpers.sh | 2 +- > 14 files changed, 24 insertions(+), 24 deletions(-) The change below should be squashed into this patch to fix a previously unnoticed broken &&-chain. I think you missed it, because this test script is rather expensive and you didn't run it with GIT_TEST_CLONE_2GB=YesPlease. diff --git a/t/t5608-clone-2gb.sh b/t/t5608-clone-2gb.sh index df822d9a3e..2c6bc07344 100755 --- a/t/t5608-clone-2gb.sh +++ b/t/t5608-clone-2gb.sh @@ -23,7 +23,7 @@ test_expect_success CLONE_2GB 'setup' ' printf "blob\nmark :$i\ndata $blobsize\n" && #test-tool genrandom $i $blobsize && printf "%-${blobsize}s" $i && - echo "M 100644 :$i $i" >> commit + echo "M 100644 :$i $i" >> commit && i=$(($i+1)) || echo $? > exit-status done &&
Re: [PATCH 3/3] rebase --rebase-merges: adjust man page for octopus support
Hi Elijah, On Wed, 11 Jul 2018, Elijah Newren wrote: > On Fri, Mar 9, 2018 at 8:36 AM, Johannes Schindelin via GitGitGadget > wrote: > > > diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt > > index 0e20a66e7..c4bcd24bb 100644 > > --- a/Documentation/git-rebase.txt > > +++ b/Documentation/git-rebase.txt > > @@ -889,7 +889,8 @@ If a `merge` command fails for any reason other than > > merge conflicts (i.e. > > when the merge operation did not even start), it is rescheduled > > immediately. > > > > At this time, the `merge` command will *always* use the `recursive` > > -merge strategy, with no way to choose a different one. To work around > > +merge strategy for regular merges, and `octopus` for octopus merges, > > +strategy, with no way to choose a different one. To work around > > The "...merges, strategy, with..." looks like an incomplete edit. > Perhaps "...and `octopus` strategy for octopus merges, with no way..." > ? Okay. Will be fixed in v2. Ciao, Dscho
Re: [PATCH 0/3] rebase -r: support octopus merges
Hi Junio, On Wed, 11 Jul 2018, Junio C Hamano wrote: > "Johannes Schindelin via GitGitGadget" > writes: > > > The `--rebase-merges` option of `git rebase` generates todo lists that > > include the merge commits that are to be rebased. > > > > To keep things simpler to review, I decided to support only regular, > > 2-parent merge commits first. > > > > With this patch series, support is extended to cover also octopus > > merges. > > ;-) > > To be honest, I am not sure if there still are people who use octopus > (even though I freely admit that I am guilty of inventing the term and > the mechanism), given its downsides, primary one of which is that it > makes bisection less efficient. > > Nevertheless, this *is* the right thing to do from feature completeness > point of view. Thanks for following it through. --preserve-merges supports octopus merges, IIRC. And as I want to deprecate --preserve-merges, --rebase-merges *has* to support octopus merges, whether you or I would ever use them or not. Ciao, Dscho
Re: [PATCH 0/3] rebase -r: support octopus merges
Hi Stefan, On Wed, 11 Jul 2018, Stefan Beller wrote: > On Wed, Jul 11, 2018 at 10:35 AM Junio C Hamano wrote: > > To be honest, I am not sure if there still are people who use > > octopus > > The latest merge with more than 2 parents in linux.git is df958569dbaa > (Merge branches 'acpi-tables' and 'acpica', 2018-07-05), although > looking through the log of octopusses I get the impression that mostly > Rafael J. Wysocki is really keen on these. > :-) IMO core Git contributors seriously need to forget about using the Linux kernel repository as the gold standard when looking at how Git is used. Git is used in *so many* different scenarios, and both in terms of commits/day as well as overall repository size *and* development speed, Linux is not even in the "smack down the middle" category. Compared to what is being done with Git on a daily basis, the Linux kernel repository (and project structure) is relatively small. A much more meaningful measure would be: how many octopus merge commits have been pushed to GitHub in the past two weeks. I don't think I have the technical means to answer that question, though. In any case, the Git project is run in such a way that a feature used by even just a single user whose name happens to be Andrew Morton is declared off-limits for deprecation. When applying this context to `--rebase-merges` and Octopus merges, even a single user would be sufficient for us to support that feature. And I am sure that there are more than just a dozen users of this feature. Ciao, Dscho
Re: [PATCH 1/3] merge: allow reading the merge commit message from a file
Hi Junio, On Wed, 11 Jul 2018, Junio C Hamano wrote: > "Johannes Schindelin via GitGitGadget" > writes: > > > diff --git a/builtin/merge.c b/builtin/merge.c > > index 4a4c09496..b0e907751 100644 > > --- a/builtin/merge.c > > +++ b/builtin/merge.c > > @@ -111,6 +111,35 @@ static int option_parse_message(const struct option > > *opt, > > return 0; > > } > > > > +static int option_read_message(struct parse_opt_ctx_t *ctx, > > + const struct option *opt, int unset) > > +{ > > + struct strbuf *buf = opt->value; > > + const char *arg; > > + > > + if (unset) > > + BUG("-F cannot be negated"); > > The message "-F cannot be negated" looks as if it is pointing out a > mistake by the end user, and does not mesh well with the real reason > why this is BUG() and is not die(). > > I understand that this is BUG() not die() because options[] array > tells this callback not to be called with unset by having the > PARSE_OPT_NONEG bit there. Okay. I would have appreciated some sort of indication what you prefer instead. I went with "--no-file?!?" > > + if (ctx->opt) { > > + arg = ctx->opt; > > + ctx->opt = NULL; > > + } else if (ctx->argc > 1) { > > + ctx->argc--; > > + arg = *++ctx->argv; > > + } else > > + return opterror(opt, "requires a value", 0); > > + > > + if (buf->len) > > + strbuf_addch(buf, '\n'); > > Do we assume that buf, if it is not empty, is properly terminated > with LF already? I am wondering if the real reason we do these two > lines is to make sure we have a separating blank line between what > is already there (if there already is something) and what we add, in > which case the above would want to say > > if (buf->len) { > strbuf_complete_line(buf); > strbuf_addch(buf, '\n'); > } > > instead. True. Thanks for the suggestion! > > + if (ctx->prefix && !is_absolute_path(arg)) > > + arg = prefix_filename(ctx->prefix, arg); > > + if (strbuf_read_file(buf, arg, 0) < 0) > > + return error(_("could not read file '%s'"), arg); > > + have_message = 1; > > A similar question is what we would want to do when the file ends > with an incomplete line. With "--log", we would be appending more > stuff to buf, and we'd want to complete such an incomplete line > before that happens, either here or in the code immediately before > "--log" is processed. This is what I inserted here: strbuf_complete_line(buf); > > + > > + return 0; > > +} > > + > > static struct strategy *get_strategy(const char *name) > > { > > int i; Thanks for the review, and especially for the suggestions how to improve the code. Ciao, Dscho
Re: [PATCH v3 16/24] config: create core.multiPackIndex setting
On 7/11/2018 5:48 AM, SZEDER Gábor wrote: diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9..ab895ebb32 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -908,6 +908,10 @@ core.commitGraph:: Enable git commit graph feature. Allows reading from the commit-graph file. +core.multiPackIndex:: + Use the multi-pack-index file to track multiple packfiles using a + single index. See linkgit:technical/multi-pack-index[1]. The 'linkgit' macro should be used to create links to other man pages, but 'technical/multi-pack-index' is not a man page and this causes 'make check-docs' to complain: LINT lint-docs ./config.txt:929: nongit link: technical/multi-pack-index[1] Makefile:456: recipe for target 'lint-docs' failed make[1]: *** [lint-docs] Error 1 Thanks for this point. It seems to work using "link:technical/multi-pack-index[1]", which is what I'll use in the next version. -Stolee
Re: [PATCH 2/3] rebase --rebase-merges: add support for octopus merges
Hi Eric, On Wed, 11 Jul 2018, Eric Sunshine wrote: > On Wed, Jul 11, 2018 at 8:38 AM Johannes Schindelin via GitGitGadget > wrote: > > diff --git a/sequencer.c b/sequencer.c > > @@ -2932,7 +2966,8 @@ static int do_merge(struct commit *commit, const char > > *arg, int arg_len, > > - strbuf_addf(&buf, "Merge branch '%.*s'", > > + strbuf_addf(&buf, "Merge %s '%.*s'", > > + to_merge->next ? "branches" : "branch", > > Error messages in this file are already localizable. If this text ever > becomes localizable, then this "sentence lego" could be problematic > for translators. As pointed out by Junio, this is not really a message eligible for localization, at least not right now. If it becomes localized later, it will be very easy to change the sentence lego then. I am just wary of making life harder for the compiler that wants to verify that the printf() style parameters match the format. > > @@ -2956,28 +2991,76 @@ static int do_merge(struct commit *commit, const > > char *arg, int arg_len, > > + cmd.git_cmd = 1; > > + argv_array_push(&cmd.args, "merge"); > > + argv_array_push(&cmd.args, "-s"); > > + argv_array_push(&cmd.args, "octopus"); > > argv_array_pushl(&cmd.args, "-s", "octopus", NULL); > > which would make it clear that these two arguments must remain > together, and prevent someone from coming along and inserting a new > argument between them. I use a single `argv_array_pushl()` call now, with intuitive line wrapping. > > + argv_array_push(&cmd.args, "--no-edit"); > > + argv_array_push(&cmd.args, "--no-ff"); > > + argv_array_push(&cmd.args, "--no-log"); > > + argv_array_push(&cmd.args, "--no-stat"); > > + argv_array_push(&cmd.args, "-F"); > > + argv_array_push(&cmd.args, git_path_merge_msg()); > > Ditto: > argv_array_pushl(&cmd.args, "-L", git_path_merge_msg(), NULL); > > > diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh > > @@ -329,4 +329,38 @@ test_expect_success 'labels that are object IDs are > > rewritten' ' > > +test_expect_success 'octopus merges' ' > > + git checkout -b three && > > + test_commit before-octopus && > > + test_commit three && > > + git checkout -b two HEAD^ && > > + test_commit two && > > + git checkout -b one HEAD^ && > > + test_commit one && > > + test_tick && > > + (GIT_AUTHOR_NAME="Hank" GIT_AUTHOR_EMAIL="hank@sea.world" \ > > +git merge -m "Tüntenfüsch" two three) && > > Unnecessary subshell? Yes, I agree. I don't recall why I did it that way, probably I tried something else originally that required a subshell, or something. Thank you for helping me make the patches better. Ciao, Dscho
Re: [PATCH 2/3] rebase --rebase-merges: add support for octopus merges
Hi Junio, On Wed, 11 Jul 2018, Junio C Hamano wrote: > Eric Sunshine writes: > > >> @@ -2956,28 +2991,76 @@ static int do_merge(struct commit *commit, const > >> char *arg, int arg_len, > >> + cmd.git_cmd = 1; > >> + argv_array_push(&cmd.args, "merge"); > >> + argv_array_push(&cmd.args, "-s"); > >> + argv_array_push(&cmd.args, "octopus"); > > > > argv_array_pushl(&cmd.args, "-s", "octopus", NULL); > > > > which would make it clear that these two arguments must remain > > together, and prevent someone from coming along and inserting a new > > argument between them. > > A valid point. It is OK to break "merge" and "-s octopus" into > separate push invocations, but not "-s" and "octopus". Or perhaps > push it as a single "--strategy=octopus" argument, which would be > a better approach anyway. I had missed that in the documentation, as the synopsis does not mention it. Thank you! Dscho
Re: [PATCH v3 16/24] config: create core.multiPackIndex setting
On 7/6/2018 1:39 AM, Eric Sunshine wrote: On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee wrote: The core.multiPackIndex config setting controls the multi-pack- index (MIDX) feature. If false, the setting will disable all reads from the multi-pack-index file. Add comparison commands in t5319-multi-pack-index.sh to check typical Git behavior remains the same as the config setting is turned on and off. This currently includes 'git rev-list' and 'git log' commands to trigger several object database reads. Signed-off-by: Derrick Stolee --- diff --git a/cache.h b/cache.h @@ -814,6 +814,7 @@ extern char *git_replace_ref_base; +extern int core_multi_pack_index; diff --git a/config.c b/config.c @@ -1313,6 +1313,11 @@ static int git_default_core_config(const char *var, const char *value) + if (!strcmp(var, "core.multipackindex")) { + core_multi_pack_index = git_config_bool(var, value); + return 0; + } This is a rather unusual commit. This new configuration is assigned, but it's never actually consulted, which means that it's impossible for it to have any impact on functionality, yet tests are being added to check whether it _did_ have any impact on functionality. Confusing. Patch 17 does consult 'core_multi_pack_index', so it's only at that point that it could have any impact. This situation would be less confusing if you swapped patches 16 and 17 (and, of course, move the declaration of 'core_multi_pack_index' to patch 17 with a reasonable default value). You're right that this commit is a bit too aware of the future, but I disagree with the recommendation to change it. Yes, in this commit there is no possible way that these tests could fail. The point is that patches 17-23 all change behavior if this setting is on, and we want to make sure we do not break at any point along that journey (or in future iterations of the multi-pack-index feature). With this in mind, I don't think there is a better commit to place these tests. diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh @@ -61,12 +63,42 @@ test_expect_success 'write midx with one v1 pack' ' +midx_git_two_modes() { + git -c core.multiPackIndex=false $1 >expect && + git -c core.multiPackIndex=true $1 >actual && + test_cmp expect actual +} + +compare_results_with_midx() { + MSG=$1 + test_expect_success "check normal git operations: $MSG" ' + midx_git_two_modes "rev-list --objects --all" && + midx_git_two_modes "log --raw" + ' +} Here, you define midx_git_two_modes() and compare_results_with_midx()... test_expect_success 'write midx with one v2 pack' ' - git pack-objects --index-version=2,0x40 pack/test expect && + git -c core.multiPackIndex=true $1 >actual && + test_cmp expect actual +} + +compare_results_with_midx() { + MSG=$1 + test_expect_success "check normal git operations: $MSG" ' + midx_git_two_modes "rev-list --objects --all" && + midx_git_two_modes "log --raw" + ' +} ... and here you define both functions again. This was a mistake. Thanks for catching it. Thanks, -Stolee
Re: [PATCH 0/6] Compile cleanly in pedantic mode
Hi, On Wed, 11 Jul 2018, Junio C Hamano wrote: > Beat Bolli writes: > > > Should we add a "pedantic" flag to DEVOPTS that would simplify > > building pedantically? It would also have to set > > USE_PARENS_AROUND_GETTEXT_N so as to not overwhelm the developer with > > too much output. > > That may be something worth discussing before doing; I'd prefer to > wait until these 6 patches, plus the unsized static array one you > did separately, graduate to the 'master' branch. If this change to DEVOPTS was done as 7/7, then this would be very easy to guarantee (or as 2/2 if you want to graduate this here patch series faster than the static array forward-declaration fix). Just an idea, Dscho
Re: [PATCH v3 16/24] config: create core.multiPackIndex setting
On Thu, Jul 12, 2018 at 3:01 PM Derrick Stolee wrote: > > On 7/11/2018 5:48 AM, SZEDER Gábor wrote: > >> diff --git a/Documentation/config.txt b/Documentation/config.txt > >> index ab641bf5a9..ab895ebb32 100644 > >> --- a/Documentation/config.txt > >> +++ b/Documentation/config.txt > >> @@ -908,6 +908,10 @@ core.commitGraph:: > >> Enable git commit graph feature. Allows reading from the > >> commit-graph file. > >> > >> +core.multiPackIndex:: > >> +Use the multi-pack-index file to track multiple packfiles using a > >> +single index. See linkgit:technical/multi-pack-index[1]. > > The 'linkgit' macro should be used to create links to other man pages, > > but 'technical/multi-pack-index' is not a man page and this causes > > 'make check-docs' to complain: > > > >LINT lint-docs > >./config.txt:929: nongit link: technical/multi-pack-index[1] > >Makefile:456: recipe for target 'lint-docs' failed > >make[1]: *** [lint-docs] Error 1 > > > Thanks for this point. It seems to work using > "link:technical/multi-pack-index[1]", which is what I'll use in the next > version. It doesn't work, it merely works around the build failure. The generated man page looks like this: core.multiPackIndex Use the multi-pack-index file to track multiple packfiles using a single index. See 1[1]. And the resulting html page looks similar: core.multiPackIndex Use the multi-pack-index file to track multiple packfiles using a single index. See 1. where that "1" is a link pointing to the non-existing URL file:///home/me/src/git/Documentation/technical/multi-pack-index
Re: [PATCH v1] handle lower case drive letters on Windows
Hi Ben, On Wed, 11 Jul 2018, Ben Peart wrote: > Teach test-drop-caches to handle lower case drive letters on Windows. Maybe mention that you can trigger this by using a lower case drive letter in Powershell (CMD normalizes your path, Powershell does not). > diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c > index d6bcfddf13..37047189c3 100644 > --- a/t/helper/test-drop-caches.c > +++ b/t/helper/test-drop-caches.c > @@ -16,7 +16,7 @@ static int cmd_sync(void) > if ((0 == dwRet) || (dwRet > MAX_PATH)) > return error("Error getting current directory"); > > - if ((Buffer[0] < 'A') || (Buffer[0] > 'Z')) > + if ((toupper(Buffer[0]) < 'A') || (toupper(Buffer[0]) > 'Z')) > return error("Invalid drive letter '%c'", Buffer[0]); This is a rather incomplete test, as it does not even test for the colon after the drive letter. But I guess the more common case where this code path triggers the error is when your current directory is a UNC path to a file share (again, only possible in Powershell, CMD does not let you do that unless you play serious tricks). So maybe a much better idea is to use our `has_dos_drive_prefix()` function: if (!has_dos_drive_prefix(Buffer)) return error("'%s': invalid drive letter", Buffer); Ciao, Johannes
Re: [GSoC] GSoC with git, week 10
Hello, Sorry for the late notification, my blog is up too. https://prertik.github.io/post/week-10 Good job! I also forgot to post it here, although the blog entry has been available for days. I am sorry for that! https://ungps.github.io/ P.S.: Since the moment I wrote that text, there are no more bugs. Soon (I hope today), I will update this blog entry with a link to the Github repo. Best, Paul
Re: [PATCH v3 07/24] multi-pack-index: expand test data
On 7/6/2018 12:36 AM, Eric Sunshine wrote: On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee wrote: + for i in $(test_seq 6 10) + do + iii=$(printf '%03i' $i) + test-tool genrandom "bar" 200 >wide_delta_$iii && + test-tool genrandom "baz $iii" 50 >>wide_delta_$iii && + test-tool genrandom "foo"$i 100 >deep_delta_$iii && + test-tool genrandom "foo"$(expr $i + 1) 100 >>deep_delta_$iii && + test-tool genrandom "foo"$(expr $i + 2) 100 >>deep_delta_$iii && + echo $iii >file_$iii && + test-tool genrandom "$iii" 8192 >>file_$iii && + git update-index --add file_$iii deep_delta_$iii wide_delta_$iii && + i=$(expr $i + 1) || return 1 + done && + { echo 101 && test-tool genrandom 100 8192; } >file_101 && + git update-index --add file_101 && + tree=$(git write-tree) && + commit=$(git commit-tree $tree -p HEADobj-list2 && + git update-ref HEAD $commit +' There seems to be a fair bit of duplication in these tests which create objects. Is it possible to factor out some of this code into a shell function? In addition to the other small changes, this refactor in particular was a big change (but a good one). I'm sending my current progress in this direction, as I expect this can be improved. To make the commit_and_list_objects method more generic to all situations, I had to add an extra commit, which will cause some of the numbers to change in the later 'midx_read_expect' calls. Thanks, -Stolee -->8-- From cb38bb284fd05cf2230725b6cb9ead5795c913f2 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 31 May 2018 15:05:00 -0400 Subject: [PATCH] t5319: expand test data As we build the multi-pack-index file format, we want to test the format on real repositories. Add tests that create repository data including multiple packfiles with both version 1 and version 2 formats. The current 'git multi-pack-index write' command will always write the same file with no "real" data. This will be expanded in future commits, along with the test expectations. Signed-off-by: Derrick Stolee --- t/t5319-multi-pack-index.sh | 83 + 1 file changed, 83 insertions(+) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 2ecc369529..a50be41bc0 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -13,9 +13,92 @@ midx_read_expect () { } test_expect_success 'write midx with no packs' ' + test_when_finished rm -f pack/multi-pack-index && git multi-pack-index --object-dir=. write && test_path_is_file pack/multi-pack-index && midx_read_expect ' +generate_objects () { + i=$1 + iii=$(printf '%03i' $i) + { + test-tool genrandom "bar" 200 && + test-tool genrandom "baz $iii" 50 + } >wide_delta_$iii && + { + test-tool genrandom "foo"$i 100 && + test-tool genrandom "foo"$(( $i + 1 )) 100 && + test-tool genrandom "foo"$(( $i + 2 )) 100 + } >>deep_delta_$iii && + echo $iii >file_$iii && + test-tool genrandom "$iii" 8192 >>file_$iii && + git update-index --add file_$iii deep_delta_$iii wide_delta_$iii +} + +commit_and_list_objects () { + { + echo 101 && + test-tool genrandom 100 8192; + } >file_101 && + git update-index --add file_101 && + tree=$(git write-tree) && + commit=$(git commit-tree $tree -p HEAD+ git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\) .*/\\1/" + } >obj-list && + git reset --hard $commit +} + +test_expect_success 'create objects' ' + test_commit initial && + for i in $(test_seq 1 5) + do + generate_objects $i + done && + commit_and_list_objects +' + +test_expect_success 'write midx with one v1 pack' ' + pack=$(git pack-objects --index-version=1 pack/test + test_when_finished rm pack/test-$pack.pack pack/test-$pack.idx pack/multi-pack-index && + git multi-pack-index --object-dir=. write && + midx_read_expect +' + +test_expect_success 'write midx with one v2 pack' ' + git pack-objects --index-version=2,0x40 pack/test
Re: rev-parse --show-toplevel broken during exec'ed rebase?
Hi Vitali, [please avoid top-posting on this mailing list] On Wed, 11 Jul 2018, Vitali Lovich wrote: > On Wed, Jul 11, 2018 at 7:50 PM Vitali Lovich wrote: > > > > Typically git rev-parse --show-toplevel prints the folder containing > > the .git folder regardless what subdirectory one is in. One exception > > I have found is that if one is within the context of git rebase --exec > > then show-toplevel always just prints the current directory it's > > running from. > > > > Repro (starting with cwd within git project): > > > (cd xdiff; git rev-parse --show-toplevel) > > ... path to git repository > > > git rebase -i 18404434bf406f6a6f892ed73320c5cf9cc187dd > > # Stop at some commit for editing > > > (cd xdiff; git rev-parse --show-toplevel) > > ... path to git repository > > > git rebase 18404434bf406f6a6f892ed73320c5cf9cc187dd -x "(cd xdiff; git > > > rev-parse --show-toplevel)" > > ... path to git repository/xdiff !!! > > > > This seems like incorrect behaviour to me since it's a weird > > inconsistency (even with other rebase contexts) & the documentation > > says "Show the absolute path of the top-level directory." with no > > caveats. > > Sorry. Forgot to include the git versions I tested with (2.13.1, > 2.17.0, 2.18.0) This is actually not so much a bug in `rebase` as in `rev-parse --show-top-level`: $ GIT_DIR=$PWD/.git git -C xdiff rev-parse --show-toplevel C:/git-sdk-64/usr/src/git/xdiff Ciao, Johannes
Re: rev-parse --show-toplevel broken during exec'ed rebase?
Hi, On Thu, 12 Jul 2018, Johannes Schindelin wrote: > On Wed, 11 Jul 2018, Vitali Lovich wrote: > > > On Wed, Jul 11, 2018 at 7:50 PM Vitali Lovich wrote: > > > > > > Typically git rev-parse --show-toplevel prints the folder containing > > > the .git folder regardless what subdirectory one is in. One exception > > > I have found is that if one is within the context of git rebase --exec > > > then show-toplevel always just prints the current directory it's > > > running from. > > > > > > Repro (starting with cwd within git project): > > > > (cd xdiff; git rev-parse --show-toplevel) > > > ... path to git repository > > > > git rebase -i 18404434bf406f6a6f892ed73320c5cf9cc187dd > > > # Stop at some commit for editing > > > > (cd xdiff; git rev-parse --show-toplevel) > > > ... path to git repository > > > > git rebase 18404434bf406f6a6f892ed73320c5cf9cc187dd -x "(cd xdiff; git > > > > rev-parse --show-toplevel)" > > > ... path to git repository/xdiff !!! > > > > > > This seems like incorrect behaviour to me since it's a weird > > > inconsistency (even with other rebase contexts) & the documentation > > > says "Show the absolute path of the top-level directory." with no > > > caveats. > > > > Sorry. Forgot to include the git versions I tested with (2.13.1, > > 2.17.0, 2.18.0) > > This is actually not so much a bug in `rebase` as in `rev-parse > --show-top-level`: > > $ GIT_DIR=$PWD/.git git -C xdiff rev-parse --show-toplevel > C:/git-sdk-64/usr/src/git/xdiff And the reason for this behavior is that setting `GIT_DIR` explicitly makes Git *always* consider the current working directory to be the top-level directory: https://github.com/git/git/blob/v2.18.0/setup.c#L1061-L1063 I wonder whether changing the line (https://github.com/git/git/blob/v2.18.0/sequencer.c#L2635) argv_array_pushf(&child_env, "GIT_DIR=%s", absolute_path(get_git_dir())); to argv_array_push(&child_env, "GIT_DIR"); breaks anything (and whether it fixes the bug you demonstrated via `rev-parse --show-toplevel`). Ciao, Johannes
Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Hi Gábor, On Wed, 11 Jul 2018, SZEDER Gábor wrote: > > diff --git a/linear-assignment.c b/linear-assignment.c > > new file mode 100644 > > index 0..0b0344b5f > > --- /dev/null > > +++ b/linear-assignment.c > > @@ -0,0 +1,203 @@ > > +/* > > + * Based on: Jonker, R., & Volgenant, A. (1987). A shortest augmenting > > path > > + * algorithm for dense and sparse linear assignment problems. > > Computing, > > + * 38(4), 325-340. > > + */ > > +#include "cache.h" > > +#include "linear-assignment.h" > > + > > +#define COST(column, row) cost[(column) + column_count * (row)] > > + > > +/* > > + * The parameter `cost` is the cost matrix: the cost to assign column j to > > row > > + * i is `cost[j + column_count * i]. > > + */ > > +void compute_assignment(int column_count, int row_count, int *cost, > > + int *column2row, int *row2column) > > +{ > > [...] > > > +update: > > + /* updating of the column pieces */ > > + for (k = 0; k < last; k++) { > > + int j1 = col[k]; > > + v[j1] += d[j1] - min; > > + } > > + > > + /* augmentation */ > > + do { > > + if (j < 0) > > + BUG("negative j: %d", j); > > + i = pred[j]; > > + column2row[j] = i; > > + k = j; > > + j = row2column[i]; > > + row2column[i] = k; > > Coccinelle suggests using SWAP(j, row2column[i]) instead of the last > three lines above. > It's more idiomatic, and it avoids (ab)using the 'k' variable > (elsewhere used as loop variable) as a temporary variable. Good point. I audited the rest of the code in this file, and there are no more swap operations. Thanks, Dscho
Re: rev-parse --show-toplevel broken during exec'ed rebase?
Vitali Lovich writes: > Repro (starting with cwd within git project): >> (cd xdiff; git rev-parse --show-toplevel) > ... path to git repository >> git rebase -i 18404434bf406f6a6f892ed73320c5cf9cc187dd > # Stop at some commit for editing >> (cd xdiff; git rev-parse --show-toplevel) > ... path to git repository >> git rebase 18404434bf406f6a6f892ed73320c5cf9cc187dd -x "(cd xdiff; git >> rev-parse --show-toplevel)" > ... path to git repository/xdiff !!! > > This seems like incorrect behaviour to me since it's a weird > inconsistency (even with other rebase contexts) & the documentation > says "Show the absolute path of the top-level directory." with no > caveats. Does it reproduce with older rebase (say from v2.10 days), too? I suspect that the above is because "git rebase" is exporting GIT_DIR without exporting GIT_WORK_TREE. When the former is given without the latter, that tells Git that you are at the top-level of the working tree (if that is not the case, you also export the latter to tell Git where the top-level is). I suspect that "git rebase" before the ongoing piecemeal rewrite to C started (or scripted Porcelain in general) refrained from exporting GIT_DIR to the environment, and if my suspicion is correct, older "git rebase" would behave differently to your test case.
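To illustrate the suspicion (paths made up):

    $ cd /path/to/repo/xdiff
    $ GIT_DIR=/path/to/repo/.git git rev-parse --show-toplevel
    /path/to/repo/xdiff
    $ GIT_DIR=/path/to/repo/.git GIT_WORK_TREE=/path/to/repo git rev-parse --show-toplevel
    /path/to/repo

With only GIT_DIR exported, Git takes the current directory to be the top of the working tree, so --show-toplevel reports the subdirectory; exporting GIT_WORK_TREE as well restores the expected answer.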
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Hi Junio, On Wed, 11 Jul 2018, Junio C Hamano wrote: > Johannes Schindelin writes: > > > To summarize, there are two commits recorded for that Message-Id, the > > later one not mapped back, and neither is the correct commit that made it > > into `master`. > > > > It would be nice to figure out what went wrong there, and how to fix it > > for the future (and also to fix up the existing mis-mappings in `amlog`). > > I think what happened is that I used to have post-rewrite, but > because it did not solve the real issue of multiple commits existing > for the same message ID (either because of amending, or because of > running "am" multiple times while looking for the best base to > construct a topic branch for the series that contains it) *and* the > one that will eventually be used in the final history may not be the > last one (e.g. I may "am" twice to see if an older base I use in my > second attempt is a better one than the base I originally used, and > the patches may even apply cleanly to the older history, but may > turn out to need semantic adjustment, at which point I would discard > that second attempt and use the old commit from the first attempt > that built on a newer base), I stopped using it. > > The mid-to-commit, for it to be reliable, needs to keep mapping for > all the commits created from a single message, instead of being the > last-one-survives mapping. I just didn't have that much interest > back when I decided it was not worth it and dropped the post-rewrite, I > think. I would like to ask you to reinstate the post-rewrite hook, as it still improves the situation over the current one. Of course, it would be nice to get the automation into a shape where the mappings in `refs/notes/amlog` of commits that hit `next` are fixed, if necessary, to stop referring to commits that did not make it into `next`. Because the *concept* of `amlog` is quite useful, to put back at least *some* of the information we lost by transiting Git commits via mails without any connection to their original commits. It is still the most annoying thing when I contribute patches myself. Ciao, Dscho
[PATCH] t6036: fix broken && chain in sub-shell
Signed-off-by: Ramsay Jones --- Hi Junio, I had a test failure on 'pu' today - Eric's chain-lint series found another broken chain in one of Elijah's new tests (on the 'en/t6036-recursive-corner-cases' branch). ATB, Ramsay Jones t/t6036-recursive-corner-cases.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t6036-recursive-corner-cases.sh b/t/t6036-recursive-corner-cases.sh index 2a44acace..59e52c5a0 100755 --- a/t/t6036-recursive-corner-cases.sh +++ b/t/t6036-recursive-corner-cases.sh @@ -495,7 +495,7 @@ test_expect_success 'setup differently handled merges of directory/file conflict test_write_lines 1 2 3 4 5 6 7 8 >a && git add a && git commit -m E3 && - git tag E3 + git tag E3 && git checkout C^0 && test_must_fail git merge B^0 && -- 2.18.0
Re: [PATCH 0/6] Compile cleanly in pedantic mode
Johannes Schindelin writes: >> That may be something worth discussing before doing; I'd prefer to >> wait until these 6 patches, plus the unsized static array one you >> did separately, graduate to the 'master' branch. > > If this change to DEVOPTS was done as 7/7, then this would be very easy to > guarantee (or as 2/2 if you want to graduate this here patch series faster > than the static array forward-declaration fix). It would guarantee that there is *no* period to discuss if it is too inconvenient for developers between the time these 6 plus 1 graduate to the master branch (and willing volunteers have a chance to try it themselves) and the change actually is forced upon the developers. That is why I said I'd prefer to wait. A separate patch that depends on the 6 plus 1 topic that is kept in 'next' longer than the main series is a possibility, though.
Re: [PATCH v3 16/24] config: create core.multiPackIndex setting
On 7/12/2018 9:31 AM, SZEDER Gábor wrote: On Thu, Jul 12, 2018 at 3:01 PM Derrick Stolee wrote: On 7/11/2018 5:48 AM, SZEDER Gábor wrote: diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9..ab895ebb32 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -908,6 +908,10 @@ core.commitGraph:: Enable git commit graph feature. Allows reading from the commit-graph file. +core.multiPackIndex:: +Use the multi-pack-index file to track multiple packfiles using a +single index. See linkgit:technical/multi-pack-index[1]. The 'linkgit' macro should be used to create links to other man pages, but 'technical/multi-pack-index' is not a man page and this causes 'make check-docs' to complain: LINT lint-docs ./config.txt:929: nongit link: technical/multi-pack-index[1] Makefile:456: recipe for target 'lint-docs' failed make[1]: *** [lint-docs] Error 1 Thanks for this point. It seems to work using "link:technical/multi-pack-index[1]", which is what I'll use in the next version. It doesn't work, it merely works around the build failure. The generated man page looks like this: core.multiPackIndex Use the multi-pack-index file to track multiple packfiles using a single index. See 1[1]. And the resulting html page looks similar: core.multiPackIndex Use the multi-pack-index file to track multiple packfiles using a single index. See 1. where that "1" is a link pointing to the non-existing URL file:///home/me/src/git/Documentation/technical/multi-pack-index Right. Sorry. I also see that I use the correct kind of links in Documentation/git-multi-pack-index.txt (see below) so I will use it here, too. SEE ALSO See link:technical/multi-pack-index.html[The Multi-Pack-Index Design Document] and link:technical/pack-format.html[The Multi-Pack-Index Format] for more information on the multi-pack-index feature. Thanks, -Stolee
Re: [PATCH v3 2/2] Fix use of strategy options with interactive rebases
Hi Elijah, On Wed, 27 Jun 2018, Elijah Newren wrote: > git-rebase.sh wrote strategy options to .git/rebase/merge/strategy_opts > in the following format: > '--ours' '--renormalize' > Note the double spaces. > > git-rebase--interactive uses sequencer.c to parse that file, and > sequencer.c used split_cmdline() to get the individual strategy options. > After splitting, sequencer.c prefixed each "option" with a double dash, > so, concatenating all its options would result in: > -- --ours -- --renormalize > > So, when it ended up calling try_merge_strategy(), that in turn would run > git merge-$strategy -- --ours -- --renormalize $merge_base -- $head $remote > > instead of the expected/desired > git merge-$strategy --ours --renormalize $merge_base -- $head $remote > > Remove the extra spaces so that when it goes through split_cmdline() we end > up with the desired command line. > > Signed-off-by: Elijah Newren > --- > git-rebase.sh | 2 +- > sequencer.c| 7 ++- > t/t3418-rebase-continue.sh | 2 +- > 3 files changed, 8 insertions(+), 3 deletions(-) > > diff --git a/git-rebase.sh b/git-rebase.sh > index 19bdebb480..f3b10c7f62 100755 > --- a/git-rebase.sh > +++ b/git-rebase.sh > @@ -328,7 +328,7 @@ do > do_merge=t > ;; > --strategy-option=*) > - strategy_opts="$strategy_opts $(git rev-parse --sq-quote > "--${1#--strategy-option=}")" > + strategy_opts="$strategy_opts $(git rev-parse --sq-quote > "--${1#--strategy-option=}" | sed -e s/^.//)" Didn't you mean to use "s/^ *//" instead? > do_merge=t > test -z "$strategy" && strategy=recursive > ;; > diff --git a/sequencer.c b/sequencer.c > index 5354d4d51e..ef9237c814 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -2206,6 +2206,7 @@ static int populate_opts_cb(const char *key, const char > *value, void *data) > static void read_strategy_opts(struct replay_opts *opts, struct strbuf *buf) > { > int i; > + char *strategy_opts_string; > > strbuf_reset(buf); > if (!read_oneliner(buf, rebase_path_strategy(), 0)) > @@ -2214,7 +2215,11 @@ static void read_strategy_opts(struct replay_opts > *opts, struct strbuf *buf) > if (!read_oneliner(buf, rebase_path_strategy_opts(), 0)) > return; > > - opts->xopts_nr = split_cmdline(buf->buf, (const char ***)&opts->xopts); > + strategy_opts_string = buf->buf; > + if (*strategy_opts_string == ' ') I think that this would ideally even be a `while` instead of an `if`. > + strategy_opts_string++; > + opts->xopts_nr = split_cmdline(strategy_opts_string, > +(const char ***)&opts->xopts); > for (i = 0; i < opts->xopts_nr; i++) { > const char *arg = opts->xopts[i]; > > diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh > index 11546d6e14..c145dbac38 100755 > --- a/t/t3418-rebase-continue.sh > +++ b/t/t3418-rebase-continue.sh > @@ -74,7 +74,7 @@ test_expect_success 'rebase --continue remembers merge > strategy and options' ' > test -f funny.was.run > ' > > -test_expect_failure 'rebase -i --continue handles merge strategy and > options' ' > +test_expect_success 'rebase -i --continue handles merge strategy and > options' ' > rm -fr .git/rebase-* && > git reset --hard commit-new-file-F2-on-topic-branch && > test_commit "commit-new-file-F3-on-topic-branch-for-dash-i" F3 32 && Thanks! Dscho
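For context, a sketch of where the doubled spaces come from (output written from memory, not re-verified here): 'git rev-parse --sq-quote' emits a leading space before each quoted word, e.g.

    $ git rev-parse --sq-quote "--ours" "--renormalize"
     '--ours' '--renormalize'

and the assignment strategy_opts="$strategy_opts $(...)" then adds its own separating space on top of that leading space, which is what the sed in the fix strips off again.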
[PATCH v3] handle lower case drive letters on Windows
On Windows, if a tool calls SetCurrentDirectory with a lower case drive letter, the subsequent call to GetCurrentDirectory will return the same lower case drive letter. Powershell, for example, does not normalize the path. If that happens, test-drop-caches will error out as it does not correctly handle lower case drive letters. Helped-by: Johannes Schindelin Signed-off-by: Ben Peart --- Notes: Base Ref: master Web-Diff: https://github.com/benpeart/git/commit/1ff8de1b6c Checkout: git fetch https://github.com/benpeart/git drop-caches-v3 && git checkout 1ff8de1b6c ### Interdiff (v2..v3): diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c index 37047189c3..f65e301f9d 100644 --- a/t/helper/test-drop-caches.c +++ b/t/helper/test-drop-caches.c @@ -16,8 +16,8 @@ static int cmd_sync(void) if ((0 == dwRet) || (dwRet > MAX_PATH)) return error("Error getting current directory"); - if ((toupper(Buffer[0]) < 'A') || (toupper(Buffer[0]) > 'Z')) - return error("Invalid drive letter '%c'", Buffer[0]); + if (!has_dos_drive_prefix(Buffer)) + return error("'%s': invalid drive letter", Buffer); szVolumeAccessPath[4] = Buffer[0]; hVolWrite = CreateFile(szVolumeAccessPath, GENERIC_READ | GENERIC_WRITE, ### Patches t/helper/test-drop-caches.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c index d6bcfddf13..f65e301f9d 100644 --- a/t/helper/test-drop-caches.c +++ b/t/helper/test-drop-caches.c @@ -16,8 +16,8 @@ static int cmd_sync(void) if ((0 == dwRet) || (dwRet > MAX_PATH)) return error("Error getting current directory"); - if ((Buffer[0] < 'A') || (Buffer[0] > 'Z')) - return error("Invalid drive letter '%c'", Buffer[0]); + if (!has_dos_drive_prefix(Buffer)) + return error("'%s': invalid drive letter", Buffer); szVolumeAccessPath[4] = Buffer[0]; hVolWrite = CreateFile(szVolumeAccessPath, GENERIC_READ | GENERIC_WRITE, base-commit: e3331758f12da22f4103eec7efe1b5304a9be5e9 -- 2.17.0.gvfs.1.123.g449c066
Re: [PATCH v2] sha1-name.c: for ":/", find detached HEAD commits
On Wed, Jul 11, 2018 at 10:49:09PM -0700, William Chargin wrote: > This patch broadens the set of commits matched by ":/" pathspecs to > include commits reachable from HEAD but not any named ref. This avoids > surprising behavior when working with a detached HEAD and trying to > refer to a commit that was recently created and only exists within the > detached state. > > If multiple worktrees exist, only the current worktree's HEAD is > considered reachable. This is consistent with the existing behavior for > other per-worktree refs: e.g., bisect refs are considered reachable, but > only within the relevant worktree. > > Signed-off-by: Jeff King > Signed-off-by: William Chargin > Based-on-patch-by: Jeff King > --- > Documentation/revisions.txt | 2 +- > sha1-name.c | 1 + > t/t4208-log-magic-pathspec.sh | 26 ++ > 3 files changed, 28 insertions(+), 1 deletion(-) This one looks good to me. Thanks. -Peff
Re: [PATCH v2] git-rebase--merge: modernize "git-$cmd" to "git $cmd"
Hi Elijah, On Wed, 27 Jun 2018, Elijah Newren wrote: > Signed-off-by: Elijah Newren > --- > > Changes since v1: > - Fixed up commit message (move below comment to below diffstat as > originally intended) > > Long term I just want to make git-rebase--merge go away, so this patch > will eventually be obsoleted. But since I'm waiting for multiple > topics to merge down before re-submitting that series, and since that > series has some open questions as well, I figure it's worth > (re-)submitting this simple fix in the mean time. I have carried essentially the same patch in Git for Windows for a while now (more than a year, to be a little more precise): https://github.com/git-for-windows/git/commit/42c6f1c943a (but it seems that I either missed one when I wrote that commit, or I missed when it was introduced) There are more dashed forms in Git's code base, still, see e.g. https://github.com/git-for-windows/git/commit/4b3fc41b117 https://github.com/git-for-windows/git/commit/c47a29c373c I would *love* to see those go away. FWIW I had originally also "undashed" the use of `git-receive-pack`, but that breaks things, as the dashed form was unfortunately baked into the protocol (which is of course a design mistake even if Linus still denies it). It would go a long way to help with platforms and packaging methods where hardlinks are simply inconvenient, because we could then finally get rid of (almost) all those hardlinked builtins. Ciao, Dscho
Re: [PATCH v3 1/3] t5541: clean up truncating access log
On Thu, Jul 12, 2018 at 02:22:14PM +0200, SZEDER Gábor wrote: > diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh > index a2af693068..5c9ca19401 100755 > --- a/t/t5541-http-push-smart.sh > +++ b/t/t5541-http-push-smart.sh > @@ -38,6 +38,10 @@ GET > /smart/test_repo.git/info/refs?service=git-upload-pack HTTP/1.1 200 > POST /smart/test_repo.git/git-upload-pack HTTP/1.1 200 > EOF > test_expect_success 'no empty path components' ' > + # Clear the log, so that it does not affect the "used receive-pack > + # service" test which reads the log too. > + test_when_finished ">\"$HTTPD_ROOT_PATH\"/access.log" && I think you want to escape the $ here, too, so the eval sees the variable and not its expanded value. Other than that, this iteration looks good (and patch 3 is much easier to read). -Peff
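To make the escaping point concrete: test_when_finished records its argument and evals it when the test finishes, so whether the variable is expanded right away or at cleanup time depends on whether the dollar sign is escaped. A small sketch, not taken verbatim from the patch:

    # expanded immediately, while the argument string is built:
    test_when_finished ">\"$HTTPD_ROOT_PATH\"/access.log" &&

    # expansion deferred until the eval that runs at the end of the test:
    test_when_finished ">\"\$HTTPD_ROOT_PATH\"/access.log" &&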
[PATCH] ref-filter: mark some file-local symbols as static
Signed-off-by: Ramsay Jones --- Hi Olga, If you need to re-roll your 'ot/ref-filter-object-info' branch, could you please squash this into the relevant patch (commit c5d9a471d6, "ref-filter: use oid_object_info() to get object", 2018-07-09). [Both sparse and my static-check.pl script are complaining about the 'oi' and 'oi_deref' symbols.] Thanks! ATB, Ramsay Jones ref-filter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ref-filter.c b/ref-filter.c index 9aedf29e0..70fb15619 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -63,7 +63,7 @@ struct refname_atom { int lstrip, rstrip; }; -struct expand_data { +static struct expand_data { struct object_id oid; enum object_type type; unsigned long size; -- 2.18.0
Re: [PATCH v2] sha1-name.c: for ":/", find detached HEAD commits
William Chargin writes: > This patch broadens the set of commits matched by ":/" pathspecs to As we discussed during the review on v1, ":/" is *NOT* pathspec (that is why having these tests in t4208 is wrong but we are following existing mistakes). It is a way to specify a commit object name (unfortunately "extended SHA-1" is the phrase we often call the various ways to name an object that are implemented in sha1-name.c). > include commits reachable from HEAD but not any named ref. This avoids > surprising behavior when working with a detached HEAD and trying to > refer to a commit that was recently created and only exists within the > detached state. > > If multiple worktrees exist, only the current worktree's HEAD is > considered reachable. This is consistent with the existing behavior for > other per-worktree refs: e.g., bisect refs are considered reachable, but > only within the relevant worktree. > > Signed-off-by: Jeff King > Signed-off-by: William Chargin > Based-on-patch-by: Jeff King Now you have Peff's sign-off for the one-liner code change, the last one is redundant. > --- Other than the above two nits, the patch looks good. I would have broken the line after "including HEAD.", but the slightly-long line is also OK. > Documentation/revisions.txt | 2 +- > sha1-name.c | 1 + > t/t4208-log-magic-pathspec.sh | 26 ++ > 3 files changed, 28 insertions(+), 1 deletion(-) > > diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt > index 7d1bd4409..aa9579eba 100644 > --- a/Documentation/revisions.txt > +++ b/Documentation/revisions.txt > @@ -184,7 +184,7 @@ existing tag object. >A colon, followed by a slash, followed by a text, names >a commit whose commit message matches the specified regular expression. >This name returns the youngest matching commit which is > - reachable from any ref. The regular expression can match any part of the > + reachable from any ref, including HEAD. The regular expression can match > any part of the >commit message. To match messages starting with a string, one can use >e.g. ':/^foo'. The special sequence ':/!' is reserved for modifiers to what >is matched. ':/!-foo' performs a negative match, while ':/!!foo' matches a > diff --git a/sha1-name.c b/sha1-name.c > index 60d9ef3c7..641ca12f9 100644 > --- a/sha1-name.c > +++ b/sha1-name.c > @@ -1650,6 +1650,7 @@ static int get_oid_with_context_1(const char *name, > struct commit_list *list = NULL; > > for_each_ref(handle_one_ref, &list); > + head_ref(handle_one_ref, &list); > commit_list_sort_by_date(&list); > return get_oid_oneline(name + 2, oid, list); > } > diff --git a/t/t4208-log-magic-pathspec.sh b/t/t4208-log-magic-pathspec.sh > index 62f335b2d..4c8f3b8e1 100755 > --- a/t/t4208-log-magic-pathspec.sh > +++ b/t/t4208-log-magic-pathspec.sh > @@ -25,6 +25,32 @@ test_expect_success '"git log :/a -- " should not be > ambiguous' ' > git log :/a -- > ' > > +test_expect_success '"git log :/detached -- " should find a commit only in > HEAD' ' > + test_when_finished "git checkout master" && > + git checkout --detach && > + # Must manually call `test_tick` instead of using `test_commit`, > + # because the latter additionally creates a tag, which would make > + # the commit reachable not only via HEAD. 
> + test_tick && > + git commit --allow-empty -m detached && > + test_tick && > + git commit --allow-empty -m something-else && > + git log :/detached -- > +' > + > +test_expect_success '"git log :/detached -- " should not find an orphaned > commit' ' > + test_must_fail git log :/detached -- > +' > + > +test_expect_success '"git log :/detached -- " should find HEAD only of own > worktree' ' > + git worktree add other-tree HEAD && > + git -C other-tree checkout --detach && > + test_tick && > + git -C other-tree commit --allow-empty -m other-detached && > + git -C other-tree log :/other-detached -- && > + test_must_fail git log :/other-detached -- > +' > + > test_expect_success '"git log -- :/a" should not be ambiguous' ' > git log -- :/a > '
Re: [PATCH v3 06/24] multi-pack-index: load into memory
On 7/9/2018 3:08 PM, Junio C Hamano wrote: Derrick Stolee writes: diff --git a/midx.h b/midx.h index dbdbe9f873..2d83dd9ec1 100644 --- a/midx.h +++ b/midx.h @@ -1,6 +1,10 @@ #ifndef __MIDX_H__ #define __MIDX_H__ +struct multi_pack_index; I actually was quite surprised that this struct is defined in object-store.h and not here. It feels the other way around. The raw_object_store needs to know that such an in-core structure might exist as an optional feature in an object store, but as an optional feature, I suspect that it has a pointer to an instance of multi_pack_index, instead of embedding the struct itself in it, so I would have expected to see an "I am only telling you that there is a struct with this name, but I am leaving it opaque as you do not have any business looking inside the struct yourself. You only need to be aware of the type's existence and a pointer to it so that you can call helpers that know what's inside and that should be sufficient for your needs." decl like this in object-store.h and instead an actual implementation in here. I thought it natural to include the struct definition next to the definition for 'struct packed_git', but I like your separation of concerns. Perhaps we could move the packed_git definition to packfile.h as well (separately). Of course, that sounds like an unnecessary churn. Thanks, -Stolee
Re: [PATCH v2] sha1-name.c: for ":/", find detached HEAD commits
> As we discussed during the review on v1, ":/" > is *NOT* pathspec (that is why having these tests in t4208 is wrong > but we are following existing mistakes). Ah, I understand the terminology better now. Thanks. I'll change the commit message wording to use "extended SHA-1s" instead of "pathspecs". > Now you have Peff's sign-off for the one-liner code change, the last > one is redundant. Okay: I'll remove the "Based-on-patch-by" line. > Other than the above two nits, the patch looks good. I would have > broken the line after "including HEAD.", but the slightly-long line > is also OK. Happy to change this while making the above edits. :-) Best, WC
Re: [PATCH 1/3] merge: allow reading the merge commit message from a file
Johannes Schindelin writes: >> > +static int option_read_message(struct parse_opt_ctx_t *ctx, >> > + const struct option *opt, int unset) >> > +{ >> > + struct strbuf *buf = opt->value; >> > + const char *arg; >> > + >> > + if (unset) >> > + BUG("-F cannot be negated"); >> >> The message "-F cannot be negated" looks as if it is pointing out a >> mistake by the end user, and does not mesh well with the real reason >> why this is BUG() and is not die(). >> >> I understand that this is BUG() not die() because options[] array >> tells this callback not to be called with unset by having the >> PARSE_OPT_NONEG bit there. > > Okay. I would have appreciated some sort of indication what you prefer > instead. I went with "--no-file?!?" I have no strong preference; anything is OK as long as the message is unique and points reading developer in the right direction, and "--no-file?!?" signals quite strongly that the code is not expected that it has to handle that option at this point (instead, it expects somebody else has dealt with it), so it sounds fine. I think doing all of these inside parse_options callback means that you can have "merge -F file1 -F file2" and slurp contents from both files as separate paragraphs. I briefly wondered if --no-file is something the end user might want to be able to use to discard what has been read so far, but "merge -m msg -F file --no-file" would have to discard everything, not just what we read from the file, so it would not be useful with the structure of the message assembly we have today, which this code builds on. >> > + if (ctx->opt) { >> > + arg = ctx->opt; >> > + ctx->opt = NULL; >> > + } else if (ctx->argc > 1) { >> > + ctx->argc--; >> > + arg = *++ctx->argv; >> > + } else >> > + return opterror(opt, "requires a value", 0); >> > + >> > + if (buf->len) >> > + strbuf_addch(buf, '\n'); >> >> Do we assume that buf, if it is not empty, is properly terminated >> with LF already? I am wondering if the real reason we do these two >> lines is to make sure we have a separating blank line between what >> is already there (if there already is something) and what we add, in >> which case the above would want to say >> >> if (buf->len) { >> strbuf_complete_line(buf); >> strbuf_addch(buf, '\n'); >> } >> >> instead. > > True. Thanks for the suggestion! > >> > + if (ctx->prefix && !is_absolute_path(arg)) >> > + arg = prefix_filename(ctx->prefix, arg); >> > + if (strbuf_read_file(buf, arg, 0) < 0) >> > + return error(_("could not read file '%s'"), arg); >> > + have_message = 1; >> >> A similar question is what we would want to do when the file ends >> with an incomplete line. With "--log", we would be appending more >> stuff to buf, and we'd want to complete such an incomplete line >> before that happens, either here or in the code immediately before >> "--log" is processed. > > This is what I inserted here: > > strbuf_complete_line(buf); I had a slight suspicion that completing immediately before we append anything in a later step in the codepath would be safer. When we get a complaint: 'merge -F file' when I am not using '--log' or adding sign-off, adds an extra newline at the end when I deliberately give a file that ends with an incomplete line for such and such reasons. I do not think I would have a good argument why the then-current behaviour is not a bug but an intended behaviour. 
And I do not think the fact that I am unable to fill "such and such" above means such a complaint is nonsense---it merely indicates that I lack imagination and that I am not thinking enough to accommodate other people's needs.
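A user-visible sketch of the incomplete-line case under discussion, using the -F option this series proposes (the file and branch names are made up for illustration):

    printf 'Merge feature X' >msg      # deliberately no terminating newline
    git merge -F msg --log --no-ff topic

With a line-completion step somewhere before the --log material is appended, the shortlog starts on its own line; without it, whatever gets appended would continue the unterminated last line of the file.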
Re: [PATCH 0/3] rebase -r: support octopus merges
Johannes Schindelin writes: > Git is used in *so many* different scenarios, and both in terms of > commits/day as well as overall repository size *and* development speed, That misses another important factor, though. How well the project uses the tool, iow, how canonical should its way of using the tool be considered and encouraged to be imitated by others. > A much more meaningful measure would be: how many octopus merge commits > have been pushed to GitHub in the past two weeks. I don't think I have the > technical means to answer that question, though. It does not mean that misusing a feature is a good thing and should be encouraged if many misguided people do so.
Re: [PATCH v3 16/24] config: create core.multiPackIndex setting
On 7/12/2018 9:19 AM, Derrick Stolee wrote: On 7/6/2018 1:39 AM, Eric Sunshine wrote: On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee wrote: The core.multiPackIndex config setting controls the multi-pack- index (MIDX) feature. If false, the setting will disable all reads from the multi-pack-index file. Add comparison commands in t5319-multi-pack-index.sh to check typical Git behavior remains the same as the config setting is turned on and off. This currently includes 'git rev-list' and 'git log' commands to trigger several object database reads. Signed-off-by: Derrick Stolee --- diff --git a/cache.h b/cache.h @@ -814,6 +814,7 @@ extern char *git_replace_ref_base; +extern int core_multi_pack_index; diff --git a/config.c b/config.c @@ -1313,6 +1313,11 @@ static int git_default_core_config(const char *var, const char *value) + if (!strcmp(var, "core.multipackindex")) { + core_multi_pack_index = git_config_bool(var, value); + return 0; + } This is a rather unusual commit. This new configuration is assigned, but it's never actually consulted, which means that it's impossible for it to have any impact on functionality, yet tests are being added to check whether it _did_ have any impact on functionality. Confusing. Patch 17 does consult 'core_multi_pack_index', so it's only at that point that it could have any impact. This situation would be less confusing if you swapped patches 16 and 17 (and, of course, move the declaration of 'core_multi_pack_index' to patch 17 with a reasonable default value). You're right that this commit is a bit too aware of the future, but I disagree with the recommendation to change it. Yes, in this commit there is no possible way that these tests could fail. The point is that patches 17-23 all change behavior if this setting is on, and we want to make sure we do not break at any point along that journey (or in future iterations of the multi-pack-index feature). With this in mind, I don't think there is a better commit to place these tests. Of course, as I convert this global config variable into an on-demand check as promised [1] this commit seems even more trivial. I'm going to squash it with PATCH 17. [1] https://public-inbox.org/git/b5733625-29c8-4317-ff44-d27c2fca1...@gmail.com/
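The comparison-style tests mentioned in the commit message boil down to running the same read-only commands with the setting off and on and checking that the output agrees. A sketch of that shape (not a copy of t5319):

    git config core.multiPackIndex false &&
    git rev-list --objects --all >expect &&
    git config core.multiPackIndex true &&
    git rev-list --objects --all >actual &&
    test_cmp expect actual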
Re: [PATCH v3 15/24] midx: write object offsets
On 7/6/2018 1:27 AM, Eric Sunshine wrote: On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee wrote: The final pair of chunks for the multi-pack-index file stores the object offsets. We default to using 32-bit offsets as in the pack-index version 1 format, but if there exists an offset larger than 32-bits, we use a trick similar to the pack-index version 2 format by storing all offsets at least 2^31 in a 64-bit table; we use the 32-bit table to point into that 64-bit table as necessary. We only store these 64-bit offsets if necessary, so create a test that manipulates a version 2 pack-index to fake a large offset. This allows us to test that the large offset table is created, but the data does not match the actual packfile offsets. The multi-pack-index offset does match the (corrupted) pack-index offset, so a future feature will compare these offsets during a 'verify' step. Signed-off-by: Derrick Stolee --- diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh @@ -6,25 +6,28 @@ test_description='multi-pack-indexes' +# usage: corrupt_data [] +corrupt_data() { Style: corrupt_data () { + file=$1 + pos=$2 + data="${3:-\0}" + printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc +} + +# Force 64-bit offsets by manipulating the idx file. +# This makes the IDX file _incorrect_ so be careful to clean up after! +test_expect_success 'force some 64-bit offsets with pack-objects' ' + mkdir objects64 && + mkdir objects64/pack && + pack64=$(git pack-objects --index-version=2,0x40 objects64/pack/test-64 I guess you don't have to worry about the POSIXPERM prerequisite here because the file is already writable on Windows, right? Correct. And I want this test to still run on Windows. Thanks, -Stolee
Re: [PATCH v1] handle lower case drive letters on Windows
Johannes Schindelin writes: > So maybe a much better idea is to use our `has_dos_drive_prefix()` > function: > > if (!has_dos_drive_prefix(Buffer)) > return error("'%s': invalid drive letter", Buffer); "'%s': path without valid drive prefix", you mean?
Re: [PATCH v3 0/6] Object store refactoring: commit graph
Jonathan Tan writes: > This is on _both_ ds/commit-graph-fsck and sb/object-store-lookup, > following Stolee's suggestion. > > (It also seems better to build it this way to me, since both these > branches are going into "next" according to the latest What's Cooking.) OK. I am perfectly OK if this left lookup_commit() with older function signature and instead asked the integrator to fix up with evil merge with sb/object-store-lookup, but given that this topic is no more urgent than the other one, making them intertwined is fine. Merging ds/commit-graph-fsck into sb/object-store-lookup already requires an evil merge to build correctly due to this new parameter. I really wish we did the more canonical "do not change signature of a widely used function. If the widely used one is a mere specialization that calls the new one with the default parameter, make that widely used one a thin wraper (or a macro) of the new one" approach to avoid having to repeatedly create such pointless evil merges X-<. > Makefile | 1 + > builtin/commit-graph.c | 2 + > builtin/fsck.c | 2 +- > cache.h| 1 - > commit-graph.c | 108 + > commit-graph.h | 11 ++-- > commit.c | 6 +-- > config.c | 5 -- > environment.c | 1 - > object-store.h | 6 +++ > object.c | 5 ++ > ref-filter.c | 2 +- > t/helper/test-repository.c | 82 > t/helper/test-tool.c | 1 + > t/helper/test-tool.h | 1 + > t/t5318-commit-graph.sh| 35 > 16 files changed, 207 insertions(+), 62 deletions(-) > create mode 100644 t/helper/test-repository.c
Re: [PATCH v2 01/10] t/test-lib: teach --chain-lint to detect broken &&-chains in subshells
On Thu, Jul 12, 2018 at 06:50:20AM -0400, Eric Sunshine wrote: > And, perhaps most important: We're not tied indefinitely to the > "subset" implemented by the current linter. If it is indeed found to > be too strict or limiting, it can always be loosened or retired > altogether. Yeah, I agree this is the key point. Like Junio, I'm a little nervous that this is going to end up being a maintenance burden. People may hit false positives and then be confronted with this horrible mass of sed to try to figure out what went wrong (which isn't to bust on your sed in particular; I think you made a heroic effort in commenting). But I came around to thinking: - this found and fixed real problems in the test suite, with minimal false positives across the existing code - it's being done by a long-time contributor, not somebody who is going to dump sed on us and leave - worst case is that relief is only a "git revert" away So I'm OK with merging it, and even running it by default. -Peff
Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems
Johannes Schindelin writes: > I would like to ask you to reinstate the post-rewrite hook, as it still > improves the situation over the current one. Without post-rewrite I seem to be getting correct amlog entries for commits created by "git rebase"; does our rebase--am backend still trigger the post-applypatch hook in its "am" phase, where it applies the patches created with "format-patch"?
Re: [PATCH] ref-filter: mark some file-local symbols as static
2018-07-12 18:57 GMT+03:00 Ramsay Jones : > > Signed-off-by: Ramsay Jones > --- > > Hi Olga, > > If you need to re-roll your 'ot/ref-filter-object-info' branch, > could you please squash this into the relevant patch (commit c5d9a471d6, > "ref-filter: use oid_object_info() to get object", 2018-07-09). > > [Both sparse and my static-check.pl script are complaining about > the 'oi' and 'oi_deref' symbols.] You are right. Thanks a lot! > > Thanks! > > ATB, > Ramsay Jones > > ref-filter.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/ref-filter.c b/ref-filter.c > index 9aedf29e0..70fb15619 100644 > --- a/ref-filter.c > +++ b/ref-filter.c > @@ -63,7 +63,7 @@ struct refname_atom { > int lstrip, rstrip; > }; > > -struct expand_data { > +static struct expand_data { > struct object_id oid; > enum object_type type; > unsigned long size; > -- > 2.18.0
Re: [PATCH 0/2] Fix --rebase-merges with custom commentChar
Aaron Schrab writes: > Subject: [PATCH v2] sequencer: use configured comment character > > Use the configured comment character when generating comments about > branches in a todo list. Failure to honor this configuration causes a > failure to parse the resulting todo list. OK. > > Note that the comment_line_char has already been resolved by this point, > even if the user has configured the comment character to be selected > automatically. Isn't this a slight lie? The core.commentchar=auto setting is noticed by everybody (including the users of the sequencer machinery), but it is honored only by builtin/commit.c::prepare_to_commit() that is called by builtin/commit.c::cmd_commit(), i.e. the implementation of "git commit" that should not be used as a subroutine by other commands, and by nothing else. If the user has core.commentchar=auto, the comment_line_char is left to the default '#' in the sequencer codepath. I think the patch is still correct and safe, but the reason why it is so is not because we chose a suitable character (that is how I read what "has already been resolved by this point" means) by calling builtin/commit.c::adjust_comment_line_char(). Isn't it because the "script" the function is working on does not have a line that came from arbitrary end-user input that may happen to begin with '#', hence the default '#' is safe to use? > Signed-off-by: Aaron Schrab > --- > sequencer.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/sequencer.c b/sequencer.c > index 4034c0461b..caf91af29d 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -3991,7 +3991,7 @@ static int make_script_with_merges(struct > pretty_print_context *pp, > entry = oidmap_get(&state.commit2label, &commit->object.oid); > > if (entry) > - fprintf(out, "\n# Branch %s\n", entry->string); > + fprintf(out, "\n%c Branch %s\n", comment_line_char, > entry->string); > else > fprintf(out, "\n");
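For anyone trying to reproduce the failure being fixed: on a history that contains merges, generating the todo list with a non-default comment character along these lines (the base commit is a placeholder) produces "# Branch ..." lines that the parser, expecting ";", then refuses:

    git -c core.commentChar=";" rebase -i --rebase-merges <base-commit>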
Re: [PATCH] sequencer.c: terminate the last line of author-script properly
"Akinori MUSHA" writes: > It looks like write_author_script() intends to write out a file in > Bourne shell syntax, but it doesn't put a closing single quote on the > last line. s/closing single quote/& and the terminating newline/? > > This patch makes .git/rebase-merge/author-script actually parsable by > sh(1) by adding a single quote and a linefeed to terminate the line > properly. Sounds good. I wonder why this breakage was left unnoticed for a long time, though. It's not like writing and reading the author-script from C code was done first in the "rebase -i" and friends that are users of the sequencer machinery (I think we had code to do so in "git am" that was rewritten in C first). Do we have a similar issue over there as well? If not, perhaps if we reused the existing code that was not broken, we wouldn't have seen this breakage on the sequencer side? Thanks. > > Signed-off-by: Akinori MUSHA > --- > sequencer.c | 1 + > t/t3404-rebase-interactive.sh | 13 + > 2 files changed, 14 insertions(+) > > diff --git a/sequencer.c b/sequencer.c > index 4034c0461..5f32b6df1 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -651,6 +651,7 @@ static int write_author_script(const char *message) > strbuf_addch(&buf, *(message++)); > else > strbuf_addf(&buf, "'%c'", *(message++)); > + strbuf_addstr(&buf, "'\n"); > res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1); > strbuf_release(&buf); > return res; > diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh > index 352a52e59..345b103eb 100755 > --- a/t/t3404-rebase-interactive.sh > +++ b/t/t3404-rebase-interactive.sh > @@ -75,6 +75,19 @@ test_expect_success 'rebase --keep-empty' ' > test_line_count = 6 actual > ' > > +test_expect_success 'rebase -i writes out .git/rebase-merge/author-script in > "edit" that sh(1) can parse' ' > + test_when_finished "git rebase --abort ||:" && > + git checkout master && > + set_fake_editor && > + FAKE_LINES="edit 1" git rebase -i HEAD^ && > + test -f .git/rebase-merge/author-script && > + unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE && > + eval "$(cat .git/rebase-merge/author-script)" && > + test "$(git show --quiet --pretty=format:%an)" = "$GIT_AUTHOR_NAME" && > + test "$(git show --quiet --pretty=format:%ae)" = "$GIT_AUTHOR_EMAIL" && > + test "$(git show --quiet --date=raw --pretty=format:@%ad)" = > "$GIT_AUTHOR_DATE" > +' > + > test_expect_success 'rebase -i with the exec command' ' > git checkout master && > ( > -- > 2.18.0
Re: [PATCH v3 16/24] config: create core.multiPackIndex setting
SZEDER Gábor writes: >> Thanks for this point. It seems to work using >> "link:technical/multi-pack-index[1]", which is what I'll use in the next >> version. > > It doesn't work, it merely works around the build failure. Sorry. I fell into the same trap X-<. link:technical/multi-pack-index.html[the technical documentation for it] or something like that, perhaps.
Re: rev-parse --show-toplevel broken during exec'ed rebase?
On Thu, Jul 12, 2018 at 8:23 AM Junio C Hamano wrote: > > Vitali Lovich writes: > > > Repro (starting with cwd within git project): > >> (cd xdiff; git rev-parse --show-toplevel) > > ... path to git repository > >> git rebase -i 18404434bf406f6a6f892ed73320c5cf9cc187dd > > # Stop at some commit for editing > >> (cd xdiff; git rev-parse --show-toplevel) > > ... path to git repository > >> git rebase 18404434bf406f6a6f892ed73320c5cf9cc187dd -x "(cd xdiff; git > >> rev-parse --show-toplevel)" > > ... path to git repository/xdiff !!! > > > > This seems like incorrect behaviour to me since it's a weird > > inconsistency (even with other rebase contexts) & the documentation > > says "Show the absolute path of the top-level directory." with no > > caveats. > > Does it reproduce with older rebase (say from v2.10 days), too? > > I suspect that the above is because "git rebase" is exporting > GIT_DIR without exporting GIT_WORK_TREE. When the former is given > without the latter, that tells Git that you are at the top-level of > the working tree (if that is not the case, you also export the > latter to tell Git where the top-level is). > > I suspect that "git rebase" before the ongoing piecemeal rewrite to > C started (or scripted Porcelain in general) refrained from > exporting GIT_DIR to the environment, and if my suspicion is correct, > older "git rebase" would behave differently to your test case. Unfortunately I don't have an older git version handy to test this out.
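A quick way to check Junio's suspicion from a subdirectory of any repository (the paths below are placeholders, not from the report):

    cd /path/to/repo/xdiff &&
    git rev-parse --show-toplevel
    # expected: /path/to/repo

    GIT_DIR=/path/to/repo/.git git rev-parse --show-toplevel
    # expected: /path/to/repo/xdiff, since GIT_DIR alone tells Git that
    # the current directory is the top level

    GIT_DIR=/path/to/repo/.git GIT_WORK_TREE=/path/to/repo \
        git rev-parse --show-toplevel
    # expected: /path/to/repo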
Re: [PATCH v3 0/6] Object store refactoring: commit graph
On 7/11/2018 6:42 PM, Jonathan Tan wrote: This is on _both_ ds/commit-graph-fsck and sb/object-store-lookup, following Stolee's suggestion. (It also seems better to build it this way to me, since both these branches are going into "next" according to the latest What's Cooking.) Junio wrote in [1]: I've added SQUASH??? patch at the tip of each of the above, rebuilt 'pu' with them and pushed the result out. It seems that Travis is happier with the result. Please do not forget to squash them in when/if rerolling. If there is no need to change anything else other than squashing them, you can tell me to which commit in your series the fix needs to be squashed in (that would save me time to figure it out, obviously). I'm rerolling because I also need to update the last patch with the new lookup_commit() function signature that Stefan's sb/object-store-lookup introduces. I have squashed the SQUASH??? patch into the corresponding patch in this patch set. Changes from v2: - now also based on sb/object-store-lookup in addition to ds/commit-graph-fsck (I rebased ds/commit-graph-fsck onto sb-object-store-lookup, then rebased this patch set onto the result) - patches 1-5 are unchanged - patch 6: - used "PRItime" instead of "ul" when printing a timestamp (the SQUASH??? patch) - updated invocations of lookup_commit() to take a repository object [1] https://public-inbox.org/git/xmqqpnzt1myi@gitster-ct.c.googlers.com/ I re-read the patch series and think you addressed all feedback. I have no more to add. Reviewed-by: Derrick Stolee
git@vger.kernel.org
On Thu, Jul 12, 2018 at 8:37 AM SZEDER Gábor wrote: > The change below should be squashed into this patch to fix a > previously unnoticed broken &&-chain. I think you missed it, because > this test script is rather expensive and you didn't run it with > GIT_TEST_CLONE_2GB=YesPlease. > > diff --git a/t/t5608-clone-2gb.sh b/t/t5608-clone-2gb.sh > @@ -23,7 +23,7 @@ test_expect_success CLONE_2GB 'setup' ' > - echo "M 100644 :$i $i" >> commit > + echo "M 100644 :$i $i" >> commit && Thanks for finding this. I tried to get as much coverage as possible by installing packages I don't normally have installed (Apache, cvs, cvsps, Subversion, Perforce, etc.) and even temporarily modified a script or two to force it run when I simply couldn't meet some prerequisite, thus reducing the "skipped" messages to a minimum, but I wasn't even aware of this prerequisite since I never saw a "skipped" message for it. Looking at it more closely, I think the reason it didn't come to my attention is that this script doesn't use the standard skip_all="..." mechanism for skipping the tests but instead "rolls its own", and apparently 'prove' simply swallowed (or was unable to produce) an overall "skipped" message for this script.
Re: [PATCH v3 07/24] multi-pack-index: expand test data
On Thu, Jul 12, 2018 at 10:10 AM Derrick Stolee wrote: > On 7/6/2018 12:36 AM, Eric Sunshine wrote: > > There seems to be a fair bit of duplication in these tests which > > create objects. Is it possible to factor out some of this code into a > > shell function? > > In addition to the other small changes, this refactor in particular was > a big change (but a good one). I'm sending my current progress in this > direction, as I expect this can be improved. I like the amount of code reduction. A couple minor comments... > +generate_objects () { > + i=$1 > + iii=$(printf '%03i' $i) > + { > + test-tool genrandom "bar" 200 && > + test-tool genrandom "baz $iii" 50 > + } >wide_delta_$iii && > + { > + test-tool genrandom "foo"$i 100 && > + test-tool genrandom "foo"$(( $i + 1 )) 100 && > + test-tool genrandom "foo"$(( $i + 2 )) 100 > + } >>deep_delta_$iii && I think this should be: s/>>/>/ > + echo $iii >file_$iii && > + test-tool genrandom "$iii" 8192 >>file_$iii && And this: s/>>/>/ > + git update-index --add file_$iii deep_delta_$iii wide_delta_$iii > +}
Re: [PATCH v3 07/24] multi-pack-index: expand test data
On 7/12/2018 2:02 PM, Eric Sunshine wrote: On Thu, Jul 12, 2018 at 10:10 AM Derrick Stolee wrote: On 7/6/2018 12:36 AM, Eric Sunshine wrote: There seems to be a fair bit of duplication in these tests which create objects. Is it possible to factor out some of this code into a shell function? In addition to the other small changes, this refactor in particular was a big change (but a good one). I'm sending my current progress in this direction, as I expect this can be improved. I like the amount of code reduction. A couple minor comments... +generate_objects () { + i=$1 + iii=$(printf '%03i' $i) + { + test-tool genrandom "bar" 200 && + test-tool genrandom "baz $iii" 50 + } >wide_delta_$iii && + { + test-tool genrandom "foo"$i 100 && + test-tool genrandom "foo"$(( $i + 1 )) 100 && + test-tool genrandom "foo"$(( $i + 2 )) 100 + } >>deep_delta_$iii && I think this should be: s/>>/>/ It should! + echo $iii >file_$iii && + test-tool genrandom "$iii" 8192 >>file_$iii && And this: s/>>/>/ In addition, I should wrap these two commands in { } like the files above. Thanks, -Stolee
Re: [GSoC][PATCH v3 12/13] rebase -i: implement the logic to initialize the variable $revision in C
Alban Gruin writes: > @@ -50,6 +71,13 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > N_("prepare the branch to be rebased"), > PREPARE_BRANCH), > OPT_CMDMODE(0, "complete-action", &command, > N_("complete the action"), COMPLETE_ACTION), > + OPT_STRING(0, "onto", &onto, N_("onto"), N_("onto")), > + OPT_STRING(0, "restrict-revision", &restrict_revisions, > +N_("restrict-revision"), N_("restrict revision")), > + OPT_STRING(0, "squash-onto", &squash_onto, N_("squash-onto"), > +N_("squash onto")), > + OPT_STRING(0, "upstream", &upstream, N_("upstream"), > +N_("the upstream commit")), > OPT_END() > }; > > @@ -77,8 +105,30 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > return !!sequencer_continue(&opts); > if (command == ABORT && argc == 1) > return !!sequencer_remove_state(&opts); > - if (command == MAKE_SCRIPT && argc > 1) > - return !!sequencer_make_script(stdout, argc, argv, flags); > + if (command == MAKE_SCRIPT && (argc == 1 || argc == 2)) { Sorry but I am confused. The call to rebase--helper --make-script in git-rebase--interactive.sh we see below has only dashed options (like --upstream, --onto, --squash-onto, --restrict-revision) and the option parameters given to these options. What are the remaining 1 or 2 arguments we are processing here? Well, actually argc==1 means there is nothing else, so I am asking about the case where argc==2. > + char *revisions = NULL; > + struct argv_array make_script_args = ARGV_ARRAY_INIT; > + > + if (!upstream && squash_onto) > + write_file(path_squash_onto(), "%s\n", squash_onto); > + > + ret = get_revision_ranges(upstream, onto, &head_hash, > &revisions); > + if (ret) > + return ret; > + > + argv_array_pushl(&make_script_args, "", revisions, NULL); > + if (argc == 2) > + argv_array_push(&make_script_args, restrict_revisions); > + > + ret = !!sequencer_make_script(stdout, > + make_script_args.argc, > make_script_args.argv, > + flags); Exactly the same comment on !! as an earlier step applies here. > + free(revisions); > + argv_array_clear(&make_script_args); If I am reading the body of this if() block correctly, I think it does everything init_revisions_and_shortrevisions shell function does, i.e. compute $revisions for both cases with or without upstream and write squash-onto state if needed, so that we can call the sequencer_make_script() helper with necessary $revisions arg without being passed from the command line of --make-script helper. But the hunk below tells me that you are still calling init_revisions_and_shortrevisions shell function before we are called. Why? IOW, why isn't this step removing that shell function *and* the call to it, now its logic is fully implemented in the body of this if() block? > + return ret; > + } > if (command == CHECK_TODO_LIST && argc == 1) > return !!check_todo_list(); > if (command == REARRANGE_SQUASH && argc == 1) Thanks. 
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh > index 0d66c0f8b..4ca47aed1 100644 > --- a/git-rebase--interactive.sh > +++ b/git-rebase--interactive.sh > @@ -92,7 +92,9 @@ git_rebase__interactive () { > git rebase--helper --make-script ${keep_empty:+--keep-empty} \ > ${rebase_merges:+--rebase-merges} \ > ${rebase_cousins:+--rebase-cousins} \ > - $revisions ${restrict_revision+^$restrict_revision} >"$todo" || > + ${upstream:+--upstream "$upstream"} ${onto:+--onto "$onto"} \ > + ${squash_onto:+--squash-onto "$squash_onto"} \ > + ${restrict_revision:+--restrict-revision ^"$restrict_revision"} > >"$todo" || > die "$(gettext "Could not generate todo list")" > > exec git rebase--helper --complete-action "$shortrevisions" > "$onto_name" \
git@vger.kernel.org
On Thu, Jul 12, 2018 at 01:44:49PM -0400, Eric Sunshine wrote: > On Thu, Jul 12, 2018 at 8:37 AM SZEDER Gábor wrote: > > The change below should be squashed into this patch to fix a > > previously unnoticed broken &&-chain. I think you missed it, because > > this test script is rather expensive and you didn't run it with > > GIT_TEST_CLONE_2GB=YesPlease. > > > > diff --git a/t/t5608-clone-2gb.sh b/t/t5608-clone-2gb.sh > > @@ -23,7 +23,7 @@ test_expect_success CLONE_2GB 'setup' ' > > - echo "M 100644 :$i $i" >> commit > > + echo "M 100644 :$i $i" >> commit && > > Thanks for finding this. I tried to get as much coverage as possible > by installing packages I don't normally have installed (Apache, cvs, > cvsps, Subversion, Perforce, etc.) and even temporarily modified a > script or two to force it run when I simply couldn't meet some > prerequisite, thus reducing the "skipped" messages to a minimum, but I > wasn't even aware of this prerequisite since I never saw a "skipped" > message for it. I think in theory we should be able to lint _every_ test, even if we're not running it. After all, the point is that a proper linting runs zero commands. That said, it may not be worth the implementation effort. The linting happens at a pretty low-level, and we've already decided to skip tests long before then (even for single prereqs, let alone skip_all cases where we exit the script early). > Looking at it more closely, I think the reason it didn't come to my > attention is that this script doesn't use the standard skip_all="..." > mechanism for skipping the tests but instead "rolls its own", and > apparently 'prove' simply swallowed (or was unable to produce) an > overall "skipped" message for this script. Yeah, that is a bit funny. For a whole-test skip like this, I think skip_all is easier to read, as it is immediately apparent to the reader that nothing that _doesn't_ meet that prereq should be in the file. So I'd be happy to see it switched (though it's not _that_ big a deal, so leaving it is fine, too). By the way, "prove --directives" can help with finding individual skipped tests. -Peff
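For comparison, the usual skip_all convention the script could switch to looks roughly like this (the message text and the way the prerequisite variable is checked are illustrative only):

    . ./test-lib.sh

    if test -z "$GIT_TEST_CLONE_2GB"
    then
        skip_all='expensive 2GB clone test; set GIT_TEST_CLONE_2GB=YesPlease to run it'
        test_done
    fi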
Re: [GSoC][PATCH v3 12/13] rebase -i: implement the logic to initialize the variable $revision in C
Junio C Hamano writes: > If I am reading the body of this if() block correctly, I think it > does everything init_revisions_and_shortrevisions shell function > does, i.e. compute $revisions for both cases with or without > upstream and write squash-onto state if needed, so that we can call > the sequencer_make_script() helper with necessary $revisions arg > without being passed from the command line of --make-script helper. > > But the hunk below tells me that you are still calling > init_revisions_and_shortrevisions shell function before we are > called. Why? IOW, why isn't this step removing that shell function > *and* the call to it, now its logic is fully implemented in the body > of this if() block? You can ignore this part (but not the rest) of my comments, as 13/13 answers it adequately. After this step, the shell function still needs to be called to set $shortrevisions. Thanks.
git@vger.kernel.org
Eric Sunshine writes: > On Thu, Jul 12, 2018 at 8:37 AM SZEDER Gábor wrote: >> The change below should be squashed into this patch to fix a >> previously unnoticed broken &&-chain. I think you missed it, because >> this test script is rather expensive and you didn't run it with >> GIT_TEST_CLONE_2GB=YesPlease. >> >> diff --git a/t/t5608-clone-2gb.sh b/t/t5608-clone-2gb.sh >> @@ -23,7 +23,7 @@ test_expect_success CLONE_2GB 'setup' ' >> - echo "M 100644 :$i $i" >> commit >> + echo "M 100644 :$i $i" >> commit && > > Thanks for finding this. I tried to get as much coverage as possible > by installing packages I don't normally have installed (Apache, cvs, > cvsps, Subversion, Perforce, etc.) and even temporarily modified a > script or two to force it run ... Thanks, both. I think the &&-chain fix series is already large and also otherwise seem to be pretty solid, so let's not reroll but queue this addition on top.
git@vger.kernel.org
Junio C Hamano writes: > Eric Sunshine writes: > >> On Thu, Jul 12, 2018 at 8:37 AM SZEDER Gábor wrote: >>> The change below should be squashed into this patch to fix a >>> previously unnoticed broken &&-chain. I think you missed it, because >>> this test script is rather expensive and you didn't run it with >>> GIT_TEST_CLONE_2GB=YesPlease. >>> >>> diff --git a/t/t5608-clone-2gb.sh b/t/t5608-clone-2gb.sh >>> @@ -23,7 +23,7 @@ test_expect_success CLONE_2GB 'setup' ' >>> - echo "M 100644 :$i $i" >> commit >>> + echo "M 100644 :$i $i" >> commit && >> >> Thanks for finding this. I tried to get as much coverage as possible >> by installing packages I don't normally have installed (Apache, cvs, >> cvsps, Subversion, Perforce, etc.) and even temporarily modified a >> script or two to force it run ... > > Thanks, both. I think the &&-chain fix series is already large and > also otherwise seem to be pretty solid, so let's not reroll but > queue this addition on top. Oops, sent before completing the message. For that to happen, we need a sign-off ;-) I guess this one would have been caught with the "sed script on subshell" linter that does not execute? -- >8 -- Subject: t5608: fix broken &&-chain This is inside a loop that is run inside a subshell, in a test that is protected with CLONE_2GB prerequisite, one or more which is quite likely reason why it wasn't caught durin the previous clean-up. Signed-off-by: Junio C Hamano --- t/t5608-clone-2gb.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t5608-clone-2gb.sh b/t/t5608-clone-2gb.sh index df822d9a3e..2c6bc07344 100755 --- a/t/t5608-clone-2gb.sh +++ b/t/t5608-clone-2gb.sh @@ -23,7 +23,7 @@ test_expect_success CLONE_2GB 'setup' ' printf "blob\nmark :$i\ndata $blobsize\n" && #test-tool genrandom $i $blobsize && printf "%-${blobsize}s" $i && - echo "M 100644 :$i $i" >> commit + echo "M 100644 :$i $i" >> commit && i=$(($i+1)) || echo $? > exit-status done &&
git@vger.kernel.org
On Thu, Jul 12, 2018 at 2:35 PM Junio C Hamano wrote: > Junio C Hamano writes: > Oops, sent before completing the message. > > For that to happen, we need a sign-off ;-) > > I guess this one would have been caught with the "sed script on > subshell" linter that does not execute? Yes, this is correctly caught when the prerequisite is met. > -- >8 -- > Subject: t5608: fix broken &&-chain > > This is inside a loop that is run inside a subshell, in a test that > is protected with CLONE_2GB prerequisite, one or more which is quite > likely reason why it wasn't caught durin the previous clean-up. s/durin/during/ The exact reason is that the prerequisite was not met (indeed, I wasn't even aware of that prerequisite), so the commit message can be more direct: This was missed by the previous clean-ups due to an unmet CLONE_2GB prerequisite. Thanks for saving a round-trip. > Signed-off-by: Junio C Hamano
Re: [PATCH] t6036: fix broken && chain in sub-shell
Ramsay Jones writes: > Signed-off-by: Ramsay Jones > --- > > Hi Junio, > > I had a test failure on 'pu' today - Eric's chain-lint series > found another broken chain in one of Elijah's new tests (on the > 'en/t6036-recursive-corner-cases' branch). Thanks, I see the same breakage in my build, too. > > ATB, > Ramsay Jones > > t/t6036-recursive-corner-cases.sh | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/t/t6036-recursive-corner-cases.sh > b/t/t6036-recursive-corner-cases.sh > index 2a44acace..59e52c5a0 100755 > --- a/t/t6036-recursive-corner-cases.sh > +++ b/t/t6036-recursive-corner-cases.sh > @@ -495,7 +495,7 @@ test_expect_success 'setup differently handled merges of > directory/file conflict > test_write_lines 1 2 3 4 5 6 7 8 >a && > git add a && > git commit -m E3 && > - git tag E3 > + git tag E3 && > > git checkout C^0 && > test_must_fail git merge B^0 &&
git@vger.kernel.org
Eric Sunshine writes: > On Thu, Jul 12, 2018 at 2:35 PM Junio C Hamano wrote: >> Junio C Hamano writes: >> Oops, sent before completing the message. >> >> For that to happen, we need a sign-off ;-) >> >> I guess this one would have been caught with the "sed script on >> subshell" linter that does not execute? > > Yes, this is correctly caught when the prerequisite is met. > >> -- >8 -- >> Subject: t5608: fix broken &&-chain >> >> This is inside a loop that is run inside a subshell, in a test that >> is protected with CLONE_2GB prerequisite, one or more which is quite >> likely reason why it wasn't caught durin the previous clean-up. > > s/durin/during/ > > The exact reason is that the prerequisite was not met (indeed, I > wasn't even aware of that prerequisite), so the commit message can be > more direct: > > This was missed by the previous clean-ups due to an unmet > CLONE_2GB prerequisite. > > Thanks for saving a round-trip. > >> Signed-off-by: Junio C Hamano There still need a sign-off, so ... ;-)
git@vger.kernel.org
On Thu, Jul 12, 2018 at 2:50 PM Junio C Hamano wrote: > Eric Sunshine writes: > > The exact reason is that the prerequisite was not met (indeed, I > > wasn't even aware of that prerequisite), so the commit message can be > > more direct: > > > > This was missed by the previous clean-ups due to an unmet > > CLONE_2GB prerequisite. > > > > Thanks for saving a round-trip. > > > >> Signed-off-by: Junio C Hamano > > There still need a sign-off, so ... ;-) For whose sign-off are you waiting, SZEDER's or mine? (I presume SZEDER's.)
gitweb and Levenshtein
Howdy, I want to hack the getweb_make_index.perl script to create a string search using: https://github.com/git/git/blob/master/levenshtein.c. How do I reference the compiled code? I would like to call this routine using Java and maybe Perl. Please advise. Thanks. Regards,
[PATCH] RFC: submodule-config: introduce trust level
The series merged at 614ea03a71e (Merge branch 'bw/submodule-config-cleanup', 2017-08-26) went to great length to make it explicit to the caller where a value came from, as that would help the caller to be careful to decide which values to take from where, i.e. be careful about security implications. In practice we always want to stack the settings starting with the .gitmodules file as a base and then put the config on top of it. So let's manage the security aspects impolitely in the submodule-config machinery directly where we implement its parsing as there is a good place to reason about the trust that we need to put into a parsed value. This patch implements the trust level that is passed to the parsing,' currently we only pass in the 'in_repo' level of trust, which is the .gitmodules file. Follow up patches could add other sources that populate the submodule config again. Signed-off-by: Stefan Beller --- This is on top of ao/config-from-gitmodules. submodule-config.c | 31 --- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/submodule-config.c b/submodule-config.c index 77421a49719..09eab9f00e0 100644 --- a/submodule-config.c +++ b/submodule-config.c @@ -21,6 +21,11 @@ struct submodule_cache { unsigned gitmodules_read:1; }; +enum submodule_config_trust_level { + in_repo = 1, + configured_by_user = 2 +}; + /* * thin wrapper struct needed to insert 'struct submodule' entries to * the hashmap @@ -387,12 +392,14 @@ struct parse_config_parameter { struct submodule_cache *cache; const struct object_id *treeish_name; const struct object_id *gitmodules_oid; + enum submodule_config_trust_level source; int overwrite; }; static int parse_config(const char *var, const char *value, void *data) { struct parse_config_parameter *me = data; + enum submodule_config_trust_level source = me->source; struct submodule *submodule; struct strbuf name = STRBUF_INIT, item = STRBUF_INIT; int ret = 0; @@ -406,6 +413,7 @@ static int parse_config(const char *var, const char *value, void *data) name.buf); if (!strcmp(item.buf, "path")) { + /* all sources allowed */ if (!value) ret = config_error_nonbool(var); else if (!me->overwrite && submodule->path) @@ -419,6 +427,7 @@ static int parse_config(const char *var, const char *value, void *data) cache_put_path(me->cache, submodule); } } else if (!strcmp(item.buf, "fetchrecursesubmodules")) { + /* all sources allowed */ /* when parsing worktree configurations we can die early */ int die_on_error = is_null_oid(me->gitmodules_oid); if (!me->overwrite && @@ -430,6 +439,7 @@ static int parse_config(const char *var, const char *value, void *data) var, value, die_on_error); } else if (!strcmp(item.buf, "ignore")) { + /* all sources allowed */ if (!value) ret = config_error_nonbool(var); else if (!me->overwrite && submodule->ignore) @@ -446,6 +456,7 @@ static int parse_config(const char *var, const char *value, void *data) submodule->ignore = xstrdup(value); } } else if (!strcmp(item.buf, "url")) { + /* all sources allowed */ if (!value) { ret = config_error_nonbool(var); } else if (!me->overwrite && submodule->url) { @@ -456,16 +467,27 @@ static int parse_config(const char *var, const char *value, void *data) submodule->url = xstrdup(value); } } else if (!strcmp(item.buf, "update")) { + struct submodule_update_strategy st; if (!value) ret = config_error_nonbool(var); else if (!me->overwrite && submodule->update_strategy.type != SM_UPDATE_UNSPECIFIED) warn_multiple_config(me->treeish_name, submodule->name, "update"); - else if 
(parse_submodule_update_strategy(value, -&submodule->update_strategy) < 0) - die(_("invalid value for %s"), var); + else if (parse_submodule_update_strategy(value, &st) < 0) + die(_("invalid value for %s"), var); + else if (source <= in_repo) { + if (st.type == SM_UPDATE_COMMAND) { + submodule->update_strategy.type = st.type; + submodule->update_strategy.c
Re: [PATCH v2] sha1-name.c: for ":/", find detached HEAD commits
William Chargin writes: >> As we discussed during the review on v1, ":/" >> is *NOT* pathspec (that is why having these tests in t4208 is wrong >> but we are following existing mistakes). > > Ah, I understand the terminology better now. Thanks. I'll change the > commit message wording to use "extended SHA-1s" instead of "pathspecs". > >> Now you have Peff's sign-off for the one-liner code change, the last >> one is redundant. > > Okay: I'll remove the "Based-on-patch-by" line. > >> Other than the above two nits, the patch looks good. I would have >> broken the line after "including HEAD.", but the slightly-long line >> is also OK. > > Happy to change this while making the above edits. :-) Let's save a round-trip. Here is what I'd queue for now, and you can just say "looks good" if you agree with the result, or send an updated version ;-). Thanks for working on this. -- >8 -- From: William Chargin Date: Wed, 11 Jul 2018 22:49:09 -0700 Subject: [PATCH] sha1-name.c: for ":/", find detached HEAD commits This patch broadens the set of commits matched by ":/" to include commits reachable from HEAD but not any named ref. This avoids surprising behavior when working with a detached HEAD and trying to refer to a commit that was recently created and only exists within the detached state. If multiple worktrees exist, only the current worktree's HEAD is considered reachable. This is consistent with the existing behavior for other per-worktree refs: e.g., bisect refs are considered reachable, but only within the relevant worktree. Signed-off-by: Jeff King Signed-off-by: William Chargin Signed-off-by: Junio C Hamano --- Documentation/revisions.txt | 3 ++- sha1_name.c | 1 + t/t4208-log-magic-pathspec.sh | 26 ++ 3 files changed, 29 insertions(+), 1 deletion(-) diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt index dfcc49c72c..0845c3f590 100644 --- a/Documentation/revisions.txt +++ b/Documentation/revisions.txt @@ -180,7 +180,8 @@ existing tag object. A colon, followed by a slash, followed by a text, names a commit whose commit message matches the specified regular expression. This name returns the youngest matching commit which is - reachable from any ref. The regular expression can match any part of the + reachable from any ref, including HEAD. + The regular expression can match any part of the commit message. To match messages starting with a string, one can use e.g. ':/^foo'. The special sequence ':/!' is reserved for modifiers to what is matched. ':/!-foo' performs a negative match, while ':/!!foo' matches a diff --git a/sha1_name.c b/sha1_name.c index 611c7d24dd..19f66713e1 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -1649,6 +1649,7 @@ static int get_oid_with_context_1(const char *name, struct commit_list *list = NULL; for_each_ref(handle_one_ref, &list); + head_ref(handle_one_ref, &list); commit_list_sort_by_date(&list); return get_oid_oneline(name + 2, oid, list); } diff --git a/t/t4208-log-magic-pathspec.sh b/t/t4208-log-magic-pathspec.sh index a1705f70cf..69643d101d 100755 --- a/t/t4208-log-magic-pathspec.sh +++ b/t/t4208-log-magic-pathspec.sh @@ -25,6 +25,32 @@ test_expect_success '"git log :/a -- " should not be ambiguous' ' git log :/a -- ' +test_expect_success '"git log :/detached -- " should find a commit only in HEAD' ' + test_when_finished "git checkout master" && + git checkout --detach && + # Must manually call `test_tick` instead of using `test_commit`, + # because the latter additionally creates a tag, which would make + # the commit reachable not only via HEAD. 
+ test_tick && + git commit --allow-empty -m detached && + test_tick && + git commit --allow-empty -m something-else && + git log :/detached -- +' + +test_expect_success '"git log :/detached -- " should not find an orphaned commit' ' + test_must_fail git log :/detached -- +' + +test_expect_success '"git log :/detached -- " should find HEAD only of own worktree' ' + git worktree add other-tree HEAD && + git -C other-tree checkout --detach && + test_tick && + git -C other-tree commit --allow-empty -m other-detached && + git -C other-tree log :/other-detached -- && + test_must_fail git log :/other-detached -- +' + test_expect_success '"git log -- :/a" should not be ambiguous' ' git log -- :/a ' -- 2.18.0-129-ge3331758f1
Re: [PATCH v3] handle lower case drive letters on Windows
Ben Peart writes: > On Windows, if a tool calls SetCurrentDirectory with a lower case drive > letter, the subsequent call to GetCurrentDirectory will return the same > lower case drive letter. Powershell, for example, does not normalize the > path. If that happens, test-drop-caches will error out as it does not > correctly to handle lower case drive letters. > > Helped-by: Johannes Schindelin > Signed-off-by: Ben Peart > --- Thanks. Will replace, even though showing the whole Buffer (I am guessing it is the equivalent of result from getcwd(3) call) and complaining about drive "letter" feels a bit strange ;-) > Notes: > Base Ref: master > Web-Diff: https://github.com/benpeart/git/commit/1ff8de1b6c > Checkout: git fetch https://github.com/benpeart/git drop-caches-v3 && git > checkout 1ff8de1b6c > > ### Interdiff (v2..v3): > > diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c > index 37047189c3..f65e301f9d 100644 > --- a/t/helper/test-drop-caches.c > +++ b/t/helper/test-drop-caches.c > @@ -16,8 +16,8 @@ static int cmd_sync(void) > if ((0 == dwRet) || (dwRet > MAX_PATH)) > return error("Error getting current directory"); > > - if ((toupper(Buffer[0]) < 'A') || (toupper(Buffer[0]) > 'Z')) > - return error("Invalid drive letter '%c'", Buffer[0]); > + if (!has_dos_drive_prefix(Buffer)) > + return error("'%s': invalid drive letter", Buffer); > > szVolumeAccessPath[4] = Buffer[0]; > hVolWrite = CreateFile(szVolumeAccessPath, GENERIC_READ | GENERIC_WRITE, > > ### Patches > > t/helper/test-drop-caches.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c > index d6bcfddf13..f65e301f9d 100644 > --- a/t/helper/test-drop-caches.c > +++ b/t/helper/test-drop-caches.c > @@ -16,8 +16,8 @@ static int cmd_sync(void) > if ((0 == dwRet) || (dwRet > MAX_PATH)) > return error("Error getting current directory"); > > - if ((Buffer[0] < 'A') || (Buffer[0] > 'Z')) > - return error("Invalid drive letter '%c'", Buffer[0]); > + if (!has_dos_drive_prefix(Buffer)) > + return error("'%s': invalid drive letter", Buffer); > > szVolumeAccessPath[4] = Buffer[0]; > hVolWrite = CreateFile(szVolumeAccessPath, GENERIC_READ | GENERIC_WRITE, > > base-commit: e3331758f12da22f4103eec7efe1b5304a9be5e9
RE: [PATCH v3] handle lower case drive letters on Windows
> -Original Message- > From: Junio C Hamano On Behalf Of Junio C Hamano > Sent: Thursday, July 12, 2018 3:13 PM > To: Ben Peart > Cc: git@vger.kernel.org; sbel...@google.com; johannes.schinde...@gmx.de > Subject: Re: [PATCH v3] handle lower case drive letters on Windows > > Ben Peart writes: > > > On Windows, if a tool calls SetCurrentDirectory with a lower case drive > > letter, the subsequent call to GetCurrentDirectory will return the same > > lower case drive letter. Powershell, for example, does not normalize the > > path. If that happens, test-drop-caches will error out as it does not > > correctly to handle lower case drive letters. > > > > Helped-by: Johannes Schindelin > > Signed-off-by: Ben Peart > > --- > > Thanks. Will replace, even though showing the whole Buffer (I am > guessing it is the equivalent of result from getcwd(3) call) and > complaining about drive "letter" feels a bit strange ;-) > Thanks. I thought it was strange as well until I realized you only see the error message if there _isn't_ a valid drive letter so seeing the entire path makes sense as it will likely be something like "\\server\volume\directory"
Re: [PATCH v2 01/10] t/test-lib: teach --chain-lint to detect broken &&-chains in subshells
On Thu, Jul 12, 2018 at 12:56 PM Jeff King wrote: > Like Junio, I'm a little nervous that this is going to end up being a > maintenance burden. People may hit false positives and then be > confronted with this horrible mass of sed to try to figure out what went > wrong [...] A very valid concern. > But I came around to thinking: > - this found and fixed real problems in the test suite, with minimal > false positives across the existing code The counterargument (and arguing against my own case) is that, while it found 3 or 4 genuine test bugs hidden by &&-breakage, they were just that: bugs in the tests; they weren't hiding any bugs in Git itself, which is a pretty measly return for the effort invested in the linter. However, existing tests aside, the more important goal is detecting problems in new or updated tests that hide genuine bugs in changes to Git itself, so it may have some value. > - worst case is that relief is only a "git revert" away Right. It's just a developer aid, not a user-facing feature that has to be maintained in perpetuity, so retiring it is easy.
[PATCH v4 03/23] multi-pack-index: add builtin
This new 'git multi-pack-index' builtin will be the plumbing access for writing, reading, and checking multi-pack-index files. The initial implementation is a no-op. Signed-off-by: Derrick Stolee --- .gitignore | 3 ++- Documentation/git-multi-pack-index.txt | 36 ++ Makefile | 1 + builtin.h | 1 + builtin/multi-pack-index.c | 34 command-list.txt | 1 + git.c | 1 + 7 files changed, 76 insertions(+), 1 deletion(-) create mode 100644 Documentation/git-multi-pack-index.txt create mode 100644 builtin/multi-pack-index.c diff --git a/.gitignore b/.gitignore index 388cc4beee..25633bc515 100644 --- a/.gitignore +++ b/.gitignore @@ -99,8 +99,9 @@ /git-mergetool--lib /git-mktag /git-mktree -/git-name-rev +/git-multi-pack-index /git-mv +/git-name-rev /git-notes /git-p4 /git-pack-redundant diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt new file mode 100644 index 00..ec9982cbfc --- /dev/null +++ b/Documentation/git-multi-pack-index.txt @@ -0,0 +1,36 @@ +git-multi-pack-index(1) +== + +NAME + +git-multi-pack-index - Write and verify multi-pack-indexes + + +SYNOPSIS + +[verse] +'git multi-pack-index' [--object-dir=] + +DESCRIPTION +--- +Write or verify a multi-pack-index (MIDX) file. + +OPTIONS +--- + +--object-dir=:: + Use given directory for the location of Git objects. We check + `/packs/multi-pack-index` for the current MIDX file, and + `/packs` for the pack-files to index. + + +SEE ALSO + +See link:technical/multi-pack-index.html[The Multi-Pack-Index Design +Document] and link:technical/pack-format.html[The Multi-Pack-Index +Format] for more information on the multi-pack-index feature. + + +GIT +--- +Part of the linkgit:git[1] suite diff --git a/Makefile b/Makefile index e4b503d259..54610875ec 100644 --- a/Makefile +++ b/Makefile @@ -1047,6 +1047,7 @@ BUILTIN_OBJS += builtin/merge-recursive.o BUILTIN_OBJS += builtin/merge-tree.o BUILTIN_OBJS += builtin/mktag.o BUILTIN_OBJS += builtin/mktree.o +BUILTIN_OBJS += builtin/multi-pack-index.o BUILTIN_OBJS += builtin/mv.o BUILTIN_OBJS += builtin/name-rev.o BUILTIN_OBJS += builtin/notes.o diff --git a/builtin.h b/builtin.h index 4e0f64723e..70997d7ace 100644 --- a/builtin.h +++ b/builtin.h @@ -191,6 +191,7 @@ extern int cmd_merge_recursive(int argc, const char **argv, const char *prefix); extern int cmd_merge_tree(int argc, const char **argv, const char *prefix); extern int cmd_mktag(int argc, const char **argv, const char *prefix); extern int cmd_mktree(int argc, const char **argv, const char *prefix); +extern int cmd_multi_pack_index(int argc, const char **argv, const char *prefix); extern int cmd_mv(int argc, const char **argv, const char *prefix); extern int cmd_name_rev(int argc, const char **argv, const char *prefix); extern int cmd_notes(int argc, const char **argv, const char *prefix); diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c new file mode 100644 index 00..3161ddae86 --- /dev/null +++ b/builtin/multi-pack-index.c @@ -0,0 +1,34 @@ +#include "builtin.h" +#include "cache.h" +#include "config.h" +#include "parse-options.h" + +static char const * const builtin_multi_pack_index_usage[] = { + N_("git multi-pack-index [--object-dir=]"), + NULL +}; + +static struct opts_multi_pack_index { + const char *object_dir; +} opts; + +int cmd_multi_pack_index(int argc, const char **argv, +const char *prefix) +{ + static struct option builtin_multi_pack_index_options[] = { + OPT_FILENAME(0, "object-dir", &opts.object_dir, + N_("object directory containing set of packfile and pack-index pairs")), + 
OPT_END(), + }; + + git_config(git_default_config, NULL); + + argc = parse_options(argc, argv, prefix, +builtin_multi_pack_index_options, +builtin_multi_pack_index_usage, 0); + + if (!opts.object_dir) + opts.object_dir = get_object_directory(); + + return 0; +} diff --git a/command-list.txt b/command-list.txt index e1c26c1bb7..61071f8fa2 100644 --- a/command-list.txt +++ b/command-list.txt @@ -123,6 +123,7 @@ git-merge-index plumbingmanipulators git-merge-one-file purehelpers git-mergetool ancillarymanipulators complete git-merge-tree ancillaryinterrogators +git-multi-pack-indexplumbingmanipulators git-mktag plumbingmanipulators git-mktree plumbingmanipulators git-mv mainporcelain w
[PATCH v4 00/23] Multi-pack-index (MIDX)
v3 had a lot of interesting feedback, most of which was non-functional, but made a big impact on the shape of the patch, especially the test script. These are the important changes: * 'git multi-pack-index' will report usage if the 'write' verb is not provided, or if extra parameters are provided. A later series will create the 'verify' verb. * t5319-multi-pack-index.sh has a reoganized way to generate object data, so it has fewer code clones. * 'test-tool read-midx' uses '-' instead of '_'. * The global 'core_multi_pack_index' is replaced with a one-time call to git_config_bool() per repository that loads a multi-pack-index. * 'struct multi_pack_index' is now defined in midx.h and kept anonymous to object-store.h. * Added a test that 'git repack' removes the multi-pack-index. * Fixed a doc bug when linking to the technical docs. I included the diff between the latest ds/multi-pack-index and this series as part of this message. You can see the CI builds for Linux, Mac, and Windows linked from the GitHub pull request [1]. Thanks, -Stolee [1] https://github.com/gitgitgadget/git/pull/5 Derrick Stolee (23): multi-pack-index: add design document multi-pack-index: add format details multi-pack-index: add builtin multi-pack-index: add 'write' verb midx: write header information to lockfile multi-pack-index: load into memory t5319: expand test data packfile: generalize pack directory list multi-pack-index: read packfile list multi-pack-index: write pack names in chunk midx: read pack names into array midx: sort and deduplicate objects from packfiles midx: write object ids in a chunk midx: write object id fanout chunk midx: write object offsets config: create core.multiPackIndex setting midx: read objects from multi-pack-index midx: use midx in abbreviation calculations midx: use existing midx when writing new one midx: use midx in approximate_object_count midx: prevent duplicate packfile loads packfile: skip loading index if in multi-pack-index midx: clear midx on repack .gitignore | 3 +- Documentation/config.txt | 5 + Documentation/git-multi-pack-index.txt | 56 ++ Documentation/technical/multi-pack-index.txt | 109 +++ Documentation/technical/pack-format.txt | 77 ++ Makefile | 3 + builtin.h| 1 + builtin/multi-pack-index.c | 47 + builtin/repack.c | 9 + command-list.txt | 1 + git.c| 1 + midx.c | 918 +++ midx.h | 44 + object-store.h | 9 + packfile.c | 169 +++- packfile.h | 9 + sha1-name.c | 70 ++ t/helper/test-read-midx.c| 51 ++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t5319-multi-pack-index.sh | 179 21 files changed, 1720 insertions(+), 43 deletions(-) create mode 100644 Documentation/git-multi-pack-index.txt create mode 100644 Documentation/technical/multi-pack-index.txt create mode 100644 builtin/multi-pack-index.c create mode 100644 midx.c create mode 100644 midx.h create mode 100644 t/helper/test-read-midx.c create mode 100755 t/t5319-multi-pack-index.sh base-commit: 53f9a3e157dbbc901a02ac2c73346d375e24978c -- 2.18.0.118.gd4f65b8d14 diff --git a/Documentation/config.txt b/Documentation/config.txt index 9dcde07a34..25f817ca42 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -910,7 +910,8 @@ core.commitGraph:: core.multiPackIndex:: Use the multi-pack-index file to track multiple packfiles using a - single index. See link:technical/multi-pack-index[1]. + single index. See link:technical/multi-pack-index.html[the + multi-pack-index design document]. core.sparseCheckout:: Enable "sparse checkout" feature. 
See section "Sparse checkout" in diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index be97c9372e..a62af1caca 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes SYNOPSIS [verse] -'git multi-pack-index' [--object-dir ] +'git multi-pack-index' [--object-dir=] DESCRIPTION --- @@ -18,7 +18,7 @@ Write or verify a multi-pack-index (MIDX) file. OPTIONS --- ---object-dir :: +--object-dir=:: Use given directory for the location of Git objects. We check `/packs/multi-pack-index` for the current MIDX file, and `/packs` for the pack-files to index.
[PATCH v4 01/23] multi-pack-index: add design document
Signed-off-by: Derrick Stolee --- Documentation/technical/multi-pack-index.txt | 109 +++ 1 file changed, 109 insertions(+) create mode 100644 Documentation/technical/multi-pack-index.txt diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt new file mode 100644 index 00..d7e57639f7 --- /dev/null +++ b/Documentation/technical/multi-pack-index.txt @@ -0,0 +1,109 @@ +Multi-Pack-Index (MIDX) Design Notes + + +The Git object directory contains a 'pack' directory containing +packfiles (with suffix ".pack") and pack-indexes (with suffix +".idx"). The pack-indexes provide a way to lookup objects and +navigate to their offset within the pack, but these must come +in pairs with the packfiles. This pairing depends on the file +names, as the pack-index differs only in suffix with its pack- +file. While the pack-indexes provide fast lookup per packfile, +this performance degrades as the number of packfiles increases, +because abbreviations need to inspect every packfile and we are +more likely to have a miss on our most-recently-used packfile. +For some large repositories, repacking into a single packfile +is not feasible due to storage space or excessive repack times. + +The multi-pack-index (MIDX for short) stores a list of objects +and their offsets into multiple packfiles. It contains: + +- A list of packfile names. +- A sorted list of object IDs. +- A list of metadata for the ith object ID including: + - A value j referring to the jth packfile. + - An offset within the jth packfile for the object. +- If large offsets are required, we use another list of large + offsets similar to version 2 pack-indexes. + +Thus, we can provide O(log N) lookup time for any number +of packfiles. + +Design Details +-- + +- The MIDX is stored in a file named 'multi-pack-index' in the + .git/objects/pack directory. This could be stored in the pack + directory of an alternate. It refers only to packfiles in that + same directory. + +- The pack.multiIndex config setting must be on to consume MIDX files. + +- The file format includes parameters for the object ID hash + function, so a future change of hash algorithm does not require + a change in format. + +- The MIDX keeps only one record per object ID. If an object appears + in multiple packfiles, then the MIDX selects the copy in the most- + recently modified packfile. + +- If there exist packfiles in the pack directory not registered in + the MIDX, then those packfiles are loaded into the `packed_git` + list and `packed_git_mru` cache. + +- The pack-indexes (.idx files) remain in the pack directory so we + can delete the MIDX file, set core.midx to false, or downgrade + without any loss of information. + +- The MIDX file format uses a chunk-based approach (similar to the + commit-graph file) that allows optional data to be added. + +Future Work +--- + +- Add a 'verify' subcommand to the 'git midx' builtin to verify the + contents of the multi-pack-index file match the offsets listed in + the corresponding pack-indexes. + +- The multi-pack-index allows many packfiles, especially in a context + where repacking is expensive (such as a very large repo), or + unexpected maintenance time is unacceptable (such as a high-demand + build machine). However, the multi-pack-index needs to be rewritten + in full every time. We can extend the format to be incremental, so + writes are fast. 
By storing a small "tip" multi-pack-index that + points to large "base" MIDX files, we can keep writes fast while + still reducing the number of binary searches required for object + lookups. + +- The reachability bitmap is currently paired directly with a single + packfile, using the pack-order as the object order to hopefully + compress the bitmaps well using run-length encoding. This could be + extended to pair a reachability bitmap with a multi-pack-index. If + the multi-pack-index is extended to store a "stable object order" + (a function Order(hash) = integer that is constant for a given hash, + even as the multi-pack-index is updated) then a reachability bitmap + could point to a multi-pack-index and be updated independently. + +- Packfiles can be marked as "special" using empty files that share + the initial name but replace ".pack" with ".keep" or ".promisor". + We can add an optional chunk of data to the multi-pack-index that + records flags of information about the packfiles. This allows new + states, such as 'repacked' or 'redeltified', that can help with + pack maintenance in a multi-pack environment. It may also be + helpful to organize packfiles by object type (commit, tree, blob, + etc.) and use this metadata to help that maintenance. + +- The partial clone feature records special "promisor" packs that + may point to objects that are not stored locally, but available + on request to a server. The multi-pack-index does not cu
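The O(log N) lookup claimed in the design document boils down to one sorted object list shared by all packs. A minimal standalone sketch of that idea (the midx_entry struct, sample data, and helper names are invented for illustration; this is not the on-disk format from the series):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20

/* Illustrative in-memory form of one MIDX record: object id,
 * which pack holds it, and the offset within that pack. */
struct midx_entry {
	unsigned char oid[HASH_LEN];
	uint32_t pack_int_id;
	uint64_t offset;
};

static int entry_cmp(const void *a, const void *b)
{
	return memcmp(((const struct midx_entry *)a)->oid,
		      ((const struct midx_entry *)b)->oid, HASH_LEN);
}

/* Binary search over the single sorted object list: O(log N) no
 * matter how many packfiles contributed entries. */
static struct midx_entry *midx_lookup(struct midx_entry *entries, size_t nr,
				      const unsigned char *oid)
{
	struct midx_entry key;

	memcpy(key.oid, oid, HASH_LEN);
	return bsearch(&key, entries, nr, sizeof(*entries), entry_cmp);
}

int main(void)
{
	struct midx_entry entries[3] = {
		{ { 0x01 }, 0, 12 },
		{ { 0x7f }, 1, 4096 },
		{ { 0xfe }, 0, 9000 },
	};
	unsigned char want[HASH_LEN] = { 0x7f };
	struct midx_entry *hit;

	qsort(entries, 3, sizeof(*entries), entry_cmp);
	hit = midx_lookup(entries, 3, want);
	if (hit)
		printf("pack %u offset %llu\n", hit->pack_int_id,
		       (unsigned long long)hit->offset);
	return 0;
}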
[PATCH v4 02/23] multi-pack-index: add format details
The multi-pack-index feature generalizes the existing pack-index feature by indexing objects across multiple pack-files. Describe the basic file format, using a 12-byte header followed by a lookup table for a list of "chunks" which will be described later. The file ends with a footer containing a checksum using the hash algorithm. The header allows later versions to create breaking changes by advancing the version number. We can also change the hash algorithm using a different version value. We will add the individual chunk format information as we introduce the code that writes that information. Signed-off-by: Derrick Stolee --- Documentation/technical/pack-format.txt | 49 + 1 file changed, 49 insertions(+) diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 70a99fd142..e060e693f4 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -252,3 +252,52 @@ Pack file entry: <+ corresponding packfile. 20-byte SHA-1-checksum of all of the above. + +== multi-pack-index (MIDX) files have the following format: + +The multi-pack-index files refer to multiple pack-files and loose objects. + +In order to allow extensions that add extra data to the MIDX, we organize +the body into "chunks" and provide a lookup table at the beginning of the +body. The header includes certain length values, such as the number of packs, +the number of base MIDX files, hash lengths and types. + +All 4-byte numbers are in network order. + +HEADER: + + 4-byte signature: + The signature is: {'M', 'I', 'D', 'X'} + + 1-byte version number: + Git only writes or recognizes version 1. + + 1-byte Object Id Version + Git only writes or recognizes version 1 (SHA1). + + 1-byte number of "chunks" + + 1-byte number of base multi-pack-index files: + This value is currently always zero. + + 4-byte number of pack files + +CHUNK LOOKUP: + + (C + 1) * 12 bytes providing the chunk offsets: + First 4 bytes describe chunk id. Value 0 is a terminating label. + Other 8 bytes provide offset in current file for chunk to start. + (Chunks are provided in file-order, so you can infer the length + using the next chunk position if necessary.) + + The remaining data in the body is described one chunk at a time, and + these chunks may be given in any order. Chunks are required unless + otherwise specified. + +CHUNK DATA: + + (This section intentionally left incomplete.) + +TRAILER: + + 20-byte SHA1-checksum of the above contents. -- 2.18.0.118.gd4f65b8d14
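A standalone sketch of reading the 12-byte header laid out in this patch from a memory buffer. The field offsets follow the description above; the helper names and sample bytes are made up for illustration:

#include <stdint.h>
#include <stdio.h>

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Offsets within the 12-byte header, per the format description:
 * bytes 0-3 signature, 4 version, 5 OID version, 6 chunk count,
 * 7 number of base MIDX files (currently zero), 8-11 pack count. */
static int parse_midx_header(const unsigned char *buf)
{
	if (get_be32(buf) != 0x4d494458) { /* "MIDX" */
		fprintf(stderr, "bad signature\n");
		return -1;
	}
	printf("version:     %u\n", buf[4]);
	printf("oid version: %u\n", buf[5]);
	printf("chunks:      %u\n", buf[6]);
	printf("base files:  %u\n", buf[7]);
	printf("packs:       %u\n", get_be32(buf + 8));
	return 0;
}

int main(void)
{
	/* "MIDX", version 1, SHA-1, 2 chunks, 0 base files, 5 packs */
	unsigned char header[12] = {
		'M', 'I', 'D', 'X', 1, 1, 2, 0, 0, 0, 0, 5
	};

	return parse_midx_header(header) ? 1 : 0;
}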
[PATCH v4 05/23] midx: write header information to lockfile
As we begin writing the multi-pack-index format to disk, start with the basics: the 12-byte header and the 20-byte checksum footer. Start with these basics so we can add the rest of the format in small increments. As we implement the format, we will use a technique to check that our computed offsets within the multi-pack-index file match what we are actually writing. Each method that writes to the hashfile will return the number of bytes written, and we will track that those values match our expectations. Currently, write_midx_header() returns 12, but is not checked. We will check the return value in a later commit. Signed-off-by: Derrick Stolee --- midx.c | 50 + t/t5319-multi-pack-index.sh | 4 ++- 2 files changed, 53 insertions(+), 1 deletion(-) diff --git a/midx.c b/midx.c index 32468db1a2..f85f2d334d 100644 --- a/midx.c +++ b/midx.c @@ -1,7 +1,57 @@ #include "cache.h" +#include "csum-file.h" +#include "lockfile.h" #include "midx.h" +#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ +#define MIDX_VERSION 1 +#define MIDX_HASH_VERSION 1 +#define MIDX_HEADER_SIZE 12 + +static char *get_midx_filename(const char *object_dir) +{ + return xstrfmt("%s/pack/multi-pack-index", object_dir); +} + +static size_t write_midx_header(struct hashfile *f, + unsigned char num_chunks, + uint32_t num_packs) +{ + unsigned char byte_values[4]; + + hashwrite_be32(f, MIDX_SIGNATURE); + byte_values[0] = MIDX_VERSION; + byte_values[1] = MIDX_HASH_VERSION; + byte_values[2] = num_chunks; + byte_values[3] = 0; /* unused */ + hashwrite(f, byte_values, sizeof(byte_values)); + hashwrite_be32(f, num_packs); + + return MIDX_HEADER_SIZE; +} + int write_midx_file(const char *object_dir) { + unsigned char num_chunks = 0; + char *midx_name; + struct hashfile *f = NULL; + struct lock_file lk; + + midx_name = get_midx_filename(object_dir); + if (safe_create_leading_directories(midx_name)) { + UNLEAK(midx_name); + die_errno(_("unable to create leading directories of %s"), + midx_name); + } + + hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); + f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); + FREE_AND_NULL(midx_name); + + write_midx_header(f, num_chunks, 0); + + finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); + commit_lock_file(&lk); + return 0; } diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index ec3ddbe79c..50e80f8f2c 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -4,7 +4,9 @@ test_description='multi-pack-indexes' . ./test-lib.sh test_expect_success 'write midx with no packs' ' - git multi-pack-index --object-dir=. write + test_when_finished rm -f pack/multi-pack-index && + git multi-pack-index --object-dir=. write && + test_path_is_file pack/multi-pack-index ' test_done -- 2.18.0.118.gd4f65b8d14
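A minimal sketch of the "return bytes written, then cross-check" bookkeeping this message describes, with invented chunk sizes and a stand-in for the hashfile writer (not code from the series):

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for hashwrite(): pretend to write and report the size. */
static size_t fake_write(size_t len)
{
	return len;
}

static size_t write_header(void)
{
	return fake_write(12);
}

static size_t write_chunk(size_t chunk_size)
{
	return fake_write(chunk_size);
}

int main(void)
{
	/* Precomputed expected offsets: header, then two chunks. */
	const size_t expected[] = { 0, 12, 12 + 100, 12 + 100 + 64 };
	size_t chunk_sizes[] = { 100, 64 };
	size_t written = 0;
	int i;

	written += write_header();
	for (i = 0; i < 2; i++) {
		/* Each writer returns its byte count; verify it matches
		 * the offset we computed before writing anything. */
		if (written != expected[i + 1]) {
			fprintf(stderr, "offset mismatch before chunk %d\n", i);
			return 1;
		}
		written += write_chunk(chunk_sizes[i]);
	}
	if (written != expected[3]) {
		fprintf(stderr, "final size mismatch\n");
		return 1;
	}
	printf("wrote %zu bytes, all offsets as expected\n", written);
	return 0;
}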
[PATCH v4 04/23] multi-pack-index: add 'write' verb
In anticipation of writing multi-pack-indexes, add a skeleton 'git multi-pack-index write' subcommand and send the options to a write_midx_file() method. Also create a skeleton test script that tests the 'write' subcommand. Signed-off-by: Derrick Stolee --- Documentation/git-multi-pack-index.txt | 22 +- Makefile | 1 + builtin/multi-pack-index.c | 17 +++-- midx.c | 7 +++ midx.h | 6 ++ t/t5319-multi-pack-index.sh| 10 ++ 6 files changed, 60 insertions(+), 3 deletions(-) create mode 100644 midx.c create mode 100644 midx.h create mode 100755 t/t5319-multi-pack-index.sh diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index ec9982cbfc..a62af1caca 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes SYNOPSIS [verse] -'git multi-pack-index' [--object-dir=] +'git multi-pack-index' [--object-dir=] DESCRIPTION --- @@ -23,6 +23,26 @@ OPTIONS `/packs/multi-pack-index` for the current MIDX file, and `/packs` for the pack-files to index. +write:: + When given as the verb, write a new MIDX file to + `/packs/multi-pack-index`. + + +EXAMPLES + + +* Write a MIDX file for the packfiles in the current .git folder. ++ +--- +$ git multi-pack-index write +--- + +* Write a MIDX file for the packfiles in an alternate object store. ++ +--- +$ git multi-pack-index --object-dir write +--- + SEE ALSO diff --git a/Makefile b/Makefile index 54610875ec..f5636c711d 100644 --- a/Makefile +++ b/Makefile @@ -890,6 +890,7 @@ LIB_OBJS += merge.o LIB_OBJS += merge-blobs.o LIB_OBJS += merge-recursive.o LIB_OBJS += mergesort.o +LIB_OBJS += midx.o LIB_OBJS += name-hash.o LIB_OBJS += notes.o LIB_OBJS += notes-cache.o diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 3161ddae86..6a7aa00cf2 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -2,9 +2,10 @@ #include "cache.h" #include "config.h" #include "parse-options.h" +#include "midx.h" static char const * const builtin_multi_pack_index_usage[] = { - N_("git multi-pack-index [--object-dir=]"), + N_("git multi-pack-index [--object-dir=] write"), NULL }; @@ -30,5 +31,17 @@ int cmd_multi_pack_index(int argc, const char **argv, if (!opts.object_dir) opts.object_dir = get_object_directory(); - return 0; + if (argc == 0) + goto usage; + + if (!strcmp(argv[0], "write")) { + if (argc > 1) + goto usage; + + return write_midx_file(opts.object_dir); + } + +usage: + usage_with_options(builtin_multi_pack_index_usage, + builtin_multi_pack_index_options); } diff --git a/midx.c b/midx.c new file mode 100644 index 00..32468db1a2 --- /dev/null +++ b/midx.c @@ -0,0 +1,7 @@ +#include "cache.h" +#include "midx.h" + +int write_midx_file(const char *object_dir) +{ + return 0; +} diff --git a/midx.h b/midx.h new file mode 100644 index 00..dbdbe9f873 --- /dev/null +++ b/midx.h @@ -0,0 +1,6 @@ +#ifndef __MIDX_H__ +#define __MIDX_H__ + +int write_midx_file(const char *object_dir); + +#endif diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh new file mode 100755 index 00..ec3ddbe79c --- /dev/null +++ b/t/t5319-multi-pack-index.sh @@ -0,0 +1,10 @@ +#!/bin/sh + +test_description='multi-pack-indexes' +. ./test-lib.sh + +test_expect_success 'write midx with no packs' ' + git multi-pack-index --object-dir=. write +' + +test_done -- 2.18.0.118.gd4f65b8d14
[PATCH v4 06/23] multi-pack-index: load into memory
Create a new multi_pack_index struct for loading multi-pack-indexes into memory. Create a test-tool builtin for reading basic information about that multi-pack-index to verify the correct data is written. Signed-off-by: Derrick Stolee --- Makefile| 1 + midx.c | 79 + midx.h | 18 + object-store.h | 2 + t/helper/test-read-midx.c | 31 +++ t/helper/test-tool.c| 1 + t/helper/test-tool.h| 1 + t/t5319-multi-pack-index.sh | 11 +- 8 files changed, 143 insertions(+), 1 deletion(-) create mode 100644 t/helper/test-read-midx.c diff --git a/Makefile b/Makefile index f5636c711d..0b801d1b16 100644 --- a/Makefile +++ b/Makefile @@ -717,6 +717,7 @@ TEST_BUILTINS_OBJS += test-online-cpus.o TEST_BUILTINS_OBJS += test-path-utils.o TEST_BUILTINS_OBJS += test-prio-queue.o TEST_BUILTINS_OBJS += test-read-cache.o +TEST_BUILTINS_OBJS += test-read-midx.o TEST_BUILTINS_OBJS += test-ref-store.o TEST_BUILTINS_OBJS += test-regex.o TEST_BUILTINS_OBJS += test-revision-walking.o diff --git a/midx.c b/midx.c index f85f2d334d..c1ff5acf85 100644 --- a/midx.c +++ b/midx.c @@ -1,18 +1,97 @@ #include "cache.h" #include "csum-file.h" #include "lockfile.h" +#include "object-store.h" #include "midx.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ #define MIDX_VERSION 1 +#define MIDX_BYTE_FILE_VERSION 4 +#define MIDX_BYTE_HASH_VERSION 5 +#define MIDX_BYTE_NUM_CHUNKS 6 +#define MIDX_BYTE_NUM_PACKS 8 #define MIDX_HASH_VERSION 1 #define MIDX_HEADER_SIZE 12 +#define MIDX_HASH_LEN 20 +#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN) static char *get_midx_filename(const char *object_dir) { return xstrfmt("%s/pack/multi-pack-index", object_dir); } +struct multi_pack_index *load_multi_pack_index(const char *object_dir) +{ + struct multi_pack_index *m = NULL; + int fd; + struct stat st; + size_t midx_size; + void *midx_map = NULL; + uint32_t hash_version; + char *midx_name = get_midx_filename(object_dir); + + fd = git_open(midx_name); + + if (fd < 0) + goto cleanup_fail; + if (fstat(fd, &st)) { + error_errno(_("failed to read %s"), midx_name); + goto cleanup_fail; + } + + midx_size = xsize_t(st.st_size); + + if (midx_size < MIDX_MIN_SIZE) { + error(_("multi-pack-index file %s is too small"), midx_name); + goto cleanup_fail; + } + + FREE_AND_NULL(midx_name); + + midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0); + + FLEX_ALLOC_MEM(m, object_dir, object_dir, strlen(object_dir)); + m->fd = fd; + m->data = midx_map; + m->data_len = midx_size; + + m->signature = get_be32(m->data); + if (m->signature != MIDX_SIGNATURE) { + error(_("multi-pack-index signature 0x%08x does not match signature 0x%08x"), + m->signature, MIDX_SIGNATURE); + goto cleanup_fail; + } + + m->version = m->data[MIDX_BYTE_FILE_VERSION]; + if (m->version != MIDX_VERSION) { + error(_("multi-pack-index version %d not recognized"), + m->version); + goto cleanup_fail; + } + + hash_version = m->data[MIDX_BYTE_HASH_VERSION]; + if (hash_version != MIDX_HASH_VERSION) { + error(_("hash version %u does not match"), hash_version); + goto cleanup_fail; + } + m->hash_len = MIDX_HASH_LEN; + + m->num_chunks = m->data[MIDX_BYTE_NUM_CHUNKS]; + + m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS); + + return m; + +cleanup_fail: + free(m); + free(midx_name); + if (midx_map) + munmap(midx_map, midx_size); + if (0 <= fd) + close(fd); + return NULL; +} + static size_t write_midx_header(struct hashfile *f, unsigned char num_chunks, uint32_t num_packs) diff --git a/midx.h b/midx.h index dbdbe9f873..0e05051bca 100644 --- a/midx.h +++ b/midx.h @@ -1,6 +1,24 @@ #ifndef 
__MIDX_H__ #define __MIDX_H__ +struct multi_pack_index { + int fd; + + const unsigned char *data; + size_t data_len; + + uint32_t signature; + unsigned char version; + unsigned char hash_len; + unsigned char num_chunks; + uint32_t num_packs; + uint32_t num_objects; + + char object_dir[FLEX_ARRAY]; +}; + +struct multi_pack_index *load_multi_pack_index(const char *object_dir); + int write_midx_file(const char *object_dir); #endif diff --git a/object-store.h b/object-store.h index d683112fd7..13a766aea8 100644 --- a/object-store.h +++ b/object-store.h @@ -84,6 +84,8 @@ struct packed_git { char pack_name[FLEX_ARRAY]; /* more */ }; +struct multi_pack_inde
[PATCH v4 07/23] t5319: expand test data
As we build the multi-pack-index file format, we want to test the format on real repositories. Add tests that create repository data including multiple packfiles with both version 1 and version 2 formats. The current 'git multi-pack-index write' command will always write the same file with no "real" data. This will be expanded in future commits, along with the test expectations. Signed-off-by: Derrick Stolee --- t/t5319-multi-pack-index.sh | 84 + 1 file changed, 84 insertions(+) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 506bd8abb8..1240127ec1 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -18,4 +18,88 @@ test_expect_success 'write midx with no packs' ' midx_read_expect ' +generate_objects () { + i=$1 + iii=$(printf '%03i' $i) + { + test-tool genrandom "bar" 200 && + test-tool genrandom "baz $iii" 50 + } >wide_delta_$iii && + { + test-tool genrandom "foo"$i 100 && + test-tool genrandom "foo"$(( $i + 1 )) 100 && + test-tool genrandom "foo"$(( $i + 2 )) 100 + } >deep_delta_$iii && + { + echo $iii && + test-tool genrandom "$iii" 8192 + } >file_$iii && + git update-index --add file_$iii deep_delta_$iii wide_delta_$iii +} + +commit_and_list_objects () { + { + echo 101 && + test-tool genrandom 100 8192; + } >file_101 && + git update-index --add file_101 && + tree=$(git write-tree) && + commit=$(git commit-tree $tree -p HEADobj-list && + git reset --hard $commit +} + +test_expect_success 'create objects' ' + test_commit initial && + for i in $(test_seq 1 5) + do + generate_objects $i + done && + commit_and_list_objects +' + +test_expect_success 'write midx with one v1 pack' ' + pack=$(git pack-objects --index-version=1 pack/test
[PATCH v4 11/23] midx: read pack names into array
Signed-off-by: Derrick Stolee --- midx.c | 17 + midx.h | 1 + t/helper/test-read-midx.c | 5 + t/t5319-multi-pack-index.sh | 17 - 4 files changed, 35 insertions(+), 5 deletions(-) diff --git a/midx.c b/midx.c index ca7a32bf95..fcdf6553ce 100644 --- a/midx.c +++ b/midx.c @@ -37,6 +37,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) uint32_t hash_version; char *midx_name = get_midx_filename(object_dir); uint32_t i; + const char *cur_pack_name; fd = git_open(midx_name); @@ -115,6 +116,22 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) if (!m->chunk_pack_names) die(_("multi-pack-index missing required pack-name chunk")); + m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names)); + + cur_pack_name = (const char *)m->chunk_pack_names; + for (i = 0; i < m->num_packs; i++) { + m->pack_names[i] = cur_pack_name; + + cur_pack_name += strlen(cur_pack_name) + 1; + + if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0) { + error(_("multi-pack-index pack names out of order: '%s' before '%s'"), + m->pack_names[i - 1], + m->pack_names[i]); + goto cleanup_fail; + } + } + return m; cleanup_fail: diff --git a/midx.h b/midx.h index 38af01fa3b..17b56172e3 100644 --- a/midx.h +++ b/midx.h @@ -16,6 +16,7 @@ struct multi_pack_index { const unsigned char *chunk_pack_names; + const char **pack_names; char object_dir[FLEX_ARRAY]; }; diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c index 3f2d2cfa78..76a60d7882 100644 --- a/t/helper/test-read-midx.c +++ b/t/helper/test-read-midx.c @@ -6,6 +6,7 @@ static int read_midx_file(const char *object_dir) { + uint32_t i; struct multi_pack_index *m = load_multi_pack_index(object_dir); if (!m) @@ -24,6 +25,10 @@ static int read_midx_file(const char *object_dir) printf("\n"); + printf("packs:\n"); + for (i = 0; i < m->num_packs; i++) + printf("%s\n", m->pack_names[i]); + printf("object-dir: %s\n", m->object_dir); return 0; diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 7512d55c92..e8da082c64 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -5,11 +5,18 @@ test_description='multi-pack-indexes' midx_read_expect () { NUM_PACKS=$1 - cat >expect <<-EOF - header: 4d494458 1 1 $NUM_PACKS - chunks: pack-names - object-dir: . - EOF + { + cat <<-EOF && + header: 4d494458 1 1 $NUM_PACKS + chunks: pack-names + packs: + EOF + if test $NUM_PACKS -ge 1 + then + ls pack/ | grep idx | sort + fi && + printf "object-dir: .\n" + } >expect && test-tool read-midx . >actual && test_cmp expect actual } -- 2.18.0.118.gd4f65b8d14
[PATCH v4 10/23] multi-pack-index: write pack names in chunk
The multi-pack-index needs to track which packfiles it indexes. Store these in our first required chunk. Since filenames are not well structured, add padding to keep good alignment in later chunks. Modify the 'git multi-pack-index read' subcommand to output the existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh to reflect this new output and the new expected number of chunks. Defense in depth: A pattern we are using in the multi-pack-index feature is to verify the data as we write it. We want to ensure we never write invalid data to the multi-pack-index. There are many checks that verify that the values we are writing fit the format definitions. This mainly helps developers while working on the feature, but it can also identify issues that only appear when dealing with very large data sets. These large sets are hard to encode into test cases. Signed-off-by: Derrick Stolee --- Documentation/technical/pack-format.txt | 6 + midx.c | 174 +++- midx.h | 2 + t/helper/test-read-midx.c | 7 + t/t5319-multi-pack-index.sh | 3 +- 5 files changed, 189 insertions(+), 3 deletions(-) diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index e060e693f4..6c5a77475f 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -296,6 +296,12 @@ CHUNK LOOKUP: CHUNK DATA: + Packfile Names (ID: {'P', 'N', 'A', 'M'}) + Stores the packfile names as concatenated, null-terminated strings. + Packfiles must be listed in lexicographic order for fast lookups by + name. This is the only chunk not guaranteed to be a multiple of four + bytes in length, so should be the last chunk for alignment reasons. + (This section intentionally left incomplete.) TRAILER: diff --git a/midx.c b/midx.c index f742d7ccd7..ca7a32bf95 100644 --- a/midx.c +++ b/midx.c @@ -17,6 +17,11 @@ #define MIDX_HASH_LEN 20 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN) +#define MIDX_MAX_CHUNKS 1 +#define MIDX_CHUNK_ALIGNMENT 4 +#define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */ +#define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t)) + static char *get_midx_filename(const char *object_dir) { return xstrfmt("%s/pack/multi-pack-index", object_dir); @@ -31,6 +36,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) void *midx_map = NULL; uint32_t hash_version; char *midx_name = get_midx_filename(object_dir); + uint32_t i; fd = git_open(midx_name); @@ -82,6 +88,33 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS); + for (i = 0; i < m->num_chunks; i++) { + uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE + +MIDX_CHUNKLOOKUP_WIDTH * i); + uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 + +MIDX_CHUNKLOOKUP_WIDTH * i); + + switch (chunk_id) { + case MIDX_CHUNKID_PACKNAMES: + m->chunk_pack_names = m->data + chunk_offset; + break; + + case 0: + die(_("terminating multi-pack-index chunk id appears earlier than expected")); + break; + + default: + /* +* Do nothing on unrecognized chunks, allowing future +* extensions to add optional chunks. 
+*/ + break; + } + } + + if (!m->chunk_pack_names) + die(_("multi-pack-index missing required pack-name chunk")); + return m; cleanup_fail: @@ -113,8 +146,11 @@ static size_t write_midx_header(struct hashfile *f, struct pack_list { struct packed_git **list; + char **names; uint32_t nr; uint32_t alloc_list; + uint32_t alloc_names; + size_t pack_name_concat_len; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -124,6 +160,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, if (ends_with(file_name, ".idx")) { ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list); + ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names); packs->list[packs->nr] = add_packed_git(full_path, full_path_len, @@ -134,18 +171,89 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
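A small standalone sketch of the name-chunk sizing and padding described above, assuming the 4-byte chunk alignment used by the series; the pack names and helper are illustrative:

#include <stdio.h>
#include <string.h>

#define CHUNK_ALIGNMENT 4

/* Total length of the pack-name chunk: each name is written
 * null-terminated, then the chunk is padded with extra NULs so that
 * the next chunk starts on a 4-byte boundary. */
static size_t pack_name_chunk_len(const char **names, size_t nr)
{
	size_t len = 0, i;

	for (i = 0; i < nr; i++)
		len += strlen(names[i]) + 1;	/* include the NUL */

	if (len % CHUNK_ALIGNMENT)
		len += CHUNK_ALIGNMENT - (len % CHUNK_ALIGNMENT);

	return len;
}

int main(void)
{
	const char *names[] = {
		"pack-1.idx",	/* 10 chars + NUL = 11 */
		"pack-2.idx",	/* 10 chars + NUL = 11 */
	};

	/* 22 bytes of names, padded up to 24 */
	printf("chunk length: %zu\n", pack_name_chunk_len(names, 2));
	return 0;
}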
[PATCH v4 15/23] midx: write object offsets
The final pair of chunks for the multi-pack-index file stores the object offsets. We default to using 32-bit offsets as in the pack-index version 1 format, but if there exists an offset larger than 32-bits, we use a trick similar to the pack-index version 2 format by storing all offsets at least 2^31 in a 64-bit table; we use the 32-bit table to point into that 64-bit table as necessary. We only store these 64-bit offsets if necessary, so create a test that manipulates a version 2 pack-index to fake a large offset. This allows us to test that the large offset table is created, but the data does not match the actual packfile offsets. The multi-pack-index offset does match the (corrupted) pack-index offset, so a future feature will compare these offsets during a 'verify' step. Signed-off-by: Derrick Stolee --- Documentation/technical/pack-format.txt | 15 +++- midx.c | 100 +++- midx.h | 2 + t/helper/test-read-midx.c | 4 + t/t5319-multi-pack-index.sh | 49 +--- 5 files changed, 155 insertions(+), 15 deletions(-) diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 3215f7bfcd..cab5bdd2ff 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -311,7 +311,20 @@ CHUNK DATA: The OIDs for all objects in the MIDX are stored in lexicographic order in this chunk. - (This section intentionally left incomplete.) + Object Offsets (ID: {'O', 'O', 'F', 'F'}) + Stores two 4-byte values for every object. + 1: The pack-int-id for the pack storing this object. + 2: The offset within the pack. + If all offsets are less than 2^31, then the large offset chunk + will not exist and offsets are stored as in IDX v1. + If there is at least one offset value larger than 2^32-1, then + the large offset chunk must exist. If the large offset chunk + exists and the 31st bit is on, then removing that bit reveals + the row in the large offsets containing the 8-byte offset of + this object. + + [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'}) + 8-byte offsets into large packfiles. 
TRAILER: diff --git a/midx.c b/midx.c index 7a954eb0cd..e83110ae92 100644 --- a/midx.c +++ b/midx.c @@ -18,13 +18,18 @@ #define MIDX_HASH_LEN 20 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN) -#define MIDX_MAX_CHUNKS 3 +#define MIDX_MAX_CHUNKS 5 #define MIDX_CHUNK_ALIGNMENT 4 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */ #define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ +#define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */ +#define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */ #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t)) #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256) +#define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t)) +#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t)) +#define MIDX_LARGE_OFFSET_NEEDED 0x8000 static char *get_midx_filename(const char *object_dir) { @@ -112,6 +117,14 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) m->chunk_oid_lookup = m->data + chunk_offset; break; + case MIDX_CHUNKID_OBJECTOFFSETS: + m->chunk_object_offsets = m->data + chunk_offset; + break; + + case MIDX_CHUNKID_LARGEOFFSETS: + m->chunk_large_offsets = m->data + chunk_offset; + break; + case 0: die(_("terminating multi-pack-index chunk id appears earlier than expected")); break; @@ -131,6 +144,8 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) die(_("multi-pack-index missing required OID fanout chunk")); if (!m->chunk_oid_lookup) die(_("multi-pack-index missing required OID lookup chunk")); + if (!m->chunk_object_offsets) + die(_("multi-pack-index missing required object offsets chunk")); m->num_objects = ntohl(m->chunk_oid_fanout[255]); @@ -454,6 +469,56 @@ static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len, return written; } +static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_needed, + struct pack_midx_entry *objects, uint32_t nr_objects) +{ + struct pack_midx_entry *list = objects; + uint32_t i, nr_large_offset = 0; + size_t written = 0; + + for (i
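A standalone sketch of the offset-decoding rule described above: if the high bit of the stored 32-bit value is set, the remaining bits index into the 64-bit large-offset table; otherwise the value is the pack offset itself. Names and sample values are illustrative:

#include <stdint.h>
#include <stdio.h>

#define LARGE_OFFSET_NEEDED 0x80000000u

/* Decode one object's offset from the 32-bit table entry, consulting
 * the optional 64-bit table only when the flag bit is set. */
static uint64_t decode_offset(uint32_t stored, const uint64_t *large_offsets)
{
	if (stored & LARGE_OFFSET_NEEDED)
		return large_offsets[stored & ~LARGE_OFFSET_NEEDED];
	return stored;
}

int main(void)
{
	const uint64_t large_offsets[] = { 5368709120ull };	/* 5 GiB */

	/* A small offset is stored directly... */
	printf("%llu\n",
	       (unsigned long long)decode_offset(1234, large_offsets));
	/* ...a large one is stored as row 0 of the 64-bit table. */
	printf("%llu\n",
	       (unsigned long long)decode_offset(LARGE_OFFSET_NEEDED | 0,
						 large_offsets));
	return 0;
}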
[PATCH v4 14/23] midx: write object id fanout chunk
Signed-off-by: Derrick Stolee --- Documentation/technical/pack-format.txt | 5 +++ midx.c | 53 +++-- midx.h | 1 + t/helper/test-read-midx.c | 4 +- t/t5319-multi-pack-index.sh | 16 5 files changed, 68 insertions(+), 11 deletions(-) diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 78ee0489c6..3215f7bfcd 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -302,6 +302,11 @@ CHUNK DATA: name. This is the only chunk not guaranteed to be a multiple of four bytes in length, so should be the last chunk for alignment reasons. + OID Fanout (ID: {'O', 'I', 'D', 'F'}) + The ith entry, F[i], stores the number of OIDs with first + byte at most i. Thus F[255] stores the total + number of objects. + OID Lookup (ID: {'O', 'I', 'D', 'L'}) The OIDs for all objects in the MIDX are stored in lexicographic order in this chunk. diff --git a/midx.c b/midx.c index 3f113e1beb..7a954eb0cd 100644 --- a/midx.c +++ b/midx.c @@ -18,11 +18,13 @@ #define MIDX_HASH_LEN 20 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN) -#define MIDX_MAX_CHUNKS 2 +#define MIDX_MAX_CHUNKS 3 #define MIDX_CHUNK_ALIGNMENT 4 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */ +#define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t)) +#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256) static char *get_midx_filename(const char *object_dir) { @@ -102,6 +104,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) m->chunk_pack_names = m->data + chunk_offset; break; + case MIDX_CHUNKID_OIDFANOUT: + m->chunk_oid_fanout = (uint32_t *)(m->data + chunk_offset); + break; + case MIDX_CHUNKID_OIDLOOKUP: m->chunk_oid_lookup = m->data + chunk_offset; break; @@ -121,9 +127,13 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) if (!m->chunk_pack_names) die(_("multi-pack-index missing required pack-name chunk")); + if (!m->chunk_oid_fanout) + die(_("multi-pack-index missing required OID fanout chunk")); if (!m->chunk_oid_lookup) die(_("multi-pack-index missing required OID lookup chunk")); + m->num_objects = ntohl(m->chunk_oid_fanout[255]); + m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names)); cur_pack_name = (const char *)m->chunk_pack_names; @@ -389,6 +399,35 @@ static size_t write_midx_pack_names(struct hashfile *f, return written; } +static size_t write_midx_oid_fanout(struct hashfile *f, + struct pack_midx_entry *objects, + uint32_t nr_objects) +{ + struct pack_midx_entry *list = objects; + struct pack_midx_entry *last = objects + nr_objects; + uint32_t count = 0; + uint32_t i; + + /* + * Write the first-level table (the list is sorted, + * but we use a 256-entry lookup to be able to avoid + * having to do eight extra binary search iterations). 
+ */ + for (i = 0; i < 256; i++) { + struct pack_midx_entry *next = list; + + while (next < last && next->oid.hash[0] == i) { + count++; + next++; + } + + hashwrite_be32(f, count); + list = next; + } + + return MIDX_CHUNK_FANOUT_SIZE; +} + static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len, struct pack_midx_entry *objects, uint32_t nr_objects) @@ -461,7 +500,7 @@ int write_midx_file(const char *object_dir) FREE_AND_NULL(midx_name); cur_chunk = 0; - num_chunks = 2; + num_chunks = 3; written = write_midx_header(f, num_chunks, packs.nr); @@ -469,9 +508,13 @@ int write_midx_file(const char *object_dir) chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; cur_chunk++; - chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP; + chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + packs.pack_name_concat_len; + cur_chunk++; + chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + MIDX_CHUNK_FANOUT_SIZE; +
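A standalone sketch of how a reader would use the fanout semantics described above to narrow a lookup to one slice of the sorted OID list (the fanout contents here are fabricated):

#include <stdint.h>
#include <stdio.h>

/* F[i] = number of objects whose first OID byte is <= i, so the
 * objects whose first byte is exactly b occupy the index range
 * [ b ? F[b-1] : 0, F[b] ) in the sorted OID list. */
static void fanout_range(const uint32_t *fanout, unsigned char first_byte,
			 uint32_t *start, uint32_t *end)
{
	*start = first_byte ? fanout[first_byte - 1] : 0;
	*end = fanout[first_byte];
}

int main(void)
{
	uint32_t fanout[256];
	uint32_t start, end;
	int i;

	/* Fake a fanout for 2560 objects spread evenly: 10 per byte. */
	for (i = 0; i < 256; i++)
		fanout[i] = (i + 1) * 10;

	fanout_range(fanout, 0xab, &start, &end);
	/* Only end - start = 10 entries are left for the binary search. */
	printf("byte 0xab: [%u, %u)\n", start, end);
	printf("total objects: %u\n", fanout[255]);
	return 0;
}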
[PATCH v4 18/23] midx: use midx in abbreviation calculations
Signed-off-by: Derrick Stolee --- midx.c | 11 + midx.h | 3 +++ packfile.c | 6 + packfile.h | 1 + sha1-name.c | 70 + 5 files changed, 91 insertions(+) diff --git a/midx.c b/midx.c index 182535933c..4e014ff6e3 100644 --- a/midx.c +++ b/midx.c @@ -203,6 +203,17 @@ int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32 MIDX_HASH_LEN, result); } +struct object_id *nth_midxed_object_oid(struct object_id *oid, + struct multi_pack_index *m, + uint32_t n) +{ + if (n >= m->num_objects) + return NULL; + + hashcpy(oid->hash, m->chunk_oid_lookup + m->hash_len * n); + return oid; +} + static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos) { const unsigned char *offset_data; diff --git a/midx.h b/midx.h index 377838c9ca..1b976df873 100644 --- a/midx.h +++ b/midx.h @@ -31,6 +31,9 @@ struct multi_pack_index { struct multi_pack_index *load_multi_pack_index(const char *object_dir); int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result); +struct object_id *nth_midxed_object_oid(struct object_id *oid, + struct multi_pack_index *m, + uint32_t n); int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m); int prepare_multi_pack_index_one(struct repository *r, const char *object_dir); diff --git a/packfile.c b/packfile.c index bc763d91b9..c0eb5ac885 100644 --- a/packfile.c +++ b/packfile.c @@ -961,6 +961,12 @@ struct packed_git *get_packed_git(struct repository *r) return r->objects->packed_git; } +struct multi_pack_index *get_multi_pack_index(struct repository *r) +{ + prepare_packed_git(r); + return r->objects->multi_pack_index; +} + struct list_head *get_packed_git_mru(struct repository *r) { prepare_packed_git(r); diff --git a/packfile.h b/packfile.h index b0eed44c0b..046280caf3 100644 --- a/packfile.h +++ b/packfile.h @@ -45,6 +45,7 @@ extern void install_packed_git(struct repository *r, struct packed_git *pack); struct packed_git *get_packed_git(struct repository *r); struct list_head *get_packed_git_mru(struct repository *r); +struct multi_pack_index *get_multi_pack_index(struct repository *r); /* * Give a rough count of objects in the repository. This sacrifices accuracy diff --git a/sha1-name.c b/sha1-name.c index 60d9ef3c7e..7dc71201e6 100644 --- a/sha1-name.c +++ b/sha1-name.c @@ -12,6 +12,7 @@ #include "packfile.h" #include "object-store.h" #include "repository.h" +#include "midx.h" static int get_oid_oneline(const char *, struct object_id *, struct commit_list *); @@ -149,6 +150,32 @@ static int match_sha(unsigned len, const unsigned char *a, const unsigned char * return 1; } +static void unique_in_midx(struct multi_pack_index *m, + struct disambiguate_state *ds) +{ + uint32_t num, i, first = 0; + const struct object_id *current = NULL; + num = m->num_objects; + + if (!num) + return; + + bsearch_midx(&ds->bin_pfx, m, &first); + + /* +* At this point, "first" is the location of the lowest object +* with an object name that could match "bin_pfx". See if we have +* 0, 1 or more objects that actually match(es). 
+*/ + for (i = first; i < num && !ds->ambiguous; i++) { + struct object_id oid; + current = nth_midxed_object_oid(&oid, m, i); + if (!match_sha(ds->len, ds->bin_pfx.hash, current->hash)) + break; + update_candidates(ds, current); + } +} + static void unique_in_pack(struct packed_git *p, struct disambiguate_state *ds) { @@ -177,8 +204,12 @@ static void unique_in_pack(struct packed_git *p, static void find_short_packed_object(struct disambiguate_state *ds) { + struct multi_pack_index *m; struct packed_git *p; + for (m = get_multi_pack_index(the_repository); m && !ds->ambiguous; +m = m->next) + unique_in_midx(m, ds); for (p = get_packed_git(the_repository); p && !ds->ambiguous; p = p->next) unique_in_pack(p, ds); @@ -527,6 +558,42 @@ static int extend_abbrev_len(const struct object_id *oid, void *cb_data) return 0; } +static void find_abbrev_len_for_midx(struct multi_pack_index *m, +struct min_abbrev_data *mad) +{ + int match = 0; + uint32_t num, first = 0; + struct object_id oid; + const struct object_id *mad_oid; + + if (!m->num_objects) + return; + + num = m->num_objects; + mad_oid = mad->oid; + m
[PATCH v4 08/23] packfile: generalize pack directory list
In anticipation of sharing the pack directory listing with the multi-pack-index, generalize prepare_packed_git_one() into for_each_file_in_pack_dir(). Signed-off-by: Derrick Stolee --- packfile.c | 101 + packfile.h | 6 2 files changed, 69 insertions(+), 38 deletions(-) diff --git a/packfile.c b/packfile.c index 7cd45aa4b2..ee1ab9b804 100644 --- a/packfile.c +++ b/packfile.c @@ -738,13 +738,14 @@ static void report_pack_garbage(struct string_list *list) report_helper(list, seen_bits, first, list->nr); } -static void prepare_packed_git_one(struct repository *r, char *objdir, int local) +void for_each_file_in_pack_dir(const char *objdir, + each_file_in_pack_dir_fn fn, + void *data) { struct strbuf path = STRBUF_INIT; size_t dirnamelen; DIR *dir; struct dirent *de; - struct string_list garbage = STRING_LIST_INIT_DUP; strbuf_addstr(&path, objdir); strbuf_addstr(&path, "/pack"); @@ -759,53 +760,77 @@ static void prepare_packed_git_one(struct repository *r, char *objdir, int local strbuf_addch(&path, '/'); dirnamelen = path.len; while ((de = readdir(dir)) != NULL) { - struct packed_git *p; - size_t base_len; - if (is_dot_or_dotdot(de->d_name)) continue; strbuf_setlen(&path, dirnamelen); strbuf_addstr(&path, de->d_name); - base_len = path.len; - if (strip_suffix_mem(path.buf, &base_len, ".idx")) { - /* Don't reopen a pack we already have. */ - for (p = r->objects->packed_git; p; -p = p->next) { - size_t len; - if (strip_suffix(p->pack_name, ".pack", &len) && - len == base_len && - !memcmp(p->pack_name, path.buf, len)) - break; - } - if (p == NULL && - /* -* See if it really is a valid .idx file with -* corresponding .pack file that we can map. -*/ - (p = add_packed_git(path.buf, path.len, local)) != NULL) - install_packed_git(r, p); - } - - if (!report_garbage) - continue; - - if (ends_with(de->d_name, ".idx") || - ends_with(de->d_name, ".pack") || - ends_with(de->d_name, ".bitmap") || - ends_with(de->d_name, ".keep") || - ends_with(de->d_name, ".promisor")) - string_list_append(&garbage, path.buf); - else - report_garbage(PACKDIR_FILE_GARBAGE, path.buf); + fn(path.buf, path.len, de->d_name, data); } + closedir(dir); - report_pack_garbage(&garbage); - string_list_clear(&garbage, 0); strbuf_release(&path); } +struct prepare_pack_data { + struct repository *r; + struct string_list *garbage; + int local; +}; + +static void prepare_pack(const char *full_name, size_t full_name_len, +const char *file_name, void *_data) +{ + struct prepare_pack_data *data = (struct prepare_pack_data *)_data; + struct packed_git *p; + size_t base_len = full_name_len; + + if (strip_suffix_mem(full_name, &base_len, ".idx")) { + /* Don't reopen a pack we already have. */ + for (p = data->r->objects->packed_git; p; p = p->next) { + size_t len; + if (strip_suffix(p->pack_name, ".pack", &len) && + len == base_len && + !memcmp(p->pack_name, full_name, len)) + break; + } + + if (!p) { + p = add_packed_git(full_name, full_name_len, data->local); + if (p) + install_packed_git(data->r, p); + } + } + + if (!report_garbage) + return; + + if (ends_with(file_name, ".idx") || + ends_with(file_name, ".pack") || + ends_with(file_name, ".bitmap") || + ends_with(file_name, ".keep") || + ends_with(file_name, ".promisor")) + string_list_append(data->garbage, full_name); + else + report_garbage(PACKDIR_FILE_GARBAGE, full_name); +} + +static void prepare_packed_git_one(struct repository *r, char *objdir, int local) +{ + struct prepare_pack_data data; + struct string_list garbage = STRING_LI
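A self-contained sketch of the callback shape this patch introduces, re-implemented with plain opendir/readdir so it runs outside Git; the counting callback and directory path are illustrative, not the series' API:

#include <dirent.h>
#include <stdio.h>
#include <string.h>

typedef void (*each_file_fn)(const char *full_path, const char *file_name,
			     void *data);

/* Walk <objdir>/pack and hand every entry to the callback, mirroring
 * the shape of for_each_file_in_pack_dir() from the patch. */
static void for_each_pack_dir_file(const char *objdir, each_file_fn fn,
				   void *data)
{
	char path[4096];
	DIR *dir;
	struct dirent *de;

	snprintf(path, sizeof(path), "%s/pack", objdir);
	dir = opendir(path);
	if (!dir)
		return;

	while ((de = readdir(dir)) != NULL) {
		char full[4608];

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		snprintf(full, sizeof(full), "%s/%s", path, de->d_name);
		fn(full, de->d_name, data);
	}
	closedir(dir);
}

/* Example callback: count the pack-indexes we would consider. */
static void count_idx(const char *full_path, const char *file_name, void *data)
{
	size_t len = strlen(file_name);

	(void)full_path;
	if (len > 4 && !strcmp(file_name + len - 4, ".idx"))
		(*(unsigned *)data)++;
}

int main(void)
{
	unsigned nr_idx = 0;

	for_each_pack_dir_file(".git/objects", count_idx, &nr_idx);
	printf("%u pack-index files\n", nr_idx);
	return 0;
}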
[PATCH v4 17/23] midx: read objects from multi-pack-index
Signed-off-by: Derrick Stolee --- midx.c | 91 +- midx.h | 3 ++ packfile.c | 8 - 3 files changed, 100 insertions(+), 2 deletions(-) diff --git a/midx.c b/midx.c index 4090cf4ca4..182535933c 100644 --- a/midx.c +++ b/midx.c @@ -5,7 +5,7 @@ #include "lockfile.h" #include "packfile.h" #include "object-store.h" -#include "packfile.h" +#include "sha1-lookup.h" #include "midx.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ @@ -151,6 +151,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) m->num_objects = ntohl(m->chunk_oid_fanout[255]); m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names)); + m->packs = xcalloc(m->num_packs, sizeof(*m->packs)); cur_pack_name = (const char *)m->chunk_pack_names; for (i = 0; i < m->num_packs; i++) { @@ -178,6 +179,94 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) return NULL; } +static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id) +{ + struct strbuf pack_name = STRBUF_INIT; + + if (pack_int_id >= m->num_packs) + BUG("bad pack-int-id"); + + if (m->packs[pack_int_id]) + return 0; + + strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir, + m->pack_names[pack_int_id]); + + m->packs[pack_int_id] = add_packed_git(pack_name.buf, pack_name.len, 1); + strbuf_release(&pack_name); + return !m->packs[pack_int_id]; +} + +int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result) +{ + return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup, + MIDX_HASH_LEN, result); +} + +static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos) +{ + const unsigned char *offset_data; + uint32_t offset32; + + offset_data = m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH; + offset32 = get_be32(offset_data + sizeof(uint32_t)); + + if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) { + if (sizeof(offset32) < sizeof(uint64_t)) + die(_("multi-pack-index stores a 64-bit offset, but off_t is too small")); + + offset32 ^= MIDX_LARGE_OFFSET_NEEDED; + return get_be64(m->chunk_large_offsets + sizeof(uint64_t) * offset32); + } + + return offset32; +} + +static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos) +{ + return get_be32(m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH); +} + +static int nth_midxed_pack_entry(struct multi_pack_index *m, struct pack_entry *e, uint32_t pos) +{ + uint32_t pack_int_id; + struct packed_git *p; + + if (pos >= m->num_objects) + return 0; + + pack_int_id = nth_midxed_pack_int_id(m, pos); + + if (prepare_midx_pack(m, pack_int_id)) + die(_("error preparing packfile from multi-pack-index")); + p = m->packs[pack_int_id]; + + /* + * We are about to tell the caller where they can locate the + * requested object. We better make sure the packfile is + * still here and can be accessed before supplying that + * answer, as it may have been deleted since the MIDX was + * loaded! 
+ */ + if (!is_pack_valid(p)) + return 0; + + e->offset = nth_midxed_offset(m, pos); + e->p = p; + + return 1; +} + +int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m) +{ + uint32_t pos; + + if (!bsearch_midx(oid, m, &pos)) + return 0; + + return nth_midxed_pack_entry(m, e, pos); +} + int prepare_multi_pack_index_one(struct repository *r, const char *object_dir) { struct multi_pack_index *m = r->objects->multi_pack_index; diff --git a/midx.h b/midx.h index 9bcfc82d2e..377838c9ca 100644 --- a/midx.h +++ b/midx.h @@ -25,10 +25,13 @@ struct multi_pack_index { const unsigned char *chunk_large_offsets; const char **pack_names; + struct packed_git **packs; char object_dir[FLEX_ARRAY]; }; struct multi_pack_index *load_multi_pack_index(const char *object_dir); +int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result); +int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m); int prepare_multi_pack_index_one(struct repository *r, const char *object_dir); int write_midx_file(const char *object_dir); diff --git a/packfile.c b/packfile.c index 5d4493dbf4..bc763d91b9 100644 --- a/packfile.c +++ b/packfile.c @@ -1902,11 +1902,17 @@ static int fill_pack_entry(const struct object_id *oid, int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e) { struct list_head *pos; + stru
[PATCH v4 12/23] midx: sort and deduplicate objects from packfiles
Before writing a list of objects and their offsets to a multi-pack-index, we need to collect the list of objects contained in the packfiles. There may be multiple copies of some objects, so this list must be deduplicated. It is possible to artificially get into a state where there are many duplicate copies of objects. That can create high memory pressure if we are to create a list of all objects before de-duplication. To reduce this memory pressure without a significant performance drop, automatically group objects by the first byte of their object id. Use the IDX fanout tables to group the data, copy to a local array, then sort. Copy only the de-duplicated entries. Select the duplicate based on the most-recent modified time of a packfile containing the object. Signed-off-by: Derrick Stolee --- midx.c | 128 + packfile.c | 17 +++ packfile.h | 2 + 3 files changed, 147 insertions(+) diff --git a/midx.c b/midx.c index fcdf6553ce..29f8de5ee6 100644 --- a/midx.c +++ b/midx.c @@ -4,6 +4,7 @@ #include "lockfile.h" #include "packfile.h" #include "object-store.h" +#include "packfile.h" #include "midx.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ @@ -182,12 +183,21 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, packs->list[packs->nr] = add_packed_git(full_path, full_path_len, 0); + if (!packs->list[packs->nr]) { warning(_("failed to add packfile '%s'"), full_path); return; } + if (open_pack_index(packs->list[packs->nr])) { + warning(_("failed to open pack-index '%s'"), + full_path); + close_pack(packs->list[packs->nr]); + FREE_AND_NULL(packs->list[packs->nr]); + return; + } + packs->names[packs->nr] = xstrdup(file_name); packs->pack_name_concat_len += strlen(file_name) + 1; packs->nr++; @@ -228,6 +238,119 @@ static void sort_packs_by_name(char **pack_names, uint32_t nr_packs, uint32_t *p free(pairs); } +struct pack_midx_entry { + struct object_id oid; + uint32_t pack_int_id; + time_t pack_mtime; + uint64_t offset; +}; + +static int midx_oid_compare(const void *_a, const void *_b) +{ + const struct pack_midx_entry *a = (const struct pack_midx_entry *)_a; + const struct pack_midx_entry *b = (const struct pack_midx_entry *)_b; + int cmp = oidcmp(&a->oid, &b->oid); + + if (cmp) + return cmp; + + if (a->pack_mtime > b->pack_mtime) + return -1; + else if (a->pack_mtime < b->pack_mtime) + return 1; + + return a->pack_int_id - b->pack_int_id; +} + +static void fill_pack_entry(uint32_t pack_int_id, + struct packed_git *p, + uint32_t cur_object, + struct pack_midx_entry *entry) +{ + if (!nth_packed_object_oid(&entry->oid, p, cur_object)) + die(_("failed to locate object %d in packfile"), cur_object); + + entry->pack_int_id = pack_int_id; + entry->pack_mtime = p->mtime; + + entry->offset = nth_packed_object_offset(p, cur_object); +} + +/* + * It is possible to artificially get into a state where there are many + * duplicate copies of objects. That can create high memory pressure if + * we are to create a list of all objects before de-duplication. To reduce + * this memory pressure without a significant performance drop, automatically + * group objects by the first byte of their object id. Use the IDX fanout + * tables to group the data, copy to a local array, then sort. + * + * Copy only the de-duplicated entries (selected by most-recent modified time + * of a packfile containing the object). 
+ */ +static struct pack_midx_entry *get_sorted_entries(struct packed_git **p, + uint32_t *perm, + uint32_t nr_packs, + uint32_t *nr_objects) +{ + uint32_t cur_fanout, cur_pack, cur_object; + uint32_t alloc_fanout, alloc_objects, total_objects = 0; + struct pack_midx_entry *entries_by_fanout = NULL; + struct pack_midx_entry *deduplicated_entries = NULL; + + for (cur_pack = 0; cur_pack < nr_packs; cur_pack++) + total_objects += p[cur_pack]->num_objects; + + /* +* As we de-duplicate by fanout value, we expect the fanout +* slices to be evenly distributed, with some noise. Hence, +* allocate slightly more than one 256th. +*/ + alloc_objects = a
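[Editor's note] A compact way to see the de-duplication rule described above (sort by OID, break ties so the copy from the most recently modified pack wins, keep only the first copy of each OID) is the following standalone sketch. It leaves out the per-fanout bucketing that limits memory; the struct, comparator and sample values are illustrative, not Git's code.

/*
 * Sort entries by OID, with copies from newer packs first among
 * duplicates, then keep only the first entry for each OID.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct entry {
	uint32_t oid;      /* toy OID; Git compares full hashes */
	uint32_t pack_id;
	time_t pack_mtime;
};

static int entry_cmp(const void *a_, const void *b_)
{
	const struct entry *a = a_, *b = b_;

	if (a->oid != b->oid)
		return a->oid < b->oid ? -1 : 1;
	/* newer packs win, so they sort first among duplicates */
	if (a->pack_mtime != b->pack_mtime)
		return a->pack_mtime > b->pack_mtime ? -1 : 1;
	return (int)a->pack_id - (int)b->pack_id;
}

int main(void)
{
	struct entry e[] = {
		{ 0x42, 0, 100 }, { 0x42, 1, 200 }, { 0x17, 1, 200 },
	};
	size_t n = sizeof(e) / sizeof(e[0]), i, kept = 0;

	qsort(e, n, sizeof(e[0]), entry_cmp);

	for (i = 0; i < n; i++) {
		if (kept && e[kept - 1].oid == e[i].oid)
			continue; /* duplicate OID from an older pack */
		e[kept++] = e[i];
	}

	for (i = 0; i < kept; i++)
		printf("oid %#x from pack %u\n", e[i].oid, e[i].pack_id);
	/* keeps 0x17 (pack 1) and 0x42 from pack 1, the newer pack */
	return 0;
}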
[PATCH v4 09/23] multi-pack-index: read packfile list
When constructing a multi-pack-index file for a given object directory, read the files within the enclosed pack directory and find matches that end with ".idx" and find the correct paired packfile using add_packed_git(). Signed-off-by: Derrick Stolee --- midx.c | 48 - t/t5319-multi-pack-index.sh | 15 ++-- 2 files changed, 55 insertions(+), 8 deletions(-) diff --git a/midx.c b/midx.c index c1ff5acf85..f742d7ccd7 100644 --- a/midx.c +++ b/midx.c @@ -1,6 +1,8 @@ #include "cache.h" #include "csum-file.h" +#include "dir.h" #include "lockfile.h" +#include "packfile.h" #include "object-store.h" #include "midx.h" @@ -109,12 +111,41 @@ static size_t write_midx_header(struct hashfile *f, return MIDX_HEADER_SIZE; } +struct pack_list { + struct packed_git **list; + uint32_t nr; + uint32_t alloc_list; +}; + +static void add_pack_to_midx(const char *full_path, size_t full_path_len, +const char *file_name, void *data) +{ + struct pack_list *packs = (struct pack_list *)data; + + if (ends_with(file_name, ".idx")) { + ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list); + + packs->list[packs->nr] = add_packed_git(full_path, + full_path_len, + 0); + if (!packs->list[packs->nr]) { + warning(_("failed to add packfile '%s'"), + full_path); + return; + } + + packs->nr++; + } +} + int write_midx_file(const char *object_dir) { unsigned char num_chunks = 0; char *midx_name; + uint32_t i; struct hashfile *f = NULL; struct lock_file lk; + struct pack_list packs; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) { @@ -123,14 +154,29 @@ int write_midx_file(const char *object_dir) midx_name); } + packs.nr = 0; + packs.alloc_list = 16; + packs.list = NULL; + ALLOC_ARRAY(packs.list, packs.alloc_list); + + for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs); + hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); - write_midx_header(f, num_chunks, 0); + write_midx_header(f, num_chunks, packs.nr); finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM); commit_lock_file(&lk); + for (i = 0; i < packs.nr; i++) { + if (packs.list[i]) { + close_pack(packs.list[i]); + free(packs.list[i]); + } + } + + free(packs.list); return 0; } diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 1240127ec1..54117a7f49 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -4,8 +4,9 @@ test_description='multi-pack-indexes' . ./test-lib.sh midx_read_expect () { + NUM_PACKS=$1 cat >expect <<-EOF - header: 4d494458 1 0 0 + header: 4d494458 1 0 $NUM_PACKS object-dir: . EOF test-tool read-midx . >actual && @@ -15,7 +16,7 @@ midx_read_expect () { test_expect_success 'write midx with no packs' ' test_when_finished rm -f pack/multi-pack-index && git multi-pack-index --object-dir=. write && - midx_read_expect + midx_read_expect 0 ' generate_objects () { @@ -65,13 +66,13 @@ test_expect_success 'write midx with one v1 pack' ' pack=$(git pack-objects --index-version=1 pack/test
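[Editor's note] The collection step here is essentially "walk the pack directory, keep every name ending in .idx, grow the array as needed, and pair each name with its packfile". A minimal standalone sketch of that filter-and-grow pattern, with directory iteration and add_packed_git() stubbed out and all names illustrative, might look like this:

/*
 * Keep only the ".idx" entries from a list of file names, growing the
 * result array on demand (a poor man's ALLOC_GROW).
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int ends_with(const char *s, const char *suffix)
{
	size_t len = strlen(s), slen = strlen(suffix);

	return len >= slen && !strcmp(s + len - slen, suffix);
}

int main(void)
{
	const char *dir_entries[] = {
		"pack-1234.pack", "pack-1234.idx", "pack-abcd.idx", "tmp_pack_xyz",
	};
	const char **list = NULL;
	size_t nr = 0, alloc = 0, i;

	for (i = 0; i < sizeof(dir_entries) / sizeof(*dir_entries); i++) {
		if (!ends_with(dir_entries[i], ".idx"))
			continue;
		if (nr == alloc) {
			alloc = alloc ? 2 * alloc : 4;
			list = realloc(list, alloc * sizeof(*list));
			if (!list)
				return 1;
		}
		/* the real code pairs this with add_packed_git() */
		list[nr++] = dir_entries[i];
	}

	for (i = 0; i < nr; i++)
		printf("index file: %s\n", list[i]);
	free(list);
	return 0;
}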
[PATCH v4 13/23] midx: write object ids in a chunk
Signed-off-by: Derrick Stolee --- Documentation/technical/pack-format.txt | 4 +++ midx.c | 47 +++-- midx.h | 1 + t/helper/test-read-midx.c | 2 ++ t/t5319-multi-pack-index.sh | 4 +-- 5 files changed, 53 insertions(+), 5 deletions(-) diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 6c5a77475f..78ee0489c6 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -302,6 +302,10 @@ CHUNK DATA: name. This is the only chunk not guaranteed to be a multiple of four bytes in length, so should be the last chunk for alignment reasons. + OID Lookup (ID: {'O', 'I', 'D', 'L'}) + The OIDs for all objects in the MIDX are stored in lexicographic + order in this chunk. + (This section intentionally left incomplete.) TRAILER: diff --git a/midx.c b/midx.c index 29f8de5ee6..3f113e1beb 100644 --- a/midx.c +++ b/midx.c @@ -18,9 +18,10 @@ #define MIDX_HASH_LEN 20 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN) -#define MIDX_MAX_CHUNKS 1 +#define MIDX_MAX_CHUNKS 2 #define MIDX_CHUNK_ALIGNMENT 4 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */ +#define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t)) static char *get_midx_filename(const char *object_dir) @@ -101,6 +102,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) m->chunk_pack_names = m->data + chunk_offset; break; + case MIDX_CHUNKID_OIDLOOKUP: + m->chunk_oid_lookup = m->data + chunk_offset; + break; + case 0: die(_("terminating multi-pack-index chunk id appears earlier than expected")); break; @@ -116,6 +121,8 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) if (!m->chunk_pack_names) die(_("multi-pack-index missing required pack-name chunk")); + if (!m->chunk_oid_lookup) + die(_("multi-pack-index missing required OID lookup chunk")); m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names)); @@ -382,6 +389,32 @@ static size_t write_midx_pack_names(struct hashfile *f, return written; } +static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len, + struct pack_midx_entry *objects, + uint32_t nr_objects) +{ + struct pack_midx_entry *list = objects; + uint32_t i; + size_t written = 0; + + for (i = 0; i < nr_objects; i++) { + struct pack_midx_entry *obj = list++; + + if (i < nr_objects - 1) { + struct pack_midx_entry *next = list; + if (oidcmp(&obj->oid, &next->oid) >= 0) + BUG("OIDs not in order: %s >= %s", + oid_to_hex(&obj->oid), + oid_to_hex(&next->oid)); + } + + hashwrite(f, obj->oid.hash, (int)hash_len); + written += hash_len; + } + + return written; +} + int write_midx_file(const char *object_dir) { unsigned char cur_chunk, num_chunks = 0; @@ -428,7 +461,7 @@ int write_midx_file(const char *object_dir) FREE_AND_NULL(midx_name); cur_chunk = 0; - num_chunks = 1; + num_chunks = 2; written = write_midx_header(f, num_chunks, packs.nr); @@ -436,9 +469,13 @@ int write_midx_file(const char *object_dir) chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH; cur_chunk++; - chunk_ids[cur_chunk] = 0; + chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP; chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + packs.pack_name_concat_len; + cur_chunk++; + chunk_ids[cur_chunk] = 0; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_HASH_LEN; + for (i = 0; i <= num_chunks; i++) { if (i && chunk_offsets[i] < chunk_offsets[i - 1]) BUG("incorrect chunk offsets: %"PRIu64" before 
%"PRIu64, @@ -468,6 +505,10 @@ int write_midx_file(const char *object_dir) written += write_midx_pack_names(f, packs.names, packs.nr); break; + case MIDX_CHUNKID_OIDLOOKUP: + written += write_midx_oid_lookup(f, MIDX_HASH_LEN, entries, nr_entries); + break; + default:
[PATCH v4 16/23] config: create core.multiPackIndex setting
The core.multiPackIndex config setting controls the multi-pack- index (MIDX) feature. If false, the setting will disable all reads from the multi-pack-index file. Read this config setting in the new prepare_multi_pack_index_one() which is called during prepare_packed_git(). This check is run once per repository. Add comparison commands in t5319-multi-pack-index.sh to check typical Git behavior remains the same as the config setting is turned on and off. This currently includes 'git rev-list' and 'git log' commands to trigger several object database reads. Currently, these would only catch an error in the prepare_multi_pack_index_one(), but with later commits will catch errors in object lookups, abbreviations, and approximate object counts. Signed-off-by: Derrick Stolee midx: prepare midxed_git struct Signed-off-by: Derrick Stolee --- Documentation/config.txt| 5 midx.c | 25 ++ midx.h | 5 object-store.h | 7 + packfile.c | 6 - t/t5319-multi-pack-index.sh | 51 +++-- 6 files changed, 85 insertions(+), 14 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9..25f817ca42 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -908,6 +908,11 @@ core.commitGraph:: Enable git commit graph feature. Allows reading from the commit-graph file. +core.multiPackIndex:: + Use the multi-pack-index file to track multiple packfiles using a + single index. See link:technical/multi-pack-index.html[the + multi-pack-index design document]. + core.sparseCheckout:: Enable "sparse checkout" feature. See section "Sparse checkout" in linkgit:git-read-tree[1] for more information. diff --git a/midx.c b/midx.c index e83110ae92..4090cf4ca4 100644 --- a/midx.c +++ b/midx.c @@ -1,4 +1,5 @@ #include "cache.h" +#include "config.h" #include "csum-file.h" #include "dir.h" #include "lockfile.h" @@ -177,6 +178,30 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) return NULL; } +int prepare_multi_pack_index_one(struct repository *r, const char *object_dir) +{ + struct multi_pack_index *m = r->objects->multi_pack_index; + struct multi_pack_index *m_search; + int config_value; + + if (repo_config_get_bool(r, "core.multipackindex", &config_value) || + !config_value) + return 0; + + for (m_search = m; m_search; m_search = m_search->next) + if (!strcmp(object_dir, m_search->object_dir)) + return 1; + + r->objects->multi_pack_index = load_multi_pack_index(object_dir); + + if (r->objects->multi_pack_index) { + r->objects->multi_pack_index->next = m; + return 1; + } + + return 0; +} + static size_t write_midx_header(struct hashfile *f, unsigned char num_chunks, uint32_t num_packs) diff --git a/midx.h b/midx.h index e15966272f..9bcfc82d2e 100644 --- a/midx.h +++ b/midx.h @@ -1,7 +1,11 @@ #ifndef __MIDX_H__ #define __MIDX_H__ +#include "repository.h" + struct multi_pack_index { + struct multi_pack_index *next; + int fd; const unsigned char *data; @@ -25,6 +29,7 @@ struct multi_pack_index { }; struct multi_pack_index *load_multi_pack_index(const char *object_dir); +int prepare_multi_pack_index_one(struct repository *r, const char *object_dir); int write_midx_file(const char *object_dir); diff --git a/object-store.h b/object-store.h index 13a766aea8..c2b162489a 100644 --- a/object-store.h +++ b/object-store.h @@ -105,6 +105,13 @@ struct raw_object_store { */ struct oidmap *replace_map; + /* +* private data +* +* should only be accessed directly by packfile.c and midx.c +*/ + struct multi_pack_index *multi_pack_index; + /* * private data * diff --git a/packfile.c 
b/packfile.c index 3d652212c6..5d4493dbf4 100644 --- a/packfile.c +++ b/packfile.c @@ -15,6 +15,7 @@ #include "tree-walk.h" #include "tree.h" #include "object-store.h" +#include "midx.h" char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, @@ -935,10 +936,13 @@ static void prepare_packed_git(struct repository *r) if (r->objects->packed_git_initialized) return; + prepare_multi_pack_index_one(r, r->objects->objectdir); prepare_packed_git_one(r, r->objects->objectdir, 1); prepare_alt_odb(r); - for (alt = r->objects->alt_odb_list; alt; alt = alt->next) + for (alt = r->objects->alt_odb_list; alt; alt = alt->next) { + prepare_multi_pack_index_one(r, alt->path); prepare_packed_git_one(r, alt->path, 0); + } rearrange_packed_git(r);
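[Editor's note] prepare_multi_pack_index_one() boils down to: bail out unless core.multiPackIndex is true, return early if a midx for this object directory is already in the per-repository list, otherwise load one and prepend it. A standalone sketch of that pattern follows; the struct, load_midx() and prepare_one() names are stand-ins, not Git's.

/*
 * Load a multi-pack-index at most once per object directory, gated by
 * a boolean config value.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct midx {
	struct midx *next;
	char *object_dir;
};

static struct midx *the_list; /* stands in for r->objects->multi_pack_index */

static struct midx *load_midx(const char *object_dir)
{
	/* the real code mmaps $object_dir/pack/multi-pack-index; fake success */
	struct midx *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->object_dir = strdup(object_dir);
	return m;
}

static int prepare_one(const char *object_dir, int config_enabled)
{
	struct midx *m;

	if (!config_enabled)
		return 0; /* core.multiPackIndex is off: never load */

	for (m = the_list; m; m = m->next)
		if (!strcmp(object_dir, m->object_dir))
			return 1; /* this object directory is already covered */

	m = load_midx(object_dir);
	if (!m)
		return 0;
	m->next = the_list; /* prepend the newly loaded midx */
	the_list = m;
	return 1;
}

int main(void)
{
	printf("%d\n", prepare_one(".git/objects", 1)); /* 1: loaded */
	printf("%d\n", prepare_one(".git/objects", 1)); /* 1: found in the list */
	printf("%d\n", prepare_one("/alt/objects", 0)); /* 0: disabled by config */
	return 0;
}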
[PATCH v4 22/23] packfile: skip loading index if in multi-pack-index
Signed-off-by: Derrick Stolee --- packfile.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/packfile.c b/packfile.c index 2c819a0ad8..e6ecf12ab5 100644 --- a/packfile.c +++ b/packfile.c @@ -469,8 +469,19 @@ static int open_packed_git_1(struct packed_git *p) ssize_t read_result; const unsigned hashsz = the_hash_algo->rawsz; - if (!p->index_data && open_pack_index(p)) - return error("packfile %s index unavailable", p->pack_name); + if (!p->index_data) { + struct multi_pack_index *m; + const char *pack_name = strrchr(p->pack_name, '/'); + + for (m = the_repository->objects->multi_pack_index; +m; m = m->next) { + if (midx_contains_pack(m, pack_name)) + break; + } + + if (!m && open_pack_index(p)) + return error("packfile %s index unavailable", p->pack_name); + } if (!pack_max_fds) { unsigned int max_fds = get_max_fd_limit(); @@ -521,6 +532,10 @@ static int open_packed_git_1(struct packed_git *p) " supported (try upgrading GIT to a newer version)", p->pack_name, ntohl(hdr.hdr_version)); + /* Skip index checking if in multi-pack-index */ + if (!p->index_data) + return 0; + /* Verify the pack matches its index. */ if (p->num_objects != ntohl(hdr.hdr_entries)) return error("packfile %s claims to have %"PRIu32" objects" -- 2.18.0.118.gd4f65b8d14
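[Editor's note] The intent of the hunk above is: when a pack has no index data loaded yet, first ask every loaded multi-pack-index whether it already tracks this pack's .idx, and only fall back to open_pack_index() when none does. A control-flow sketch, with midx_contains_pack() stubbed by a plain strcmp and all names illustrative:

/*
 * Decide whether we still need to open a pack's .idx, or whether some
 * multi-pack-index already tracks it.
 */
#include <stdio.h>
#include <string.h>

struct midx { struct midx *next; const char *tracked_idx; };

static int midx_contains_pack(struct midx *m, const char *idx_name)
{
	return !strcmp(m->tracked_idx, idx_name); /* real code binary-searches */
}

static int need_pack_index(struct midx *list, const char *idx_name)
{
	struct midx *m;

	for (m = list; m; m = m->next)
		if (midx_contains_pack(m, idx_name))
			return 0; /* midx already knows the objects and offsets */
	return 1;                 /* fall back to opening the .idx file */
}

int main(void)
{
	struct midx one = { NULL, "pack-1234.idx" };

	printf("%d\n", need_pack_index(&one, "pack-1234.idx")); /* 0 */
	printf("%d\n", need_pack_index(&one, "pack-abcd.idx")); /* 1 */
	return 0;
}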
[PATCH v4 23/23] midx: clear midx on repack
If a 'git repack' command replaces existing packfiles, then we must clear the existing multi-pack-index before moving the packfiles it references. Signed-off-by: Derrick Stolee --- builtin/repack.c| 9 + midx.c | 12 midx.h | 1 + t/t5319-multi-pack-index.sh | 9 + 4 files changed, 31 insertions(+) diff --git a/builtin/repack.c b/builtin/repack.c index 6c636e159e..7f7cdc8b17 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -8,6 +8,7 @@ #include "strbuf.h" #include "string-list.h" #include "argv-array.h" +#include "midx.h" static int delta_base_offset = 1; static int pack_kept_objects = -1; @@ -174,6 +175,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) int no_update_server_info = 0; int quiet = 0; int local = 0; + int midx_cleared = 0; struct option builtin_repack_options[] = { OPT_BIT('a', NULL, &pack_everything, @@ -333,6 +335,13 @@ int cmd_repack(int argc, const char **argv, const char *prefix) for_each_string_list_item(item, &names) { for (ext = 0; ext < ARRAY_SIZE(exts); ext++) { char *fname, *fname_old; + + if (!midx_cleared) { + /* if we move a packfile, it will invalidated the midx */ + clear_midx_file(get_object_directory()); + midx_cleared = 1; + } + fname = mkpathdup("%s/pack-%s%s", packdir, item->string, exts[ext].name); if (!file_exists(fname)) { diff --git a/midx.c b/midx.c index bf2334acc6..19b7df338e 100644 --- a/midx.c +++ b/midx.c @@ -904,3 +904,15 @@ int write_midx_file(const char *object_dir) free(midx_name); return 0; } + +void clear_midx_file(const char *object_dir) +{ + char *midx = get_midx_filename(object_dir); + + if (remove_path(midx)) { + UNLEAK(midx); + die(_("failed to clear multi-pack-index at %s"), midx); + } + + free(midx); +} diff --git a/midx.h b/midx.h index d4cde99473..e3b07f1586 100644 --- a/midx.h +++ b/midx.h @@ -39,5 +39,6 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_name); int prepare_multi_pack_index_one(struct repository *r, const char *object_dir); int write_midx_file(const char *object_dir); +void clear_midx_file(const char *object_dir); #endif diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 601e28a2f0..5ad6614465 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -141,6 +141,15 @@ test_expect_success 'write midx with twelve packs' ' compare_results_with_midx "twelve packs" +test_expect_success 'repack removes multi-pack-index' ' + test_path_is_file $objdir/pack/multi-pack-index && + git repack -adf && + test_path_is_missing $objdir/pack/multi-pack-index +' + +compare_results_with_midx "after repack" + + # usage: corrupt_data [] corrupt_data () { file=$1 -- 2.18.0.118.gd4f65b8d14
[PATCH v4 19/23] midx: use existing midx when writing new one
Due to how Windows handles replacing a lockfile when there is an open handle, create the close_midx() method to close the existing midx before writing the new one. Signed-off-by: Derrick Stolee --- midx.c | 116 ++--- midx.h | 1 + 2 files changed, 111 insertions(+), 6 deletions(-) diff --git a/midx.c b/midx.c index 4e014ff6e3..bf2334acc6 100644 --- a/midx.c +++ b/midx.c @@ -179,6 +179,23 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir) return NULL; } +static void close_midx(struct multi_pack_index *m) +{ + uint32_t i; + munmap((unsigned char *)m->data, m->data_len); + close(m->fd); + m->fd = -1; + + for (i = 0; i < m->num_packs; i++) { + if (m->packs[i]) { + close_pack(m->packs[i]); + free(m->packs); + } + } + FREE_AND_NULL(m->packs); + FREE_AND_NULL(m->pack_names); +} + static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id) { struct strbuf pack_name = STRBUF_INIT; @@ -278,6 +295,29 @@ int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct mu return nth_midxed_pack_entry(m, e, pos); } +int midx_contains_pack(struct multi_pack_index *m, const char *idx_name) +{ + uint32_t first = 0, last = m->num_packs; + + while (first < last) { + uint32_t mid = first + (last - first) / 2; + const char *current; + int cmp; + + current = m->pack_names[mid]; + cmp = strcmp(idx_name, current); + if (!cmp) + return 1; + if (cmp > 0) { + first = mid + 1; + continue; + } + last = mid; + } + + return 0; +} + int prepare_multi_pack_index_one(struct repository *r, const char *object_dir) { struct multi_pack_index *m = r->objects->multi_pack_index; @@ -326,6 +366,7 @@ struct pack_list { uint32_t alloc_list; uint32_t alloc_names; size_t pack_name_concat_len; + struct multi_pack_index *m; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -334,6 +375,9 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, struct pack_list *packs = (struct pack_list *)data; if (ends_with(file_name, ".idx")) { + if (packs->m && midx_contains_pack(packs->m, file_name)) + return; + ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list); ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names); @@ -419,6 +463,23 @@ static int midx_oid_compare(const void *_a, const void *_b) return a->pack_int_id - b->pack_int_id; } +static int nth_midxed_pack_midx_entry(struct multi_pack_index *m, + uint32_t *pack_perm, + struct pack_midx_entry *e, + uint32_t pos) +{ + if (pos >= m->num_objects) + return 1; + + nth_midxed_object_oid(&e->oid, m, pos); + e->pack_int_id = pack_perm[nth_midxed_pack_int_id(m, pos)]; + e->offset = nth_midxed_offset(m, pos); + + /* consider objects in midx to be from "old" packs */ + e->pack_mtime = 0; + return 0; +} + static void fill_pack_entry(uint32_t pack_int_id, struct packed_git *p, uint32_t cur_object, @@ -444,7 +505,8 @@ static void fill_pack_entry(uint32_t pack_int_id, * Copy only the de-duplicated entries (selected by most-recent modified time * of a packfile containing the object). */ -static struct pack_midx_entry *get_sorted_entries(struct packed_git **p, +static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, + struct packed_git **p, uint32_t *perm, uint32_t nr_packs, uint32_t *nr_objects) @@ -453,8 +515,9 @@ static struct pack_midx_entry *get_sorted_entries(struct packed_git **p, uint32_t alloc_fanout, alloc_objects, total_objects = 0; struct pack_midx_entry *entries_by_fanout = NULL; struct pack_midx_entry *deduplicated_entries = NULL; + uint32_t start_pack = m ? 
m->num_packs : 0; - for (cur_pack = 0; cur_pack < nr_packs; cur_pack++) + for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) total_objects += p[cur_pack]->num_objects; /* @@ -471,7 +534,23 @@ static struct pack_midx_entry *get_sorted_entries(struct packed_git **p, for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) { uint32_t nr_fanout =
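[Editor's note] midx_contains_pack() relies on the pack-name chunk being sorted, so membership reduces to a binary search over strings. Here is a standalone version of just that search; contains() and the sample names are illustrative, not Git's code.

/*
 * Binary search for an .idx name in a sorted array of pack names.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int contains(const char **names, uint32_t num, const char *idx_name)
{
	uint32_t first = 0, last = num;

	while (first < last) {
		uint32_t mid = first + (last - first) / 2;
		int cmp = strcmp(idx_name, names[mid]);

		if (!cmp)
			return 1;
		if (cmp > 0)
			first = mid + 1;
		else
			last = mid;
	}
	return 0;
}

int main(void)
{
	const char *names[] = {
		"pack-aaaa.idx", "pack-bbbb.idx", "pack-cccc.idx",
	};

	printf("%d\n", contains(names, 3, "pack-bbbb.idx")); /* 1 */
	printf("%d\n", contains(names, 3, "pack-dddd.idx")); /* 0 */
	return 0;
}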
[PATCH v4 21/23] midx: prevent duplicate packfile loads
The multi-pack-index, when present, tracks the existence of objects and their offsets within a list of packfiles. This allows us to use the multi-pack-index for object lookups, abbreviations, and object counts. When the multi-pack-index tracks a packfile, then we do not need to add that packfile to the packed_git linked list or the MRU list. We still need to load the packfiles that are not tracked by the multi-pack-index. Signed-off-by: Derrick Stolee --- packfile.c | 9 + 1 file changed, 9 insertions(+) diff --git a/packfile.c b/packfile.c index 97e7812b6b..2c819a0ad8 100644 --- a/packfile.c +++ b/packfile.c @@ -795,6 +795,7 @@ struct prepare_pack_data { struct repository *r; struct string_list *garbage; int local; + struct multi_pack_index *m; }; static void prepare_pack(const char *full_name, size_t full_name_len, @@ -805,6 +806,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len, size_t base_len = full_name_len; if (strip_suffix_mem(full_name, &base_len, ".idx")) { + if (data->m && midx_contains_pack(data->m, file_name)) + return; /* Don't reopen a pack we already have. */ for (p = data->r->objects->packed_git; p; p = p->next) { size_t len; @@ -839,6 +842,12 @@ static void prepare_packed_git_one(struct repository *r, char *objdir, int local struct prepare_pack_data data; struct string_list garbage = STRING_LIST_INIT_DUP; + data.m = r->objects->multi_pack_index; + + /* look for the multi-pack-index for this object directory */ + while (data.m && strcmp(data.m->object_dir, objdir)) + data.m = data.m->next; + data.r = r; data.garbage = &garbage; data.local = local; -- 2.18.0.118.gd4f65b8d14
[PATCH v4 20/23] midx: use midx in approximate_object_count
Signed-off-by: Derrick Stolee
---
 packfile.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/packfile.c b/packfile.c
index c0eb5ac885..97e7812b6b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -861,10 +861,13 @@ unsigned long approximate_object_count(void)
 {
 	if (!the_repository->objects->approximate_object_count_valid) {
 		unsigned long count;
+		struct multi_pack_index *m;
 		struct packed_git *p;
 
 		prepare_packed_git(the_repository);
 		count = 0;
+		for (m = get_multi_pack_index(the_repository); m; m = m->next)
+			count += m->num_objects;
 		for (p = the_repository->objects->packed_git; p; p = p->next) {
 			if (open_pack_index(p))
 				continue;
-- 
2.18.0.118.gd4f65b8d14
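[Editor's note] With a midx loaded, the estimate can add m->num_objects per midx without opening each covered pack's .idx. A toy version of the counting loop, with the structs standing in for Git's:

/*
 * Sum object counts from multi-pack-indexes first, then from any
 * remaining packs.
 */
#include <stdint.h>
#include <stdio.h>

struct midx { struct midx *next; uint32_t num_objects; };
struct pack { struct pack *next; uint32_t num_objects; };

static unsigned long approximate_count(struct midx *midxs, struct pack *packs)
{
	unsigned long count = 0;
	struct midx *m;
	struct pack *p;

	for (m = midxs; m; m = m->next)
		count += m->num_objects; /* one cheap read per midx */
	for (p = packs; p; p = p->next)
		count += p->num_objects; /* the real code opens each .idx here */
	return count;
}

int main(void)
{
	struct midx m = { NULL, 5000 };
	struct pack p = { NULL, 250 };

	printf("%lu\n", approximate_count(&m, &p)); /* 5250 */
	return 0;
}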
[PATCH 0/6] git-submodule.sh: convert part of cmd_update to C
I thought about writing it all in one go, but the series got too large,
so let's chew one bite at a time.

Thanks,
Stefan

Stefan Beller (6):
  git-submodule.sh: align error reporting for update mode to use path
  git-submodule.sh: rename unused variables
  builtin/submodule--helper: factor out submodule updating
  builtin/submodule--helper: store update_clone information in a struct
  builtin/submodule--helper: factor out method to update a single submodule
  submodule--helper: introduce new update-module-mode helper

 builtin/submodule--helper.c | 152
 git-submodule.sh            |  22 +-
 2 files changed, 122 insertions(+), 52 deletions(-)

-- 
2.18.0.203.gfac676dfb9-goog
[PATCH 2/6] git-submodule.sh: rename unused variables
The 'mode' variable is no longer used in cmd_update for its original
purpose; rename it to 'dummy', as it now only serves to let us abort
quickly, and the new name documents that knowledge.

The variable 'stage' is also no longer used in cmd_update, so remove it.

This went unnoticed because at first each function used the commonly
used submodule listing, which was converted in 74703a1e4df (submodule:
rewrite `module_list` shell function in C, 2015-09-02). When cmd_update
started using its own function in 48308681b07 (git submodule update:
have a dedicated helper for cloning, 2016-02-29), removing the unused
variables was missed.

Signed-off-by: Stefan Beller
---
 builtin/submodule--helper.c | 5 ++---
 git-submodule.sh            | 4 ++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 20ae9191ca3..ebcfe85bfa9 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1571,9 +1571,8 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
 	needs_cloning = !file_exists(sb.buf);
 
 	strbuf_reset(&sb);
-	strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
-		    oid_to_hex(&ce->oid), ce_stage(ce),
-		    needs_cloning, ce->name);
+	strbuf_addf(&sb, "dummy %s %d\t%s\n",
+		    oid_to_hex(&ce->oid), needs_cloning, ce->name);
 	string_list_append(&suc->projectlines, sb.buf);
 
 	if (!needs_cloning)
diff --git a/git-submodule.sh b/git-submodule.sh
index 8a611865397..56588aa304d 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -531,9 +531,9 @@ cmd_update()
 		"$@" || echo "#unmatched" $?
 	} | {
 		err=
-		while read -r mode sha1 stage just_cloned sm_path
+		while read -r quickabort sha1 just_cloned sm_path
 		do
-			die_if_unmatched "$mode" "$sha1"
+			die_if_unmatched "$quickabort" "$sha1"
 			name=$(git submodule--helper name "$sm_path") || exit
 
 			if ! test -z "$update"
-- 
2.18.0.203.gfac676dfb9-goog
[PATCH 5/6] builtin/submodule--helper: factor out method to update a single submodule
In a later patch we'll find this method handy. Signed-off-by: Stefan Beller --- builtin/submodule--helper.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index c9c3fe2bf28..4bbf580df79 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -1725,10 +1725,17 @@ static int gitmodules_update_clone_config(const char *var, const char *value, return 0; } +static void update_submodule(struct submodule_update_clone_information *suci) +{ + fprintf(stdout, "dummy %s %d\t%s\n", + oid_to_hex(&suci->oid), + suci->just_cloned, + suci->sub->path); +} + static int update_submodules(struct submodule_update_clone *suc) { int i; - struct strbuf sb = STRBUF_INIT; run_processes_parallel(suc->max_jobs, update_clone_get_next_task, @@ -1747,16 +1754,9 @@ static int update_submodules(struct submodule_update_clone *suc) if (suc->quickstop) return 1; - for (i = 0; i < suc->submodule_lines_nr; i++) { - strbuf_addf(&sb, "dummy %s %d\t%s\n", - oid_to_hex(&suc->submodule_lines[i].oid), - suc->submodule_lines[i].just_cloned, - suc->submodule_lines[i].sub->path); - fprintf(stdout, "%s", sb.buf); - strbuf_reset(&sb); - } + for (i = 0; i < suc->submodule_lines_nr; i++) + update_submodule(&suc->submodule_lines[i]); - strbuf_release(&sb); return 0; } -- 2.18.0.203.gfac676dfb9-goog
[PATCH 1/6] git-submodule.sh: align error reporting for update mode to use path
All other error messages in cmd_update report the submodule by its
path, so let's do that for invalid update modes, too.

Signed-off-by: Stefan Beller
---
 git-submodule.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 5f9d9f6ea37..8a611865397 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -627,7 +627,7 @@ cmd_update()
 				must_die_on_failure=yes
 				;;
 			*)
-				die "$(eval_gettext "Invalid update mode '$update_module' for submodule '$name'")"
+				die "$(eval_gettext "Invalid update mode '$update_module' for submodule path '$path'")"
 			esac
 
 			if (sanitize_submodule_env; cd "$sm_path" && $command "$sha1")
-- 
2.18.0.203.gfac676dfb9-goog
[PATCH 6/6] submodule--helper: introduce new update-module-mode helper
This chews off a bit of the shell part of the update command in git-submodule.sh. When writing the C code, keep in mind that the submodule--helper part will go away eventually and we want to have a C function that is able to determine the submodule update strategy, it as a nicety, make determine_submodule_update_strategy accessible for arbitrary repositories. Signed-off-by: Stefan Beller --- builtin/submodule--helper.c | 61 + git-submodule.sh| 16 +- 2 files changed, 62 insertions(+), 15 deletions(-) diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index 4bbf580df79..e53231cf286 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -1444,6 +1444,66 @@ static int module_clone(int argc, const char **argv, const char *prefix) return 0; } +static void determine_submodule_update_strategy(struct repository *r, + int just_cloned, + const char *path, + const char *update, + struct submodule_update_strategy *out) +{ + const struct submodule *sub = submodule_from_path(r, &null_oid, path); + char *key; + const char *val; + + key = xstrfmt("submodule.%s.update", sub->name); + + if (update) { + trace_printf("parsing update"); + if (parse_submodule_update_strategy(update, out) < 0) + die(_("Invalid update mode '%s' for submodule path '%s'"), + update, path); + } else if (!repo_config_get_string_const(r, key, &val)) { + if (parse_submodule_update_strategy(val, out) < 0) + die(_("Invalid update mode '%s' configured for submodule path '%s'"), + val, path); + } else if (sub->update_strategy.type != SM_UPDATE_UNSPECIFIED) { + trace_printf("loaded thing"); + out->type = sub->update_strategy.type; + out->command = sub->update_strategy.command; + } else + out->type = SM_UPDATE_CHECKOUT; + + if (just_cloned && + (out->type == SM_UPDATE_MERGE || +out->type == SM_UPDATE_REBASE || +out->type == SM_UPDATE_NONE)) + out->type = SM_UPDATE_CHECKOUT; + + free(key); +} + +static int module_update_module_mode(int argc, const char **argv, const char *prefix) +{ + const char *path, *update = NULL; + int just_cloned; + struct submodule_update_strategy update_strategy = { .type = SM_UPDATE_CHECKOUT }; + + if (argc < 3 || argc > 4) + die("submodule--helper update-module-clone expects []"); + + just_cloned = git_config_int("just_cloned", argv[1]); + path = argv[2]; + + if (argc == 4) + update = argv[3]; + + determine_submodule_update_strategy(the_repository, + just_cloned, path, update, + &update_strategy); + fprintf(stdout, submodule_strategy_to_string(&update_strategy)); + + return 0; +} + struct submodule_update_clone_information { const struct submodule *sub; struct object_id oid; @@ -2039,6 +2099,7 @@ static struct cmd_struct commands[] = { {"list", module_list, 0}, {"name", module_name, 0}, {"clone", module_clone, 0}, + {"update-module-mode", module_update_module_mode, 0}, {"update-clone", update_clone, 0}, {"relative-path", resolve_relative_path, 0}, {"resolve-relative-url", resolve_relative_url, 0}, diff --git a/git-submodule.sh b/git-submodule.sh index 56588aa304d..215760898ce 100755 --- a/git-submodule.sh +++ b/git-submodule.sh @@ -535,27 +535,13 @@ cmd_update() do die_if_unmatched "$quickabort" "$sha1" - name=$(git submodule--helper name "$sm_path") || exit - if ! 
test -z "$update" - then - update_module=$update - else - update_module=$(git config submodule."$name".update) - if test -z "$update_module" - then - update_module="checkout" - fi - fi + update_module=$(git submodule--helper update-module-mode $just_cloned "$sm_path" $update) displaypath=$(git submodule--helper relative-path "$prefix$sm_path" "$wt_prefix") if test $just_cloned -eq 1 then subsha1= - case "$update_module" in - merge | rebase | none) - update_module=checkout ;; - esac else
[PATCH 3/6] builtin/submodule--helper: factor out submodule updating
Separate the command line parsing from the actual execution of the command within the repository. For now there is not a lot of execution as most of it is still in git-submodule.sh. Signed-off-by: Stefan Beller --- builtin/submodule--helper.c | 59 + 1 file changed, 33 insertions(+), 26 deletions(-) diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index ebcfe85bfa9..96929ba1096 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -1472,6 +1472,8 @@ struct submodule_update_clone { /* failed clones to be retried again */ const struct cache_entry **failed_clones; int failed_clones_nr, failed_clones_alloc; + + int max_jobs; }; #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \ SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, \ @@ -1714,11 +1716,36 @@ static int gitmodules_update_clone_config(const char *var, const char *value, return 0; } +static int update_submodules(struct submodule_update_clone *suc) +{ + struct string_list_item *item; + + run_processes_parallel(suc->max_jobs, + update_clone_get_next_task, + update_clone_start_failure, + update_clone_task_finished, + suc); + + /* +* We saved the output and put it out all at once now. +* That means: +* - the listener does not have to interleave their (checkout) +* work with our fetching. The writes involved in a +* checkout involve more straightforward sequential I/O. +* - the listener can avoid doing any work if fetching failed. +*/ + if (suc->quickstop) + return 1; + + for_each_string_list_item(item, &suc->projectlines) + fprintf(stdout, "%s", item->string); + + return 0; +} + static int update_clone(int argc, const char **argv, const char *prefix) { const char *update = NULL; - int max_jobs = 1; - struct string_list_item *item; struct pathspec pathspec; struct submodule_update_clone suc = SUBMODULE_UPDATE_CLONE_INIT; @@ -1740,7 +1767,7 @@ static int update_clone(int argc, const char **argv, const char *prefix) OPT_STRING(0, "depth", &suc.depth, "", N_("Create a shallow clone truncated to the " "specified number of revisions")), - OPT_INTEGER('j', "jobs", &max_jobs, + OPT_INTEGER('j', "jobs", &suc.max_jobs, N_("parallel jobs")), OPT_BOOL(0, "recommend-shallow", &suc.recommend_shallow, N_("whether the initial clone should follow the shallow recommendation")), @@ -1756,8 +1783,8 @@ static int update_clone(int argc, const char **argv, const char *prefix) }; suc.prefix = prefix; - config_from_gitmodules(gitmodules_update_clone_config, &max_jobs); - git_config(gitmodules_update_clone_config, &max_jobs); + config_from_gitmodules(gitmodules_update_clone_config, &suc.max_jobs); + git_config(gitmodules_update_clone_config, &suc.max_jobs); argc = parse_options(argc, argv, prefix, module_update_clone_options, git_submodule_helper_usage, 0); @@ -1772,27 +1799,7 @@ static int update_clone(int argc, const char **argv, const char *prefix) if (pathspec.nr) suc.warn_if_uninitialized = 1; - run_processes_parallel(max_jobs, - update_clone_get_next_task, - update_clone_start_failure, - update_clone_task_finished, - &suc); - - /* -* We saved the output and put it out all at once now. -* That means: -* - the listener does not have to interleave their (checkout) -* work with our fetching. The writes involved in a -* checkout involve more straightforward sequential I/O. -* - the listener can avoid doing any work if fetching failed. 
-*/ - if (suc.quickstop) - return 1; - - for_each_string_list_item(item, &suc.projectlines) - fprintf(stdout, "%s", item->string); - - return 0; + return update_submodules(&suc); } static int resolve_relative_path(int argc, const char **argv, const char *prefix) -- 2.18.0.203.gfac676dfb9-goog
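[Editor's note] The comment kept in update_submodules() is the key design point: per-submodule result lines are buffered while the parallel clone jobs run and are only written out afterwards, so the shell reader never interleaves its checkouts with the fetches and can skip all work on quickstop. A minimal standalone illustration of that buffer-then-flush shape; the struct and function names here are made up.

/*
 * Collect output lines during (parallel) work, flush them only after
 * everything finished, and emit nothing on quickstop.
 */
#include <stdio.h>

#define MAX_LINES 16

struct update_state {
	const char *lines[MAX_LINES]; /* stands in for suc->projectlines */
	int nr;
	int quickstop;                /* set when a clone failed fatally */
};

static int flush_results(struct update_state *s)
{
	int i;

	if (s->quickstop)
		return 1; /* the listener gets nothing and can skip its work */

	for (i = 0; i < s->nr; i++)
		fprintf(stdout, "%s", s->lines[i]);
	return 0;
}

int main(void)
{
	struct update_state s = { { "dummy <oid> 1\tsub/a\n",
				    "dummy <oid> 0\tsub/b\n" }, 2, 0 };

	/* ...parallel clone jobs would run here and fill s.lines... */
	return flush_results(&s);
}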