[PATCH 0/1] Fix a recently-introduced compile warning

2018-08-14 Thread Johannes Schindelin via GitGitGadget
With the relatively frequent breakages of pu recently, I had trouble staying
on top of the compile errors/test failures, sorry.

This one has existed since Sunday, and it is a compile error only with 
DEVELOPER=1, which is, however, the recommended way to build in Git for
Windows' SDK.

Note: it is based on nd/clone-case-smashing-warning.

Johannes Schindelin (1):
  mark_colliding_entries(): fix incorrect #if...#endif guard

 entry.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


base-commit: f80218bf4e65ccc06cc9173c0ac5a5520d380f36
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-18%2Fdscho%2Fclone-case-smashing-warning-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-18/dscho/clone-case-smashing-warning-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/18
-- 
gitgitgadget


[PATCH 1/1] mark_colliding_entries(): fix incorrect #if...#endif guard

2018-08-14 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

As the guard was placed, the code declared an unused variable on
Windows. There is no need for that, so let's fix it.

Signed-off-by: Johannes Schindelin 
---
 entry.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/entry.c b/entry.c
index c70340df8..2bce13352 100644
--- a/entry.c
+++ b/entry.c
@@ -402,11 +402,9 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
 static void mark_colliding_entries(const struct checkout *state,
   struct cache_entry *ce, struct stat *st)
 {
+#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
int i;
 
-   ce->ce_flags |= CE_MATCHED;
-
-#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
for (i = 0; i < state->istate->cache_nr; i++) {
struct cache_entry *dup = state->istate->cache[i];
 
@@ -422,6 +420,8 @@ static void mark_colliding_entries(const struct checkout *state,
}
}
 #endif
+
+   ce->ce_flags |= CE_MATCHED;
 }
 
 /*
-- 
gitgitgadget


[PATCH v3 5/7] submodule: use the 'submodule--helper config' command

2018-08-14 Thread Antonio Ospite
Use the 'submodule--helper config' command in git-submodules.sh to avoid
referring explicitly to .gitmodules by the hardcoded file path.

This makes it possible to access the submodules configuration in a more
controlled way.

Signed-off-by: Antonio Ospite 
---
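For illustration, the lookup order that get_submodule_config() implements can be
sketched in plain shell (the submodule name "sub" and the URLs are made up; the
new helper itself is not exercised here):

```shell
# Repo config wins; .gitmodules is only the fallback.
dir=$(mktemp -d) && cd "$dir" && git init -q
printf '[submodule "sub"]\n\turl = ../default.git\n' >.gitmodules

value=$(git config submodule.sub.url)
test -z "$value" && value=$(git config -f .gitmodules submodule.sub.url)
echo "fallback: $value"

git config submodule.sub.url ../override.git
value=$(git config submodule.sub.url)
echo "override: $value"
```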
 git-submodule.sh | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 8b5ad59bde..ff258e2e8c 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -72,7 +72,7 @@ get_submodule_config () {
value=$(git config submodule."$name"."$option")
if test -z "$value"
then
-   value=$(git config -f .gitmodules submodule."$name"."$option")
+   value=$(git submodule--helper config submodule."$name"."$option")
fi
printf '%s' "${value:-$default}"
 }
@@ -283,11 +283,11 @@ or you are unsure what this means choose another name with the '--name' option."
git add --no-warn-embedded-repo $force "$sm_path" ||
die "$(eval_gettext "Failed to add submodule '\$sm_path'")"
 
-   git config -f .gitmodules submodule."$sm_name".path "$sm_path" &&
-   git config -f .gitmodules submodule."$sm_name".url "$repo" &&
+   git submodule--helper config submodule."$sm_name".path "$sm_path" &&
+   git submodule--helper config submodule."$sm_name".url "$repo" &&
if test -n "$branch"
then
-   git config -f .gitmodules submodule."$sm_name".branch "$branch"
+   git submodule--helper config submodule."$sm_name".branch "$branch"
fi &&
git add --force .gitmodules ||
die "$(eval_gettext "Failed to register submodule '\$sm_path'")"
-- 
2.18.0



[PATCH v3 7/7] submodule: support reading .gitmodules even when it's not checked out

2018-08-14 Thread Antonio Ospite
When the .gitmodules file is not available in the working tree, try
using HEAD:.gitmodules from the current branch. This covers the case
when the file is part of the repository but for some reason it is not
checked out, for example because of a sparse checkout.

This makes it possible to use at least the 'git submodule' commands
which *read* the gitmodules configuration file without fully populating
the working tree.

Writing to .gitmodules will still require that the file is checked out,
so check for that before calling config_set_in_gitmodules_file_gently.

Add a similar check also in git-submodule.sh::cmd_add() to anticipate
the eventual failure of the "git submodule add" command when .gitmodules
is not safely writeable; this prevents the command from leaving the
repository in a spurious state (e.g. the submodule repository was cloned
but .gitmodules was not updated because
config_set_in_gitmodules_file_gently failed).

Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
from .gitmodules succeeds and that writing to it fails when the file is
not checked out.

Signed-off-by: Antonio Ospite 
---

Maybe the check in config_set_in_gitmodules_file_gently and
git-submodule.sh::cmd_add() can share some code:

  - add an is_gitmodules_safely_writeable() helper
  - expose a "submodule--helper config --is-safely-writeable" subcommand

But for now I preferred to keep the changes with v2 to a minimum to avoid
blocking the series.

If adding a new helper is preferred I can do a v4 or send a follow-up patch.

Thank you,
   Antonio
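The behavior this series enables can be sketched outside the test suite like so
(names are made up; `git cat-file`/`git show` stand in for what the helper does
internally):

```shell
# A committed .gitmodules stays readable from the current branch even
# when it is absent from the working tree, as in a sparse checkout.
dir=$(mktemp -d) && cd "$dir" && git init -q
printf '[submodule "sub"]\n\tpath = sub\n\turl = ../sub.git\n' >.gitmodules
git add .gitmodules
git -c user.name=t -c user.email=t@example.com commit -qm "add .gitmodules"

rm .gitmodules                     # not checked out anymore...
git cat-file -e HEAD:.gitmodules   # ...but still in the current branch
git show HEAD:.gitmodules
```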


 builtin/submodule--helper.c| 17 -
 cache.h|  1 +
 git-submodule.sh   |  7 ++
 submodule-config.c | 16 -
 t/t7416-submodule-sparse-gitmodules.sh | 90 ++
 5 files changed, 128 insertions(+), 3 deletions(-)
 create mode 100755 t/t7416-submodule-sparse-gitmodules.sh

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 7481d03b63..c0370a756b 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2036,8 +2036,23 @@ static int module_config(int argc, const char **argv, const char *prefix)
return print_config_from_gitmodules(argv[1]);
 
/* Equivalent to ACTION_SET in builtin/config.c */
-   if (argc == 3)
+   if (argc == 3) {
+   struct object_id oid;
+
+   /*
+* If the .gitmodules file is not in the working tree but it
+* is in the current branch, stop, as writing new values (and
+* staging them) would blindly overwrite ALL the old content.
+*
+* This check still makes it possible to create a brand new
+* .gitmodules when it is safe to do so: when neither
+* GITMODULES_FILE nor GITMODULES_HEAD exist.
+*/
+   if (!file_exists(GITMODULES_FILE) && get_oid(GITMODULES_HEAD, &oid) >= 0)
+       die(_("please make sure that the .gitmodules file in the current branch is checked out"));
+
return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
+   }
 
die("submodule--helper config takes 1 or 2 arguments: name [value]");
 }
diff --git a/cache.h b/cache.h
index 8dc7134f00..900f9e09e5 100644
--- a/cache.h
+++ b/cache.h
@@ -486,6 +486,7 @@ static inline enum object_type object_type(unsigned int mode)
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
 #define GITMODULES_FILE ".gitmodules"
+#define GITMODULES_HEAD "HEAD:.gitmodules"
 #define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
 #define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
 #define GIT_NOTES_DISPLAY_REF_ENVIRONMENT "GIT_NOTES_DISPLAY_REF"
diff --git a/git-submodule.sh b/git-submodule.sh
index ff258e2e8c..b1cb187227 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -159,6 +159,13 @@ cmd_add()
shift
done
 
+   # For more details about this check, see
+   # builtin/submodule--helper.c::module_config()
+   if test ! -e .gitmodules && git cat-file -e HEAD:.gitmodules > /dev/null 2>&1
+   then
+       die "$(eval_gettext "please make sure that the .gitmodules file in the current branch is checked out")"
+   fi
+
if test -n "$reference_path"
then
is_absolute_path "$reference_path" ||
diff --git a/submodule-config.c b/submodule-config.c
index b7ef055c63..088dabb56f 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "dir.h"
 #include "repository.h"
 #include "config.h"
 #include "submodule-config.h"
@@ -603,8 +604,19 @@ static void submodule_cache_check_init(struct repository *repo)
 static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void *data)
 {
if (repo->worktree) {
-   char *file = repo_worktree_pa

[PATCH v3 3/7] t7411: be nicer to future tests and really clean things up

2018-08-14 Thread Antonio Ospite
Tests 5 and 8 in t/t7411-submodule-config.sh add two commits with
invalid lines in .gitmodules but then only the second commit is removed.

This may affect subsequent tests if they assume that the .gitmodules
file has no errors.

Remove both commits as soon as they are no longer needed.

The error introduced in test 5 is also required by test 6, so the two
commits from above are removed respectively in tests 6 and 8.

Signed-off-by: Antonio Ospite 
---
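For readers unfamiliar with test_when_finished: it queues cleanup that runs
whether or not the rest of the test succeeds. A plain-shell sketch of the idea
(hypothetical, outside the git test framework):

```shell
dir=$(mktemp -d) && cd "$dir"
cleanup=:
when_finished() { cleanup="$1 && $cleanup"; }

test_body() {
    when_finished "rm -f scratch"   # queued no matter what follows
    echo data >scratch &&
    grep -q data scratch
}

test_body; status=$?
eval "$cleanup"                     # cleanup runs regardless of $status
test "$status" -eq 0 && test ! -e scratch && echo "passed and cleaned up"
```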
 t/t7411-submodule-config.sh | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index 0bde5850ac..c6b6cf6fae 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -98,6 +98,9 @@ test_expect_success 'error in one submodule config lets continue' '
 '
 
 test_expect_success 'error message contains blob reference' '
+   # Remove the error introduced in the previous test.
+   # It is not needed in the following tests.
+   test_when_finished "git -C super reset --hard HEAD^" &&
(cd super &&
sha1=$(git rev-parse HEAD) &&
test-tool submodule-config \
@@ -123,6 +126,7 @@ test_expect_success 'using different treeishs works' '
 '
 
 test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
+   test_when_finished "git -C super reset --hard HEAD^" &&
(cd super &&
git config -f .gitmodules \
submodule.submodule.fetchrecursesubmodules blabla &&
@@ -134,8 +138,7 @@ test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
HEAD b \
HEAD submodule \
>actual &&
-   test_cmp expect_error actual  &&
-   git reset --hard HEAD^
+   test_cmp expect_error actual
)
 '
 
-- 
2.18.0



[PATCH v3 4/7] submodule--helper: add a new 'config' subcommand

2018-08-14 Thread Antonio Ospite
Add a new 'config' subcommand to 'submodule--helper'. This extra level
of indirection makes it possible to add some flexibility to how the
submodules configuration is handled.

Signed-off-by: Antonio Ospite 
---
 builtin/submodule--helper.c | 14 ++
 new |  0
 t/t7411-submodule-config.sh | 26 ++
 3 files changed, 40 insertions(+)
 create mode 100644 new

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index a3c4564c6c..7481d03b63 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2029,6 +2029,19 @@ static int connect_gitdir_workingtree(int argc, const char **argv, const char *p
return 0;
 }
 
+static int module_config(int argc, const char **argv, const char *prefix)
+{
+   /* Equivalent to ACTION_GET in builtin/config.c */
+   if (argc == 2)
+   return print_config_from_gitmodules(argv[1]);
+
+   /* Equivalent to ACTION_SET in builtin/config.c */
+   if (argc == 3)
+   return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
+
+   die("submodule--helper config takes 1 or 2 arguments: name [value]");
+}
+
 #define SUPPORT_SUPER_PREFIX (1<<0)
 
 struct cmd_struct {
@@ -2057,6 +2070,7 @@ static struct cmd_struct commands[] = {
{"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
{"is-active", is_active, 0},
{"check-name", check_name, 0},
+   {"config", module_config, 0},
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/new b/new
new file mode 100644
index 00..e69de29bb2
diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index c6b6cf6fae..4afb6f152e 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -142,4 +142,30 @@ test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
)
 '
 
+test_expect_success 'reading submodules config with "submodule--helper config"' '
+   (cd super &&
+   echo "../submodule" >expected &&
+   git submodule--helper config submodule.submodule.url >actual &&
+   test_cmp expected actual
+   )
+'
+
+test_expect_success 'writing submodules config with "submodule--helper config"' '
+   (cd super &&
+   echo "new_url" >expected &&
+   git submodule--helper config submodule.submodule.url "new_url" &&
+   git submodule--helper config submodule.submodule.url >actual &&
+   test_cmp expected actual
+   )
+'
+
+test_expect_success 'overwriting unstaged submodules config with "submodule--helper config"' '
+   (cd super &&
+   echo "newer_url" >expected &&
+   git submodule--helper config submodule.submodule.url "newer_url" &&
+   git submodule--helper config submodule.submodule.url >actual &&
+   test_cmp expected actual
+   )
+'
+
 test_done
-- 
2.18.0



[PATCH v3 0/7] Make submodules work if .gitmodules is not checked out

2018-08-14 Thread Antonio Ospite
Hi,

this series teaches git to try and read the .gitmodules file from the
current branch (HEAD:.gitmodules) when it's not readily available in the
working tree.

This can be used, along with sparse checkouts, to enable submodule usage
with programs like vcsh[1] which manage multiple repositories with their
working trees sharing the same path.

[1] https://github.com/RichiH/vcsh


This is v3 of the series from:
https://public-inbox.org/git/20180802134634.10300-1-...@ao2.it/

The cover letter of the first proposal contains more background:
https://public-inbox.org/git/20180514105823.8378-1-...@ao2.it/

Changes since v2:

  * Removed the extern keyword from the public declaration of
print_config_from_gitmodules() and
config_set_in_gitmodules_file_gently()

  * Used test_when_finished in t/t7411-submodule-config.sh and remove
the problematic commits as soon as they are not needed anymore.

  * Restructured the code in module_config to avoid an unreachable
section, the code now dies as a fallback if the arguments are not
supported, as suggested by Jeff.

  * Dropped patches and tests about "submodule--helper config --stage"
as they are not strictly needed for now and there is no immediate
benefit from them.

  * Added a check to git-submodule.sh::cmd_add to make it fail earlier
if the .gitmodules file is not "safely writeable". This also fixes
one of the new tests which was previously marked as
"test_expect_failure".

  * Fixed a broken &&-chain in a subshell in one of the new tests, the
issue was exposed by a recent change in master.

  * Dropped a note about "git rm" and "git mv", it was intended as
a personal reminder and not for the general public.

  * Squashed t7416-submodule-sparse-gitmodules.sh into the same commit
as the code it exercises.

  * Dropped the two unrelated patches from v2:

  - dir: move is_empty_file() from builtin/am.c to dir.c and make it
public
  - submodule: remove the .gitmodules file when it is empty

as they are orthogonal to this series. I will send them as
a standalone series.

  * Minor wording fixes here and there.


The series looks a lot cleaner and more to the point, thanks for the
review.

Ciao,
   Antonio

Antonio Ospite (7):
  submodule: add a print_config_from_gitmodules() helper
  submodule: factor out a config_set_in_gitmodules_file_gently function
  t7411: be nicer to future tests and really clean things up
  submodule--helper: add a new 'config' subcommand
  submodule: use the 'submodule--helper config' command
  t7506: clean up .gitmodules properly before setting up new scenario
  submodule: support reading .gitmodules even when it's not checked out

 builtin/submodule--helper.c| 29 +
 cache.h|  1 +
 git-submodule.sh   | 15 +++--
 new|  0
 submodule-config.c | 53 ++-
 submodule-config.h |  3 +
 submodule.c| 10 +--
 t/t7411-submodule-config.sh| 33 +-
 t/t7416-submodule-sparse-gitmodules.sh | 90 ++
 t/t7506-status-submodule.sh|  3 +-
 10 files changed, 221 insertions(+), 16 deletions(-)
 create mode 100644 new
 create mode 100755 t/t7416-submodule-sparse-gitmodules.sh

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?


[PATCH v3 2/7] submodule: factor out a config_set_in_gitmodules_file_gently function

2018-08-14 Thread Antonio Ospite
Introduce a new config_set_in_gitmodules_file_gently() function to write
config values to the .gitmodules file.

This is in preparation for a future change which will use the function
to write to the .gitmodules file in a more controlled way instead of
using "git config -f .gitmodules".

The purpose of the change is mainly to centralize the code that writes
to the .gitmodules file to avoid some duplication.

The naming follows git_config_set_in_file_gently() but the git_ prefix
is removed to communicate that this is not a generic git-config API.

Signed-off-by: Antonio Ospite 
---
 submodule-config.c | 12 
 submodule-config.h |  1 +
 submodule.c| 10 +++---
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index eef96c4198..b7ef055c63 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -707,6 +707,18 @@ int print_config_from_gitmodules(const char *key)
return 0;
 }
 
+int config_set_in_gitmodules_file_gently(const char *key, const char *value)
+{
+   int ret;
+
+   ret = git_config_set_in_file_gently(GITMODULES_FILE, key, value);
+   if (ret < 0)
+   /* Maybe the user already did that, don't error out here */
+   warning(_("Could not update .gitmodules entry %s"), key);
+
+   return ret;
+}
+
 struct fetch_config {
int *max_children;
int *recurse_submodules;
diff --git a/submodule-config.h b/submodule-config.h
index ed40e9a478..9957bcbbfa 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -57,6 +57,7 @@ void submodule_free(struct repository *r);
 int check_submodule_name(const char *name);
 
 int print_config_from_gitmodules(const char *key);
+int config_set_in_gitmodules_file_gently(const char *key, const char *value);
 
 /*
  * Note: these helper functions exist solely to maintain backward
diff --git a/submodule.c b/submodule.c
index 6e14547e9e..fd95cb76b3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -89,6 +89,7 @@ int update_path_in_gitmodules(const char *oldpath, const char *newpath)
 {
struct strbuf entry = STRBUF_INIT;
const struct submodule *submodule;
+   int ret;
 
if (!file_exists(GITMODULES_FILE)) /* Do nothing without .gitmodules */
return -1;
@@ -104,14 +105,9 @@ int update_path_in_gitmodules(const char *oldpath, const char *newpath)
strbuf_addstr(&entry, "submodule.");
strbuf_addstr(&entry, submodule->name);
strbuf_addstr(&entry, ".path");
-   if (git_config_set_in_file_gently(GITMODULES_FILE, entry.buf, newpath) < 0) {
-   /* Maybe the user already did that, don't error out here */
-   warning(_("Could not update .gitmodules entry %s"), entry.buf);
-   strbuf_release(&entry);
-   return -1;
-   }
+   ret = config_set_in_gitmodules_file_gently(entry.buf, newpath);
strbuf_release(&entry);
-   return 0;
+   return ret;
 }
 
 /*
-- 
2.18.0



[PATCH v3 6/7] t7506: clean up .gitmodules properly before setting up new scenario

2018-08-14 Thread Antonio Ospite
In t/t7506-status-submodule.sh at some point a new scenario is set up to
test different things, in particular new submodules are added which are
meant to completely replace the previous ones.

However, before calling the "git submodule add" commands for the new
layout, the .gitmodules file is removed only from the working tree,
still leaving the previous content in the current branch.

This can break if, in the future, "git submodule add" starts
differentiating between the following two cases:

  - .gitmodules is not in the working tree but it is in the current
branch (it may not be safe to add new submodules in this case);

  - .gitmodules is neither in the working tree nor anywhere in the
current branch (it is safe to add new submodules).

Since the test intends to get rid of .gitmodules anyway, let's
completely remove it from the current branch, to actually start afresh
in the new scenario.

This is more future-proof and does not break current tests.

Signed-off-by: Antonio Ospite 
---
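The distinction the message draws between the two cases can be sketched as
follows (repository contents are made up):

```shell
dir=$(mktemp -d) && cd "$dir" && git init -q
printf '[submodule "sub"]\n\tpath = sub\n\turl = ../sub.git\n' >.gitmodules
git add .gitmodules
git -c user.name=t -c user.email=t@example.com commit -qm "add .gitmodules"

rm .gitmodules                   # worktree only: the file is still in HEAD
git cat-file -e HEAD:.gitmodules && echo "still in the current branch"

git checkout -q -- .gitmodules   # restore, then remove it from the branch too
git rm -q .gitmodules
git -c user.name=t -c user.email=t@example.com commit -qm "remove .gitmodules"
git cat-file -e HEAD:.gitmodules || echo "gone from the current branch"
```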
 t/t7506-status-submodule.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh
index 943708fb04..08629a6e70 100755
--- a/t/t7506-status-submodule.sh
+++ b/t/t7506-status-submodule.sh
@@ -325,7 +325,8 @@ test_expect_success 'setup superproject with untracked file in nested submodule'
(
cd super &&
git clean -dfx &&
-   rm .gitmodules &&
+   git rm .gitmodules &&
+   git commit -m "remove .gitmodules" &&
git submodule add -f ./sub1 &&
git submodule add -f ./sub2 &&
git submodule add -f ./sub1 sub3 &&
-- 
2.18.0



[PATCH v3 1/7] submodule: add a print_config_from_gitmodules() helper

2018-08-14 Thread Antonio Ospite
Add a new print_config_from_gitmodules() helper function to print values
from .gitmodules just like "git config -f .gitmodules" would.

This will be used by a new submodule--helper subcommand to be able to
access the .gitmodules file in a more controlled way.

Signed-off-by: Antonio Ospite 
---
 submodule-config.c | 25 +
 submodule-config.h |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index fc2c41b947..eef96c4198 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -682,6 +682,31 @@ void submodule_free(struct repository *r)
submodule_cache_clear(r->submodule_cache);
 }
 
+static int config_print_callback(const char *key_, const char *value_, void *cb_data)
+{
+   char *key = cb_data;
+
+   if (!strcmp(key, key_))
+   printf("%s\n", value_);
+
+   return 0;
+}
+
+int print_config_from_gitmodules(const char *key)
+{
+   int ret;
+   char *store_key;
+
+   ret = git_config_parse_key(key, &store_key, NULL);
+   if (ret < 0)
+   return CONFIG_INVALID_KEY;
+
+   config_from_gitmodules(config_print_callback, the_repository, store_key);
+
+   free(store_key);
+   return 0;
+}
+
 struct fetch_config {
int *max_children;
int *recurse_submodules;
diff --git a/submodule-config.h b/submodule-config.h
index dc7278eea4..ed40e9a478 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -56,6 +56,8 @@ void submodule_free(struct repository *r);
  */
 int check_submodule_name(const char *name);
 
+int print_config_from_gitmodules(const char *key);
+
 /*
  * Note: these helper functions exist solely to maintain backward
  * compatibility with 'fetch' and 'update_clone' storing configuration in
-- 
2.18.0



[PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test

2018-08-14 Thread SZEDER Gábor
The test 'pack-objects to file can use bitmap' added in 645c432d61
(pack-objects: use reachability bitmap index when generating
non-stdout pack, 2016-09-10) is silently buggy and doesn't check what
it's supposed to.

In 't5310-pack-bitmaps.sh', the 'list_packed_objects' helper function
does what its name implies by running:

  git show-index <"$1" | cut -d' ' -f2

The test in question invokes this function like this:

  list_packed_objects packa.objects &&
  list_packed_objects packb.objects &&
  test_cmp packa.objects packb.objects

Note how these two callsites don't specify the name of the pack index
file as the function's parameter, but redirect the function's standard
input from it.  This triggers an error message from the shell, as it
has no filename to redirect from in the function, but this error is
ignored, because it happens upstream of a pipe.  Consequently, both
invocations produce empty 'pack{a,b}.objects' files, and the
subsequent 'test_cmp' happily finds those two empty files identical.

Fix these two 'list_packed_objects' invocations by specifying the pack
index files as parameters.  Furthermore, eliminate the pipe in that
function by replacing it with an &&-chained pair of commands using an
intermediate file, so a failure of 'git show-index' or the shell
redirection will fail the test.

Signed-off-by: SZEDER Gábor 
---
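The masking effect described above is easy to reproduce outside the test suite
(a sketch; `broken` and `fixed` are made-up helpers mirroring
list_packed_objects before and after this patch):

```shell
# The pipeline's exit status is the last command's, so a failed
# redirection upstream of the pipe goes unnoticed.
broken() { cat <"$1" | wc -l; }
broken 2>/dev/null          # called without an argument: redirection fails
status=$?                   # ...yet the pipeline still exits 0
echo "broken exit: $status"

# The fix: an &&-chain through an intermediate file propagates failure.
fixed() { cat <"$1" >object-list && wc -l <object-list; }
fixed 2>/dev/null || echo "fixed version reports the failure"
```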
 t/t5310-pack-bitmaps.sh | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 6ee4d3f2d9..557bd0d0c0 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -9,7 +9,8 @@ objpath () {
 
 # show objects present in pack ($1 should be associated *.idx)
 list_packed_objects () {
-   git show-index <"$1" | cut -d' ' -f2
+   git show-index <"$1" >object-list &&
+   cut -d' ' -f2 object-list
 }
 
 # has_any pattern-file content-file
@@ -204,8 +205,8 @@ test_expect_success 'pack-objects to file can use bitmap' '
 	# verify equivalent packs are generated with/without using bitmap index
 	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
 	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
-	list_packed_objects <packa-$packasha1.idx >packa.objects &&
-	list_packed_objects <packb-$packbsha1.idx >packb.objects &&
+	list_packed_objects packa-$packasha1.idx >packa.objects &&
+	list_packed_objects packb-$packbsha1.idx >packb.objects &&
 	test_cmp packa.objects packb.objects
 '
 
-- 
2.18.0.720.g1f300496fc



[PATCH] doc: git-describe

2018-08-14 Thread William Pursell
commit cc4bd5268b2dbe90279198acb400a528ee23f5d5
Author: William Pursell 
Date:   Tue Aug 14 06:41:00 2018 -0600

    doc: Reference <mark> for --dirty/--broken

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index e027fb8c4b..f7d67306c0 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -39,11 +39,12 @@ OPTIONS
 --broken[=<mark>]::
  Describe the state of the working tree.  When the working
  tree matches HEAD, the output is the same as "git describe
- HEAD".  If the working tree has local modification "-dirty"
- is appended to it.  If a repository is corrupt and Git
- cannot determine if there is local modification, Git will
- error out, unless `--broken' is given, which appends
- the suffix "-broken" instead.
+ HEAD".  If the working tree has local modification, "-dirty"
+ (or <mark>, if supplied) is appended to it.  If a repository
+ is corrupt and Git cannot determine if there is local
+ modification, Git will error out unless `--broken' is given,
+ in which case it will append the suffix "-broken" (or <mark>,
+ if supplied) instead.

 --all::
  Instead of using only the annotated tags, use any ref
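A quick sketch of the documented behavior (tag and file names are made up):

```shell
dir=$(mktemp -d) && cd "$dir" && git init -q
echo one >file && git add file
git -c user.name=t -c user.email=t@example.com commit -qm one
git -c user.name=t -c user.email=t@example.com tag -a v1.0 -m v1.0

git describe --dirty            # worktree matches HEAD: just the tag name
echo two >>file
git describe --dirty            # local modification: "-dirty" is appended
git describe --dirty=.mod       # custom mark: ".mod" is appended instead
```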


Re: [PATCH] mingw: enable atomic O_APPEND

2018-08-14 Thread Jeff Hostetler




On 8/13/2018 3:02 PM, Johannes Sixt wrote:

The Windows CRT implements O_APPEND "manually": on write() calls, the
file pointer is set to EOF before the data is written. Clearly, this is
not atomic. And in fact, this is the root cause of failures observed in
t5552-skipping-fetch-negotiator.sh and t5503-tagfollow.sh, where
different processes write to the same trace file simultaneously; it also
occurred in t5400-send-pack.sh, but there it was worked around in
71406ed4d6 ("t5400: avoid concurrent writes into a trace file",
2017-05-18).

Fortunately, Windows does support atomic O_APPEND semantics using the
file access mode FILE_APPEND_DATA. Provide an implementation that does.

This implementation is minimal in such a way that it only implements
the open modes that are actually used in the Git code base. Emulation
for other modes can be added as necessary later. To become aware of
the necessity early, the unusual error ENOSYS is reported if an
unsupported mode is encountered.

Diagnosed-by: Johannes Schindelin 
Helped-by: Jeff Hostetler 
Signed-off-by: Johannes Sixt 
---
  compat/mingw.c | 41 +++--
  1 file changed, 39 insertions(+), 2 deletions(-)



[...]

This looks good.  Thanks for following up on this.

Jeff


RE: Contributor Summit planning

2018-08-14 Thread Randall S. Becker
On August 14, 2018 2:53 AM, Elijah Newren wrote:
> On Mon, Aug 13, 2018 at 10:27 AM Jeff King  wrote:
> >
> > For the past several years, we've held a Git Contributor Summit as
> > part of the Git Merge conference. I'd like to get opinions from the
> > community to help plan future installments. Any feedback or opinion is
> > welcome, but some obvious things to think about:
> >
> >   - where, when, and how often?
> >
> > Plans are shaping up to have Git Merge 2019 in Brussels right after
> > FOSDEM in February (like it was two years ago), with a contributor
> > summit attached.
> >
> > Are there people who would be more likely to attend a contributor
> > summit if it were held elsewhere (e.g., in North America, probably
> > in the Bay Area)? Are people interested in attending a separate
> > contributor summit not attached to the larger Git Merge (and if so,
> > is there any other event it might be worth connecting it with,
> > time-wise)?  Are people interested in going to two summits in a year
> > (e.g., Brussels in February, and then maybe some in North America
> > later in the year), or is that diminishing returns?
> 
> Convincing my employer to send me to an event in North America is a lot
> easier than one in Europe; they mostly allow me to work on git stuff as a side
> project just to make me happy rather than as a business priority, so
> competing business interests, shifting managers, etc. make things hard for
> me to predict (so you may want to weight my preferences less than normal).
> 
> My last manager did say they'd send me to the next contributor summit (I
> think even if it ended up being in Europe rather than North America), but of
> course, he was pulled to a different team a few months ago, so I'm not sure
> if that still stands.
> 
> 
> On a personal note, I'm also somewhat travel averse.  It'd be nice to go to a
> Git conference again (last and only I went to was I think Git Together 2011),
> but I know when it comes close to time to actually travel, I'll start
> questioning my sanity when I said that -- particularly if it's far away or
> at all complicated.  (So maybe you really ought to discount my preferences...)

Unrelated directly to above, I noticed that I actually showed up on the list 
for 2018 based on git log anyway. Does this mean I'd be welcome? Personally, 
it's actually easier to get approval to travel to Brussels now than SFO even 
though the latter is closer. With that in mind, I can do either (or both - 
depending on scheduling).

Cheers,
Randall



Re: [PATCH] mingw: enable atomic O_APPEND

2018-08-14 Thread Ævar Arnfjörð Bjarmason


On Mon, Aug 13 2018, Jeff King wrote:

> On Mon, Aug 13, 2018 at 11:22:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > O_APPEND is POSIX and means race-free append. If you mark some call
>> > sites with O_APPEND, then that must be the ones that need race-free
>> > append. Hence, you would have to go the other route: Mark those call
>> > sites that do _not_ need race-free append with some custom
>> > function/macro. (Or mark both with different helpers and avoid writing
>> > down O_APPEND.)
>>
>> O_APPEND in POSIX is race-free only up to PIPE_MAX bytes written at a
>> time, which is e.g. 2^12 by default on linux, after that all bets are
>> off and the kernel is free to interleave different write calls.

[I should have said PIPE_BUF, not PIPE_MAX]

> This is a claim I've run across often, but I've never seen a good
> citation for it.

To clarify I'm claiming that this is not a guarantee POSIX or linux
support. Not that in practice it doesn't work on some systems.

The relevant POSIX docs are here:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

Write requests of {PIPE_BUF} bytes or less shall not be interleaved
with data from other processes doing writes on the same pipe. Writes
of greater than {PIPE_BUF} bytes may have data interleaved, on
arbitrary boundaries, with writes by other processes, whether or not
the O_NONBLOCK flag of the file status flags is set.

And the Linux docs at http://man7.org/linux/man-pages/man7/pipe.7.html
say something similar:

Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may
interleave the data with data written by other processes.  POSIX.1
requires PIPE_BUF to be at least 512 bytes.  (On Linux, PIPE_BUF is
4096 bytes.)

> Certainly atomic writes to _pipes_ are determined by PIPE_BUF (which
> IIRC is not even a constant on Linux, but can be changed at run-time).
> But is it relevant for regular-file writes?

I believe it's hardcoded at PIPE_BUF which is defined as PAGE_SIZE on
linux. I think you may be thinking of /proc/sys/fs/pipe-max-pages which
is the number of pages that a pipe will take before filling up, but I
may be wrong.
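For what it's worth, the value a given system actually advertises can be
queried from the shell without writing any C; a quick sketch, assuming a
POSIX getconf is available:

```shell
# PIPE_BUF is a pathconf(3) variable, so getconf needs a path argument;
# any path on the filesystem of interest will do.  POSIX requires the
# value to be at least 512; Linux reports the page size, 4096.
getconf PIPE_BUF /
```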

> Another gem I found while digging on this O_APPEND/FILE_APPEND_DATA
> stuff the other day: somebody claimed that the max atomic-append size on
> Linux is 4096 and 1024 on Windows. But their experimental script was
> done in bash! So I suspect they were really just measuring the size of
> stdio buffers.

Yes, and some tests for this are wrong because they use e.g. "print" in
some higher-level language which'll always split stuff into writes of
1024 or something.

> Here's my attempt at a test setup. This C program forces two processes
> to write simultaneously to the same file with O_APPEND:
>
> -- >8 --
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/stat.h>
>
> static void doit(int size, const char *fn, char c)
> {
>   int fd;
>   char *buf;
>
>   fd = open(fn, O_WRONLY|O_APPEND|O_CREAT, 0666);
>   if (fd < 0) {
>   perror("open");
>   return;
>   }
>
>   buf = malloc(size);
>   memset(buf, c, size);
>
>   while (1)
>   write(fd, buf, size);
> }
>
> int main(int argc, const char **argv)
> {
>   int size = atoi(argv[1]);
>
>   if (fork())
>   doit(size, argv[2], '1');
>   else
>   doit(size, argv[2], '2');
>   return 0;
> }
> -- 8< --
>
> and then this program checks that we saw atomic units of the correct
> size:
>
> -- >8 --
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> int main(int argc, const char **argv)
> {
>   int size = atoi(argv[1]);
>   char *buf;
>
>   buf = malloc(size);
>   while (1) {
>   int i;
>   /* assume atomic reads, i.e., no signals */
>   int r = read(0, buf, size);
>   if (!r)
>   break;
>   for (i = 1; i < size; i++) {
>   if (buf[i] != buf[0]) {
>   fprintf(stderr, "overlap\n");
>   return 1;
>   }
>   }
>   }
>   return 0;
> }
> -- 8< --
>
> And then you can do something like:
>
>   for size in 4097 8193 16385 32769 65537 131073 262145 524289 1048577; do
> >out ;# clean up from last run
> echo "Trying $size..."
> timeout 5 ./write $size out
>     if ! ./check $size <out; then
>       echo "$size failed"
>       break
>     fi
>   done
>
> On my Linux system, each of those seems to write several gigabytes
> without overlapping. I did manage to hit some failing cases, but they
> were never sheared writes, but rather cases where there was an
> incomplete write at the end-of-file.

Yeah I can't make that fail experimentally either, except in the case
you mentioned. I also checked on a NetBSD box and an OpenBSD box I
have access to, same thing.

> So obviously this is all a bit of a tangent. I'd be fine declaring that
> trace

Re: Contributor Summit planning

2018-08-14 Thread Ævar Arnfjörð Bjarmason


On Tue, Aug 14 2018, Randall S. Becker wrote:

> On August 14, 2018 2:53 AM, Elijah Newren wrote:
>> On Mon, Aug 13, 2018 at 10:27 AM Jeff King  wrote:
>> >
>> > For the past several years, we've held a Git Contributor Summit as
>> > part of the Git Merge conference. I'd like to get opinions from the
>> > community to help plan future installments. Any feedback or opinion is
>> > welcome, but some obvious things to think about:
>> >
>> >   - where, when, and how often?
>> >
>> > Plans are shaping up to have Git Merge 2019 in Brussels right after
>> > FOSDEM in February (like it was two years ago), with a contributor
>> > summit attached.
>> >
>> > Are there people who would be more likely to attend a contributor
>> > summit if it were held elsewhere (e.g., in North America, probably
>> > in the Bay Area)? Are people interested in attending a separate
>> > contributor summit not attached to the larger Git Merge (and if so,
>> > is there any other event it might be worth connecting it with,
>> > time-wise)?  Are people interested in going to two summits in a year
>> > (e.g., Brussels in February, and then maybe some in North America
>> > later in the year), or is that diminishing returns?
>>
>> Convincing my employer to send me to an event in North America is a lot
>> easier than one in Europe; they mostly allow me to work on git stuff as a 
>> side
>> project just to make me happy rather than as a business priority, so
>> competing business interests, shifting managers, etc. make things hard for
>> me to predict (so you may want to weight my preferences less than normal).
>>
>> My last manager did say they'd send me to the next contributor summit (I
>> think even if it ended up being in Europe rather than North America), but of
>> course, he was pulled to a different team a few months ago, so I'm not sure
>> if that still stands.
>>
>>
>> On a personal note, I'm also somewhat travel averse.  It'd be nice to go to a
>> Git conference again (last and only I went to was I think Git Together 2011),
>> but I know when it comes close to time to actually travel, I'll start
>> questioning my sanity when I said that -- particularly if it's far away or 
>> at all
>> complicated.  (So maybe you really ought to discount my preferences...)
>
> Unrelated directly to above, I noticed that I actually showed up on
> the list for 2018 based on git log anyway. Does this mean I'd be
> welcome? Personally, it's actually easier to get approval to travel to
> Brussels now than SFO even though the latter is closer. With that in
> mind, I can do either (or both - depending on scheduling).

I very much regret sending that 'git log' command without some further
explanation.

It was only meant as a *very* rough shortlist of people in the context
of a discussion of why some active contributors don't come to the
contributor summit. I.e. whether that's because of the location, timing
or whatever.

Any output from such a command definitely doesn't mean "you shouldn't
come to the contributor summit because this one-liner doesn't list
you".

I only meant to suggest that it would be interesting as a heuristic to
solicit feedback from people who *are* very active contributors to the
main git project who don't show up, to see why that is. Only because it
might be indicative of why some people who otherwise don't show up
don't show up.


Re: Contributor Summit planning

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 09:25:41AM -0400, Randall S. Becker wrote:

> Unrelated directly to above, I noticed that I actually showed up on
> the list for 2018 based on git log anyway. Does this mean I'd be
> welcome? Personally, it's actually easier to get approval to travel to
> Brussels now than SFO even though the latter is closer. With that in
> mind, I can do either (or both - depending on scheduling).

You'd definitely be welcome.

The point of having it "only developers" is mostly to keep the numbers
at a point where we can all sit around and have round-table discussions.
I don't think there are so many people at the fringe of "well, I only
have a few commits, is that enough?" for us to need to make any kind of
serious cut-off there.

What I think we want to avoid is random folks in the "I use Git, and it
would be neat to see people talk about it" camp. It would be nice to
accommodate that (and it might even manage to suck somebody into working
on the project). But that opens up a much larger pool of people, and if
(say) a hundred of them want to come, that wrecks the intimate
round-table approach.

That's all just my opinion, of course. I'm open to suggestions.

-Peff


Re: Contributor Summit planning

2018-08-14 Thread Duy Nguyen
On Mon, Aug 13, 2018 at 11:06 PM Jeff King  wrote:
>
> On Mon, Aug 13, 2018 at 01:41:51PM -0700, Stefan Beller wrote:
>
> > > Oh, using "git shortlog" might be also simpler ;-)
> >
> > I guess you'd need to memorize a different set of flags for that
> > as without -s it would be harder to parse than the oneliner above.
>
> I frequently use "git shortlog -ns" to see who is active (especially
> coupled with "--since=").
>
> I also use "--no-merges", because it makes me look a lot better when
> compared relatively to Junio. :)

--no-merges makes me number one. Not sure if I should laugh or cry :D

Going off topic a bit, can we count the number of topics of each
contributor? I could do it by decorating git log with remote refs from
Junio's repo and counting based on the two-letter prefix in the
topic/ref name but that's too hacky. FYI Jeff you're up to second
position now with 34 topics (I'm unfortunately still the first with
38).
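One rough way to get such a count without decorating git log is to
tally the two-letter prefixes of the published topic branches
themselves; a sketch, where the <initials>/<topic> branch-naming
convention and the mirror URL are assumptions:

```shell
# Tally topics per contributor from "ls-remote"-style output on stdin,
# relying on the convention that topic branches are named
# <two-letter initials>/<topic>.
count_topics () {
	awk -F/ '/refs\/heads\/[a-z][a-z]\//{ print $3 }' |
	sort | uniq -c | sort -rn
}

# Assumed real-world usage (needs network access):
#   git ls-remote https://github.com/gitster/git 'refs/heads/??/*' |
#   count_topics
```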
-- 
Duy


Re: Contributor Summit planning

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 04:06:23PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I very much regret sending that 'git log' command without some further
> explanation.
> 
> It was only meant as a *very* rough shortlist of people in the context
> of a discussion of why some active contributors don't come to the
> contributor summit. I.e. whether that's because of the location, timing
> or whatever.
> 
> Any output from such a command definitely doesn't mean "you shouldn't
> come to the contributor summit because this one-liner doesn't list
> you".

Amen.

> I only meant to suggest that it would be interesting as a heuristic to
> solicit feedback from people who *are* very active contributors to the
> main git project who don't show up, to see why that is. Only because it
> might be indicative of why some people who otherwise don't show up
> don't show up.

I've bugged people privately in the past to see if they want to come,
and I think the limiting factor is usually just time/effort to travel.

-Peff


Re: Contributor Summit planning

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 06:31:50AM +0200, Christian Couder wrote:

> > We have been kicking around the thought of reviving the GitTogethers
> > like back in the olden days (I only know them from hearsay), in
> > Mountain View or Sunnyvale at the Google Campus, but we have not yet
> > spent enough thought to make it so.
> 
> I think it would be great to have GitTogethers again around the time
> of the GSoC Mentor Summit like we did a long time ago!

Yeah, that's an interesting concept. In addition to amortizing travel
for one or maybe two Git devs who are mentors, it also allowed us to
pull in other open source folks who were tangential to Git (e.g., I
remember Joey Hess of git-annex fame came one year).

On the other hand, we can only send two mentors to the summit, so there
is no draw at all for the rest of the folks. :)

Timing-wise, it may be getting a little close to plan this year, as it's
exactly 2 months away (and I'd think many people, especially coming from
Europe, would already have made travel plans). We could start next year,
but that's 14 months away.

> > Personally I think the way is fine; we could collect topics in advance on
> > the list to have a head start, but the whiteboard is totally fine, IMHO.
> 
> Yeah for the GitTogethers we used to collect topics in advance, but we
> still had a whiteboard and voted on them at the beginning of the
> actual GitTogether.

Heh. Every year I ask for topics on the list, and the response leads me
to believe that people are happy white-boarding in the morning. ;)

-Peff


Re: [PATCH] mingw: enable atomic O_APPEND

2018-08-14 Thread Junio C Hamano
Jeff Hostetler  writes:

> On 8/13/2018 3:02 PM, Johannes Sixt wrote:
>>
>> Fortunately, Windows does support atomic O_APPEND semantics using the
>> file access mode FILE_APPEND_DATA. Provide an implementation that does.
>> ...
>>
>> Diagnosed-by: Johannes Schindelin 
>> Helped-by: Jeff Hostetler 
>> Signed-off-by: Johannes Sixt 
>> ---
>
> This looks good.  Thanks for following up on this.

Thanks, all.  Will queue.


Re: Contributor Summit planning

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 04:30:09PM +0200, Duy Nguyen wrote:

> > I frequently use "git shortlog -ns" to see who is active (especially
> > coupled with "--since=").
> >
> > I also use "--no-merges", because it makes me look a lot better when
> > compared relatively to Junio. :)
> 
> --no-merges makes me number one. Not sure if I should laugh or cry :D

Since when? Junio still has everyone beat for all time, though of course
he cheats with easy ones like "update version field to v2.17.1". :)

I also sometimes look at "shortlog -ns --no-merges v2.17.0..v2.18.0" and
so on (i.e., each major release). I had a good run from about v2.10 to
v2.15, but I've been slipping since then.

> Going off topic a bit, can we count the number of topics of each
> contributor? I could do it by decorating git log with remote refs from
> Junio's repo and counting based on the two-letter prefix in the
> topic/ref name but that's too hacky. FYI Jeff you're up to second
> position now with 34 topics (I'm unfortunately still the first with
> 38).

One problem there is that the prefixes are ambiguous (e.g., Jacob Keller
shares with me, and I think at least one other over the years). You
could look at the author of the tip commit, but that's not always right
(and in fact, counting just merged topics misses bug-fixes that get
applied directly on top of other people's topics). And of course there's
the notion that "topic" might be a documentation typo fix, or it might
be the entire range-diff program.

I think "surviving lines" is another interesting metric, though it also
has flaws (if I s/sha1/oid/ on your line, it becomes my line; even
though my change is useful and should be counted, it's probably not as
important as whatever the code was doing in the first place).
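That metric can be approximated by tallying blame output across the
tree; a sketch (slow on a big repository, and the file selection in the
usage line is an assumption):

```shell
# Count "surviving lines" per author from `git blame --line-porcelain`
# output read on stdin: each surviving line contributes exactly one
# "author <name>" record to the porcelain stream.
surviving_lines () {
	grep '^author ' | sort | uniq -c | sort -rn
}

# Assumed usage, from the top of a work tree:
#   git ls-files -z -- '*.c' |
#   xargs -0 -n1 git blame --line-porcelain -- |
#   surviving_lines
```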

-Peff


Re: [PATCH] mingw: enable atomic O_APPEND

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 03:47:54PM +0200, Ævar Arnfjörð Bjarmason wrote:

> The relevant POSIX docs are here:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
> 
> Write requests of {PIPE_BUF} bytes or less shall not be interleaved
> with data from other processes doing writes on the same pipe. Writes
> of greater than {PIPE_BUF} bytes may have data interleaved, on
> arbitrary boundaries, with writes by other processes, whether or not
> the O_NONBLOCK flag of the file status flags is set.

Right, this is the part I've seen, but it's pretty clearly only about
pipes, not regular files.

> > Certainly atomic writes to _pipes_ are determined by PIPE_BUF (which
> > IIRC is not even a constant on Linux, but can be changed at run-time).
> > But is it relevant for regular-file writes?
> 
> I believe it's hardcoded at PIPE_BUF which is defined as PAGE_SIZE on
> linux. I think you may be thinking of /proc/sys/fs/pipe-max-pages which
> is the number of pages that a pipe will take before filling up, but I
> may be wrong.

Yeah, you're probably right.

> > So obviously this is all a bit of a tangent. I'd be fine declaring that
> > trace output is generally small enough not to worry about this in the
> > first place. But those results show that it shouldn't matter even if
> > we're writing 1MB trace lines on Linux. I wouldn't be at all surprised
> > to see different results on other operating systems, though.
> 
> I don't know how this works internally in the kernel, but I'd be very
> careful to make that assertion. Most likely this in practice depends on
> what FS you're using, its mount options, whether fsync() is involved,
> I/O pressure etc.

Definitely it depends on the filesystem (and I'm pretty sure that at
least old versions of NFS could not possibly do O_APPEND correctly,
because the protocol did not support an atomic seek+write).

I agree that the experiment I did doesn't really tell us anything for
sure. It _seems_ to work, but the machine was not under any kind of
memory or I/O pressure.

I'd feel pretty confident that writes under a page are always going to
be fine, but anything else is "seems to work". To me that's enough for
tracing, as the absolute worst case is jumbled output, not an on-disk
corruption.

> FWIW this is something I've run into in the past on Linux as a practical
> matter, but that was many kernel versions ago, so maybe the semantics
> changed.
> 
> We had an ad-hoc file format with each chunk a "<start marker>\n
> <content>\n<end marker>\n" format (and the <content> was
> by N workers. We would get corrupt data because of this in cases where
> we were writing more than PIPE_BUF, e.g. start markers for unrelated
> payloads interleaved with content, lines that were incorrectly formed
> etc.

Interesting. I wonder if it is because of PIPE_BUF, or it is simply the
page size, which also happens to be the value of PIPE_BUF.

> But yeah, whether this is a practical concern for git trace output is
> another matter. I just wanted to chime in to note that atomic appends to
> files are only portable on POSIX up to PIPE_BUF.

I still think POSIX doesn't say anything either way here. The PIPE_BUF
bits are _just_ about pipes. At any rate, I think we have a decent
handle on what systems actually _do_, which is more important than POSIX
anyway.

-Peff


Re: [PATCH v6 11/21] range-diff: add tests

2018-08-14 Thread Johannes Schindelin
Hi Thomas,

On Mon, 13 Aug 2018, Thomas Gummerer wrote:

> On 08/13, Thomas Rast via GitGitGadget wrote:
> > From: Thomas Rast 
> > 
> > These are essentially lifted from https://github.com/trast/tbdiff, with
> > light touch-ups to account for the command now being named `git
> > range-diff`.
> > 
> > Apart from renaming `tbdiff` to `range-diff`, only one test case needed
> > to be adjusted: 11 - 'changed message'.
> > 
> > The underlying reason it had to be adjusted is that diff generation is
> > sometimes ambiguous. In this case, a comment line and an empty line are
> > added, but it is ambiguous whether they were added after the existing
> > empty line, or whether an empty line and the comment line are added
> > *before* the existing empty line. And apparently xdiff picks a different
> > option here than Python's difflib.
> >
> 
> Just noticed while reading the whole series again (hopefully for the
> last time :)), do we need Thomas Rast's Sign-off here, as he is
> credited as the author here? 

Hmm. I hoped that my commit message was enough to indicate that while he
is the author, I assembled this. Maybe I should move him to the footer, as
an Original-Authored-By:?

Junio?

Ciao,
Dscho
> 
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  t/.gitattributes   |   1 +
> >  t/t3206-range-diff.sh  | 145 ++
> >  t/t3206/history.export | 604 +
> >  3 files changed, 750 insertions(+)
> >  create mode 100755 t/t3206-range-diff.sh
> >  create mode 100644 t/t3206/history.export
> > 
> > diff --git a/t/.gitattributes b/t/.gitattributes
> > index 3bd959ae5..b17bf71b8 100644
> > --- a/t/.gitattributes
> > +++ b/t/.gitattributes
> > @@ -1,6 +1,7 @@
> >  t[0-9][0-9][0-9][0-9]/* -whitespace
> >  /diff-lib/* eol=lf
> >  /t0110/url-* binary
> > +/t3206/* eol=lf
> >  /t3900/*.txt eol=lf
> >  /t3901/*.txt eol=lf
> >  /t4034/*/* eol=lf
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > new file mode 100755
> > index 0..2237c7f4a
> > --- /dev/null
> > +++ b/t/t3206-range-diff.sh
> > @@ -0,0 +1,145 @@
> > +#!/bin/sh
> > +
> > +test_description='range-diff tests'
> > +
> > +. ./test-lib.sh
> > +
> > +# Note that because of the range-diff's heuristics, test_commit does more
> > +# harm than good.  We need some real history.
> > +
> > +test_expect_success 'setup' '
> > +   git fast-import < "$TEST_DIRECTORY"/t3206/history.export
> > +'
> > +
> > +test_expect_success 'simple A..B A..C (unmodified)' '
> > +   git range-diff --no-color master..topic master..unmodified \
> > +   >actual &&
> > +   cat >expected <<-EOF &&
> > +   1:  4de457d = 1:  35b9b25 s/5/A/
> > +   2:  fccce22 = 2:  de345ab s/4/A/
> > +   3:  147e64e = 3:  9af6654 s/11/B/
> > +   4:  a63e992 = 4:  2901f77 s/12/B/
> > +   EOF
> > +   test_cmp expected actual
> > +'
> > +
> > +test_expect_success 'simple B...C (unmodified)' '
> > +   git range-diff --no-color topic...unmodified >actual &&
> > +   # same "expected" as above
> > +   test_cmp expected actual
> > +'
> > +
> > +test_expect_success 'simple A B C (unmodified)' '
> > +   git range-diff --no-color master topic unmodified >actual &&
> > +   # same "expected" as above
> > +   test_cmp expected actual
> > +'
> > +
> > +test_expect_success 'trivial reordering' '
> > +   git range-diff --no-color master topic reordered >actual &&
> > +   cat >expected <<-EOF &&
> > +   1:  4de457d = 1:  aca177a s/5/A/
> > +   3:  147e64e = 2:  14ad629 s/11/B/
> > +   4:  a63e992 = 3:  ee58208 s/12/B/
> > +   2:  fccce22 = 4:  307b27a s/4/A/
> > +   EOF
> > +   test_cmp expected actual
> > +'
> > +
> > +test_expect_success 'removed a commit' '
> > +   git range-diff --no-color master topic removed >actual &&
> > +   cat >expected <<-EOF &&
> > +   1:  4de457d = 1:  7657159 s/5/A/
> > +   2:  fccce22 < -:  --- s/4/A/
> > +   3:  147e64e = 2:  43d84d3 s/11/B/
> > +   4:  a63e992 = 3:  a740396 s/12/B/
> > +   EOF
> > +   test_cmp expected actual
> > +'
> > +
> > +test_expect_success 'added a commit' '
> > +   git range-diff --no-color master topic added >actual &&
> > +   cat >expected <<-EOF &&
> > +   1:  4de457d = 1:  2716022 s/5/A/
> > +   2:  fccce22 = 2:  b62accd s/4/A/
> > +   -:  --- > 3:  df46cfa s/6/A/
> > +   3:  147e64e = 4:  3e64548 s/11/B/
> > +   4:  a63e992 = 5:  12b4063 s/12/B/
> > +   EOF
> > +   test_cmp expected actual
> > +'
> > +
> > +test_expect_success 'new base, A B C' '
> > +   git range-diff --no-color master topic rebased >actual &&
> > +   cat >expected <<-EOF &&
> > +   1:  4de457d = 1:  cc9c443 s/5/A/
> > +   2:  fccce22 = 2:  c5d9641 s/4/A/
> > +   3:  147e64e = 3:  28cc2b6 s/11/B/
> > +   4:  a63e992 = 4:  5628ab7 s/12/B/
> > +   EOF
> > +   test_cmp expected actual
> > +'
> > +
> > +test_expect_success 'new base, B...C' '
> > +   # this syntax includes the commits from master!
> > +   git range-diff --no-color topic...rebased >actual &&
> > +   cat >expected <<-EOF &&
> > +   -:  --- > 1:  a31b12e unre

Re: [PATCH v6 11/21] range-diff: add tests

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 04:53:51PM +0200, Johannes Schindelin wrote:

> > > These are essentially lifted from https://github.com/trast/tbdiff, with
> > > light touch-ups to account for the command now being named `git
> > > range-diff`.
> [...]
> > Just noticed while reading the whole series again (hopefully for the
> > last time :)), do we need Thomas Rast's Sign-off here, as he is
> > credited as the author here? 
> 
> Hmm. I hoped that my commit message was enough to indicate that while he
> is the author, I assembled this. Maybe I should move him to the footer, as
> an Original-Authored-By:?

I think the "Author" field is actually distinct from the copyright
provenance. In this case it ought to be perfectly fine to add your
signed-off-by under the DCO's point b:

  The contribution is based upon previous work that, to the best
  of my knowledge, is covered under an appropriate open source
  license and I have the right under that license to submit that
  work with modifications [...]

This is based on the tests in tbdiff, which is explicitly GPL'd by
Thomas. So your signoff certifies that, which is fine.

As for the author field, IMHO it serves two purposes:

  - to give credit where it is due

  - so that people digging in history know who to contact for
questions/problems

In this case it probably makes sense for it to be you, as you'd take
responsibility for the code in _this_ project. And as you note, you can
give credit in the commit message (the only unfortunate thing is that
most automated statistics would not credit Thomas, but in theory they
could by mentioning him in the trailer).

-Peff


Re: [PATCHv2 0/8] Resending sb/range-diff-colors

2018-08-14 Thread Johannes Schindelin
Hi Stefan,

On Mon, 13 Aug 2018, Stefan Beller wrote:

> This applies on top of js/range-diff (a7be92acd9600), this version 
> * resolves a semantic conflict in the test
>   (Did range-diff change its output slightly?)

If by semantic conflict you mean...

> 23:  6a1c7698c68 ! 23:  5701745379b t3206: add color test for range-diff 
> --dual-color
> @@ -23,7 +23,7 @@
>  +: s/4/A/
>  +: 
>  +:+Also a silly comment 
> here!
> -+:+
> ++:+

... this, then I do not understand. This looks like a straight-up change
in your commit. v1 of this patch did not append  to the end,
while v2 apparently does.

And it looks a bit funny, indeed.

In any case, from what I see you indeed addressed all my comments (the
need_reset was done differently than I suggested, but still okay).

Ciao,
Dscho

>  +: diff --git a/file b/file
>  +: --- a/file
>  +: +++ b/file
> 24:  7e12ece1d34 = 24:  4e56f63a5f5 diff.c: simplify caller of emit_line_0
> 25:  74dabd6d36f ! 25:  cf126556940 diff.c: reorder arguments for 
> emit_line_ws_markup
> @@ -3,7 +3,8 @@
>  diff.c: reorder arguments for emit_line_ws_markup
>  
>  The order shall be all colors first, then the content, flags at the 
> end.
> -The colors are in order.
> +The colors are in the order of occurrence, i.e. first the color for 
> the
> +sign and then the color for the rest of the line.
>  
>  Signed-off-by: Stefan Beller 
>  Signed-off-by: Junio C Hamano 
> 26:  e304e15aa6b ! 26:  69008364cb8 diff.c: add set_sign to emit_line_0
> @@ -2,14 +2,17 @@
>  
>  diff.c: add set_sign to emit_line_0
>  
> -For now just change the signature, we'll reason about the actual
> -change in a follow up patch.
> +Split the meaning of the `set` parameter that is passed to
> +emit_line_0()` to separate between the color of the "sign" (i.e.
> +the diff marker '+', '-' or ' ' that is passed in as the `first`
> +parameter) and the color of the rest of the line.
>  
> -Pass 'set_sign' (which is output before the sign) and 'set' which
> -controls the color after the first character. Hence, promote any
> -'set's to 'set_sign' as we want to have color before the sign
> -for now.
> +This changes the meaning of the `set` parameter to no longer refer
> +to the color of the diff marker, but instead to refer to the color
> +of the rest of the line. A value of `NULL` indicates that the rest
> +of the line wants to be colored the same as the diff marker.
>  
> +Helped-by: Johannes Schindelin 
>  Signed-off-by: Stefan Beller 
>  Signed-off-by: Junio C Hamano 
>  
> @@ -30,12 +33,15 @@
>   if (reverse && want_color(o->use_color))
>   fputs(GIT_COLOR_REVERSE, file);
>  -fputs(set, file);
> -+if (set_sign && set_sign[0])
> ++if (set_sign)
>  +fputs(set_sign, file);
>   if (first && !nofirst)
>   fputc(first, file);
> -+if (set)
> ++if (set && set != set_sign) {
> ++if (set_sign)
> ++fputs(reset, file);
>  +fputs(set, file);
> ++}
>   fwrite(line, len, 1, file);
>   fputs(reset, file);
>   }
> 27:  8d935d5212c <  -:  --- diff: use emit_line_0 once per line
> 28:  2344aac787a <  -:  --- diff.c: compute reverse locally in 
> emit_line_0
>  -:  --- > 27:  5d2629281a1 diff: use emit_line_0 once per line
>  -:  --- > 28:  5240e94a69a diff.c: omit check for line prefix in 
> emit_line_0
> 29:  4dc97b54a35 ! 29:  058e03a1601 diff.c: rewrite emit_line_0 more 
> understandably
> @@ -2,21 +2,15 @@
>  
>  diff.c: rewrite emit_line_0 more understandably
>  
> -emit_line_0 has no nested conditions, but puts out all its arguments
> -(if set) in order. There is still a slight confusion with set
> -and set_sign, but let's defer that to a later patch.
> +Rewrite emit_line_0 to have fewer (nested) conditions.
>  
> -'first' used be output always no matter if it was 0, but that got 
> lost
> -at "diff: add an internal option to dual-color diffs of diffs",
> -2018-07-21), as there we broadened the meaning of 'first' to also
> -signal an early return.
> -
> -The change in 'emit_line' makes sure that 'first' is never content, 
> but
> -always under our control, a sign or special character in the 
> beginning
> -of the line (or 0, in which case

Re: [PATCH v6 11/21] range-diff: add tests

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 11:03:10AM -0400, Jeff King wrote:

> > Hmm. I hoped that my commit message was enough to indicate that while he
> > is the author, I assembled this. Maybe I should move him to the footer, as
> > an Original-Authored-By:?
> 
> I think the "Author" field is actually distinct from the copyright
> provenance. In this case it ought to be perfectly fine to add your
> signed-off-by under the DCO's point b:
> 
>   The contribution is based upon previous work that, to the best
>   of my knowledge, is covered under an appropriate open source
>   license and I have the right under that license to submit that
>   work with modifications [...]
> 
> This is based on the tests in tbdiff, which is explicitly GPL'd by
> Thomas. So your signoff certifies that, which is fine.
> 
> As for the author field, IMHO it serves two purposes:
> 
>   - to give credit where it is due
> 
>   - so that people digging in history know who to contact for
> questions/problems
> 
> In this case it probably makes sense for it to be you, as you'd take
> responsibility for the code in _this_ project. And as you note, you can
> give credit in the commit message (the only unfortunate thing is that
> most automated statistics would not credit Thomas, but in theory they
> could by mentioning him in the trailer).

One thing I should have made clear: this is all my opinion, and anything
Thomas expresses trumps that. But since he hasn't been active lately,
this is all what I would do in the absence of input from him. Obviously
a sign-off from him is better than none. :)

-Peff


Re: [PATCH 1/1] mark_colliding_entries(): fix incorrect #if...#endif guard

2018-08-14 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" writes:

> From: Johannes Schindelin 
>
> The way the guard was put, the code was declaring an unused variable on
> Windows. No need to do that, so let's fix it.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  entry.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/entry.c b/entry.c
> index c70340df8..2bce13352 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -402,11 +402,9 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
>  static void mark_colliding_entries(const struct checkout *state,
>  struct cache_entry *ce, struct stat *st)
>  {
> +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
>   int i;
>  
> - ce->ce_flags |= CE_MATCHED;
> -
> -#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
>   for (i = 0; i < state->istate->cache_nr; i++) {
>   struct cache_entry *dup = state->istate->cache[i];
>  
> @@ -422,6 +420,8 @@ static void mark_colliding_entries(const struct checkout *state,
>   }
>   }
>  #endif
> +
> + ce->ce_flags |= CE_MATCHED;
>  }

Even though it looks a bit odd to smudge 'ce' itself after the loop,
I think this would not break anything, simply because the loop does
treat the 'ce' as special and stops the iteration once it sees it,
even before its MATCHED bit is set.

Duy, if you are going to send a new version updated for other
reasons, please squash this in.  In the meantime, I'll queue it on
top of what we have right now.

Thanks, both.



Re: [PATCH v3 5/5] list-objects-filter: implement filter tree:0

2018-08-14 Thread Jeff Hostetler

On 8/13/2018 2:14 PM, Matthew DeVore wrote:

Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore 
---
  Documentation/rev-list-options.txt |  3 ++
  list-objects-filter-options.c  |  4 +++
  list-objects-filter-options.h  |  1 +
  list-objects-filter.c  | 50 ++
  t/t5317-pack-objects-filter-objects.sh | 27 ++
  t/t5616-partial-clone.sh   | 27 ++
  t/t6112-rev-list-filters-objects.sh| 13 +++
  7 files changed, 125 insertions(+)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 7b273635d..9e351ec2a 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -743,6 +743,9 @@ specification contained in <path>.
A debug option to help with future "partial clone" development.
This option specifies how missing objects are handled.
  +
+The form '--filter=tree:<depth>' omits all blobs and trees deeper than
+<depth> from the root tree. Currently, only <depth>=0 is supported.
++
  The form '--missing=error' requests that rev-list stop with an error if
  a missing object is encountered.  This is the default action.
  +
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..a28382940 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
return 0;
}
  
+	} else if (!strcmp(arg, "tree:0")) {

+   filter_options->choice = LOFC_TREE_NONE;
+   return 0;
+
} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
struct object_context oc;
struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
LOFC_DISABLED = 0,
LOFC_BLOB_NONE,
LOFC_BLOB_LIMIT,
+   LOFC_TREE_NONE,
LOFC_SPARSE_OID,
LOFC_SPARSE_PATH,
LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..8e3caf5bf 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
return d;
  }
  
+/*

+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+   struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+   enum list_objects_filter_situation filter_situation,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_trees_none_data *filter_data = filter_data_;
+
+   switch (filter_situation) {
+   default:
+   die("unknown filter_situation");
+   return LOFR_ZERO;
+
+   case LOFS_BEGIN_TREE:
+   case LOFS_BLOB:
+   if (filter_data->omits)
+   oidset_insert(filter_data->omits, &obj->oid);
+   return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+   case LOFS_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   return LOFR_ZERO;
+
+   }
+}


There are a couple of options here:
[] If really want to omit all trees and blobs (and we DO NOT want
   the oidset of everything omitted), then we might be able to
   shortcut the traversal and speed things up.

   {} add a LOFR_SKIP_TREE bit to list_objects_filter_result
   {} test this bit in process_tree() and avoid the init_tree_desc() and
  the while loop and some adjacent setup/tear-down code.
   {} make this filter something like:

case LOFS_BEGIN_TREE:
if (filter_data->omits) {
oidset_insert(filter_data->omits, &obj->oid);
return LOFR_MARK_SEEN; /* ... (hard omit) */
} else
 

Re: [PATCH v6 11/21] range-diff: add tests

2018-08-14 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hmm. I hoped that my commit message was enough to indicate that while he
> is the author, I assembled this. Maybe I should move him to the footer, as
> an Original-Authored-By:?
>
> Junio?

I think the log message gives a clear enough statement to credit the
original author.  Sign-off is not about credit, but is about making
sure we know the provenance of the contribution we would use in our
codebase, so it would be nice to have, but lifting code from another
project (i.e. TRast's tbdiff) that is appropriately licensed (GPLv2)
verbatim is something you can do with your own sign-off, without the
original author's sign-off, so I think what we have is good.


Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly

2018-08-14 Thread Jonathan Tan
> I don't understand about the ">= 0". What should I replace it with?
> Maybe you mean the return is never positive so I can change:
> 
> parse_tree_gently(tree, 1) >= 0
> 
> to:
> !parse_tree_gently(tree, 1)
> 
> ?

Sorry for the lack of clarity - that is what I meant.

> > The missing mechanism (for error, allow-any, print) should work without
> > needing to consult whether an object is a promisor object or not - it
> > should just print whatever is missing, so the "if
> > (!is_promisor_object..." line looks out of place.
> Done. I considered that a missing object which is not a promisor is a
> serious error, so I had it die here.

It is a serious error, but as far as I can tell, that is what the
--missing flags are supposed to help diagnose (so we can't die since we
need the diagnoses to be printed). See, for example, 'rev-list W/
--missing=print' in t6112 - the "r1" repository does not have partial
clone enabled (which I verified by inserting a test_pause then cat-ting
r1/.git/config), but nothing dies.

> But now that I've added the
> do_not_die_on_missing_tree flag, it's more natural to keep the
> previous promisor check as-is.

OK, I'll take a look once you send out v4.

> Also, is_promisor_object is an
> expensive check, and it would be better to skip it during the common
> execution path (which should be when exclude_promisor_objects, an
> internal-use-only flag, is *not* set, which means we never call
> is_promisor_object.

That's true.

> > In my original review [1], I suggested that we always show a tree if we
> > have its hash - if we don't have the object, we just recurse into it.
> > This would be the same as your patch, except that the 'die("bad tree
> > object...' is totally removed instead of merely moved. I still think
> > this solution has some merit - all the tests still pass (except that we
> > need to check for "unable to read" instead of "bad tree object" in error
> > messages), but I just realized that it might still be backwards
> > incompatible in that a basic "rev-list --objects" would now succeed
> > instead of fail if a tree was missing (I haven't tested this though).
> The presence of the die if !is_promisor_object is what justified the
> changing of the parse_tree_gently to always be gently, since it is
> what showed the OID. Can we really remove both? Maybe in a different
> patch set, since I'm no longer touching that line?

That's true - the idea of removing both needs more thought, and if we
were to do so, we definitely can do it in a different patch set.

> > We might need a flag called "do_not_die_on_missing_tree" (much like your
> > original idea of "show_missing_trees") so that callers that are prepared
> > to deal with missing trees can set this. Sorry for the churn. You can
> > document it as such:
> Added it, but not with a command-line flag, only in rev-info.h. We can
> always  add a flag later if people have been relying on the existing
> behavior of git rev-list to balk at missing trees. (That seems
> unlikely though, considering there is no filter to enable that before
> this patchset).

By flag, I indeed meant in rev-info.h - sorry for the confusion. That
sounds good.


Re: [PATCH] doc: git-describe

2018-08-14 Thread Junio C Hamano
William Pursell  writes:

> commit cc4bd5268b2dbe90279198acb400a528ee23f5d5
> Author: William Pursell 
> Date:   Tue Aug 14 06:41:00 2018 -0600
>
> doc: Reference <mark> for --dirty/--broken
>

Thanks for trying to help.  Overall the resulting text does look
much better than the original, but

 - "git show" output is for your own consumption, it is not
   something you would feed other people (see patches from other
   people on the list,
   https://public-inbox.org/git/20180814110525.17801-7-...@ao2.it/raw
   for example)

 - Please sign-off your patch (see Documentation/SubmittingPatches)

 - The patch is seriously whitespace damaged and does not apply.
   Please first try sending it to yourself and make sure it applies
   to the commit your change was based on, before sending it to the
   list.

Thanks.

> diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
> index e027fb8c4b..f7d67306c0 100644
> --- a/Documentation/git-describe.txt
> +++ b/Documentation/git-describe.txt
> @@ -39,11 +39,12 @@ OPTIONS
>  --broken[=<mark>]::
>   Describe the state of the working tree.  When the working
>   tree matches HEAD, the output is the same as "git describe
> - HEAD".  If the working tree has local modification "-dirty"
> - is appended to it.  If a repository is corrupt and Git
> - cannot determine if there is local modification, Git will
> - error out, unless `--broken' is given, which appends
> - the suffix "-broken" instead.
> + HEAD".  If the working tree has local modification, "-dirty"
> + (or <mark>, if supplied) is appended to it.  If a repository
> + is corrupt and Git cannot determine if there is local
> + modification, Git will error out unless `--broken' is given
> + in which case it will append the suffix "-broken" (or <mark>,
> + if supplied) instead.
>
>  --all::
>   Instead of using only the annotated tags, use any ref


Re: [PATCHv2 0/8] Resending sb/range-diff-colors

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 8:05 AM Johannes Schindelin
 wrote:
>
> Hi Stefan,
>
> On Mon, 13 Aug 2018, Stefan Beller wrote:
>
> > This applies on top of js/range-diff (a7be92acd9600), this version
> > * resolves a semantic conflict in the test
> >   (Did range-diff change its output slightly?)
>
> If by semantic conflict you mean...
>
> > 23:  6a1c7698c68 ! 23:  5701745379b t3206: add color test for range-diff 
> > --dual-color
> > @@ -23,7 +23,7 @@
> >  +: s/4/A/
> >  +: 
> >  +:+Also a silly comment 
> > here!
> > -+:+
> > ++:+
>
> ... this, then I do not understand. This looks like a straight-up change
> in your commit. v1 of this patch did not append  to the end,
> while v2 apparently does.

Yes, I did mean this. Both before and now, the test passes on top of
js/range-diff (which was resent), so I concluded that the logic in
range-diff itself has changed. However, this is changed in a later patch
to not appear any more.

>
> In any case, from what I see you indeed addressed all my comments (the
> need_reset was done differently than I suggested, but still okay).

Thanks.

So unless I hear from other people I consider this series done.
(just like range-diff is not going to be resent but built on top now)

Thanks,
Stefan


Re: Contributor Summit planning

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 7:47 AM Jeff King  wrote:

>
> One problem there is that the prefixes are ambiguous (e.g., Jacob Keller
> shares with me, and I think at least one other over the years). You
> could look at the author of the tip commit, but that's not always right
> (and in fact, counting just merged topics misses bug-fixes that get
> applied directly on top of other people's topics). And of course there's
> the notion that "topic" might be a documentation typo fix, or it might
> be the entire range-diff program.

One could take all topics and see if you have at least one commit in there.
But that would mostly measure how much of an all-rounder you are in the
code base (e.g. bug or style fixes such as Ramsay's "please squash this"
would be in many topics if not squashed).

There are other players who are very deep into one area of the code,
and probably have fewer series.

> I think "surviving lines" is another interesting metric, though it also
> has flaws (if I s/sha1/oid/ on your line, it becomes my line; even
> though my change is useful and should be counted, it's probably not as
> important as whatever the code was doing in the first place).

I wonder if we could measure the entropy added to the code base instead.
A patch that does s/sha1/oid/ (or introduction of a repository argument)
might compress very well, whereas new code might not compress very
well. ;-)


Re: [PATCH v3 3/7] t7411: be nicer to future tests and really clean things up

2018-08-14 Thread Brandon Williams
On 08/14, Antonio Ospite wrote:
> Tests 5 and 8 in t/t7411-submodule-config.sh add two commits with
> invalid lines in .gitmodules but then only the second commit is removed.
> 
> This may affect future subsequent tests if they assume that the
> .gitmodules file has no errors.
> 
> Remove both the commits as soon as they are not needed anymore.
> 
> The error introduced in test 5 is also required by test 6, so the two
> commits from above are removed respectively in tests 6 and 8.

Thanks for cleaning this up.  We seem to have a habit of leaving
testing state around for longer than is necessary, which makes it a bit
more difficult to read and understand when looking at it later.  What
would really be nice is if each test were self-contained... of course
that would take a herculean effort to realize in our test suite, so I'm
not suggesting you do that :)

> 
> Signed-off-by: Antonio Ospite 
> ---
>  t/t7411-submodule-config.sh | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
> index 0bde5850ac..c6b6cf6fae 100755
> --- a/t/t7411-submodule-config.sh
> +++ b/t/t7411-submodule-config.sh
> @@ -98,6 +98,9 @@ test_expect_success 'error in one submodule config lets 
> continue' '
>  '
>  
>  test_expect_success 'error message contains blob reference' '
> + # Remove the error introduced in the previous test.
> + # It is not needed in the following tests.
> + test_when_finished "git -C super reset --hard HEAD^" &&
>   (cd super &&
>   sha1=$(git rev-parse HEAD) &&
>   test-tool submodule-config \
> @@ -123,6 +126,7 @@ test_expect_success 'using different treeishs works' '
>  '
>  
>  test_expect_success 'error in history in fetchrecursesubmodule lets 
> continue' '
> + test_when_finished "git -C super reset --hard HEAD^" &&
>   (cd super &&
>   git config -f .gitmodules \
>   submodule.submodule.fetchrecursesubmodules blabla &&
> @@ -134,8 +138,7 @@ test_expect_success 'error in history in 
> fetchrecursesubmodule lets continue' '
>   HEAD b \
>   HEAD submodule \
>   >actual &&
> - test_cmp expect_error actual  &&
> - git reset --hard HEAD^
> + test_cmp expect_error actual
>   )
>  '
>  
> -- 
> 2.18.0
> 

-- 
Brandon Williams


Re: [PATCH v3 4/7] submodule--helper: add a new 'config' subcommand

2018-08-14 Thread Brandon Williams
On 08/14, Antonio Ospite wrote:
> Add a new 'config' subcommand to 'submodule--helper', this extra level
> of indirection makes it possible to add some flexibility to how the
> submodules configuration is handled.
> 
> Signed-off-by: Antonio Ospite 
> ---
>  builtin/submodule--helper.c | 14 ++

>  new |  0

Looks like you may have accidentally left in an empty file "new"; it should
probably be removed from this commit before it gets merged.

Aside from that, this patch looks good.  I've recently run into issues
where we don't have a good enough abstraction layer around how we
interact with submodules, so I'm glad we're moving towards better
abstractions :)

>  t/t7411-submodule-config.sh | 26 ++
>  3 files changed, 40 insertions(+)
>  create mode 100644 new
> 
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index a3c4564c6c..7481d03b63 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -2029,6 +2029,19 @@ static int connect_gitdir_workingtree(int argc, const 
> char **argv, const char *p
>   return 0;
>  }
>  
> +static int module_config(int argc, const char **argv, const char *prefix)
> +{
> + /* Equivalent to ACTION_GET in builtin/config.c */
> + if (argc == 2)
> + return print_config_from_gitmodules(argv[1]);
> +
> + /* Equivalent to ACTION_SET in builtin/config.c */
> + if (argc == 3)
> + return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
> +
> + die("submodule--helper config takes 1 or 2 arguments: name [value]");
> +}
> +
>  #define SUPPORT_SUPER_PREFIX (1<<0)
>  
>  struct cmd_struct {
> @@ -2057,6 +2070,7 @@ static struct cmd_struct commands[] = {
>   {"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
>   {"is-active", is_active, 0},
>   {"check-name", check_name, 0},
> + {"config", module_config, 0},
>  };
>  
>  int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
> diff --git a/new b/new
> new file mode 100644
> index 00..e69de29bb2
> diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
> index c6b6cf6fae..4afb6f152e 100755
> --- a/t/t7411-submodule-config.sh
> +++ b/t/t7411-submodule-config.sh
> @@ -142,4 +142,30 @@ test_expect_success 'error in history in 
> fetchrecursesubmodule lets continue' '
>   )
>  '
>  
> +test_expect_success 'reading submodules config with "submodule--helper 
> config"' '
> + (cd super &&
> + echo "../submodule" >expected &&
> + git submodule--helper config submodule.submodule.url >actual &&
> + test_cmp expected actual
> + )
> +'
> +
> +test_expect_success 'writing submodules config with "submodule--helper 
> config"' '
> + (cd super &&
> + echo "new_url" >expected &&
> + git submodule--helper config submodule.submodule.url "new_url" 
> &&
> + git submodule--helper config submodule.submodule.url >actual &&
> + test_cmp expected actual
> + )
> +'
> +
> +test_expect_success 'overwriting unstaged submodules config with 
> "submodule--helper config"' '
> + (cd super &&
> + echo "newer_url" >expected &&
> + git submodule--helper config submodule.submodule.url 
> "newer_url" &&
> + git submodule--helper config submodule.submodule.url >actual &&
> + test_cmp expected actual
> + )
> +'
> +
>  test_done
> -- 
> 2.18.0
> 

-- 
Brandon Williams


Re: [PATCH v3 5/7] submodule: use the 'submodule--helper config' command

2018-08-14 Thread Brandon Williams
On 08/14, Antonio Ospite wrote:
> Use the 'submodule--helper config' command in git-submodules.sh to avoid
> referring explicitly to .gitmodules by the hardcoded file path.
> 
> This makes it possible to access the submodules configuration in a more
> controlled way.
> 
> Signed-off-by: Antonio Ospite 

Looks great.  I also like your approach of introducing the new API and
testing it in one commit, and then using it in the next.  It makes the
patch set very easy to follow.

> ---
>  git-submodule.sh | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 8b5ad59bde..ff258e2e8c 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -72,7 +72,7 @@ get_submodule_config () {
>   value=$(git config submodule."$name"."$option")
>   if test -z "$value"
>   then
> - value=$(git config -f .gitmodules submodule."$name"."$option")
> + value=$(git submodule--helper config 
> submodule."$name"."$option")
>   fi
>   printf '%s' "${value:-$default}"
>  }
> @@ -283,11 +283,11 @@ or you are unsure what this means choose another name 
> with the '--name' option."
>   git add --no-warn-embedded-repo $force "$sm_path" ||
>   die "$(eval_gettext "Failed to add submodule '\$sm_path'")"
>  
> - git config -f .gitmodules submodule."$sm_name".path "$sm_path" &&
> - git config -f .gitmodules submodule."$sm_name".url "$repo" &&
> + git submodule--helper config submodule."$sm_name".path "$sm_path" &&
> + git submodule--helper config submodule."$sm_name".url "$repo" &&
>   if test -n "$branch"
>   then
> - git config -f .gitmodules submodule."$sm_name".branch "$branch"
> + git submodule--helper config submodule."$sm_name".branch 
> "$branch"
>   fi &&
>   git add --force .gitmodules ||
>   die "$(eval_gettext "Failed to register submodule '\$sm_path'")"
> -- 
> 2.18.0
> 

-- 
Brandon Williams


Re: [PATCH v3 7/7] submodule: support reading .gitmodules even when it's not checked out

2018-08-14 Thread Brandon Williams
On 08/14, Antonio Ospite wrote:
> When the .gitmodules file is not available in the working tree, try
> using HEAD:.gitmodules from the current branch. This covers the case
> when the file is part of the repository but for some reason it is not
> checked out, for example because of a sparse checkout.
> 
> This makes it possible to use at least the 'git submodule' commands
> which *read* the gitmodules configuration file without fully populating
> the working tree.
> 
> Writing to .gitmodules will still require that the file is checked out,
> so check for that before calling config_set_in_gitmodules_file_gently.
> 
> Add a similar check also in git-submodule.sh::cmd_add() to anticipate
> the eventual failure of the "git submodule add" command when .gitmodules
> is not safely writeable; this prevents the command from leaving the
> repository in a spurious state (e.g. the submodule repository was cloned
> but .gitmodules was not updated because
> config_set_in_gitmodules_file_gently failed).
> 
> Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
> from .gitmodules succeeds and that writing to it fails when the file is
> not checked out.
> 
> Signed-off-by: Antonio Ospite 
> ---
> 
> Maybe the check in config_set_in_gitmodules_file_gently and
> git-submodule.sh::cmd_add() can share some code:
> 
>   - add an is_gitmodules_safely_writeable() helper
>   - expose a "submodule--helper config --is-safely-writeable" subcommand
> 
> But for now I preferred to keep the changes with v2 to a minimum to avoid
> blocking the series.
> 
> If adding a new helper is preferred I can do a v4 or send a follow-up patch.

I see how it would be nice to have the addition of a helper like this.
I think maybe at some point we'd want it, but it's definitely not needed
now and can easily be added at a later point (maybe we can avoid needing
it until we can convert all of the git-submodule.sh code to C!).

Great work, thanks for working on this.

> 
> Thank you,
>Antonio
> 
> 
>  builtin/submodule--helper.c| 17 -
>  cache.h|  1 +
>  git-submodule.sh   |  7 ++
>  submodule-config.c | 16 -
>  t/t7416-submodule-sparse-gitmodules.sh | 90 ++
>  5 files changed, 128 insertions(+), 3 deletions(-)
>  create mode 100755 t/t7416-submodule-sparse-gitmodules.sh
> 
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 7481d03b63..c0370a756b 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -2036,8 +2036,23 @@ static int module_config(int argc, const char **argv, 
> const char *prefix)
>   return print_config_from_gitmodules(argv[1]);
>  
>   /* Equivalent to ACTION_SET in builtin/config.c */
> - if (argc == 3)
> + if (argc == 3) {
> + struct object_id oid;
> +
> + /*
> +  * If the .gitmodules file is not in the working tree but it
> +  * is in the current branch, stop, as writing new values (and
> +  * staging them) would blindly overwrite ALL the old content.
> +  *
> +  * This check still makes it possible to create a brand new
> +  * .gitmodules when it is safe to do so: when neither
> +  * GITMODULES_FILE nor GITMODULES_HEAD exist.
> +  */
> + if (!file_exists(GITMODULES_FILE) && get_oid(GITMODULES_HEAD, 
> &oid) >= 0)
> + die(_("please make sure that the .gitmodules file in 
> the current branch is checked out"));
> +
>   return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
> + }
>  
>   die("submodule--helper config takes 1 or 2 arguments: name [value]");
>  }
> diff --git a/cache.h b/cache.h
> index 8dc7134f00..900f9e09e5 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -486,6 +486,7 @@ static inline enum object_type object_type(unsigned int 
> mode)
>  #define INFOATTRIBUTES_FILE "info/attributes"
>  #define ATTRIBUTE_MACRO_PREFIX "[attr]"
>  #define GITMODULES_FILE ".gitmodules"
> +#define GITMODULES_HEAD "HEAD:.gitmodules"
>  #define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
>  #define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
>  #define GIT_NOTES_DISPLAY_REF_ENVIRONMENT "GIT_NOTES_DISPLAY_REF"
> diff --git a/git-submodule.sh b/git-submodule.sh
> index ff258e2e8c..b1cb187227 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -159,6 +159,13 @@ cmd_add()
>   shift
>   done
>  
> + # For more details about this check, see
> + # builtin/submodule--helper.c::module_config()
> + if test ! -e .gitmodules && git cat-file -e HEAD:.gitmodules > 
> /dev/null 2>&1
> + then
> +  die "$(eval_gettext "please make sure that the .gitmodules 
> file in the current branch is checked out")"
> + fi
> +
>   if test -n "$reference_path"
>   then
>   is_absolute_path "$reference_path" ||
>

[PATCH v4 1/6] list-objects: store common func args in struct

2018-08-14 Thread Matthew DeVore
This will make utility functions easier to create, as done by the next
patch.

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 158 +++--
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+   struct rev_info *revs;
+   show_object_fn show_object;
+   show_commit_fn show_commit;
+   void *show_data;
+   filter_object_fn filter_fn;
+   void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 struct blob *blob,
-show_object_fn show,
 struct strbuf *path,
-const char *name,
-void *cb_data,
-filter_object_fn filter_fn,
-void *filter_data)
+const char *name)
 {
struct object *obj = &blob->object;
size_t pathlen;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-   if (!revs->blob_objects)
+   if (!ctx->revs->blob_objects)
return;
if (!obj)
die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 * may cause the actual filter to report an incomplete list
 * of missing objects.
 */
-   if (revs->exclude_promisor_objects &&
+   if (ctx->revs->exclude_promisor_objects &&
!has_object_file(&obj->oid) &&
is_promisor_object(&obj->oid))
return;
 
pathlen = path->len;
strbuf_addstr(path, name);
-   if (!(obj->flags & USER_GIVEN) && filter_fn)
-   r = filter_fn(LOFS_BLOB, obj,
- path->buf, &path->buf[pathlen],
- filter_data);
+   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   r = ctx->filter_fn(LOFS_BLOB, obj,
+  path->buf, &path->buf[pathlen],
+  ctx->filter_data);
if (r & LOFR_MARK_SEEN)
obj->flags |= SEEN;
if (r & LOFR_DO_SHOW)
-   show(obj, path->buf, cb_data);
+   ctx->show_object(obj, path->buf, ctx->show_data);
strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
const unsigned char *sha1,
-   show_object_fn show,
struct strbuf *path,
-   const char *name,
-   void *cb_data)
+   const char *name)
 {
/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 struct tree *tree,
-show_object_fn show,
 struct strbuf *base,
-const char *name,
-void *cb_data,
-filter_object_fn filter_fn,
-void *filter_data)
+const char *name)
 {
struct object *obj = &tree->object;
+   struct rev_info *revs = ctx->revs;
struct tree_desc desc;
struct name_entry entry;
enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
}
 
strbuf_addstr(base, name);
-   if (!(obj->flags & USER_GIVEN) && filter_fn)
-   r = filter_fn(LOFS_BEGIN_TREE, obj,
- base->buf, &base->buf[baselen],
- filter_data);
+   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+  base->buf, &base->buf[baselen],
+  ctx->filter_data);
if (r & LOFR_MARK_SEEN)
obj->flags |= SEEN;
if (r & LOFR_DO_SHOW)
-   show(obj, base->buf, cb_data);
+   ctx->show_object(obj, base->buf, ctx->show_data);
if (base->len)
strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
}
 
if (S_ISDIR(entry.mode))
-   process_tree(revs,
+   process_tree(ctx,
 lookup_tree(the_repository

[PATCH v4 2/6] list-objects: refactor to process_tree_contents

2018-08-14 Thread Matthew DeVore
This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally. i.e. rather than:

if (foo) {
while (...) {
/* this is very indented */
}
}

we will have:

if (foo)
process_tree_contents(...);

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 68 ++
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+struct tree *tree,
+struct strbuf *base,
+const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+ struct tree *tree,
+ struct strbuf *base)
+{
+   struct tree_desc desc;
+   struct name_entry entry;
+   enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+   all_entries_interesting : entry_not_interesting;
+
+   init_tree_desc(&desc, tree->buffer, tree->size);
+
+   while (tree_entry(&desc, &entry)) {
+   if (match != all_entries_interesting) {
+   match = tree_entry_interesting(&entry, base, 0,
+  
&ctx->revs->diffopt.pathspec);
+   if (match == all_entries_not_interesting)
+   break;
+   if (match == entry_not_interesting)
+   continue;
+   }
+
+   if (S_ISDIR(entry.mode))
+   process_tree(ctx,
+lookup_tree(the_repository, entry.oid),
+base, entry.path);
+   else if (S_ISGITLINK(entry.mode))
+   process_gitlink(ctx, entry.oid->hash,
+   base, entry.path);
+   else
+   process_blob(ctx,
+lookup_blob(the_repository, entry.oid),
+base, entry.path);
+   }
+}
+
 static void process_tree(struct traversal_context *ctx,
 struct tree *tree,
 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
struct object *obj = &tree->object;
struct rev_info *revs = ctx->revs;
-   struct tree_desc desc;
-   struct name_entry entry;
-   enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-   all_entries_interesting: entry_not_interesting;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
if (base->len)
strbuf_addch(base, '/');
 
-   init_tree_desc(&desc, tree->buffer, tree->size);
-
-   while (tree_entry(&desc, &entry)) {
-   if (match != all_entries_interesting) {
-   match = tree_entry_interesting(&entry, base, 0,
-  &revs->diffopt.pathspec);
-   if (match == all_entries_not_interesting)
-   break;
-   if (match == entry_not_interesting)
-   continue;
-   }
-
-   if (S_ISDIR(entry.mode))
-   process_tree(ctx,
-lookup_tree(the_repository, entry.oid),
-base, entry.path);
-   else if (S_ISGITLINK(entry.mode))
-   process_gitlink(ctx, entry.oid->hash, base, entry.path);
-   else
-   process_blob(ctx,
-lookup_blob(the_repository, entry.oid),
-base, entry.path);
-   }
+   process_tree_contents(ctx, tree, base);
 
if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.18.0.865.gffc8e1a3cd6-goog



[PATCH v4 5/6] revision: mark non-user-given objects instead

2018-08-14 Thread Matthew DeVore
Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 31 ++-
 revision.c |  1 -
 revision.h | 10 +++---
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index e88474a2d..3aee1b8a4 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
pathlen = path->len;
strbuf_addstr(path, name);
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
r = ctx->filter_fn(LOFS_BLOB, obj,
   path->buf, &path->buf[pathlen],
   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
continue;
}
 
-   if (S_ISDIR(entry.mode))
-   process_tree(ctx,
-lookup_tree(the_repository, entry.oid),
-base, entry.path);
+   if (S_ISDIR(entry.mode)) {
+   struct tree *t = lookup_tree(the_repository, entry.oid);
+   t->object.flags |= NOT_USER_GIVEN;
+   process_tree(ctx, t, base, entry.path);
+   }
else if (S_ISGITLINK(entry.mode))
process_gitlink(ctx, entry.oid->hash,
base, entry.path);
-   else
-   process_blob(ctx,
-lookup_blob(the_repository, entry.oid),
-base, entry.path);
+   else {
+   struct blob *b = lookup_blob(the_repository, entry.oid);
+   b->object.flags |= NOT_USER_GIVEN;
+   process_blob(ctx, b, base, entry.path);
+   }
}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
}
 
strbuf_addstr(base, name);
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
   base->buf, &base->buf[baselen],
   ctx->filter_data);
@@ -191,7 +193,7 @@ static void process_tree(struct traversal_context *ctx,
if (parsed)
process_tree_contents(ctx, tree, base);
 
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
   base->buf, &base->buf[baselen],
   ctx->filter_data);
@@ -307,8 +309,11 @@ static void do_traverse(struct traversal_context *ctx)
 * an uninteresting boundary commit may not have its tree
 * parsed yet, but we are not going to show them anyway
 */
-   if (get_commit_tree(commit))
-   add_pending_tree(ctx->revs, get_commit_tree(commit));
+   if (get_commit_tree(commit)) {
+   struct tree *tree = get_commit_tree(commit);
+   tree->object.flags |= NOT_USER_GIVEN;
+   add_pending_tree(ctx->revs, tree);
+   }
ctx->show_commit(commit, ctx->show_data);
 
if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index 062749437..6d355b43c 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,
strbuf_release(&buf);
return; /* do not add the commit itself */
}
-   obj->flags |= USER_GIVEN;
add_object_array_with_path(obj, name, &revs->pending, mode, path);
 }
 
diff --git a/revision.h b/revision.h
index c94243543..55d47004d 100644
--- a/revision.h
+++ b/revision.h
@@ -8,7 +8,11 @@
 #include "diff.h"
 #include "commit-slab-decl.h"
 
-/* Remember to update object flag allocation in object.h */
+/* Remember to update object flag allocation in object.h
+ * NEEDSWORK: NOT_USER_GIVEN doesn't apply to com

[PATCH v4 4/6] rev-list: handle missing tree objects properly

2018-08-14 Thread Matthew DeVore
Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. A missing tree
will cause an error if --missing indicates an error should be caused,
and the hash is printed even if the tree is missing.

In list-objects.c we no longer print a message to stderr if a tree
object is missing (quiet_on_missing is always true). I couldn't find
any place where this would matter, or where the caller of
traverse_commit_list would need to be fixed to show the error. However,
in the future it would be trivial to make the caller show the message if
we needed to.

This is not tested very thoroughly, since we cannot create promisor
objects in tests without using an actual partial clone. t0410 has a
promise_and_delete utility function, but the is_promisor_object function
does not return 1 for objects deleted in this way. More tests will
come in a patch that implements a filter that can be used with git
clone.

Signed-off-by: Matthew DeVore 
---
 builtin/rev-list.c | 11 +++--
 list-objects.c | 17 +--
 revision.h |  1 +
 t/t0410-partial-clone.sh   | 66 ++
 t/t5317-pack-objects-filter-objects.sh | 13 +
 5 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 */
switch (arg_missing_action) {
case MA_ERROR:
-   die("missing blob object '%s'", oid_to_hex(&obj->oid));
+   die("missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(&obj->oid));
return;
 
case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
case MA_ALLOW_PROMISOR:
if (is_promisor_object(&obj->oid))
return;
-   die("unexpected missing blob object '%s'",
-   oid_to_hex(&obj->oid));
+   die("unexpected missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(&obj->oid));
return;
 
default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+   if (!has_object_file(&obj->oid)) {
finish_object__ma(obj);
return 1;
}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
init_revisions(&revs, prefix);
revs.abbrev = DEFAULT_ABBREV;
revs.commit_format = CMIT_FMT_UNSPECIFIED;
+   revs.do_not_die_on_missing_tree = 1;
 
/*
 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..e88474a2d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
struct rev_info *revs = ctx->revs;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+   int parsed;
 
if (!revs->tree_objects)
return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, 1) < 0) {
+
+   parsed = parse_tree_gently(tree, 1) >= 0;
+   if (!parsed) {
if (revs->ignore_missing_links)
return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
is_promisor_object(&obj->oid))
return;
 
-   die("bad tree object %s", oid_to_hex(&obj->oid));
+   if (!revs->do_not_die_on_missing_tree)
+   die("bad tree object %s", oid_to_hex(&obj->oid));
}
 
strbuf_addstr(base, name);
@@ -178,7 +182,14 @@ static void process_tree(struct traversal_context *ctx,
if (base->len)
strbuf_addch(base, '/');
 
-   process_tree_contents(ctx, tree, base);
+   /*
+* NEEDSWORK: we should not have to process this tree's contents if the
+* filter wants to exclude all its contents AND the filter doesn't need
+* to collect the omitted OIDs. We should add a LOFR_SKIP_TREE bit which
+* allows skipping all children.
+*/
+ 

Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Derrick Stolee

On 8/13/2018 5:54 PM, Jeff King wrote:

So I try not to think too hard on metrics, and just use them to get a
rough view on who is active.


I've been very interested in measuring community involvement, with the 
knowledge that any metric is flawed and we should not ever say "this 
metric is how we measure the quality of a contributor". It can be 
helpful, though, to track some metrics and their change over time.


Here are a few measurements we can make:

1. Number of (non-merge) commit author tag-lines.

    Using git repo:

  > git shortlog --no-merges --since 2017 -sne junio/next | head -n 20
   284  Nguyễn Thái Ngọc Duy 
   257  Jeff King 
   206  Stefan Beller 
   192  brian m. carlson 
   159  Brandon Williams 
   149  Junio C Hamano 
   137  Elijah Newren 
   116  René Scharfe 
   112  Johannes Schindelin 
   105  Ævar Arnfjörð Bjarmason 
    96  Jonathan Tan 
    93  SZEDER Gábor 
    78  Derrick Stolee 
    76  Martin Ågren 
    66  Michael Haggerty 
    61  Eric Sunshine 
    46  Christian Couder 
    36  Phillip Wood 
    35  Jonathan Nieder 
    33  Thomas Gummerer 

2. Number of other commit tag-lines (Reviewed-By, Helped-By, 
Reported-By, etc.).


    Using git repo:

    $ git log --since=2018-01-01 junio/next|grep by:|grep -v Signed-off-by:|sort|uniq -c|sort -nr|head -n 20


 66 Reviewed-by: Stefan Beller 
 22 Reviewed-by: Jeff King 
 19 Reviewed-by: Jonathan Tan 
 12 Helped-by: Eric Sunshine 
 11 Helped-by: Junio C Hamano 
  9 Helped-by: Jeff King 
  8 Reviewed-by: Elijah Newren 
  7 Reported-by: Ramsay Jones 
  7 Acked-by: Johannes Schindelin 
  7 Acked-by: Brandon Williams 
  6 Reviewed-by: Eric Sunshine 
  6 Helped-by: Johannes Schindelin 
  5 Mentored-by: Christian Couder 
  5 Acked-by: Johannes Schindelin 
  4 Reviewed-by: Jonathan Nieder 
  4 Reviewed-by: Johannes Schindelin 
  4 Helped-by: Stefan Beller 
  4 Helped-by: René Scharfe 
  3 Reviewed-by: Martin Ågren 
  3 Reviewed-by: Lars Schneider 

    (There does not appear to be enough density here to make a useful 
metric.)


3. Number of email messages sent.

    Using mailing list repo:

$ git shortlog --since 2017 -sne | head -n 20
  3749  Junio C Hamano 
  2213  Stefan Beller 
  2112  Jeff King 
  1106  Nguyễn Thái Ngọc Duy 
  1028  Johannes Schindelin 
   965  Ævar Arnfjörð Bjarmason 
   956  Brandon Williams 
   947  Eric Sunshine 
   890  Elijah Newren 
   753  brian m. carlson 
   677  Duy Nguyen 
   646  Jonathan Nieder 
   629  Derrick Stolee 
   545  Christian Couder 
   515  Jonathan Tan 
   425  Johannes Schindelin 
   425  Martin Ågren 
   420  Jeff Hostetler 
   420  SZEDER Gábor 
   363  Phillip Wood 

4. Number of threads started by user.

    (For this and the measurements below, I imported emails into a SQL 
table with columns [commit, author, date, message-id, in-reply-to, 
subject] and ran queries)


SELECT TOP 20
       COUNT(*) as NumSent
  ,[Author]
  FROM [git].[dbo].[mailing-list]
  WHERE [In-Reply-To] = ''
        AND CONVERT(DATETIME,[Date]) > CONVERT(DATETIME, '01-01-2018 00:00')

GROUP BY [Author]
ORDER BY NumSent DESC

| NumSent | Author |
|-||
| 76  | Junio C Hamano |
| 64  | Stefan Beller  |
| 54  | Philip Oakley  |
| 50  | Nguyễn Thái Ngọc Duy   |
| 49  | Robert P. J. Day   |
| 47  | Christian Couder   |
| 36  | Ramsay Jones   |
| 34  | Elijah Newren  |
| 34  | SZEDER Gábor  |
| 33  | Johannes Schindelin    |
| 31  | Jeff King  |
| 30  | Ævar Arnfjörð Bjarmason |
| 24  | Jonathan Tan   |
| 22  | Alban Gruin    |
| 22  | brian m. carlson   |
| 18  | Randall S. Becker  |
| 15  | Paul-Sebastian Ungureanu   |
| 15  | Jeff Hostetler     |
| 15  | Brandon Williams   |
| 15  | Luke Diamand   |

5. Number of threads where the user participated

(This is measured by completing the transitive closure of In-Reply-To 
edges into a new 'BaseMessage' column.)
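
The same closure can be precomputed outside SQL; a rough Python sketch of the idea, with hypothetical message data (real rows would come from the message-id and in-reply-to columns described above):

```python
def base_messages(parents):
    """Map each message-id to the root message of its thread.

    parents: dict of message-id -> in-reply-to id ('' for thread roots).
    Assumes the reply graph is acyclic, which In-Reply-To chains are.
    """
    base = {}

    def resolve(mid):
        if mid in base:
            return base[mid]
        parent = parents.get(mid, '')
        # A thread root, or a reply whose parent never reached the
        # list, is its own base message.
        if not parent or parent not in parents:
            base[mid] = mid
        else:
            base[mid] = resolve(parent)
        return base[mid]

    for mid in parents:
        resolve(mid)
    return base

threads = {'m1': '', 'm2': 'm1', 'm3': 'm2', 'm4': ''}
print(base_messages(threads))
# {'m1': 'm1', 'm2': 'm1', 'm3': 'm1', 'm4': 'm4'}
```

Grouping by the resulting base id and counting distinct authors per group gives the same per-thread participation numbers as the SQL query.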


SELECT TOP 20
       COUNT(BaseMessage) as NumResponded
  ,Author
  FROM [git].[dbo].[mailing-list]
  WHERE [In-Reply-To] <> ''
        AND CONVERT(DATETIME,[Date]) > CONVERT(DATETIME, '01-01-2018 00:00')

GROUP BY Author
ORDER BY NumResponded DESC

| NumResponded | Author |
|--||
| 2084 | Junio C Hamano |
| 1596 | Stefan Beller  |
| 1211 | Jeff King  |
| 1120 | Johannes Schindelin    |
| 1021 | Nguyễn Thái Ngọc Duy   |
| 799  | Eric Sunshine  |
| 797  | Ævar Arnfjörð Bjarmason |
| 693 

Bug with Git submodules and submodule.recurse setting

2018-08-14 Thread Jochen Kühner
If I set

git config --global submodule.recurse true

and run git via:

git pull --progress -v --no-rebase "origin"

the command fails with the following output (errorlevel is 1):

Fetching submodule submodules/tstemplates
From http://10.0.102.194:7990/scm/mcc/tstemplates
= [up to date]  feature/robin -> origin/feature/robin
= [up to date]  master        -> origin/master
Already up to date.
usage: git submodule [--quiet] add [-b ] [-f|--force] [--name ] [--reference ] [--]  []
   or: git submodule [--quiet] status [--cached] [--recursive] [--] [...]
   or: git submodule [--quiet] init [--] [...]
   or: git submodule [--quiet] deinit [-f|--force] (--all| [--] ...)
   or: git submodule [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference ] [--recursive] [--] [...]
   or: git submodule [--quiet] summary [--cached|--files] [--summary-limit ] [commit] [--] [...]
   or: git submodule [--quiet] foreach [--recursive] 
   or: git submodule [--quiet] sync [--recursive] [--] [...]
   or: git submodule [--quiet] absorbgitdirs [--] [...]

It seems that the “verbose” parameter “-v” is also passed on to “git submodule”, which does not support it.

If I remove “-v” it works.

The problem is that I use TortoiseGit, which automatically creates this command!





Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-14 Thread Brandon Williams
On 08/09, Jeff King wrote:
> On Wed, Aug 08, 2018 at 03:33:23PM -0700, Brandon Williams wrote:
> 
> > Commit 0383bbb901 (submodule-config: verify submodule names as paths,
> > 2018-04-30) introduced some checks to ensure that submodule names don't
> > include directory traversal components (e.g. "../").
> > 
> > This addresses the vulnerability identified in 0383bbb901 but the root
> > cause is that we use submodule names to construct paths to the
> > submodule's git directory.  What we really should do is munge the
> > submodule name before using it to construct a path.
> > 
> > Teach "submodule_name_to_gitdir()" to munge a submodule's name (by url
> > encoding it) before using it to build a path to the submodule's gitdir.
> 
> I like this approach very much, and I think using url encoding is much
> better than an opaque hash (purely because it makes debugging and
> inspection saner).
> 
> Two thoughts, though:
> 
> > +   modules_len = buf->len;
> > strbuf_addstr(buf, submodule_name);
> > +
> > +   /*
> > +* If the submodule gitdir already exists using the old-fashioned
> > +* location (which uses the submodule name as-is, without munging it)
> > +* then return that.
> > +*/
> > +   if (!access(buf->buf, F_OK))
> > +   return;
> 
> I think this backwards-compatibility is necessary to avoid pain. But
> until it goes away, I don't think this is helping the vulnerability from
> 0383bbb901. Because there the issue was that the submodule name pointed
> back into the working tree, so this access() would find the untrusted
> working tree code and say "ah, an old-fashioned name!".
> 
> In theory a fresh clone could set a config option for "I only speak
> use new-style modules". And there could even be a conversion program
> that moves the modules as appropriate, fixes up the .git files in the
> working tree, and then sets that config.
> 
> In fact, I think that config option _could_ be done by bumping
> core.repositoryformatversion and then setting extensions.submodulenames
> to "url" or something. Then you could never run into the confusing case
> where you have a clone done by a new version of git (using new-style
> names), but using an old-style version gets confused because it can't
> find the module directories (instead, it would barf and say "I don't
> know about that extension").
> 
> I don't know if any of that is worth it, though. We already fixed the
> problem from 0383bbb901. There may be a _different_ "break out of the
> modules directory" vulnerability, but since we disallow ".." it's hard
> to see what it would be (the best I could come up with is maybe pointing
> one module into the interior of another module, but I think you'd have
> to trouble overwriting anything useful).
> 
> And while an old-style version of Git being confused might be annoying,
> I suspect that bumping the repository version would be even _more_
> annoying, because it would hit every command, not just ones that try to
> touch those submodules.

Oh I know that this doesn't help with that vulnerability.  As you've
said we fix it and now disallow ".." at the submodule-config level so
really this path is simply about using what we get out of
submodule-config in a more sane manner.

> 
> > diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
> > index 2c2c97e144..963693332c 100755
> > --- a/t/t7400-submodule-basic.sh
> > +++ b/t/t7400-submodule-basic.sh
> > @@ -933,7 +933,7 @@ test_expect_success 'recursive relative submodules stay 
> > relative' '
> > cd clone2 &&
> > git submodule update --init --recursive &&
> > echo "gitdir: ../.git/modules/sub3" >./sub3/.git_expect &&
> > -   echo "gitdir: ../../../.git/modules/sub3/modules/dirdir/subsub" 
> > >./sub3/dirdir/subsub/.git_expect
> > +   echo "gitdir: 
> > ../../../.git/modules/sub3/modules/dirdir%2fsubsub" 
> > >./sub3/dirdir/subsub/.git_expect
> 
> One interesting thing about url-encoding is that it's not one-to-one.
> This case could also be %2F, which is a different file (on a
> case-sensitive filesystem). I think "%20" and "+" are similarly
> interchangeable.
> 
> If we were decoding the filenames, that's fine. The round-trip is
> lossless.
> 
> But that's not quite how the new code behaves. We encode the input and
> then check to see if it matches an encoding we previously performed. So
> if our urlencode routines ever change, this will subtly break.
> 
> I don't know how much it's worth caring about. We're not that likely to
> change the routines ourself (though certainly a third-party
> implementation would need to know our exact url-encoding decisions).

This is exactly the reason why I wanted to get some opinions on what the
best thing to do here would be.  I _think_ the best thing would probably
be to write a specific routine to do the conversion, and it wouldn't
even have to be all that complex.  Basically I'm just interested in
converting '/' characters so that things no l
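
Jeff's point that url-encoding is not one-to-one is easy to demonstrate. This sketch uses Python's urllib purely for illustration (Git's encoder is C code and may make different choices):

```python
from urllib.parse import quote, unquote

# Decoding is many-to-one and lossless: distinct encoded forms
# collapse to the same path component.
assert unquote("dirdir%2fsubsub") == "dirdir/subsub"
assert unquote("dirdir%2Fsubsub") == "dirdir/subsub"

# Encoding, by contrast, requires picking one canonical form and
# sticking with it. Python's encoder happens to emit uppercase hex:
assert quote("dirdir/subsub", safe="") == "dirdir%2Fsubsub"
```

This is why comparing freshly encoded names against names encoded by an earlier implementation is fragile, while decoding both sides before comparing would round-trip safely.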

Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly

2018-08-14 Thread Jonathan Tan
> Previously, we assumed only blob objects could be missing. This patch
> makes rev-list handle missing trees like missing blobs. A missing tree
> will cause an error if --missing indicates an error should be caused,
> and the hash is printed even if the tree is missing.

The last sentence is difficult to understand - probably better to say
that all --missing= arguments and --exclude-promisor-objects work for
missing trees like they currently do for blobs (and do not fixate on
just --missing=error). And also demonstrate this in tests, like in
t6112.

> In list-objects.c we no longer print a message to stderr if a tree
> object is missing (quiet_on_missing is always true). I couldn't find
> any place where this would matter, or where the caller of
> traverse_commit_list would need to be fixed to show the error. However,
> in the future it would be trivial to make the caller show the message if
> we needed to.
> 
> This is not tested very thoroughly, since we cannot create promisor
> objects in tests without using an actual partial clone. t0410 has a
> promise_and_delete utility function, but the is_promisor_object function
> does not return 1 for objects deleted in this way. More tests will
> come in a patch that implements a filter that can be used with git
> clone.

These two paragraphs are no longer applicable, I think.

> @@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char 
> *prefix)
>   init_revisions(&revs, prefix);
>   revs.abbrev = DEFAULT_ABBREV;
>   revs.commit_format = CMIT_FMT_UNSPECIFIED;
> + revs.do_not_die_on_missing_tree = 1;

Is this correct? I would have expected this to be set only if --missing
was set.

> - process_tree_contents(ctx, tree, base);
> + /*
> +  * NEEDSWORK: we should not have to process this tree's contents if the
> +  * filter wants to exclude all its contents AND the filter doesn't need
> +  * to collect the omitted OIDs. We should add a LOFR_SKIP_TREE bit which
> +  * allows skipping all children.
> +  */
> + if (parsed)
> + process_tree_contents(ctx, tree, base);

I agree with Jeff Hostetler in [1] that a LOFR_SKIP_TREE bit is
desirable, but I don't think that this patch is the right place to
introduce this NEEDSWORK. For me, this patch is about skipping iterating
over the contents of a tree because the tree does not exist; this
NEEDSWORK is about skipping iterating over the contents of a tree
because we don't want its contents, and it is quite confusing to
conflate the two.

[1] 
https://public-inbox.org/git/d751d56b-84bb-a03d-5f2a-7dbaf8d94...@jeffhostetler.com/

> @@ -124,6 +124,7 @@ struct rev_info {
>   first_parent_only:1,
>   line_level_traverse:1,
>   tree_blobs_in_commit_order:1,
> + do_not_die_on_missing_tree:1,

I know that the other flags don't have documentation, but I think it's
worth documenting this one because it is rather complicated. I have
provided a sample one in my earlier review - feel free to use that or
come up with your own.

> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 4984ca583..74e3c5767 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -186,6 +186,72 @@ test_expect_success 'rev-list stops traversal at missing 
> and promised commit' '
>   ! grep $FOO out
>  '
>  
> +test_expect_success 'show missing tree objects with --missing=print' '
> + rm -rf repo &&
> + test_create_repo repo &&
> + test_commit -C repo foo &&
> + test_commit -C repo bar &&
> + test_commit -C repo baz &&
> +
> + TREE=$(git -C repo rev-parse bar^{tree}) &&
> +
> + promise_and_delete $TREE &&
> +
> + git -C repo config core.repositoryformatversion 1 &&
> + git -C repo config extensions.partialclone "arbitrary string" &&
> + git -C repo rev-list --quiet --missing=print --objects HEAD 
> >missing_objs 2>rev_list_err &&
> + echo "?$TREE" >expected &&
> + test_cmp expected missing_objs &&
> +
> + # do not complain when a missing tree cannot be parsed
> + ! grep -q "Could not read " rev_list_err
> +'

I think that the --exclude-promisor-tests can go in t0410 as you have
done, but the --missing tests (except for --missing=allow-promisor)
should go in t6112. (And like the existing --missing tests, they should
be done without setting extensions.partialclone.)

As for --missing=allow-promisor, I don't see them being tested anywhere
:-( so feel free to make a suggestion. I would put them in t6112 for
easy comparison with the other --missing tests.


[PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Matthew DeVore
Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore 
---
 Documentation/rev-list-options.txt |  3 ++
 list-objects-filter-options.c  |  4 +++
 list-objects-filter-options.h  |  1 +
 list-objects-filter.c  | 50 ++
 t/t5317-pack-objects-filter-objects.sh | 27 ++
 t/t5616-partial-clone.sh   | 38 
 t/t6112-rev-list-filters-objects.sh| 13 +++
 7 files changed, 136 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..9e351ec2a 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -743,6 +743,9 @@ specification contained in .
A debug option to help with future "partial clone" development.
This option specifies how missing objects are handled.
 +
+The form '--filter=tree:' omits all blobs and trees deeper than
+ from the root tree. Currently, only =0 is supported.
++
 The form '--missing=error' requests that rev-list stop with an error if
 a missing object is encountered.  This is the default action.
 +
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..a28382940 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
return 0;
}
 
+   } else if (!strcmp(arg, "tree:0")) {
+   filter_options->choice = LOFC_TREE_NONE;
+   return 0;
+
} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
struct object_context oc;
struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
LOFC_DISABLED = 0,
LOFC_BLOB_NONE,
LOFC_BLOB_LIMIT,
+   LOFC_TREE_NONE,
LOFC_SPARSE_OID,
LOFC_SPARSE_PATH,
LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..8e3caf5bf 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+   struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+   enum list_objects_filter_situation filter_situation,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_trees_none_data *filter_data = filter_data_;
+
+   switch (filter_situation) {
+   default:
+   die("unknown filter_situation");
+   return LOFR_ZERO;
+
+   case LOFS_BEGIN_TREE:
+   case LOFS_BLOB:
+   if (filter_data->omits)
+   oidset_insert(filter_data->omits, &obj->oid);
+   return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+   case LOFS_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   return LOFR_ZERO;
+
+   }
+}
+
+static void* filter_trees_none__init(
+   struct oidset *omitted,
+   struct list_objects_filter_options *filter_options,
+   filter_object_fn *filter_fn,
+   filter_free_fn *filter_free_fn)
+{
+   struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+   d->omits = omitted;
+
+   *filter_fn = filter_trees_none;
+   *filter_free_fn = free;
+   return d;
+}
+
 /*
  * A filter for list-objects to omit large blobs.
  * And to OPTIONALLY collect a list of the omitted OIDs.
@@ -374,6 +423,7 @@ static filter_init_fn s_filters[] = {
NULL,
filter_blobs_none__init,
filter_blobs_limit__init,
+   filter_trees_none__init,
filter_sparse_oid__init,
filter_sparse_path__i
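
The behavior of filter_trees_none is a small decision table; this Python sketch models the C switch above (the string situations and flag constants are stand-ins for the C enums, not real API):

```python
# Stand-ins for the LOFR_* result bits in list-objects-filter.h
LOFR_ZERO, LOFR_MARK_SEEN, LOFR_DO_SHOW = 0x0, 0x1, 0x2

def filter_trees_none(situation, oid, omits=None):
    # Trees and blobs are hard-omitted: marked seen so the traversal
    # never revisits them, but not LOFR_DO_SHOW, so never shown.
    # Optionally record the omitted OID.
    if situation in ("BEGIN_TREE", "BLOB"):
        if omits is not None:
            omits.add(oid)
        return LOFR_MARK_SEEN
    if situation == "END_TREE":
        return LOFR_ZERO
    raise ValueError("unknown filter_situation")

omitted = set()
assert filter_trees_none("BLOB", "abc123", omitted) == LOFR_MARK_SEEN
assert filter_trees_none("BEGIN_TREE", "def456", omitted) == LOFR_MARK_SEEN
assert omitted == {"abc123", "def456"}
```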

[PATCH v4 3/6] list-objects: always parse trees gently

2018-08-14 Thread Matthew DeVore
If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
struct rev_info *revs = ctx->revs;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-   int gently = revs->ignore_missing_links ||
-revs->exclude_promisor_objects;
 
if (!revs->tree_objects)
return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, gently) < 0) {
+   if (parse_tree_gently(tree, 1) < 0) {
if (revs->ignore_missing_links)
return;
 
-- 
2.18.0.865.gffc8e1a3cd6-goog



[PATCH v4 0/6] filter: support for excluding all trees and blobs

2018-08-14 Thread Matthew DeVore
I've applied or responded to all changes suggested by Jonathan and Jeff.

Matthew DeVore (6):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt |   3 +
 builtin/rev-list.c |  11 +-
 list-objects-filter-options.c  |   4 +
 list-objects-filter-options.h  |   1 +
 list-objects-filter.c  |  50 ++
 list-objects.c | 238 +
 revision.c |   1 -
 revision.h |  11 +-
 t/t0410-partial-clone.sh   |  66 +++
 t/t5317-pack-objects-filter-objects.sh |  40 +
 t/t5616-partial-clone.sh   |  38 
 t/t6112-rev-list-filters-objects.sh|  13 ++
 12 files changed, 358 insertions(+), 118 deletions(-)

-- 
2.18.0.865.gffc8e1a3cd6-goog



Re: [PATCH v3 5/5] list-objects-filter: implement filter tree:0

2018-08-14 Thread Matthew DeVore
On Tue, Aug 14, 2018 at 8:13 AM Jeff Hostetler  wrote:
>
>
>
> On 8/13/2018 2:14 PM, Matthew DeVore wrote:
> > Teach list-objects the "tree:0" filter which allows for filtering
> > out all tree and blob objects (unless other objects are explicitly
> > specified by the user). The purpose of this patch is to allow smaller
> > partial clones.
> >
> > The name of this filter - tree:0 - does not explicitly specify that
> > it also filters out all blobs, but this should not cause much confusion
> > because blobs are not at all useful without the trees that refer to
> > them.
> >
> > I also consider only:commits as a name, but this is inaccurate because
> > it suggests that annotated tags are omitted, but actually they are
> > included.
> >
> > The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> > would filter out all but the root tree and blobs. In order to avoid
> > confusion between 0 and capital O, the documentation was worded in a
> > somewhat round-about way that also hints at this future improvement to
> > the feature.
> >
> > Signed-off-by: Matthew DeVore 
> > ---
> >   Documentation/rev-list-options.txt |  3 ++
> >   list-objects-filter-options.c  |  4 +++
> >   list-objects-filter-options.h  |  1 +
> >   list-objects-filter.c  | 50 ++
> >   t/t5317-pack-objects-filter-objects.sh | 27 ++
> >   t/t5616-partial-clone.sh   | 27 ++
> >   t/t6112-rev-list-filters-objects.sh| 13 +++
> >   7 files changed, 125 insertions(+)
> >
> > diff --git a/Documentation/rev-list-options.txt 
> > b/Documentation/rev-list-options.txt
> > index 7b273635d..9e351ec2a 100644
> > --- a/Documentation/rev-list-options.txt
> > +++ b/Documentation/rev-list-options.txt
> > @@ -743,6 +743,9 @@ specification contained in <path>.
> >   A debug option to help with future "partial clone" development.
> >   This option specifies how missing objects are handled.
> >   +
> > +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> > +<depth> from the root tree. Currently, only <depth>=0 is supported.
> > ++
> >   The form '--missing=error' requests that rev-list stop with an error if
> >   a missing object is encountered.  This is the default action.
> >   +
> > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> > index c0e2bd6a0..a28382940 100644
> > --- a/list-objects-filter-options.c
> > +++ b/list-objects-filter-options.c
> > @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter(
> >   return 0;
> >   }
> >
> > + } else if (!strcmp(arg, "tree:0")) {
> > + filter_options->choice = LOFC_TREE_NONE;
> > + return 0;
> > +
> >   } else if (skip_prefix(arg, "sparse:oid=", &v0)) {
> >   struct object_context oc;
> >   struct object_id sparse_oid;
> > diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> > index a61f8..af64e5c66 100644
> > --- a/list-objects-filter-options.h
> > +++ b/list-objects-filter-options.h
> > @@ -10,6 +10,7 @@ enum list_objects_filter_choice {
> >   LOFC_DISABLED = 0,
> >   LOFC_BLOB_NONE,
> >   LOFC_BLOB_LIMIT,
> > + LOFC_TREE_NONE,
> >   LOFC_SPARSE_OID,
> >   LOFC_SPARSE_PATH,
> >   LOFC__COUNT /* must be last */
> > diff --git a/list-objects-filter.c b/list-objects-filter.c
> > index a0ba78b20..8e3caf5bf 100644
> > --- a/list-objects-filter.c
> > +++ b/list-objects-filter.c
> > @@ -80,6 +80,55 @@ static void *filter_blobs_none__init(
> >   return d;
> >   }
> >
> > +/*
> > + * A filter for list-objects to omit ALL trees and blobs from the 
> > traversal.
> > + * Can OPTIONALLY collect a list of the omitted OIDs.
> > + */
> > +struct filter_trees_none_data {
> > + struct oidset *omits;
> > +};
> > +
> > +static enum list_objects_filter_result filter_trees_none(
> > + enum list_objects_filter_situation filter_situation,
> > + struct object *obj,
> > + const char *pathname,
> > + const char *filename,
> > + void *filter_data_)
> > +{
> > + struct filter_trees_none_data *filter_data = filter_data_;
> > +
> > + switch (filter_situation) {
> > + default:
> > + die("unknown filter_situation");
> > + return LOFR_ZERO;
> > +
> > + case LOFS_BEGIN_TREE:
> > + case LOFS_BLOB:
> > + if (filter_data->omits)
> > + oidset_insert(filter_data->omits, &obj->oid);
> > + return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
> > +
> > + case LOFS_END_TREE:
> > + assert(obj->type == OBJ_TREE);
> > + return LOFR_ZERO;
> > +
> > + }
> > +}
>
> There are a couple of options here:
> [] If we really want to omit all trees and blobs (and we DO NOT want
> the oidset of everything omitted), then we might be able to
> shortcut the traversal and speed things up.
>
> {} add a LOFR_SKIP_TREE bit

[PATCH 0/4] finishing touches on jk/for-each-object-iteration

2018-08-14 Thread Jeff King
On Mon, Aug 13, 2018 at 11:45:06AM -0700, Jonathan Tan wrote:

> >   [1/7]: for_each_*_object: store flag definitions in a single location
> >   [2/7]: for_each_*_object: take flag arguments as enum
> >   [3/7]: for_each_*_object: give more comprehensive docstrings
> >   [4/7]: for_each_packed_object: support iterating in pack-order
> >   [5/7]: t1006: test cat-file --batch-all-objects with duplicates
> >   [6/7]: cat-file: rename batch_{loose,packed}_object callbacks
> >   [7/7]: cat-file: support "unordered" output for --batch-all-objects
> 
> Thanks for laying all the patches out so cleanly! All of them are:
> Reviewed-by: Jonathan Tan 
> 
> Normally I would re-explain the patches to demonstrate that I understand
> them, but in this case, I think they are simple enough - patches 1, 2,
> 3, and 6 are refactorings that I agree with, patch 5 just makes a test
> more comprehensive, and patches 4 and 7 do what their commit messages
> say.
> 
> Stefan brought up the concern that cache.h is increasing in size, but I
> agree with the patch as written that it's probably best that we
> centralize all the flags somewhere, and we can deal with the location in
> a future patch.

Thanks for the review. Here are a few patches on top to deal with the
cache.h thing, as well as some optimizations that came out of discussing
oidset in another thread (I left out for now the "big" optimization of
moving oidset to a different data structure; that's complicated enough
to be dealt with on its own, I think).

The first patch here could arguably be squashed into the final patch of
the original series, but I'm OK with it either way.

  [1/4]: cat-file: use oidset check-and-insert
  [2/4]: cat-file: split batch "buf" into two variables
  [3/4]: cat-file: use a single strbuf for all output
  [4/4]: for_each_*_object: move declarations to object-store.h

 builtin/cat-file.c | 43 +++-
 builtin/prune-packed.c |  1 +
 cache.h| 75 ---
 object-store.h | 90 ++
 packfile.h | 20 --
 5 files changed, 116 insertions(+), 113 deletions(-)

-Peff


[PATCH 1/4] cat-file: use oidset check-and-insert

2018-08-14 Thread Jeff King
We don't need to check if the oidset has our object before
we insert it; that's done as part of the insertion. We can
just rely on the return value from oidset_insert(), which
saves one hash lookup per object.

The resulting speedup is tiny and within the run-to-run
noise, but the result is simpler to read, too.

Signed-off-by: Jeff King 
---
 builtin/cat-file.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 45992c9be9..04b5cda191 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -443,9 +443,8 @@ static int batch_unordered_object(const struct object_id 
*oid, void *vdata)
 {
struct object_cb_data *data = vdata;
 
-   if (oidset_contains(data->seen, oid))
+   if (oidset_insert(data->seen, oid))
return 0;
-   oidset_insert(data->seen, oid);
 
return batch_object_cb(oid, data);
 }
-- 
2.18.0.1066.g0d97f3a098
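The single-lookup idiom the patch relies on can be illustrated standalone with a toy set. This is not Git's oidset (which is hashmap-backed); it is just a linear-scan stand-in showing why an insert that reports "already present" makes a separate contains-check redundant:

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for Git's oidset: a fixed array of hex ids.
 * Not the real data structure -- just enough to show the idiom. */
struct toyset {
	char ids[64][41];
	int nr;
};

/* Like oidset_insert(): returns 1 if the id was already present,
 * 0 if it was newly added -- one lookup instead of two. */
static int toyset_insert(struct toyset *s, const char *id)
{
	int i;
	for (i = 0; i < s->nr; i++)
		if (!strcmp(s->ids[i], id))
			return 1; /* already seen */
	strcpy(s->ids[s->nr++], id);
	return 0; /* newly inserted */
}
```

The caller then collapses the contains/insert pair into a single call, exactly as in the hunk: `if (toyset_insert(&seen, id)) return 0;`.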



[PATCH v2] send-email: add an option to impose delay sent E-Mails

2018-08-14 Thread Ævar Arnfjörð Bjarmason
Add a --send-delay option with a corresponding sendemail.smtpSendDelay
configuration variable. When set to e.g. 2, this causes send-email to
sleep 2 seconds before sending the next E-Mail. We'll only sleep
between sends, not before the first send, or after the last.

This option has two uses. Firstly, to be able to Ctrl+C a long send
with "all" if you have a change of heart. Secondly, as a hack in some
mail setups to, with a sufficiently high delay, force the receiving
client to sort the E-Mails correctly.

Some popular E-Mail clients completely ignore the "Date" header, which
format-patch is careful to set such that the patches will be displayed
in order, and instead sort by the time the E-mail was received.

Google's GMail is a good example of such a client. It ostensibly sorts
by some approximation of received time (although not by any "Received"
header). It's more usual than not to see patches showing out of order
in GMail. To take a few examples of orders seen on patches on the Git
mailing list:

1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy)
2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee)
3 -> 2 -> 1 (fast-import by Jameson Miller)
2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King)

The reason to add the new "X-Mailer-Send-Delay" header is to make it
easy to tell what the imposed delay was, if any. This allows for
gathering some data on how the transfer of E-Mails with & without this
option behaves. This may not be workable without really long delays,
see [1] and [2].

The reason why the getopt format is "send-delay=s" instead of
"send-delay=d" is that we're doing manual validation of the value
we get passed, which getopt would corrupt in cases of e.g. float
values before we could show a sensible error message.

1. 
https://public-inbox.org/git/20180325210132.ge74...@genre.crustytoothpaste.net/
2. https://public-inbox.org/git/xmqqpo3rehe4@gitster-ct.c.googlers.com/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

I submitted this back in March hoping it would solve mail ordering
problems, but the other motive I had for this is that I'm paranoid
that I'm sending out bad E-Mails, and tend to "y" to each one because
"a" is too fast.

So I'm re-sending this with an updated commit message & rationale, and
not sending 2/2 to toggle this on by default. I'd still like to have
this feature.
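The only-sleep-between-sends behavior described above (no delay before the first message, none after the last) boils down to a small loop shape. A minimal C sketch, with a hypothetical send_one() standing in for the actual SMTP send:

```c
#include <stdio.h>
#include <unistd.h>

static int gaps; /* counts between-send gaps, for illustration */

static void send_one(int i)
{
	printf("sent message %d\n", i);
}

/* Sleep only *between* sends: never before the first,
 * never after the last. */
static void send_all(int nr, int send_delay)
{
	int i;
	for (i = 0; i < nr; i++) {
		if (i > 0) {
			gaps++;
			if (send_delay)
				sleep(send_delay);
		}
		send_one(i);
	}
}
```

For 4 messages and a 2-second delay this sleeps 3 times, adding 6 seconds overall -- which also gives the user three chances to Ctrl+C.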

 Documentation/config.txt |  6 
 Documentation/git-send-email.txt |  4 +++
 git-send-email.perl  | 18 ---
 t/t9001-send-email.sh| 55 
 4 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 63365dcf3d..5eb81b64a7 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3216,6 +3216,12 @@ sendemail.smtpReloginDelay::
Seconds wait before reconnecting to smtp server.
See also the `--relogin-delay` option of linkgit:git-send-email[1].
 
+sendemail.smtpSendDelay::
+   Seconds to wait between sending one message and the next. Set
+   it to 0 to impose no extra delay; defaults to 0.
++
+See also the `--send-delay` option of linkgit:git-send-email[1].
+
 showbranch.default::
The default set of branches for linkgit:git-show-branch[1].
See linkgit:git-show-branch[1].
diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 465a4ecbed..98fdd9b803 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -270,6 +270,10 @@ must be used for each option.
with --batch-size option.  Defaults to the `sendemail.smtpReloginDelay`
configuration variable.
 
+--send-delay=<int>::
+   Wait $<int> seconds between sending emails. Defaults to the
+   `sendemail.smtpSendDelay` configuration variable.
+
 Automating
 ~~
 
diff --git a/git-send-email.perl b/git-send-email.perl
index 2be5dac337..65b53ee9f1 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -89,6 +89,7 @@ sub usage {
 --batch-size <int>* send max <int> message per connection.
 --relogin-delay <int> * delay <int> seconds between two 
successive login.
  This option can only be used with 
--batch-size
+--send-delay <int>* ensure that <int> seconds pass between 
two successive sends.
 
   Automating:
 --identity* Use the sendemail. options.
@@ -225,7 +226,7 @@ sub do_edit {
 my ($to_cmd, $cc_cmd);
 my ($smtp_server, $smtp_server_port, @smtp_server_options);
 my ($smtp_authuser, $smtp_encryption, $smtp_ssl_cert_path);
-my ($batch_size, $relogin_delay);
+my ($batch_size, $relogin_delay, $send_delay);
 my ($identity, $aliasfiletype, @alias_files, $smtp_domain, $smtp_auth);
 my ($validate, $confirm);
 my (@suppress_cc);
@@ -259,6 +260,7 @@ sub do_edit {
 "smtpauth" => \$smtp_auth,
 "smtpbatchsize" => \$batch_size,
 "smtprelogindelay" => \$relogin_delay,
+"smtpsenddela

[PATCH 2/4] cat-file: split batch "buf" into two variables

2018-08-14 Thread Jeff King
We use the "buf" strbuf for two things: to read incoming
lines, and as a scratch space for test-expanding the
user-provided format. Let's split this into two variables
with descriptive names, which makes their purpose and
lifetime more clear.

It will also help in a future patch when we start using the
"output" buffer for more expansions.

Signed-off-by: Jeff King 
---
René, in the patch you sent earlier, I noticed that for the
non-batch-all-objects case we use the same strbuf for input and output.
That'd probably be OK most of the time (the first thing we do is resolve
the input to an oid), but I suspect it could be pretty bad with %(rest).
We'd write over or even realloc the string it points into as part of the
output.

This patch just clarifies the names; your reuse idea is in the next one.
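The %(rest) hazard mentioned here comes from the whitespace split in batch_objects(): data.rest points *into* the input buffer, so reusing that buffer for output would overwrite (or realloc away) the memory it points at. The split itself can be sketched in isolation:

```c
#include <assert.h>
#include <string.h>

/* Mimics the split in batch_objects(): NUL-terminate the object
 * name at the first run of whitespace and return the remainder,
 * or NULL if there was no whitespace. Modifies buf in place --
 * which is exactly why the returned pointer must not outlive
 * the buffer's contents. */
static char *split_on_whitespace(char *buf)
{
	char *p = strpbrk(buf, " \t");
	if (!p)
		return NULL;
	while (*p && strchr(" \t", *p))
		*p++ = '\0';
	return p;
}
```

Note the `*p &&` guard: without it, strchr(" \t", '\0') would match the string terminator and the loop would walk past the end.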

 builtin/cat-file.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 04b5cda191..3ed1d0be80 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -466,7 +466,8 @@ static int batch_unordered_packed(const struct object_id 
*oid,
 
 static int batch_objects(struct batch_options *opt)
 {
-   struct strbuf buf = STRBUF_INIT;
+   struct strbuf input = STRBUF_INIT;
+   struct strbuf output = STRBUF_INIT;
struct expand_data data;
int save_warning;
int retval = 0;
@@ -481,8 +482,9 @@ static int batch_objects(struct batch_options *opt)
 */
memset(&data, 0, sizeof(data));
data.mark_query = 1;
-   strbuf_expand(&buf, opt->format, expand_format, &data);
+   strbuf_expand(&output, opt->format, expand_format, &data);
data.mark_query = 0;
+   strbuf_release(&output);
if (opt->cmdmode)
data.split_on_whitespace = 1;
 
@@ -542,14 +544,14 @@ static int batch_objects(struct batch_options *opt)
save_warning = warn_on_object_refname_ambiguity;
warn_on_object_refname_ambiguity = 0;
 
-   while (strbuf_getline(&buf, stdin) != EOF) {
+   while (strbuf_getline(&input, stdin) != EOF) {
if (data.split_on_whitespace) {
/*
 * Split at first whitespace, tying off the beginning
 * of the string and saving the remainder (or NULL) in
 * data.rest.
 */
-   char *p = strpbrk(buf.buf, " \t");
+   char *p = strpbrk(input.buf, " \t");
if (p) {
while (*p && strchr(" \t", *p))
*p++ = '\0';
@@ -557,10 +559,10 @@ static int batch_objects(struct batch_options *opt)
data.rest = p;
}
 
-   batch_one_object(buf.buf, opt, &data);
+   batch_one_object(input.buf, opt, &data);
}
 
-   strbuf_release(&buf);
+   strbuf_release(&input);
warn_on_object_refname_ambiguity = save_warning;
return retval;
 }
-- 
2.18.0.1066.g0d97f3a098



Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Jonathan Tan
> @@ -743,6 +743,9 @@ specification contained in <path>.
>   A debug option to help with future "partial clone" development.
>   This option specifies how missing objects are handled.
>  +
> +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> +<depth> from the root tree. Currently, only <depth>=0 is supported.
> ++
>  The form '--missing=error' requests that rev-list stop with an error if
>  a missing object is encountered.  This is the default action.
>  +

The "--filter" documentation should go with the other "--filter"
information, not right after --missing.

> +test_expect_success 'setup for tests of tree:0' '
> + mkdir r1/subtree &&
> + echo "This is a file in a subtree" > r1/subtree/file &&
> + git -C r1 add subtree/file &&
> + git -C r1 commit -m subtree
> +'

Style: no space after >

> +test_expect_success 'grab tree directly when using tree:0' '
> + # We should get the tree specified directly but not its blobs or 
> subtrees.
> + git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack 
> <<-EOF &&
> + HEAD:
> + EOF
> + git -C r1 index-pack ../commitsonly.pack &&
> + git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> + grep -E "tree|blob" objs >trees_and_blobs &&
> + test_line_count = 1 trees_and_blobs
> +'

Can we also verify that the SHA-1 in trees_and_blobs is what we
expected?

> +test_expect_success 'use fsck before and after manually fetching a missing 
> subtree' '
> + # push new commit so server has a subtree
> + mkdir src/dir &&
> + echo "in dir" > src/dir/file.txt &&

No space after >

> + git -C src add dir/file.txt &&
> + git -C src commit -m "file in dir" &&
> + git -C src push -u srv master &&
> + SUBTREE=$(git -C src rev-parse HEAD:dir) &&
> +
> + rm -rf dst &&
> + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> + git -C dst fsck &&
> + git -C dst cat-file -p $SUBTREE >tree_contents 2>err &&
> + git -C dst fsck
> +'

If you don't need to redirect to err, don't do so.

Before the cat-file, also verify that the tree is missing, most likely
through a "git rev-list" with "--missing=print".

And I would grep on the tree_contents to ensure that the filename
("file.txt") is there, so that we know that we got the correct tree.

> +test_expect_success 'can use tree:0 to filter partial clone' '
> + rm -rf dst &&
> + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst &&
> + git -C dst rev-list master --missing=allow-any --objects 
> >fetched_objects &&
> + cat fetched_objects \
> + | awk -f print_1.awk \
> + | xargs -n1 git -C dst cat-file -t >fetched_types &&
> + sort fetched_types -u >unique_types.observed &&
> + echo commit > unique_types.expected &&
> + test_cmp unique_types.observed unique_types.expected
> +'
> +
> +test_expect_success 'auto-fetching of trees with --missing=error' '
> + git -C dst rev-list master --missing=error --objects >fetched_objects &&
> + cat fetched_objects \
> + | awk -f print_1.awk \
> + | xargs -n1 git -C dst cat-file -t >fetched_types &&
> + sort fetched_types -u >unique_types.observed &&
> + printf "blob\ncommit\ntree\n" >unique_types.expected &&
> + test_cmp unique_types.observed unique_types.expected
> +'

These two tests seem redundant with the 'use fsck before and after
manually fetching a missing subtree' test (after the latter is
appropriately renamed). I think we only need to test this sequence once,
which can be placed in one or spread over multiple tests:

 1. partial clone with --filter=tree:0
 2. fsck works
 3. verify that trees are indeed missing
 4. autofetch a tree
 5. fsck still works


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Jeff Hostetler




On 8/13/2018 6:41 PM, Junio C Hamano wrote:

Jeff King  writes:


I can buy the argument that it's nice to have some form of profiling
that works everywhere, even if it's lowest-common-denominator. I just
wonder if we could be investing effort into tooling around existing
solutions that will end up more powerful and flexible in the long run.


Another thing I noticed is that the codepaths we would find
interesting to annotate with trace_performance_* stuff often
overlaps with the "slog" thing.  If the latter aims to eventually
replace GIT_TRACE (and if not, I suspect there is not much point
adding it in the first place), perhaps we can extend it to also
cover the need of these trace_performance_* calls, so that we do not
have to carry three different tracing mechanisms.



I'm looking at adding code to my SLOG (better name suggestions welcome)
patch series to eventually replace the existing git_trace facility.
And I would like to have a set of nested messages like Duy has proposed
be a part of that.

In an independent effort I've found the nested messages to be very
helpful in certain contexts.
various platform tools, like PerfView and friends as discussed earlier
on this thread, but then again I can ask a customer to turn a knob and
run it again and send me the output and hopefully get a rough idea of
the problem -- without having them install a bunch of perf tools.

Jeff



[PATCH 3/4] cat-file: use a single strbuf for all output

2018-08-14 Thread Jeff King
When we're in batch mode, we end up in batch_object_write()
for each object, which allocates its own strbuf for each
call. Instead, we can provide a single "scratch" buffer that
gets reused for each output. When running:

  git cat-file --batch-all-objects --batch-check='%(objectname)'

on git.git, my best-of-five time drops from:

  real  0m0.171s
  user  0m0.159s
  sys   0m0.012s

to:

  real  0m0.133s
  user  0m0.121s
  sys   0m0.012s

Note that we could do this just by putting the "scratch"
pointer into "struct expand_data", but I chose instead to
add an extra parameter to the callstack. That's more
verbose, but it makes it a bit more obvious what is going
on, which in turn makes it easy to see where we need to be
releasing the string in the caller (right after the loop
which uses it in each case).

Based-on-a-patch-by: René Scharfe 
Signed-off-by: Jeff King 
---
It also made it easy to see that without the prior patch,
we'd have been using "buf" for two parameters. :)
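The pattern the patch adopts -- one caller-owned buffer reset per iteration instead of a fresh allocation per call -- can be shown with a minimal growable buffer. This is not Git's strbuf API, just the same shape:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal growable buffer standing in for strbuf. */
struct buf {
	char *ptr;
	size_t len, alloc;
};

/* Like strbuf_reset(): keep the allocation, drop the contents. */
static void buf_reset(struct buf *b)
{
	b->len = 0;
}

static void buf_add(struct buf *b, const char *s)
{
	size_t n = strlen(s);
	if (b->len + n + 1 > b->alloc) {
		b->alloc = (b->len + n + 1) * 2;
		b->ptr = realloc(b->ptr, b->alloc);
	}
	memcpy(b->ptr + b->len, s, n + 1);
	b->len += n;
}

/* Callee formats into the caller-owned scratch buffer; the caller
 * resets before each use and releases once, after the loop. */
static void format_one(struct buf *scratch, const char *name)
{
	buf_reset(scratch);
	buf_add(scratch, name);
	buf_add(scratch, "\n");
}
```

After the first few iterations the buffer stops growing, so the per-object malloc/free pair disappears -- the source of the 0.171s to 0.133s drop above.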

 builtin/cat-file.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 3ed1d0be80..08dced2614 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -338,11 +338,11 @@ static void print_object_or_die(struct batch_options 
*opt, struct expand_data *d
}
 }
 
-static void batch_object_write(const char *obj_name, struct batch_options *opt,
+static void batch_object_write(const char *obj_name,
+  struct strbuf *scratch,
+  struct batch_options *opt,
   struct expand_data *data)
 {
-   struct strbuf buf = STRBUF_INIT;
-
if (!data->skip_object_info &&
oid_object_info_extended(the_repository, &data->oid, &data->info,
 OBJECT_INFO_LOOKUP_REPLACE) < 0) {
@@ -352,10 +352,10 @@ static void batch_object_write(const char *obj_name, 
struct batch_options *opt,
return;
}
 
-   strbuf_expand(&buf, opt->format, expand_format, data);
-   strbuf_addch(&buf, '\n');
-   batch_write(opt, buf.buf, buf.len);
-   strbuf_release(&buf);
+   strbuf_reset(scratch);
+   strbuf_expand(scratch, opt->format, expand_format, data);
+   strbuf_addch(scratch, '\n');
+   batch_write(opt, scratch->buf, scratch->len);
 
if (opt->print_contents) {
print_object_or_die(opt, data);
@@ -363,7 +363,9 @@ static void batch_object_write(const char *obj_name, struct 
batch_options *opt,
}
 }
 
-static void batch_one_object(const char *obj_name, struct batch_options *opt,
+static void batch_one_object(const char *obj_name,
+struct strbuf *scratch,
+struct batch_options *opt,
 struct expand_data *data)
 {
struct object_context ctx;
@@ -405,20 +407,21 @@ static void batch_one_object(const char *obj_name, struct 
batch_options *opt,
return;
}
 
-   batch_object_write(obj_name, opt, data);
+   batch_object_write(obj_name, scratch, opt, data);
 }
 
 struct object_cb_data {
struct batch_options *opt;
struct expand_data *expand;
struct oidset *seen;
+   struct strbuf *scratch;
 };
 
 static int batch_object_cb(const struct object_id *oid, void *vdata)
 {
struct object_cb_data *data = vdata;
oidcpy(&data->expand->oid, oid);
-   batch_object_write(NULL, data->opt, data->expand);
+   batch_object_write(NULL, data->scratch, data->opt, data->expand);
return 0;
 }
 
@@ -509,6 +512,7 @@ static int batch_objects(struct batch_options *opt)
 
cb.opt = opt;
cb.expand = &data;
+   cb.scratch = &output;
 
if (opt->unordered) {
struct oidset seen = OIDSET_INIT;
@@ -531,6 +535,7 @@ static int batch_objects(struct batch_options *opt)
oid_array_clear(&sa);
}
 
+   strbuf_release(&output);
return 0;
}
 
@@ -559,10 +564,11 @@ static int batch_objects(struct batch_options *opt)
data.rest = p;
}
 
-   batch_one_object(input.buf, opt, &data);
+   batch_one_object(input.buf, &output, opt, &data);
}
 
strbuf_release(&input);
+   strbuf_release(&output);
warn_on_object_refname_ambiguity = save_warning;
return retval;
 }
-- 
2.18.0.1066.g0d97f3a098



[PATCH 4/4] for_each_*_object: move declarations to object-store.h

2018-08-14 Thread Jeff King
The for_each_loose_object() and for_each_packed_object()
functions are meant to be part of a unified interface: they
use the same set of for_each_object_flags, and it's not
inconceivable that we might one day add a single
for_each_object() wrapper around them.

Let's put them together in a single file, so we can avoid
awkwardness like saying "the flags for this function are
over in cache.h". Moving the loose functions to packfile.h
is silly. Moving the packed functions to cache.h works, but
makes the "cache.h is a kitchen sink" problem worse. The
best place is the recently-created object-store.h, since
these are quite obviously related to object storage.

The for_each_*_in_objdir() functions do not use the same
flags, but they are logically part of the same interface as
for_each_loose_object(), and share callback signatures. So
we'll move those, as well, as they also make sense in
object-store.h.

Signed-off-by: Jeff King 
---
This patch also happens to be a nice showcase for --color-moved.
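The "same set of for_each_object_flags" point is easier to see with the bits in hand; a caller combines and tests them the usual way. The enum values below are copied from the hunk being moved; the helper is illustrative only:

```c
#include <assert.h>

/* Mirror of the for_each_object_flags enum being moved to
 * object-store.h (values as in the patch hunk). */
enum for_each_object_flags {
	/* Iterate only over local objects, not alternates. */
	FOR_EACH_OBJECT_LOCAL_ONLY = (1 << 0),

	/* Only iterate over packs obtained from the promisor remote. */
	FOR_EACH_OBJECT_PROMISOR_ONLY = (1 << 1),

	/* Visit objects within a pack in packfile order, not .idx order. */
	FOR_EACH_OBJECT_PACK_ORDER = (1 << 2),
};

/* A caller wanting local objects in pack order ORs the bits: */
static unsigned local_pack_order(void)
{
	return FOR_EACH_OBJECT_LOCAL_ONLY | FOR_EACH_OBJECT_PACK_ORDER;
}
```

Because both the loose and packed iterators accept this one enum, a future for_each_object() wrapper could take the combined mask unchanged.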

 builtin/prune-packed.c |  1 +
 cache.h| 75 ---
 object-store.h | 90 ++
 packfile.h | 20 --
 4 files changed, 91 insertions(+), 95 deletions(-)

diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
index 4ff525e50f..a9e7b552b9 100644
--- a/builtin/prune-packed.c
+++ b/builtin/prune-packed.c
@@ -3,6 +3,7 @@
 #include "progress.h"
 #include "parse-options.h"
 #include "packfile.h"
+#include "object-store.h"
 
 static const char * const prune_packed_usage[] = {
N_("git prune-packed [-n | --dry-run] [-q | --quiet]"),
diff --git a/cache.h b/cache.h
index 6d14702df2..aee36afa54 100644
--- a/cache.h
+++ b/cache.h
@@ -1575,81 +1575,6 @@ extern int odb_mkstemp(struct strbuf *temp_filename, 
const char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * Iterate over the files in the loose-object parts of the object
- * directory "path", triggering the following callbacks:
- *
- *  - loose_object is called for each loose object we find.
- *
- *  - loose_cruft is called for any files that do not appear to be
- *loose objects. Note that we only look in the loose object
- *directories "objects/[0-9a-f]{2}/", so we will not report
- *"objects/foobar" as cruft.
- *
- *  - loose_subdir is called for each top-level hashed subdirectory
- *of the object directory (e.g., "$OBJDIR/f0"). It is called
- *after the objects in the directory are processed.
- *
- * Any callback that is NULL will be ignored. Callbacks returning non-zero
- * will end the iteration.
- *
- * In the "buf" variant, "path" is a strbuf which will also be used as a
- * scratch buffer, but restored to its original contents before
- * the function returns.
- */
-typedef int each_loose_object_fn(const struct object_id *oid,
-const char *path,
-void *data);
-typedef int each_loose_cruft_fn(const char *basename,
-   const char *path,
-   void *data);
-typedef int each_loose_subdir_fn(unsigned int nr,
-const char *path,
-void *data);
-int for_each_file_in_obj_subdir(unsigned int subdir_nr,
-   struct strbuf *path,
-   each_loose_object_fn obj_cb,
-   each_loose_cruft_fn cruft_cb,
-   each_loose_subdir_fn subdir_cb,
-   void *data);
-int for_each_loose_file_in_objdir(const char *path,
- each_loose_object_fn obj_cb,
- each_loose_cruft_fn cruft_cb,
- each_loose_subdir_fn subdir_cb,
- void *data);
-int for_each_loose_file_in_objdir_buf(struct strbuf *path,
- each_loose_object_fn obj_cb,
- each_loose_cruft_fn cruft_cb,
- each_loose_subdir_fn subdir_cb,
- void *data);
-
-/*
- * Flags for for_each_*_object(), including for_each_loose below and
- * for_each_packed in packfile.h.
- */
-enum for_each_object_flags {
-   /* Iterate only over local objects, not alternates. */
-   FOR_EACH_OBJECT_LOCAL_ONLY = (1<<0),
-
-   /* Only iterate over packs obtained from the promisor remote. */
-   FOR_EACH_OBJECT_PROMISOR_ONLY = (1<<1),
-
-   /*
-* Visit objects within a pack in packfile order rather than .idx order
-*/
-   FOR_EACH_OBJECT_PACK_ORDER = (1<<2),
-};
-
-/*
- * Iterate over all accessible loose objects without respect to
- * reachability. By default, this includes both local and alternate objects.
- * The order in which objects are visited is unspecified.
- *
- * Any flags specific to packs are

[PATCH] git-submodule.sh: accept verbose flag in cmd_update to be non-quiet

2018-08-14 Thread Stefan Beller
In a56771a668d (builtin/pull: respect verbosity settings in submodules,
2018-01-25), we made sure to pass on both the quiet and verbose flags
from builtin/pull.c to the submodule shell script. However, git-submodule
doesn't understand a verbose flag, which results in a bug when invoking

  git pull --recurse-submodules -v [...]

There are a few different approaches to fix this bug:

1) rewrite 'argv_push_verbosity' or its caller in builtin/pull.c to
   cap opt_verbosity at 0. Then 'argv_push_verbosity' would only add
   '-q' if any.

2) Have a flag in 'argv_push_verbosity' that specifies if we allow adding
  -q or -v (or both).

3) Add -v to git-submodule.sh and make it a no-op

(1) seems like a maintenance burden: if we add code after the
submodule operations, or move the submodule operations higher up,
we will have altered the opt_verbosity setting further down the line
in builtin/pull.c.

(2) seems like it could work reasonably well without more regressions

(3) seems easiest to implement, and is actually a feature given the
last-one-wins rule of passing flags to Git commands.

Signed-off-by: Stefan Beller 
---

On Tue, Aug 14, 2018 at 10:54 AM Jochen Kühner  wrote:
>
> If I set 
> git config --global submodule.recurse true
> and run git via:
> git pull --progress -v --no-rebase "origin"
> The command will fail with following output (Errorlevel is 1)
> Fetching submodule submodules/tstemplates
> From http://10.0.102.194:7990/scm/mcc/tstemplates
> = [up to date]  feature/robin -> origin/feature/robin
> = [up to date]  master-> origin/master
> Already up to date.
> usage: git submodule [--quiet] add [-b ] [-f|--force] [--name ] 
> [--reference ] [--]  []
>or: git submodule [--quiet] status [--cached] [--recursive] [--] 
> [...]
>or: git submodule [--quiet] init [--] [...]
>or: git submodule [--quiet] deinit [-f|--force] (--all| [--] ...)
>or: git submodule [--quiet] update [--init] [--remote] [-N|--no-fetch] 
> [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] 
> [--reference ] [--recursive] [--] [...]
>or: git submodule [--quiet] summary [--cached|--files] [--summary-limit 
> ] [commit] [--] [...]
>or: git submodule [--quiet] foreach [--recursive] 
>or: git submodule [--quiet] sync [--recursive] [--] [...]
>or: git submodule [--quiet] absorbgitdirs [--] [...]
>
> Seems that the “verbose” parameter “-v” is also sent to “git submodule”,
> which does not support it.
>
> If I remove “-v” it will work.
>
> Problem is, I use TortoiseGit, which will automatically create this command!

 git-submodule.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/git-submodule.sh b/git-submodule.sh
index 8b5ad59bdee..f7fd80345cd 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -438,6 +438,9 @@ cmd_update()
-q|--quiet)
GIT_QUIET=1
;;
+   -v)
+   GIT_QUIET=0
+   ;;
--progress)
progress=1
;;
-- 
2.18.0.265.g16de1b435c9.dirty



Re: [PATCH] git-submodule.sh: accept verbose flag in cmd_update to be non-quiet

2018-08-14 Thread Jonathan Nieder
Stefan Beller wrote:

>  git-submodule.sh | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 8b5ad59bdee..f7fd80345cd 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -438,6 +438,9 @@ cmd_update()
>   -q|--quiet)
>   GIT_QUIET=1
>   ;;
> + -v)
> + GIT_QUIET=0
> + ;;

Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH] mingw: enable atomic O_APPEND

2018-08-14 Thread Johannes Sixt

Am 14.08.2018 um 00:37 schrieb Jeff King:

And then you can do something like:

   for size in 4097 8193 16385 32769 65537 131073 262145 524289 1048577; do
 >out ;# clean up from last run
 echo "Trying $size..."
 timeout 5 ./write $size out
  if ! ./check $size <out; then break; fi
done

I used your programs with necessary adjustments (as fork() is not 
available), and did similar tests with concurrent processes. With packet 
sizes 1025, 4093, 7531 (just to include some odd number), and 8193 I did 
not observe any overlapping or short writes.


I'm now very confident that we are on the safe side for our purposes.

-- Hannes
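The invariant such a checker verifies: if each writer appends N-byte records filled with a single distinct byte, then atomic O_APPEND writes imply every aligned N-byte window of the result is uniform. The sketch below illustrates that check only; it is not Hannes's or Peff's actual test code:

```c
#include <stddef.h>

/* Returns 1 if buf (len bytes) consists of whole size-byte records,
 * each filled with one repeated byte -- i.e. no short writes and no
 * interleaving between concurrent appenders -- and 0 otherwise. */
static int check_records(const unsigned char *buf, size_t len, size_t size)
{
	size_t i, j;
	if (len % size)
		return 0; /* a short write left a partial record */
	for (i = 0; i < len; i += size)
		for (j = 1; j < size; j++)
			if (buf[i + j] != buf[i])
				return 0; /* two writes interleaved */
	return 1;
}
```

Running this over the shared output file after the concurrent writers finish is what distinguishes "no failures observed" from merely "no crash".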


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Duy Nguyen
On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler  wrote:
> I'm looking at adding code to my SLOG (better name suggestions welcome)
> patch series to eventually replace the existing git_trace facility.

Complement maybe. Replace, please no. I'd rather not stare at json messages.
-- 
Duy


Re: [PATCH v2] send-email: add an option to impose delay sent E-Mails

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 11:15 AM Ævar Arnfjörð Bjarmason
 wrote:
>
> Add a --send-delay option with a corresponding sendemail.smtpSendDelay
> configuration variable. When set to e.g. 2, this causes send-email to
> sleep 2 seconds before sending the next E-Mail. We'll only sleep
> between sends, not before the first send, or after the last.
>
> This option has two uses. Firstly, to be able to Ctrl+C a long send
> with "all" if you have a change of heart. Secondly, as a hack in some
> mail setups to, with a sufficiently high delay, force the receiving
> client to sort the E-Mails correctly.
>
> Some popular E-Mail clients completely ignore the "Date" header, which
> format-patch is careful to set such that the patches will be displayed
> in order, and instead sort by the time the E-mail was received.
>
> Google's GMail is a good example of such a client. It ostensibly sorts
> by some approximation of received time (although not by any "Received"
> header). It's more usual than not to see patches showing out of order
> in GMail. To take a few examples of orders seen on patches on the Git
> mailing list:
>
> 1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy)
> 2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee)
> 3 -> 2 -> 1 (fast-import by Jameson Miller)
> 2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King)
>
> The reason to add the new "X-Mailer-Send-Delay" header is to make it
> easy to tell what the imposed delay was, if any. This allows for
> gathering some data on how the transfer of E-Mails with & without this
> option behaves. This may not be workable without really long delays,
> see [1] and [2].
>
> The reason for why the getopt format is "send-delay=s" instead of
> "send-delay=d" is because we're doing manual validation of the value
> we get passed, which getopt would corrupt in cases of e.g. float
> values before we could show a sensible error message.
>
> 1. 
> https://public-inbox.org/git/20180325210132.ge74...@genre.crustytoothpaste.net/
> 2. https://public-inbox.org/git/xmqqpo3rehe4@gitster-ct.c.googlers.com/
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>
> I submitted this back in March hoping it would solve mail ordering
> problems, but the other motive I had for this is that I'm paranoid
> that I'm sending out bad E-Mails, and tend to "y" to each one because
> "a" is too fast.'

Heh. GMail seems to have added an Undo button in their UI, which
would be the same feature as this one. (Hit Ctrl+C in time to "undo"
the sending command)

I have been bitten quite a few times by using "a" as I had old
series still lying around, such that it would send a new series and parts
of the old series (or when you changed subjects and resend another
iteration of a series, you may end up with two "patch 1"s).
So I learned to be careful before pressing "a" on sending.

Maybe the underlying issue is that you really only want to send a series
and not "all" as send-email asks for.
So maybe that dialog could learn a [s]eries switch, which would
check either filenames to count up, or if the base that is recorded
(base-commit for first and prerequisite-patch-id for followups)
is consistent.

Send this email? ([y]es|[n]o|[e]dit|[q]uit|[a]ll|[s]eries):

Another note:
I personally never use no/quit, but Ctrl+C for both cases.

This is also different than the feature of 5453b83bdf9 (send-email
--batch-size to work around some SMTP server limit, 2017-05-21)
which introduced sendemail.smtpReloginDelay, which would offer the
same functionality when the batch-size is set to 1. (Although this would
keep you connected to the server as well as add the X-Mailer-Send-Delay
header, which is not in any official email RFC, but your own invention?)

Having sorted mails in GMail would be nice!

Thanks,
Stefan


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen  wrote:
>
> On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler  wrote:
> > I'm looking at adding code to my SLOG (better name suggestions welcome)
> > patch series to eventually replace the existing git_trace facility.
>
> Complement maybe. Replace, please no. I'd rather not stare at json messages.

From the sidelines: We'd only need one logging infrastructure in place, as the
formatting would be done as a later step? For local operations we'd certainly
find better formatting than json, and we figured that we might end up desiring
ProtocolBuffers[1] instead of JSon, so if it would be easy to change
the output of
the structured logging easily that would be great.

But AFAICT these series are all about putting the sampling points into the
code base, so formatting would be orthogonal to it?

Stefan

[1] https://developers.google.com/protocol-buffers/


Re: [PATCH v2] send-email: add an option to impose delay sent E-Mails

2018-08-14 Thread Eric Wong
Ævar Arnfjörð Bjarmason  wrote:
> Add a --send-delay option with a corresponding sendemail.smtpSendDelay
> configuration variable. When set to e.g. 2, this causes send-email to
> sleep 2 seconds before sending the next E-Mail. We'll only sleep
> between sends, not before the first send, or after the last.
> 
> This option has two uses. Firstly, to be able to Ctrl+C a long send
> with "all" if you have a change of heart. Secondly, as a hack in some
> mail setups to, with a sufficiently high delay, force the receiving
> client to sort the E-Mails correctly.
> 
> Some popular E-Mail clients completely ignore the "Date" header, which
> format-patch is careful to set such that the patches will be displayed
> in order, and instead sort by the time the E-mail was received.
> 
> Google's GMail is a good example of such a client. It ostensibly sorts
> by some approximation of received time (although not by any "Received"
> header). It's more usual than not to see patches showing out of order
> in GMail. To take a few examples of orders seen on patches on the Git
> mailing list:
> 
> 1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy)
> 2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee)
> 3 -> 2 -> 1 (fast-import by Jameson Miller)
> 2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King)
> 
> The reason to add the new "X-Mailer-Send-Delay" header is to make it
> easy to tell what the imposed delay was, if any. This allows for
> gathering some data on how the transfer of E-Mails with & without this
> option behaves. This may not be workable without really long delays,
> see [1] and [2].

Aside from the new header, I think this is better implemented
using the existing $relogin_delay and $batch_size=1.

Disconnecting during the delay might be more sympathetic to
existing mail servers (which aren't C10K-optimized).  If the
client sleeps, the server may disconnect the client anyways
to save resources.

> @@ -1741,6 +1747,10 @@ sub process_file {
>   $message, $xfer_encoding, $target_xfer_encoding);
>   push @xh, "Content-Transfer-Encoding: $xfer_encoding";
>   unshift @xh, 'MIME-Version: 1.0' unless $has_mime_version;
> + if ($send_delay && $i > 0) {
> + push @xh, "X-Mailer-Send-Delay: $send_delay";
> + sleep $send_delay;
> + }

We can add this header for relogin_delay + batch_size

But maybe --send-delay can be a shortcut for
--relogin-delay and --batch-size=1


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Duy Nguyen
On Tue, Aug 14, 2018 at 8:44 PM Stefan Beller  wrote:
>
> On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen  wrote:
> >
> > On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler  
> > wrote:
> > > I'm looking at adding code to my SLOG (better name suggestions welcome)
> > > patch series to eventually replace the existing git_trace facility.
> >
> > Complement maybe. Replace, please no. I'd rather not stare at json messages.
>
> From the sidelines: We'd only need one logging infrastructure in place, as the
> formatting would be done as a later step? For local operations we'd certainly
> find better formatting than json, and we figured that we might end up desiring
> ProtocolBuffers[1] instead of JSon, so if it would be easy to change
> the output of
> the structured logging easily that would be great.

These trace messages are made for human consumption. Granted,
occasionally we need some processing, but I find one-liners mostly
suffice. Now we turn these into something made for machines, turning
people into second-class citizens. I've read these messages reformatted
for humans; it's usually too verbose even when reformatted.

> But AFAICT these series are all about putting the sampling points into the
> code base, so formatting would be orthogonal to it?

It's not just sampling points. There are things like the index id being
shown in the message, for example. I prefer to keep a free-style format
to help me read. There's also things like the indentation I do here to
help me read. Granted, you could do all that with scripts and stuff,
but will we pass around dumps of JSON messages in mail, to be decoded
locally?

> Stefan
>
> [1] https://developers.google.com/protocol-buffers/



-- 
Duy


Re: [PATCH 4/4] range-diff: indent special lines as context

2018-08-14 Thread Johannes Schindelin
Hi Stefan,

On Mon, 13 Aug 2018, Stefan Beller wrote:

> > > The later lines that indicate a change to the Makefile will be treated as
> > > context both in the outer and inner diff, such that those lines stay
> > > regular color.
> >
> > While I am a fan of having those lines colored correctly, I have to admit
> > that I am not exactly enthusiastic about that extra indentation...
> >
> > Otherwise, this looks good to me.
> 
> Can you explain what makes you less enthused about the indentation?
> 
> Advantage:
> * allows easy coloring (easy implementation)
> Disadvantage:
> * formats change,

This is it. It breaks my visual flow.

> but the range diff is still in its early design phase, so we're not
> breaking things, yet?

Indeed. We're not breaking things. If you feel strongly about it, we can
have that indentation, I *can* get used to it.

>   (Do we ever plan on sending range-diff patches that can be applied to
>   rewrite history? I am very uncertain on such a feature request.  It
>   sounds cool, though)

I remember that I heard you discussing this internally. I am not too big a
fan of this idea, I have to admit. The range diff seems more designed to
explain how a patch series evolved, rather than providing machine-readable
data that allows to recreate said evolution. For example, the committer
information as well as the date are missing, which would preclude a
faithful reconstruction.

And that is not all: if you wanted to "apply" a range diff, you would need
to know more about the base(s) of the two commit ranges. You would need to
know that they are at least very similar to the base onto which you want
to apply this.

And quite seriously, this would be the wrong way to go in my mind. We have
a very efficient data format to transport all of that information: the Git
bundle.

Let's not overload the range diff format with multiple, partially
contradicting purposes. Think "separation of concerns". It's the same
issue, really, as trying to send highly structured data such as bug
reports or code contributions via a medium meant to send unstructured
plain or formatted text back and forth between human beings.

Ciao,
Dscho


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 11:04:06AM -0700, Brandon Williams wrote:

> > I think this backwards-compatibility is necessary to avoid pain. But
> > until it goes away, I don't think this is helping the vulnerability from
> > 0383bbb901. Because there the issue was that the submodule name pointed
> > back into the working tree, so this access() would find the untrusted
> > working tree code and say "ah, an old-fashioned name!".
> [...]
> 
> Oh I know that this doesn't help with that vulnerability.  As you've
> said we fix it and now disallow ".." at the submodule-config level so
> really this path is simply about using what we get out of
> submodule-config in a more sane manner.

OK, I'm alright with that as long as we are all on the same page. I
think I mistook "this addresses the vulnerability" from your commit
message the wrong way. I took it as "this patch", but reading it again,
you simply mean "the '..' handling we already did".

I do think eventually dropping this back-compatibility could save us
from another directory-escape problem, but it's hard to justify the
real-world pain for a hypothetical benefit. Maybe in a few years we
could get rid of it in a major version bump.

> > One interesting thing about url-encoding is that it's not one-to-one.
> > This case could also be %2F, which is a different file (on a
> > case-sensitive filesystem). I think "%20" and "+" are similarly
> > interchangeable.
> > 
> > If we were decoding the filenames, that's fine. The round-trip is
> > lossless.
> > 
> > But that's not quite how the new code behaves. We encode the input and
> > then check to see if it matches an encoding we previously performed. So
> > if our urlencode routines ever change, this will subtly break.
> > 
> > I don't know how much it's worth caring about. We're not that likely to
> > change the routines ourself (though certainly a third-party
> > implementation would need to know our exact url-encoding decisions).
> 
> This is exactly the reason why I wanted to get some opinions on what the
> best thing to do here would be.  I _think_ the best thing would probably
> be to write a specific routine to do the conversion, and it wouldn't
> even have to be all that complex.  Basically I'm just interested in
> converting '/' characters so that things no longer behave like
> nested directories.

I think we benefit from catching names that would trigger filesystem
case-folding, too. If I have submodules with names "foo" and "FOO", we
would not want to confuse them (or at least we should confuse them
equally on all platforms). I doubt you can do anything malicious, but it
might simply be annoying.

That implies to me using a custom function (even if its encoded form
ends up being understandable as url-encoding).
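Such a custom function only has to be injective and neutralize '/', the escape character itself, and (per the case-folding point) uppercase letters. A throwaway shell sketch of that idea — purely illustrative, not a proposal for git's actual routine:

```shell
# Escape '/', '%' and 'A'-'Z' as %xx; pass every other byte through.
mangle() {
	printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | sed '/^$/d' |
	while read -r hex; do
		case "$hex" in
		2f|25|4[1-9a-f]|5[0-9a])	# '/', '%', 'A'..'Z'
			printf '%%%s' "$hex" ;;
		*)
			printf "\\$(printf '%03o' "0x$hex")" ;;
		esac
	done
}
mangle 'plugins/hooks'	# -> plugins%2fhooks
echo
mangle 'FOO'	# -> %46%4f%4f ("FOO" and "foo" can no longer collide)
echo
```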

-Peff


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-14 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:
> On 08/09, Jeff King wrote:

>> One interesting thing about url-encoding is that it's not one-to-one.
>> This case could also be %2F, which is a different file (on a
>> case-sensitive filesystem). I think "%20" and "+" are similarly
>> interchangeable.
>>
>> If we were decoding the filenames, that's fine. The round-trip is
>> lossless.
>>
>> But that's not quite how the new code behaves. We encode the input and
>> then check to see if it matches an encoding we previously performed. So
>> if our urlencode routines ever change, this will subtly break.
>>
>> I don't know how much it's worth caring about. We're not that likely to
>> change the routines ourself (though certainly a third-party
>> implementation would need to know our exact url-encoding decisions).
>
> This is exactly the reason why I wanted to get some opinions on what the
> best thing to do here would be.  I _think_ the best thing would probably
> be to write a specific routine to do the conversion, and it wouldn't
> even have to be all that complex.  Basically I'm just interested in
> converting '/' characters so that things no longer behave like
> nested directories.

First of all, I think the behavior with this patch is already much
better than the previous status quo.  I'm using the patch now and am
very happy with it.

Second, what if we store the pathname in config?  We already store the
URL there:

[submodule "plugins/hooks"]
url = https://gerrit.googlesource.com/plugins/hooks

So we could (as a followup patch) do something like

[submodule "plugins/hooks"]
url = https://gerrit.googlesource.com/plugins/hooks
gitdirname = plugins%2fhooks

and use that for lookups instead of regenerating the directory name.
What do you think?

Thanks,
Jonathan


[PATCH] submodule: add more exhaustive up-path testing

2018-08-14 Thread Ævar Arnfjörð Bjarmason
The tests added in 63e95beb08 ("submodule: port resolve_relative_url
from shell to C", 2016-04-15) didn't do a good job of testing various
up-path invocations where the up-path would bring us beyond even the
URL in question without emitting an error.

These results look nonsensical, but it's worth exhaustively testing
them before fixing any of this code, so we can see which of these
cases were changed.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

The reason I started to poke at this was because I wanted to add
support for relative URLs like //group/project.

This solves the problem of there being e.g. a myhost.com/group/project
which includes a ../submodule submodule pointing to
myhost.com/group/submodule , and me wanting to fork it to
myhost.com/myuser/project.

But of course with ../submodule the relative URL will point to
myhost.com/myuser/submodule, which won't exist.

So I was hoping to find some point in the relative URL code where we
were actually aware of what the host boundary was, so I could just cut
off everything after that if //group/project was provided, strip off
one slash to make it /group/project, and call it a day.
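That cut-at-the-host-boundary rule could look something like the following sketch (hypothetical behavior — precisely what the current code does *not* do; `reroot` and its logic are this example's invention):

```shell
# Keep scheme://host from the superproject URL and re-root the
# //-prefixed relative URL under it.
reroot() {
	remote=$1 rel=$2	# rel is expected to start with //
	case $remote in
	*://*)
		scheme=${remote%%://*}
		host=${remote#*://}; host=${host%%/*}
		echo "$scheme://$host${rel#/}" ;;
	*)
		echo "$remote$rel" ;;	# no host part to anchor on
	esac
}
reroot 'https://myhost.com/myuser/project' '//group/submodule'
# -> https://myhost.com/group/submodule
```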

But as the tests below show the code isn't aware of that at all, and
will happily trample over the host(name) to produce nonsensical
results.

So I think these tests are worthwhile in themselves, but would like
some advice on how to proceed with that from someone more familiar
with submodules.

 t/t0060-path-utils.sh | 13 +
 1 file changed, 13 insertions(+)

diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
index 21a8b53132..cd74c0a471 100755
--- a/t/t0060-path-utils.sh
+++ b/t/t0060-path-utils.sh
@@ -330,6 +330,9 @@ test_submodule_relative_url "(null)" "../foo" "../submodule" "../submodule"
 test_submodule_relative_url "(null)" "./foo/bar" "../submodule" "foo/submodule"
 test_submodule_relative_url "(null)" "./foo" "../submodule" "submodule"
 test_submodule_relative_url "(null)" "//somewhere else/repo" "../subrepo" "//somewhere else/subrepo"
+test_submodule_relative_url "(null)" "//somewhere else/repo" "../../subrepo" "//subrepo"
+test_submodule_relative_url "(null)" "//somewhere else/repo" "../../../subrepo" "/subrepo"
+test_submodule_relative_url "(null)" "//somewhere else/repo" "../../../../subrepo" "subrepo"
 test_submodule_relative_url "(null)" "$(pwd)/subsuper_update_r" "../subsubsuper_update_r" "$(pwd)/subsubsuper_update_r"
 test_submodule_relative_url "(null)" "$(pwd)/super_update_r2" "../subsuper_update_r" "$(pwd)/subsuper_update_r"
 test_submodule_relative_url "(null)" "$(pwd)/." "../." "$(pwd)/."
@@ -344,10 +347,20 @@ test_submodule_relative_url "(null)" "file:///tmp/repo" "../subrepo" "file:///tmp/subrepo"
 test_submodule_relative_url "(null)" "foo/bar" "../submodule" "foo/submodule"
 test_submodule_relative_url "(null)" "foo" "../submodule" "submodule"
 test_submodule_relative_url "(null)" "helper:://hostname/repo" "../subrepo" "helper:://hostname/subrepo"
+test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../subrepo" "helper:://subrepo"
+test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../subrepo" "helper::/subrepo"
+test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../../subrepo" "helper::subrepo"
+test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../../../subrepo" "helper:subrepo"
+test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../../../../subrepo" ".:subrepo"
 test_submodule_relative_url "(null)" "ssh://hostname/repo" "../subrepo" "ssh://hostname/subrepo"
+test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../subrepo" "ssh://subrepo"
+test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../../subrepo" "ssh:/subrepo"
+test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../../../subrepo" "ssh:subrepo"
+test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../../../../subrepo" ".:subrepo"
 test_submodule_relative_url "(null)" "ssh://hostname:22/repo" "../subrepo" "ssh://hostname:22/subrepo"
 test_submodule_relative_url "(null)" "user@host:path/to/repo" "../subrepo" "user@host:path/to/subrepo"
 test_submodule_relative_url "(null)" "user@host:repo" "../subrepo" "user@host:subrepo"
+test_submodule_relative_url "(null)" "user@host:repo" "../../subrepo" ".:subrepo"
 
 test_expect_success 'match .gitmodules' '
	test-tool path-utils is_dotgitmodules \
-- 
2.18.0.865.gffc8e1a3cd6



Re: [PATCH 4/4] range-diff: indent special lines as context

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 11:54 AM Johannes Schindelin
 wrote:
>
> Hi Stefan,
>
> On Mon, 13 Aug 2018, Stefan Beller wrote:
>
> > > > The later lines that indicate a change to the Makefile will be treated 
> > > > as
> > > > context both in the outer and inner diff, such that those lines stay
> > > > regular color.
> > >
> > > While I am a fan of having those lines colored correctly, I have to admit
> > > that I am not exactly enthusiastic about that extra indentation...
> > >
> > > Otherwise, this looks good to me.
> >
> > Can you explain what makes you less enthused about the indentation?
> >
> > Advantage:
> > * allows easy coloring (easy implementation)
> > Disadvantage:
> > * formats change,
>
> This is it. It breaks my visual flow.
>
> > but the range diff is still in its early design phase, so we're not
> > breaking things, yet?
>
> Indeed. We're not breaking things. If you feel strongly about it, we can
> have that indentation, I *can* get used to it.

I only feel strongly about it now as that is the *easiest* way to make
the colors
look like I want them to look. And I really value colors in the range-diff.
Earlier you said that color-less range-diff is nearly useless for you and I
thought it was hyperbole, but by now I realize how much truth you spoke.
So getting the colors fixed to not markup files (+++/ --- lines of the inner
diff) is a high priority for me. So high that I would compromise on the
indentation/flow of these corner case areas.

> >   (Do we ever plan on sending range-diff patches that can be applied to
> >   rewrite history? I am very uncertain on such a feature request.  It
> >   sounds cool, though)
>
> I remember that I heard you discussing this internally. I am not too big a
> fan of this idea, I have to admit. The range diff seems more designed to
> explain how a patch series evolved, rather than providing machine-readable
> data that allows to recreate said evolution. For example, the committer
> information as well as the date are missing, which would preclude a
> faithful reconstruction.
>

Ah! good point. Though we could just work around that and use the email
date for the new author dates. ;-)

> And that is not all: if you wanted to "apply" a range diff, you would need
> to know more about the base(s) of the two commit ranges. You would need to
> know that they are at least very similar to the base onto which you want
> to apply this.

You would say so in the cover letter "This is a resend of sb/range-diff-colors"
and by the knowledge of that tip only and the range-diff you would
know how the new series would look like, even if it was rebased.

> And quite seriously, this would be the wrong way to go in my mind. We have
> a very efficient data format to transport all of that information: the Git
> bundle.

The bundle format is very efficient for machine transport, but I thought the
whole point of the mailing list was easy human-readable parts, i.e. you can
point out things in a diff, which you could also do in a range-diff to some
extent. We would lose some of the "fresh eyes" as you'd only see the
changed part of the series. :-/ So yeah, even for the workflow this seems
a net-negative. I just thought it would be cool.

> Let's not overload the range diff format with multiple, partially
> contradicting purposes. Think "separation of concerns". It's the same
> issue, really, as trying to send highly structured data such as bug
> reports or code contributions via a medium meant to send unstructured
> plain or formatted text back and forth between human beings.

ok.

Thanks,
Stefan


Re: [PATCH] submodule: add more exhaustive up-path testing

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 11:59 AM Ævar Arnfjörð Bjarmason
 wrote:
>
> The tests added in 63e95beb08 ("submodule: port resolve_relative_url
> from shell to C", 2016-04-15) didn't do a good job of testing various
> up-path invocations where the up-path would bring us beyond even the
> URL in question without emitting an error.
>
> These results look nonsensical, but it's worth exhaustively testing
> them before fixing any of this code, so we can see which of these
> cases were changed.

Yeah. Please look at the comment in builtin/submodule--helper.c
in that commit, where I described my expectations.

I should have put them into tests instead with the expectations
spelled out there.

Thanks for this patch!
Stefan

>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---


> So I think these tests are worthwhile in themselves,

The reason I put it in the comment instead of tests was the
ease of spelling out both the status quo and expectations.

> but would like
> some advice on how to proceed with that from someone more familiar
> with submodules.

So ideally we'd also error out as soon as the host name is touched?


Re: [PATCH] mingw: enable atomic O_APPEND

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 08:29:04PM +0200, Johannes Sixt wrote:

> Am 14.08.2018 um 00:37 schrieb Jeff King:
> > And then you can do something like:
> > 
> >for size in 4097 8193 16385 32769 65537 131073 262145 524289 1048577; do
> >  >out ;# clean up from last run
> >  echo "Trying $size..."
> >  timeout 5 ./write $size out
> >  if ! ./check $size <out; then
> >echo "$size failed"
> >break
> >  fi
> >done
> > 
> > On my Linux system, each of those seems to write several gigabytes
> > without overlapping. I did manage to hit some failing cases, but they
> > were never sheared writes, but rather cases where there was an
> > incomplete write at the end-of-file.
> 
> I used your programs with necessary adjustments (as fork() is not
> available), and did similar tests with concurrent processes. With packet
> sizes 1025, 4093, 7531 (just to include some odd number), and 8193 I did not
> observe any overlapping or short writes.
> 
> I'm now very confident that we are on the safe side for our purposes.

Great, thanks for testing!

Re-reading what I wrote about end-of-file above and thinking about the
conversation with Ævar elsewhere in the thread, I suspect it _is_ easy
to get overlapping writes if the processes are receiving signals (since
clearly the TERM signal caused a partial write).

My experiment doesn't simulate that at all. I suppose the parent process
could send SIGUSR1 to the child in each loop, and the child would catch
it but keep going.

Hmm, that was easy enough to do (programs below for reference), but
surprisingly it didn't fail for me (except for the normal end-of-file
truncation). It's like the OS is willing to truncate the write of a
dying program but not one for a signal that is getting handled. Which is
great for us, since it's exactly what we want, but makes me even more
suspicious that a non-Linux kernel might behave completely differently.

I still think we're fine in practice, as I'd expect any kernel to be
atomic under the page size. So this was mostly just for my own
edification.

-Peff

-- >8 --
/* check.c, with separate short-read reporting */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, const char **argv)
{
int size = atoi(argv[1]);
int block = 0;
char *buf;

buf = malloc(size);
while (1) {
int i;
/* assume atomic reads */
int r = read(0, buf, size);
if (!r)
break;
if (r < size) {
fprintf(stderr, "short read\n");
return 1;
}
for (i = 1; i < size; i++) {
if (buf[i] != buf[0]) {
fprintf(stderr, "overlap in block %d\n", block);
return 1;
}
}
block++;
}
}
-- >8 --

-- >8 --
/* write.c with signals; you can also confirm via strace
   that each write is atomic */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

void handle_signal(int sig)
{
/* do nothing */
}

static void doit(int size, const char *fn, char c, pid_t pid)
{
int fd;
char *buf;

fd = open(fn, O_WRONLY|O_APPEND|O_CREAT, 0666);
if (fd < 0) {
perror("open");
return;
}

buf = malloc(size);
memset(buf, c, size);

while (1) {
if (pid)
kill(pid, SIGUSR1);
write(fd, buf, size);
}
}

int main(int argc, const char **argv)
{
int size = atoi(argv[1]);
pid_t pid;

signal(SIGUSR1, handle_signal);

pid = fork();
if (pid)
doit(size, argv[2], '1', pid);
else
doit(size, argv[2], '2', pid);
return 0;
}
-- >8 --


Re: [PATCH v4 0/5] Speed up unpack_trees()

2018-08-14 Thread Ben Peart

On 8/12/2018 4:15 AM, Nguyễn Thái Ngọc Duy wrote:

v4 has a bunch of changes

- 1/5 is a new one to show indented tracing. This way it's less
   misleading to read nested time measurements
- 3/5 now has the switch/restore cache_bottom logic. Junio suggested a
   check instead in his final note, but I think this is safer (yeah I'm
   scared too)
- the old 4/4 is dropped because
   - it assumes n-way logic
   - the visible time saving is not worth the tradeoff
   - Elijah gave me an idea to avoid add_index_entry() that I think
 does not have n-way logic assumptions and gives better saving.
 But it requires some more changes so I'm going to do it later
- 5/5 is also new and should help reduce cache_tree_update() cost.
   I wrote somewhere I was not going to work on this part, but it turns
   out just a couple lines, might as well do it now.

Interdiff



I've now had a chance to run the git tests, as well as our own unit and 
functional tests with this patch series and all passed.


I reviewed the tests in t0090-cache-tree.h and verified that there are 
tests that validate the cache tree is correct after doing a checkout and 
merge (both of which exercise the new cache tree optimization in patch 5).


I've also run our perf test suite and the results are outstanding:

Checkout saves 51% on average
Merge saves 44%
Pull saves 30%
Rebase saves 26%

For perspective, that means these commands are going from ~20 seconds to 
~10 seconds.


I don't feel that any of my comments to the individual patches deserve a 
re-roll.  Given the ongoing discussion about the additional tracing - 
I'm happy to leave out the first 2 patches so that the rest can go in 
sooner rather than later.


Looks good!


diff --git a/cache-tree.c b/cache-tree.c
index 0dbe10fc85..105f13806f 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -426,7 +426,6 @@ static int update_one(struct cache_tree *it,
 
 int cache_tree_update(struct index_state *istate, int flags)
 {
-	uint64_t start = getnanotime();
 	struct cache_tree *it = istate->cache_tree;
 	struct cache_entry **cache = istate->cache;
 	int entries = istate->cache_nr;
@@ -434,11 +433,12 @@ int cache_tree_update(struct index_state *istate, int flags)
 
 	if (i)
 		return i;
+	trace_performance_enter();
 	i = update_one(it, cache, entries, "", 0, &skip, flags);
+	trace_performance_leave("cache_tree_update");
 	if (i < 0)
 		return i;
 	istate->cache_changed |= CACHE_TREE_CHANGED;
-	trace_performance_since(start, "repair cache-tree");
 	return 0;
 }
 
diff --git a/cache.h b/cache.h
index e6f7ee4b64..8b447652a7 100644
--- a/cache.h
+++ b/cache.h
@@ -673,7 +673,6 @@ extern int index_name_pos(const struct index_state *, const char *name, int name
 #define ADD_CACHE_JUST_APPEND 8		/* Append only; tree.c::read_tree() */
 #define ADD_CACHE_NEW_ONLY 16		/* Do not replace existing ones */
 #define ADD_CACHE_KEEP_CACHE_TREE 32	/* Do not invalidate cache-tree */
-#define ADD_CACHE_SKIP_VERIFY_PATH 64	/* Do not verify path */
 extern int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
 extern void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
 
diff --git a/diff-lib.c b/diff-lib.c
index a9f38eb5a3..1ffa22c882 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -518,8 +518,8 @@ static int diff_cache(struct rev_info *revs,
 int run_diff_index(struct rev_info *revs, int cached)
 {
 	struct object_array_entry *ent;
-	uint64_t start = getnanotime();
 
+	trace_performance_enter();
 	ent = revs->pending.objects;
 	if (diff_cache(revs, &ent->item->oid, ent->name, cached))
 		exit(128);
@@ -528,7 +528,7 @@ int run_diff_index(struct rev_info *revs, int cached)
 	diffcore_fix_diff_index(&revs->diffopt);
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
-	trace_performance_since(start, "diff-index");
+	trace_performance_leave("diff-index");
 	return 0;
 }
 
diff --git a/dir.c b/dir.c
index 21e6f2520a..c5e9fc8cea 100644
--- a/dir.c
+++ b/dir.c
@@ -2263,11 +2263,11 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		   const char *path, int len, const struct pathspec *pathspec)
 {
 	struct untracked_cache_dir *untracked;
-	uint64_t start = getnanotime();
 
 	if (has_symlink_leading_path(path, len))
 		return dir->nr;
 
+	trace_performance_enter();
 	untracked = validate_untracked_cache(dir, len, pathspec);
 	if (!untracked)
 		/*
@@ -2302,7 +2302,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		dir->nr = i;
 	}
 
-	trace_performance_since(start, "read directory %.*s", len, path);
+	trace_performance_leave("read directory %.*s", len, path);
 	if (dir->untracked) {

Re: [PATCH 3/4] cat-file: use a single strbuf for all output

2018-08-14 Thread René Scharfe
Am 14.08.2018 um 20:20 schrieb Jeff King:
> When we're in batch mode, we end up in batch_object_write()
> for each object, which allocates its own strbuf for each
> call. Instead, we can provide a single "scratch" buffer that
> gets reused for each output. When running:
> 
>   git cat-file --batch-all-objects --batch-check='%(objectname)'
> 
> on git.git, my best-of-five time drops from:
> 
>   real0m0.171s
>   user0m0.159s
>   sys 0m0.012s
> 
> to:
> 
>   real0m0.133s
>   user0m0.121s
>   sys 0m0.012s
> 
> Note that we could do this just by putting the "scratch"
> pointer into "struct expand_data", but I chose instead to
> add an extra parameter to the callstack. That's more
> verbose, but it makes it a bit more obvious what is going
> on, which in turn makes it easy to see where we need to be
> releasing the string in the caller (right after the loop
> which uses it in each case).
> 
> Based-on-a-patch-by: René Scharfe 
> Signed-off-by: Jeff King 
> ---
> It also made it easy to see that without the prior patch,
> we'd have been using "buf" for two parameters. :)

Good catch.

> 
>  builtin/cat-file.c | 28 +---
>  1 file changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 3ed1d0be80..08dced2614 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -338,11 +338,11 @@ static void print_object_or_die(struct batch_options 
> *opt, struct expand_data *d
>   }
>  }
>  
> -static void batch_object_write(const char *obj_name, struct batch_options 
> *opt,
> +static void batch_object_write(const char *obj_name,
> +struct strbuf *scratch,
> +struct batch_options *opt,
>  struct expand_data *data)
>  {
> - struct strbuf buf = STRBUF_INIT;

We could also avoid passing that buffer around by making it static.  I
shy away from adding static variables because the resulting code won't
be thread-safe, but that fear might be irrational, especially with
cat-file.

> -
>   if (!data->skip_object_info &&
>   oid_object_info_extended(the_repository, &data->oid, &data->info,
>OBJECT_INFO_LOOKUP_REPLACE) < 0) {
> @@ -352,10 +352,10 @@ static void batch_object_write(const char *obj_name, 
> struct batch_options *opt,
>   return;
>   }
>  
> - strbuf_expand(&buf, opt->format, expand_format, data);
> - strbuf_addch(&buf, '\n');
> - batch_write(opt, buf.buf, buf.len);
> - strbuf_release(&buf);
> + strbuf_reset(scratch);
> + strbuf_expand(scratch, opt->format, expand_format, data);
> + strbuf_addch(scratch, '\n');
> + batch_write(opt, scratch->buf, scratch->len);
>  
>   if (opt->print_contents) {
>   print_object_or_die(opt, data);
> @@ -363,7 +363,9 @@ static void batch_object_write(const char *obj_name, 
> struct batch_options *opt,
>   }
>  }
>  
> -static void batch_one_object(const char *obj_name, struct batch_options *opt,
> +static void batch_one_object(const char *obj_name,
> +  struct strbuf *scratch,
> +  struct batch_options *opt,
>struct expand_data *data)
>  {
>   struct object_context ctx;
> @@ -405,20 +407,21 @@ static void batch_one_object(const char *obj_name, 
> struct batch_options *opt,
>   return;
>   }
>  
> - batch_object_write(obj_name, opt, data);
> + batch_object_write(obj_name, scratch, opt, data);
>  }
>  
>  struct object_cb_data {
>   struct batch_options *opt;
>   struct expand_data *expand;
>   struct oidset *seen;
> + struct strbuf *scratch;
>  };
>  
>  static int batch_object_cb(const struct object_id *oid, void *vdata)
>  {
>   struct object_cb_data *data = vdata;
>   oidcpy(&data->expand->oid, oid);
> - batch_object_write(NULL, data->opt, data->expand);
> + batch_object_write(NULL, data->scratch, data->opt, data->expand);
>   return 0;
>  }
>  
> @@ -509,6 +512,7 @@ static int batch_objects(struct batch_options *opt)
>  
>   cb.opt = opt;
>   cb.expand = &data;
> + cb.scratch = &output;
>  
>   if (opt->unordered) {
>   struct oidset seen = OIDSET_INIT;
> @@ -531,6 +535,7 @@ static int batch_objects(struct batch_options *opt)
>   oid_array_clear(&sa);
>   }
>  
> + strbuf_release(&output);
>   return 0;
>   }
>  
> @@ -559,10 +564,11 @@ static int batch_objects(struct batch_options *opt)
>   data.rest = p;
>   }
>  
> - batch_one_object(input.buf, opt, &data);
> + batch_one_object(input.buf, &output, opt, &data);
>   }
>  
>   strbuf_release(&input);
> + strbuf_release(&output);
>   warn_on_object_refname_ambiguity = save_warning;
>

Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote:

> On 8/13/2018 5:54 PM, Jeff King wrote:
> > So I try not to think too hard on metrics, and just use them to get a
> > rough view on who is active.
> 
> I've been very interested in measuring community involvement, with the
> knowledge that any metric is flawed and we should not ever say "this metric
> is how we measure the quality of a contributor". It can be helpful, though,
> to track some metrics and their change over time.
> 
> Here are a few measurements we can make:

Thanks, it was nice to see a more comprehensive list in one spot.

It would be neat to have a tool that presents all of these
automatically, but I think the email ones are pretty tricky (most people
don't have the whole list archive sitting around).

> 2. Number of other commit tag-lines (Reviewed-By, Helped-By, Reported-By,
> etc.).
> 
>     Using git repo:
> 
>     $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> Signed-off-by:|sort|uniq -c|sort -nr|head -n 20

At one point I sent a patch series that would let shortlog group by
trailers. Nobody seemed all that interested and I didn't end up using it
for its original purpose, so I didn't polish it further.  But I'd be
happy to re-submit it if you think it would be useful.

The shell hackery here isn't too bad, but doing it internally is a
little faster, a little more robust (less parsing), and lets you show
more details about the commits themselves (e.g., who reviews whom).

> 3. Number of threads started by user.

You have "started" and "participated in". I guess one more would be
"closed", as in "solved a bug", but that is quite hard to tell without
looking at the content. Taking just the last person in a thread as the
closer means that an OP saying "thanks!" wrecks it. And somebody who
rants long enough that everybody else loses interest gets marked as a
closer. ;)

> If you have other ideas for fun measurements, then please let me know.

I think I mentioned "surviving lines" elsewhere, which I do like this
(and almost certainly stole from Junio a long time ago):

  # Obviously you can tweak this as you like, but the mass-imported bits
  # in compat and xdiff tend to skew the counts. It's possibly worth
  # counting language lines separately.
  git ls-files '*.c' '*.h' :^compat :^contrib :^xdiff |
  while read fn; do
# eye candy
echo >&2 "Blaming $fn..."

# You can use more/fewer -C to dig more or less for code moves.
# Possibly "-w" would help, though I doubt it shifts things more
# than a few percent anyway.
git blame -C --line-porcelain $fn
  done |
  perl -lne '/^author (.*)/ and print $1' |
  sort | uniq -c | sort -rn | head

The output right now is:

  35156 Junio C Hamano
  22207 Jeff King
  17466 Nguyễn Thái Ngọc Duy
  12005 Johannes Schindelin
  10259 Michael Haggerty
   9389 Linus Torvalds
   8318 Brandon Williams
   7776 Stefan Beller
   5947 Christian Couder
   4935 René Scharfe

which seems reasonable.

-Peff


Re: [PATCH 3/4] cat-file: use a single strbuf for all output

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 09:30:57PM +0200, René Scharfe wrote:

> > -static void batch_object_write(const char *obj_name, struct batch_options 
> > *opt,
> > +static void batch_object_write(const char *obj_name,
> > +  struct strbuf *scratch,
> > +  struct batch_options *opt,
> >struct expand_data *data)
> >  {
> > -   struct strbuf buf = STRBUF_INIT;
> 
> We could also avoid passing that buffer around by making it static.  I
> shy away from adding static variables because the resulting code won't
> be thread-safe, but that fear might be irrational, especially with
> cat-file.

True, I didn't even think of that after your original got me in the
mindset of passing the buffer down. It's not too bad to do it this way,
and I agree with you that we are better avoiding static variables if we
can. Five years ago I might have said the opposite, but we've cleaned up
a lot of confusing hidden-static bits in that time. Let's not go in the
opposite direction. :)

-Peff


Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 12:36 PM Jeff King  wrote:

> Thanks, it was nice to see a more comprehensive list in one spot.
>
> It would be neat to have a tool that presents all of these
> automatically, but I think the email ones are pretty tricky (most people
> don't have the whole list archive sitting around).

With the advent of public inbox, this is easy to obtain?

>
> > 2. Number of other commit tag-lines (Reviewed-By, Helped-By, Reported-By,
> > etc.).
> >
> > Using git repo:
> >
> > $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> > Signed-off-by:|sort|uniq -c|sort -nr|head -n 20
>
> At one point I sent a patch series that would let shortlog group by
> trailers. Nobody seemed all that interested and I didn't end up using it
> for its original purpose, so I didn't polish it further.  But I'd be
> happy to re-submit it if you think it would be useful.

I would think it is useful. Didn't Linus also ask for a related thing?
https://public-inbox.org/git/CA+55aFzWkE43rSm-TJNKkHq4F3eOiGR0-Bo9V1=a1s=vq0k...@mail.gmail.com/


Re: [PATCH v2] send-email: add an option to impose delay sent E-Mails

2018-08-14 Thread Junio C Hamano
Eric Wong  writes:

>> Some popular E-Mail clients completely ignore the "Date" header, which
>> format-patch is careful to set such that the patches will be displayed
>> in order, and instead sort by the time the E-mail was received.

It is send-email, not format-patch, that carefully emits monotonically
increasing timestamps so that the sender's datestamp can be used for
sorting by the recipient.  format-patch records the author date, which
is primarily meant for local replaying, in the generated messages, but
send-email discards it.

> Disconnecting during the delay might be more sympathetic to
> existing mail servers (which aren't C10K-optimized).  If the
> client sleeps, the server may disconnect the client anyways
> to save resources.
>
> But maybe --send-delay can be a shortcut for
> --relogin-delay and --batch-size=1

Both good points to point at a simpler and better solution, I guess.




Re: [PATCH] submodule: add more exhaustive up-path testing

2018-08-14 Thread Junio C Hamano
Stefan Beller  writes:

> Thanks for this patch!
> Stefan

Thanks, I'd take it as your Acked-by: (please holler if it isn't
before the patch hits 'next').


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 08:51:41PM +0200, Duy Nguyen wrote:

> > But AFAICT these series are all about putting the sampling points into the
> > code base, so formatting would be orthogonal to it?
> 
> It's not just sampling points. There are things like the index id being
> shown in the message, for example. I prefer to keep a free-style format
> to help me read. There are also things like the indentation I do here to
> help me read. Granted, you could do all that with scripts and stuff,
> but will we pass dumps of json messages around in mail, to be decoded
> locally?

I think you could have both forms using the same entry points sprinkled
through the code.

At GitHub we have a similar telemetry-ish thing, where we collect some
data points and then the resulting JSON is stored for every operation
(for a few weeks for read ops, and indefinitely attached to every ref
write).

And I've found that the storage and the trace-style "just show a
human-readable message to stderr" interface complement each other in
both directions:

 - you can output a human readable message that is sent immediately to
   the trace mechanism but _also_ becomes part of the telemetry. E.g.,
   imagine that one item in the json blob is "this is the last message
   from GIT_TRACE_FOO". Now you can push tracing messages into whatever
   plan you're using to store SLOG. We do this less with TRACE, and much
   more with error() and die() messages.

 - when a structured telemetry item is updated, we can still output a
   human-readable trace message with just that item. E.g., with:

 trace_performance(n, "foo");

   we could either store a json key (perf.foo=n) or output a nicely
   formatted string like we do now, depending on what the user has
   configured (or even both, of course).

It helps if the sampling points give enough information to cover both
cases (as in the trace_performance example), but you can generally
shoe-horn unstructured data into the structured log, and pretty-print
structured data.

-Peff


Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Matthew DeVore
On Tue, Aug 14, 2018 at 11:18 AM Jonathan Tan  wrote:
>
> > @@ -743,6 +743,9 @@ specification contained in .
> >   A debug option to help with future "partial clone" development.
> >   This option specifies how missing objects are handled.
> >  +
> > +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> > +<depth> from the root tree. Currently, only <depth>=0 is supported.
> > ++
> >  The form '--missing=error' requests that rev-list stop with an error if
> >  a missing object is encountered.  This is the default action.
> >  +
>
> The "--filter" documentation should go with the other "--filter"
> information, not right after --missing.
Fixed. My problem was that I didn't know what the + meant - I guess it
means that the paragraph before and after are in the same section?

>
> > +test_expect_success 'setup for tests of tree:0' '
> > + mkdir r1/subtree &&
> > + echo "This is a file in a subtree" > r1/subtree/file &&
> > + git -C r1 add subtree/file &&
> > + git -C r1 commit -m subtree
> > +'
>
> Style: no space after >
Fixed.

>
> > +test_expect_success 'grab tree directly when using tree:0' '
> > + # We should get the tree specified directly but not its blobs or 
> > subtrees.
> > + git -C r1 pack-objects --rev --stdout --filter=tree:0 
> > >commitsonly.pack <<-EOF &&
> > + HEAD:
> > + EOF
> > + git -C r1 index-pack ../commitsonly.pack &&
> > + git -C r1 verify-pack -v ../commitsonly.pack >objs &&
> > + grep -E "tree|blob" objs >trees_and_blobs &&
> > + test_line_count = 1 trees_and_blobs
> > +'
>
> Can we also verify that the SHA-1 in trees_and_blobs is what we
> expected?
Done - Now I'm comparing to the output of `git rev-parse HEAD:` and I
don't need the separate line count check either.
>
> > +test_expect_success 'use fsck before and after manually fetching a missing 
> > subtree' '
> > + # push new commit so server has a subtree
> > + mkdir src/dir &&
> > + echo "in dir" > src/dir/file.txt &&
>
> No space after >
Fixed.

>
> > + git -C src add dir/file.txt &&
> > + git -C src commit -m "file in dir" &&
> > + git -C src push -u srv master &&
> > + SUBTREE=$(git -C src rev-parse HEAD:dir) &&
> > +
> > + rm -rf dst &&
> > + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst 
> > &&
> > + git -C dst fsck &&
> > + git -C dst cat-file -p $SUBTREE >tree_contents 2>err &&
> > + git -C dst fsck
> > +'
>
> If you don't need to redirect to err, don't do so.
>
> Before the cat-file, also verify that the tree is missing, most likely
> through a "git rev-list" with "--missing=print".
That won't work though: the subtree's hash is not known because its
parent tree is not there. I've merged the three tests in this file,
and as a result am now using the check which makes sure the object
types are only "commit".

>
> And I would grep on the tree_contents to ensure that the filename
> ("file.txt") is there, so that we know that we got the correct tree.
Done.

>
> > +test_expect_success 'can use tree:0 to filter partial clone' '
> > + rm -rf dst &&
> > + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst 
> > &&
> > + git -C dst rev-list master --missing=allow-any --objects 
> > >fetched_objects &&
> > + cat fetched_objects \
> > + | awk -f print_1.awk \
> > + | xargs -n1 git -C dst cat-file -t >fetched_types &&
> > + sort fetched_types -u >unique_types.observed &&
> > + echo commit > unique_types.expected &&
> > + test_cmp unique_types.observed unique_types.expected
> > +'
> > +
> > +test_expect_success 'auto-fetching of trees with --missing=error' '
> > + git -C dst rev-list master --missing=error --objects >fetched_objects 
> > &&
> > + cat fetched_objects \
> > + | awk -f print_1.awk \
> > + | xargs -n1 git -C dst cat-file -t >fetched_types &&
> > + sort fetched_types -u >unique_types.observed &&
> > + printf "blob\ncommit\ntree\n" >unique_types.expected &&
> > + test_cmp unique_types.observed unique_types.expected
> > +'
>
> These two tests seem redundant with the 'use fsck before and after
> manually fetching a missing subtree' test (after the latter is
> appropriately renamed). I think we only need to test this sequence once,
> which can be placed in one or spread over multiple tests:
>
>  1. partial clone with --filter=tree:0
>  2. fsck works
>  3. verify that trees are indeed missing
>  4. autofetch a tree
>  5. fsck still works
Done - that's much nicer. Thanks!
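For the record, the combined sequence (1)-(5) can be exercised end-to-end in throwaway repos roughly like this (a hedged sketch, not the actual test: repo names are illustrative, and uploadpack.allowfilter must be enabled on the "server" side for the filtered clone to work):

```shell
set -e
cd "$(mktemp -d)"
git init -q src
echo data >src/file.txt
git -C src add file.txt
git -C src -c user.name=u -c user.email=u@x commit -q -m init
git -C src config uploadpack.allowfilter true
git -C src config uploadpack.allowanysha1inwant true

git clone -q --no-checkout --filter=tree:0 "file://$PWD/src" dst  # (1) partial clone
git -C dst fsck                                                   # (2) fsck works
git -C dst rev-list --objects --missing=print HEAD |
        grep '^?'                                                 # (3) trees/blobs are missing
git -C dst cat-file -p 'HEAD^{tree}' >/dev/null                   # (4) autofetch the root tree
git -C dst fsck                                                   # (5) fsck still works
```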

Here is an interdiff from v4 of the patch:

diff --git a/Documentation/rev-list-options.txt
b/Documentation/rev-list-options.txt
index 9e351ec2a..0b5f77ad3 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,9 @@ the requested refs.
 +
 The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
 specification contained in <path>.
++
+The form '--filter=tree:' omits all bl

Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 10:28:13AM -0700, Matthew DeVore wrote:

> The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
> would filter out all but the root tree and blobs. In order to avoid
> confusion between 0 and capital O, the documentation was worded in a
> somewhat round-about way that also hints at this future improvement to
> the feature.

I'm OK with this as a name, since we're explicitly not supporting deeper
depths. But I'd note that "depth" is actually a tricky characteristic,
as it's not a property of the object itself, but rather who refers to
it. So:

  - it's expensive to compute, because you have to actually walk all of
the possible commits and trees that could refer to it. This
prohibits a lot of other optimizations like reachability bitmaps
(though with some complexity you could cache the depths, too).

  - you have to define it as something like "the minimum depth at which
this object is found", since there may be multiple depths

I think you can read that second definition between the lines of:

> +The form '--filter=tree:<depth>' omits all blobs and trees deeper than
> +<depth> from the root tree. Currently, only <depth>=0 is supported.

But I wonder if we should be more precise. It doesn't matter now, but it
may help set expectations if the feature does come later.

-Peff


Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 12:47:59PM -0700, Stefan Beller wrote:

> On Tue, Aug 14, 2018 at 12:36 PM Jeff King  wrote:
> 
> > Thanks, it was nice to see a more comprehensive list in one spot.
> >
> > It would be neat to have a tool that presents all of these
> > automatically, but I think the email ones are pretty tricky (most people
> > don't have the whole list archive sitting around).
> 
> With the advent of public inbox, this is easy to obtain?

For our project, yes. But I was thinking of a tool that could be used
for other projects, too.

> > At one point I sent a patch series that would let shortlog group by
> > trailers. Nobody seemed all that interested and I didn't end up using it
> > for its original purpose, so I didn't polish it further.  But I'd be
> > happy to re-submit it if you think it would be useful.
> 
> I would think it is useful. Didn't Linus also ask for a related thing?
> https://public-inbox.org/git/CA+55aFzWkE43rSm-TJNKkHq4F3eOiGR0-Bo9V1=a1s=vq0k...@mail.gmail.com/

He wanted grouping by committer, which we ended up adding as a separate
feature. I think there's some discussion of the trailer thing in that
thread.

-Peff


Syncing HEAD

2018-08-14 Thread Christian Couder
Hi,

When cloning with --mirror, the clone gets its HEAD initialized with
the value HEAD has in its origin remote. After that if HEAD changes in
origin there is no simple way to sync HEAD at the same time as the
refs are synced.

It looks like the simplest way to sync HEAD is:

1) git remote show origin
2) parse "HEAD branch: XXX" from the output of the above command
3) git symbolic-ref HEAD refs/heads/XXX
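The three steps can be sketched end-to-end in a pair of throwaway repositories (names are illustrative; the "server" HEAD moves to test-1 after the mirror is made, and the mirror then syncs it):

```shell
set -e
cd "$(mktemp -d)"
git init -q upstream
git -C upstream -c user.name=u -c user.email=u@x commit -q --allow-empty -m init
git -C upstream branch test-1
git clone -q --mirror upstream mirror
git -C upstream symbolic-ref HEAD refs/heads/test-1   # default branch changes upstream

# steps 1-3 from above, run inside the mirror:
branch=$(git -C mirror remote show origin | sed -n 's/^ *HEAD branch: //p')
git -C mirror symbolic-ref HEAD "refs/heads/$branch"
git -C mirror symbolic-ref HEAD   # prints: refs/heads/test-1
```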

It looks like it would be quite easy to add an option to `fetch` to
sync HEAD at the same time as regular refs are synced because every
fetch from an origin that uses a recent Git contains something like:

19:55:39.304976 pkt-line.c:80   packet:  git< 
HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow
deepen-since deepen-not deepen-relative no-progress include-tag
multi_ack_detailed no-done symref=HEAD:refs/heads/test-1
agent=git/2.18.0

which in this example shows that HEAD is a symref to refs/heads/test-1
in origin.
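Since the advertisement already carries the symref, ls-remote can surface it directly, without parsing "git remote show" output. A sketch in a throwaway repo:

```shell
set -e
cd "$(mktemp -d)"
git init -q upstream
git -C upstream -c user.name=u -c user.email=u@x commit -q --allow-empty -m init
git -C upstream branch test-1
git -C upstream symbolic-ref HEAD refs/heads/test-1

git ls-remote --symref upstream HEAD
# first output line: ref: refs/heads/test-1	HEAD
```

That still leaves the "and then update my local HEAD" step to the caller, which is the part a fetch option would automate.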

Is there a reason why no such option already exists? Would it make
sense to add one? Is there any reason why it's not a good idea? Or am
I missing something?

I am asking because GitLab uses HEAD in the bare repos it manages to
store the default branch against which the Merge Requests (same thing
as Pull Requests on GitHub) are created.

So when people want to keep 2 GitLab hosted repos in sync, GitLab
needs to sync HEADs too, not just the refs.

I think this could be useful to other setups than GitLab though.

Thanks,
Christian.


Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Jeff Hostetler




On 8/14/2018 2:44 PM, Stefan Beller wrote:

On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen  wrote:


On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler  wrote:

I'm looking at adding code to my SLOG (better name suggestions welcome)
patch series to eventually replace the existing git_trace facility.


Complement maybe. Replace, please no. I'd rather not stare at json messages.


 From the sidelines: We'd only need one logging infrastructure in place, as the
formatting would be done as a later step? For local operations we'd certainly
find better formatting than JSON, and we figured that we might end up
desiring ProtocolBuffers[1] instead of JSON, so if it were easy to change
the output format of the structured logging, that would be great.

But AFAICT these series are all about putting the sampling points into the
code base, so formatting would be orthogonal to it?

Stefan

[1] https://developers.google.com/protocol-buffers/



Last time I checked, protocol-buffers has a C++ binding but not
a C binding.

I've not had a chance to use pbuffers, so I have to ask what advantages
would they have over JSON or some other similar self-describing format?
And/or would it be possible for you to tail the json log file and
convert it to whatever format you preferred?

It seems like the important thing is to capture structured data
(whatever the format) to disk first.

Jeff


Re: [PATCH v3 3/7] t7411: be nicer to future tests and really clean things up

2018-08-14 Thread Junio C Hamano
Antonio Ospite  writes:

>  test_expect_success 'error message contains blob reference' '
> + # Remove the error introduced in the previous test.
> + # It is not needed in the following tests.
> + test_when_finished "git -C super reset --hard HEAD^" &&
>   (cd super &&
>   sha1=$(git rev-parse HEAD) &&
>   test-tool submodule-config \

Antonio Ospite  writes:

> Tests 5 and 8 in t/t7411-submodule-config.sh add two commits with
> invalid lines in .gitmodules but then only the second commit is removed.
>
> This may affect future subsequent tests if they assume that the
> .gitmodules file has no errors.
>
> Remove both the commits as soon as they are not needed anymore.
>
> The error introduced in test 5 is also required by test 6, so the two
> commits from above are removed respectively in tests 6 and 8.
>
> Signed-off-by: Antonio Ospite 
> ---
>  t/t7411-submodule-config.sh | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
> index 0bde5850ac..c6b6cf6fae 100755
> --- a/t/t7411-submodule-config.sh
> +++ b/t/t7411-submodule-config.sh
> @@ -98,6 +98,9 @@ test_expect_success 'error in one submodule config lets 
> continue' '
>  '
>  
>  test_expect_success 'error message contains blob reference' '
> + # Remove the error introduced in the previous test.
> + # It is not needed in the following tests.
> + test_when_finished "git -C super reset --hard HEAD^" &&

Hmm, that is ugly.  Depending on where in the subshell the previous
test failed, you'd still be taking us to an unexpected place.
Imagine if "git commit -m 'add error'" failed, for example, in the
test before this one.

I am wondering if the proper fix is to merge the previous one and
this one into a single test.  The combined test would

- remember where the HEAD in super is and arrange to come back
  to it when test is done
- break .gitmodules and commit it
- run test-tool and check its output
- also check its error output

in a single test_expect_success.

> @@ -123,6 +126,7 @@ test_expect_success 'using different treeishs works' '
>  '
>  
>  test_expect_success 'error in history in fetchrecursesubmodule lets 
> continue' '
> + test_when_finished "git -C super reset --hard HEAD^" &&
>   (cd super &&
>   git config -f .gitmodules \
>   submodule.submodule.fetchrecursesubmodules blabla &&
> @@ -134,8 +138,7 @@ test_expect_success 'error in history in 
> fetchrecursesubmodule lets continue' '
>   HEAD b \
>   HEAD submodule \
>   >actual &&
> - test_cmp expect_error actual  &&
> - git reset --hard HEAD^
> + test_cmp expect_error actual
>   )
>  '

If we want to be more robust, you'd probably need to find a better
anchoring point than HEAD, which can be pointing at a different commit
depending on where in the subshell the process is hit with ^C,
i.e.

ORIG=$(git -C super rev-parse HEAD) &&
test_when_finished "git -C super reset --hard $ORIG" &&
(
cd super &&
...

The patch is still an improvement compared to the current code,
where a broken test-tool that does not produce expected output in
the file 'actual' is guaranteed to leave us at a commit that we do
not expect to be at, but not entirely satisfactory.


Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Jonathan Tan
> - grep -E "tree|blob" objs >trees_and_blobs &&
> - test_line_count = 1 trees_and_blobs
> + grep -E "tree|blob" objs \
> + | awk -f print_1.awk >trees_and_blobs &&
> + git -C r1 rev-parse HEAD: >expected &&
> + test_cmp trees_and_blobs expected

Indent "| awk" (and similar lines in this patch) - although I guess it
is likely that you actually have it indented, and your e-mail client
modified the whitespace so that it looks like there is no indent.

Other than that, this interdiff looks good to me. Thanks.


Re: [PATCH v3 7/7] submodule: support reading .gitmodules even when it's not checked out

2018-08-14 Thread Junio C Hamano
Antonio Ospite  writes:

>   /* Equivalent to ACTION_SET in builtin/config.c */
> - if (argc == 3)
> + if (argc == 3) {
> + struct object_id oid;
> +
> + /*
> +  * If the .gitmodules file is not in the working tree but it
> +  * is in the current branch, stop, as writing new values (and
> +  * staging them) would blindly overwrite ALL the old content.

Hmph, "the file is missing" certainly is a condition we would want
to notice, but wouldn't we in general want to prevent us from
overwriting any local modification, where "missing" is merely a very
special case of local modification?  I am wondering if we would want
to stop if .gitmodules file exists both in the working tree and in
the index, and the contents of them differ, or something like that.

> diff --git a/git-submodule.sh b/git-submodule.sh
> index ff258e2e8c..b1cb187227 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -159,6 +159,13 @@ cmd_add()
>   shift
>   done
>  
> + # For more details about this check, see
> + # builtin/submodule--helper.c::module_config()
> + if test ! -e .gitmodules && git cat-file -e HEAD:.gitmodules > 
> /dev/null 2>&1

No SP between redirection '>' and its target '/dev/null'.

More importantly, I think it is better to add a submodule--helper
subcommand that exposes the check in question, as the code is
already written ;-) That approach will guarantee that the logic and
the message stay the same between here and in the C code.  Then you
do not even need this two-line comment.

> + then
> +  die "$(eval_gettext "please make sure that the .gitmodules 
> file in the current branch is checked out")"
> + fi
> +

> diff --git a/submodule-config.c b/submodule-config.c
> index b7ef055c63..088dabb56f 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -1,4 +1,5 @@
>  #include "cache.h"
> +#include "dir.h"
>  #include "repository.h"
>  #include "config.h"
>  #include "submodule-config.h"
> @@ -603,8 +604,19 @@ static void submodule_cache_check_init(struct repository 
> *repo)
>  static void config_from_gitmodules(config_fn_t fn, struct repository *repo, 
> void *data)
>  {
>   if (repo->worktree) {
> - char *file = repo_worktree_path(repo, GITMODULES_FILE);
> - git_config_from_file(fn, file, data);
> + struct git_config_source config_source = { 0 };
> + const struct config_options opts = { 0 };
> + struct object_id oid;
> + char *file;
> +
> + file = repo_worktree_path(repo, GITMODULES_FILE);
> + if (file_exists(file))
> + config_source.file = file;
> + else if (get_oid(GITMODULES_HEAD, &oid) >= 0)
> + config_source.blob = GITMODULES_HEAD;

What is the reason why we fall back directly to HEAD when working
tree file does not exist?  I thought that our usual fallback was to
the version in the index for other things like .gitignore/attribute
and this codepath looks like an oddball.  Are you trying to handle
the case where we are in a bare repository without any file checked
out (and there is not even the index)?

> diff --git a/t/t7416-submodule-sparse-gitmodules.sh b/t/t7416-submodule-sparse-gitmodules.sh
> new file mode 100755
> index 00..5341e9b012
> --- /dev/null
> +++ b/t/t7416-submodule-sparse-gitmodules.sh
> @@ -0,0 +1,90 @@
> +#!/bin/sh
> +#
> +# Copyright (C) 2018  Antonio Ospite 
> +#
> +
> +test_description='Test reading/writing .gitmodules when not in the working tree
> +
> +This test verifies that, when .gitmodules is in the current branch but is not
> +in the working tree reading from it still works but writing to it does not.
> +
> +The test setup uses a sparse checkout, however the same scenario can be set up
> +also by committing .gitmodules and then just removing it from the filesystem.
> +'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'sparse checkout setup which hides .gitmodules' '
> + echo file > file &&

No SP between redirection '>' and its target 'file'.

> + git add file &&
> + test_tick &&
> + git commit -m upstream &&
> + git clone . super &&
> + git clone super submodule &&
> + git clone super new_submodule &&
> + (cd super &&
> + git submodule add ../submodule &&
> + test_tick &&
> + git commit -m submodule &&
> + cat >.git/info/sparse-checkout <<\EOF &&
> +/*
> +!/.gitmodules
> +EOF

You can use <<-\EOF and indent the body of the here-doc, which makes
the result easier to read, i.e.

	cat >target <<-\EOF &&
		line 1
		line 2
	EOF

> + git config core.sparsecheckout true &&
> + git read-tree -m -u HEAD &&

That's old fashioned---I am curious if this has to be a one-way merge
or can just be a usual "git checkout" (I am merely curious).
Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-14 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote:
>
>> On 8/13/2018 5:54 PM, Jeff King wrote:
>> > So I try not to think too hard on metrics, and just use them to get a
>> > rough view on who is active.
>> 
>> I've been very interested in measuring community involvement, with the
>> knowledge that any metric is flawed and we should not ever say "this metric
>> is how we measure the quality of a contributor". It can be helpful, though,
>> to track some metrics and their change over time.
>> 
>> Here are a few measurements we can make:
>
> Thanks, it was nice to see a more comprehensive list in one spot.
>
> It would be neat to have a tool that presents all of these
> automatically, but I think the email ones are pretty tricky (most people
> don't have the whole list archive sitting around).

I do not think it covered e-mail at all, but there was a git-stats
project several years ago (perhaps part of GSoC, IIRC).

> I think I mentioned "surviving lines" elsewhere, which I do like this
> (and almost certainly stole from Junio a long time ago):

Yeah, I recall that one as part of counting how many of the 1244 lines
Linus originally wrote were still in our codebase around the v1.6.0
timeframe (the answer was ~220, IIRC) ;-)
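That kind of counting can be approximated today with blame; a rough
sketch (the function name and the exact flags are illustrative, not
necessarily what was used back then):

```shell
# rough per-author count of lines that survive in a tracked file today;
# -w ignores whitespace-only changes, -C follows copied/moved lines
surviving_lines () {
	git blame -w -C --line-porcelain -- "$1" |
	sed -n 's/^author //p' | sort | uniq -c | sort -rn
}
```

Summing that over the files of interest gives a per-author survival
count, with the usual caveats about moved code and reformatting.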



Re: [PATCH v4 2/5] unpack-trees: add performance tracing

2018-08-14 Thread Junio C Hamano
Duy Nguyen  writes:

> These trace messages are made for human consumption. Granted,
> occasionally we need some processing, but I find one-liners mostly
> suffice. Now we turn these into something made for machines, turning
> people into second-class citizens. I've read these messages
> reformatted for humans; it's usually too verbose even when reformatted.

I actually actively hate the aspect of the slog thing that exposes
the fact that it wants to take and show JSON too much in its API,
but if you look at these "jw_object_*()" thing as _only_ filling
parameters to be emitted, there is no reason to think we cannot
enhance/extend slog_emit_*() thing to take a format string (perhaps
inside the jw structure) so that the formatter does not have to
generate JSON at all.  Envisioning that kind of future, json_writer
is a misnomer that too narrowly defines what it is---it is merely a
generic data container that the codepath being traced can use to
communicate what needs to be logged to the outside world.
slog_emit* can (and when enhanced, should) be capable of paying
attention to an external input (e.g. environment variable) to switch
the output format, and JSON could be just one of the choices.

> It's not just sampling points. There are things like the index id being
> shown in the message, for example. I prefer to keep a free-style format
> to help me read. There are also things like indentation I do here to
> help me read.

Yup, I do not think that contradicts the approach of having a
single unified "data collection" API; you should also be able to
specify how that collection of data is to be presented in the trace
messages meant for humans, which would be discarded when emitting
JSON but would be used when showing a human-readable trace, no?


Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0

2018-08-14 Thread Junio C Hamano
Jonathan Tan  writes:

>> - grep -E "tree|blob" objs >trees_and_blobs &&
>> - test_line_count = 1 trees_and_blobs
>> + grep -E "tree|blob" objs \
>> + | awk -f print_1.awk >trees_and_blobs &&
>> + git -C r1 rev-parse HEAD: >expected &&
>> + test_cmp trees_and_blobs expected
>
> Indent "| awk" (and similar lines in this patch) - although I guess it
> is likely that you actually have it indented, and your e-mail client
> modified the whitespace so that it looks like there is no indent.

No, wrap lines like this

	command1 arg1 arg2 |
	command2 arg1 arg2 &&

That way, you do not need backslash to continue line.

Also think twice when you see yourself piping output from
"grep" to more powerful tools like "perl", "awk" or "sed".  Often
you can lose the upstream "grep".
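Concretely, the pattern match can move into the awk program itself; a
sketch with made-up input (the object-list format here is illustrative,
not the exact output the test works with):

```shell
# sample input roughly in the shape the test greps through
printf '%s\n' \
	'1111111 tree dir' \
	'2222222 blob dir/file' \
	'3333333 commit' >objs

# instead of: grep -E "tree|blob" objs | awk '{ print $1 }'
awk '/tree|blob/ { print $1 }' objs
# -> 1111111 and 2222222, one per line
```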


Re: Syncing HEAD

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 1:09 PM Christian Couder
 wrote:
>
> Hi,
>
> When cloning with --mirror, the clone gets its HEAD initialized with
> the value HEAD has in its origin remote. After that if HEAD changes in
> origin there is no simple way to sync HEAD at the same time as the
> refs are synced.
>
> It looks like the simplest way to sync HEAD is:
>
> 1) git remote show origin
> 2) parse "HEAD branch: XXX" from the output of the above command
> 3) git symbolic-ref HEAD refs/heads/XXX
>
> It looks like it would be quite easy to add an option to `fetch` to
> sync HEAD at the same time as regular refs are synced because every
> fetch from an origin that uses a recent Git contains something like:
>
> 19:55:39.304976 pkt-line.c:80   packet:  git< 
> HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow
> deepen-since deepen-not deepen-relative no-progress include-tag
> multi_ack_detailed no-done symref=HEAD:refs/heads/test-1
> agent=git/2.18.0
>
> which in this example shows that HEAD is a symref to refs/heads/test-1
> in origin.
>
> Is there a reason why no such option already exists? Would it make
> sense to add one? Is there any reason why it's not a good idea? Or am
> I missing something?

I think it is a great idea to add that. IIRC there was some talk when
designing protocol v2 on how fetching of symrefs could be added later
on in the protocol, which is why I cc'd Brandon who did the work there.

> I am asking because GitLab uses HEAD in the bare repos it manages to
> store the default branch against which the Merge Requests (same thing
> as Pull Requests on GitHub) are created.
>
> So when people want to keep 2 GitLab hosted repos in sync, GitLab
> needs to sync HEADs too, not just the refs.
>
> I think this could be useful to other setups than GitLab though.

As said, I can see how that is useful; I recently came across a
HEAD bug related to submodules, and this would also have an application there.

git clone --recurse-submodules file://...

might clone the submodules with a detached HEAD, which is not
a viable long-term HEAD, so a subsequent fetch might want
to update the detached HEAD in submodules or re-affix it to branches.

Unrelated/extended: I think it would be cool to mirror a repository even
more, e.g. it would be cool to be able to fetch (if configured as allowed)
the remote reflog (not to be confused with your local reflog of the remote).
I think that work would be enabled once reftables are available, which you
have an eye on?


Re: Contributor Summit planning

2018-08-14 Thread Junio C Hamano
Jeff King  writes:

> One problem there is that the prefixes are ambiguous (e.g., Jacob Keller
> shares with me, and I think at least one other over the years). You
> could look at the author of the tip commit, but that's not always right
> (and in fact, counting just merged topics misses bug-fixes that get
> applied directly on top of other people's topics).

Yes, a fix by somebody else to a bug that was recently introduced is
safest to apply to the original topic and merge down; that way makes
it more difficult to merge the original topic to older maintenance
track(s) without the fix by mistake.  So a "topic" with commits from
multiple people is not all that unusual.

Thanks.


Re: [PATCH 0/7] Resend of sb/submodule-update-in-c

2018-08-14 Thread Junio C Hamano
Stefan Beller  writes:

> Thanks Brandon for pointing out to use repo_git_path instead of
> manually constructing the path.
>
> That is the only change in this resend.

Rcpt.  Hopefully this is now ready for 'next'?


Re: [PATCH v2] send-email: add an option to impose delay sent E-Mails

2018-08-14 Thread Ævar Arnfjörð Bjarmason


On Tue, Aug 14 2018, Eric Wong wrote:

> Ævar Arnfjörð Bjarmason  wrote:
>> Add a --send-delay option with a corresponding sendemail.smtpSendDelay
>> configuration variable. When set to e.g. 2, this causes send-email to
>> sleep 2 seconds before sending the next E-Mail. We'll only sleep
>> between sends, not before the first send, or after the last.
>>
>> This option has two uses. Firstly, to be able to Ctrl+C a long send
>> with "all" if you have a change of heart. Secondly, as a hack in some
>> mail setups to, with a sufficiently high delay, force the receiving
>> client to sort the E-Mails correctly.
>>
>> Some popular E-Mail clients completely ignore the "Date" header, which
>> format-patch is careful to set such that the patches will be displayed
>> in order, and instead sort by the time the E-mail was received.
>>
>> Google's GMail is a good example of such a client. It ostensibly sorts
>> by some approximation of received time (although not by any "Received"
>> header). It's more usual than not to see patches showing out of order
>> in GMail. To take a few examples of orders seen on patches on the Git
>> mailing list:
>>
>> 1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy)
>> 2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee)
>> 3 -> 2 -> 1 (fast-import by Jameson Miller)
>> 2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King)
>>
>> The reason to add the new "X-Mailer-Send-Delay" header is to make it
>> easy to tell what the imposed delay was, if any. This allows for
>> gathering some data on how the transfer of E-Mails with & without this
>> option behaves. This may not be workable without really long delays,
>> see [1] and [2].
>
> Aside from the new header, I think this is better implemented
> using the existing $relogin_delay and $batch_size=1.
>
> Disconnecting during the delay might be more sympathetic to
> existing mail servers (which aren't C10K-optimized).

Yeah that's a good point, maybe we're being wasteful on remote resources
here.

> If the client sleeps, the server may disconnect the client anyways to
> save resources.

Seems like something we'd need to deal with anyway, no?

>> @@ -1741,6 +1747,10 @@ sub process_file {
>>  $message, $xfer_encoding, $target_xfer_encoding);
>>  push @xh, "Content-Transfer-Encoding: $xfer_encoding";
>>  unshift @xh, 'MIME-Version: 1.0' unless $has_mime_version;
>> +if ($send_delay && $i > 0) {
>> +push @xh, "X-Mailer-Send-Delay: $send_delay";
>> +sleep $send_delay;
>> +}
>
> We can add this header for relogin_delay + batch_size
>
> But maybe --send-delay can be a shortcut for
> --relogin-delay and --batch-size=1

Right now I need to enter a password once when sending a batch with my
SMTP server. With relogin I'd need to enter it N times, unless I use
whatever credential-saving facility there is in git-send-email (which I
don't use now).

I don't think it makes sense to conflate these two modes.


Re: [PATCH] submodule: add more exhaustive up-path testing

2018-08-14 Thread Ævar Arnfjörð Bjarmason


On Tue, Aug 14 2018, Stefan Beller wrote:

> On Tue, Aug 14, 2018 at 11:59 AM Ævar Arnfjörð Bjarmason
>  wrote:
>>
>> The tests added in 63e95beb08 ("submodule: port resolve_relative_url
>> from shell to C", 2016-04-15) didn't do a good job of testing various
>> up-path invocations where the up-path would bring us beyond even the
>> URL in question without emitting an error.
>>
>> These results look nonsensical, but it's worth exhaustively testing
>> them before fixing any of this code, so we can see which of these
>> cases were changed.
>
> Yeah. Please look at the comment in builtin/submodule--helper.c
> in that commit, where I described my expectations.
>
> I should have put them into tests instead with the expectations
> spelled out there.

I'll check that out, thanks. I saw that comment, but have been skimming
most of this code...

> Thanks for this patch!
> Stefan
>
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>
>
>> So I think these tests are worthwhile in themselves,
>
> The reason I put it in the comment instead of tests was the
> ease of spelling out both the status quo and expectations.
>
>> but would like
>> some advice on how to proceed with that from someone more familiar
>> with submodules.
>
> So ideally we'd also error out as soon as the host name is touched?

Do we have some utility function that'll take whatever we have in
remote.<name>.url and spew out the username / host / path? We must,
since the clone protocol needs it, but I haven't found it.


Re: Syncing HEAD

2018-08-14 Thread Jeff King
On Tue, Aug 14, 2018 at 10:09:37PM +0200, Christian Couder wrote:

> When cloning with --mirror, the clone gets its HEAD initialized with
> the value HEAD has in its origin remote. After that if HEAD changes in
> origin there is no simple way to sync HEAD at the same time as the
> refs are synced.
> 
> It looks like the simplest way to sync HEAD is:
> 
> 1) git remote show origin
> 2) parse "HEAD branch: XXX" from the output of the above command
> 3) git symbolic-ref HEAD refs/heads/XXX
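Scripted, those three steps come out to something like the following
sketch (assumes the remote is named "origin" when no argument is
given; the function name is made up):

```shell
# sketch of steps 1-3: parse the remote's default branch out of
# "git remote show" output and point this repository's HEAD at it
sync_head () {
	branch=$(git remote show "${1:-origin}" |
		sed -n 's/^ *HEAD branch: //p') &&
	git symbolic-ref HEAD "refs/heads/$branch"
}
```

It works, but it screen-scrapes "git remote show" and contacts the
remote on every call.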

How about:

  git remote set-head origin -a

?

> It looks like it would be quite easy to add an option to `fetch` to
> sync HEAD at the same time as regular refs are synced because every
> fetch from an origin that uses a recent Git contains something like:

I think the "remote set-head" option is not very discoverable, since
people are used to working with "fetch", making it the natural place to
look. Just like we ported "remote update" over to "fetch --all", I think
it would be sensible to have "fetch --update-head" or similar.

One tricky thing is that the name "refs/remotes/<remote>/HEAD" is only
special by convention, and that convention is known on the writing side
only by git-clone and git-remote. So obviously:

  git fetch --update-head https://example.com/

is nonsense. We don't even have a ref. What should:

  git config remote.origin.fetch refs/heads/*:refs/remotes/foo/*
  git fetch --update-head origin

do? Should it update based on the remote name, or based on the refspec?
What happens if there are several refspecs? Etc.

99% of the time those questions won't come up. But we should design so
that we do the obvious thing in those 99%, and something sane in the
other 1%.

-Peff


Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-14 Thread Stefan Beller
On Tue, Aug 14, 2018 at 11:57 AM Jonathan Nieder  wrote:
>
> Hi,
>
> Brandon Williams wrote:
> > On 08/09, Jeff King wrote:
>
> >> One interesting thing about url-encoding is that it's not one-to-one.
> >> This case could also be %2F, which is a different file (on a
> >> case-sensitive filesystem). I think "%20" and "+" are similarly
> >> interchangeable.
> >>
> >> If we were decoding the filenames, that's fine. The round-trip is
> >> lossless.
> >>
> >> But that's not quite how the new code behaves. We encode the input and
> >> then check to see if it matches an encoding we previously performed. So
> >> if our urlencode routines ever change, this will subtly break.
> >>
> >> I don't know how much it's worth caring about. We're not that likely to
> >> change the routines ourself (though certainly a third-party
> >> implementation would need to know our exact url-encoding decisions).
> >
> > This is exactly the reason why I wanted to get some opinions on what the
> > best thing to do here would be.  I _think_ the best thing would probably
> > be to write a specific routine to do the conversion, and it wouldn't
> > even have to be all that complex.  Basically I'm just interested in
> > converting '/' characters so that things no longer behave like
> > nested directories.
>
> First of all, I think the behavior with this patch is already much
> better than the previous status quo.  I'm using the patch now and am
> very happy with it.
>
> Second, what if we store the pathname in config?  We already store the
> URL there:
>
> [submodule "plugins/hooks"]
> url = https://gerrit.googlesource.com/plugins/hooks
>
> So we could (as a followup patch) do something like
>
> [submodule "plugins/hooks"]
> url = https://gerrit.googlesource.com/plugins/hooks
> gitdirname = plugins%2fhooks
>
> and use that for lookups instead of regenerating the directory name.
> What do you think?

Having just looked at the worktree code, this sounds intriguing for the wrong
reason (again), as a user may want to point the gitdirname at a repository
that they already have on disk outside the actual superproject. They
would be reinventing worktrees in the submodule space. ;-)

This would open up the security hole that we just had, again.
So we'd have to make sure that the gitdirname (instead of the
now meaningless subsection name) is proof to ../ attacks.

I feel uneasy about this, as then the user might come in
and move submodules and repoint the gitdirname...
to a non-url-encoded path. Exposing this knob just
asks for trouble, no?

On the other hand, the only requirement for the "name" is
now uniqueness, and that is implied with subsections,
so I guess it looks elegant.

What would happen if gitdirname is changed as part of
history? (The same problem we have now with changing
the subsection name)

The more I think about it the less appealing this is, but it looks
elegant.

Stefan


Re: Syncing HEAD

2018-08-14 Thread Brandon Williams
On 08/14, Stefan Beller wrote:
> On Tue, Aug 14, 2018 at 1:09 PM Christian Couder
>  wrote:
> >
> > Hi,
> >
> > When cloning with --mirror, the clone gets its HEAD initialized with
> > the value HEAD has in its origin remote. After that if HEAD changes in
> > origin there is no simple way to sync HEAD at the same time as the
> > refs are synced.
> >
> > It looks like the simplest way to sync HEAD is:
> >
> > 1) git remote show origin
> > 2) parse "HEAD branch: XXX" from the output of the above command
> > 3) git symbolic-ref HEAD refs/heads/XXX
> >
> > It looks like it would be quite easy to add an option to `fetch` to
> > sync HEAD at the same time as regular refs are synced because every
> > fetch from an origin that uses a recent Git contains something like:
> >
> > 19:55:39.304976 pkt-line.c:80   packet:  git< 
> > HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow
> > deepen-since deepen-not deepen-relative no-progress include-tag
> > multi_ack_detailed no-done symref=HEAD:refs/heads/test-1
> > agent=git/2.18.0
> >
> > which in this example shows that HEAD is a symref to refs/heads/test-1
> > in origin.
> >
> > Is there a reason why no such option already exists? Would it make
> > sense to add one? Is there any reason why it's not a good idea? Or am
> > I missing something?
> 
> I think it is a great idea to add that. IIRC there was some talk when
> designing protocol v2 on how fetching of symrefs could be added later
> on in the protocol, which is why I cc'd Brandon who did the work there.

Actually the functionality for fetching symrefs already exists (when
using protocol v2, of course).  Despite this functionality existing, it's
not being used right now.

When performing a v2 fetch the first thing that a client does is request
the list of refs (by doing an ls-refs request).  The output from ls-refs
(if asked) will include information about each ref, including whether it
is a symref and what ref it resolves to.  So really we just need to
plumb that information through fetch to actually update HEAD, or even
update other symrefs which exist on the server.
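Until that plumbing happens, the advertised symref target is already
visible from the client side via ls-remote; a sketch (the helper name
is made up):

```shell
# show where the remote's HEAD points without updating anything;
# the "ref: ..." line is the symref target advertised by the server
show_remote_head () {
	git ls-remote --symref "${1:-origin}" HEAD
}
```

For the refs/heads/test-1 example quoted above, this prints a
"ref: refs/heads/test-1	HEAD" line followed by the object id.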

> 
> > I am asking because GitLab uses HEAD in the bare repos it manages to
> > store the default branch against which the Merge Requests (same thing
> > as Pull Requests on GitHub) are created.
> >
> > So when people want to keep 2 GitLab hosted repos in sync, GitLab
> > needs to sync HEADs too, not just the refs.
> >
> > I think this could be useful to other setups than GitLab though.
> 
> As said, I can see how that is useful; I recently came across some
> HEAD bug related to submodules, and there we'd also have the application.
> 
> git clone --recurse-submodules file://...
> 
> might clone the submodules with a detached HEAD, which is not
> a viable long-term HEAD, so a subsequent fetch might want
> to update the detached HEAD in submodules or re-affix it to branches.
> 
> Unrelated/extended: I think it would be cool to mirror a repository even
> more, e.g. it would be cool to be able to fetch (if configured as allowed)
> the remote reflog (not to be confused with your local reflog of the remote).
> I think that work would be enabled once reftables are available, which you
> have an eye on?

-- 
Brandon Williams

