[PATCH 0/1] Fix a recently-introduced compile warning
With the relatively frequent breakages of pu recently, I had trouble
staying on top of the compile errors/test failures, sorry. This one
exists since Sunday, and it is a compile error only with DEVELOPER=1,
which is, however, the recommended way to build in Git for Windows' SDK.

Note: it is based on nd/clone-case-smashing-warning.

Johannes Schindelin (1):
  mark_colliding_entries(): fix incorrect #if...#endif guard

 entry.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


base-commit: f80218bf4e65ccc06cc9173c0ac5a5520d380f36
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-18%2Fdscho%2Fclone-case-smashing-warning-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-18/dscho/clone-case-smashing-warning-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/18
-- 
gitgitgadget
[PATCH 1/1] mark_colliding_entries(): fix incorrect #if...#endif guard
From: Johannes Schindelin

The way the guard was put, the code was declaring an unused variable on
Windows. No need to do that, so let's fix it.

Signed-off-by: Johannes Schindelin
---
 entry.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/entry.c b/entry.c
index c70340df8..2bce13352 100644
--- a/entry.c
+++ b/entry.c
@@ -402,11 +402,9 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen)
 static void mark_colliding_entries(const struct checkout *state,
 				   struct cache_entry *ce, struct stat *st)
 {
+#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
 	int i;
 
-	ce->ce_flags |= CE_MATCHED;
-
-#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */
 	for (i = 0; i < state->istate->cache_nr; i++) {
 		struct cache_entry *dup = state->istate->cache[i];
 
@@ -422,6 +420,8 @@ static void mark_colliding_entries(const struct checkout *state,
 	}
 }
 #endif
+
+	ce->ce_flags |= CE_MATCHED;
 }
 
 /*
-- 
gitgitgadget
[PATCH v3 5/7] submodule: use the 'submodule--helper config' command
Use the 'submodule--helper config' command in git-submodule.sh to avoid
referring explicitly to .gitmodules by the hardcoded file path.

This makes it possible to access the submodules configuration in a more
controlled way.

Signed-off-by: Antonio Ospite
---
 git-submodule.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 8b5ad59bde..ff258e2e8c 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -72,7 +72,7 @@ get_submodule_config () {
 	value=$(git config submodule."$name"."$option")
 	if test -z "$value"
 	then
-		value=$(git config -f .gitmodules submodule."$name"."$option")
+		value=$(git submodule--helper config submodule."$name"."$option")
 	fi
 	printf '%s' "${value:-$default}"
 }
@@ -283,11 +283,11 @@ or you are unsure what this means choose another name with the '--name' option."
 	git add --no-warn-embedded-repo $force "$sm_path" ||
 	die "$(eval_gettext "Failed to add submodule '\$sm_path'")"
 
-	git config -f .gitmodules submodule."$sm_name".path "$sm_path" &&
-	git config -f .gitmodules submodule."$sm_name".url "$repo" &&
+	git submodule--helper config submodule."$sm_name".path "$sm_path" &&
+	git submodule--helper config submodule."$sm_name".url "$repo" &&
 	if test -n "$branch"
 	then
-		git config -f .gitmodules submodule."$sm_name".branch "$branch"
+		git submodule--helper config submodule."$sm_name".branch "$branch"
 	fi &&
 	git add --force .gitmodules ||
 	die "$(eval_gettext "Failed to register submodule '\$sm_path'")"
-- 
2.18.0
[PATCH v3 7/7] submodule: support reading .gitmodules even when it's not checked out
When the .gitmodules file is not available in the working tree, try
using HEAD:.gitmodules from the current branch. This covers the case
when the file is part of the repository but for some reason it is not
checked out, for example because of a sparse checkout.

This makes it possible to use at least the 'git submodule' commands
which *read* the gitmodules configuration file without fully populating
the working tree.

Writing to .gitmodules will still require that the file is checked out,
so check for that before calling config_set_in_gitmodules_file_gently.

Add a similar check also in git-submodule.sh::cmd_add() to anticipate
the eventual failure of the "git submodule add" command when .gitmodules
is not safely writeable; this prevents the command from leaving the
repository in a spurious state (e.g. the submodule repository was cloned
but .gitmodules was not updated because
config_set_in_gitmodules_file_gently failed).

Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
from .gitmodules succeeds and that writing to it fails when the file is
not checked out.

Signed-off-by: Antonio Ospite
---

Maybe the check in config_set_in_gitmodules_file_gently and
git-submodule.sh::cmd_add() can share some code:

  - add an is_gitmodules_safely_writeable() helper
  - expose a "submodule--helper config --is-safely-writeable" subcommand

But for now I preferred to keep the changes with v2 to a minimum to
avoid blocking the series. If adding a new helper is preferred I can do
a v4 or send a follow-up patch.
Thank you,
   Antonio

 builtin/submodule--helper.c            | 17 -
 cache.h                                |  1 +
 git-submodule.sh                       |  7 ++
 submodule-config.c                     | 16 -
 t/t7416-submodule-sparse-gitmodules.sh | 90 ++
 5 files changed, 128 insertions(+), 3 deletions(-)
 create mode 100755 t/t7416-submodule-sparse-gitmodules.sh

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 7481d03b63..c0370a756b 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2036,8 +2036,23 @@ static int module_config(int argc, const char **argv, const char *prefix)
 		return print_config_from_gitmodules(argv[1]);
 
 	/* Equivalent to ACTION_SET in builtin/config.c */
-	if (argc == 3)
+	if (argc == 3) {
+		struct object_id oid;
+
+		/*
+		 * If the .gitmodules file is not in the working tree but it
+		 * is in the current branch, stop, as writing new values (and
+		 * staging them) would blindly overwrite ALL the old content.
+		 *
+		 * This check still makes it possible to create a brand new
+		 * .gitmodules when it is safe to do so: when neither
+		 * GITMODULES_FILE nor GITMODULES_HEAD exist.
+		 */
+		if (!file_exists(GITMODULES_FILE) && get_oid(GITMODULES_HEAD, &oid) >= 0)
+			die(_("please make sure that the .gitmodules file in the current branch is checked out"));
+
 		return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
+	}
 
 	die("submodule--helper config takes 1 or 2 arguments: name [value]");
 }
diff --git a/cache.h b/cache.h
index 8dc7134f00..900f9e09e5 100644
--- a/cache.h
+++ b/cache.h
@@ -486,6 +486,7 @@ static inline enum object_type object_type(unsigned int mode)
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
 #define GITMODULES_FILE ".gitmodules"
+#define GITMODULES_HEAD "HEAD:.gitmodules"
 #define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
 #define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
 #define GIT_NOTES_DISPLAY_REF_ENVIRONMENT "GIT_NOTES_DISPLAY_REF"
diff --git a/git-submodule.sh b/git-submodule.sh
index ff258e2e8c..b1cb187227 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -159,6 +159,13 @@ cmd_add()
 		shift
 	done
 
+	# For more details about this check, see
+	# builtin/submodule--helper.c::module_config()
+	if test ! -e .gitmodules && git cat-file -e HEAD:.gitmodules > /dev/null 2>&1
+	then
+		die "$(eval_gettext "please make sure that the .gitmodules file in the current branch is checked out")"
+	fi
+
 	if test -n "$reference_path"
 	then
 		is_absolute_path "$reference_path" ||
diff --git a/submodule-config.c b/submodule-config.c
index b7ef055c63..088dabb56f 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "dir.h"
 #include "repository.h"
 #include "config.h"
 #include "submodule-config.h"
@@ -603,8 +604,19 @@ static void submodule_cache_check_init(struct repository *repo)
 static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void *data)
 {
 	if (repo->worktree) {
-		char *file = repo_worktree_pa
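The read side of what this patch automates can be approximated today with stock git plumbing. This is only a hand-rolled sketch (the repository layout, submodule name and URL below are invented for the demo); it mirrors the two checks the patch performs — "is the file in HEAD?" via cat-file, and "read config from a blob" via `git config --blob`:

```shell
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email you@example.com && git config user.name You

# commit a .gitmodules file, then drop it from the working tree
printf '[submodule "sub"]\n\tpath = sub\n\turl = ../sub.git\n' >.gitmodules
git add .gitmodules && git commit -qm 'add .gitmodules'
rm .gitmodules	# simulate a sparse checkout without the file

# the file is gone from the worktree but still reachable via HEAD
git cat-file -e HEAD:.gitmodules && echo "found in HEAD"
git config --blob HEAD:.gitmodules --get submodule.sub.url
```

The last command prints `../sub.git`, read straight from the committed blob — the same idea the series wires into config_from_gitmodules().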
[PATCH v3 3/7] t7411: be nicer to future tests and really clean things up
Tests 5 and 8 in t/t7411-submodule-config.sh add two commits with
invalid lines in .gitmodules but then only the second commit is removed.

This may affect future subsequent tests if they assume that the
.gitmodules file has no errors.

Remove both the commits as soon as they are not needed anymore.

The error introduced in test 5 is also required by test 6, so the two
commits from above are removed respectively in tests 6 and 8.

Signed-off-by: Antonio Ospite
---
 t/t7411-submodule-config.sh | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index 0bde5850ac..c6b6cf6fae 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -98,6 +98,9 @@ test_expect_success 'error in one submodule config lets continue' '
 '
 
 test_expect_success 'error message contains blob reference' '
+	# Remove the error introduced in the previous test.
+	# It is not needed in the following tests.
+	test_when_finished "git -C super reset --hard HEAD^" &&
 	(cd super &&
 		sha1=$(git rev-parse HEAD) &&
 		test-tool submodule-config \
@@ -123,6 +126,7 @@ test_expect_success 'using different treeishs works' '
 '
 
 test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
+	test_when_finished "git -C super reset --hard HEAD^" &&
 	(cd super &&
 		git config -f .gitmodules \
 			submodule.submodule.fetchrecursesubmodules blabla &&
@@ -134,8 +138,7 @@ test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
 		HEAD b \
 		HEAD submodule \
 			>actual &&
-	test_cmp expect_error actual &&
-	git reset --hard HEAD^
+	test_cmp expect_error actual
 	)
 '
-- 
2.18.0
[PATCH v3 4/7] submodule--helper: add a new 'config' subcommand
Add a new 'config' subcommand to 'submodule--helper', this extra level
of indirection makes it possible to add some flexibility to how the
submodules configuration is handled.

Signed-off-by: Antonio Ospite
---
 builtin/submodule--helper.c | 14 ++++++
 new                         |  0
 t/t7411-submodule-config.sh | 26 ++++++
 3 files changed, 40 insertions(+)
 create mode 100644 new

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index a3c4564c6c..7481d03b63 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2029,6 +2029,19 @@ static int connect_gitdir_workingtree(int argc, const char **argv, const char *p
 	return 0;
 }
 
+static int module_config(int argc, const char **argv, const char *prefix)
+{
+	/* Equivalent to ACTION_GET in builtin/config.c */
+	if (argc == 2)
+		return print_config_from_gitmodules(argv[1]);
+
+	/* Equivalent to ACTION_SET in builtin/config.c */
+	if (argc == 3)
+		return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
+
+	die("submodule--helper config takes 1 or 2 arguments: name [value]");
+}
+
 #define SUPPORT_SUPER_PREFIX (1<<0)
 
 struct cmd_struct {
@@ -2057,6 +2070,7 @@ static struct cmd_struct commands[] = {
 	{"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
 	{"is-active", is_active, 0},
 	{"check-name", check_name, 0},
+	{"config", module_config, 0},
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/new b/new
new file mode 100644
index 00..e69de29bb2
diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index c6b6cf6fae..4afb6f152e 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -142,4 +142,30 @@ test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
 	)
 '
 
+test_expect_success 'reading submodules config with "submodule--helper config"' '
+	(cd super &&
+		echo "../submodule" >expected &&
+		git submodule--helper config submodule.submodule.url >actual &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'writing submodules config with "submodule--helper config"' '
+	(cd super &&
+		echo "new_url" >expected &&
+		git submodule--helper config submodule.submodule.url "new_url" &&
+		git submodule--helper config submodule.submodule.url >actual &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'overwriting unstaged submodules config with "submodule--helper config"' '
+	(cd super &&
+		echo "newer_url" >expected &&
+		git submodule--helper config submodule.submodule.url "newer_url" &&
+		git submodule--helper config submodule.submodule.url >actual &&
+		test_cmp expected actual
+	)
+'
+
 test_done
-- 
2.18.0
[PATCH v3 0/7] Make submodules work if .gitmodules is not checked out
Hi,

this series teaches git to try and read the .gitmodules file from the
current branch (HEAD:.gitmodules) when it's not readily available in the
working tree.

This can be used, along with sparse checkouts, to enable submodule usage
with programs like vcsh[1] which manage multiple repositories with their
working trees sharing the same path.

[1] https://github.com/RichiH/vcsh

This is v3 of the series from:
https://public-inbox.org/git/20180802134634.10300-1-...@ao2.it/

The cover letter of the first proposal contains more background:
https://public-inbox.org/git/20180514105823.8378-1-...@ao2.it/

Changes since v2:

  * Removed the extern keyword from the public declaration of
    print_config_from_gitmodules() and
    config_set_in_gitmodules_file_gently().

  * Used test_when_finished in t/t7411-submodule-config.sh and removed
    the problematic commits as soon as they are not needed anymore.

  * Restructured the code in module_config to avoid an unreachable
    section; the code now dies as a fallback if the arguments are not
    supported, as suggested by Jeff.

  * Dropped patches and tests about "submodule--helper config --stage"
    as they are not strictly needed for now and there is no immediate
    benefit from them.

  * Added a check to git-submodule.sh::cmd_add to make it fail earlier
    if the .gitmodules file is not "safely writeable". This also fixes
    one of the new tests which was previously marked as
    "test_expect_failure".

  * Fixed a broken &&-chain in a subshell in one of the new tests; the
    issue was exposed by a recent change in master.

  * Dropped a note about "git rm" and "git mv", it was intended as a
    personal reminder and not for the general public.

  * Squashed t7416-submodule-sparse-gitmodules.sh into the same commit
    as the code it exercises.

  * Dropped the two unrelated patches from v2:
      - dir: move is_empty_file() from builtin/am.c to dir.c and make
        it public
      - submodule: remove the .gitmodules file when it is empty
    as they are orthogonal to this series. I will send them as a
    standalone series.

  * Minor wording fixes here and there.

The series looks a lot cleaner and more to the point, thanks for the
review.

Ciao,
   Antonio

Antonio Ospite (7):
  submodule: add a print_config_from_gitmodules() helper
  submodule: factor out a config_set_in_gitmodules_file_gently function
  t7411: be nicer to future tests and really clean things up
  submodule--helper: add a new 'config' subcommand
  submodule: use the 'submodule--helper config' command
  t7506: clean up .gitmodules properly before setting up new scenario
  submodule: support reading .gitmodules even when it's not checked out

 builtin/submodule--helper.c            | 29 +
 cache.h                                |  1 +
 git-submodule.sh                       | 15 +++--
 new                                    |  0
 submodule-config.c                     | 53 ++-
 submodule-config.h                     |  3 +
 submodule.c                            | 10 +--
 t/t7411-submodule-config.sh            | 33 +-
 t/t7416-submodule-sparse-gitmodules.sh | 90 ++
 t/t7506-status-submodule.sh            |  3 +-
 10 files changed, 221 insertions(+), 16 deletions(-)
 create mode 100644 new
 create mode 100755 t/t7416-submodule-sparse-gitmodules.sh

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?
[PATCH v3 2/7] submodule: factor out a config_set_in_gitmodules_file_gently function
Introduce a new config_set_in_gitmodules_file_gently() function to write
config values to the .gitmodules file.

This is in preparation for a future change which will use the function
to write to the .gitmodules file in a more controlled way instead of
using "git config -f .gitmodules".

The purpose of the change is mainly to centralize the code that writes
to the .gitmodules file to avoid some duplication.

The naming follows git_config_set_in_file_gently() but the git_ prefix
is removed to communicate that this is not a generic git-config API.

Signed-off-by: Antonio Ospite
---
 submodule-config.c | 12 ++++++++++++
 submodule-config.h |  1 +
 submodule.c        | 10 +++---
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index eef96c4198..b7ef055c63 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -707,6 +707,18 @@ int print_config_from_gitmodules(const char *key)
 	return 0;
 }
 
+int config_set_in_gitmodules_file_gently(const char *key, const char *value)
+{
+	int ret;
+
+	ret = git_config_set_in_file_gently(GITMODULES_FILE, key, value);
+	if (ret < 0)
+		/* Maybe the user already did that, don't error out here */
+		warning(_("Could not update .gitmodules entry %s"), key);
+
+	return ret;
+}
+
 struct fetch_config {
 	int *max_children;
 	int *recurse_submodules;
diff --git a/submodule-config.h b/submodule-config.h
index ed40e9a478..9957bcbbfa 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -57,6 +57,7 @@ void submodule_free(struct repository *r);
 int check_submodule_name(const char *name);
 
 int print_config_from_gitmodules(const char *key);
+int config_set_in_gitmodules_file_gently(const char *key, const char *value);
 
 /*
  * Note: these helper functions exist solely to maintain backward
diff --git a/submodule.c b/submodule.c
index 6e14547e9e..fd95cb76b3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -89,6 +89,7 @@ int update_path_in_gitmodules(const char *oldpath, const char *newpath)
 {
 	struct strbuf entry = STRBUF_INIT;
 	const struct submodule *submodule;
+	int ret;
 
 	if (!file_exists(GITMODULES_FILE)) /* Do nothing without .gitmodules */
 		return -1;
@@ -104,14 +105,9 @@ int update_path_in_gitmodules(const char *oldpath, const char *newpath)
 	strbuf_addstr(&entry, "submodule.");
 	strbuf_addstr(&entry, submodule->name);
 	strbuf_addstr(&entry, ".path");
-	if (git_config_set_in_file_gently(GITMODULES_FILE, entry.buf, newpath) < 0) {
-		/* Maybe the user already did that, don't error out here */
-		warning(_("Could not update .gitmodules entry %s"), entry.buf);
-		strbuf_release(&entry);
-		return -1;
-	}
+	ret = config_set_in_gitmodules_file_gently(entry.buf, newpath);
 	strbuf_release(&entry);
-	return 0;
+	return ret;
 }
 
 /*
-- 
2.18.0
[PATCH v3 6/7] t7506: clean up .gitmodules properly before setting up new scenario
In t/t7506-status-submodule.sh at some point a new scenario is set up to
test different things, in particular new submodules are added which are
meant to completely replace the previous ones.

However before calling the "git submodule add" commands for the new
layout, the .gitmodules file is removed only from the working tree,
still leaving the previous content in the current branch.

This can break if, in the future, "git submodule add" starts
differentiating between the following two cases:

  - .gitmodules is not in the working tree but it is in the current
    branch (it may not be safe to add new submodules in this case);

  - .gitmodules is neither in the working tree nor anywhere in the
    current branch (it is safe to add new submodules).

Since the test intends to get rid of .gitmodules anyways, let's
completely remove it from the current branch, to actually start afresh
in the new scenario. This is more future-proof and does not break
current tests.

Signed-off-by: Antonio Ospite
---
 t/t7506-status-submodule.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh
index 943708fb04..08629a6e70 100755
--- a/t/t7506-status-submodule.sh
+++ b/t/t7506-status-submodule.sh
@@ -325,7 +325,8 @@ test_expect_success 'setup superproject with untracked file in nested submodule' '
 	(
 		cd super &&
 		git clean -dfx &&
-		rm .gitmodules &&
+		git rm .gitmodules &&
+		git commit -m "remove .gitmodules" &&
 		git submodule add -f ./sub1 &&
 		git submodule add -f ./sub2 &&
 		git submodule add -f ./sub1 sub3 &&
-- 
2.18.0
[PATCH v3 1/7] submodule: add a print_config_from_gitmodules() helper
Add a new print_config_from_gitmodules() helper function to print values
from .gitmodules just like "git config -f .gitmodules" would.

This will be used by a new submodule--helper subcommand to be able to
access the .gitmodules file in a more controlled way.

Signed-off-by: Antonio Ospite
---
 submodule-config.c | 25 +++++++++++++++++++++++++
 submodule-config.h |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index fc2c41b947..eef96c4198 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -682,6 +682,31 @@ void submodule_free(struct repository *r)
 		submodule_cache_clear(r->submodule_cache);
 }
 
+static int config_print_callback(const char *key_, const char *value_, void *cb_data)
+{
+	char *key = cb_data;
+
+	if (!strcmp(key, key_))
+		printf("%s\n", value_);
+
+	return 0;
+}
+
+int print_config_from_gitmodules(const char *key)
+{
+	int ret;
+	char *store_key;
+
+	ret = git_config_parse_key(key, &store_key, NULL);
+	if (ret < 0)
+		return CONFIG_INVALID_KEY;
+
+	config_from_gitmodules(config_print_callback, the_repository, store_key);
+
+	free(store_key);
+	return 0;
+}
+
 struct fetch_config {
 	int *max_children;
 	int *recurse_submodules;
diff --git a/submodule-config.h b/submodule-config.h
index dc7278eea4..ed40e9a478 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -56,6 +56,8 @@ void submodule_free(struct repository *r);
  */
 int check_submodule_name(const char *name);
 
+int print_config_from_gitmodules(const char *key);
+
 /*
  * Note: these helper functions exist solely to maintain backward
  * compatibility with 'fetch' and 'update_clone' storing configuration in
-- 
2.18.0
[PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
The test 'pack-objects to file can use bitmap' added in 645c432d61
(pack-objects: use reachability bitmap index when generating non-stdout
pack, 2016-09-10) is silently buggy and doesn't check what it's supposed
to.

In 't5310-pack-bitmaps.sh', the 'list_packed_objects' helper function
does what its name implies by running:

  git show-index <"$1" | cut -d' ' -f2

The test in question invokes this function like this:

  list_packed_objects <packa-$packasha1.idx >packa.objects &&
  list_packed_objects <packb-$packbsha1.idx >packb.objects &&
  test_cmp packa.objects packb.objects

Note how these two callsites don't specify the name of the pack index
file as the function's parameter, but redirect the function's standard
input from it. This triggers an error message from the shell, as it has
no filename to redirect from in the function, but this error is ignored,
because it happens upstream of a pipe. Consequently, both invocations
produce empty 'pack{a,b}.objects' files, and the subsequent 'test_cmp'
happily finds those two empty files identical.

Fix these two 'list_packed_objects' invocations by specifying the pack
index files as parameters. Furthermore, eliminate the pipe in that
function by replacing it with an &&-chained pair of commands using an
intermediate file, so a failure of 'git show-index' or the shell
redirection will fail the test.
Signed-off-by: SZEDER Gábor
---
 t/t5310-pack-bitmaps.sh | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 6ee4d3f2d9..557bd0d0c0 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -9,7 +9,8 @@ objpath () {
 
 # show objects present in pack ($1 should be associated *.idx)
 list_packed_objects () {
-	git show-index <"$1" | cut -d' ' -f2
+	git show-index <"$1" >object-list &&
+	cut -d' ' -f2 object-list
 }
 
 # has_any pattern-file content-file
@@ -204,8 +205,8 @@ test_expect_success 'pack-objects to file can use bitmap' '
 	# verify equivalent packs are generated with/without using bitmap index
 	packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
 	packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
-	list_packed_objects <packa-$packasha1.idx >packa.objects &&
-	list_packed_objects <packb-$packbsha1.idx >packb.objects &&
+	list_packed_objects packa-$packasha1.idx >packa.objects &&
+	list_packed_objects packb-$packbsha1.idx >packb.objects &&
 	test_cmp packa.objects packb.objects
 '
-- 
2.18.0.720.g1f300496fc
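The failure mode described above is generic shell behaviour and easy to reproduce outside the test suite (no git needed; the file names here are arbitrary): a failing command on the left-hand side of a pipe does not affect the pipeline's exit status, which comes from the last command only.

```shell
# The redirection/command on the left fails, but `cut` exits 0,
# so the pipeline as a whole reports success.
cat no-such-file 2>/dev/null | cut -d' ' -f2 >objects
echo "pipeline status: $?"

# The output file ends up empty -- exactly how the buggy test came to
# compare two identical empty files with test_cmp.
test ! -s objects && echo "objects is empty"
```

This is why moving 'git show-index' out of the pipe (into an &&-chain with an intermediate file) makes its failure visible to the test.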
[PATCH] doc: git-describe
commit cc4bd5268b2dbe90279198acb400a528ee23f5d5
Author: William Pursell
Date:   Tue Aug 14 06:41:00 2018 -0600

    doc: Reference for --dirty/--broken

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index e027fb8c4b..f7d67306c0 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -39,11 +39,12 @@ OPTIONS
 --broken[=<mark>]::
 	Describe the state of the working tree.  When the working
 	tree matches HEAD, the output is the same as "git describe
-	HEAD".  If the working tree has local modification "-dirty"
-	is appended to it.  If a repository is corrupt and Git
-	cannot determine if there is local modification, Git will
-	error out, unless `--broken' is given, which appends
-	the suffix "-broken" instead.
+	HEAD".  If the working tree has local modification, "-dirty"
+	(or <mark>, if supplied) is appended to it.  If a repository
+	is corrupt and Git cannot determine if there is local
+	modification, Git will error out unless `--broken' is given
+	in which case it will append the suffix "-broken" (or <mark>,
+	if supplied) instead.
 
 --all::
 	Instead of using only the annotated tags, use any ref
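The amended wording can be checked against git's actual behaviour in a throwaway repository (the tag name v1.0 and the .mod mark below are arbitrary choices for the demo):

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name You
echo one >file && git add file && git commit -qm one
git tag -a v1.0 -m v1.0

git describe --dirty		# clean tree: v1.0
echo two >file			# introduce a local modification
git describe --dirty		# v1.0-dirty
git describe --dirty=.mod	# custom <mark>: v1.0.mod
```

The --broken variant behaves analogously, but only kicks in when the repository is corrupt, which is harder to demo safely.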
Re: [PATCH] mingw: enable atomic O_APPEND
On 8/13/2018 3:02 PM, Johannes Sixt wrote:
> The Windows CRT implements O_APPEND "manually": on write() calls, the
> file pointer is set to EOF before the data is written. Clearly, this
> is not atomic. And in fact, this is the root cause of failures
> observed in t5552-skipping-fetch-negotiator.sh and t5503-tagfollow.sh,
> where different processes write to the same trace file simultaneously;
> it also occurred in t5400-send-pack.sh, but there it was worked around
> in 71406ed4d6 ("t5400: avoid concurrent writes into a trace file",
> 2017-05-18).
>
> Fortunately, Windows does support atomic O_APPEND semantics using the
> file access mode FILE_APPEND_DATA. Provide an implementation that does.
>
> This implementation is minimal in such a way that it only implements
> the open modes that are actually used in the Git code base. Emulation
> for other modes can be added as necessary later. To become aware of
> the necessity early, the unusual error ENOSYS is reported if an
> unsupported mode is encountered.
>
> Diagnosed-by: Johannes Schindelin
> Helped-by: Jeff Hostetler
> Signed-off-by: Johannes Sixt
> ---
>  compat/mingw.c | 41 +++--
>  1 file changed, 39 insertions(+), 2 deletions(-)
> [...]

This looks good. Thanks for following up on this.

Jeff
RE: Contributor Summit planning
On August 14, 2018 2:53 AM, Elijah Newren wrote:
> On Mon, Aug 13, 2018 at 10:27 AM Jeff King wrote:
> >
> > For the past several years, we've held a Git Contributor Summit as
> > part of the Git Merge conference. I'd like to get opinions from the
> > community to help plan future installments. Any feedback or opinion
> > is welcome, but some obvious things to think about:
> >
> >   - where, when, and how often?
> >
> > Plans are shaping up to have Git Merge 2019 in Brussels right after
> > FOSDEM in February (like it was two years ago), with a contributor
> > summit attached.
> >
> > Are there people who would be more likely to attend a contributor
> > summit if it were held elsewhere (e.g., in North America, probably
> > in the Bay Area)? Are people interested in attending a separate
> > contributor summit not attached to the larger Git Merge (and if so,
> > is there any other event it might be worth connecting it with,
> > time-wise)? Are people interested in going to two summits in a year
> > (e.g., Brussels in February, and then maybe some in North America
> > later in the year), or is that diminishing returns?
>
> Convincing my employer to send me to an event in North America is a lot
> easier than one in Europe; they mostly allow me to work on git stuff as
> a side project just to make me happy rather than as a business
> priority, so competing business interests, shifting managers, etc. make
> things hard for me to predict (so you may want to weight my preferences
> less than normal).
>
> My last manager did say they'd send me to the next contributor summit
> (I think even if it ended up being in Europe rather than North
> America), but of course, he was pulled to a different team a few months
> ago, so I'm not sure if that still stands.
>
> On a personal note, I'm also somewhat travel averse. It'd be nice to go
> to a Git conference again (last and only I went to was I think Git
> Together 2011), but I know when it comes close to time to actually
> travel, I'll start questioning my sanity when I said that --
> particularly if it's far away or at all complicated. (So maybe you
> really ought to discount my preferences...)

Unrelated directly to above, I noticed that I actually showed up on the
list for 2018 based on git log anyway. Does this mean I'd be welcome?

Personally, it's actually easier to get approval to travel to Brussels
now than SFO even though the latter is closer. With that in mind, I can
do either (or both - depending on scheduling).

Cheers,
Randall
Re: [PATCH] mingw: enable atomic O_APPEND
On Mon, Aug 13 2018, Jeff King wrote:

> On Mon, Aug 13, 2018 at 11:22:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > O_APPEND is POSIX and means race-free append. If you mark some call
>> > sites with O_APPEND, then that must be the ones that need race-free
>> > append. Hence, you would have to go the other route: Mark those call
>> > sites that do _not_ need race-free append with some custom
>> > function/macro. (Or mark both with different helpers and avoid
>> > writing down O_APPEND.)
>>
>> O_APPEND in POSIX is race-free only up to PIPE_MAX bytes written at a
>> time, which is e.g. 2^12 by default on linux, after that all bets are
>> off and the kernel is free to interleave different write calls.

[I should have said PIPE_BUF, not PIPE_MAX]

> This is a claim I've run across often, but I've never seen a good
> citation for it.

To clarify I'm claiming that this is not a guarantee POSIX or linux
support. Not that in practice it doesn't work on some systems.

The relevant POSIX docs are here:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

    Write requests of {PIPE_BUF} bytes or less shall not be interleaved
    with data from other processes doing writes on the same pipe. Writes
    of greater than {PIPE_BUF} bytes may have data interleaved, on
    arbitrary boundaries, with writes by other processes, whether or not
    the O_NONBLOCK flag of the file status flags is set.

And the Linux docs at http://man7.org/linux/man-pages/man7/pipe.7.html
say something similar:

    Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may
    interleave the data with data written by other processes. POSIX.1
    requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is
    4096 bytes.)

> Certainly atomic writes to _pipes_ are determined by PIPE_BUF (which
> IIRC is not even a constant on Linux, but can be changed at run-time).
> But is it relevant for regular-file writes?

I believe it's hardcoded at PIPE_BUF which is defined as PAGE_SIZE on
linux. I think you may be thinking of /proc/sys/fs/pipe-max-pages which
is the number of pages that a pipe will take before filling up, but I
may be wrong.

> Another gem I found while digging on this O_APPEND/FILE_APPEND_DATA
> stuff the other day: somebody claimed that the max atomic-append size
> on Linux is 4096 and 1024 on Windows. But their experimental script was
> done in bash! So I suspect they were really just measuring the size of
> stdio buffers.

Yes, and some tests for this are wrong because they use e.g. "print" in
some higher-level language which'll always split stuff into writes of
1024 or something.

> Here's my attempt at a test setup. This C program forces two processes
> to write simultaneously to the same file with O_APPEND:
>
> -- >8 --
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/stat.h>
>
> static void doit(int size, const char *fn, char c)
> {
> 	int fd;
> 	char *buf;
>
> 	fd = open(fn, O_WRONLY|O_APPEND|O_CREAT, 0666);
> 	if (fd < 0) {
> 		perror("open");
> 		return;
> 	}
>
> 	buf = malloc(size);
> 	memset(buf, c, size);
>
> 	while (1)
> 		write(fd, buf, size);
> }
>
> int main(int argc, const char **argv)
> {
> 	int size = atoi(argv[1]);
>
> 	if (fork())
> 		doit(size, argv[2], '1');
> 	else
> 		doit(size, argv[2], '2');
> 	return 0;
> }
> -- 8< --
>
> and then this program checks that we saw atomic units of the correct
> size:
>
> -- >8 --
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> int main(int argc, const char **argv)
> {
> 	int size = atoi(argv[1]);
> 	char *buf;
>
> 	buf = malloc(size);
> 	while (1) {
> 		int i;
> 		/* assume atomic reads, i.e., no signals */
> 		int r = read(0, buf, size);
> 		if (!r)
> 			break;
> 		for (i = 1; i < size; i++) {
> 			if (buf[i] != buf[0]) {
> 				fprintf(stderr, "overlap\n");
> 				return 1;
> 			}
> 		}
> 	}
> 	return 0;
> }
> -- 8< --
>
> And then you can do something like:
>
>   for size in 4097 8193 16385 32769 65537 131073 262145 524289 1048577; do
>     >out ;# clean up from last run
>     echo "Trying $size..."
>     timeout 5 ./write $size out
>     if ! ./check $size <out
>     then
>       echo "$size failed"
>       break
>     fi
>   done
>
> On my Linux system, each of those seems to write several gigabytes
> without overlapping. I did manage to hit some failing cases, but they
> were never sheared writes, but rather cases where there was an
> incomplete write at the end-of-file.

Yeah I can't make that fail experimentally either, except in the case
you mentioned. I also checked on a NetBSD & OpenBSD and OpenBSD box I
have access to, same thing.

> So obviously this is all a bit of a tangent. I'd be fine declaring that
> trace
Re: Contributor Summit planning
On Tue, Aug 14 2018, Randall S. Becker wrote: > On August 14, 2018 2:53 AM, Elijah Newren wrote: >> On Mon, Aug 13, 2018 at 10:27 AM Jeff King wrote: >> > >> > For the past several years, we've held a Git Contributor Summit as >> > part of the Git Merge conference. I'd like to get opinions from the >> > community to help plan future installments. Any feedback or opinion is >> > welcome, but some obvious things to think about: >> > >> > - where, when, and how often? >> > >> > Plans are shaping up to have Git Merge 2019 in Brussels right after >> > FOSDEM in February (like it was two years ago), with a contributor >> > summit attached. >> > >> > Are there people who would be more likely to attend a contributor >> > summit if it were held elsewhere (e.g., in North America, probably >> > in the Bay Area)? Are people interested in attending a separate >> > contributor summit not attached to the larger Git Merge (and if so, >> > is there any other event it might be worth connecting it with, >> > time-wise)? Are people interested in going to two summits in a year >> > (e.g., Brussels in February, and then maybe some in North America >> > later in the year), or is that diminishing returns? >> >> Convincing my employer to send me to an event in North America is a lot >> easier than one in Europe; they mostly allow me to work on git stuff as a >> side >> project just to make me happy rather than as a business priority, so >> competing business interests, shifting managers, etc. make things hard for >> me to predict (so you may want to weight my preferences less than normal). >> >> My last manger did say they'd send me to the next contributor summit (I >> think even if it ended up being in Europe rather than North America), but of >> course, he was pulled to a different team a few months ago, so I'm not sure >> if that still stands. >> >> >> On a personal note, I'm also somewhat travel averse. 
It'd be nice to go to a >> Git conference again (last and only I went to was I think Git Together 2011), >> but I know when it comes close to time to actually travel, I'll start >> questioning my sanity when I said that -- particularly if it's far away or >> at all >> complicated. (So maybe you really ought to discount my preferences...) > > Unrelated directly to above, I noticed that I actually showed up on > the list for 2018 based on git log anyway. Does this mean I'd be > welcome? Personally, it's actually easier to get approval to travel to > Brussels now than SFO even though the latter is closer. With that in > mind, I can do either (or both - depending on scheduling). I very much regret sending that 'git log' command without some further explanation. It was only meant as a *very* rough shortlist of people in the context of a discussion of why some active contributors don't come to the contributor summit. I.e. whether that's because of the location, timing or whatever. Any output from such a command definitely doesn't mean "you shouldn't come to the contributor summit because this one-liner doesn't list you". I only meant to suggest that it would be interesting as a heuristic to solicit feedback from people who *are* very active contributors to the main git project who don't show up, to see why that is. Only because it might be indicative of why some people who'd otherwise don't show up don't show up.
Re: Contributor Summit planning
On Tue, Aug 14, 2018 at 09:25:41AM -0400, Randall S. Becker wrote: > Unrelated directly to above, I noticed that I actually showed up on > the list for 2018 based on git log anyway. Does this mean I'd be > welcome? Personally, it's actually easier to get approval to travel to > Brussels now than SFO even though the latter is closer. With that in > mind, I can do either (or both - depending on scheduling). You'd definitely be welcome. The point of having it "only developers" is mostly to keep the numbers at a point where we can all sit around and have round-table discussions. I don't think there are so many people at the fringe of "well, I only have a few commits, is that enough?" for us to need to make any kind of serious cut-off there. What I think we want to avoid is random folks in the "I use Git, and it would be neat to see people talk about it" camp. It would be nice to accommodate that (and it might even manage to suck somebody into working on the project). But that opens up a much larger pool of people, and if (say) a hundred of them want to come, that wrecks the intimate round-table approach. That's all just my opinion, of course. I'm open to suggestions. -Peff
Re: Contributor Summit planning
On Mon, Aug 13, 2018 at 11:06 PM Jeff King wrote: > > On Mon, Aug 13, 2018 at 01:41:51PM -0700, Stefan Beller wrote: > > > > Oh, using "git shortlog" might be also simpler ;-) > > > > I guess you'd need to memorize a different set of flags for that > > as without -s it would be harder to parse than the oneliner above. > > I frequently using "git shortlog -ns" to see who is active (especially > coupled with "--since=". > > I also use "--no-merges", because it makes me look a lot better when > compared relatively to Junio. :) --no-merges makes me number one. Not sure if I should laugh or cry :D Going off topic a bit, can we count the number of topics of each contributor? I could do it by decorating git log with remote refs from Junio's repo and counting based on the two-letter prefix in the topic/ref name but that's too hacky. fyi Jeff you're up to second position now with 34 topics (I'm unfortunately still the first with 38). -- Duy
Re: Contributor Summit planning
On Tue, Aug 14, 2018 at 04:06:23PM +0200, Ævar Arnfjörð Bjarmason wrote: > I very much regret sending that 'git log' command without some further > explanation. > > It was only meant as a *very* rough shortlist of people in the context > of a discussion of why some active contributors don't come to the > contributor summit. I.e. whether that's because of the location, timing > or whatever. > > Any output from such a command definitely doesn't mean "you shouldn't > come to the contributor summit because this one-liner doesn't list > you". Amen. > I only meant to suggest that it would be interesting as a heuristic to > solicit feedback from people who *are* very active contributors to the > main git project who don't show up, to see why that is. Only because it > might be indicative of why some people who'd otherwise don't show up > don't show up. I've bugged people privately in the past to see if they want to come, and I think the limiting factor is usually just time/effort to travel. -Peff
Re: Contributor Summit planning
On Tue, Aug 14, 2018 at 06:31:50AM +0200, Christian Couder wrote: > > We have been kicking around the thought of reviving the GitTogethers > > like back in the olden days (I only know them from hearsay), in > > Mountain View or Sunnyvale at the Google Campus, but we have not yet > > spent enough thought to make it so. > > I think it would be great to have GitTogethers again around the time > of the GSoC Mentor Summit like we did a long time ago! Yeah, that's an interesting concept. In addition to amortizing travel for one or maybe two Git devs who are mentors, it also allowed us to pull in other open source folks who were tangential to Git (e.g., I remember Joey Hess of git-annex fame came one year). On the other hand, we can only send two mentors to the summit, so there is no draw at all for the rest of the folks. :) Timing-wise, it may be getting a little close to plan this year, as it's exactly 2 months away (and I'd think many people, especially coming from Europe, would already have made travel plans). We could start next year, but that's 14 months away. > > Personally I think the way is fine; we could collect topics in advance on > > the list to have a head start, but the whiteboard is totally fine, IMHO. > > Yeah for the GitTogethers we used to collect topics in advance, but we > still had a whiteboard and voted on them at the beginning of the > actual GitTogether. Heh. Every year I ask for topics on the list, and the response leads me to believe that people are happy white-boarding in the morning. ;) -Peff
Re: [PATCH] mingw: enable atomic O_APPEND
Jeff Hostetler writes: > On 8/13/2018 3:02 PM, Johannes Sixt wrote: >> >> Fortunately, Windows does support atomic O_APPEND semantics using the >> file access mode FILE_APPEND_DATA. Provide an implementation that does. >> ... >> >> Diagnosed-by: Johannes Schindelin >> Helped-by: Jeff Hostetler >> Signed-off-by: Johannes Sixt >> --- > > This looks good. Thanks for following up on this. Thanks, all. Will queue.
Re: Contributor Summit planning
On Tue, Aug 14, 2018 at 04:30:09PM +0200, Duy Nguyen wrote: > > I frequently using "git shortlog -ns" to see who is active (especially > > coupled with "--since=". > > > > I also use "--no-merges", because it makes me look a lot better when > > compared relatively to Junio. :) > > --no-merges makes me number one. Not sure if I should laugh or cry :D Since when? Junio still has everyone beat for all time, though of course he cheats with easy ones like "update version field to v2.17.1". :) I also sometimes look at "shortlog -ns --no-merges v2.17.0..v2.18.0" and so on (i.e., each major release). I had a good run from about v2.10 to v2.15, but I've been slipping since then. > Going off topic a bit, can we count the number of topics of each > contributor? I could do it by decorating git log with remote refs from > Junio's repo and counting based on the two-letter prefix in the > topic/ref name but that's too hacky. fyi Jeff you're up to second > position now with 34 topics (I'm unfortunately still the first with > 38). One problem there is that the prefixes are ambiguous (e.g., Jacob Keller shares with me, and I think at least one other over the years). You could look at the author of the tip commit, but that's not always right (and in fact, counting just merged topics misses bug-fixes that get applied directly on top of other people's topics). And of course there's the notion that "topic" might be a documentation typo fix, or it might be the entire range-diff program. I think "surviving lines" is another interesting metric, though it also has flaws (if I s/sha1/oid/ on your line, it becomes my line; even though my change is useful and should be counted, it's probably not as important as whatever the code was doing in the first place). -Peff
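[Editor's note] The prefix-based topic count Duy mentions can be sketched in shell. The branch names below are made-up stand-ins for refs fetched from Junio's repository; real input would come from something like `git for-each-ref`, and the two-letter prefixes carry exactly the ambiguity Peff points out:

```shell
# Count topics per two-letter contributor prefix (e.g. "jk/", "nd/").
count_topics () {
	sed 's|/.*||' | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Hypothetical sample input standing in for real topic-branch names:
printf '%s\n' jk/range-diff jk/trace nd/clone-case jk/pack | count_topics
```

This is only a heuristic: as noted above, shared prefixes and bug fixes applied onto other people's topics both skew the numbers.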
Re: [PATCH] mingw: enable atomic O_APPEND
On Tue, Aug 14, 2018 at 03:47:54PM +0200, Ævar Arnfjörð Bjarmason wrote: > The relevant POSIX docs are here: > http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html > > Write requests of {PIPE_BUF} bytes or less shall not be interleaved > with data from other processes doing writes on the same pipe. Writes > of greater than {PIPE_BUF} bytes may have data interleaved, on > arbitrary boundaries, with writes by other processes, whether or not > the O_NONBLOCK flag of the file status flags is set. Right, this is the part I've seen, but it's pretty clearly only about pipes, not regular files. > > Certainly atomic writes to _pipes_ are determined by PIPE_BUF (which > > IIRC is not even a constant on Linux, but can be changed at run-time). > > But is it relevant for regular-file writes? > > I believe it's hardcoded at PIPE_BUF which is defined as PAGE_SIZE on > linux. I think you may be thinking of /proc/sys/fs/pipe-max-pages which > is the number of pages that a pipe will take before filling up, but I > may be wrong. Yeah, you're probably right. > > So obviously this is all a bit of a tangent. I'd be fine declaring that > > trace output is generally small enough not to worry about this in the > > first place. But those results show that it shouldn't matter even if > > we're writing 1MB trace lines on Linux. I wouldn't be at all surprised > > to see different results on other operating systems, though. > > I don't know how this works internally in the kernel, but I'd be very > careful to make that assertion. Most likely this in practice depends on > what FS you're using, its mount options, whether fsync() is involved, > I/O pressure etc. Definitely it depends on the filesystem (and I'm pretty sure that at least old versions of NFS could not possibly do O_APPEND correctly, because the protocol did not support an atomic seek+write). I agree that the experiment I did doesn't really tell us anything for sure. 
It _seems_ to work, but the machine was not under any kind of memory or I/O pressure. I'd feel pretty confident that writes under a page are always going to be fine, but anything else is "seems to work". To me that's enough for tracing, as the absolute worst case is jumbled output, not an on-disk corruption.

> FWIW this is something I've ran into in the past on Linux as a practical
> matter, but that was many kernel versions ago, so maybe the semantics
> changed.
>
> We had an ad-hoc file format with each chunk a
> "<start marker>\n<content>\n<end marker>\n" format (and the <content>
> was guaranteed not to contain "\n"). These would be written to the same
> file by N workers. We would get corrupt data because of this in cases
> where we were writing more than PIPE_BUF, e.g. start markers for
> unrelated payloads interleaved with content, lines that were
> incorrectly formed etc.

Interesting. I wonder if it is because of PIPE_BUF, or it is simply the page size, which also happens to be the value of PIPE_BUF.

> But yeah, whether this is a practical concern for git trace output is
> another matter. I just wanted to chime in to note that atomic appends to
> files are only portable on POSIX up to PIPE_BUF.

I still think POSIX doesn't say anything either way here. The PIPE_BUF bits are _just_ about pipes. At any rate, I think we have a decent handle on what systems actually _do_, which is more important than POSIX anyway.

-Peff
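[Editor's note] The multi-writer record scenario described above can be sketched as follows. This is a minimal illustration, not git code: the record layout and the helper name are made up, and the key idea is simply that the whole record is assembled in memory and emitted with a single write(2) on an O_APPEND descriptor, so concurrent writers cannot interleave inside a record as long as the size stays within the platform's atomic-append limit:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Append one "<marker>\n<payload>\n" record with a single write(2). */
static int append_record(const char *path, const char *marker,
			 const char *payload)
{
	char buf[4096]; /* stay at or below a typical PIPE_BUF */
	int len, fd;
	ssize_t written;

	len = snprintf(buf, sizeof(buf), "%s\n%s\n", marker, payload);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -1; /* too large for a plausibly-atomic append */

	fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0666);
	if (fd < 0)
		return -1;
	written = write(fd, buf, len);
	close(fd);
	return written == len ? 0 : -1;
}
```

Assembling the record first is the part that matters: two write() calls per record (marker, then payload) would reintroduce exactly the interleaving Ævar observed.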
Re: [PATCH v6 11/21] range-diff: add tests
Hi Thomas, On Mon, 13 Aug 2018, Thomas Gummerer wrote: > On 08/13, Thomas Rast via GitGitGadget wrote: > > From: Thomas Rast > > > > These are essentially lifted from https://github.com/trast/tbdiff, with > > light touch-ups to account for the command now being named `git > > range-diff`. > > > > Apart from renaming `tbdiff` to `range-diff`, only one test case needed > > to be adjusted: 11 - 'changed message'. > > > > The underlying reason it had to be adjusted is that diff generation is > > sometimes ambiguous. In this case, a comment line and an empty line are > > added, but it is ambiguous whether they were added after the existing > > empty line, or whether an empty line and the comment line are added > > *before* the existing empty line. And apparently xdiff picks a different > > option here than Python's difflib. > > > > Just noticed while reading the whole series again (hopefully for the > last time :)), do we need Thomas Rast's Sign-off here, as he is > credited as the author here? Hmm. I hoped that my commit message was enough to indicate that while he is the author, I assembled this. Maybe I should move him to the footer, as an Original-Authored-By:? Junio? 
Ciao, Dscho > > > Signed-off-by: Johannes Schindelin > > --- > > t/.gitattributes | 1 + > > t/t3206-range-diff.sh | 145 ++ > > t/t3206/history.export | 604 + > > 3 files changed, 750 insertions(+) > > create mode 100755 t/t3206-range-diff.sh > > create mode 100644 t/t3206/history.export > > > > diff --git a/t/.gitattributes b/t/.gitattributes > > index 3bd959ae5..b17bf71b8 100644 > > --- a/t/.gitattributes > > +++ b/t/.gitattributes > > @@ -1,6 +1,7 @@ > > t[0-9][0-9][0-9][0-9]/* -whitespace > > /diff-lib/* eol=lf > > /t0110/url-* binary > > +/t3206/* eol=lf > > /t3900/*.txt eol=lf > > /t3901/*.txt eol=lf > > /t4034/*/* eol=lf > > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh > > new file mode 100755 > > index 0..2237c7f4a > > --- /dev/null > > +++ b/t/t3206-range-diff.sh > > @@ -0,0 +1,145 @@ > > +#!/bin/sh > > + > > +test_description='range-diff tests' > > + > > +. ./test-lib.sh > > + > > +# Note that because of the range-diff's heuristics, test_commit does more > > +# harm than good. We need some real history. 
> > + > > +test_expect_success 'setup' ' > > + git fast-import < "$TEST_DIRECTORY"/t3206/history.export > > +' > > + > > +test_expect_success 'simple A..B A..C (unmodified)' ' > > + git range-diff --no-color master..topic master..unmodified \ > > + >actual && > > + cat >expected <<-EOF && > > + 1: 4de457d = 1: 35b9b25 s/5/A/ > > + 2: fccce22 = 2: de345ab s/4/A/ > > + 3: 147e64e = 3: 9af6654 s/11/B/ > > + 4: a63e992 = 4: 2901f77 s/12/B/ > > + EOF > > + test_cmp expected actual > > +' > > + > > +test_expect_success 'simple B...C (unmodified)' ' > > + git range-diff --no-color topic...unmodified >actual && > > + # same "expected" as above > > + test_cmp expected actual > > +' > > + > > +test_expect_success 'simple A B C (unmodified)' ' > > + git range-diff --no-color master topic unmodified >actual && > > + # same "expected" as above > > + test_cmp expected actual > > +' > > + > > +test_expect_success 'trivial reordering' ' > > + git range-diff --no-color master topic reordered >actual && > > + cat >expected <<-EOF && > > + 1: 4de457d = 1: aca177a s/5/A/ > > + 3: 147e64e = 2: 14ad629 s/11/B/ > > + 4: a63e992 = 3: ee58208 s/12/B/ > > + 2: fccce22 = 4: 307b27a s/4/A/ > > + EOF > > + test_cmp expected actual > > +' > > + > > +test_expect_success 'removed a commit' ' > > + git range-diff --no-color master topic removed >actual && > > + cat >expected <<-EOF && > > + 1: 4de457d = 1: 7657159 s/5/A/ > > + 2: fccce22 < -: --- s/4/A/ > > + 3: 147e64e = 2: 43d84d3 s/11/B/ > > + 4: a63e992 = 3: a740396 s/12/B/ > > + EOF > > + test_cmp expected actual > > +' > > + > > +test_expect_success 'added a commit' ' > > + git range-diff --no-color master topic added >actual && > > + cat >expected <<-EOF && > > + 1: 4de457d = 1: 2716022 s/5/A/ > > + 2: fccce22 = 2: b62accd s/4/A/ > > + -: --- > 3: df46cfa s/6/A/ > > + 3: 147e64e = 4: 3e64548 s/11/B/ > > + 4: a63e992 = 5: 12b4063 s/12/B/ > > + EOF > > + test_cmp expected actual > > +' > > + > > +test_expect_success 'new base, A B C' ' > > + 
git range-diff --no-color master topic rebased >actual && > > + cat >expected <<-EOF && > > + 1: 4de457d = 1: cc9c443 s/5/A/ > > + 2: fccce22 = 2: c5d9641 s/4/A/ > > + 3: 147e64e = 3: 28cc2b6 s/11/B/ > > + 4: a63e992 = 4: 5628ab7 s/12/B/ > > + EOF > > + test_cmp expected actual > > +' > > + > > +test_expect_success 'new base, B...C' ' > > + # this syntax includes the commits from master! > > + git range-diff --no-color topic...rebased >actual && > > + cat >expected <<-EOF && > > + -: --- > 1: a31b12e unre
Re: [PATCH v6 11/21] range-diff: add tests
On Tue, Aug 14, 2018 at 04:53:51PM +0200, Johannes Schindelin wrote: > > > These are essentially lifted from https://github.com/trast/tbdiff, with > > > light touch-ups to account for the command now being named `git > > > range-diff`. > [...] > > Just noticed while reading the whole series again (hopefully for the > > last time :)), do we need Thomas Rast's Sign-off here, as he is > > credited as the author here? > > Hmm. I hoped that my commit message was enough to indicate that while he > is the author, I assembled this. Maybe I should move him to the footer, as > an Original-Authored-By:? I think the "Author" field is actually distinct from the copyright provenance. In this case it ought to be perfectly fine to add your signed-off-by under the DCO's point b: The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications [...] This is based on the tests in tbdiff, which is explicitly GPL'd by Thomas. So your signoff certifies that, which is fine. As for the author field, IMHO it serves two purposes: - to give credit where it is due - so that people digging in history know who to contact for questions/problems In this case it probably makes sense for it to be you, as you'd take responsibility for the code in _this_ project. And as you note, you can give credit in the commit message (the only unfortunate thing is that most automated statistics would not credit Thomas, but in theory they could by mentioning him in the trailer). -Peff
Re: [PATCHv2 0/8] Resending sb/range-diff-colors
Hi Stefan, On Mon, 13 Aug 2018, Stefan Beller wrote: > This applies on top of js/range-diff (a7be92acd9600), this version > * resolves a semantic conflict in the test > (Did range-diff change its output slightly?) If by semantic conflict you mean... > 23: 6a1c7698c68 ! 23: 5701745379b t3206: add color test for range-diff > --dual-color > @@ -23,7 +23,7 @@ > +: s/4/A/ > +: > +:+Also a silly comment > here! > -+:+ > ++:+ ... this, then I do not understand. This looks like a straight-up change in your commit. v1 of this patch did not append to the end, while v2 apparently does. And it looks a bit funny, indeed. In any case, from what I see you indeed addressed all my comments (the need_reset was done differently than I suggested, but still okay). Ciao, Dscho > +: diff --git a/file b/file > +: --- a/file > +: +++ b/file > 24: 7e12ece1d34 = 24: 4e56f63a5f5 diff.c: simplify caller of emit_line_0 > 25: 74dabd6d36f ! 25: cf126556940 diff.c: reorder arguments for > emit_line_ws_markup > @@ -3,7 +3,8 @@ > diff.c: reorder arguments for emit_line_ws_markup > > The order shall be all colors first, then the content, flags at the > end. > -The colors are in order. > +The colors are in the order of occurrence, i.e. first the color for > the > +sign and then the color for the rest of the line. > > Signed-off-by: Stefan Beller > Signed-off-by: Junio C Hamano > 26: e304e15aa6b ! 26: 69008364cb8 diff.c: add set_sign to emit_line_0 > @@ -2,14 +2,17 @@ > > diff.c: add set_sign to emit_line_0 > > -For now just change the signature, we'll reason about the actual > -change in a follow up patch. > +Split the meaning of the `set` parameter that is passed to > +emit_line_0()` to separate between the color of the "sign" (i.e. > +the diff marker '+', '-' or ' ' that is passed in as the `first` > +parameter) and the color of the rest of the line. > > -Pass 'set_sign' (which is output before the sign) and 'set' which > -controls the color after the first character. 
Hence, promote any > -'set's to 'set_sign' as we want to have color before the sign > -for now. > +This changes the meaning of the `set` parameter to no longer refer > +to the color of the diff marker, but instead to refer to the color > +of the rest of the line. A value of `NULL` indicates that the rest > +of the line wants to be colored the same as the diff marker. > > +Helped-by: Johannes Schindelin > Signed-off-by: Stefan Beller > Signed-off-by: Junio C Hamano > > @@ -30,12 +33,15 @@ > if (reverse && want_color(o->use_color)) > fputs(GIT_COLOR_REVERSE, file); > -fputs(set, file); > -+if (set_sign && set_sign[0]) > ++if (set_sign) > +fputs(set_sign, file); > if (first && !nofirst) > fputc(first, file); > -+if (set) > ++if (set && set != set_sign) { > ++if (set_sign) > ++fputs(reset, file); > +fputs(set, file); > ++} > fwrite(line, len, 1, file); > fputs(reset, file); > } > 27: 8d935d5212c < -: --- diff: use emit_line_0 once per line > 28: 2344aac787a < -: --- diff.c: compute reverse locally in > emit_line_0 > -: --- > 27: 5d2629281a1 diff: use emit_line_0 once per line > -: --- > 28: 5240e94a69a diff.c: omit check for line prefix in > emit_line_0 > 29: 4dc97b54a35 ! 29: 058e03a1601 diff.c: rewrite emit_line_0 more > understandably > @@ -2,21 +2,15 @@ > > diff.c: rewrite emit_line_0 more understandably > > -emit_line_0 has no nested conditions, but puts out all its arguments > -(if set) in order. There is still a slight confusion with set > -and set_sign, but let's defer that to a later patch. > +Rewrite emit_line_0 to have fewer (nested) conditions. > > -'first' used be output always no matter if it was 0, but that got > lost > -at "diff: add an internal option to dual-color diffs of diffs", > -2018-07-21), as there we broadened the meaning of 'first' to also > -signal an early return. 
> - > -The change in 'emit_line' makes sure that 'first' is never content, > but > -always under our control, a sign or special character in the > beginning > -of the line (or 0, in which case
Re: [PATCH v6 11/21] range-diff: add tests
On Tue, Aug 14, 2018 at 11:03:10AM -0400, Jeff King wrote: > > Hmm. I hoped that my commit message was enough to indicate that while he > > is the author, I assembled this. Maybe I should move him to the footer, as > > an Original-Authored-By:? > > I think the "Author" field is actually distinct from the copyright > provenance. In this case it ought to be perfectly fine to add your > signed-off-by under the DCO's point b: > > The contribution is based upon previous work that, to the best > of my knowledge, is covered under an appropriate open source > license and I have the right under that license to submit that > work with modifications [...] > > This is based on the tests in tbdiff, which is explicitly GPL'd by > Thomas. So your signoff certifies that, which is fine. > > As for the author field, IMHO it serves two purposes: > > - to give credit where it is due > > - so that people digging in history know who to contact for > questions/problems > > In this case it probably makes sense for it to be you, as you'd take > responsibility for the code in _this_ project. And as you note, you can > give credit in the commit message (the only unfortunate thing is that > most automated statistics would not credit Thomas, but in theory they > could by mentioning him in the trailer). One thing I should have made clear: this is all my opinion, and anything Thomas expresses trumps that. But since he hasn't been active lately, this is all what I would do in the absence of input from him. Obviously a sign-off from him is better than none. :) -Peff
Re: [PATCH 1/1] mark_colliding_entries(): fix incorrect #if...#endif guard
"Johannes Schindelin via GitGitGadget" writes: > From: Johannes Schindelin > > The way the guard was put, the code was declaring an unused variable on > Windows. No need to do that, so let's fix it. > > Signed-off-by: Johannes Schindelin > --- > entry.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/entry.c b/entry.c > index c70340df8..2bce13352 100644 > --- a/entry.c > +++ b/entry.c > @@ -402,11 +402,9 @@ static int check_path(const char *path, int len, struct > stat *st, int skiplen) > static void mark_colliding_entries(const struct checkout *state, > struct cache_entry *ce, struct stat *st) > { > +#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > int i; > > - ce->ce_flags |= CE_MATCHED; > - > -#if !defined(GIT_WINDOWS_NATIVE) /* inode is always zero on Windows */ > for (i = 0; i < state->istate->cache_nr; i++) { > struct cache_entry *dup = state->istate->cache[i]; > > @@ -422,6 +420,8 @@ static void mark_colliding_entries(const struct checkout > *state, > } > } > #endif > + > + ce->ce_flags |= CE_MATCHED; > } Even though it looks a bit odd to smudge 'ce' itself after the loop, I think this would not break anything, simply because the loop does treat the 'ce' as special and stops the iteration once it sees it, even before its MATCHED bit is set. Duy, if you are going to send a new version updated for other reasons, please squash this in. In the meantime, I'll queue it on top of what we have right now. Thanks, both.
Re: [PATCH v3 5/5] list-objects-filter: implement filter tree:0
On 8/13/2018 2:14 PM, Matthew DeVore wrote: Teach list-objects the "tree:0" filter which allows for filtering out all tree and blob objects (unless other objects are explicitly specified by the user). The purpose of this patch is to allow smaller partial clones. The name of this filter - tree:0 - does not explicitly specify that it also filters out all blobs, but this should not cause much confusion because blobs are not at all useful without the trees that refer to them. I also consider only:commits as a name, but this is inaccurate because it suggests that annotated tags are omitted, but actually they are included. The name "tree:0" allows later filtering based on depth, i.e. "tree:1" would filter out all but the root tree and blobs. In order to avoid confusion between 0 and capital O, the documentation was worded in a somewhat round-about way that also hints at this future improvement to the feature. Signed-off-by: Matthew DeVore --- Documentation/rev-list-options.txt | 3 ++ list-objects-filter-options.c | 4 +++ list-objects-filter-options.h | 1 + list-objects-filter.c | 50 ++ t/t5317-pack-objects-filter-objects.sh | 27 ++ t/t5616-partial-clone.sh | 27 ++ t/t6112-rev-list-filters-objects.sh| 13 +++ 7 files changed, 125 insertions(+) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 7b273635d..9e351ec2a 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -743,6 +743,9 @@ specification contained in . A debug option to help with future "partial clone" development. This option specifies how missing objects are handled. + +The form '--filter=tree:' omits all blobs and trees deeper than + from the root tree. Currently, only =0 is supported. ++ The form '--missing=error' requests that rev-list stop with an error if a missing object is encountered. This is the default action. 
+ diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index c0e2bd6a0..a28382940 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter( return 0; } + } else if (!strcmp(arg, "tree:0")) { + filter_options->choice = LOFC_TREE_NONE; + return 0; + } else if (skip_prefix(arg, "sparse:oid=", &v0)) { struct object_context oc; struct object_id sparse_oid; diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h index a61f8..af64e5c66 100644 --- a/list-objects-filter-options.h +++ b/list-objects-filter-options.h @@ -10,6 +10,7 @@ enum list_objects_filter_choice { LOFC_DISABLED = 0, LOFC_BLOB_NONE, LOFC_BLOB_LIMIT, + LOFC_TREE_NONE, LOFC_SPARSE_OID, LOFC_SPARSE_PATH, LOFC__COUNT /* must be last */ diff --git a/list-objects-filter.c b/list-objects-filter.c index a0ba78b20..8e3caf5bf 100644 --- a/list-objects-filter.c +++ b/list-objects-filter.c @@ -80,6 +80,55 @@ static void *filter_blobs_none__init( return d; } +/* + * A filter for list-objects to omit ALL trees and blobs from the traversal. + * Can OPTIONALLY collect a list of the omitted OIDs. 
+ */ +struct filter_trees_none_data { + struct oidset *omits; +}; + +static enum list_objects_filter_result filter_trees_none( + enum list_objects_filter_situation filter_situation, + struct object *obj, + const char *pathname, + const char *filename, + void *filter_data_) +{ + struct filter_trees_none_data *filter_data = filter_data_; + + switch (filter_situation) { + default: + die("unknown filter_situation"); + return LOFR_ZERO; + + case LOFS_BEGIN_TREE: + case LOFS_BLOB: + if (filter_data->omits) + oidset_insert(filter_data->omits, &obj->oid); + return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */ + + case LOFS_END_TREE: + assert(obj->type == OBJ_TREE); + return LOFR_ZERO; + + } +} There are a couple of options here: [] If really want to omit all trees and blobs (and we DO NOT want the oidset of everything omitted), then we might be able to shortcut the traversal and speed things up. {} add a LOFR_SKIP_TREE bit to list_objects_filter_result {} test this bit process_tree() and avoid the init_tree_desc() and the while loop and some adjacent setup/tear-down code. {} make this filter something like: case LOFS_BEGIN_TREE: if (filter_data->omits) { oidset_insert(filter_data->omits, &obj->oid); return LOFR_MARK_SEEN; /* ... (hard omit) */ } else
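[Editor's note] The reviewer's shortcut suggestion can be sketched with mocked-up types. Everything below is a stand-in: the real enums live in git's list-objects-filter.h, and the LOFR_SKIP_TREE bit is hypothetical at this point in the thread. The idea is that when no omit-set is being collected, BEGIN_TREE can answer with a skip bit so process_tree() never descends into the subtree at all:

```c
#include <assert.h>

/* Mocked-up stand-ins for git's list-objects-filter types. */
enum filter_situation { LOFS_BEGIN_TREE, LOFS_END_TREE, LOFS_BLOB };
enum filter_result {
	LOFR_ZERO      = 0,
	LOFR_MARK_SEEN = 1 << 0,
	LOFR_DO_SHOW   = 1 << 1,
	LOFR_SKIP_TREE = 1 << 2, /* hypothetical new bit */
};

static enum filter_result filter_trees_none(enum filter_situation s,
					    int collecting_omits)
{
	switch (s) {
	case LOFS_BEGIN_TREE:
		if (collecting_omits)
			/* must still walk children to record their oids */
			return LOFR_MARK_SEEN;
		/* nothing to record: prune the whole subtree */
		return LOFR_MARK_SEEN | LOFR_SKIP_TREE;
	case LOFS_BLOB:
		return LOFR_MARK_SEEN; /* hard omit: no LOFR_DO_SHOW */
	case LOFS_END_TREE:
	default:
		return LOFR_ZERO;
	}
}
```

The pruning case is the speedup: process_tree() could test the bit and skip init_tree_desc() and the traversal loop entirely, which matters for large trees in a tree:0 partial clone.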
Re: [PATCH v6 11/21] range-diff: add tests
Johannes Schindelin writes: > Hmm. I hoped that my commit message was enough to indicate that while he > is the author, I assembled this. Maybe I should move him to the footer, as > an Original-Authored-By:? > > Junio? I think the log message gives a clear enough statement to credit the original author. Sign-off is not about credit, but is about making sure we know the provenance of the contribution we would use in our codebase, so it would be nice to have, but lifting code from another project (i.e. TRast's tbdiff) that is appropriately licensed (GPLv2) verbatim is something you can do with your own sign-off, without the original author's sign-off, so I think what we have is good.
Re: [PATCH v2 3/5] rev-list: handle missing tree objects properly
> I don't understand about the ">= 0". What should I replace it with?
> Maybe you mean the return is never positive so I can change:
>
> 	parse_tree_gently(tree, 1) >= 0
>
> to:
>
> 	!parse_tree_gently(tree, 1)
>
> ?

Sorry for the lack of clarity - that is what I meant.

> > The missing mechanism (for error, allow-any, print) should work without
> > needing to consult whether an object is a promisor object or not - it
> > should just print whatever is missing, so the "if
> > (!is_promisor_object..." line looks out of place.

> Done. I considered that a missing object which is not a promisor is a
> serious error, so I had it die here.

It is a serious error, but as far as I can tell, that is what the
--missing flags are supposed to help diagnose (so we can't die since we
need the diagnoses to be printed). See, for example, 'rev-list W/
--missing=print' in t6112 - the "r1" repository does not have partial
clone enabled (which I verified by inserting a test_pause then cat-ting
r1/.git/config), but nothing dies.

> But now that I've added the do_not_die_on_missing_tree flag, it's more
> natural to keep the previous promisor check as-is.

OK, I'll take a look once you send out v4.

> Also, is_promisor_object is an expensive check, and it would be better
> to skip it during the common execution path (which should be when
> exclude_promisor_objects, an internal-use-only flag, is *not* set,
> which means we never call is_promisor_object).

That's true.

> > In my original review [1], I suggested that we always show a tree if we
> > have its hash - if we don't have the object, we just recurse into it.
> > This would be the same as your patch, except that the 'die("bad tree
> > object...' is totally removed instead of merely moved.
> > I still think this solution has some merit - all the tests still
> > pass (except that we need to check for "unable to read" instead of
> > "bad tree object" in error messages), but I just realized that it
> > might still be backwards incompatible in that a basic "rev-list
> > --objects" would now succeed instead of fail if a tree was missing
> > (I haven't tested this though).

> The presence of the die if !is_promisor_object is what justified the
> changing of the parse_tree_gently to always be gently, since it is
> what showed the OID. Can we really remove both? Maybe in a different
> patch set, since I'm no longer touching that line?

That's true - the idea of removing both needs more thought, and if we
were to do so, we definitely can do it in a different patch set.

> > We might need a flag called "do_not_die_on_missing_tree" (much like
> > your original idea of "show_missing_trees") so that callers that are
> > prepared to deal with missing trees can set this. Sorry for the
> > churn. You can document it as such:

> Added it, but not with a command-line flag, only in rev-info.h. We can
> always add a flag later if people have been relying on the existing
> behavior of git rev-list to balk at missing trees. (That seems
> unlikely though, considering there is no filter to enable that before
> this patchset).

By flag, I indeed meant in rev-info.h - sorry for the confusion. That
sounds good.
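The --missing behaviour under discussion is easy to reproduce without a partial clone: delete a loose object by hand and traverse. A small sketch:

```shell
set -e
dir=$(mktemp -d)
git init -q "$dir"
echo data >"$dir/file"
git -C "$dir" add file
git -C "$dir" -c user.name=t -c user.email=t@example.com commit -q -m one
blob=$(git -C "$dir" rev-parse HEAD:file)
# Remove the loose blob object to simulate a missing object.
rm "$dir/.git/objects/$(echo "$blob" | cut -c1-2)/$(echo "$blob" | cut -c3-)"
# Instead of dying, --missing=print reports the object with a '?' prefix.
out=$(git -C "$dir" rev-list --objects --missing=print HEAD)
printf '%s\n' "$out"
```

This is the same shape as the t0410/t6112 tests mentioned above: the traversal completes and the missing OID is diagnosed rather than aborting.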
Re: [PATCH] doc: git-describe
William Pursell writes:

> commit cc4bd5268b2dbe90279198acb400a528ee23f5d5
> Author: William Pursell
> Date: Tue Aug 14 06:41:00 2018 -0600
>
>     doc: Reference for --dirty/--broken

Thanks for trying to help. Overall the resulting text does look much
better than the original, but

 - "git show" output is for your own consumption, it is not something
   you would feed other people (see patches from other people on the
   list, https://public-inbox.org/git/20180814110525.17801-7-...@ao2.it/raw
   for example)

 - Please sign-off your patch (see Documentation/SubmittingPatches)

 - The patch is seriously whitespace damaged and does not apply.
   Please first try sending it to yourself and make sure it applies to
   the commit your change was based on, before sending it to the list.

Thanks.

> diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
> index e027fb8c4b..f7d67306c0 100644
> --- a/Documentation/git-describe.txt
> +++ b/Documentation/git-describe.txt
> @@ -39,11 +39,12 @@ OPTIONS
>  --broken[=<mark>]::
>  	Describe the state of the working tree. When the working
>  	tree matches HEAD, the output is the same as "git describe
> -	HEAD". If the working tree has local modification "-dirty"
> -	is appended to it. If a repository is corrupt and Git
> -	cannot determine if there is local modification, Git will
> -	error out, unless `--broken' is given, which appends
> -	the suffix "-broken" instead.
> +	HEAD". If the working tree has local modification, "-dirty"
> +	(or <mark>, if supplied) is appended to it. If a repository
> +	is corrupt and Git cannot determine if there is local
> +	modification, Git will error out unless `--broken' is given
> +	in which case it will append the suffix "-broken" (or <mark>,
> +	if supplied) instead.
>
>  --all::
>  	Instead of using only the annotated tags, use any ref
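The semantics the patched paragraph describes can be checked against an actual repository; a quick sketch:

```shell
set -e
dir=$(mktemp -d)
git init -q "$dir"
echo v1 >"$dir/file"
git -C "$dir" add file
git -C "$dir" -c user.name=t -c user.email=t@example.com commit -q -m one
git -C "$dir" -c user.name=t -c user.email=t@example.com tag -a v1.0 -m v1.0
clean=$(git -C "$dir" describe --dirty)
echo change >>"$dir/file"
dirty=$(git -C "$dir" describe --dirty)        # default suffix "-dirty"
custom=$(git -C "$dir" describe --dirty=.mod)  # the supplied mark replaces it
echo "$clean $dirty $custom"
```

On the exact tag the output is "v1.0"; once the worktree is modified, the default or supplied mark is appended, which is exactly what the reworded documentation says.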
Re: [PATCHv2 0/8] Resending sb/range-diff-colors
On Tue, Aug 14, 2018 at 8:05 AM Johannes Schindelin wrote:
>
> Hi Stefan,
>
> On Mon, 13 Aug 2018, Stefan Beller wrote:
>
> > This applies on top of js/range-diff (a7be92acd9600), this version
> > * resolves a semantic conflict in the test
> >   (Did range-diff change its output slightly?)
>
> If by semantic conflict you mean...
>
> > 23:  6a1c7698c68 ! 23:  5701745379b t3206: add color test for range-diff
> >     --dual-color
> > @@ -23,7 +23,7 @@
> >  +:      s/4/A/
> >  +:
> >  +:+Also a silly comment
> > here!
> > -+:+
> > ++:+ 
>
> ... this, then I do not understand. This looks like a straight-up change
> in your commit. v1 of this patch did not append to the end,
> while v2 apparently does.

Yes, I did mean this. Both before and now the test passes on top of
js/range-diff (which was resent), so I concluded that the logic in
range-diff itself has changed. However this is changed in a later patch
to not appear any more.

> In any case, from what I see you indeed addressed all my comments (the
> need_reset was done differently than I suggested, but still okay).

Thanks. So unless I hear from other people I consider this series done.
(just like range-diff is not going to be resent but built on top now)

Thanks,
Stefan
Re: Contributor Summit planning
On Tue, Aug 14, 2018 at 7:47 AM Jeff King wrote:
>
> One problem there is that the prefixes are ambiguous (e.g., Jacob Keller
> shares with me, and I think at least one other over the years). You
> could look at the author of the tip commit, but that's not always right
> (and in fact, counting just merged topics misses bug-fixes that get
> applied directly on top of other people's topics). And of course there's
> the notion that "topic" might be a documentation typo fix, or it might
> be the entire range-diff program.

One could take all topics and see whether a given person has at least
one commit in there. But that would mostly measure how much of an
all-rounder you are in the code base (e.g. bug or style fixes such as
Ramsay's "please squash this" would be in many topics if not squashed).
There are other players who are very deep into one area of the code,
and probably have fewer series.

> I think "surviving lines" is another interesting metric, though it also
> has flaws (if I s/sha1/oid/ on your line, it becomes my line; even
> though my change is useful and should be counted, it's probably not as
> important as whatever the code was doing in the first place).

I wonder if we could measure the entropy added to the code base
instead. A patch that does s/sha1/oid/ (or introduction of a repository
argument) might compress very well, whereas new code might not compress
very well. ;-)
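The compression idea at the end can be sketched with nothing more than gzip: a mechanical, repetitive change compresses far better than low-redundancy content. Random base64 text stands in for "genuinely new code" here, purely as an illustration:

```shell
set -e
# A mechanical change: one substitution repeated across many lines.
mechanical=$(i=0; while [ $i -lt 200 ]; do
	echo 's/sha1/oid/ applied to another line'
	i=$((i+1))
done)
# Stand-in for new code: low-redundancy random text of similar length.
novel=$(head -c 4000 /dev/urandom | base64)
# Compressed size is the crude "entropy added" proxy.
mech_size=$(( $(printf '%s' "$mechanical" | gzip -c | wc -c) ))
novel_size=$(( $(printf '%s' "$novel" | gzip -c | wc -c) ))
echo "mechanical=$mech_size bytes compressed, novel=$novel_size bytes compressed"
```

The repetitive "patch" shrinks to a tiny fraction of its size while the random text barely compresses, which is the intuition behind scoring contributions by entropy rather than raw line counts.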
Re: [PATCH v3 3/7] t7411: be nicer to future tests and really clean things up
On 08/14, Antonio Ospite wrote:
> Tests 5 and 8 in t/t7411-submodule-config.sh add two commits with
> invalid lines in .gitmodules but then only the second commit is removed.
>
> This may affect future subsequent tests if they assume that the
> .gitmodules file has no errors.
>
> Remove both the commits as soon as they are not needed anymore.
>
> The error introduced in test 5 is also required by test 6, so the two
> commits from above are removed respectively in tests 6 and 8.

Thanks for cleaning this up. We seem to have a habit of leaving testing
state around for longer than is necessary, which makes the tests a bit
more difficult to read and understand when looking at them later. What
would really be nice is if each test was self-contained... of course
that would take a herculean effort to realize in our test suite, so I'm
not suggesting you do that :)

>
> Signed-off-by: Antonio Ospite
> ---
> t/t7411-submodule-config.sh | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
> index 0bde5850ac..c6b6cf6fae 100755
> --- a/t/t7411-submodule-config.sh
> +++ b/t/t7411-submodule-config.sh
> @@ -98,6 +98,9 @@ test_expect_success 'error in one submodule config lets continue' '
> '
>
> test_expect_success 'error message contains blob reference' '
> +	# Remove the error introduced in the previous test.
> +	# It is not needed in the following tests.
> +	test_when_finished "git -C super reset --hard HEAD^" &&
>  	(cd super &&
>  		sha1=$(git rev-parse HEAD) &&
>  		test-tool submodule-config \
> @@ -123,6 +126,7 @@ test_expect_success 'using different treeishs works' '
> '
>
> test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
> +	test_when_finished "git -C super reset --hard HEAD^" &&
>  	(cd super &&
>  		git config -f .gitmodules \
>  			submodule.submodule.fetchrecursesubmodules blabla &&
> @@ -134,8 +138,7 @@ test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
>  			HEAD b \
>  			HEAD submodule \
>  			>actual &&
> -		test_cmp expect_error actual &&
> -		git reset --hard HEAD^
> +		test_cmp expect_error actual
>  	)
> '
>
> --
> 2.18.0
>

--
Brandon Williams
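Outside git's test framework, the cleanup discipline that `test_when_finished` provides maps onto a plain shell trap; a rough equivalent of the reset-after-test pattern above (the framework version also ties into test accounting, which this ignores):

```shell
set -e
dir=$(mktemp -d)
git init -q "$dir"
git -C "$dir" -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m base
base=$(git -C "$dir" rev-parse HEAD)

# Run the "test body" in a subshell; the trap fires when it exits
# (pass or fail), like test_when_finished "git -C super reset --hard HEAD^".
(
	trap 'git -C "$dir" reset -q --hard "$base"' EXIT
	git -C "$dir" -c user.name=t -c user.email=t@example.com \
		commit -q --allow-empty -m scratch
	# ...assertions against the scratch state would go here...
)
after=$(git -C "$dir" rev-parse HEAD)
echo "base=$base after=$after"
```

Whatever the body does, HEAD is back at the base commit afterwards, so the next test starts from clean state — which is exactly the property the review is asking the t7411 tests to preserve.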
Re: [PATCH v3 4/7] submodule--helper: add a new 'config' subcommand
On 08/14, Antonio Ospite wrote:
> Add a new 'config' subcommand to 'submodule--helper', this extra level
> of indirection makes it possible to add some flexibility to how the
> submodules configuration is handled.
>
> Signed-off-by: Antonio Ospite
> ---
> builtin/submodule--helper.c | 14 ++
> new | 0

Looks like you may have accidentally left in an empty file "new"; it
should probably be removed from this commit before it gets merged.
Aside from that, this patch looks good. I've recently run into issues
where we don't have a good enough abstraction layer around how we
interact with submodules, so I'm glad we're moving towards better
abstractions :)

> t/t7411-submodule-config.sh | 26 ++
> 3 files changed, 40 insertions(+)
> create mode 100644 new
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index a3c4564c6c..7481d03b63 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -2029,6 +2029,19 @@ static int connect_gitdir_workingtree(int argc, const char **argv, const char *p
>  	return 0;
> }
>
> +static int module_config(int argc, const char **argv, const char *prefix)
> +{
> +	/* Equivalent to ACTION_GET in builtin/config.c */
> +	if (argc == 2)
> +		return print_config_from_gitmodules(argv[1]);
> +
> +	/* Equivalent to ACTION_SET in builtin/config.c */
> +	if (argc == 3)
> +		return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
> +
> +	die("submodule--helper config takes 1 or 2 arguments: name [value]");
> +}
> +
> #define SUPPORT_SUPER_PREFIX (1<<0)
>
> struct cmd_struct {
> @@ -2057,6 +2070,7 @@ static struct cmd_struct commands[] = {
>  	{"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
>  	{"is-active", is_active, 0},
>  	{"check-name", check_name, 0},
> +	{"config", module_config, 0},
> };
>
> int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
> diff --git a/new b/new
> new file mode 100644
> index 00..e69de29bb2
> diff --git a/t/t7411-submodule-config.sh
b/t/t7411-submodule-config.sh
> index c6b6cf6fae..4afb6f152e 100755
> --- a/t/t7411-submodule-config.sh
> +++ b/t/t7411-submodule-config.sh
> @@ -142,4 +142,30 @@ test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
>  	)
> '
>
> +test_expect_success 'reading submodules config with "submodule--helper config"' '
> +	(cd super &&
> +		echo "../submodule" >expected &&
> +		git submodule--helper config submodule.submodule.url >actual &&
> +		test_cmp expected actual
> +	)
> +'
> +
> +test_expect_success 'writing submodules config with "submodule--helper config"' '
> +	(cd super &&
> +		echo "new_url" >expected &&
> +		git submodule--helper config submodule.submodule.url "new_url" &&
> +		git submodule--helper config submodule.submodule.url >actual &&
> +		test_cmp expected actual
> +	)
> +'
> +
> +test_expect_success 'overwriting unstaged submodules config with "submodule--helper config"' '
> +	(cd super &&
> +		echo "newer_url" >expected &&
> +		git submodule--helper config submodule.submodule.url "newer_url" &&
> +		git submodule--helper config submodule.submodule.url >actual &&
> +		test_cmp expected actual
> +	)
> +'
> +
> test_done
> --
> 2.18.0
>

--
Brandon Williams
Re: [PATCH v3 5/7] submodule: use the 'submodule--helper config' command
On 08/14, Antonio Ospite wrote:
> Use the 'submodule--helper config' command in git-submodules.sh to avoid
> referring explicitly to .gitmodules by the hardcoded file path.
>
> This makes it possible to access the submodules configuration in a more
> controlled way.
>
> Signed-off-by: Antonio Ospite

Looks great. I also like your approach of introducing the new API and
testing it in one commit, and then using it in the next. It makes the
patch set very easy to follow.

> ---
> git-submodule.sh | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 8b5ad59bde..ff258e2e8c 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -72,7 +72,7 @@ get_submodule_config () {
>  	value=$(git config submodule."$name"."$option")
>  	if test -z "$value"
>  	then
> -		value=$(git config -f .gitmodules submodule."$name"."$option")
> +		value=$(git submodule--helper config submodule."$name"."$option")
>  	fi
>  	printf '%s' "${value:-$default}"
> }
> @@ -283,11 +283,11 @@ or you are unsure what this means choose another name with the '--name' option."
>  	git add --no-warn-embedded-repo $force "$sm_path" ||
>  	die "$(eval_gettext "Failed to add submodule '\$sm_path'")"
>
> -	git config -f .gitmodules submodule."$sm_name".path "$sm_path" &&
> -	git config -f .gitmodules submodule."$sm_name".url "$repo" &&
> +	git submodule--helper config submodule."$sm_name".path "$sm_path" &&
> +	git submodule--helper config submodule."$sm_name".url "$repo" &&
>  	if test -n "$branch"
>  	then
> -		git config -f .gitmodules submodule."$sm_name".branch "$branch"
> +		git submodule--helper config submodule."$sm_name".branch "$branch"
>  	fi &&
>  	git add --force .gitmodules ||
>  	die "$(eval_gettext "Failed to register submodule '\$sm_path'")"
> --
> 2.18.0
>

--
Brandon Williams
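The calls being replaced are plain `git config -f .gitmodules` invocations, so the read/write behaviour the new helper has to preserve can be pinned down directly (hypothetical submodule name `sub`):

```shell
set -e
gm=$(mktemp -d)/.gitmodules
# What the helper's ACTION_SET path must keep doing:
git config -f "$gm" submodule.sub.path sub
git config -f "$gm" submodule.sub.url ../sub.git
# ...and the ACTION_GET path:
url=$(git config -f "$gm" submodule.sub.url)
echo "$url"
```

The point of the indirection is that the helper, not the shell script, decides where that configuration actually lives.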
Re: [PATCH v3 7/7] submodule: support reading .gitmodules even when it's not checked out
On 08/14, Antonio Ospite wrote:
> When the .gitmodules file is not available in the working tree, try
> using HEAD:.gitmodules from the current branch. This covers the case
> when the file is part of the repository but for some reason it is not
> checked out, for example because of a sparse checkout.
>
> This makes it possible to use at least the 'git submodule' commands
> which *read* the gitmodules configuration file without fully populating
> the working tree.
>
> Writing to .gitmodules will still require that the file is checked out,
> so check for that before calling config_set_in_gitmodules_file_gently.
>
> Add a similar check also in git-submodule.sh::cmd_add() to anticipate
> the eventual failure of the "git submodule add" command when .gitmodules
> is not safely writeable; this prevents the command from leaving the
> repository in a spurious state (e.g. the submodule repository was cloned
> but .gitmodules was not updated because
> config_set_in_gitmodules_file_gently failed).
>
> Finally, add t7416-submodule-sparse-gitmodules.sh to verify that reading
> from .gitmodules succeeds and that writing to it fails when the file is
> not checked out.
>
> Signed-off-by: Antonio Ospite
> ---
>
> Maybe the check in config_set_in_gitmodules_file_gently and
> git-submodule.sh::cmd_add() can share some code:
>
> - add an is_gitmodules_safely_writeable() helper
> - expose a "submodule--helper config --is-safely-writeable" subcommand
>
> But for now I preferred to keep the changes with v2 to a minimum to avoid
> blocking the series.
>
> If adding a new helper is preferred I can do a v4 or send a follow-up patch.

I see how it would be nice to have a helper like this. I think maybe at
some point we'd want it, but it's definitely not needed now and can
easily be added at a later point (maybe we can avoid needing it until we
can convert all of the git-submodule.sh code to C!). Great work, thanks
for working on this.
>
> Thank you,
>    Antonio
>
>
> builtin/submodule--helper.c            | 17 -
> cache.h                                |  1 +
> git-submodule.sh                       |  7 ++
> submodule-config.c                     | 16 -
> t/t7416-submodule-sparse-gitmodules.sh | 90 ++
> 5 files changed, 128 insertions(+), 3 deletions(-)
> create mode 100755 t/t7416-submodule-sparse-gitmodules.sh
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 7481d03b63..c0370a756b 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -2036,8 +2036,23 @@ static int module_config(int argc, const char **argv, const char *prefix)
>  		return print_config_from_gitmodules(argv[1]);
>
>  	/* Equivalent to ACTION_SET in builtin/config.c */
> -	if (argc == 3)
> +	if (argc == 3) {
> +		struct object_id oid;
> +
> +		/*
> +		 * If the .gitmodules file is not in the working tree but it
> +		 * is in the current branch, stop, as writing new values (and
> +		 * staging them) would blindly overwrite ALL the old content.
> +		 *
> +		 * This check still makes it possible to create a brand new
> +		 * .gitmodules when it is safe to do so: when neither
> +		 * GITMODULES_FILE nor GITMODULES_HEAD exist.
> +		 */
> +		if (!file_exists(GITMODULES_FILE) && get_oid(GITMODULES_HEAD, &oid) >= 0)
> +			die(_("please make sure that the .gitmodules file in the current branch is checked out"));
> +
>  		return config_set_in_gitmodules_file_gently(argv[1], argv[2]);
> +	}
>
>  	die("submodule--helper config takes 1 or 2 arguments: name [value]");
> }
> diff --git a/cache.h b/cache.h
> index 8dc7134f00..900f9e09e5 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -486,6 +486,7 @@ static inline enum object_type object_type(unsigned int mode)
> #define INFOATTRIBUTES_FILE "info/attributes"
> #define ATTRIBUTE_MACRO_PREFIX "[attr]"
> #define GITMODULES_FILE ".gitmodules"
> +#define GITMODULES_HEAD "HEAD:.gitmodules"
> #define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
> #define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
> #define GIT_NOTES_DISPLAY_REF_ENVIRONMENT "GIT_NOTES_DISPLAY_REF"
> diff --git a/git-submodule.sh b/git-submodule.sh
> index ff258e2e8c..b1cb187227 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -159,6 +159,13 @@ cmd_add()
>  		shift
>  	done
>
> +	# For more details about this check, see
> +	# builtin/submodule--helper.c::module_config()
> +	if test ! -e .gitmodules && git cat-file -e HEAD:.gitmodules > /dev/null 2>&1
> +	then
> +		die "$(eval_gettext "please make sure that the .gitmodules file in the current branch is checked out")"
> +	fi
> +
>  	if test -n "$reference_path"
>  	then
>  		is_absolute_path "$reference_path" ||
>
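The fallback this series leans on — reading `HEAD:.gitmodules` when the working-tree copy is absent — can be exercised with plain plumbing, which is roughly what the GITMODULES_HEAD path does internally:

```shell
set -e
dir=$(mktemp -d)
git init -q "$dir"
git config -f "$dir/.gitmodules" submodule.sub.path sub
git config -f "$dir/.gitmodules" submodule.sub.url ../sub.git
git -C "$dir" add .gitmodules
git -C "$dir" -c user.name=t -c user.email=t@example.com commit -q -m modules
# Simulate a sparse checkout: the blob is committed, the file is gone.
rm "$dir/.gitmodules"
git -C "$dir" cat-file -e HEAD:.gitmodules   # succeeds: committed copy exists
url=$(git -C "$dir" config --blob HEAD:.gitmodules submodule.sub.url)
echo "$url"
```

Reading works against the committed blob, while the `cat-file -e` probe is exactly the guard cmd_add() uses to refuse a blind overwrite when the file is not checked out.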
[PATCH v4 1/6] list-objects: store common func args in struct
This will make utility functions easier to create, as done by the next patch. Signed-off-by: Matthew DeVore --- list-objects.c | 158 +++-- 1 file changed, 74 insertions(+), 84 deletions(-) diff --git a/list-objects.c b/list-objects.c index c99c47ac1..584518a3f 100644 --- a/list-objects.c +++ b/list-objects.c @@ -12,20 +12,25 @@ #include "packfile.h" #include "object-store.h" -static void process_blob(struct rev_info *revs, +struct traversal_context { + struct rev_info *revs; + show_object_fn show_object; + show_commit_fn show_commit; + void *show_data; + filter_object_fn filter_fn; + void *filter_data; +}; + +static void process_blob(struct traversal_context *ctx, struct blob *blob, -show_object_fn show, struct strbuf *path, -const char *name, -void *cb_data, -filter_object_fn filter_fn, -void *filter_data) +const char *name) { struct object *obj = &blob->object; size_t pathlen; enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW; - if (!revs->blob_objects) + if (!ctx->revs->blob_objects) return; if (!obj) die("bad blob object"); @@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs, * may cause the actual filter to report an incomplete list * of missing objects. */ - if (revs->exclude_promisor_objects && + if (ctx->revs->exclude_promisor_objects && !has_object_file(&obj->oid) && is_promisor_object(&obj->oid)) return; pathlen = path->len; strbuf_addstr(path, name); - if (!(obj->flags & USER_GIVEN) && filter_fn) - r = filter_fn(LOFS_BLOB, obj, - path->buf, &path->buf[pathlen], - filter_data); + if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) + r = ctx->filter_fn(LOFS_BLOB, obj, + path->buf, &path->buf[pathlen], + ctx->filter_data); if (r & LOFR_MARK_SEEN) obj->flags |= SEEN; if (r & LOFR_DO_SHOW) - show(obj, path->buf, cb_data); + ctx->show_object(obj, path->buf, ctx->show_data); strbuf_setlen(path, pathlen); } @@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs, * the link, and how to do it. 
Whether it necessarily makes * any sense what-so-ever to ever do that is another issue. */ -static void process_gitlink(struct rev_info *revs, +static void process_gitlink(struct traversal_context *ctx, const unsigned char *sha1, - show_object_fn show, struct strbuf *path, - const char *name, - void *cb_data) + const char *name) { /* Nothing to do */ } -static void process_tree(struct rev_info *revs, +static void process_tree(struct traversal_context *ctx, struct tree *tree, -show_object_fn show, struct strbuf *base, -const char *name, -void *cb_data, -filter_object_fn filter_fn, -void *filter_data) +const char *name) { struct object *obj = &tree->object; + struct rev_info *revs = ctx->revs; struct tree_desc desc; struct name_entry entry; enum interesting match = revs->diffopt.pathspec.nr == 0 ? @@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs, } strbuf_addstr(base, name); - if (!(obj->flags & USER_GIVEN) && filter_fn) - r = filter_fn(LOFS_BEGIN_TREE, obj, - base->buf, &base->buf[baselen], - filter_data); + if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) + r = ctx->filter_fn(LOFS_BEGIN_TREE, obj, + base->buf, &base->buf[baselen], + ctx->filter_data); if (r & LOFR_MARK_SEEN) obj->flags |= SEEN; if (r & LOFR_DO_SHOW) - show(obj, base->buf, cb_data); + ctx->show_object(obj, base->buf, ctx->show_data); if (base->len) strbuf_addch(base, '/'); @@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs, } if (S_ISDIR(entry.mode)) - process_tree(revs, + process_tree(ctx, lookup_tree(the_repository
[PATCH v4 2/6] list-objects: refactor to process_tree_contents
This will be used in a follow-up patch to reduce indentation needed when invoking the logic conditionally. i.e. rather than: if (foo) { while (...) { /* this is very indented */ } } we will have: if (foo) process_tree_contents(...); Signed-off-by: Matthew DeVore --- list-objects.c | 68 ++ 1 file changed, 41 insertions(+), 27 deletions(-) diff --git a/list-objects.c b/list-objects.c index 584518a3f..ccc529e5e 100644 --- a/list-objects.c +++ b/list-objects.c @@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx, /* Nothing to do */ } +static void process_tree(struct traversal_context *ctx, +struct tree *tree, +struct strbuf *base, +const char *name); + +static void process_tree_contents(struct traversal_context *ctx, + struct tree *tree, + struct strbuf *base) +{ + struct tree_desc desc; + struct name_entry entry; + enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ? + all_entries_interesting : entry_not_interesting; + + init_tree_desc(&desc, tree->buffer, tree->size); + + while (tree_entry(&desc, &entry)) { + if (match != all_entries_interesting) { + match = tree_entry_interesting(&entry, base, 0, + &ctx->revs->diffopt.pathspec); + if (match == all_entries_not_interesting) + break; + if (match == entry_not_interesting) + continue; + } + + if (S_ISDIR(entry.mode)) + process_tree(ctx, +lookup_tree(the_repository, entry.oid), +base, entry.path); + else if (S_ISGITLINK(entry.mode)) + process_gitlink(ctx, entry.oid->hash, + base, entry.path); + else + process_blob(ctx, +lookup_blob(the_repository, entry.oid), +base, entry.path); + } +} + static void process_tree(struct traversal_context *ctx, struct tree *tree, struct strbuf *base, @@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx, { struct object *obj = &tree->object; struct rev_info *revs = ctx->revs; - struct tree_desc desc; - struct name_entry entry; - enum interesting match = revs->diffopt.pathspec.nr == 0 ? 
- all_entries_interesting: entry_not_interesting; int baselen = base->len; enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW; int gently = revs->ignore_missing_links || @@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx, if (base->len) strbuf_addch(base, '/'); - init_tree_desc(&desc, tree->buffer, tree->size); - - while (tree_entry(&desc, &entry)) { - if (match != all_entries_interesting) { - match = tree_entry_interesting(&entry, base, 0, - &revs->diffopt.pathspec); - if (match == all_entries_not_interesting) - break; - if (match == entry_not_interesting) - continue; - } - - if (S_ISDIR(entry.mode)) - process_tree(ctx, -lookup_tree(the_repository, entry.oid), -base, entry.path); - else if (S_ISGITLINK(entry.mode)) - process_gitlink(ctx, entry.oid->hash, base, entry.path); - else - process_blob(ctx, -lookup_blob(the_repository, entry.oid), -base, entry.path); - } + process_tree_contents(ctx, tree, base); if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) { r = ctx->filter_fn(LOFS_END_TREE, obj, -- 2.18.0.865.gffc8e1a3cd6-goog
[PATCH v4 5/6] revision: mark non-user-given objects instead
Currently, list-objects.c incorrectly treats all root trees of commits as USER_GIVEN. Also, it would be easier to mark objects that are non-user-given instead of user-given, since the places in the code where we access an object through a reference are more obvious than the places where we access an object that was given by the user. Resolve these two problems by introducing a flag NOT_USER_GIVEN that marks blobs and trees that are non-user-given, replacing USER_GIVEN. (Only blobs and trees are marked because this mark is only used when filtering objects, and filtering of other types of objects is not supported yet.) Signed-off-by: Matthew DeVore --- list-objects.c | 31 ++- revision.c | 1 - revision.h | 10 +++--- 3 files changed, 25 insertions(+), 17 deletions(-) diff --git a/list-objects.c b/list-objects.c index e88474a2d..3aee1b8a4 100644 --- a/list-objects.c +++ b/list-objects.c @@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx, pathlen = path->len; strbuf_addstr(path, name); - if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) + if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) r = ctx->filter_fn(LOFS_BLOB, obj, path->buf, &path->buf[pathlen], ctx->filter_data); @@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx, continue; } - if (S_ISDIR(entry.mode)) - process_tree(ctx, -lookup_tree(the_repository, entry.oid), -base, entry.path); + if (S_ISDIR(entry.mode)) { + struct tree *t = lookup_tree(the_repository, entry.oid); + t->object.flags |= NOT_USER_GIVEN; + process_tree(ctx, t, base, entry.path); + } else if (S_ISGITLINK(entry.mode)) process_gitlink(ctx, entry.oid->hash, base, entry.path); - else - process_blob(ctx, -lookup_blob(the_repository, entry.oid), -base, entry.path); + else { + struct blob *b = lookup_blob(the_repository, entry.oid); + b->object.flags |= NOT_USER_GIVEN; + process_blob(ctx, b, base, entry.path); + } } } @@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx, } 
strbuf_addstr(base, name); - if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) + if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) r = ctx->filter_fn(LOFS_BEGIN_TREE, obj, base->buf, &base->buf[baselen], ctx->filter_data); @@ -191,7 +193,7 @@ static void process_tree(struct traversal_context *ctx, if (parsed) process_tree_contents(ctx, tree, base); - if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) { + if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) { r = ctx->filter_fn(LOFS_END_TREE, obj, base->buf, &base->buf[baselen], ctx->filter_data); @@ -307,8 +309,11 @@ static void do_traverse(struct traversal_context *ctx) * an uninteresting boundary commit may not have its tree * parsed yet, but we are not going to show them anyway */ - if (get_commit_tree(commit)) - add_pending_tree(ctx->revs, get_commit_tree(commit)); + if (get_commit_tree(commit)) { + struct tree *tree = get_commit_tree(commit); + tree->object.flags |= NOT_USER_GIVEN; + add_pending_tree(ctx->revs, tree); + } ctx->show_commit(commit, ctx->show_data); if (ctx->revs->tree_blobs_in_commit_order) diff --git a/revision.c b/revision.c index 062749437..6d355b43c 100644 --- a/revision.c +++ b/revision.c @@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs, strbuf_release(&buf); return; /* do not add the commit itself */ } - obj->flags |= USER_GIVEN; add_object_array_with_path(obj, name, &revs->pending, mode, path); } diff --git a/revision.h b/revision.h index c94243543..55d47004d 100644 --- a/revision.h +++ b/revision.h @@ -8,7 +8,11 @@ #include "diff.h" #include "commit-slab-decl.h" -/* Remember to update object flag allocation in object.h */ +/* Remember to update object flag allocation in object.h + * NEEDSWORK: NOT_USER_GIVEN doesn't apply to com
[PATCH v4 4/6] rev-list: handle missing tree objects properly
Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. A missing tree
will cause an error if --missing indicates an error should be caused,
and the hash is printed even if the tree is missing.

In list-objects.c we no longer print a message to stderr if a tree
object is missing (quiet_on_missing is always true). I couldn't find
any place where this would matter, or where the caller of
traverse_commit_list would need to be fixed to show the error. However,
in the future it would be trivial to make the caller show the message
if we needed to.

This is not tested very thoroughly, since we cannot create promisor
objects in tests without using an actual partial clone. t0410 has a
promise_and_delete utility function, but the is_promisor_object
function does not return 1 for objects deleted in this way. More tests
will come in a patch that implements a filter that can be used with git
clone.

Signed-off-by: Matthew DeVore
---
 builtin/rev-list.c                     | 11 +++--
 list-objects.c                         | 17 +--
 revision.h                             |  1 +
 t/t0410-partial-clone.sh               | 66 ++
 t/t5317-pack-objects-filter-objects.sh | 13 +
 5 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 	 */
 	switch (arg_missing_action) {
 	case MA_ERROR:
-		die("missing blob object '%s'", oid_to_hex(&obj->oid));
+		die("missing %s object '%s'",
+		    type_name(obj->type), oid_to_hex(&obj->oid));
 		return;
 
 	case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
 	case MA_ALLOW_PROMISOR:
 		if (is_promisor_object(&obj->oid))
 			return;
-
die("unexpected missing blob object '%s'", - oid_to_hex(&obj->oid)); + die("unexpected missing %s object '%s'", + type_name(obj->type), oid_to_hex(&obj->oid)); return; default: @@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj) static int finish_object(struct object *obj, const char *name, void *cb_data) { struct rev_list_info *info = cb_data; - if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) { + if (!has_object_file(&obj->oid)) { finish_object__ma(obj); return 1; } @@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) init_revisions(&revs, prefix); revs.abbrev = DEFAULT_ABBREV; revs.commit_format = CMIT_FMT_UNSPECIFIED; + revs.do_not_die_on_missing_tree = 1; /* * Scan the argument list before invoking setup_revisions(), so that we diff --git a/list-objects.c b/list-objects.c index f9b51db7a..e88474a2d 100644 --- a/list-objects.c +++ b/list-objects.c @@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx, struct rev_info *revs = ctx->revs; int baselen = base->len; enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW; + int parsed; if (!revs->tree_objects) return; @@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx, die("bad tree object"); if (obj->flags & (UNINTERESTING | SEEN)) return; - if (parse_tree_gently(tree, 1) < 0) { + + parsed = parse_tree_gently(tree, 1) >= 0; + if (!parsed) { if (revs->ignore_missing_links) return; @@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx, is_promisor_object(&obj->oid)) return; - die("bad tree object %s", oid_to_hex(&obj->oid)); + if (!revs->do_not_die_on_missing_tree) + die("bad tree object %s", oid_to_hex(&obj->oid)); } strbuf_addstr(base, name); @@ -178,7 +182,14 @@ static void process_tree(struct traversal_context *ctx, if (base->len) strbuf_addch(base, '/'); - process_tree_contents(ctx, tree, base); + /* +* NEEDSWORK: we should not have to process this tree's contents if the 
+* filter wants to exclude all its contents AND the filter doesn't need +* to collect the omitted OIDs. We should add a LOFR_SKIP_TREE bit which +* allows skipping all children. +*/ +
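The patch above generalizes finish_object__ma() from blobs to all object types. A rough Python model of that dispatch may help illustrate the four --missing actions; all names here are hypothetical stand-ins for the C code shown in the diff, not git's actual API:

```python
# Illustrative model of rev-list's --missing dispatch after this patch:
# the chosen action now applies to any object type, not just blobs.
MA_ERROR, MA_ALLOW_ANY, MA_PRINT, MA_ALLOW_PROMISOR = range(4)

def handle_missing(obj_type, oid, action, is_promisor, missing_out):
    """Return True if traversal may continue past the missing object."""
    if action == MA_ERROR:
        raise RuntimeError("missing %s object '%s'" % (obj_type, oid))
    if action == MA_ALLOW_ANY:
        return True
    if action == MA_PRINT:
        # The hash is recorded/printed even though the object is absent.
        missing_out.append(oid)
        return True
    if action == MA_ALLOW_PROMISOR:
        if is_promisor(oid):
            return True
        raise RuntimeError("unexpected missing %s object '%s'" % (obj_type, oid))
    raise AssertionError("unhandled missing_action")
```

The point of the patch is visible in the error strings: the object type is interpolated instead of being hard-coded to "blob".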
Measuring Community Involvement (was Re: Contributor Summit planning)
On 8/13/2018 5:54 PM, Jeff King wrote:
> So I try not to think too hard on metrics, and just use them to get a
> rough view on who is active.

I've been very interested in measuring community involvement, with the
knowledge that any metric is flawed and we should not ever say "this
metric is how we measure the quality of a contributor". It can be
helpful, though, to track some metrics and their change over time. Here
are a few measurements we can make:

1. Number of (non-merge) commit author tag-lines. Using the git repo:

   > git shortlog --no-merges --since 2017 -sne junio/next | head -n 20
       284  Nguyễn Thái Ngọc Duy
       257  Jeff King
       206  Stefan Beller
       192  brian m. carlson
       159  Brandon Williams
       149  Junio C Hamano
       137  Elijah Newren
       116  René Scharfe
       112  Johannes Schindelin
       105  Ævar Arnfjörð Bjarmason
        96  Jonathan Tan
        93  SZEDER Gábor
        78  Derrick Stolee
        76  Martin Ågren
        66  Michael Haggerty
        61  Eric Sunshine
        46  Christian Couder
        36  Phillip Wood
        35  Jonathan Nieder
        33  Thomas Gummerer

2. Number of other commit tag-lines (Reviewed-By, Helped-By,
Reported-By, etc.). Using the git repo:

   $ git log --since=2018-01-01 junio/next | grep by: | grep -v Signed-off-by: | sort | uniq -c | sort -nr | head -n 20
        66 Reviewed-by: Stefan Beller
        22 Reviewed-by: Jeff King
        19 Reviewed-by: Jonathan Tan
        12 Helped-by: Eric Sunshine
        11 Helped-by: Junio C Hamano
         9 Helped-by: Jeff King
         8 Reviewed-by: Elijah Newren
         7 Reported-by: Ramsay Jones
         7 Acked-by: Johannes Schindelin
         7 Acked-by: Brandon Williams
         6 Reviewed-by: Eric Sunshine
         6 Helped-by: Johannes Schindelin
         5 Mentored-by: Christian Couder
         5 Acked-by: Johannes Schindelin
         4 Reviewed-by: Jonathan Nieder
         4 Reviewed-by: Johannes Schindelin
         4 Helped-by: Stefan Beller
         4 Helped-by: René Scharfe
         3 Reviewed-by: Martin Ågren
         3 Reviewed-by: Lars Schneider

(There does not appear to be enough density here to make a useful
metric.)

3. Number of email messages sent. Using the mailing list repo:

   $ git shortlog --since 2017 -sne | head -n 20
      3749  Junio C Hamano
      2213  Stefan Beller
      2112  Jeff King
      1106  Nguyễn Thái Ngọc Duy
      1028  Johannes Schindelin
       965  Ævar Arnfjörð Bjarmason
       956  Brandon Williams
       947  Eric Sunshine
       890  Elijah Newren
       753  brian m. carlson
       677  Duy Nguyen
       646  Jonathan Nieder
       629  Derrick Stolee
       545  Christian Couder
       515  Jonathan Tan
       425  Johannes Schindelin
       425  Martin Ågren
       420  Jeff Hostetler
       420  SZEDER Gábor
       363  Phillip Wood

4. Number of threads started by user. (For this and the measurements
below, I imported emails into a SQL table with columns [commit, author,
date, message-id, in-reply-to, subject] and ran queries.)

   SELECT TOP 20 COUNT(*) as NumSent, [Author]
   FROM [git].[dbo].[mailing-list]
   WHERE [In-Reply-To] = ''
     AND CONVERT(DATETIME, [Date]) > CONVERT(DATETIME, '01-01-2018 00:00')
   GROUP BY [Author]
   ORDER BY NumSent DESC

| NumSent | Author                   |
|---------|--------------------------|
| 76      | Junio C Hamano           |
| 64      | Stefan Beller            |
| 54      | Philip Oakley            |
| 50      | Nguyễn Thái Ngọc Duy     |
| 49      | Robert P. J. Day         |
| 47      | Christian Couder         |
| 36      | Ramsay Jones             |
| 34      | Elijah Newren            |
| 34      | SZEDER Gábor             |
| 33      | Johannes Schindelin      |
| 31      | Jeff King                |
| 30      | Ævar Arnfjörð Bjarmason  |
| 24      | Jonathan Tan             |
| 22      | Alban Gruin              |
| 22      | brian m. carlson         |
| 18      | Randall S. Becker        |
| 15      | Paul-Sebastian Ungureanu |
| 15      | Jeff Hostetler           |
| 15      | Brandon Williams         |
| 15      | Luke Diamand             |

5. Number of threads where the user participated. (This is measured by
completing the transitive closure of In-Reply-To edges into a new
'BaseMessage' column.)

   SELECT TOP 20 COUNT(BaseMessage) as NumResponded, Author
   FROM [git].[dbo].[mailing-list]
   WHERE [In-Reply-To] <> ''
     AND CONVERT(DATETIME, [Date]) > CONVERT(DATETIME, '01-01-2018 00:00')
   GROUP BY Author
   ORDER BY NumResponded DESC

| NumResponded | Author                  |
|--------------|-------------------------|
| 2084         | Junio C Hamano          |
| 1596         | Stefan Beller           |
| 1211         | Jeff King               |
| 1120         | Johannes Schindelin     |
| 1021         | Nguyễn Thái Ngọc Duy    |
| 799          | Eric Sunshine           |
| 797          | Ævar Arnfjörð Bjarmason |
| 693
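The "transitive closure of In-Reply-To edges" step described above can also be computed outside SQL. A minimal sketch, assuming a simple dict layout rather than the actual import pipeline: each message maps to the message it replies to, and we walk parents until we reach a thread root.

```python
def thread_roots(messages):
    """Map each message-id to the root (BaseMessage) of its thread.

    `messages` is a dict {message_id: in_reply_to}, where in_reply_to
    is '' for a thread-starting message. A parent that is missing from
    the corpus is treated as a thread boundary.
    """
    roots = {}

    def root_of(mid):
        if mid in roots:
            return roots[mid]
        parent = messages.get(mid, '')
        if parent == '' or parent not in messages:
            roots[mid] = mid          # this message starts the thread
        else:
            roots[mid] = root_of(parent)  # follow In-Reply-To upward
        return roots[mid]

    return {mid: root_of(mid) for mid in messages}
```

With the roots in hand, "threads where the user participated" is a GROUP BY over distinct (author, root) pairs, which is what the SQL query above counts.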
Bug with Git submodules and submodule.recurse setting
If I set git config --global submodule.recurse true and run git via:

  git pull --progress -v --no-rebase "origin"

the command will fail with the following output (Errorlevel is 1):

  Fetching submodule submodules/tstemplates
  From http://10.0.102.194:7990/scm/mcc/tstemplates
   = [up to date] feature/robin -> origin/feature/robin
   = [up to date] master -> origin/master
  Already up to date.
  usage: git submodule [--quiet] add [-b ] [-f|--force] [--name ] [--reference ] [--] []
     or: git submodule [--quiet] status [--cached] [--recursive] [--] [...]
     or: git submodule [--quiet] init [--] [...]
     or: git submodule [--quiet] deinit [-f|--force] (--all| [--] ...)
     or: git submodule [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference ] [--recursive] [--] [...]
     or: git submodule [--quiet] summary [--cached|--files] [--summary-limit ] [commit] [--] [...]
     or: git submodule [--quiet] foreach [--recursive]
     or: git submodule [--quiet] sync [--recursive] [--] [...]
     or: git submodule [--quiet] absorbgitdirs [--] [...]

It seems that the “verbose” parameter “-v” is also passed on to “git
submodule”, which does not support it. If I remove “-v”, it works. The
problem is that I use TortoiseGit, which creates this command
automatically!
Re: [PATCH 2/2] submodule: munge paths to submodule git directories
On 08/09, Jeff King wrote: > On Wed, Aug 08, 2018 at 03:33:23PM -0700, Brandon Williams wrote: > > > Commit 0383bbb901 (submodule-config: verify submodule names as paths, > > 2018-04-30) introduced some checks to ensure that submodule names don't > > include directory traversal components (e.g. "../"). > > > > This addresses the vulnerability identified in 0383bbb901 but the root > > cause is that we use submodule names to construct paths to the > > submodule's git directory. What we really should do is munge the > > submodule name before using it to construct a path. > > > > Teach "submodule_name_to_gitdir()" to munge a submodule's name (by url > > encoding it) before using it to build a path to the submodule's gitdir. > > I like this approach very much, and I think using url encoding is much > better than an opaque hash (purely because it makes debugging and > inspection saner). > > Two thoughts, though: > > > + modules_len = buf->len; > > strbuf_addstr(buf, submodule_name); > > + > > + /* > > +* If the submodule gitdir already exists using the old-fashioned > > +* location (which uses the submodule name as-is, without munging it) > > +* then return that. > > +*/ > > + if (!access(buf->buf, F_OK)) > > + return; > > I think this backwards-compatibility is necessary to avoid pain. But > until it goes away, I don't think this is helping the vulnerability from > 0383bbb901. Because there the issue was that the submodule name pointed > back into the working tree, so this access() would find the untrusted > working tree code and say "ah, an old-fashioned name!". > > In theory a fresh clone could set a config option for "I only speak > use new-style modules". And there could even be a conversion program > that moves the modules as appropriate, fixes up the .git files in the > working tree, and then sets that config. 
>
> In fact, I think that config option _could_ be done by bumping
> core.repositoryformatversion and then setting extensions.submodulenames
> to "url" or something. Then you could never run into the confusing case
> where you have a clone done by a new version of git (using new-style
> names), but using an old-style version gets confused because it can't
> find the module directories (instead, it would barf and say "I don't
> know about that extension").
>
> I don't know if any of that is worth it, though. We already fixed the
> problem from 0383bbb901. There may be a _different_ "break out of the
> modules directory" vulnerability, but since we disallow ".." it's hard
> to see what it would be (the best I could come up with is maybe pointing
> one module into the interior of another module, but I think you'd have
> trouble overwriting anything useful).
>
> And while an old-style version of Git being confused might be annoying,
> I suspect that bumping the repository version would be even _more_
> annoying, because it would hit every command, not just ones that try to
> touch those submodules.

Oh I know that this doesn't help with that vulnerability. As you've
said, we fixed it and now disallow ".." at the submodule-config level,
so really this patch is simply about using what we get out of
submodule-config in a more sane manner.
> > > diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh > > index 2c2c97e144..963693332c 100755 > > --- a/t/t7400-submodule-basic.sh > > +++ b/t/t7400-submodule-basic.sh > > @@ -933,7 +933,7 @@ test_expect_success 'recursive relative submodules stay > > relative' ' > > cd clone2 && > > git submodule update --init --recursive && > > echo "gitdir: ../.git/modules/sub3" >./sub3/.git_expect && > > - echo "gitdir: ../../../.git/modules/sub3/modules/dirdir/subsub" > > >./sub3/dirdir/subsub/.git_expect > > + echo "gitdir: > > ../../../.git/modules/sub3/modules/dirdir%2fsubsub" > > >./sub3/dirdir/subsub/.git_expect > > One interesting thing about url-encoding is that it's not one-to-one. > This case could also be %2F, which is a different file (on a > case-sensitive filesystem). I think "%20" and "+" are similarly > interchangeable. > > If we were decoding the filenames, that's fine. The round-trip is > lossless. > > But that's not quite how the new code behaves. We encode the input and > then check to see if it matches an encoding we previously performed. So > if our urlencode routines ever change, this will subtly break. > > I don't know how much it's worth caring about. We're not that likely to > change the routines ourself (though certainly a third-party > implementation would need to know our exact url-encoding decisions). This is exactly the reason why I wanted to get some opinions on what the best thing to do here would be. I _think_ the best thing would probably be to write a specific routine to do the conversion, and it wouldn't even have to be all that complex. Basically I'm just interested in converting '/' characters so that things no l
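The many-to-one property discussed above is easy to demonstrate: several distinct encoded forms decode to the same submodule name, so matching on previously produced encoder *output* is fragile, whereas a routine that only converts '/' (as suggested at the end) is deterministic. A hypothetical Python illustration; git's actual routine is C, and the exact escaping rules here are an assumption:

```python
from urllib.parse import unquote

# Distinct encodings, one decoded name: %2f and %2F are both valid, so
# comparing against a previously encoded string is not round-trip safe.
assert unquote("dirdir%2fsubsub") == "dirdir/subsub"
assert unquote("dirdir%2Fsubsub") == "dirdir/subsub"

def munge_name(name):
    """Hypothetical one-to-one munging: escape '%' first (so the map is
    injective), then convert only '/', always lowercase. The same name
    always yields the same path component."""
    return name.replace("%", "%25").replace("/", "%2f")
```

Escaping '%' before '/' is the detail that makes the sketch injective; without it, a submodule literally named "a%2fb" would collide with one named "a/b".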
Re: [PATCH v4 4/6] rev-list: handle missing tree objects properly
> Previously, we assumed only blob objects could be missing. This patch
> makes rev-list handle missing trees like missing blobs. A missing tree
> will cause an error if --missing indicates an error should be caused,
> and the hash is printed even if the tree is missing.

The last sentence is difficult to understand - probably better to say
that all --missing= arguments and --exclude-promisor-objects work for
missing trees like they currently do for blobs (and do not fixate on
just --missing=error). And also demonstrate this in tests, like in
t6112.

> In list-objects.c we no longer print a message to stderr if a tree
> object is missing (quiet_on_missing is always true). I couldn't find
> any place where this would matter, or where the caller of
> traverse_commit_list would need to be fixed to show the error. However,
> in the future it would be trivial to make the caller show the message
> if we needed to.
>
> This is not tested very thoroughly, since we cannot create promisor
> objects in tests without using an actual partial clone. t0410 has a
> promise_and_delete utility function, but the is_promisor_object
> function does not return 1 for objects deleted in this way. More tests
> will come in a patch that implements a filter that can be used with
> git clone.

These two paragraphs are no longer applicable, I think.

> @@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  	init_revisions(&revs, prefix);
>  	revs.abbrev = DEFAULT_ABBREV;
>  	revs.commit_format = CMIT_FMT_UNSPECIFIED;
> +	revs.do_not_die_on_missing_tree = 1;

Is this correct? I would have expected this to be set only if --missing
was set.

> -	process_tree_contents(ctx, tree, base);
> +	/*
> +	 * NEEDSWORK: we should not have to process this tree's contents if the
> +	 * filter wants to exclude all its contents AND the filter doesn't need
> +	 * to collect the omitted OIDs. We should add a LOFR_SKIP_TREE bit which
> +	 * allows skipping all children.
> +	 */
> +	if (parsed)
> +		process_tree_contents(ctx, tree, base);

I agree with Jeff Hostetler in [1] that a LOFR_SKIP_TREE bit is
desirable, but I don't think that this patch is the right place to
introduce this NEEDSWORK. For me, this patch is about skipping
iterating over the contents of a tree because the tree does not exist;
this NEEDSWORK is about skipping iterating over the contents of a tree
because we don't want its contents, and it is quite confusing to
conflate the two.

[1] https://public-inbox.org/git/d751d56b-84bb-a03d-5f2a-7dbaf8d94...@jeffhostetler.com/

> @@ -124,6 +124,7 @@ struct rev_info {
>  		first_parent_only:1,
>  		line_level_traverse:1,
>  		tree_blobs_in_commit_order:1,
> +		do_not_die_on_missing_tree:1,

I know that the other flags don't have documentation, but I think it's
worth documenting this one because it is rather complicated. I have
provided a sample one in my earlier review - feel free to use that or
come up with your own.

> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 4984ca583..74e3c5767 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -186,6 +186,72 @@ test_expect_success 'rev-list stops traversal at missing and promised commit' '
>  	! grep $FOO out
>  '
>
> +test_expect_success 'show missing tree objects with --missing=print' '
> +	rm -rf repo &&
> +	test_create_repo repo &&
> +	test_commit -C repo foo &&
> +	test_commit -C repo bar &&
> +	test_commit -C repo baz &&
> +
> +	TREE=$(git -C repo rev-parse bar^{tree}) &&
> +
> +	promise_and_delete $TREE &&
> +
> +	git -C repo config core.repositoryformatversion 1 &&
> +	git -C repo config extensions.partialclone "arbitrary string" &&
> +	git -C repo rev-list --quiet --missing=print --objects HEAD >missing_objs 2>rev_list_err &&
> +	echo "?$TREE" >expected &&
> +	test_cmp expected missing_objs &&
> +
> +	# do not complain when a missing tree cannot be parsed
> +	! grep -q "Could not read " rev_list_err
> +'

I think that the --exclude-promisor tests can go in t0410 as you have
done, but the --missing tests (except for --missing=allow-promisor)
should go in t6112. (And like the existing --missing tests, they should
be done without setting extensions.partialclone.)

As for --missing=allow-promisor, I don't see them being tested anywhere
:-( so feel free to make a suggestion. I would put them in t6112 for
easy comparison with the other --missing tests.
[PATCH v4 6/6] list-objects-filter: implement filter tree:0
Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate
because it suggests that annotated tags are omitted, but actually they
are included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore
---
 Documentation/rev-list-options.txt     |  3 ++
 list-objects-filter-options.c          |  4 +++
 list-objects-filter-options.h          |  1 +
 list-objects-filter.c                  | 50 ++
 t/t5317-pack-objects-filter-objects.sh | 27 ++
 t/t5616-partial-clone.sh               | 38 
 t/t6112-rev-list-filters-objects.sh    | 13 +++
 7 files changed, 136 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..9e351ec2a 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -743,6 +743,9 @@ specification contained in .
 A debug option to help with future "partial clone" development.
 This option specifies how missing objects are handled.
 +
+The form '--filter=tree:' omits all blobs and trees deeper than
+ from the root tree. Currently, only =0 is supported.
++
 The form '--missing=error' requests that rev-list stop with an error if
 a missing object is encountered. This is the default action.
+ diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index c0e2bd6a0..a28382940 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter( return 0; } + } else if (!strcmp(arg, "tree:0")) { + filter_options->choice = LOFC_TREE_NONE; + return 0; + } else if (skip_prefix(arg, "sparse:oid=", &v0)) { struct object_context oc; struct object_id sparse_oid; diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h index a61f8..af64e5c66 100644 --- a/list-objects-filter-options.h +++ b/list-objects-filter-options.h @@ -10,6 +10,7 @@ enum list_objects_filter_choice { LOFC_DISABLED = 0, LOFC_BLOB_NONE, LOFC_BLOB_LIMIT, + LOFC_TREE_NONE, LOFC_SPARSE_OID, LOFC_SPARSE_PATH, LOFC__COUNT /* must be last */ diff --git a/list-objects-filter.c b/list-objects-filter.c index a0ba78b20..8e3caf5bf 100644 --- a/list-objects-filter.c +++ b/list-objects-filter.c @@ -80,6 +80,55 @@ static void *filter_blobs_none__init( return d; } +/* + * A filter for list-objects to omit ALL trees and blobs from the traversal. + * Can OPTIONALLY collect a list of the omitted OIDs. 
+ */ +struct filter_trees_none_data { + struct oidset *omits; +}; + +static enum list_objects_filter_result filter_trees_none( + enum list_objects_filter_situation filter_situation, + struct object *obj, + const char *pathname, + const char *filename, + void *filter_data_) +{ + struct filter_trees_none_data *filter_data = filter_data_; + + switch (filter_situation) { + default: + die("unknown filter_situation"); + return LOFR_ZERO; + + case LOFS_BEGIN_TREE: + case LOFS_BLOB: + if (filter_data->omits) + oidset_insert(filter_data->omits, &obj->oid); + return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */ + + case LOFS_END_TREE: + assert(obj->type == OBJ_TREE); + return LOFR_ZERO; + + } +} + +static void* filter_trees_none__init( + struct oidset *omitted, + struct list_objects_filter_options *filter_options, + filter_object_fn *filter_fn, + filter_free_fn *filter_free_fn) +{ + struct filter_trees_none_data *d = xcalloc(1, sizeof(*d)); + d->omits = omitted; + + *filter_fn = filter_trees_none; + *filter_free_fn = free; + return d; +} + /* * A filter for list-objects to omit large blobs. * And to OPTIONALLY collect a list of the omitted OIDs. @@ -374,6 +423,7 @@ static filter_init_fn s_filters[] = { NULL, filter_blobs_none__init, filter_blobs_limit__init, + filter_trees_none__init, filter_sparse_oid__init, filter_sparse_path__i
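The hard-omit behavior of filter_trees_none() in this patch can be modeled compactly: every tree and blob is marked seen but never shown, and optionally collected into the omitted set. A Python sketch of that decision table, using simplified stand-ins for the LOFS_*/LOFR_* names rather than git's real types:

```python
# Hypothetical model of the tree:0 filter decision table.
LOFR_ZERO, LOFR_MARK_SEEN, LOFR_DO_SHOW = 0, 1, 2
LOFS_BEGIN_TREE, LOFS_END_TREE, LOFS_BLOB = "begin_tree", "end_tree", "blob"

def filter_trees_none(situation, oid, omits=None):
    """Decide how the traversal treats one object under tree:0."""
    if situation in (LOFS_BEGIN_TREE, LOFS_BLOB):
        if omits is not None:
            omits.add(oid)         # optionally record the omission
        return LOFR_MARK_SEEN      # but not LOFR_DO_SHOW: a hard omit
    if situation == LOFS_END_TREE:
        return LOFR_ZERO
    raise ValueError("unknown filter_situation")
```

Marking objects SEEN without SHOW is what keeps them out of the output while still preventing the traversal from visiting them twice.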
[PATCH v4 3/6] list-objects: always parse trees gently
If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
 	struct rev_info *revs = ctx->revs;
 	int baselen = base->len;
 	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-	int gently = revs->ignore_missing_links ||
-		     revs->exclude_promisor_objects;

 	if (!revs->tree_objects)
 		return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
 		die("bad tree object");
 	if (obj->flags & (UNINTERESTING | SEEN))
 		return;
-	if (parse_tree_gently(tree, gently) < 0) {
+	if (parse_tree_gently(tree, 1) < 0) {
 		if (revs->ignore_missing_links)
 			return;
--
2.18.0.865.gffc8e1a3cd6-goog
[PATCH v4 0/6] filter: support for excluding all trees and blobs
I've applied or responded to all changes suggested by Jonathan and Jeff.

Matthew DeVore (6):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt     |   3 +
 builtin/rev-list.c                     |  11 +-
 list-objects-filter-options.c          |   4 +
 list-objects-filter-options.h          |   1 +
 list-objects-filter.c                  |  50 ++
 list-objects.c                         | 238 +
 revision.c                             |   1 -
 revision.h                             |  11 +-
 t/t0410-partial-clone.sh               |  66 +++
 t/t5317-pack-objects-filter-objects.sh |  40 +
 t/t5616-partial-clone.sh               |  38 
 t/t6112-rev-list-filters-objects.sh    |  13 ++
 12 files changed, 358 insertions(+), 118 deletions(-)

--
2.18.0.865.gffc8e1a3cd6-goog
Re: [PATCH v3 5/5] list-objects-filter: implement filter tree:0
On Tue, Aug 14, 2018 at 8:13 AM Jeff Hostetler wrote: > > > > On 8/13/2018 2:14 PM, Matthew DeVore wrote: > > Teach list-objects the "tree:0" filter which allows for filtering > > out all tree and blob objects (unless other objects are explicitly > > specified by the user). The purpose of this patch is to allow smaller > > partial clones. > > > > The name of this filter - tree:0 - does not explicitly specify that > > it also filters out all blobs, but this should not cause much confusion > > because blobs are not at all useful without the trees that refer to > > them. > > > > I also consider only:commits as a name, but this is inaccurate because > > it suggests that annotated tags are omitted, but actually they are > > included. > > > > The name "tree:0" allows later filtering based on depth, i.e. "tree:1" > > would filter out all but the root tree and blobs. In order to avoid > > confusion between 0 and capital O, the documentation was worded in a > > somewhat round-about way that also hints at this future improvement to > > the feature. > > > > Signed-off-by: Matthew DeVore > > --- > > Documentation/rev-list-options.txt | 3 ++ > > list-objects-filter-options.c | 4 +++ > > list-objects-filter-options.h | 1 + > > list-objects-filter.c | 50 ++ > > t/t5317-pack-objects-filter-objects.sh | 27 ++ > > t/t5616-partial-clone.sh | 27 ++ > > t/t6112-rev-list-filters-objects.sh| 13 +++ > > 7 files changed, 125 insertions(+) > > > > diff --git a/Documentation/rev-list-options.txt > > b/Documentation/rev-list-options.txt > > index 7b273635d..9e351ec2a 100644 > > --- a/Documentation/rev-list-options.txt > > +++ b/Documentation/rev-list-options.txt > > @@ -743,6 +743,9 @@ specification contained in . > > A debug option to help with future "partial clone" development. > > This option specifies how missing objects are handled. > > + > > +The form '--filter=tree:' omits all blobs and trees deeper than > > + from the root tree. Currently, only =0 is supported. 
> > ++ > > The form '--missing=error' requests that rev-list stop with an error if > > a missing object is encountered. This is the default action. > > + > > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c > > index c0e2bd6a0..a28382940 100644 > > --- a/list-objects-filter-options.c > > +++ b/list-objects-filter-options.c > > @@ -50,6 +50,10 @@ static int gently_parse_list_objects_filter( > > return 0; > > } > > > > + } else if (!strcmp(arg, "tree:0")) { > > + filter_options->choice = LOFC_TREE_NONE; > > + return 0; > > + > > } else if (skip_prefix(arg, "sparse:oid=", &v0)) { > > struct object_context oc; > > struct object_id sparse_oid; > > diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h > > index a61f8..af64e5c66 100644 > > --- a/list-objects-filter-options.h > > +++ b/list-objects-filter-options.h > > @@ -10,6 +10,7 @@ enum list_objects_filter_choice { > > LOFC_DISABLED = 0, > > LOFC_BLOB_NONE, > > LOFC_BLOB_LIMIT, > > + LOFC_TREE_NONE, > > LOFC_SPARSE_OID, > > LOFC_SPARSE_PATH, > > LOFC__COUNT /* must be last */ > > diff --git a/list-objects-filter.c b/list-objects-filter.c > > index a0ba78b20..8e3caf5bf 100644 > > --- a/list-objects-filter.c > > +++ b/list-objects-filter.c > > @@ -80,6 +80,55 @@ static void *filter_blobs_none__init( > > return d; > > } > > > > +/* > > + * A filter for list-objects to omit ALL trees and blobs from the > > traversal. > > + * Can OPTIONALLY collect a list of the omitted OIDs. 
> > + */ > > +struct filter_trees_none_data { > > + struct oidset *omits; > > +}; > > + > > +static enum list_objects_filter_result filter_trees_none( > > + enum list_objects_filter_situation filter_situation, > > + struct object *obj, > > + const char *pathname, > > + const char *filename, > > + void *filter_data_) > > +{ > > + struct filter_trees_none_data *filter_data = filter_data_; > > + > > + switch (filter_situation) { > > + default: > > + die("unknown filter_situation"); > > + return LOFR_ZERO; > > + > > + case LOFS_BEGIN_TREE: > > + case LOFS_BLOB: > > + if (filter_data->omits) > > + oidset_insert(filter_data->omits, &obj->oid); > > + return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */ > > + > > + case LOFS_END_TREE: > > + assert(obj->type == OBJ_TREE); > > + return LOFR_ZERO; > > + > > + } > > +} > > There are a couple of options here: > [] If really want to omit all trees and blobs (and we DO NOT want > the oidset of everything omitted), then we might be able to > shortcut the traversal and speed things up. > > {} add a LOFR_SKIP_TREE bi
[PATCH 0/4] finishing touches on jk/for-each-object-iteration
On Mon, Aug 13, 2018 at 11:45:06AM -0700, Jonathan Tan wrote: > > [1/7]: for_each_*_object: store flag definitions in a single location > > [2/7]: for_each_*_object: take flag arguments as enum > > [3/7]: for_each_*_object: give more comprehensive docstrings > > [4/7]: for_each_packed_object: support iterating in pack-order > > [5/7]: t1006: test cat-file --batch-all-objects with duplicates > > [6/7]: cat-file: rename batch_{loose,packed}_object callbacks > > [7/7]: cat-file: support "unordered" output for --batch-all-objects > > Thanks for laying all the patches out so cleanly! All of them are: > Reviewed-by: Jonathan Tan > > Normally I would re-explain the patches to demonstrate that I understand > them, but in this case, I think they are simple enough - patches 1, 2, > 3, and 6 are refactorings that I agree with, patch 5 just makes a test > more comprehensive, and patches 4 and 7 do what their commit messages > say. > > Stefan brought up the concern that cache.h is increasing in size, but I > agree with the patch as written that it's probably best that we > centralize all the flags somewhere, and we can deal with the location in > a future patch. Thanks for the review. Here are a few patches on top to deal with the cache.h thing, as well as some optimizations that came out of discussing oidset in another thread (I left out for now the "big" optimization of moving oidset to a different data structure; that's complicated enough to be dealt with on its own, I think). The first patch here could arguably be squashed into the final patch of the original series, but I'm OK with it either way. 
  [1/4]: cat-file: use oidset check-and-insert
  [2/4]: cat-file: split batch "buf" into two variables
  [3/4]: cat-file: use a single strbuf for all output
  [4/4]: for_each_*_object: move declarations to object-store.h

 builtin/cat-file.c     | 43 +++-
 builtin/prune-packed.c |  1 +
 cache.h                | 75 ---
 object-store.h         | 90 ++
 packfile.h             | 20 --
 5 files changed, 116 insertions(+), 113 deletions(-)

-Peff
[PATCH 1/4] cat-file: use oidset check-and-insert
We don't need to check if the oidset has our object before we insert
it; that's done as part of the insertion. We can just rely on the
return value from oidset_insert(), which saves one hash lookup per
object.

This measurable speedup is tiny and within the run-to-run noise, but
the result is simpler to read, too.

Signed-off-by: Jeff King
---
 builtin/cat-file.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 45992c9be9..04b5cda191 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -443,9 +443,8 @@ static int batch_unordered_object(const struct object_id *oid,
 				  void *vdata)
 {
 	struct object_cb_data *data = vdata;

-	if (oidset_contains(data->seen, oid))
+	if (oidset_insert(data->seen, oid))
 		return 0;
-	oidset_insert(data->seen, oid);

 	return batch_object_cb(oid, data);
 }
--
2.18.0.1066.g0d97f3a098
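The same "one lookup instead of two" pattern applies in any language. A small illustrative sketch in Python (git's oidset is a C hash set; this helper and its return convention merely mirror oidset_insert()'s behavior):

```python
def insert(seen, oid):
    """Insert oid into the set `seen`; return True if it was already
    present. Like oidset_insert(), the membership test and the
    insertion happen in a single operation on the set, instead of a
    separate contains() check followed by an add()."""
    before = len(seen)
    seen.add(oid)
    return len(seen) == before
```

A dedup loop then reads exactly like the patched C: skip the object when insert() reports it was already seen.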
[PATCH v2] send-email: add an option to impose a delay between sent E-Mails
Add a --send-delay option with a corresponding sendemail.smtpSendDelay configuration variable. When set to e.g. 2, this causes send-email to sleep 2 seconds before sending the next E-Mail. We'll only sleep between sends, not before the first send, or after the last. This option has two uses. Firstly, to be able to Ctrl+C a long send with "all" if you have a change of heart. Secondly, as a hack in some mail setups to, with a sufficiently high delay, force the receiving client to sort the E-Mails correctly. Some popular E-Mail clients completely ignore the "Date" header, which format-patch is careful to set such that the patches will be displayed in order, and instead sort by the time the E-mail was received. Google's GMail is a good example of such a client. It ostensibly sorts by some approximation of received time (although not by any "Received" header). It's more usual than not to see patches showing out of order in GMail. To take a few examples of orders seen on patches on the Git mailing list: 1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy) 2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee) 3 -> 2 -> 1 (fast-import by Jameson Miller) 2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King) The reason to add the new "X-Mailer-Send-Delay" header is to make it easy to tell what the imposed delay was, if any. This allows for gathering some data on how the transfer of E-Mails with & without this option behaves. This may not be workable without really long delays, see [1] and [2]. The reason for why the getopt format is "send-delay=s" instead of "send-delay=d" is because we're doing manual validation of the value we get passed, which getopt would corrupt in cases of e.g. float values before we could show a sensible error message. 1. https://public-inbox.org/git/20180325210132.ge74...@genre.crustytoothpaste.net/ 2. 
https://public-inbox.org/git/xmqqpo3rehe4@gitster-ct.c.googlers.com/ Signed-off-by: Ævar Arnfjörð Bjarmason --- I submitted this back in March hoping it would solve mail ordering problems, but the other motive I had for this is that I'm paranoid that I'm sending out bad E-Mails, and tend to "y" to each one because "a" is too fast. So I'm re-sending this with an updated commit message & rationale, and not sending 2/2 to toggle this on by default. I'd still like to have this feature. Documentation/config.txt | 6 Documentation/git-send-email.txt | 4 +++ git-send-email.perl | 18 --- t/t9001-send-email.sh| 55 4 files changed, 79 insertions(+), 4 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 63365dcf3d..5eb81b64a7 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -3216,6 +3216,12 @@ sendemail.smtpReloginDelay:: Seconds wait before reconnecting to smtp server. See also the `--relogin-delay` option of linkgit:git-send-email[1]. +sendemail.smtpSendDelay:: + Seconds wait in between message sending before sending another + message. Set it to 0 to impose no extra delay, defaults to 0. ++ +See also the `--send-delay` option of linkgit:git-send-email[1]. + showbranch.default:: The default set of branches for linkgit:git-show-branch[1]. See linkgit:git-show-branch[1]. diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt index 465a4ecbed..98fdd9b803 100644 --- a/Documentation/git-send-email.txt +++ b/Documentation/git-send-email.txt @@ -270,6 +270,10 @@ must be used for each option. with --batch-size option. Defaults to the `sendemail.smtpReloginDelay` configuration variable. +--send-delay=:: + Wait $ between sending emails. Defaults to the + `sendemail.smtpSendDelay` configuration variable. 
+ Automating ~~ diff --git a/git-send-email.perl b/git-send-email.perl index 2be5dac337..65b53ee9f1 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -89,6 +89,7 @@ sub usage { --batch-size * send max message per connection. --relogin-delay * delay seconds between two successive login. This option can only be used with --batch-size +--send-delay * ensure that seconds pass between two successive sends. Automating: --identity* Use the sendemail. options. @@ -225,7 +226,7 @@ sub do_edit { my ($to_cmd, $cc_cmd); my ($smtp_server, $smtp_server_port, @smtp_server_options); my ($smtp_authuser, $smtp_encryption, $smtp_ssl_cert_path); -my ($batch_size, $relogin_delay); +my ($batch_size, $relogin_delay, $send_delay); my ($identity, $aliasfiletype, @alias_files, $smtp_domain, $smtp_auth); my ($validate, $confirm); my (@suppress_cc); @@ -259,6 +260,7 @@ sub do_edit { "smtpauth" => \$smtp_auth, "smtpbatchsize" => \$batch_size, "smtprelogindelay" => \$relogin_delay, +"smtpsenddela
[PATCH 2/4] cat-file: split batch "buf" into two variables
We use the "buf" strbuf for two things: to read incoming lines, and as a scratch space for test-expanding the user-provided format. Let's split this into two variables with descriptive names, which makes their purpose and lifetime more clear. It will also help in a future patch when we start using the "output" buffer for more expansions. Signed-off-by: Jeff King --- René, in the patch you sent earlier, I noticed that for the non-batch-all-objects case we use the same strbuf for input and output. That'd probably be OK most of the time (the first thing we do is resolve the input to an oid), but I suspect it could be pretty bad with %(rest). We'd write over or even realloc the string it points into as part of the output. This patch just clarifies the names; your reuse idea is in the next one. builtin/cat-file.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 04b5cda191..3ed1d0be80 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -466,7 +466,8 @@ static int batch_unordered_packed(const struct object_id *oid, static int batch_objects(struct batch_options *opt) { - struct strbuf buf = STRBUF_INIT; + struct strbuf input = STRBUF_INIT; + struct strbuf output = STRBUF_INIT; struct expand_data data; int save_warning; int retval = 0; @@ -481,8 +482,9 @@ static int batch_objects(struct batch_options *opt) */ memset(&data, 0, sizeof(data)); data.mark_query = 1; - strbuf_expand(&buf, opt->format, expand_format, &data); + strbuf_expand(&output, opt->format, expand_format, &data); data.mark_query = 0; + strbuf_release(&output); if (opt->cmdmode) data.split_on_whitespace = 1; @@ -542,14 +544,14 @@ static int batch_objects(struct batch_options *opt) save_warning = warn_on_object_refname_ambiguity; warn_on_object_refname_ambiguity = 0; - while (strbuf_getline(&buf, stdin) != EOF) { + while (strbuf_getline(&input, stdin) != EOF) { if (data.split_on_whitespace) { /* * Split at first whitespace, tying off the 
beginning * of the string and saving the remainder (or NULL) in * data.rest. */ - char *p = strpbrk(buf.buf, " \t"); + char *p = strpbrk(input.buf, " \t"); if (p) { while (*p && strchr(" \t", *p)) *p++ = '\0'; @@ -557,10 +559,10 @@ static int batch_objects(struct batch_options *opt) data.rest = p; } - batch_one_object(buf.buf, opt, &data); + batch_one_object(input.buf, opt, &data); } - strbuf_release(&buf); + strbuf_release(&input); warn_on_object_refname_ambiguity = save_warning; return retval; } -- 2.18.0.1066.g0d97f3a098
Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
> @@ -743,6 +743,9 @@ specification contained in . > A debug option to help with future "partial clone" development. > This option specifies how missing objects are handled. > + > +The form '--filter=tree:' omits all blobs and trees deeper than > + from the root tree. Currently, only =0 is supported. > ++ > The form '--missing=error' requests that rev-list stop with an error if > a missing object is encountered. This is the default action. > + The "--filter" documentation should go with the other "--filter" information, not right after --missing. > +test_expect_success 'setup for tests of tree:0' ' > + mkdir r1/subtree && > + echo "This is a file in a subtree" > r1/subtree/file && > + git -C r1 add subtree/file && > + git -C r1 commit -m subtree > +' Style: no space after > > +test_expect_success 'grab tree directly when using tree:0' ' > + # We should get the tree specified directly but not its blobs or > subtrees. > + git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack > <<-EOF && > + HEAD: > + EOF > + git -C r1 index-pack ../commitsonly.pack && > + git -C r1 verify-pack -v ../commitsonly.pack >objs && > + grep -E "tree|blob" objs >trees_and_blobs && > + test_line_count = 1 trees_and_blobs > +' Can we also verify that the SHA-1 in trees_and_blobs is what we expected? > +test_expect_success 'use fsck before and after manually fetching a missing > subtree' ' > + # push new commit so server has a subtree > + mkdir src/dir && > + echo "in dir" > src/dir/file.txt && No space after > > + git -C src add dir/file.txt && > + git -C src commit -m "file in dir" && > + git -C src push -u srv master && > + SUBTREE=$(git -C src rev-parse HEAD:dir) && > + > + rm -rf dst && > + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst && > + git -C dst fsck && > + git -C dst cat-file -p $SUBTREE >tree_contents 2>err && > + git -C dst fsck > +' If you don't need to redirect to err, don't do so. 
Before the cat-file, also verify that the tree is missing, most likely through a "git rev-list" with "--missing=print". And I would grep on the tree_contents to ensure that the filename ("file.txt") is there, so that we know that we got the correct tree. > +test_expect_success 'can use tree:0 to filter partial clone' ' > + rm -rf dst && > + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst && > + git -C dst rev-list master --missing=allow-any --objects > >fetched_objects && > + cat fetched_objects \ > + | awk -f print_1.awk \ > + | xargs -n1 git -C dst cat-file -t >fetched_types && > + sort fetched_types -u >unique_types.observed && > + echo commit > unique_types.expected && > + test_cmp unique_types.observed unique_types.expected > +' > + > +test_expect_success 'auto-fetching of trees with --missing=error' ' > + git -C dst rev-list master --missing=error --objects >fetched_objects && > + cat fetched_objects \ > + | awk -f print_1.awk \ > + | xargs -n1 git -C dst cat-file -t >fetched_types && > + sort fetched_types -u >unique_types.observed && > + printf "blob\ncommit\ntree\n" >unique_types.expected && > + test_cmp unique_types.observed unique_types.expected > +' These two tests seem redundant with the 'use fsck before and after manually fetching a missing subtree' test (after the latter is appropriately renamed). I think we only need to test this sequence once, which can be placed in one or spread over multiple tests: 1. partial clone with --filter=tree:0 2. fsck works 3. verify that trees are indeed missing 4. autofetch a tree 5. fsck still works
Re: [PATCH v4 2/5] unpack-trees: add performance tracing
On 8/13/2018 6:41 PM, Junio C Hamano wrote: Jeff King writes: I can buy the argument that it's nice to have some form of profiling that works everywhere, even if it's lowest-common-denominator. I just wonder if we could be investing effort into tooling around existing solutions that will end up more powerful and flexible in the long run. Another thing I noticed is that the codepaths we would find interesting to annotate with trace_performance_* stuff often overlaps with the "slog" thing. If the latter aims to eventually replace GIT_TRACE (and if not, I suspect there is not much point adding it in the first place), perhaps we can extend it to also cover the need of these trace_performance_* calls, so that we do not have to carry three different tracing mechanisms. I'm looking at adding code to my SLOG (better name suggestions welcome) patch series to eventually replace the existing git_trace facility. And I would like to have a set of nested messages like Duy has proposed be a part of that. In an independent effort I've found the nested messages being very helpful in certain contexts. They are not a replacement for the various platform tools, like PerfView and friends as discussed earlier on this thread, but then again I can ask a customer to turn a knob and run it again and send me the output and hopefully get a rough idea of the problem -- without having them install a bunch of perf tools. Jeff
[PATCH 3/4] cat-file: use a single strbuf for all output
When we're in batch mode, we end up in batch_object_write() for each object, which allocates its own strbuf for each call. Instead, we can provide a single "scratch" buffer that gets reused for each output. When running: git cat-file --batch-all-objects --batch-check='%(objectname)' on git.git, my best-of-five time drops from: real 0m0.171s user 0m0.159s sys 0m0.012s to: real 0m0.133s user 0m0.121s sys 0m0.012s Note that we could do this just by putting the "scratch" pointer into "struct expand_data", but I chose instead to add an extra parameter to the callstack. That's more verbose, but it makes it a bit more obvious what is going on, which in turn makes it easy to see where we need to be releasing the string in the caller (right after the loop which uses it in each case). Based-on-a-patch-by: René Scharfe Signed-off-by: Jeff King --- It also made it easy to see that without the prior patch, we'd have been using "buf" for two parameters. :) builtin/cat-file.c | 28 +--- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 3ed1d0be80..08dced2614 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -338,11 +338,11 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d } } -static void batch_object_write(const char *obj_name, struct batch_options *opt, +static void batch_object_write(const char *obj_name, + struct strbuf *scratch, + struct batch_options *opt, struct expand_data *data) { - struct strbuf buf = STRBUF_INIT; - if (!data->skip_object_info && oid_object_info_extended(the_repository, &data->oid, &data->info, OBJECT_INFO_LOOKUP_REPLACE) < 0) { @@ -352,10 +352,10 @@ static void batch_object_write(const char *obj_name, struct batch_options *opt, return; } - strbuf_expand(&buf, opt->format, expand_format, data); - strbuf_addch(&buf, '\n'); - batch_write(opt, buf.buf, buf.len); - strbuf_release(&buf); + strbuf_reset(scratch); + strbuf_expand(scratch, opt->format, 
expand_format, data); + strbuf_addch(scratch, '\n'); + batch_write(opt, scratch->buf, scratch->len); if (opt->print_contents) { print_object_or_die(opt, data); @@ -363,7 +363,9 @@ static void batch_object_write(const char *obj_name, struct batch_options *opt, } } -static void batch_one_object(const char *obj_name, struct batch_options *opt, +static void batch_one_object(const char *obj_name, +struct strbuf *scratch, +struct batch_options *opt, struct expand_data *data) { struct object_context ctx; @@ -405,20 +407,21 @@ static void batch_one_object(const char *obj_name, struct batch_options *opt, return; } - batch_object_write(obj_name, opt, data); + batch_object_write(obj_name, scratch, opt, data); } struct object_cb_data { struct batch_options *opt; struct expand_data *expand; struct oidset *seen; + struct strbuf *scratch; }; static int batch_object_cb(const struct object_id *oid, void *vdata) { struct object_cb_data *data = vdata; oidcpy(&data->expand->oid, oid); - batch_object_write(NULL, data->opt, data->expand); + batch_object_write(NULL, data->scratch, data->opt, data->expand); return 0; } @@ -509,6 +512,7 @@ static int batch_objects(struct batch_options *opt) cb.opt = opt; cb.expand = &data; + cb.scratch = &output; if (opt->unordered) { struct oidset seen = OIDSET_INIT; @@ -531,6 +535,7 @@ static int batch_objects(struct batch_options *opt) oid_array_clear(&sa); } + strbuf_release(&output); return 0; } @@ -559,10 +564,11 @@ static int batch_objects(struct batch_options *opt) data.rest = p; } - batch_one_object(input.buf, opt, &data); + batch_one_object(input.buf, &output, opt, &data); } strbuf_release(&input); + strbuf_release(&output); warn_on_object_refname_ambiguity = save_warning; return retval; } -- 2.18.0.1066.g0d97f3a098
[PATCH 4/4] for_each_*_object: move declarations to object-store.h
The for_each_loose_object() and for_each_packed_object() functions are meant to be part of a unified interface: they use the same set of for_each_object_flags, and it's not inconceivable that we might one day add a single for_each_object() wrapper around them. Let's put them together in a single file, so we can avoid awkwardness like saying "the flags for this function are over in cache.h". Moving the loose functions to packfile.h is silly. Moving the packed functions to cache.h works, but makes the "cache.h is a kitchen sink" problem worse. The best place is the recently-created object-store.h, since these are quite obviously related to object storage. The for_each_*_in_objdir() functions do not use the same flags, but they are logically part of the same interface as for_each_loose_object(), and share callback signatures. So we'll move those, as well, as they also make sense in object-store.h. Signed-off-by: Jeff King --- This patch also happens to be a nice showcase for --color-moved. builtin/prune-packed.c | 1 + cache.h| 75 --- object-store.h | 90 ++ packfile.h | 20 -- 4 files changed, 91 insertions(+), 95 deletions(-) diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c index 4ff525e50f..a9e7b552b9 100644 --- a/builtin/prune-packed.c +++ b/builtin/prune-packed.c @@ -3,6 +3,7 @@ #include "progress.h" #include "parse-options.h" #include "packfile.h" +#include "object-store.h" static const char * const prune_packed_usage[] = { N_("git prune-packed [-n | --dry-run] [-q | --quiet]"), diff --git a/cache.h b/cache.h index 6d14702df2..aee36afa54 100644 --- a/cache.h +++ b/cache.h @@ -1575,81 +1575,6 @@ extern int odb_mkstemp(struct strbuf *temp_filename, const char *pattern); */ extern int odb_pack_keep(const char *name); -/* - * Iterate over the files in the loose-object parts of the object - * directory "path", triggering the following callbacks: - * - * - loose_object is called for each loose object we find. 
- * - * - loose_cruft is called for any files that do not appear to be - *loose objects. Note that we only look in the loose object - *directories "objects/[0-9a-f]{2}/", so we will not report - *"objects/foobar" as cruft. - * - * - loose_subdir is called for each top-level hashed subdirectory - *of the object directory (e.g., "$OBJDIR/f0"). It is called - *after the objects in the directory are processed. - * - * Any callback that is NULL will be ignored. Callbacks returning non-zero - * will end the iteration. - * - * In the "buf" variant, "path" is a strbuf which will also be used as a - * scratch buffer, but restored to its original contents before - * the function returns. - */ -typedef int each_loose_object_fn(const struct object_id *oid, -const char *path, -void *data); -typedef int each_loose_cruft_fn(const char *basename, - const char *path, - void *data); -typedef int each_loose_subdir_fn(unsigned int nr, -const char *path, -void *data); -int for_each_file_in_obj_subdir(unsigned int subdir_nr, - struct strbuf *path, - each_loose_object_fn obj_cb, - each_loose_cruft_fn cruft_cb, - each_loose_subdir_fn subdir_cb, - void *data); -int for_each_loose_file_in_objdir(const char *path, - each_loose_object_fn obj_cb, - each_loose_cruft_fn cruft_cb, - each_loose_subdir_fn subdir_cb, - void *data); -int for_each_loose_file_in_objdir_buf(struct strbuf *path, - each_loose_object_fn obj_cb, - each_loose_cruft_fn cruft_cb, - each_loose_subdir_fn subdir_cb, - void *data); - -/* - * Flags for for_each_*_object(), including for_each_loose below and - * for_each_packed in packfile.h. - */ -enum for_each_object_flags { - /* Iterate only over local objects, not alternates. */ - FOR_EACH_OBJECT_LOCAL_ONLY = (1<<0), - - /* Only iterate over packs obtained from the promisor remote. 
*/ - FOR_EACH_OBJECT_PROMISOR_ONLY = (1<<1), - - /* -* Visit objects within a pack in packfile order rather than .idx order -*/ - FOR_EACH_OBJECT_PACK_ORDER = (1<<2), -}; - -/* - * Iterate over all accessible loose objects without respect to - * reachability. By default, this includes both local and alternate objects. - * The order in which objects are visited is unspecified. - * - * Any flags specific to packs are
[PATCH] git-submodule.sh: accept verbose flag in cmd_update to be non-quiet
In a56771a668d (builtin/pull: respect verbosity settings in submodules, 2018-01-25), we made sure to pass on both quiet and verbose flag from builtin/pull.c to the submodule shell script. However git-submodule doesn't understand a verbose flag, which results in a bug when invoking git pull --recurse-submodules -v [...] There are a few different approaches to fix this bug: 1) rewrite 'argv_push_verbosity' or its caller in builtin/pull.c to cap opt_verbosity at 0. Then 'argv_push_verbosity' would only add '-q' if any. 2) Have a flag in 'argv_push_verbosity' that specifies if we allow adding -q or -v (or both). 3) Add -v to git-submodule.sh and make it a no-op (1) seems like a maintenance burden: What if we add code after the submodule operations or move submodule operations higher up, then we have altered the opt_verbosity setting further down the line in builtin/pull.c. (2) seems like it could work reasonably well without more regressions (3) seems easiest to implement as well as actually is a feature with the last-one-wins rule of passing flags to Git commands. Signed-off-by: Stefan Beller --- On Tue, Aug 14, 2018 at 10:54 AM Jochen Kühner wrote: > > If I set > git config --global submodule.recurse true > and run git via: > git pull --progress -v --no-rebase "origin" > The command will fail with following output (Errorlevel is 1) > Fetching submodule submodules/tstemplates > From http://10.0.102.194:7990/scm/mcc/tstemplates > = [up to date] feature/robin -> origin/feature/robin > = [up to date] master-> origin/master > Already up to date. > usage: git submodule [--quiet] add [-b ] [-f|--force] [--name ] > [--reference ] [--] [] >or: git submodule [--quiet] status [--cached] [--recursive] [--] > [...] >or: git submodule [--quiet] init [--] [...] >or: git submodule [--quiet] deinit [-f|--force] (--all| [--] ...) 
>or: git submodule [--quiet] update [--init] [--remote] [-N|--no-fetch] > [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] > [--reference ] [--recursive] [--] [...] >or: git submodule [--quiet] summary [--cached|--files] [--summary-limit > ] [commit] [--] [...] >or: git submodule [--quiet] foreach [--recursive] >or: git submodule [--quiet] sync [--recursive] [--] [...] >or: git submodule [--quiet] absorbgitdirs [--] [...] > > seams that the “verbose” parameter “-v” is also sent to “git submodules” wich > does not support it. > > If I remove “-v” it will work. > > Problem is, I use TortoiseGit, wich will automatically create this command! git-submodule.sh | 3 +++ 1 file changed, 3 insertions(+) diff --git a/git-submodule.sh b/git-submodule.sh index 8b5ad59bdee..f7fd80345cd 100755 --- a/git-submodule.sh +++ b/git-submodule.sh @@ -438,6 +438,9 @@ cmd_update() -q|--quiet) GIT_QUIET=1 ;; + -v) + GIT_QUIET=0 + ;; --progress) progress=1 ;; -- 2.18.0.265.g16de1b435c9.dirty
Re: [PATCH] git-submodule.sh: accept verbose flag in cmd_update to be non-quiet
Stefan Beller wrote: > git-submodule.sh | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/git-submodule.sh b/git-submodule.sh > index 8b5ad59bdee..f7fd80345cd 100755 > --- a/git-submodule.sh > +++ b/git-submodule.sh > @@ -438,6 +438,9 @@ cmd_update() > -q|--quiet) > GIT_QUIET=1 > ;; > + -v) > + GIT_QUIET=0 > + ;; Reviewed-by: Jonathan Nieder Thanks.
Re: [PATCH] mingw: enable atomic O_APPEND
On 14.08.2018 at 00:37, Jeff King wrote: And then you can do something like: for size in 4097 8193 16385 32769 65537 131073 262145 524289 1048577; do >out ;# clean up from last run echo "Trying $size..." timeout 5 ./write $size out if ! ./check $size I used your programs with necessary adjustments (as fork() is not available), and did similar tests with concurrent processes. With packet sizes 1025, 4093, 7531 (just to include some odd number), and 8193 I did not observe any overlapping or short writes. I'm now very confident that we are on the safe side for our purposes. -- Hannes
Re: [PATCH v4 2/5] unpack-trees: add performance tracing
On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler wrote: > I'm looking at adding code to my SLOG (better name suggestions welcome) > patch series to eventually replace the existing git_trace facility. Complement maybe. Replace, please no. I'd rather not stare at json messages. -- Duy
Re: [PATCH v2] send-email: add an option to impose delay between sent E-Mails
On Tue, Aug 14, 2018 at 11:15 AM Ævar Arnfjörð Bjarmason wrote: > > Add a --send-delay option with a corresponding sendemail.smtpSendDelay > configuration variable. When set to e.g. 2, this causes send-email to > sleep 2 seconds before sending the next E-Mail. We'll only sleep > between sends, not before the first send, or after the last. > > This option has two uses. Firstly, to be able to Ctrl+C a long send > with "all" if you have a change of heart. Secondly, as a hack in some > mail setups to, with a sufficiently high delay, force the receiving > client to sort the E-Mails correctly. > > Some popular E-Mail clients completely ignore the "Date" header, which > format-patch is careful to set such that the patches will be displayed > in order, and instead sort by the time the E-mail was received. > > Google's GMail is a good example of such a client. It ostensibly sorts > by some approximation of received time (although not by any "Received" > header). It's more usual than not to see patches showing out of order > in GMail. To take a few examples of orders seen on patches on the Git > mailing list: > > 1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy) > 2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee) > 3 -> 2 -> 1 (fast-import by Jameson Miller) > 2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King) > > The reason to add the new "X-Mailer-Send-Delay" header is to make it > easy to tell what the imposed delay was, if any. This allows for > gathering some data on how the transfer of E-Mails with & without this > option behaves. This may not be workable without really long delays, > see [1] and [2]. > > The reason for why the getopt format is "send-delay=s" instead of > "send-delay=d" is because we're doing manual validation of the value > we get passed, which getopt would corrupt in cases of e.g. float > values before we could show a sensible error message. > > 1. 
> https://public-inbox.org/git/20180325210132.ge74...@genre.crustytoothpaste.net/ > 2. https://public-inbox.org/git/xmqqpo3rehe4@gitster-ct.c.googlers.com/ > > Signed-off-by: Ævar Arnfjörð Bjarmason > --- > > I submitted this back in March hoping it would solve mail ordering > problems, but the other motive I had for this is that I'm paranoid > that I'm sending out bad E-Mails, and tend to "y" to each one because > "a" is too fast. Heh. GMail seems to have added an Undo button in their UI, which would be the same feature as this one. (Hit Ctrl+C in time to "undo" the sending command) I have been bitten quite a few times by using "a" as I had old series still laying around, such that it would send a new series and parts of the old series (or when you changed subjects and resend another iteration of a series, you may end up with two "patch 1"s). So I learned to be careful before pressing "a" on sending. Maybe the underlying issue is that you really only want to send a series and not "all" as send-email asks for. So maybe that dialog could learn a [s]eries switch, which would check either filenames to count up, or if the base that is recorded (base-commit for first and prerequisite-patch-id for followups) is consistent. Send this email? ([y]es|[n]o|[e]dit|[q]uit|[a]ll|[s]eries): Another note: I personally never use no/quit, but Ctrl+C for both cases. This is also different than the feature of 5453b83bdf9 (send-email --batch-size to work around some SMTP server limit, 2017-05-21) which introduced sendemail.smtpReloginDelay, which would offer the same functionality when the batch-size is set to 1. (Although this would keep you connected to the server as well as add the X-Mailer-Send-Delay header, which is nothing from the official email RFC, but your own invention?) Having sorted mails in GMail would be nice! Thanks, Stefan
Re: [PATCH v4 2/5] unpack-trees: add performance tracing
On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen wrote: > > On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler wrote: > > I'm looking at adding code to my SLOG (better name suggestions welcome) > > patch series to eventually replace the existing git_trace facility. > > Complement maybe. Replace, please no. I'd rather not stare at json messages. From the sidelines: We'd only need one logging infrastructure in place, as the formatting would be done as a later step? For local operations we'd certainly find better formatting than json, and we figured that we might end up desiring ProtocolBuffers[1] instead of JSon, so if it would be easy to change the output of the structured logging easily that would be great. But AFAICT these series are all about putting the sampling points into the code base, so formatting would be orthogonal to it? Stefan [1] https://developers.google.com/protocol-buffers/
Re: [PATCH v2] send-email: add an option to impose delay between sent E-Mails
Ævar Arnfjörð Bjarmason wrote: > Add a --send-delay option with a corresponding sendemail.smtpSendDelay > configuration variable. When set to e.g. 2, this causes send-email to > sleep 2 seconds before sending the next E-Mail. We'll only sleep > between sends, not before the first send, or after the last. > > This option has two uses. Firstly, to be able to Ctrl+C a long send > with "all" if you have a change of heart. Secondly, as a hack in some > mail setups to, with a sufficiently high delay, force the receiving > client to sort the E-Mails correctly. > > Some popular E-Mail clients completely ignore the "Date" header, which > format-patch is careful to set such that the patches will be displayed > in order, and instead sort by the time the E-mail was received. > > Google's GMail is a good example of such a client. It ostensibly sorts > by some approximation of received time (although not by any "Received" > header). It's more usual than not to see patches showing out of order > in GMail. To take a few examples of orders seen on patches on the Git > mailing list: > > 1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy) > 2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee) > 3 -> 2 -> 1 (fast-import by Jameson Miller) > 2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King) > > The reason to add the new "X-Mailer-Send-Delay" header is to make it > easy to tell what the imposed delay was, if any. This allows for > gathering some data on how the transfer of E-Mails with & without this > option behaves. This may not be workable without really long delays, > see [1] and [2]. Aside from the new header, I think this is better implemented using the existing $relogin_delay and $batch_size=1. Disconnecting during the delay might be more sympathetic to existing mail servers (which aren't C10K-optimized). If the client sleeps, the server may disconnect the client anyways to save resources. 
> @@ -1741,6 +1747,10 @@ sub process_file { > $message, $xfer_encoding, $target_xfer_encoding); > push @xh, "Content-Transfer-Encoding: $xfer_encoding"; > unshift @xh, 'MIME-Version: 1.0' unless $has_mime_version; > + if ($send_delay && $i > 0) { > + push @xh, "X-Mailer-Send-Delay: $send_delay"; > + sleep $send_delay; > + } We can add this header for relogin_delay + batch_size But maybe --send-delay can be a shortcut for --relogin-delay and --batch-size=1
Re: [PATCH v4 2/5] unpack-trees: add performance tracing
On Tue, Aug 14, 2018 at 8:44 PM Stefan Beller wrote: > > On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen wrote: > > > > On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler wrote: > > > I'm looking at adding code to my SLOG (better name suggestions welcome) > > > patch series to eventually replace the existing git_trace facility. > > > > Complement maybe. Replace, please no. I'd rather not stare at json messages. > > From the sidelines: We'd only need one logging infrastructure in place, as the > formatting would be done as a later step? For local operations we'd certainly > find better formatting than json, and we figured that we might end up desiring > ProtocolBuffers[1] instead of JSon, so if it would be easy to change > the output of > the structured logging easily that would be great. These trace messages are made for human consumption. Granted occasionally we need some processing but I find one liners mostly suffice. Now we turn these into something made for machines, turning people into second-class citizens. I've read these messages reformatted for humans, and it's usually too verbose even when reformatted. > But AFAICT these series are all about putting the sampling points into the > code base, so formatting would be orthogonal to it? It's not just sampling points. There's things like index id being shown in the message for example. I prefer to keep free style format to help me read. There's also things like indentation I do here to help me read. Granted you could do all that with scripts and stuff, but will we pass around in mail dumps of json messages to be decoded locally? > Stefan > > [1] https://developers.google.com/protocol-buffers/ -- Duy
Re: [PATCH 4/4] range-diff: indent special lines as context
Hi Stefan, On Mon, 13 Aug 2018, Stefan Beller wrote: > > > The later lines that indicate a change to the Makefile will be treated as > > > context both in the outer and inner diff, such that those lines stay > > > regular color. > > > > While I am a fan of having those lines colored correctly, I have to admit > > that I am not exactly enthusiastic about that extra indentation... > > > > Otherwise, this looks good to me. > > Can you explain what makes you less enthused about the indentation? > > Advantage: > * allows easy coloring (easy implementation) > Disadvantage: > * formats change, This is it. It breaks my visual flow. > but the range diff is still in its early design phase, so we're not > breaking things, yet? Indeed. We're not breaking things. If you feel strongly about it, we can have that indentation, I *can* get used to it. > (Do we ever plan on sending range-diff patches that can be applied to > rewrite history? I am very uncertain on such a feature request. It > sounds cool, though) I remember that I heard you discussing this internally. I am not too big a fan of this idea, I have to admit. The range diff seems more designed to explain how a patch series evolved, rather than providing machine-readable data that allows to recreate said evolution. For example, the committer information as well as the date are missing, which would preclude a faithful reconstruction. And that is not all: if you wanted to "apply" a range diff, you would need to know more about the base(s) of the two commit ranges. You would need to know that they are at least very similar to the base onto which you want to apply this. And quite seriously, this would be the wrong way to go in my mind. We have a very efficient data format to transport all of that information: the Git bundle. Let's not overload the range diff format with multiple, partially contradicting purposes. Think "separation of concerns". 
It's the same issue, really, as trying to send highly structured data such as bug reports or code contributions via a medium meant to send unstructured plain or formatted text back and forth between human beings. Ciao, Dscho
Re: [PATCH 2/2] submodule: munge paths to submodule git directories
On Tue, Aug 14, 2018 at 11:04:06AM -0700, Brandon Williams wrote: > > I think this backwards-compatibility is necessary to avoid pain. But > > until it goes away, I don't think this is helping the vulnerability from > > 0383bbb901. Because there the issue was that the submodule name pointed > > back into the working tree, so this access() would find the untrusted > > working tree code and say "ah, an old-fashioned name!". > [...] > > Oh I know that this doesn't help with that vulnerability. As you've > said we fix it and now disallow ".." at the submodule-config level so > really this path is simply about using what we get out of > submodule-config in a more sane manner. OK, I'm alright with that as long as we are all on the same page. I think I mistook "this addresses the vulnerability" from your commit message the wrong way. I took it as "this patch", but reading it again, you simply mean "the '..' handling we already did". I do think eventually dropping this backwards-compatibility could save us from another directory-escape problem, but it's hard to justify the real-world pain for a hypothetical benefit. Maybe in a few years we could get rid of it in a major version bump. > > One interesting thing about url-encoding is that it's not one-to-one. > > This case could also be %2F, which is a different file (on a > > case-sensitive filesystem). I think "%20" and "+" are similarly > > interchangeable. > > > > If we were decoding the filenames, that's fine. The round-trip is > > lossless. > > > > But that's not quite how the new code behaves. We encode the input and > > then check to see if it matches an encoding we previously performed. So > > if our urlencode routines ever change, this will subtly break. > > > > I don't know how much it's worth caring about. We're not that likely to > > change the routines ourselves (though certainly a third-party > > implementation would need to know our exact url-encoding decisions).
> > This is exactly the reason why I wanted to get some opinions on what the > best thing to do here would be. I _think_ the best thing would probably > be to write a specific routine to do the conversion, and it wouldn't > even have to be all that complex. Basically I'm just interested in > converting '/' characters so that things no longer behave like > nested directories. I think we benefit from catching names that would trigger filesystem case-folding, too. If I have submodules with names "foo" and "FOO", we would not want to confuse them (or at least we should confuse them equally on all platforms). I doubt you can do anything malicious, but it might simply be annoying. That implies to me using a custom function (even if its encoded form ends up being understandable as url-encoding). -Peff
Re: [PATCH 2/2] submodule: munge paths to submodule git directories
Hi, Brandon Williams wrote: > On 08/09, Jeff King wrote: >> One interesting thing about url-encoding is that it's not one-to-one. >> This case could also be %2F, which is a different file (on a >> case-sensitive filesystem). I think "%20" and "+" are similarly >> interchangeable. >> >> If we were decoding the filenames, that's fine. The round-trip is >> lossless. >> >> But that's not quite how the new code behaves. We encode the input and >> then check to see if it matches an encoding we previously performed. So >> if our urlencode routines ever change, this will subtly break. >> >> I don't know how much it's worth caring about. We're not that likely to >> change the routines ourself (though certainly a third-party >> implementation would need to know our exact url-encoding decisions). > > This is exactly the reason why I wanted to get some opinions on what the > best thing to do here would be. I _think_ the best thing would probably > be to write a specific routine to do the conversion, and it wouldn't > even have to be all that complex. Basically I'm just interested in > converting '/' characters so that things no longer behave like > nested directories. First of all, I think the behavior with this patch is already much better than the previous status quo. I'm using the patch now and am very happy with it. Second, what if we store the pathname in config? We already store the URL there: [submodule "plugins/hooks"] url = https://gerrit.googlesource.com/plugins/hooks So we could (as a followup patch) do something like [submodule "plugins/hooks"] url = https://gerrit.googlesource.com/plugins/hooks gitdirname = plugins%2fhooks and use that for lookups instead of regenerating the directory name. What do you think? Thanks, Jonathan
[PATCH] submodule: add more exhaustive up-path testing
The tests added in 63e95beb08 ("submodule: port resolve_relative_url from shell to C", 2016-04-15) didn't do a good job of testing various up-path invocations where the up-path would bring us beyond even the URL in question without emitting an error. These results look nonsensical, but it's worth exhaustively testing them before fixing any of this code, so we can see which of these cases were changed. Signed-off-by: Ævar Arnfjörð Bjarmason --- The reason I started to poke at this was because I wanted to add support for relative URLs like //group/project. This solves the problem of there being e.g. a myhost.com/group/project which includes a ../submodule submodule pointing to myhost.com/group/submodule, and me wanting to fork it to myhost.com/myuser/project. But of course with ../submodule the relative URL will point to myhost.com/myuser/submodule, which won't exist. So I was hoping to find some point in the relative URL code where we were actually aware of what the host boundary was, so I could just cut off everything after that if //group/project was provided, strip off one slash to make it /group/project, and call it a day. But as the tests below show, the code isn't aware of that at all, and will happily trample over the host(name) to produce nonsensical results. So I think these tests are worthwhile in themselves, but I would like some advice on how to proceed with that from someone more familiar with submodules. 
t/t0060-path-utils.sh | 13 + 1 file changed, 13 insertions(+) diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh index 21a8b53132..cd74c0a471 100755 --- a/t/t0060-path-utils.sh +++ b/t/t0060-path-utils.sh @@ -330,6 +330,9 @@ test_submodule_relative_url "(null)" "../foo" "../submodule" "../submodule" test_submodule_relative_url "(null)" "./foo/bar" "../submodule" "foo/submodule" test_submodule_relative_url "(null)" "./foo" "../submodule" "submodule" test_submodule_relative_url "(null)" "//somewhere else/repo" "../subrepo" "//somewhere else/subrepo" +test_submodule_relative_url "(null)" "//somewhere else/repo" "../../subrepo" "//subrepo" +test_submodule_relative_url "(null)" "//somewhere else/repo" "../../../subrepo" "/subrepo" +test_submodule_relative_url "(null)" "//somewhere else/repo" "../../../../subrepo" "subrepo" test_submodule_relative_url "(null)" "$(pwd)/subsuper_update_r" "../subsubsuper_update_r" "$(pwd)/subsubsuper_update_r" test_submodule_relative_url "(null)" "$(pwd)/super_update_r2" "../subsuper_update_r" "$(pwd)/subsuper_update_r" test_submodule_relative_url "(null)" "$(pwd)/." "../." "$(pwd)/." 
@@ -344,10 +347,20 @@ test_submodule_relative_url "(null)" "file:///tmp/repo" "../subrepo" "file:///tm test_submodule_relative_url "(null)" "foo/bar" "../submodule" "foo/submodule" test_submodule_relative_url "(null)" "foo" "../submodule" "submodule" test_submodule_relative_url "(null)" "helper:://hostname/repo" "../subrepo" "helper:://hostname/subrepo" +test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../subrepo" "helper:://subrepo" +test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../subrepo" "helper::/subrepo" +test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../../subrepo" "helper::subrepo" +test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../../../subrepo" "helper:subrepo" +test_submodule_relative_url "(null)" "helper:://hostname/repo" "../../../../../../subrepo" ".:subrepo" test_submodule_relative_url "(null)" "ssh://hostname/repo" "../subrepo" "ssh://hostname/subrepo" +test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../subrepo" "ssh://subrepo" +test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../../subrepo" "ssh:/subrepo" +test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../../../subrepo" "ssh:subrepo" +test_submodule_relative_url "(null)" "ssh://hostname/repo" "../../../../../subrepo" ".:subrepo" test_submodule_relative_url "(null)" "ssh://hostname:22/repo" "../subrepo" "ssh://hostname:22/subrepo" test_submodule_relative_url "(null)" "user@host:path/to/repo" "../subrepo" "user@host:path/to/subrepo" test_submodule_relative_url "(null)" "user@host:repo" "../subrepo" "user@host:subrepo" +test_submodule_relative_url "(null)" "user@host:repo" "../../subrepo" ".:subrepo" test_expect_success 'match .gitmodules' ' test-tool path-utils is_dotgitmodules \ -- 2.18.0.865.gffc8e1a3cd6
Re: [PATCH 4/4] range-diff: indent special lines as context
On Tue, Aug 14, 2018 at 11:54 AM Johannes Schindelin wrote: > > Hi Stefan, > > On Mon, 13 Aug 2018, Stefan Beller wrote: > > > > > The later lines that indicate a change to the Makefile will be treated > > > > as > > > > context both in the outer and inner diff, such that those lines stay > > > > regular color. > > > > > > While I am a fan of having those lines colored correctly, I have to admit > > > that I am not exactly enthusiastic about that extra indentation... > > > > > > Otherwise, this looks good to me. > > > > Can you explain what makes you less enthused about the indentation? > > > > Advantage: > > * allows easy coloring (easy implementation) > > Disadvantage: > > * formats change, > > This is it. It breaks my visual flow. > > > but the range diff is still in its early design phase, so we're not > > breaking things, yet? > > Indeed. We're not breaking things. If you feel strongly about it, we can > have that indentation, I *can* get used to it. I only feel strongly about it now as that is the *easiest* way to make the colors look like I want them to look. And I really value colors in the range-diff. Earlier you said that color-less range-diff is nearly useless for you and I thought it was hyperbole, but by now I realize how much truth you spoke. So getting the colors fixed to not markup files (+++/ --- lines of the inner diff) is a high priority for me. So high that I would compromise on the indentation/flow of these corner case areas. > > (Do we ever plan on sending range-diff patches that can be applied to > > rewrite history? I am very uncertain on such a feature request. It > > sounds cool, though) > > I remember that I heard you discussing this internally. I am not too big a > fan of this idea, I have to admit. The range diff seems more designed to > explain how a patch series evolved, rather than providing machine-readable > data that allows to recreate said evolution. 
For example, the committer > information as well as the date are missing, which would preclude a > faithful reconstruction. > Ah! Good point. Though we could just work around that and use the email date for the new author dates. ;-) > And that is not all: if you wanted to "apply" a range diff, you would need > to know more about the base(s) of the two commit ranges. You would need to > know that they are at least very similar to the base onto which you want > to apply this. You would say so in the cover letter "This is a resend of sb/range-diff-colors" and by the knowledge of that tip only and the range-diff you would know what the new series would look like, even if it was rebased. > And quite seriously, this would be the wrong way to go in my mind. We have > a very efficient data format to transport all of that information: the Git > bundle. The bundle format is very efficient for machine transport, but I thought the whole point of the mailing list was easy human-readable parts, i.e. you can point out things in a diff, which you could also do in a range-diff to some extent. We would lose some of the "fresh eyes" as you'd only see the changed part of the series. :-/ So yeah, even for that workflow this seems a net negative. I just thought it would be cool. > Let's not overload the range diff format with multiple, partially > contradicting purposes. Think "separation of concerns". It's the same > issue, really, as trying to send highly structured data such as bug > reports or code contributions via a medium meant to send unstructured > plain or formatted text back and forth between human beings. ok. Thanks, Stefan
Re: [PATCH] submodule: add more exhaustive up-path testing
On Tue, Aug 14, 2018 at 11:59 AM Ævar Arnfjörð Bjarmason wrote: > > The tests added in 63e95beb08 ("submodule: port resolve_relative_url > from shell to C", 2016-04-15) didn't do a good job of testing various > up-path invocations where the up-path would bring us beyond even the > URL in question without emitting an error. > > These results look nonsensical, but it's worth exhaustively testing > them before fixing any of this code, so we can see which of these > cases were changed. Yeah. Please look at the comment in builtin/submodule--helper.c in that commit, where I described my expectations. I should have put them into tests instead, with the expectations spelled out there. Thanks for this patch! Stefan > > Signed-off-by: Ævar Arnfjörð Bjarmason > --- > So I think these tests are worthwhile in themselves, The reason I put it in the comment instead of tests was the ease of spelling out both the status quo and expectations. > but would like > some advice on how to proceed with that from someone more familiar > with submodules. So ideally we'd also error out as soon as the host name is touched?
Re: [PATCH] mingw: enable atomic O_APPEND
On Tue, Aug 14, 2018 at 08:29:04PM +0200, Johannes Sixt wrote:
> Am 14.08.2018 um 00:37 schrieb Jeff King:
> > And then you can do something like:
> >
> >   for size in 4097 8193 16385 32769 65537 131073 262145 524289 1048577; do
> >     >out ;# clean up from last run
> >     echo "Trying $size..."
> >     timeout 5 ./write $size out
> >     if ! ./check $size <out; then
> >       echo "$size failed"
> >       break
> >     fi
> >   done
> >
> > On my Linux system, each of those seems to write several gigabytes
> > without overlapping. I did manage to hit some failing cases, but they
> > were never sheared writes, but rather cases where there was an
> > incomplete write at the end-of-file.
>
> I used your programs with necessary adjustments (as fork() is not
> available), and did similar tests with concurrent processes. With packet
> sizes 1025, 4093, 7531 (just to include some odd number), and 8193 I did
> not observe any overlapping or short writes.
>
> I'm now very confident that we are on the safe side for our purposes.

Great, thanks for testing!

Re-reading what I wrote about end-of-file above and thinking about the conversation with Ævar elsewhere in the thread, I suspect it _is_ easy to get overlapping writes if the processes are receiving signals (since clearly the TERM signal caused a partial write). My experiment doesn't simulate that at all. I suppose the parent process could send SIGUSR1 to the child in each loop, and the child would catch it but keep going.

Hmm, that was easy enough to do (programs below for reference), but surprisingly it didn't fail for me (except for the normal end-of-file truncation). It's like the OS is willing to truncate the write of a dying program but not one for a signal that is getting handled. Which is great for us, since it's exactly what we want, but makes me even more suspicious that a non-Linux kernel might behave completely differently.

I still think we're fine in practice, as I'd expect any kernel to be atomic under the page size. 
So this was mostly just for my own edification.

-Peff

-- >8 --
/* check.c, with separate short-read reporting */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, const char **argv)
{
	int size = atoi(argv[1]);
	int block = 0;
	char *buf;

	buf = malloc(size);
	while (1) {
		int i;
		/* assume atomic reads */
		int r = read(0, buf, size);
		if (!r)
			break;
		if (r < size) {
			fprintf(stderr, "short read\n");
			return 1;
		}
		for (i = 1; i < size; i++) {
			if (buf[i] != buf[0]) {
				fprintf(stderr, "overlap in block %d\n", block);
				return 1;
			}
		}
		block++;
	}
}
-- >8 --

-- >8 --
/* write.c with signals; you can also confirm via strace
 * that each write is atomic */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>

void handle_signal(int sig)
{
	/* do nothing */
}

static void doit(int size, const char *fn, char c, pid_t pid)
{
	int fd;
	char *buf;

	fd = open(fn, O_WRONLY|O_APPEND|O_CREAT, 0666);
	if (fd < 0) {
		perror("open");
		return;
	}

	buf = malloc(size);
	memset(buf, c, size);

	while (1) {
		if (pid)
			kill(pid, SIGUSR1);
		write(fd, buf, size);
	}
}

int main(int argc, const char **argv)
{
	int size = atoi(argv[1]);
	pid_t pid;

	signal(SIGUSR1, handle_signal);

	pid = fork();
	if (pid)
		doit(size, argv[2], '1', pid);
	else
		doit(size, argv[2], '2', pid);
	return 0;
}
-- >8 --
Re: [PATCH v4 0/5] Speed up unpack_trees()
On 8/12/2018 4:15 AM, Nguyễn Thái Ngọc Duy wrote:
> v4 has a bunch of changes
>
> - 1/5 is a new one to show indented tracing. This way it's less
>   misleading to read nested time measurements
> - 3/5 now has the switch/restore cache_bottom logic. Junio suggested
>   a check instead in his final note, but I think this is safer (yeah
>   I'm scared too)
> - the old 4/4 is dropped because
>   - it assumes n-way logic
>   - the visible time saving is not worth the tradeoff
>   - Elijah gave me an idea to avoid add_index_entry() that I think
>     does not have n-way logic assumptions and gives better saving.
>     But it requires some more changes so I'm going to do it later
> - 5/5 is also new and should help reduce cache_tree_update() cost. I
>   wrote somewhere I was not going to work on this part, but it turns
>   out to be just a couple of lines, so might as well do it now.
>
> Interdiff

I've now had a chance to run the git tests, as well as our own unit and functional tests, with this patch series, and all passed. I reviewed the tests in t0090-cache-tree.sh and verified that there are tests that validate the cache tree is correct after doing a checkout and merge (both of which exercise the new cache tree optimization in patch 5). I've also run our perf test suite and the results are outstanding:

Checkout saves 51% on average
Merge saves 44%
Pull saves 30%
Rebase saves 26%

For perspective, that means these commands are going from ~20 seconds to ~10 seconds. I don't feel that any of my comments to the individual patches deserve a re-roll. Given the ongoing discussion about the additional tracing, I'm happy to leave out the first 2 patches so that the rest can go in sooner rather than later. Looks good! 
diff --git a/cache-tree.c b/cache-tree.c index 0dbe10fc85..105f13806f 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -426,7 +426,6 @@ static int update_one(struct cache_tree *it, int cache_tree_update(struct index_state *istate, int flags) { - uint64_t start = getnanotime(); struct cache_tree *it = istate->cache_tree; struct cache_entry **cache = istate->cache; int entries = istate->cache_nr; @@ -434,11 +433,12 @@ int cache_tree_update(struct index_state *istate, int flags) if (i) return i; + trace_performance_enter(); i = update_one(it, cache, entries, "", 0, &skip, flags); + trace_performance_leave("cache_tree_update"); if (i < 0) return i; istate->cache_changed |= CACHE_TREE_CHANGED; - trace_performance_since(start, "repair cache-tree"); return 0; } diff --git a/cache.h b/cache.h index e6f7ee4b64..8b447652a7 100644 --- a/cache.h +++ b/cache.h @@ -673,7 +673,6 @@ extern int index_name_pos(const struct index_state *, const char *name, int name #define ADD_CACHE_JUST_APPEND 8 /* Append only; tree.c::read_tree() */ #define ADD_CACHE_NEW_ONLY 16 /* Do not replace existing ones */ #define ADD_CACHE_KEEP_CACHE_TREE 32 /* Do not invalidate cache-tree */ -#define ADD_CACHE_SKIP_VERIFY_PATH 64 /* Do not verify path */ extern int add_index_entry(struct index_state *, struct cache_entry *ce, int option); extern void rename_index_entry_at(struct index_state *, int pos, const char *new_name); diff --git a/diff-lib.c b/diff-lib.c index a9f38eb5a3..1ffa22c882 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -518,8 +518,8 @@ static int diff_cache(struct rev_info *revs, int run_diff_index(struct rev_info *revs, int cached) { struct object_array_entry *ent; - uint64_t start = getnanotime(); + trace_performance_enter(); ent = revs->pending.objects; if (diff_cache(revs, &ent->item->oid, ent->name, cached)) exit(128); @@ -528,7 +528,7 @@ int run_diff_index(struct rev_info *revs, int cached) diffcore_fix_diff_index(&revs->diffopt); diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); 
- trace_performance_since(start, "diff-index"); + trace_performance_leave("diff-index"); return 0; } diff --git a/dir.c b/dir.c index 21e6f2520a..c5e9fc8cea 100644 --- a/dir.c +++ b/dir.c @@ -2263,11 +2263,11 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, const char *path, int len, const struct pathspec *pathspec) { struct untracked_cache_dir *untracked; - uint64_t start = getnanotime(); if (has_symlink_leading_path(path, len)) return dir->nr; + trace_performance_enter(); untracked = validate_untracked_cache(dir, len, pathspec); if (!untracked) /* @@ -2302,7 +2302,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, dir->nr = i; } - trace_performance_since(start, "read directory %.*s", len, path); + trace_performance_leave("read directory %.*s", len, path); if (dir->untracked) { s
Re: [PATCH 3/4] cat-file: use a single strbuf for all output
Am 14.08.2018 um 20:20 schrieb Jeff King: > When we're in batch mode, we end up in batch_object_write() > for each object, which allocates its own strbuf for each > call. Instead, we can provide a single "scratch" buffer that > gets reused for each output. When running: > > git cat-file --batch-all-objects --batch-check='%(objectname)' > > on git.git, my best-of-five time drops from: > > real0m0.171s > user0m0.159s > sys 0m0.012s > > to: > > real0m0.133s > user0m0.121s > sys 0m0.012s > > Note that we could do this just by putting the "scratch" > pointer into "struct expand_data", but I chose instead to > add an extra parameter to the callstack. That's more > verbose, but it makes it a bit more obvious what is going > on, which in turn makes it easy to see where we need to be > releasing the string in the caller (right after the loop > which uses it in each case). > > Based-on-a-patch-by: René Scharfe > Signed-off-by: Jeff King > --- > It also made it easy to see that without the prior patch, > we'd have been using "buf" for two parameters. :) Good catch. > > builtin/cat-file.c | 28 +--- > 1 file changed, 17 insertions(+), 11 deletions(-) > > diff --git a/builtin/cat-file.c b/builtin/cat-file.c > index 3ed1d0be80..08dced2614 100644 > --- a/builtin/cat-file.c > +++ b/builtin/cat-file.c > @@ -338,11 +338,11 @@ static void print_object_or_die(struct batch_options > *opt, struct expand_data *d > } > } > > -static void batch_object_write(const char *obj_name, struct batch_options > *opt, > +static void batch_object_write(const char *obj_name, > +struct strbuf *scratch, > +struct batch_options *opt, > struct expand_data *data) > { > - struct strbuf buf = STRBUF_INIT; We could also avoid passing that buffer around by making it static. I shy away from adding static variables because the resulting code won't be thread-safe, but that fear might be irrational, especially with cat-file. 
> - > if (!data->skip_object_info && > oid_object_info_extended(the_repository, &data->oid, &data->info, >OBJECT_INFO_LOOKUP_REPLACE) < 0) { > @@ -352,10 +352,10 @@ static void batch_object_write(const char *obj_name, > struct batch_options *opt, > return; > } > > - strbuf_expand(&buf, opt->format, expand_format, data); > - strbuf_addch(&buf, '\n'); > - batch_write(opt, buf.buf, buf.len); > - strbuf_release(&buf); > + strbuf_reset(scratch); > + strbuf_expand(scratch, opt->format, expand_format, data); > + strbuf_addch(scratch, '\n'); > + batch_write(opt, scratch->buf, scratch->len); > > if (opt->print_contents) { > print_object_or_die(opt, data); > @@ -363,7 +363,9 @@ static void batch_object_write(const char *obj_name, > struct batch_options *opt, > } > } > > -static void batch_one_object(const char *obj_name, struct batch_options *opt, > +static void batch_one_object(const char *obj_name, > + struct strbuf *scratch, > + struct batch_options *opt, >struct expand_data *data) > { > struct object_context ctx; > @@ -405,20 +407,21 @@ static void batch_one_object(const char *obj_name, > struct batch_options *opt, > return; > } > > - batch_object_write(obj_name, opt, data); > + batch_object_write(obj_name, scratch, opt, data); > } > > struct object_cb_data { > struct batch_options *opt; > struct expand_data *expand; > struct oidset *seen; > + struct strbuf *scratch; > }; > > static int batch_object_cb(const struct object_id *oid, void *vdata) > { > struct object_cb_data *data = vdata; > oidcpy(&data->expand->oid, oid); > - batch_object_write(NULL, data->opt, data->expand); > + batch_object_write(NULL, data->scratch, data->opt, data->expand); > return 0; > } > > @@ -509,6 +512,7 @@ static int batch_objects(struct batch_options *opt) > > cb.opt = opt; > cb.expand = &data; > + cb.scratch = &output; > > if (opt->unordered) { > struct oidset seen = OIDSET_INIT; > @@ -531,6 +535,7 @@ static int batch_objects(struct batch_options *opt) > oid_array_clear(&sa); > } > > + 
strbuf_release(&output); > return 0; > } > > @@ -559,10 +564,11 @@ static int batch_objects(struct batch_options *opt) > data.rest = p; > } > > - batch_one_object(input.buf, opt, &data); > + batch_one_object(input.buf, &output, opt, &data); > } > > strbuf_release(&input); > + strbuf_release(&output); > warn_on_object_refname_ambiguity = save_warning; >
Re: Measuring Community Involvement (was Re: Contributor Summit planning)
On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote: > On 8/13/2018 5:54 PM, Jeff King wrote: > > So I try not to think too hard on metrics, and just use them to get a > > rough view on who is active. > > I've been very interested in measuring community involvement, with the > knowledge that any metric is flawed and we should not ever say "this metric > is how we measure the quality of a contributor". It can be helpful, though, > to track some metrics and their change over time. > > Here are a few measurements we can make: Thanks, it was nice to see a more comprehensive list in one spot. It would be neat to have a tool that presents all of these automatically, but I think the email ones are pretty tricky (most people don't have the whole list archive sitting around). > 2. Number of other commit tag-lines (Reviewed-By, Helped-By, Reported-By, > etc.). > > Using git repo: > > $ git log --since=2018-01-01 junio/next|grep by:|grep -v > Signed-off-by:|sort|uniq -c|sort -nr|head -n 20 At one point I sent a patch series that would let shortlog group by trailers. Nobody seemed all that interested and I didn't end up using it for its original purpose, so I didn't polish it further. But I'd be happy to re-submit it if you think it would be useful. The shell hackery here isn't too bad, but doing it internally is a little faster, a little more robust (less parsing), and lets you show more details about the commits themselves (e.g., who reviews whom). > 3. Number of threads started by user. You have "started" and "participated in". I guess one more would be "closed", as in "solved a bug", but that is quite hard to tell without looking at the content. Taking just the last person in a thread as the closer means that an OP saying "thanks!" wrecks it. And somebody who rants long enough that everybody else loses interest gets marked as a closer. ;) > If you have other ideas for fun measurements, then please let me know. 
I think I mentioned "surviving lines" elsewhere, which I do like this (and almost certainly stole from Junio a long time ago):

  # Obviously you can tweak this as you like, but the mass-imported bits
  # in compat and xdiff tend to skew the counts. It's possibly worth
  # counting language lines separately.
  git ls-files '*.c' '*.h' :^compat :^contrib :^xdiff |
  while read fn; do
	# eye candy
	echo >&2 "Blaming $fn..."
	# You can use more/fewer -C to dig more or less for code moves.
	# Possibly "-w" would help, though I doubt it shifts things more
	# than a few percent anyway.
	git blame -C --line-porcelain $fn
  done |
  perl -lne '/^author (.*)/ and print $1' |
  sort | uniq -c | sort -rn | head

The output right now is:

  35156 Junio C Hamano
  22207 Jeff King
  17466 Nguyễn Thái Ngọc Duy
  12005 Johannes Schindelin
  10259 Michael Haggerty
   9389 Linus Torvalds
   8318 Brandon Williams
   7776 Stefan Beller
   5947 Christian Couder
   4935 René Scharfe

which seems reasonable.

-Peff
Re: [PATCH 3/4] cat-file: use a single strbuf for all output
On Tue, Aug 14, 2018 at 09:30:57PM +0200, René Scharfe wrote: > > -static void batch_object_write(const char *obj_name, struct batch_options > > *opt, > > +static void batch_object_write(const char *obj_name, > > + struct strbuf *scratch, > > + struct batch_options *opt, > >struct expand_data *data) > > { > > - struct strbuf buf = STRBUF_INIT; > > We could also avoid passing that buffer around by making it static. I > shy away from adding static variables because the resulting code won't > be thread-safe, but that fear might be irrational, especially with > cat-file. True, I didn't even think of that after your original got me in the mindset of passing the buffer down. It's not too bad to do it this way, and I agree with you that we are better avoiding static variables if we can. Five years ago I might have said the opposite, but we've cleaned up a lot of confusing hidden-static bits in that time. Let's not go in the opposite direction. :) -Peff
Re: Measuring Community Involvement (was Re: Contributor Summit planning)
On Tue, Aug 14, 2018 at 12:36 PM Jeff King wrote: > Thanks, it was nice to see a more comprehensive list in one spot. > > It would be neat to have a tool that presents all of these > automatically, but I think the email ones are pretty tricky (most people > don't have the whole list archive sitting around). With the advent of public inbox, this is easy to obtain? > > > 2. Number of other commit tag-lines (Reviewed-By, Helped-By, Reported-By, > > etc.). > > > > Using git repo: > > > > $ git log --since=2018-01-01 junio/next|grep by:|grep -v > > Signed-off-by:|sort|uniq -c|sort -nr|head -n 20 > > At one point I sent a patch series that would let shortlog group by > trailers. Nobody seemed all that interested and I didn't end up using it > for its original purpose, so I didn't polish it further. But I'd be > happy to re-submit it if you think it would be useful. I would think it is useful. Didn't Linus also ask for a related thing? https://public-inbox.org/git/CA+55aFzWkE43rSm-TJNKkHq4F3eOiGR0-Bo9V1=a1s=vq0k...@mail.gmail.com/
Re: [PATCH v2] send-email: add an option to impose delay sent E-Mails
Eric Wong writes: >> Some popular E-Mail clients completely ignore the "Date" header, which >> format-patch is careful to set such that the patches will be displayed >> in order, and instead sort by the time the E-mail was received. It is send-email that carefully shows monotonically increasing timestamps so that the sender's datestamp can be used for sorting by the recipient, not format-patch, which records author-date, primarily meant for local replaying, in the generated messages but discarded by send-email. > Disconnecting during the delay might be more sympathetic to > existing mail servers (which aren't C10K-optimized). If the > client sleeps, the server may disconnect the client anyways > to save resources. > > But maybe --send-delay can be a shortcut for > --relogin-delay and --batch-size=1 Both good points to point at a simpler and better solution, I guess.
Re: [PATCH] submodule: add more exhaustive up-path testing
Stefan Beller writes: > Thanks for this patch! > Stefan Thanks, I'd take it as your Acked-by: (please holler if it isn't before the patch hits 'next').
Re: [PATCH v4 2/5] unpack-trees: add performance tracing
On Tue, Aug 14, 2018 at 08:51:41PM +0200, Duy Nguyen wrote: > > But AFAICT these series are all about putting the sampling points into the > > code base, so formatting would be orthogonal to it? > > It's not just sampling points. There's things like index id being > shown in the message for example. I prefer to keep free style format > to help me read. There's also things like indentation I do here to > help me read. Granted you could do all that with scripts and stuff, > but will we pass around in mail dumps of json messages to be decoded > locally? I think you could have both forms using the same entry points sprinkled through the code. At GitHub we have a similar telemetry-ish thing, where we collect some data points and then the resulting JSON is stored for every operation (for a few weeks for read ops, and indefinitely attached to every ref write). And I've found that the storage and the trace-style "just show a human-readable message to stderr" interface complement each other in both directions: - you can output a human readable message that is sent immediately to the trace mechanism but _also_ becomes part of the telemetry. E.g., imagine that one item in the json blob is "this is the last message from GIT_TRACE_FOO". Now you can push tracing messages into whatever plan you're using to store SLOG. We do this less with TRACE, and much more with error() and die() messages. - when a structured telemetry item is updated, we can still output a human-readable trace message with just that item. E.g., with: trace_performance(n, "foo"); we could either store a json key (perf.foo=n) or output a nicely formatted string like we do now, depending on what the user has configured (or even both, of course). It helps if the sampling points give enough information to cover both cases (as in the trace_performance example), but you can generally shoe-horn unstructured data into the structured log, and pretty-print structured data. -Peff
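The stderr half of that complementarity can be seen with git's existing performance trace today; a small sketch in a throwaway repository (the exact message text varies by version):

```shell
# GIT_TRACE_PERFORMANCE=2 routes the performance trace to stderr (fd 2)
tmp=$(mktemp -d)
git -C "$tmp" init -q
perf=$(GIT_TRACE_PERFORMANCE=2 git -C "$tmp" status 2>&1 >/dev/null)
echo "$perf"   # e.g. performance: 0.000912 s: git command: ...
rm -rf "$tmp"
```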
Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
On Tue, Aug 14, 2018 at 11:18 AM Jonathan Tan wrote: > > > @@ -743,6 +743,9 @@ specification contained in <path>. > > A debug option to help with future "partial clone" development. > > This option specifies how missing objects are handled. > > + > > +The form '--filter=tree:<depth>' omits all blobs and trees deeper than > > +<depth> from the root tree. Currently, only <depth>=0 is supported. > > ++ > > The form '--missing=error' requests that rev-list stop with an error if > > a missing object is encountered. This is the default action. > > + > > The "--filter" documentation should go with the other "--filter" > information, not right after --missing. Fixed. My problem was that I didn't know what the + meant - I guess it means that the paragraph before and after are in the same section? > > > +test_expect_success 'setup for tests of tree:0' ' > > + mkdir r1/subtree && > > + echo "This is a file in a subtree" > r1/subtree/file && > > + git -C r1 add subtree/file && > > + git -C r1 commit -m subtree > > +' > > Style: no space after > Fixed. > > > +test_expect_success 'grab tree directly when using tree:0' ' > > + # We should get the tree specified directly but not its blobs or subtrees. > > + git -C r1 pack-objects --rev --stdout --filter=tree:0 >commitsonly.pack <<-EOF && > > + HEAD: > > + EOF > > + git -C r1 index-pack ../commitsonly.pack && > > + git -C r1 verify-pack -v ../commitsonly.pack >objs && > > + grep -E "tree|blob" objs >trees_and_blobs && > > + test_line_count = 1 trees_and_blobs > > +' > > Can we also verify that the SHA-1 in trees_and_blobs is what we > expected? Done - Now I'm comparing to the output of `git rev-parse HEAD:` and I don't need the separate line count check either. > > > +test_expect_success 'use fsck before and after manually fetching a missing subtree' ' > > + # push new commit so server has a subtree > > + mkdir src/dir && > > + echo "in dir" > src/dir/file.txt && > > No space after > Fixed.
> > > + git -C src add dir/file.txt && > > + git -C src commit -m "file in dir" && > > + git -C src push -u srv master && > > + SUBTREE=$(git -C src rev-parse HEAD:dir) && > > + > > + rm -rf dst && > > + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst && > > + git -C dst fsck && > > + git -C dst cat-file -p $SUBTREE >tree_contents 2>err && > > + git -C dst fsck > > +' > > If you don't need to redirect to err, don't do so. > > Before the cat-file, also verify that the tree is missing, most likely > through a "git rev-list" with "--missing=print". That won't work though - the subtree's hash is not known because its parent tree is not there. I've merged the three tests in this file, and as a result am now using the check which makes sure the object types are only "commit" > > And I would grep on the tree_contents to ensure that the filename > ("file.txt") is there, so that we know that we got the correct tree. Done. > > > +test_expect_success 'can use tree:0 to filter partial clone' ' > > + rm -rf dst && > > + git clone --no-checkout --filter=tree:0 "file://$(pwd)/srv.bare" dst && > > + git -C dst rev-list master --missing=allow-any --objects >fetched_objects && > > + cat fetched_objects \ > > + | awk -f print_1.awk \ > > + | xargs -n1 git -C dst cat-file -t >fetched_types && > > + sort fetched_types -u >unique_types.observed && > > + echo commit > unique_types.expected && > > + test_cmp unique_types.observed unique_types.expected > > +' > > + > > +test_expect_success 'auto-fetching of trees with --missing=error' ' > > + git -C dst rev-list master --missing=error --objects >fetched_objects && > > + cat fetched_objects \ > > + | awk -f print_1.awk \ > > + | xargs -n1 git -C dst cat-file -t >fetched_types && > > + sort fetched_types -u >unique_types.observed && > > + printf "blob\ncommit\ntree\n" >unique_types.expected && > > + test_cmp unique_types.observed unique_types.expected > > +' > > These two tests seem redundant with the
'use fsck before and after > manually fetching a missing subtree' test (after the latter is > appropriately renamed). I think we only need to test this sequence once, > which can be placed in one or spread over multiple tests: > > 1. partial clone with --filter=tree:0 > 2. fsck works > 3. verify that trees are indeed missing > 4. autofetch a tree > 5. fsck still works Done - that's much nicer. Thanks! Here is an interdiff from v4 of the patch: diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 9e351ec2a..0b5f77ad3 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -731,6 +731,9 @@ the requested refs. + The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout specification contained in <path>. ++ +The form '--filter=tree:<depth>' omits all blobs and trees deeper than +<depth> from the root tree. Currently, only <depth>=0 is supported.
Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
On Tue, Aug 14, 2018 at 10:28:13AM -0700, Matthew DeVore wrote: > The name "tree:0" allows later filtering based on depth, i.e. "tree:1" > would filter out all but the root tree and blobs. In order to avoid > confusion between 0 and capital O, the documentation was worded in a > somewhat round-about way that also hints at this future improvement to > the feature. I'm OK with this as a name, since we're explicitly not supporting deeper depths. But I'd note that "depth" is actually a tricky characteristic, as it's not a property of the object itself, but rather who refers to it. So: - it's expensive to compute, because you have to actually walk all of the possible commits and trees that could refer to it. This prohibits a lot of other optimizations like reachability bitmaps (though with some complexity you could cache the depths, too). - you have to define it as something like "the minimum depth at which this object is found", since there may be multiple depths I think you can read that second definition between the lines of: > +The form '--filter=tree:<depth>' omits all blobs and trees deeper than > +<depth> from the root tree. Currently, only <depth>=0 is supported. But I wonder if we should be more precise. It doesn't matter now, but it may help set expectations if the feature does come later. -Peff
Re: Measuring Community Involvement (was Re: Contributor Summit planning)
On Tue, Aug 14, 2018 at 12:47:59PM -0700, Stefan Beller wrote: > On Tue, Aug 14, 2018 at 12:36 PM Jeff King wrote: > > > Thanks, it was nice to see a more comprehensive list in one spot. > > > > It would be neat to have a tool that presents all of these > > automatically, but I think the email ones are pretty tricky (most people > > don't have the whole list archive sitting around). > > With the advent of public inbox, this is easy to obtain? For our project, yes. But I was thinking of a tool that could be used for other projects, too. > > At one point I sent a patch series that would let shortlog group by > > trailers. Nobody seemed all that interested and I didn't end up using it > > for its original purpose, so I didn't polish it further. But I'd be > > happy to re-submit it if you think it would be useful. > > I would think it is useful. Didn't Linus also ask for a related thing? > https://public-inbox.org/git/CA+55aFzWkE43rSm-TJNKkHq4F3eOiGR0-Bo9V1=a1s=vq0k...@mail.gmail.com/ He wanted grouping by committer, which we ended up adding as a separate feature. I think there's some discussion of the trailer thing in that thread. -Peff
Syncing HEAD
Hi, When cloning with --mirror, the clone gets its HEAD initialized with the value HEAD has in its origin remote. After that if HEAD changes in origin there is no simple way to sync HEAD at the same time as the refs are synced. It looks like the simplest way to sync HEAD is: 1) git remote show origin 2) parse "HEAD branch: XXX" from the output of the above command 3) git symbolic-ref HEAD refs/heads/XXX It looks like it would be quite easy to add an option to `fetch` to sync HEAD at the same time as regular refs are synced because every fetch from an origin that uses a recent Git contains something like: 19:55:39.304976 pkt-line.c:80 packet: git< HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed no-done symref=HEAD:refs/heads/test-1 agent=git/2.18.0 which in this example shows that HEAD is a symref to refs/heads/test-1 in origin. Is there a reason why no such option already exists? Would it make sense to add one? Is there any reason why it's not a good idea? Or am I missing something? I am asking because GitLab uses HEAD in the bare repos it manages to store the default branch against which the Merge Requests (same thing as Pull Requests on GitHub) are created. So when people want to keep 2 GitLab hosted repos in sync, GitLab needs to sync HEADs too, not just the refs. I think this could be useful to other setups than GitLab though. Thanks, Christian.
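The three steps above can be exercised end to end against throwaway repositories; a runnable sketch (and parsing the localized `remote show` output is exactly the fragility a `fetch` option would remove):

```shell
up=$(mktemp -d); mirror=$(mktemp -d)
git -C "$up" init -q
git -C "$up" -c user.name=A -c user.email=a@example.com \
	commit -q --allow-empty -m one
git clone -q --mirror "$up" "$mirror"
# origin's default branch moves after the clone was made
git -C "$up" checkout -q -b next
git -C "$up" -c user.name=A -c user.email=a@example.com \
	commit -q --allow-empty -m two
git -C "$mirror" fetch -q origin
# steps 1-3: ask the remote, parse the answer, re-point HEAD
branch=$(LC_ALL=C git -C "$mirror" remote show origin |
	sed -n 's/^ *HEAD branch: *//p')
git -C "$mirror" symbolic-ref HEAD "refs/heads/$branch"
```

After this, `git -C "$mirror" symbolic-ref HEAD` reports the remote's new default branch.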
Re: [PATCH v4 2/5] unpack-trees: add performance tracing
On 8/14/2018 2:44 PM, Stefan Beller wrote: On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen wrote: On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler wrote: I'm looking at adding code to my SLOG (better name suggestions welcome) patch series to eventually replace the existing git_trace facility. Complement maybe. Replace, please no. I'd rather not stare at json messages. From the sidelines: We'd only need one logging infrastructure in place, as the formatting would be done as a later step? For local operations we'd certainly find better formatting than json, and we figured that we might end up desiring ProtocolBuffers[1] instead of JSon, so if it would be easy to change the output of the structured logging easily that would be great. But AFAICT these series are all about putting the sampling points into the code base, so formatting would be orthogonal to it? Stefan [1] https://developers.google.com/protocol-buffers/ Last time I checked, protocol-buffers has a C++ binding but not a C binding. I've not had a chance to use pbuffers, so I have to ask what advantages would they have over JSON or some other similar self-describing format? And/or would it be possible for you to tail the json log file and convert it to whatever format you preferred? It seems like the important thing is to capture structured data (whatever the format) to disk first. Jeff
Re: [PATCH v3 3/7] t7411: be nicer to future tests and really clean things up
Antonio Ospite writes: > test_expect_success 'error message contains blob reference' ' > + # Remove the error introduced in the previous test. > + # It is not needed in the following tests. > + test_when_finished "git -C super reset --hard HEAD^" && > (cd super && > sha1=$(git rev-parse HEAD) && > test-tool submodule-config \ Antonio Ospite writes: > Tests 5 and 8 in t/t7411-submodule-config.sh add two commits with > invalid lines in .gitmodules but then only the second commit is removed. > > This may affect future subsequent tests if they assume that the > .gitmodules file has no errors. > > Remove both the commits as soon as they are not needed anymore. > > The error introduced in test 5 is also required by test 6, so the two > commits from above are removed respectively in tests 6 and 8. > > Signed-off-by: Antonio Ospite > --- > t/t7411-submodule-config.sh | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh > index 0bde5850ac..c6b6cf6fae 100755 > --- a/t/t7411-submodule-config.sh > +++ b/t/t7411-submodule-config.sh > @@ -98,6 +98,9 @@ test_expect_success 'error in one submodule config lets > continue' ' > ' > > test_expect_success 'error message contains blob reference' ' > + # Remove the error introduced in the previous test. > + # It is not needed in the following tests. > + test_when_finished "git -C super reset --hard HEAD^" && Hmm, that is ugly. Depending on where in the subshell the previous test failed, you'd still be taking us to an unexpected place. Imagine if "git commit -m 'add error'" failed, for example, in the test before this one. I am wondering if the proper fix is to merge the previous one and this one into a single test. The combined test would - remember where the HEAD in super is and arrange to come back to it when test is done - break .gitmodules and commit it - run test-tool and check its output - also check its error output in a single test_expect_success. 
> @@ -123,6 +126,7 @@ test_expect_success 'using different treeishs works' ' > ' > > test_expect_success 'error in history in fetchrecursesubmodule lets > continue' ' > + test_when_finished "git -C super reset --hard HEAD^" && > (cd super && > git config -f .gitmodules \ > submodule.submodule.fetchrecursesubmodules blabla && > @@ -134,8 +138,7 @@ test_expect_success 'error in history in > fetchrecursesubmodule lets continue' ' > HEAD b \ > HEAD submodule \ > >actual && > - test_cmp expect_error actual && > - git reset --hard HEAD^ > + test_cmp expect_error actual > ) > ' If we want to be more robust, you'd probably need to find a better anchoring point than HEAD, which can be pointing different commit depending on where in the subshell the process is hit with ^C, i.e. ORIG=$(git -C super rev-parse HEAD) && test_when_finished "git -C super reset --hard $ORIG" && ( cd super && ... The patch is still an improvement compared to the current code, where a broken test-tool that does not produce expected output in the file 'actual' is guaranteed to leave us at a commit that we do not expect to be at, but not entirely satisfactory.
Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
> - grep -E "tree|blob" objs >trees_and_blobs && > - test_line_count = 1 trees_and_blobs > + grep -E "tree|blob" objs \ > + | awk -f print_1.awk >trees_and_blobs && > + git -C r1 rev-parse HEAD: >expected && > + test_cmp trees_and_blobs expected Indent "| awk" (and similar lines in this patch) - although I guess it is likely that you actually have it indented, and your e-mail client modified the whitespace so that it looks like there is no indent. Other than that, this interdiff looks good to me. Thanks.
Re: [PATCH v3 7/7] submodule: support reading .gitmodules even when it's not checked out
Antonio Ospite writes: > /* Equivalent to ACTION_SET in builtin/config.c */ > - if (argc == 3) > + if (argc == 3) { > + struct object_id oid; > + > + /* > + * If the .gitmodules file is not in the working tree but it > + * is in the current branch, stop, as writing new values (and > + * staging them) would blindly overwrite ALL the old content. Hmph, "the file is missing" certainly is a condition we would want to notice, but wouldn't we in general want to prevent us from overwriting any local modification, where "missing" is merely a very special case of local modification? I am wondering if we would want to stop if .gitmodules file exists both in the working tree and in the index, and the contents of them differ, or something like that. > diff --git a/git-submodule.sh b/git-submodule.sh > index ff258e2e8c..b1cb187227 100755 > --- a/git-submodule.sh > +++ b/git-submodule.sh > @@ -159,6 +159,13 @@ cmd_add() > shift > done > > + # For more details about this check, see > + # builtin/submodule--helper.c::module_config() > + if test ! -e .gitmodules && git cat-file -e HEAD:.gitmodules > /dev/null 2>&1 No SP between redirection '>' and its target '/dev/null'. More importantly, I think it is better to add a submodule--helper subcommand that exposes the check in question, as the code is already written ;-) That approach will guarantee that the logic and the message stay the same between here and in the C code. Then you do not even need this two-line comment.
> + then > + die "$(eval_gettext "please make sure that the .gitmodules > file in the current branch is checked out")" > + fi > + > diff --git a/submodule-config.c b/submodule-config.c > index b7ef055c63..088dabb56f 100644 > --- a/submodule-config.c > +++ b/submodule-config.c > @@ -1,4 +1,5 @@ > #include "cache.h" > +#include "dir.h" > #include "repository.h" > #include "config.h" > #include "submodule-config.h" > @@ -603,8 +604,19 @@ static void submodule_cache_check_init(struct repository > *repo) > static void config_from_gitmodules(config_fn_t fn, struct repository *repo, > void *data) > { > if (repo->worktree) { > - char *file = repo_worktree_path(repo, GITMODULES_FILE); > - git_config_from_file(fn, file, data); > + struct git_config_source config_source = { 0 }; > + const struct config_options opts = { 0 }; > + struct object_id oid; > + char *file; > + > + file = repo_worktree_path(repo, GITMODULES_FILE); > + if (file_exists(file)) > + config_source.file = file; > + else if (get_oid(GITMODULES_HEAD, &oid) >= 0) > + config_source.blob = GITMODULES_HEAD; What is the reason why we fall back directly to HEAD when working tree file does not exist? I thought that our usual fallback was to the version in the index for other things like .gitignore/attribute and this codepath look like an oddball. Are you trying to handle the case where we are in a bare repository without any file checked out (and there is not even the index)? > diff --git a/t/t7416-submodule-sparse-gitmodules.sh > b/t/t7416-submodule-sparse-gitmodules.sh > new file mode 100755 > index 00..5341e9b012 > --- /dev/null > +++ b/t/t7416-submodule-sparse-gitmodules.sh > @@ -0,0 +1,90 @@ > +#!/bin/sh > +# > +# Copyright (C) 2018 Antonio Ospite > +# > + > +test_description='Test reading/writing .gitmodules when not in the working > tree > + > +This test verifies that, when .gitmodules is in the current branch but is not > +in the working tree reading from it still works but writing to it does not. 
> + > +The test setup uses a sparse checkout, however the same scenario can be set > up > +also by committing .gitmodules and then just removing it from the filesystem. > +' > + > +. ./test-lib.sh > + > +test_expect_success 'sparse checkout setup which hides .gitmodules' ' > + echo file > file && No SP between redirection '>' and its target 'file'. > + git add file && > + test_tick && > + git commit -m upstream && > + git clone . super && > + git clone super submodule && > + git clone super new_submodule && > + (cd super && > + git submodule add ../submodule && > + test_tick && > + git commit -m submodule && > + cat >.git/info/sparse-checkout <<\EOF && > +/* > +!/.gitmodules > +EOF You can use <<-\EOF and indent the body of the here-doc, which makes the result easier to read, i.e. cat >target <<-\EOF && line 1 line 2 EOF > + git config core.sparsecheckout true && > + git read-tree -m -u HEAD && That's old fashioned---I am curious if this has to be one-way merge or can just be a usual "git checkout" (I am merely curious;
Re: Measuring Community Involvement (was Re: Contributor Summit planning)
Jeff King writes: > On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote: > >> On 8/13/2018 5:54 PM, Jeff King wrote: >> > So I try not to think too hard on metrics, and just use them to get a >> > rough view on who is active. >> >> I've been very interested in measuring community involvement, with the >> knowledge that any metric is flawed and we should not ever say "this metric >> is how we measure the quality of a contributor". It can be helpful, though, >> to track some metrics and their change over time. >> >> Here are a few measurements we can make: > > Thanks, it was nice to see a more comprehensive list in one spot. > > It would be neat to have a tool that presents all of these > automatically, but I think the email ones are pretty tricky (most people > don't have the whole list archive sitting around). I do not think it covered e-mail at all, but there was git stats project several years ago (perhaps part of GSoC IIRC). > I think I mentioned "surviving lines" elsewhere, which I do like this > (and almost certainly stole from Junio a long time ago): Yeah, I recall that one as part of counting how many of 1244 lines Linus originally wrote still were in our codebase at around v1.6.0 timeframe (the answer was ~220 IIRC) ;-)
Re: [PATCH v4 2/5] unpack-trees: add performance tracing
Duy Nguyen writes: > These trace messages are made for human consumption. Granted > occasionally we need some processing but I find one liners mostly > suffice. Now we turn these into something made for machines, turning > people to second citizens. I've read these messages reformatted for > human, it's usually too verbose even if it's reformatted. I actually actively hate the aspect of the slog thing that exposes the fact that it wants to take and show JSON too much in its API, but if you look at these "jw_object_*()" thing as _only_ filling parameters to be emitted, there is no reason to think we cannot enhance/extend slog_emit_*() thing to take a format string (perhaps inside the jw structure) so that the formatter does not have to generate JSON at all. Envisioning that kind of future, json_writer is a misnomer that too narrowly defines what it is---it is merely a generic data container that the codepath being traced can use to communicate what needs to be logged to the outside world. slog_emit* can (and when enhanced, should) be capable of paying attention to an external input (e.g. environment variable) to switch the output format, and JSON could be just one of the choices. > It's not just sampling points. There's things like index id being > shown in the message for example. I prefer to keep free style format > to help me read. There's also things like indentation I do here to > help me read. Yup, I do not think that contradicts with the approach to have a single unified "data collection" API; you should also be able to specify how that collection of data is to be presented in the trace messages meant for humans, which would be discarded when emitting json but would be used when showing human-readble trace, no?
Re: [PATCH v4 6/6] list-objects-filter: implement filter tree:0
Jonathan Tan writes: >> - grep -E "tree|blob" objs >trees_and_blobs && >> - test_line_count = 1 trees_and_blobs >> + grep -E "tree|blob" objs \ >> | awk -f print_1.awk >trees_and_blobs && >> + git -C r1 rev-parse HEAD: >expected && >> + test_cmp trees_and_blobs expected > Indent "| awk" (and similar lines in this patch) - although I guess it > is likely that you actually have it indented, and your e-mail client > modified the whitespace so that it looks like there is no indent. No, wrap lines like this:

	command1 arg1 arg2 |
	command2 arg1 arg2 &&

That way, you do not need a backslash to continue the line. Also think twice when you see yourself piping output from "grep" to more powerful tools like "perl", "awk" or "sed". Often you can lose the upstream "grep".
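Both suggestions can be seen side by side on a stand-in `objs` file with fake hashes (the real file comes from `git verify-pack -v` in the quoted test):

```shell
# stand-in for verify-pack output: <sha1> <type> <size> <packed> <offset>
cat >objs <<\EOF
1111111111111111111111111111111111111111 commit 200 150 12
2222222222222222222222222222222222222222 tree 30 40 212
3333333333333333333333333333333333333333 blob 10 20 252
EOF
# continuation without backslash: the trailing pipe keeps the line open
grep -E "tree|blob" objs |
	awk '{ print $1 }' >with_grep
# the stronger fix: lose the grep entirely, awk matches on its own
awk '/tree|blob/ { print $1 }' objs >trees_and_blobs
cmp with_grep trees_and_blobs
```

Both produce the same two hashes; the second form does it in one process.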
Re: Syncing HEAD
On Tue, Aug 14, 2018 at 1:09 PM Christian Couder wrote: > > Hi, > > When cloning with --mirror, the clone gets its HEAD initialized with > the value HEAD has in its origin remote. After that if HEAD changes in > origin there is no simple way to sync HEAD at the same time as the > refs are synced. > > It looks like the simplest way to sync HEAD is: > > 1) git remote show origin > 2) parse "HEAD branch: XXX" from the output of the above command > 3) git symbolic-ref HEAD refs/heads/XXX > > It looks like it would be quite easy to add an option to `fetch` to > sync HEAD at the same time as regular refs are synced because every > fetch from an origin that uses a recent Git contains something like: > > 19:55:39.304976 pkt-line.c:80 packet: git< > HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow > deepen-since deepen-not deepen-relative no-progress include-tag > multi_ack_detailed no-done symref=HEAD:refs/heads/test-1 > agent=git/2.18.0 > > which in this example shows that HEAD is a symref to refs/heads/test-1 > in origin. > > Is there a reason why no such option already exists? Would it make > sense to add one? Is there any reason why it's not a good idea? Or am > I missing something? I think it is a great idea to add that. IIRC there was some talk when designing protocol v2 on how fetching of symrefs could be added later on in the protocol, which is why I cc'd Brandon who did the work there. > I am asking because GitLab uses HEAD in the bare repos it manages to > store the default branch against which the Merge Requests (same thing > as Pull Requests on GitHub) are created. > > So when people want to keep 2 GitLab hosted repos in sync, GitLab > needs to sync HEADs too, not just the refs. > > I think this could be useful to other setups than GitLab though. As said, I can see how that is useful; I recently came across some HEAD bug related to submodules, and there we'd also have the application. git clone --recurse-submodules file://...
might clone the submodules that are in detached HEAD, which is totally not a long term viable good HEAD, so a subsequent fetch might want to change the detached HEAD in submodules or re-affix it to branches. Unrelated/extended: I think it would be cool to mirror a repository even more, e.g. it would be cool to be able to fetch (if configured as allowed) the remote reflog, (not to be confused with you local reflog of the remote). I think that work would be enabled once reftables are available, which you have an eye on?
Re: Contributor Summit planning
Jeff King writes: > One problem there is that the prefixes are ambiguous (e.g., Jacob Keller > shares with me, and I think at least one other over the years). You > could look at the author of the tip commit, but that's not always right > (and in fact, counting just merged topics misses bug-fixes that get > applied directly on top of other people's topics). Yes, a fix by somebody else to a bug that was recently introduced is safest to apply to the original topic and merge down; that way makes it more difficult to merge the original topic to older maintenance track(s) without the fix by mistake. So a "topic" with commits from multiple people is not all that unusual. Thanks.
Re: [PATCH 0/7] Resend of sb/submodule-update-in-c
Stefan Beller writes: > Thanks Brandon for pointing out to use repo_git_path instead of > manually constructing the path. > > That is the only change in this resend. Rcpt. Hopefully this is now ready for 'next'?
Re: [PATCH v2] send-email: add an option to impose delay sent E-Mails
On Tue, Aug 14 2018, Eric Wong wrote: > Ævar Arnfjörð Bjarmason wrote: >> Add a --send-delay option with a corresponding sendemail.smtpSendDelay >> configuration variable. When set to e.g. 2, this causes send-email to >> sleep 2 seconds before sending the next E-Mail. We'll only sleep >> between sends, not before the first send, or after the last. >> >> This option has two uses. Firstly, to be able to Ctrl+C a long send >> with "all" if you have a change of heart. Secondly, as a hack in some >> mail setups to, with a sufficiently high delay, force the receiving >> client to sort the E-Mails correctly. >> >> Some popular E-Mail clients completely ignore the "Date" header, which >> format-patch is careful to set such that the patches will be displayed >> in order, and instead sort by the time the E-mail was received. >> >> Google's GMail is a good example of such a client. It ostensibly sorts >> by some approximation of received time (although not by any "Received" >> header). It's more usual than not to see patches showing out of order >> in GMail. To take a few examples of orders seen on patches on the Git >> mailing list: >> >> 1 -> 3 -> 4 -> 2 -> 8 -> 7 (completion by Nguyễn Thái Ngọc Duy) >> 2 -> 0 -> 1 -> 3 (pack search by Derrick Stolee) >> 3 -> 2 -> 1 (fast-import by Jameson Miller) >> 2 -> 3 -> 1 -> 5 -> 4 -> 6 (diff-highlight by Jeff King) >> >> The reason to add the new "X-Mailer-Send-Delay" header is to make it >> easy to tell what the imposed delay was, if any. This allows for >> gathering some data on how the transfer of E-Mails with & without this >> option behaves. This may not be workable without really long delays, >> see [1] and [2]. > > Aside from the new header, I think this is better implemented > using the existing $relogin_delay and $batch_size=1. > > Disconnecting during the delay might be more sympathetic to > existing mail servers (which aren't C10K-optimized). Yeah that's a good point, maybe we're being wasteful on remote resources here. 
> If the client sleeps, the server may disconnect the client anyways to > save resources. Seems like something we'd need to deal with anyway, do we? >> @@ -1741,6 +1747,10 @@ sub process_file { >> $message, $xfer_encoding, $target_xfer_encoding); >> push @xh, "Content-Transfer-Encoding: $xfer_encoding"; >> unshift @xh, 'MIME-Version: 1.0' unless $has_mime_version; >> +if ($send_delay && $i > 0) { >> +push @xh, "X-Mailer-Send-Delay: $send_delay"; >> +sleep $send_delay; >> +} > > We can add this header for relogin_delay + batch_size > > But maybe --send-delay can be a shortcut for > --relogin-delay and --batch-size=1 I need to enter a password when sending a batch with my SMTP server now, once. With relogin I'd need to enter this N times unless I use whatever auth save facility there is in git-send-email (which I don't use now). I don't think it makes sense to conflate these two modes.
Re: [PATCH] submodule: add more exhaustive up-path testing
On Tue, Aug 14 2018, Stefan Beller wrote: > On Tue, Aug 14, 2018 at 11:59 AM Ævar Arnfjörð Bjarmason > wrote: >> >> The tests added in 63e95beb08 ("submodule: port resolve_relative_url >> from shell to C", 2016-04-15) didn't do a good job of testing various >> up-path invocations where the up-path would bring us beyond even the >> URL in question without emitting an error. >> >> These results look nonsensical, but it's worth exhaustively testing >> them before fixing any of this code, so we can see which of these >> cases were changed. > > Yeah. Please look at the comment in builtin/submodule--helper.c > in that commit, where I described my expectations. > > I should have put them into tests instead with the expectations > spelled out there. I'll check that out thanks. I saw that comment, but have been skimming most of this code... > Thanks for this patch! > Stefan > >> >> Signed-off-by: Ævar Arnfjörð Bjarmason >> --- > > >> So I think these tests are worthwhile in themselves, > > The reason I put it in the comment instead of tests was the > ease of spelling out both the status quo and expectations. > >> but would like >> some advice on how to proceed with that from someone more familiar >> with submodules. > > So ideally we'd also error out as soon as the host name is touched? Do we have some utility function that'll take whatever we have in remote.<name>.url and spew out the username / host / path? We must, since the clone protocol needs it, but I haven't found it.
Re: Syncing HEAD
On Tue, Aug 14, 2018 at 10:09:37PM +0200, Christian Couder wrote:

> When cloning with --mirror, the clone gets its HEAD initialized with
> the value HEAD has in its origin remote. After that if HEAD changes in
> origin there is no simple way to sync HEAD at the same time as the
> refs are synced.
>
> It looks like the simplest way to sync HEAD is:
>
> 1) git remote show origin
> 2) parse "HEAD branch: XXX" from the output of the above command
> 3) git symbolic-ref HEAD refs/heads/XXX

How about:

  git remote set-head origin -a

?

> It looks like it would be quite easy to add an option to `fetch` to
> sync HEAD at the same time as regular refs are synced because every
> fetch from an origin that uses a recent Git contains something like:

I think the "remote set-head" option is not very discoverable, since
people are used to working with "fetch", making it the natural place
to look. Just like we ported "remote update" over to "fetch --all", I
think it would be sensible to have "fetch --update-head" or similar.

One tricky thing is that the name "refs/remotes/<remote>/HEAD" is only
special by convention, and that convention is known on the writing
side only by git-clone and git-remote. So obviously:

  git fetch --update-head https://example.com/

is nonsense. We don't even have a ref. What should:

  git config remote.origin.fetch refs/heads/*:refs/remotes/foo/*
  git fetch --update-head origin

do? Should it update based on the remote name, or based on the
refspec? What happens if there are several refspecs? Etc.

99% of the time those questions won't come up. But we should design so
that we do the obvious thing in those 99%, and something sane in the
other 1%.

-Peff
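For a `--mirror` clone specifically, the three manual steps can be collapsed by asking the remote for its symref directly with `ls-remote --symref` and feeding the answer to `symbolic-ref`. A sketch under the assumption that a reasonably recent `git` (2.8+ for `--symref`) and a writable temp dir are available; the repo names and the `head` variable are made up for the demo:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# upstream repo whose default branch will change after we mirror it
git init -q upstream
git -C upstream -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m init
git clone -q --mirror "$tmp/upstream" mirror.git

# upstream moves its default branch to "other"
git -C upstream branch -q other
git -C upstream symbolic-ref HEAD refs/heads/other

# in the mirror: ask origin which ref HEAD points at, then re-point HEAD
head=$(git -C mirror.git ls-remote --symref origin HEAD |
       awk '/^ref:/ { print $2 }')
git -C mirror.git symbolic-ref HEAD "$head"

git -C mirror.git symbolic-ref HEAD   # now refs/heads/other
```

This is the manual plumbing a hypothetical `fetch --update-head` would fold into the fetch itself, since the symref information is already on the wire.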
Re: [PATCH 2/2] submodule: munge paths to submodule git directories
On Tue, Aug 14, 2018 at 11:57 AM Jonathan Nieder wrote:
>
> Hi,
>
> Brandon Williams wrote:
> > On 08/09, Jeff King wrote:
>
> >> One interesting thing about url-encoding is that it's not one-to-one.
> >> This case could also be %2F, which is a different file (on a
> >> case-sensitive filesystem). I think "%20" and "+" are similarly
> >> interchangeable.
> >>
> >> If we were decoding the filenames, that's fine. The round-trip is
> >> lossless.
> >>
> >> But that's not quite how the new code behaves. We encode the input and
> >> then check to see if it matches an encoding we previously performed. So
> >> if our urlencode routines ever change, this will subtly break.
> >>
> >> I don't know how much it's worth caring about. We're not that likely to
> >> change the routines ourself (though certainly a third-party
> >> implementation would need to know our exact url-encoding decisions).
> >
> > This is exactly the reason why I wanted to get some opinions on what the
> > best thing to do here would be. I _think_ the best thing would probably
> > be to write a specific routine to do the conversion, and it wouldn't
> > even have to be all that complex. Basically I'm just interested in
> > converting '/' characters so that things no longer behave like
> > nested directories.
>
> First of all, I think the behavior with this patch is already much
> better than the previous status quo. I'm using the patch now and am
> very happy with it.
>
> Second, what if we store the pathname in config? We already store the
> URL there:
>
>   [submodule "plugins/hooks"]
>     url = https://gerrit.googlesource.com/plugins/hooks
>
> So we could (as a followup patch) do something like
>
>   [submodule "plugins/hooks"]
>     url = https://gerrit.googlesource.com/plugins/hooks
>     gitdirname = plugins%2fhooks
>
> and use that for lookups instead of regenerating the directory name.
> What do you think?
As I just looked at worktree code, this sounds intriguing for the
wrong reason (again), as a user may want to point the gitdirname to a
repository that they already have on disk outside the actual
superproject. They would be reinventing worktrees in the submodule
space. ;-)

This would open up the security hole that we just had, again. So we'd
have to make sure that the gitdirname (instead of the now meaningless
subsection name) is proof against ../ attacks.

I feel uneasy about this, as then the user might come in and move
submodules and repoint the gitdirname... to a not-url-encoded path.
Exposing this knob just asks for trouble, no?

On the other hand, the only requirement for the "name" is now
uniqueness, and that is implied with subsections, so I guess it looks
elegant.

What would happen if gitdirname is changed as part of history? (The
same problem we have now with changing the subsection name.)

The more I think about it the less appealing this is, but it looks
elegant.

Stefan
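Peff's "not one-to-one" point is easy to demonstrate: distinct encoded spellings decode to the same path, so an encode-then-compare scheme stops matching previously stored names if the encoder's choices (hex case, `+` vs `%20`) ever change. A quick illustration; `urldecode` is a throwaway helper that handles only the `/` escape for this demo, not a git routine:

```shell
# throwaway helper: decode just the "/" escape, both hex spellings
urldecode() { printf '%s\n' "$1" | sed 's/%2[Ff]/\//g'; }

# two different encodings of the same submodule name ...
urldecode 'plugins%2Fhooks'    # → plugins/hooks
urldecode 'plugins%2fhooks'    # → plugins/hooks
# ... so comparing freshly *encoded* strings against stored encodings
# silently breaks if the encoder switches, say, %2f to %2F.
```

Decoding for comparison (as above) is lossless either way, which is why the thread leans toward either decoding or storing the chosen name explicitly, e.g. in a `gitdirname` config key.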
Re: Syncing HEAD
On 08/14, Stefan Beller wrote:
> On Tue, Aug 14, 2018 at 1:09 PM Christian Couder wrote:
> >
> > Hi,
> >
> > When cloning with --mirror, the clone gets its HEAD initialized with
> > the value HEAD has in its origin remote. After that if HEAD changes in
> > origin there is no simple way to sync HEAD at the same time as the
> > refs are synced.
> >
> > It looks like the simplest way to sync HEAD is:
> >
> > 1) git remote show origin
> > 2) parse "HEAD branch: XXX" from the output of the above command
> > 3) git symbolic-ref HEAD refs/heads/XXX
> >
> > It looks like it would be quite easy to add an option to `fetch` to
> > sync HEAD at the same time as regular refs are synced because every
> > fetch from an origin that uses a recent Git contains something like:
> >
> > 19:55:39.304976 pkt-line.c:80 packet: git< HEAD\0multi_ack thin-pack
> > side-band side-band-64k ofs-delta shallow deepen-since deepen-not
> > deepen-relative no-progress include-tag multi_ack_detailed no-done
> > symref=HEAD:refs/heads/test-1 agent=git/2.18.0
> >
> > which in this example shows that HEAD is a symref to refs/heads/test-1
> > in origin.
> >
> > Is there a reason why no such option already exists? Would it make
> > sense to add one? Is there any reason why it's not a good idea? Or am
> > I missing something?
>
> I think it is a great idea to add that. IIRC there was some talk when
> designing protocol v2 on how fetching of symrefs could be added later
> on in the protocol, which is why I cc'd Brandon who did the work there.

Actually the functionality for fetching symrefs already exists (when
using protocol v2, of course). Despite this functionality existing,
it's not being used right now. When performing a v2 fetch the first
thing a client does is request the list of refs (by doing an ls-refs
request). The output from ls-refs (if asked) will include information
about each ref, including whether it is a symref and what ref it
resolves to.
So really we just need to plumb that information through fetch to
actually update HEAD, or even update other symrefs which exist on the
server.

> > I am asking because GitLab uses HEAD in the bare repos it manages to
> > store the default branch against which the Merge Requests (same thing
> > as Pull Requests on GitHub) are created.
> >
> > So when people want to keep 2 GitLab hosted repos in sync, GitLab
> > needs to sync HEADs too, not just the refs.
> >
> > I think this could be useful to other setups than GitLab though.
>
> As said, I can see how that is useful; I recently came across some
> HEAD bug related to submodules, and there we'd also have an
> application:
>
>   git clone --recurse-submodules file://...
>
> might clone the submodules with a detached HEAD, which is totally not
> a viable long-term HEAD, so a subsequent fetch might want to change
> the detached HEAD in submodules or re-affix it to branches.
>
> Unrelated/extended: I think it would be cool to mirror a repository
> even more, e.g. it would be cool to be able to fetch (if configured
> as allowed) the remote reflog (not to be confused with your local
> reflog of the remote). I think that work would be enabled once
> reftables are available, which you have an eye on?

-- 
Brandon Williams
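The symref information Brandon describes is already visible from the command line today: `ls-remote --symref` surfaces what the server advertises, and with protocol v2 the same data arrives via an ls-refs request carrying the `symrefs` attribute. A small sketch, assuming a local git new enough to speak v2 (2.18+); the repo path is made up:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

git init -q server
git -C server -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m init

# ask the server where HEAD points; under protocol v2 this comes from
# an ls-refs request that asks for the "symrefs" ref attribute
git -c protocol.version=2 ls-remote --symref "$tmp/server" HEAD
```

The first output line is of the form `ref: refs/heads/<branch>	HEAD`, which is exactly what a hypothetical `fetch --update-head` would need to plumb through.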