Re: [GSoC] Info: new blog series of my work on Git GSoC '18
Hi Pratik,

On Wed, 2 May 2018, Pratik Karki wrote:

> As promised in my proposal, I've started to write a blog series of
> GSoC '18 with Git. The initial blog is up. You can find it here[1].
> The initial one is just to get started; from the next iterations,
> I'll start detailing my work towards converting rebase to a builtin.
>
> [1]: https://prertik.github.io/categories/git/

This is awesome! Thanks for doing this,
Dscho
Re: git merge branch w/ different submodule revision
On Monday, 2018-04-30 at 19:02 +0200, Heiko Voigt wrote:
> On Thu, Apr 26, 2018 at 03:19:36PM -0700, Stefan Beller wrote:
> > Stefan wrote:
> > > See
> > > https://github.com/git/git/commit/68d03e4a6e448aa557f52adef92595ac4d6cd4bd
> > > (68d03e4a6e (Implement automatic fast-forward merge for submodules,
> > > 2010-07-07)
> > > to explain the situation you encounter. (specifically merge_submodule
> > > at the end of the diff)
> >
> > +cc Heiko, author of that commit.
>
> In that commit we tried to be very careful about. I do not understand
> the situation in which the current strategy would be wrong by default.
>
> We only merge if the following applies:
>
> * The changes in the superproject on both sides point forward in the
>   submodule.
>
> * One side is contained in the other. Contained from the submodule
>   perspective. Sides from the superproject merge perspective.
>
> So in case of the mentioned rewind of a submodule: Only one side of the
> 3-way merge would point forward and the merge would fail.
>
> I can imagine, that in case of a temporary revert of a commit in the
> submodule that you would not want that merged into some other branch.
> But that would be the same without submodules. If you merge a temporary
> revert from another branch you will not get any conflict.
>
> So maybe someone can explain the use case in which one would get the
> results that seem wrong?

In an ideal world, where there are no regressions between revisions, a
fast-forward is appropriate. However, we might have regressions within
submodules.

So the use case is the following:

Environment:
- We have a base library L that is developed by some team (Team B).
- Another team (Team A) develops a product P based on those libraries
  using git-flow.

Case:
The problem occurs when a developer (D) of Team A tries to have a
feature that he developed on a branch accepted by a core developer of
P. If a core developer of P advanced the reference of L within P
(linear history), he might deem the work of D insufficient: not because
of the actual work by D, but because of regressions that snuck into L.
The core developer will not be informed about the mismatching revisions
of L.

So it would be nice if there was some kind of switch, or at least some
trigger.

Cheers,
Leif

>
> Cheers Heiko
>
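Concretely, the distinction Heiko draws can be sketched like this (a rough, hypothetical reproduction; 'lib' stands for the submodule L and the branch names are made up):

	# on branch 'topic' of the superproject, rewind the submodule
	$ (cd lib && git checkout HEAD~1)
	$ git commit -am 'temporarily rewind lib'

	# 'master' meanwhile advanced lib forward; merging 'topic' into it
	# stops with a conflict on the 'lib' gitlink, because only one side
	# points forward from the merge base
	$ git checkout master
	$ git merge topic

Leif's scenario is the complementary one: both sides do point forward and one contains the other, so the gitlink fast-forwards silently, regressions and all.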
Re: [PATCH 2/6] t1406: prepare for the refs code to fail with BUG()
Hi, On Wed, 2 May 2018, Junio C Hamano wrote: > Duy Nguyen writes: > > > On Mon, Apr 30, 2018 at 12:17 AM, Johannes Schindelin > > wrote: > >> t1406 specifically verifies that certain code paths fail with a BUG: ... > >> message. > >> > >> In the upcoming commit, we will convert that message to be generated via > >> BUG() instead of die("BUG: ..."), which implies SIGABRT instead of a > >> regular exit code. > > > > On the other hand, SIGABRT on linux creates core dumps. And on some > > setup (like mine) core dumps may be redirected to some central place > > via /proc/sys/kernel/core_pattern. I think systemd does it too but I > > didn't check. > > > > This moving to SIGABRT when we know it _will_ happen when running the > > test suite will accumulate core dumps over time and not cleaned up by > > the test suite. Maybe keeping die("BUG: here is a good compromise. > > I do not think it is. At regular runtime, we _do_ want Git to dump > core if it triggers BUG() condition, whose point is to mark > conditions that should never happen. Indeed. > As discussed in this thread, tests that use t/helper/ executables > that try to trickle BUG() codepath to ensure that these "should > never happen" conditions are caught do need to deal with it. If > dumping core is undesirable, tweaking BUG() implementation so that > it becomes die("BUG: ...") *ONLY* when the caller knows what it is > doing (e.g. running t/helper/ commands) is probably a good idea. > Perhaps GIT_TEST_OPTS can gain one feature "--bug-no-abort" and set > an environment variable so that implementation of BUG() can notice, > or something. I think we can do even better than that. t/helper/*.c could set a global variable that no other code is supposed to set, to trigger an alternative to SIGABRT. Something like -- snip -- diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 87066ced62a..5176f9f20ae 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -47,7 +47,9 @@ static struct test_cmd cmds[] = { int cmd_main(int argc, const char **argv) { int i; + extern int BUG_exit_code; + BUG_exit_code = 99; if (argc < 2) die("I need a test name!"); diff --git a/usage.c b/usage.c index cdd534c9dfc..9c84dccfa97 100644 --- a/usage.c +++ b/usage.c @@ -210,6 +210,9 @@ void warning(const char *warn, ...) va_end(params); } +/* Only set this, ever, from t/helper/, when verifying that bugs are caught. */ +int BUG_exit_code; + static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, va_list params) { char prefix[256]; @@ -221,6 +224,8 @@ static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, va_lis snprintf(prefix, sizeof(prefix), "BUG: "); vreportf(prefix, fmt, params); + if (BUG_exit_code) + exit(BUG_exit_code); abort(); } -- snap -- I'll try to find some time to play with this. Ciao, Dscho > > When we are testing normal parts of Git outside t/helper/, we never > want to hit BUG(). Aborting and dumping core when that happens is > an desirable outcome. From that point of view, the idea in 1/6 of > this patch series to annotate test_must_fail and say "we know this > one is going to hit BUG()" is a sound one. 
> The implementation in
> 1/6 to treat SIGABRT as an acceptable failure needs to be replaced
> to instead use the above mechanism you would use to tell BUG() not
> to abort but die with message to arrange that to happen before
> running the git command (most likely something from t/helper/) under
> test_must_fail ok=sigabrt; and then those who regularly break their
> Git being tested (read: us devs) and hit BUG() could instead set the
> environment variable (i.e. internal implementation detail) manually
> in their environment to turn these BUG()s into die("BUG:...)s while
> testing their early mistakes if they do not want core (of course,
> you could just do "ulimit -c", and that may be simpler solution of
> your "testing Git contaminates central core depot" issue).
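To make the above concrete: with such a mechanism in place, a test would presumably assert the special exit code directly instead of annotating test_must_fail, roughly like this (a sketch only; "broken-thing" is a made-up helper invocation, and 99 is the exit code used in the snippet above):

	test_expect_success 'bogus invocation is caught by BUG()' '
		test_expect_code 99 test-tool broken-thing
	'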
Re: [PATCH v4 10/10] commit-graph.txt: update design document
Derrick Stolee writes:

> On 4/30/2018 7:32 PM, Jakub Narebski wrote:
>> Derrick Stolee writes:

[...]

>>> - After computing and storing generation numbers, we must make graph
>>> walks aware of generation numbers to gain the performance benefits they
>>> enable. This will mostly be accomplished by swapping a
>>> commit-date-ordered
>>> priority queue with one ordered by generation number. The following
>>> - operations are important candidates:
>>> + operation is an important candidate:
>>> -- paint_down_to_common()
>>> - 'log --topo-order'
>>
>> Other possible candidates:
>>
>> - remove_redundant() - see comment in previous patch
>> - still_interesting() - where Git uses date slop to stop walking
>>   too far
>
> remove_redundant() will be included in v5, thanks.

Oh. Nice. I'll try to review the new patch in detail soon.

> Instead of "still_interesting()" I'll add "git tag --merged" as the
> candidate to consider, as discussed in [1].
>
> [1] https://public-inbox.org/git/87fu3g67ry@lant.ki.iif.hu/t/#u
>     "branch --contains / tag --merged inconsistency"

All right. I mentioned still_interesting() as a hint about where a
possible additional generation-number-based optimization may lurk
(because that is where a heuristic based on dates is used - similarly
to how it was done in this series with paint_down_to_common()).

[...]

>> One important issue left is handling features that change the view of
>> project history, and their interaction with the commit-graph feature.
>>
>> What would happen, if we turn on the commit-graph feature, generate the
>> commit graph file, and then:
>>
>>  * use graft file or remove graft entries to cut history, or remove cut
>>    or join two [independent] histories.
>>  * use git-replace mechanism to do the same
>>  * in shallow clone, deepen or shorten the clone
>>
>> What would happen if, without re-generating the commit-graph file
>> (assuming that Git wouldn't do it for us), we run some feature that
>> makes use of commit-graph data:
>>
>>  - git branch --contains
>>  - git tag --contains
>>  - git rev-list A..B
>>
>
> The commit-graph is not supported in these scenarios (yet). grafts are
> specifically mentioned in the future work section.
>
> I'm not particularly interested in supporting these features, so they
> are good venues for other contributors to get involved in the
> commit-graph feature. Eventually, they will be blockers to making the
> commit-graph feature a "default" feature. That is when I will pay
> attention to these situations. For now, a user must opt-in to having a
> commit-graph file (and that same user has possibly opted in to these
> history modifying features).

Well, that is a sensible approach: get the commit-graph feature into
working condition, and worry about being able to turn it on by default
later. Nice to have it clarified. I'll stop nagging about that, then ;-P

One issue: 'grafts' are mentioned in the future work section of the
technical documentation, but we don't have *any* warning about
commit-graph limitations in the user-facing documentation, that is the
git-commit-graph(1) manpage.

Best,
--
Jakub Narębski
Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string
Shin Kojima writes: > Offset positions should not be counted by byte length, but by actual > character length. > ... > # escape tabs (convert tabs to spaces) > sub untabify { > - my $line = shift; > + my $line = to_utf8(shift); > > while ((my $pos = index($line, "\t")) != -1) { > if (my $count = (8 - ($pos % 8))) { Some codepaths in the resulting codeflow look even hackier than they already are. For example, format_rem_add_lines_pair() calls untabify() and then feeds its result to esc_html(). The first thing done in esc_html() is to call to_utf8(). I know that to_utf8() cheats and leaves the incoming string as-is if it is already UTF-8, so this may be a safe conversion, but ideally we should be able to say "function X takes non-UTF8 and works on it", "function Y takes UTF8 and works on it", and "function Z takes non-UTF8 and gives UTF8 data back" for each functions clearly, not "function W can take either UTF8 or any other garbage and tries to return UTF8". Also, does it even "fix" the problem to use to_utf8() here in the first place? Untabify is about aligning the character after a HT to multiple of 8 display position, so we'd want measure display width, which is not the same as either byte count or char count.
Re: [PATCH 0/3] Add --progress and --dissociate to git submodule
Thanks, I was not aware of that option.

On Wed, May 2, 2018 at 12:37 AM, Junio C Hamano wrote:
> Casey Fitzpatrick writes:
>
>> These patches add --progress and --dissociate options to git submodule.
>>
>> The --progress option existed beforehand, but only for the update
>> command, and it was left undocumented.
>>
>> Both the add and update submodule commands supported --reference, but
>> not its pair option --dissociate, which allows for independent clones
>> rather than depending on the reference.
>>
>> This is a resubmission with comments from Stefan Beller and Eric
>> Sunshine addressed.
>
> Readers would really appreciate it if these are prepared with
> format-patch with -v$N option. Unless they read faster than you
> post patches, they will find a few messages identically titled, all
> unread in their mailbox, and it is not always easy to tell which
> ones are the latest.
>
> Thanks.
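For reference, the -v$N convention boils down to something like this when resending a series (the version number and revision range here are only illustrative):

	$ git format-patch -v2 --cover-letter -o outgoing/ origin/master..HEAD
	$ git send-email outgoing/*.patch

which makes each resent mail show up as "[PATCH v2 n/m]", so reviewers can tell the iterations apart at a glance.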
[PATCH v2 1/4] test-tool: help verifying BUG() code paths
When we call BUG(), we signal via SIGABRT that something bad happened, dumping cores if so configured. In some setups these coredumps are redirected to some central place such as /proc/sys/kernel/core_pattern, which is a good thing. However, when we try to verify in our test suite that bugs are caught in certain code paths, we do *not* want to clutter such a central place with unnecessary coredumps. So let's special-case the test helpers (which we use to verify such code paths) so that the BUG() calls will *not* call abort() but exit with a special-purpose exit code instead. Helped-by: Nguyễn Thái Ngọc Duy Signed-off-by: Johannes Schindelin --- t/helper/test-tool.c | 2 ++ usage.c | 5 + 2 files changed, 7 insertions(+) diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 87066ced62a..5176f9f20ae 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -47,7 +47,9 @@ static struct test_cmd cmds[] = { int cmd_main(int argc, const char **argv) { int i; + extern int BUG_exit_code; + BUG_exit_code = 99; if (argc < 2) die("I need a test name!"); diff --git a/usage.c b/usage.c index cdd534c9dfc..9c84dccfa97 100644 --- a/usage.c +++ b/usage.c @@ -210,6 +210,9 @@ void warning(const char *warn, ...) va_end(params); } +/* Only set this, ever, from t/helper/, when verifying that bugs are caught. */ +int BUG_exit_code; + static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, va_list params) { char prefix[256]; @@ -221,6 +224,8 @@ static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, va_lis snprintf(prefix, sizeof(prefix), "BUG: "); vreportf(prefix, fmt, params); + if (BUG_exit_code) + exit(BUG_exit_code); abort(); } -- 2.17.0.windows.1.36.gdf4ca5fb72a
[PATCH v2 0/4] Finish the conversion from die("BUG: ...") to BUG()
The BUG() macro was introduced in this patch series: https://public-inbox.org/git/20170513032414.mfrwabt4hovuj...@sigill.intra.peff.net The second patch in that series converted one caller from die("BUG: ") to use the BUG() macro. It seems that there was no concrete plan to address the same issue in the rest of the code base. This patch series tries to do that. Note: I would be very surprised if the monster commit 3/4 ("Replace all die("BUG: ...") calls by BUG() ones") would apply cleanly on `pu` (I develop this on top of `master`). For that reason, the commit message contains the precise Unix shell invocation (GNU sed semantics, not BSD sed ones, because I know that the Git maintainer as well as the author of the patch introducing BUG() both use Linux and not macOS or any other platform that would offer a BSD sed). It should be straight-forward to handle merge conflicts/non-applying patches by simply re-running that command. Changes since v2: - Avoided the entire discussion about the previous 2/6 (now dropped) that prepared t1406 to handle SIGABRT by side-stepping the issue: the ref-store test helper will no longer call abort() in BUG() calls but exit with exit code 99 instead. - As a consequence, I could fold 3/6 into 4/6 (i.e. *not* special-case the refs/*.c code but do one wholesale die("BUG: ...") -> BUG() conversion). - Further consequence: 1/6, which added handling for ok=sigabrt to test_must_fail, was dropped. - I also used the opportunity to explain in the commit message of the last commit why the localization was dropped from one die(_("BUG: ...")) call. Johannes Schindelin (4): test-tool: help verifying BUG() code paths run-command: use BUG() to report bugs, not die() Replace all die("BUG: ...") calls by BUG() ones Convert remaining die*(BUG) messages apply.c | 4 ++-- archive-tar.c| 2 +- attr.c | 10 +- blame.c | 2 +- builtin/am.c | 20 +-- builtin/branch.c | 2 +- builtin/cat-file.c | 4 ++-- builtin/clone.c | 2 +- builtin/commit.c | 2 +- builtin/config.c | 2 +- builtin/fast-export.c| 2 +- builtin/fsck.c | 2 +- builtin/index-pack.c | 4 ++-- builtin/init-db.c| 2 +- builtin/ls-files.c | 2 +- builtin/notes.c | 8 builtin/pack-objects.c | 4 ++-- builtin/pull.c | 2 +- builtin/receive-pack.c | 2 +- builtin/replace.c| 2 +- builtin/update-index.c | 2 +- bulk-checkin.c | 2 +- color.c | 4 ++-- column.c | 2 +- config.c | 12 +-- date.c | 2 +- diff.c | 12 +-- dir-iterator.c | 2 +- git-compat-util.h| 2 +- grep.c | 16 +++ http.c | 8 imap-send.c | 2 +- lockfile.c | 2 +- mailinfo.c | 2 +- merge-recursive.c| 12 +-- notes-merge.c| 4 ++-- pack-bitmap-write.c | 2 +- pack-bitmap.c| 6 +++--- pack-objects.c | 2 +- packfile.c | 6 +++--- pathspec.c | 12 +-- pkt-line.c | 2 +- prio-queue.c | 2 +- ref-filter.c | 4 ++-- refs.c | 34 refs/files-backend.c | 20 +-- refs/iterator.c | 6 +++--- refs/packed-backend.c| 16 +++ refs/ref-cache.c | 2 +- remote.c | 2 +- revision.c | 4 ++-- run-command.c| 33 ++- setup.c | 4 ++-- sha1-lookup.c| 2 +- sha1-name.c | 4 ++-- shallow.c| 6 +++--- sigchain.c | 2 +- strbuf.c | 4 ++-- submodule.c | 8 t/helper/test-example-decorate.c | 16 +++ t/helper/test-tool.c | 2 ++ tmp-objdir.c | 2 +- trailer.c| 6 +++--- transport.c | 4 ++-- unpack-trees.c | 2 +- usage.c | 5 + vcs-svn/fast_export.c| 6 -- worktree.c | 2 +- wrapper.c| 4 ++-- wt-status.c
[PATCH v2 2/4] run-command: use BUG() to report bugs, not die()
The slightly misleading name die_bug() of the function intended to report a bug is actually called always, and only reports a bug if the passed-in parameter `err` is non-zero. It uses die_errno() to report the bug, to helpfully include the error message corresponding to `err`. However, as these messages indicate bugs, we really should use BUG(). And as BUG() is a macro to be able to report the exact file and line number, we need to convert die_bug() to a macro instead of only replacing the die_errno() by a call to BUG(). While at it, use a name more indicative of the purpose: CHECK_BUG(). Signed-off-by: Johannes Schindelin --- run-command.c | 23 ++- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/run-command.c b/run-command.c index 12c94c1dbe5..0ad6f135d5a 100644 --- a/run-command.c +++ b/run-command.c @@ -471,15 +471,12 @@ struct atfork_state { sigset_t old; }; -#ifndef NO_PTHREADS -static void bug_die(int err, const char *msg) -{ - if (err) { - errno = err; - die_errno("BUG: %s", msg); - } -} -#endif +#define CHECK_BUG(err, msg) \ + do { \ + int e = (err); \ + if (e) \ + BUG("%s: %s", msg, strerror(e)); \ + } while(0) static void atfork_prepare(struct atfork_state *as) { @@ -491,9 +488,9 @@ static void atfork_prepare(struct atfork_state *as) if (sigprocmask(SIG_SETMASK, &all, &as->old)) die_errno("sigprocmask"); #else - bug_die(pthread_sigmask(SIG_SETMASK, &all, &as->old), + CHECK_BUG(pthread_sigmask(SIG_SETMASK, &all, &as->old), "blocking all signals"); - bug_die(pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &as->cs), + CHECK_BUG(pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &as->cs), "disabling cancellation"); #endif } @@ -504,9 +501,9 @@ static void atfork_parent(struct atfork_state *as) if (sigprocmask(SIG_SETMASK, &as->old, NULL)) die_errno("sigprocmask"); #else - bug_die(pthread_setcancelstate(as->cs, NULL), + CHECK_BUG(pthread_setcancelstate(as->cs, NULL), "re-enabling cancellation"); - bug_die(pthread_sigmask(SIG_SETMASK, &as->old, NULL), + CHECK_BUG(pthread_sigmask(SIG_SETMASK, &as->old, NULL), "restoring signal mask"); #endif } -- 2.17.0.windows.1.36.gdf4ca5fb72a
[PATCH v2 3/4] Replace all die("BUG: ...") calls by BUG() ones
In d8193743e08 (usage.c: add BUG() function, 2017-05-12), a new macro was introduced to use for reporting bugs instead of die(). It was then subsequently used to convert one single caller in 588a538ae55 (setup_git_env: convert die("BUG") to BUG(), 2017-05-12). The cover letter of the patch series containing this patch (cf 20170513032414.mfrwabt4hovuj...@sigill.intra.peff.net) is not terribly clear why only one call site was converted, or what the plan is for other, similar calls to die() to report bugs. Let's just convert all remaining ones in one fell swoop. This trick was performed by this invocation: sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c) Signed-off-by: Johannes Schindelin --- apply.c | 4 ++-- archive-tar.c| 2 +- attr.c | 10 +- blame.c | 2 +- builtin/am.c | 20 +-- builtin/branch.c | 2 +- builtin/cat-file.c | 4 ++-- builtin/clone.c | 2 +- builtin/commit.c | 2 +- builtin/config.c | 2 +- builtin/fast-export.c| 2 +- builtin/fsck.c | 2 +- builtin/index-pack.c | 4 ++-- builtin/init-db.c| 2 +- builtin/ls-files.c | 2 +- builtin/notes.c | 8 builtin/pack-objects.c | 4 ++-- builtin/pull.c | 2 +- builtin/receive-pack.c | 2 +- builtin/replace.c| 2 +- builtin/update-index.c | 2 +- bulk-checkin.c | 2 +- color.c | 4 ++-- column.c | 2 +- config.c | 12 +-- date.c | 2 +- diff.c | 12 +-- dir-iterator.c | 2 +- grep.c | 16 +++ http.c | 8 imap-send.c | 2 +- lockfile.c | 2 +- mailinfo.c | 2 +- merge-recursive.c| 12 +-- notes-merge.c| 4 ++-- pack-bitmap-write.c | 2 +- pack-bitmap.c| 6 +++--- pack-objects.c | 2 +- packfile.c | 6 +++--- pathspec.c | 10 +- pkt-line.c | 2 +- prio-queue.c | 2 +- ref-filter.c | 4 ++-- refs.c | 34 refs/files-backend.c | 20 +-- refs/iterator.c | 6 +++--- refs/packed-backend.c| 16 +++ refs/ref-cache.c | 2 +- remote.c | 2 +- revision.c | 4 ++-- run-command.c| 10 +- setup.c | 4 ++-- sha1-lookup.c| 2 +- sha1-name.c | 4 ++-- shallow.c| 6 +++--- sigchain.c | 2 +- strbuf.c | 4 ++-- submodule.c | 6 +++--- t/helper/test-example-decorate.c | 16 +++ tmp-objdir.c | 2 +- trailer.c| 6 +++--- transport.c | 4 ++-- unpack-trees.c | 2 +- worktree.c | 2 +- wrapper.c| 4 ++-- wt-status.c | 20 +-- zlib.c | 4 ++-- 67 files changed, 190 insertions(+), 190 deletions(-) diff --git a/apply.c b/apply.c index 7e5792c996f..3866adbc97f 100644 --- a/apply.c +++ b/apply.c @@ -2375,7 +2375,7 @@ static void update_pre_post_images(struct image *preimage, if (postlen ? postlen < new_buf - postimage->buf : postimage->len < new_buf - postimage->buf) - die("BUG: caller miscounted postlen: asked %d, orig = %d, used = %d", + BUG("caller miscounted postlen: asked %d, orig = %d, used = %d", (int)postlen, (int) postimage->len, (int)(new_buf - postimage->buf)); /* Fix the length of the whole thing */ @@ -3509,7 +3509,7 @@ static int load_current(struct apply_state *state, unsigned mode = patch->new_mode; if (!patch->is_new) - die("BUG: patch to %s is not a creation", patch->old_name); + BUG("patch to %s is not a creation", patch->old_name); pos = cache_name_pos(name, strlen(name)); if (pos < 0) diff --git a/archive-tar.c b/archive-tar.c index 3563bcb9f26..61a1a2547cc 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -441,7 +441,7 @@ static int write_tar_filter_archive(const struct ar
[PATCH v2 4/4] Convert remaining die*(BUG) messages
These were not caught by the previous commit, as they did not match the regular expression. While at it, remove the localization from one instance: we never want BUG() messages to be translated, as they target Git developers, not the end user (hence it would be quite unhelpful to not only burden the translators, but then even end up with a bug report in a language that no core Git contributor understands). Signed-off-by: Johannes Schindelin --- git-compat-util.h | 2 +- pathspec.c| 2 +- submodule.c | 2 +- vcs-svn/fast_export.c | 6 -- 4 files changed, 7 insertions(+), 5 deletions(-) diff --git a/git-compat-util.h b/git-compat-util.h index 07e383257b4..3a7216f5313 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1052,7 +1052,7 @@ int git_qsort_s(void *base, size_t nmemb, size_t size, #define QSORT_S(base, n, compar, ctx) do { \ if (qsort_s((base), (n), sizeof(*(base)), compar, ctx)) \ - die("BUG: qsort_s() failed"); \ + BUG("qsort_s() failed");\ } while (0) #ifndef REG_STARTEND diff --git a/pathspec.c b/pathspec.c index a637a6d15c0..27cd6067860 100644 --- a/pathspec.c +++ b/pathspec.c @@ -486,7 +486,7 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags, /* sanity checks, pathspec matchers assume these are sane */ if (item->nowildcard_len > item->len || item->prefix > item->len) { - die ("BUG: error initializing pathspec_item"); + BUG("error initializing pathspec_item"); } } diff --git a/submodule.c b/submodule.c index 733db441714..c282fa8fe51 100644 --- a/submodule.c +++ b/submodule.c @@ -2043,7 +2043,7 @@ const char *get_superproject_working_tree(void) if (super_sub_len > cwd_len || strcmp(&cwd[cwd_len - super_sub_len], super_sub)) - die (_("BUG: returned path string doesn't match cwd?")); + BUG("returned path string doesn't match cwd?"); super_wt = xstrdup(cwd); super_wt[cwd_len - super_sub_len] = '\0'; diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c index 3fd047a8b82..b5b8913cb0f 100644 --- a/vcs-svn/fast_export.c +++ b/vcs-svn/fast_export.c @@ -320,7 +320,8 @@ const char *fast_export_read_path(const char *path, uint32_t *mode_out) err = fast_export_ls(path, mode_out, &buf); if (err) { if (errno != ENOENT) - die_errno("BUG: unexpected fast_export_ls error"); + BUG("unexpected fast_export_ls error: %s", + strerror(errno)); /* Treat missing paths as directories. */ *mode_out = S_IFDIR; return NULL; @@ -338,7 +339,8 @@ void fast_export_copy(uint32_t revision, const char *src, const char *dst) err = fast_export_ls_rev(revision, src, &mode, &data); if (err) { if (errno != ENOENT) - die_errno("BUG: unexpected fast_export_ls_rev error"); + BUG("unexpected fast_export_ls_rev error: %s", + strerror(errno)); fast_export_delete(dst); return; } -- 2.17.0.windows.1.36.gdf4ca5fb72a
Re: [PATCH 2/6] t1406: prepare for the refs code to fail with BUG()
Johannes Schindelin writes: >> As discussed in this thread, tests that use t/helper/ executables >> that try to trickle BUG() codepath to ensure that these "should >> never happen" conditions are caught do need to deal with it. If >> dumping core is undesirable, tweaking BUG() implementation so that >> it becomes die("BUG: ...") *ONLY* when the caller knows what it is >> doing (e.g. running t/helper/ commands) is probably a good idea. >> Perhaps GIT_TEST_OPTS can gain one feature "--bug-no-abort" and set >> an environment variable so that implementation of BUG() can notice, >> or something. > > I think we can do even better than that. t/helper/*.c could set a global > variable that no other code is supposed to set, to trigger an alternative > to SIGABRT. Something like Yes, I agree with that solution for t/helper/ part. But we also need to arrange a way for things outside t/helper/ to set the BUG_exit_code at runtime, so that those like Duy who causes BUG() to trigger in some "git cmd" under development when running tests for himself, where he does not want his BUG() to dump core and contaminate global core storage.
Re: What's cooking in git.git (Apr 2018, #04; Mon, 30)
Johannes Schindelin writes:

>> As I wrote elsewhere, for a low-impact and relatively old issue like
>> this, it is OK for a fix to use supporting code that only exists in
>> more recent codebase and become unmergeable to anything older than
>> the concurrent 'maint' track as of the time when the fix is written.
>> Even though it is sometimes nicer if the fix were written to be
>> mergeable to codebase near the point where the issue originates, it
>> is often not worth doing so if it requires bending backwards to
>> refrain from using a newer and more convenient facility.
>
> So do you want me to clean up the backporting branches? I mean, we
> could...

For a relatively obscure and low-impact bug that is old like this one,
I'd actually prefer to be able to say "heh, if that hurts, either
refrain from doing so, or update to the recent maintenance track that
has a fix for it", to use the fix as an incentive to move the world
forward ;-). After all, people have lived with the bug for a long time.
[PATCH] checkout & worktree: introduce a core.DWIMRemote setting
Introduce a core.DWIMRemote setting which can be used to designate a remote to prefer (via core.DWIMRemote=origin) when running e.g. "git checkout master" to mean origin/master, even though there's other remotes that have the "master" branch. I want this because it's very handy to use this workflow to checkout a repository and create a topic branch, then get back to a "master" as retrieved from upstream: ( rm -rf /tmp/tbdiff && git clone g...@github.com:trast/tbdiff.git && cd tbdiff && git branch -m topic && git checkout master ) That will output: Branch 'master' set up to track remote branch 'master' from 'origin'. Switched to a new branch 'master' But as soon as a new remote is added (e.g. just to inspect something from someone else) the DWIMery goes away: ( rm -rf /tmp/tbdiff && git clone g...@github.com:trast/tbdiff.git && cd tbdiff && git branch -m topic && git remote add avar g...@github.com:avar/tbdiff.git && git fetch avar && git checkout master ) Will output: error: pathspec 'master' did not match any file(s) known to git. The new core.DWIMRemote config allows me to say that whenever that ambiguity comes up I'd like to prefer "origin", and it'll still work as though the only remote I had was "origin". I considered calling the config setting checkout.DWIMRemote, but then discovered that this behavior is also used by "worktree" for similar purposes, so it makes sense to have it under core.*. As noted in the documentation we may also want to use this in other commands in the future if they have similar DWIM behavior in the presence of one remote. See also 70c9ac2f19 ("DWIM "git checkout frotz" to "git checkout -b frotz origin/frotz"", 2009-10-18) which introduced this DWIM feature to begin with, and 4e85333197 ("worktree: make add dwim", 2017-11-26) which added it to git-worktree. Signed-off-by: Ævar Arnfjörð Bjarmason --- Documentation/config.txt | 15 +++ Documentation/git-checkout.txt | 5 + Documentation/git-worktree.txt | 5 + builtin/checkout.c | 3 ++- checkout.c | 17 +++-- t/t2024-checkout-dwim.sh | 10 ++ t/t2025-worktree-add.sh| 18 ++ 7 files changed, 70 insertions(+), 3 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 2659153cb3..d4740fcdb9 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -898,6 +898,21 @@ core.notesRef:: This setting defaults to "refs/notes/commits", and it can be overridden by the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1]. +core.DWIMRemote:: + Various git commands will look at references on the configured + remotes and DW[YI]M (Do What (You|I) Mean) if the reference + only exists on one remote. This setting allows for setting the + name of a special remote that should always win when it comes + to disambiguation. The typical use-case is to set this to + `origin`. ++ +Currently this is used by linkgit:git-checkout[1] when `git checkout +' will checkout the '' branch on another remote, +and linkgit:git-worktree[1] when 'git worktree add' similarly DWYM +when a branch is unique across remotes, or this setting is set to a +special remote. This setting might be used for other commands or +functionality in the future when appropriate. + core.sparseCheckout:: Enable "sparse checkout" feature. See section "Sparse checkout" in linkgit:git-read-tree[1] for more information. 
diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt index ca5fc9c798..8ad7be6c53 100644 --- a/Documentation/git-checkout.txt +++ b/Documentation/git-checkout.txt @@ -38,6 +38,11 @@ equivalent to $ git checkout -b --track / + +If the branch exists in multiple remotes the `core.DWIMRemote` +variable can be used to pick the remote you really mean. Set it to +e.g. `core.DWIMRemote=origin` to always checkout remote branches from +there. See also `core.DWIMRemote` in linkgit:git-config[1]. ++ You could omit , in which case the command degenerates to "check out the current branch", which is a glorified no-op with rather expensive side-effects to show only the tracking information, diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt index 2755ca90e3..37db12f816 100644 --- a/Documentation/git-worktree.txt +++ b/Documentation/git-worktree.txt @@ -60,6 +60,11 @@ with a matching name, treat as equivalent to: $ git worktree add --track -b / + +It's also possible to use the `core.DWIMRemote` setting to designate a +special remote this rule should be applied to, even if the branch +isn't unique across all remotes. See `core.DWIMRemote` in +linkgit:git-config[1]. ++ If `` is omitted and neither `-b` nor `-B` nor `--detach` used, then, as a convenience, a
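With the patch applied, the failing example from the commit message above would presumably be repaired like this (a sketch; core.DWIMRemote is only the name proposed in this patch and may still change during review):

	$ git config core.DWIMRemote origin
	$ git checkout master

so that the checkout DWIMs to origin/master again, even though the 'avar' remote also has a 'master' branch.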
Re: [PATCH v4 09/10] merge: check config before loading commits
Derrick Stolee writes: > On 4/30/2018 6:54 PM, Jakub Narebski wrote: >> Derrick Stolee writes: >> >>> Now that we use generation numbers from the commit-graph, we must >>> ensure that all commits that exist in the commit-graph are loaded >>> from that file instead of from the object database. Since the >>> commit-graph file is only checked if core.commitGraph is true, we >>> must check the default config before we load any commits. >>> >>> In the merge builtin, the config was checked after loading the HEAD >>> commit. This was due to the use of the global 'branch' when checking >>> merge-specific config settings. >>> >>> Move the config load to be between the initialization of 'branch' and >>> the commit lookup. >> >> Sidenote: I wonder why reading config was postponed to later in the >> command lifetime... I guess it was to avoid having to read config if >> HEAD was invalid. > > The 'branch' does need to be loaded before the call to git_config (as > I found out after moving the config call too early), so I suppose it > was natural to pair that with resolving head_commit. Right, so there was only a limited number of places where call to git_config could be put correctly. Now I wonder no more. [...] >>> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh >>> index a380419b65..77d85aefe7 100755 >>> --- a/t/t5318-commit-graph.sh >>> +++ b/t/t5318-commit-graph.sh >>> @@ -221,4 +221,13 @@ test_expect_success 'write graph in bare repo' ' >>> graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare >>> commits/8 merge/1 >>> graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare >>> commits/8 merge/2 >>> +test_expect_success 'perform fast-forward merge in full repo' ' >>> + cd "$TRASH_DIRECTORY/full" && >>> + git checkout -b merge-5-to-8 commits/5 && >>> + git merge commits/8 && >>> + git show-ref -s merge-5-to-8 >output && >>> + git show-ref -s commits/8 >expect && >>> + test_cmp expect output >>> +' >> All right. (though I wonder if this tests catches all problems where >> BUG("bad generation skip") could have been encountered. > > We will never know until we have this series running in the wild (and > even then, some features are very obscure) and enough people turn on > the config setting. > > One goal of the "fsck and gc" series is to get this feature running > during the rest of the test suite as much as possible, so we can get > additional coverage. Also to get more experience from the community > dogfooding the feature. Sidenote: for two out of three features that change the view of history we could also update commit-graph automatically: * the shortening or deepening of shallow clone could also re-calculate the commit graph (or invalidate it) * git-replace could check if the replacement modifies history, and if so, recalculate the commit graph (or invalidate it/check its validity) * there is no such possibility for grafts, but they are deprecated anyway -- Jakub Narębski
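Until any such automation exists, the manual equivalent of the sidenote above would presumably be to regenerate (or simply remove) the graph file right after the history-changing operation; a sketch, assuming the 'commit-graph write' subcommand from this series:

	$ git replace --graft <commit> <new-parent>    # changes the visible history
	$ git commit-graph write                       # rebuild the graph to match

	# or merely invalidate the stale file:
	$ rm -f .git/objects/info/commit-graph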
Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string
> ideally we should be able to say "function X takes non-UTF8 and > works on it", "function Y takes UTF8 and works on it", and "function > Z takes non-UTF8 and gives UTF8 data back" for each functions > clearly, not "function W can take either UTF8 or any other garbage > and tries to return UTF8". Yes, I totally agree with that. > Some codepaths in the resulting codeflow look even harder than they > already are. For example, format_rem_add_lines_pair() calls > untabify() and then feeds its result to esc_html(). Honestly, I discovered this problem exactly from format_rem_add_lines_pair(). In my environment($fallback_encoding = 'cp932'), some commitdiff shows garbled text only inside color-words portions. I added a reproduce process at the end of this message. After my investigation, I thought format_rem_add_lines_pair() tries to use split()/index()/substr()/etc against raw blob contents and that produces funny characters. These builtin functions should be used to decoded string. untabify() looks proper place for me to decode blob contents beforehand, as it definitely is not to be used for binary contests like images and compressed snapshot files. I'm sure using to_utf8() in untabify() fixes this problem. In fact, there is also another similar problem in blame function that assumes blob contents as if utf8 encoded: binmode $fd, ':utf8'; I personally consider text blob contents should be decoded as soon as possible and esc_html() should not contain to_utf8(), but the codeflow is slightly vast and I couldn't eliminate the possibility of calls from somewhere else that does not care character encodings. So yes, I would appreciate hearing your thoughts. > Also, does it even "fix" the problem to use to_utf8() here in the > first place? Untabify is about aligning the character after a HT to > multiple of 8 display position, so we'd want measure display width, > which is not the same as either byte count or char count. Following is a reproduce process: $ git --version git version 2.17.0 $ mkdir test $ cd test $ git init $ echo 'モバイル' | iconv -f UTF-8 -t Shift_JIS > dummy $ git add . $ git commit -m 'init' $ echo 'インスタント' | iconv -f UTF-8 -t Shift_JIS > dummy $ git commit -am 'change' $ git instaweb $ echo 'our $fallback_encoding = "cp932";' >> .git/gitweb/gitweb_config.perl $ w3m -dump 'http://127.0.0.1:1234/?p=.git;a=commitdiff' What I got: gitprojects / .git / commitdiff [commit ] ? search: [] [ ]re summary | shortlog | log | commit | commitdiff | tree raw | patch | inline | side by side (parent: 79e26fe) change master authorShin Kojima Wed, 2 May 2018 10:55:01 + (19:55 +0900) committer Shin Kojima Wed, 2 May 2018 10:55:01 + (19:55 +0900) dummy patch | blob | history diff --git a/dummy b/dummy index ac37f38..31fb96a 100644 (file) --- a/dummy +++ b/dummy @@ -1 +1 @@ -coイル +Cンスタント Unnamed repository; edit this file 'description' to name the repository. RSS Atom What I expected: gitprojects / .git / commitdiff [commit ] ? search: [] [ ]re summary | shortlog | log | commit | commitdiff | tree raw | patch | inline | side by side (parent: 79e26fe) change master authorShin Kojima Wed, 2 May 2018 10:55:01 + (19:55 +0900) committer Shin Kojima Wed, 2 May 2018 10:55:01 + (19:55 +0900) dummy patch | blob | history diff --git a/dummy b/dummy index ac37f38..31fb96a 100644 (file) --- a/dummy +++ b/dummy @@ -1 +1 @@ -モバイル +インスタント Unnamed repository; edit this file 'description' to name the repository. RSS Atom -- Shin Kojima
Re: [PATCH v3 00/12] get_short_oid UI improvements
On 5/1/2018 2:40 PM, Ævar Arnfjörð Bjarmason wrote: The biggest change in v3 is the no change at all to the code, but a lengthy explanation of why I didn't go for Derrick's simpler implementation. Maybe I'm wrong about that, but I felt uneasy offloading undocumented (or if I documented it, it would only be for this one edge-case) magic on the oid_array API. Instead I'm just making this patch a bit more complex. I think that's fair. Thanks for going along with me on the thought experiment. -Stolee
Re: [PATCH v4 08/10] commit: add short-circuit to paint_down_to_common()
Derrick Stolee writes: > On 4/30/2018 6:19 PM, Jakub Narebski wrote: >> Derrick Stolee writes: [...] >>> @@ -831,6 +834,13 @@ static struct commit_list *paint_down_to_common(struct >>> commit *one, int n, struc >>> struct commit_list *parents; >>> int flags; >>> + if (commit->generation > last_gen) >>> + BUG("bad generation skip"); >>> + last_gen = commit->generation; >> Shouldn't we provide more information about where the problem is to the >> user, to make it easier to debug the repository / commit-graph data? >> >> Good to have this sanity check here. > > This BUG() _should_ only be seen by developers who add callers which > do not load commits from the commit-graph file. There is a chance that > there are cases not covered by this patch and the added tests, > though. Hopefully we catch them all by dogfooding the feature before > turning it on by default. > > I can add the following to help debug these bad situations: > > + BUG("bad generation skip %d > %d at %s", > + commit->generation, last_gen, > + oid_to_hex(&commit->object.oid)); On one hand, after thiking about this a bit, I agree that this BUG() is more about catching the errors in Git code, rather than in repository. On the other hand, the more detailed information could help determining what the problems is (e.g. that "at " is at HEAD). Hopefully we won't see which is which, as it would mean bugs in Git ;)) [...] >>> @@ -946,7 +956,7 @@ static int remove_redundant(struct commit **array, int >>> cnt) >>> filled_index[filled] = j; >>> work[filled++] = array[j]; >>> } >>> - common = paint_down_to_common(array[i], filled, work); >>> + common = paint_down_to_common(array[i], filled, work, 0); >> >> Here we are interested not only if "one"/array[i] is reachable from >> "twos"/work, but also if "twos" is reachable from "one". Simple cutoff >> only works in one way, though I wonder if we couldn't use cutoff being >> minimum generation number of "one" and "twos" together. >> >> But that may be left for a separate commit (after checking that the >> above is correct). >> >> Not as simple and obvious as paint_down_to_common() used in >> in_merge_bases_any(), so it is all right. > > Thanks for reporting this. Since we are only concerned about > reachability in this method, it is a good candidate to use > min_generation. It is also subtle enough that we should leave it as a > separate commit. Thanks for checking this, and for the followup. > Also, we can measure performance improvements > separately, as I will mention in my commit message (but I'll copy it > here): > > For a copy of the Linux repository, we measured the following > performance improvements: > > git merge-base v3.3 v4.5 > > Before: 234 ms > After: 208 ms > Rel %: -11% > > git merge-base v4.3 v4.5 > > Before: 102 ms > After: 83 ms > Rel %: -19% > > The experiments above were chosen to demonstrate that we are > improving the filtering of the merge-base set. In the first > example, more time is spent walking the history to find the > set of merge bases before the remove_redundant() call. The > starting commits are closer together in the second example, > therefore more time is spent in remove_redundant(). The relative > change in performance differs as expected. Nice. I was not expecting as much performance improvements as we got for --contains tests because remove_redundant() is a final step in longer process, dominated by man calculations. Still, nothing to sneeze about. Best regards, -- Jakub Narębski
Re: [PATCH v4 08/10] commit: add short-circuit to paint_down_to_common()
On 5/2/2018 9:05 AM, Jakub Narebski wrote: Derrick Stolee writes: For a copy of the Linux repository, we measured the following performance improvements: git merge-base v3.3 v4.5 Before: 234 ms After: 208 ms Rel %: -11% git merge-base v4.3 v4.5 Before: 102 ms After: 83 ms Rel %: -19% The experiments above were chosen to demonstrate that we are improving the filtering of the merge-base set. In the first example, more time is spent walking the history to find the set of merge bases before the remove_redundant() call. The starting commits are closer together in the second example, therefore more time is spent in remove_redundant(). The relative change in performance differs as expected. Nice. I was not expecting as much performance improvements as we got for --contains tests because remove_redundant() is a final step in longer process, dominated by man calculations. Still, nothing to sneeze about. One reason these numbers are not too surprising is that remove_redundant() can demonstrate quadratic behavior. It is calculating pair-wise reachability by starting a walk at each of the candidates (in the worst case). In typical cases, the first walk marks many of the other candidates as redundant and we don't need to start walks from those commits. A possible optimization could be to sort the candidates by descending generation so we find the first walk is likely to mark the rest as redundant. But this may already be the case if the candidates are added to the list in order of "discovery" which is already simulating this behavior. Thanks, -Stolee
Re: [PATCH v3 00/12] get_short_oid UI improvements
On 5/2/2018 8:42 AM, Derrick Stolee wrote: On 5/1/2018 2:40 PM, Ævar Arnfjörð Bjarmason wrote: The biggest change in v3 is the no change at all to the code, but a lengthy explanation of why I didn't go for Derrick's simpler implementation. Maybe I'm wrong about that, but I felt uneasy offloading undocumented (or if I documented it, it would only be for this one edge-case) magic on the oid_array API. Instead I'm just making this patch a bit more complex. I think that's fair. Thanks for going along with me on the thought experiment. Also, v3 looks good to me. Reviewed-by: Derrick Stolee
Re: [PATCH v2] wt-status: use rename settings from init_diff_ui_defaults
On 5/1/2018 8:08 PM, Elijah Newren wrote: On Tue, May 1, 2018 at 4:11 PM, Junio C Hamano wrote: Elijah Newren writes: I also just realized that in addition to status being inconsistent with diff/log/show, it was also inconsistent with itself -- it handles staged and unstaged changes differently. (wt_status_collect_changes_worktree() had different settings for break detection than wt_status_collect_changes_index().) Eckhard, can you add some comments to your commit message mentioning the email pointed to by Junio about break detection and rename detection being unsafe to use together, as well as the inconsistencies in break detection between different commands? That may help future readers understand why break detection was turned off for wt_status_collect_changes_index(). Wow, lots of inconsistent behaviors here with diff/merge/status and the various options being used. Let me add another one: I've been sitting on another patch that we've been using internally for some time that enables us to turn off rename and break detection for status via config settings and new command-line options. The issue that triggered the creation of the patch was that if someone ran status while in a merge conflict state, the status would take a very long time. Turning off rename and break detection "fixed" the problem. I was waiting for some of these inflight changes to settle down and get accepted before I started another patch series but I thought I should at least let everyone know about this additional issue that will need to be addressed.
[PATCH] git-send-email: allow re-editing of message
When shown the email summary, an opportunity is presented for the user to edit the email as if they had specified --annotate. This also permits them to edit it multiple times. Signed-off-by: Drew DeVault Reviewed-by: Simon Ser --- Sent to the ML without comment, sending back around with more people Cc'd for better visibility. git-send-email.perl | 35 --- 1 file changed, 28 insertions(+), 7 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index 2fa7818ca..14f2e8ae4 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -1330,9 +1330,14 @@ sub file_name_is_absolute { return File::Spec::Functions::file_name_is_absolute($path); } -# Returns 1 if the message was sent, and 0 otherwise. -# In actuality, the whole program dies when there -# is an error sending a message. +# Prepares the email, then asks the user what to do. +# +# If the user chooses to send the email, it's sent and 1 is returned. +# If the user chooses not to send the email, 0 is returned. +# If the user decides they want to make further edits, -1 is returned and the +# caller is expected to call send_message again after the edits are performed. +# +# If an error occurs sending the email, this just dies. sub send_message { my @recipients = unique_email_list(@to); @@ -1404,15 +1409,17 @@ Message-Id: $message_id EOF } - # TRANSLATORS: Make sure to include [y] [n] [q] [a] in your + # TRANSLATORS: Make sure to include [y] [n] [e] [q] [a] in your # translation. The program will only accept English input # at this point. - $_ = ask(__("Send this email? ([y]es|[n]o|[q]uit|[a]ll): "), -valid_re => qr/^(?:yes|y|no|n|quit|q|all|a)/i, + $_ = ask(__("Send this email? ([y]es|[n]o|[e]dit|[q]uit|[a]ll): "), +valid_re => qr/^(?:yes|y|no|n|edit|e|quit|q|all|a)/i, default => $ask_default); die __("Send this email reply required") unless defined $_; if (/^n/i) { return 0; + } elsif (/^e/i) { + return -1; } elsif (/^q/i) { cleanup_compose_files(); exit(0); @@ -1552,7 +1559,9 @@ $references = $initial_in_reply_to || ''; $subject = $initial_subject; $message_num = 0; -foreach my $t (@files) { +sub process_file { + my ($t) = @_; + open my $fh, "<", $t or die sprintf(__("can't open file %s"), $t); my $author = undef; @@ -1755,6 +1764,10 @@ foreach my $t (@files) { } my $message_was_sent = send_message(); + if ($message_was_sent == -1) { + do_edit($t); + return 0; + } # set up for the next message if ($thread && $message_was_sent && @@ -1776,6 +1789,14 @@ foreach my $t (@files) { undef $auth; sleep($relogin_delay) if defined $relogin_delay; } + + return 1; +} + +foreach my $t (@files) { + while (!process_file($t)) { + # This space deliberately left blank + } } # Execute a command (e.g. $to_cmd) to get a list of email addresses -- 2.17.0
Re:
On 4/30/2018 12:12 PM, Elijah Newren wrote: On Mon, Apr 30, 2018 at 6:11 AM, Ben Peart wrote: On 4/27/2018 2:19 PM, Elijah Newren wrote: From: Elijah Newren On Thu, Apr 26, 2018 at 5:54 PM, Ben Peart wrote: Can you write the documentation that clearly explains the exact behavior you want? That would kill two birds with one stone... :) Sure, something like the following is what I envision, and I've tried to include the suggestion from Junio to document the copy behavior in the merge-recursive documentation. Thanks Elijah. I've applied this patch and reviewed and tested it. It works and addresses the concerns around the settings inheritance from diff.renames. I still _prefer_ the simpler model that doesn't do the partial inheritance but I can use this model as well. I'm unsure on the protocol here. Should I incorporate this patch and submit a reroll or can it just be applied as is? I suspect you'll want to re-roll anyway, to base your series on en/rename-directory-detection-reboot instead of on master. (Junio plans to merge it down to next, and your series has four different merge conflicts with it.) There are two other loose ends with this series that Junio will need to weigh in on: - I'm obviously a strong proponent of the inherited setting, but Junio may change his mind after reading Dscho's arguments against it (or after reading my arguments for it). - I like the setting as-is, and think we could allow a "copy" setting for merge.renames to specify that the post-merge diffstat should detect copies (not part of your series, but a useful addition I'd like to tackle afterwards). However, Junio had comments in xmqqwox19ohw@gitster-ct.c.googlers.com about merge.renames handling the scoring as well, like -Xfind-renames. Those sound incompatible to me for a single setting, and I'm unsure if Junio would resolve them the way I do or still feels strongly about the scoring. I think this patch series (including Elijah's fixup!) improves the situation from where we were and it provides the necessary functionality to solve the problem I started out to solve. While there are other changes that could be made, I think they should be done in separate follow up patches. I'm happy to reroll this incorporating the fixup! so that we can make progress. Junio, would you prefer I reroll this based on en/rename-directory-detection-reboot or master?
Re: [PATCH] git-p4 - Add option --sha1 to submit in p4
On 2 May 2018 at 15:32, Merland Romain wrote: > From 4867808cad2b759ebf31c275356e602b72c5659f Mon Sep 17 00:00:00 2001 > From: Romain Merland > To: git@vger.kernel.org > Cc: Junio C Hamano , Jeff King , Luke > Diamand , Vinicius Kursancew > Date: Wed, 2 May 2018 15:02:11 +0200 > Subject: [PATCH] git-p4 - Add option --sha1 to submit in p4 > > Add option --sha1 to command 'git-p4 submit' in order to submit in p4 a > commit > that is not necessarily on master. > In that case, don't rebase the submitted changes. That could be very useful, I often find the commit I want to submit is half-way down a long list of other commits. Currently I end up cherry-picking the one I want into a clean repo, but that's much more awkward than your --sha1 change. A few comments inline: > Signed-off-by: Romain Merland > --- > git-p4.py | 21 +++-- > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/git-p4.py b/git-p4.py > index 7bb9cadc6..d64ff79dd 100755 > --- a/git-p4.py > +++ b/git-p4.py > @@ -1352,7 +1352,9 @@ class P4Submit(Command, P4UserMap): > optparse.make_option("--update-shelve", > dest="update_shelve", action="append", type="int", > metavar="CHANGELIST", > help="update an existing shelved > changelist, implies --shelve, " > - "repeat in-order for multiple > shelved changelists") > + "repeat in-order for multiple > shelved changelists"), > +optparse.make_option("--sha1", dest="sha1", metavar="SHA1", > + help="submit only the specified > commit, don't rebase afterwards") Is there a better name than "sha1" ? If git ever changes its hash to something else will this still make sense? I wonder why you wouldn't rebase afterwards? Perhaps an additional option to skip the rebase? > ] > self.description = "Submit changes from git to the perforce depot." > self.usage += " [name of git branch to submit into perforce depot]" > @@ -1362,6 +1364,7 @@ class P4Submit(Command, P4UserMap): > self.dry_run = False > self.shelve = False > self.update_shelve = list() > +self.sha1 = "" > self.prepare_p4_only = False > self.conflict_behavior = None > self.isWindows = (platform.system() == "Windows") > @@ -2103,9 +2106,12 @@ class P4Submit(Command, P4UserMap): > else: > commitish = 'HEAD' > > -for line in read_pipe_lines(["git", "rev-list", "--no-merges", > "%s..%s" % (self.origin, commitish)]): > -commits.append(line.strip()) > -commits.reverse() > +if self.sha1 != "": > +commits.append(self.sha1) > +else: > +for line in read_pipe_lines(["git", "rev-list", "--no-merges", > "%s..%s" % (self.origin, commitish)]): > +commits.append(line.strip()) > +commits.reverse() > > if self.preserveUser or gitConfigBool("git-p4.skipUserNameCheck"): > self.checkAuthorship = False > @@ -2215,8 +2221,11 @@ class P4Submit(Command, P4UserMap): > sync.branch = self.branch > sync.run([]) > > -rebase = P4Rebase() > -rebase.rebase() > +if self.sha1 == "": if not self.skip_rebase: > +rebase = P4Rebase() > +rebase.rebase() > +else: > +print "You will have to do 'git p4 sync' and rebase." > > else: > if len(applied) == 0: > -- > 2.17.0 > > This would be better with some documentation in git-p4.txt and a test case! Regards! Luke
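For context, the workflow the patch is after would look roughly like this (a sketch; --sha1 is only the name proposed in this patch, and Luke questions both the name and the unconditional skipping of the rebase):

	# submit exactly one commit from the middle of a local branch,
	# instead of cherry-picking it onto a clean checkout first
	$ git p4 submit --sha1 <commit>

leaving the remaining local commits unsubmitted and skipping the usual rebase afterwards.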
Re: [PATCH v2 1/4] test-tool: help verifying BUG() code paths
On Wed, May 2, 2018 at 11:38 AM, Johannes Schindelin wrote: > When we call BUG(), we signal via SIGABRT that something bad happened, > dumping cores if so configured. In some setups these coredumps are > redirected to some central place such as /proc/sys/kernel/core_pattern, > which is a good thing. > > However, when we try to verify in our test suite that bugs are caught in > certain code paths, we do *not* want to clutter such a central place > with unnecessary coredumps. > > So let's special-case the test helpers (which we use to verify such code > paths) so that the BUG() calls will *not* call abort() but exit with a > special-purpose exit code instead. > > Helped-by: Nguyễn Thái Ngọc Duy > Signed-off-by: Johannes Schindelin > --- > t/helper/test-tool.c | 2 ++ > usage.c | 5 + > 2 files changed, 7 insertions(+) > > diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c > index 87066ced62a..5176f9f20ae 100644 > --- a/t/helper/test-tool.c > +++ b/t/helper/test-tool.c > @@ -47,7 +47,9 @@ static struct test_cmd cmds[] = { > int cmd_main(int argc, const char **argv) > { > int i; > + extern int BUG_exit_code; > > + BUG_exit_code = 99; It may be even better to let individual tests in t1406 control this, pretty much like your original patch, except that we tell usage.c to not send SIGABRT and just return a special fault code. That way we don't accidentally suppress BUG()'s sigabrt behavior in tests that do not anticipate it (even in t1406). But this patch is ok for me too if you don't want another reroll. > if (argc < 2) > die("I need a test name!"); > > diff --git a/usage.c b/usage.c > index cdd534c9dfc..9c84dccfa97 100644 > --- a/usage.c > +++ b/usage.c > @@ -210,6 +210,9 @@ void warning(const char *warn, ...) > va_end(params); > } > > +/* Only set this, ever, from t/helper/, when verifying that bugs are caught. > */ > +int BUG_exit_code; > + > static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, > va_list params) > { > char prefix[256]; > @@ -221,6 +224,8 @@ static NORETURN void BUG_vfl(const char *file, int line, > const char *fmt, va_lis > snprintf(prefix, sizeof(prefix), "BUG: "); > > vreportf(prefix, fmt, params); > + if (BUG_exit_code) > + exit(BUG_exit_code); > abort(); > } > > -- > 2.17.0.windows.1.36.gdf4ca5fb72a > > -- Duy
Re: [PATCH] checkout & worktree: introduce a core.DWIMRemote setting
On Wed, May 2, 2018 at 12:54 PM, Ævar Arnfjörð Bjarmason wrote: > Introduce a core.DWIMRemote setting which can be used to designate a > remote to prefer (via core.DWIMRemote=origin) when running e.g. "git > checkout master" to mean origin/master, even though there's other > remotes that have the "master" branch. Do we anticipate more dwimy customizations to justify a new dwim group in config (i.e. dwim.remote instead of core.dwimRemote)? -- Duy
Re: [PATCH 08/41] packfile: abstract away hash constant values
On Wed, May 2, 2018 at 2:11 AM, brian m. carlson wrote: > On Tue, May 01, 2018 at 12:22:43PM +0200, Duy Nguyen wrote: >> On Mon, Apr 23, 2018 at 11:39:18PM +, brian m. carlson wrote: >> > There are several instances of the constant 20 and 20-based values in >> > the packfile code. Abstract away dependence on SHA-1 by using the >> > values from the_hash_algo instead. >> >> While we're abstracting away 20. There's the only 20 left in >> sha1_file.c that should also be gone. But I guess you could do that >> later since you need to rename fill_sha1_path to >> fill_loose_object_path or something. > > I'm already working on knocking those out. Yeah I checked out your part14 branch after writing this note :P You still need to rename the function though. I can remind that again when part14 is sent out. >> > @@ -507,15 +509,15 @@ static int open_packed_git_1(struct packed_git *p) >> > " while index indicates %"PRIu32" objects", >> > p->pack_name, ntohl(hdr.hdr_entries), >> > p->num_objects); >> > - if (lseek(p->pack_fd, p->pack_size - sizeof(sha1), SEEK_SET) == -1) >> > + if (lseek(p->pack_fd, p->pack_size - hashsz, SEEK_SET) == -1) >> > return error("end of packfile %s is unavailable", >> > p->pack_name); >> > - read_result = read_in_full(p->pack_fd, sha1, sizeof(sha1)); >> > + read_result = read_in_full(p->pack_fd, hash, hashsz); >> > if (read_result < 0) >> > return error_errno("error reading from %s", p->pack_name); >> > - if (read_result != sizeof(sha1)) >> > + if (read_result != hashsz) >> > return error("packfile %s signature is unavailable", >> > p->pack_name); >> > - idx_sha1 = ((unsigned char *)p->index_data) + p->index_size - 40; >> > - if (hashcmp(sha1, idx_sha1)) >> > + idx_hash = ((unsigned char *)p->index_data) + p->index_size - hashsz * >> > 2; >> > + if (hashcmp(hash, idx_hash)) >> >> Since the hash size is abstracted away, shouldn't this hashcmp become >> oidcmp? (which still does not do the right thing, but at least it's >> one less place to worry about) > > Unfortunately, I can't, because it's not an object ID. I think the > decision was made to not transform non-object ID hashes into struct > object_id, which makes sense. I suppose we could have an equivalent > struct hash or something for those other uses. I probably miss something, is hashcmp() supposed to stay after the conversion? And will it compare any hash (as configured in the_algo) or will it for SHA-1 only? If hashcmp() will eventually compare the_algo->rawsz then yes this makes sense. >> > @@ -675,7 +677,8 @@ struct packed_git *add_packed_git(const char *path, >> > size_t path_len, int local) >> > p->pack_size = st.st_size; >> > p->pack_local = local; >> > p->mtime = st.st_mtime; >> > - if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1)) >> > + if (path_len < the_hash_algo->hexsz || >> > + get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->sha1)) >> >> get_sha1_hex looks out of place when we start going with >> the_hash_algo. Maybe change to get_oid_hex() too. > > I believe this is the pack hash, which isn't an object ID. I will > transform it to be called something other than "sha1" and allocate more > memory for it in a future series, though. Ah ok, it's the "sha1" in the name that bugged me. I'm all good then. -- Duy
Re: [PATCH v2 00/42] object_id part 13
On Wed, May 02, 2018 at 12:25:28AM +, brian m. carlson wrote: > Changes from v1: > * Add missing sign-off. > * Removed unneeded braces from init_pack_info. > * Express 51 in terms of the_hash_algo->hexsz. > * Fix comments referring to SHA-1. > * Update commit messages as suggested. > * Add and use empty_tree_oid_hex and empty_blob_oid_hex. Interdiff for people who don't have time to read 42 patches yet diff --git a/builtin/merge.c b/builtin/merge.c index 8d75ebe64b..7084bcfdea 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -290,7 +290,7 @@ static void read_empty(const struct object_id *oid, int verbose) args[i++] = "-v"; args[i++] = "-m"; args[i++] = "-u"; - args[i++] = oid_to_hex(the_hash_algo->empty_tree); + args[i++] = empty_tree_oid_hex(); args[i++] = oid_to_hex(oid); args[i] = NULL; diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index c31ceb30c2..dca523f50f 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -968,7 +968,7 @@ static const char *push_to_deploy(unsigned char *sha1, return "Working directory has unstaged changes"; /* diff-index with either HEAD or an empty tree */ - diff_index[4] = head_has_history() ? "HEAD" : oid_to_hex(the_hash_algo->empty_tree); + diff_index[4] = head_has_history() ? "HEAD" : empty_tree_oid_hex(); child_process_init(&child); child.argv = diff_index; diff --git a/cache.h b/cache.h index c5b041019b..71b3c1b15b 100644 --- a/cache.h +++ b/cache.h @@ -1033,6 +1033,9 @@ static inline int is_empty_tree_oid(const struct object_id *oid) return !oidcmp(oid, the_hash_algo->empty_tree); } +const char *empty_tree_oid_hex(void); +const char *empty_blob_oid_hex(void); + /* set default permissions by passing mode arguments to open(2) */ int git_mkstemps_mode(char *pattern, int suffix_len, int mode); int git_mkstemp_mode(char *pattern, int mode); diff --git a/http.c b/http.c index ec70676748..312a5e1833 100644 --- a/http.c +++ b/http.c @@ -2070,7 +2070,7 @@ int http_get_info_packs(const char *base_url, struct packed_git **packs_head) get_sha1_hex(data + i + 6, hash); fetch_and_setup_pack_index(packs_head, hash, base_url); - i += 51; + i += hexsz + 11; break; } default: diff --git a/sequencer.c b/sequencer.c index 12c1e1cdbb..94b6513402 100644 --- a/sequencer.c +++ b/sequencer.c @@ -1480,8 +1480,7 @@ static int do_pick_commit(enum todo_command command, struct commit *commit, unborn = get_oid("HEAD", &head); if (unborn) oidcpy(&head, the_hash_algo->empty_tree); - if (index_differs_from(unborn ? - oid_to_hex(the_hash_algo->empty_tree) : "HEAD", + if (index_differs_from(unborn ? 
empty_tree_oid_hex() : "HEAD", NULL, 0)) return error_dirty_index(opts); } diff --git a/server-info.c b/server-info.c index 828ec5e538..7ce6dcd67b 100644 --- a/server-info.c +++ b/server-info.c @@ -223,11 +223,9 @@ static void init_pack_info(const char *infofile, int force) else stale = 1; - for (i = 0; i < num_pack; i++) { - if (stale) { + for (i = 0; i < num_pack; i++) + if (stale) info[i]->old_num = -1; - } - } /* renumber them */ QSORT(info, num_pack, compare_info); diff --git a/sha1_file.c b/sha1_file.c index 794753bd54..bf6c8da3ff 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -109,6 +109,18 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { }, }; +const char *empty_tree_oid_hex(void) +{ + static char buf[GIT_MAX_HEXSZ + 1]; + return oid_to_hex_r(buf, the_hash_algo->empty_tree); +} + +const char *empty_blob_oid_hex(void) +{ + static char buf[GIT_MAX_HEXSZ + 1]; + return oid_to_hex_r(buf, the_hash_algo->empty_blob); +} + /* * This is meant to hold a *small* number of objects that you would * want read_sha1_file() to be able to return, but yet you do not want diff --git a/submodule.c b/submodule.c index 22a96b7af0..ee7eea4877 100644 --- a/submodule.c +++ b/submodule.c @@ -1567,7 +1567,7 @@ static void submodule_reset_index(const char *path) get_super_prefix_or_empty(), path); argv_array_pushl(&cp.args, "read-tree", "-u", "--reset", NULL); - argv_array_push(&cp.args, oid_to_hex(the_hash_algo->empty_tree)); + argv_array_push(&cp.args, empty_tree_oid_hex()); if (run_command(&cp)) die("could not reset submodule index"); @@ -1659,9 +1659,9 @@ int s
Re: [PATCH v2 1/2] commit: fix --short and --porcelain options
On Tue, May 1, 2018 at 10:50 PM Junio C Hamano wrote: > Samuel Lijin writes: > > Mark the commitable flag in the wt_status object in the call to > > `wt_status_collect()`, instead of in `wt_longstatus_print_updated()`, > > and simplify the logic in the latter function to take advantage of the > > logic shifted to the former. This means that callers do not need to use > > `wt_longstatus_print_updated()` to collect the `commitable` flag; > > calling `wt_status_collect()` is sufficient. > > > > As a result, invoking `git commit` with `--short` or `--porcelain` > > (which imply `--dry-run`, but previously returned an inconsistent error > > code inconsistent with dry run behavior) correctly returns status code > > zero when there is something to commit. This fixes two bugs documented > > in the test suite. > Hmm, I couldn't quite get what the above two paragraphs were trying > to say, but I think I figured out by looking at wt_status.c before > applying this patch, so let me see if I correctly understand what > this patch is about by thinking a bit aloud. > There are only two assignments to s->commitable in wt-status.c; one > happens in wt_longstatus_print_updated(), when the function notices > there is even one record to be shown (i.e. there is an "updated" > path) and the other in show_merge_in_progress() which is called by > wt_longstatus_prpint_state(). The latter codepath happens when we > are in a merge and there is no remaining conflicted paths (the code > allows the contents to be committed to be identical to HEAD). Both > are called from wt_longstatus_print(), which in turn is called by > wt_status_print(). > The implication of the above observation is that we do not set > commitable bit (by the way, shouldn't we spell it with two 'T's?) Yep, MW confirms: https://www.merriam-webster.com/dictionary/commitable I didn't think to check how common "commitable" is in the codebase, but it doesn't seem to be too many, looking at the output of `git grep commitable`, so I'll add that to the patch series when I reroll. > if we are not doing the long format status. The title hints at it > but "fix" is too vague. It would be easier to understand if it > began like this (i.e. state problem clearly first, before outlining > the solution): > [PATCH 1/2] commit: fix exit status under --short/--porcelain options > In wt-status.c, s->commitable bit is set only in the > codepaths reachable from wt_status_print() when output > format is STATUS_FORMAT_LONG as a side effect of printing > bits of status. Consequently, when running with --short and > --porcelain options, the bit is not set to reflect if there > is anything to be committed, and "git commit --short" or > "--porcelain" (both of which imply "--dry-run") failed to > signal if there is anything to commit with its exit status. > Instead, update s->commitable bit in wt_status_collect(), > regardless of the output format. ... > Is that what is going on here? Yours made it sound as if moving the > code to _collect() was done for the sake of moving code around and > simplifying the logic, and bugfix fell out of the move merely as a > side effect, which probably was the source of my confusion. Yep, that's right. I wasn't sure if the imperative tone was required for the whole commit or just the description, hence the awkward structure. (I also wasn't sure how strict the 70 char limit on the description was.) 
> > +static void wt_status_mark_commitable(struct wt_status *s) { > > + int i; > > + > > + for (i = 0; i < s->change.nr; i++) { > > + struct wt_status_change_data *d = (s->change.items[i]).util; > > + > > + if (d->index_status && d->index_status != DIFF_STATUS_UNMERGED) { > > + s->commitable = 1; > > + return; > > + } > > + } > > +} > I am not sure if this is sufficient. From a cursory look of the > existing code (and vague recollection in my ageing brain ;-), I > think we say it is committable if > (1) when not merging, there is something to show in the "to be > committed" section (i.e. there must be something changed since > HEAD in the index). > (2) when merging, no conflicting paths remain (i.e. change.nr being > zero is fine). > So it is unclear to me how you are dealing with (2) under "--short" > option, which does not call show_merge_in_progress() to catch that > case. And the answer there is that I'm not :) I had hoped that there was a test to catch mistakes like this but evidently not. Thanks for pointing that out, I'll add a test to catch that. I'm also realizing that I didn't const-ify wt_longstatus_print() in the next patch, which is another reason I didn't catch this. This seems a bit trickier to handle. What do you think of this approach: (1) move wt_status_state into wt_status and (2) move the call to wt_status_get_sta
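For what it is worth, a minimal sketch of how the merge case could be folded into the same helper, assuming the wt_status_state with its merge_in_progress flag ends up reachable from the collect path (which is exactly the refactoring being discussed), and reusing the has_unmerged() helper that already lives in wt-status.c:

static void wt_status_mark_commitable(struct wt_status *s,
				      const struct wt_status_state *state)
{
	int i;

	if (state->merge_in_progress && !has_unmerged(s)) {
		/* a merge with no remaining conflicted paths is committable */
		s->commitable = 1;
		return;
	}

	for (i = 0; i < s->change.nr; i++) {
		struct wt_status_change_data *d = s->change.items[i].util;

		if (d->index_status && d->index_status != DIFF_STATUS_UNMERGED) {
			s->commitable = 1;
			return;
		}
	}
}

This is only a sketch; whether the state is collected before or inside wt_status_collect() is the open question above.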
[PATCH v4 0/3] add additional config settings for merge
This version incorporates Elijah's fixup patch and is now based on en/rename-directory-detection in next. Base Ref: en/rename-directory-detection Web-Diff: https://github.com/benpeart/git/commit/16b175c25f Checkout: git fetch https://github.com/benpeart/git merge-renames-v4 && git checkout 16b175c25f ### Patches Ben Peart (3): merge: update documentation for {merge,diff}.renameLimit merge: Add merge.renames config setting merge: pass aggressive when rename detection is turned off Documentation/diff-config.txt | 3 ++- Documentation/merge-config.txt| 8 +- Documentation/merge-strategies.txt| 11 +--- diff.c| 2 +- diff.h| 1 + merge-recursive.c | 31 ++- merge-recursive.h | 8 +- t/t3034-merge-recursive-rename-options.sh | 18 + 8 files changed, 68 insertions(+), 14 deletions(-) base-commit: c5b761fb2711542073cf1906c0e86a34616b79ae -- 2.17.0.windows.1
[PATCH v4 1/3] merge: update documentation for {merge,diff}.renameLimit
Update the documentation to better indicate that the renameLimit setting is ignored if rename detection is turned off via command line options or config settings. Signed-off-by: Ben Peart --- Documentation/diff-config.txt | 3 ++- Documentation/merge-config.txt | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt index 5ca942ab5e..77caa66c2f 100644 --- a/Documentation/diff-config.txt +++ b/Documentation/diff-config.txt @@ -112,7 +112,8 @@ diff.orderFile:: diff.renameLimit:: The number of files to consider when performing the copy/rename - detection; equivalent to the 'git diff' option `-l`. + detection; equivalent to the 'git diff' option `-l`. This setting + has no effect if rename detection is turned off. diff.renames:: Whether and how Git detects renames. If set to "false", diff --git a/Documentation/merge-config.txt b/Documentation/merge-config.txt index 12b6bbf591..48ee3bce77 100644 --- a/Documentation/merge-config.txt +++ b/Documentation/merge-config.txt @@ -35,7 +35,8 @@ include::fmt-merge-msg-config.txt[] merge.renameLimit:: The number of files to consider when performing rename detection during a merge; if not specified, defaults to the value of - diff.renameLimit. + diff.renameLimit. This setting has no effect if rename detection + is turned off. merge.renormalize:: Tell Git that canonical representation of files in the -- 2.17.0.windows.1
[PATCH v4 2/3] merge: Add merge.renames config setting
Add the ability to control rename detection for merge via a config setting. This setting behaves the same and defaults to the value of diff.renames but only applies to merge. Reviewed-by: Johannes Schindelin Helped-by: Elijah Newren Signed-off-by: Ben Peart --- Documentation/merge-config.txt| 5 Documentation/merge-strategies.txt| 11 ++--- diff.c| 2 +- diff.h| 1 + merge-recursive.c | 30 ++- merge-recursive.h | 8 +- t/t3034-merge-recursive-rename-options.sh | 18 ++ 7 files changed, 63 insertions(+), 12 deletions(-) diff --git a/Documentation/merge-config.txt b/Documentation/merge-config.txt index 48ee3bce77..662c2713ca 100644 --- a/Documentation/merge-config.txt +++ b/Documentation/merge-config.txt @@ -38,6 +38,11 @@ merge.renameLimit:: diff.renameLimit. This setting has no effect if rename detection is turned off. +merge.renames:: + Whether and how Git detects renames. If set to "false", + rename detection is disabled. If set to "true", basic rename + detection is enabled. Defaults to the value of diff.renames. + merge.renormalize:: Tell Git that canonical representation of files in the repository has changed over time (e.g. earlier commits record diff --git a/Documentation/merge-strategies.txt b/Documentation/merge-strategies.txt index fd5d748d1b..30acc99232 100644 --- a/Documentation/merge-strategies.txt +++ b/Documentation/merge-strategies.txt @@ -23,8 +23,9 @@ recursive:: causing mismerges by tests done on actual merge commits taken from Linux 2.6 kernel development history. Additionally this can detect and handle merges involving - renames. This is the default merge strategy when - pulling or merging one branch. + renames, but currently cannot make use of detected + copies. This is the default merge strategy when pulling + or merging one branch. + The 'recursive' strategy can take the following options: @@ -84,12 +85,14 @@ no-renormalize;; `merge.renormalize` configuration variable. no-renames;; - Turn off rename detection. + Turn off rename detection. This overrides the `merge.renames` + configuration variable. See also linkgit:git-diff[1] `--no-renames`. find-renames[=];; Turn on rename detection, optionally setting the similarity - threshold. This is the default. + threshold. This is the default. This overrides the + 'merge.renames' configuration variable. See also linkgit:git-diff[1] `--find-renames`. 
rename-threshold=;; diff --git a/diff.c b/diff.c index fb22b19f09..e744b35cdc 100644 --- a/diff.c +++ b/diff.c @@ -177,7 +177,7 @@ static int parse_submodule_params(struct diff_options *options, const char *valu return 0; } -static int git_config_rename(const char *var, const char *value) +int git_config_rename(const char *var, const char *value) { if (!value) return DIFF_DETECT_RENAME; diff --git a/diff.h b/diff.h index 7cf276f077..966fc8fce6 100644 --- a/diff.h +++ b/diff.h @@ -321,6 +321,7 @@ extern int git_diff_ui_config(const char *var, const char *value, void *cb); extern void diff_setup(struct diff_options *); extern int diff_opt_parse(struct diff_options *, const char **, int, const char *); extern void diff_setup_done(struct diff_options *); +extern int git_config_rename(const char *var, const char *value); #define DIFF_DETECT_RENAME 1 #define DIFF_DETECT_COPY 2 diff --git a/merge-recursive.c b/merge-recursive.c index 5f42c677d5..372ffbbacc 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1524,7 +1524,15 @@ static struct diff_queue_struct *get_diffpairs(struct merge_options *o, diff_setup(&opts); opts.flags.recursive = 1; opts.flags.rename_empty = 0; - opts.detect_rename = DIFF_DETECT_RENAME; + opts.detect_rename = merge_detect_rename(o); + /* +* We do not have logic to handle the detection of copies. In +* fact, it may not even make sense to add such logic: would we +* really want a change to a base file to be propagated through +* multiple other files by a merge? +*/ + if (opts.detect_rename > DIFF_DETECT_RENAME) + opts.detect_rename = DIFF_DETECT_RENAME; opts.rename_limit = o->merge_rename_limit >= 0 ? o->merge_rename_limit : o->diff_rename_limit >= 0 ? o->diff_rename_limit : 1000; @@ -2564,7 +2572,7 @@ static int handle_renames(struct merge_options *o, ri->head_renames = NULL; ri->merge_renames = NULL; - if (!o->detect_rename) + if (!merge_detect_rename(o)) return 1; head_pairs = get_diffpairs(o, common, head); @@
[PATCH v4 3/3] merge: pass aggressive when rename detection is turned off
Set aggressive flag in git_merge_trees() when rename detection is turned off. This allows read_tree() to auto resolve more cases that would have otherwise been handled by the rename detection. Reviewed-by: Johannes Schindelin Signed-off-by: Ben Peart --- merge-recursive.c | 1 + 1 file changed, 1 insertion(+) diff --git a/merge-recursive.c b/merge-recursive.c index 372ffbbacc..cea054cfd4 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -355,6 +355,7 @@ static int git_merge_trees(struct merge_options *o, o->unpack_opts.fn = threeway_merge; o->unpack_opts.src_index = &the_index; o->unpack_opts.dst_index = &the_index; + o->unpack_opts.aggressive = !merge_detect_rename(o); setup_unpack_trees_porcelain(&o->unpack_opts, "merge"); init_tree_desc_from_tree(t+0, common); -- 2.17.0.windows.1
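Both patches call a merge_detect_rename() helper whose definition falls in the trimmed portion of the diff above. A hedged guess at its shape, given the documented fallback from merge.renames to diff.renames (the field names on struct merge_options are assumptions for the sake of the example):

static int merge_detect_rename(struct merge_options *o)
{
	/*
	 * merge.renames wins when set; otherwise fall back to diff.renames;
	 * otherwise basic rename detection stays on by default.
	 */
	return o->merge_detect_rename >= 0 ? o->merge_detect_rename :
	       o->diff_detect_rename >= 0 ? o->diff_detect_rename :
	       DIFF_DETECT_RENAME;
}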
Re: [PATCH v4 03/10] commit-graph: compute generation numbers
Derrick Stolee writes: > On 4/29/2018 5:08 AM, Jakub Narebski wrote: >> Derrick Stolee writes: [...] >> It is a bit strange to me that the code uses get_be32 for reading, but >> htonl for writing. Is Git tested on non little-endian machines, like >> big-endian ppc64 or s390x, or on mixed-endian machines (or >> selectable-endian machines with data endianness set to non >> little-endian, like ia64)? If not, could we use for example openSUSE >> Build Service (https://build.opensuse.org/) for this? > > Since we are packing two values into 64 bits, I am using htonl() here > to arrange the 30-bit generation number alongside the 34-bit commit > date value, then writing with hashwrite(). The other 32-bit integers > are written with hashwrite_be32() to avoid translating this data > in-memory. O.K., so you are using what is more effective and easier to use. Nice to know, thanks for the information. [...] >>> +static void compute_generation_numbers(struct commit** commits, >>> + int nr_commits) >>> +{ [...] >>> + for (i = 0; i < nr_commits; i++) { >>> + if (commits[i]->generation != GENERATION_NUMBER_INFINITY && >>> + commits[i]->generation != GENERATION_NUMBER_ZERO) >>> + continue; [...] >>> + compute_generation_numbers(commits.list, commits.nr); >>> + >> Nice and simple. All right. >> >> I guess that we do not pass "struct packed_commit_list commits" as >> argument to compute_generation_numbers instead of "struct commit** >> commits.list" and "int commits.nr" to compute_generation_numbers() to >> keep the latter nice and generic? > > Good catch. There is no reason to not use packed_commit_list here. Actually, now that v5 shows how using packed_commit_list looks like, in my opinion it looks uglier. And it might be easier to make mistake. Also, depending on how compiler is able to optimize it, the version passing packed_commit_list as an argument has one more indirection (following two pointers) in the loop. Best, -- Jakub Narębski
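A minimal sketch of the 64-bit packing described above: the 30-bit generation number sits above the 34-bit commit date, and each 32-bit half is written through htonl(). The helper name is illustrative and the exact bit layout is inferred from the description in the mail, not taken from the format document.

static void pack_generation_and_date(uint32_t out[2],
				     uint32_t generation, timestamp_t date)
{
	/* upper word: 30-bit generation, then the top 2 bits of the 34-bit date */
	out[0] = htonl((generation << 2) | ((uint32_t)(date >> 32) & 0x3));
	/* lower word: the remaining 32 bits of the date */
	out[1] = htonl((uint32_t)date);
}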
Good news
I am Sgt. Monica Lin Brown, originally from Lake Jackson, Texas, USA. I personally carried out a special search on the Internet and came across your information. I am sending you this e-mail from the US military base in Kabul, Afghanistan. I have a secured business proposal for you. Ask me if you are interested in further information.
Re: [PATCH 00/13] object store: alloc
On Tue, May 1, 2018 at 11:33 PM, Stefan Beller wrote: > I also debated if it is worth converting alloc.c via this patch series > or if it might make more sense to use the new mem-pool by Jameson[1]. > > I vaguely wonder about the performance impact, as the object allocation > code seemed to be relevant in the past. If I remember correctly, alloc.c was added because malloc() has too high overhead per allocation (and we create like millions of them). As long as you keep allocation overhead low, it should be ok. Note that we allocate a lot more than the mem-pool's main target (cache entries if I remember correctly). We may have a couple thousands cache entries. We already deal with a couple million of struct object. -- Duy
Re: [PATCH 01/13] repository: introduce object parser field
On Tue, May 1, 2018 at 11:33 PM, Stefan Beller wrote: > /* > -* Holds any information related to accessing the raw object content. > +* Holds any information needed to retrieve the raw content > +* of objects. The object_parser uses this to get object > +* content which it then parses. > */ > struct raw_object_store *objects; > > + /* > +* State for the object parser. This owns all parsed objects > +* (struct object) so callers do not have to manage their > +* lifetime. > +*/ > + struct object_parser *parsed_objects; I like this name 'parsed_objects'. Should we rename the struct after it (e.g. parsed_object_store as opposed to raw_object_store above)? Another suggestion is object_pool, if we keep 'struct object' instead of 'struct parsed_object' and also want to keep current allocation behavior: no individual deallocation. If you free, you free the whole pool (e.g. you could run rev-list --all --objects in a separate pool, once you're done, you delete the whole pool). -- Duy
Re: [PATCH v4 0/3] add additional config settings for merge
Hi Ben, Thanks for your persistence. On Wed, May 2, 2018 at 9:01 AM, Ben Peart wrote: > This version incorporates Elijah's fixup patch and is now based on > en/rename-directory-detection in next. en/rename-directory-detection was in master but reverted. en/rename-directory-detection-reboot is the new series, and isn't quite in next yet; Junio just said in the last "What's cooking" that he intends to merge it down. However, the two series are obviously similar, so your rebasing even onto the old one does cut down on the conflicts considerably. Rebasing this latest series onto en/rename-directory-detection-reboot leaves just one minor conflict. Anyway, I've looked over this latest re-roll and it looks good to me: Reviewed-by: Elijah Newren
Re: [PATCH 01/13] repository: introduce object parser field
On Wed, May 2, 2018 at 10:17 AM, Duy Nguyen wrote: > On Tue, May 1, 2018 at 11:33 PM, Stefan Beller wrote: >> /* >> -* Holds any information related to accessing the raw object content. >> +* Holds any information needed to retrieve the raw content >> +* of objects. The object_parser uses this to get object >> +* content which it then parses. >> */ >> struct raw_object_store *objects; >> >> + /* >> +* State for the object parser. This owns all parsed objects >> +* (struct object) so callers do not have to manage their >> +* lifetime. >> +*/ >> + struct object_parser *parsed_objects; > > I like this name 'parsed_objects'. Should we rename the struct after > it (e.g. parsed_object_store as opposed to raw_object_store above)? I can rename it to parsed_object_store for consistency. > Another suggestion is object_pool, if we keep 'struct object' instead > of 'struct parsed_object' and also want to keep current allocation > behavior: no individual deallocation. If you free, you free the whole > pool (e.g. you could run rev-list --all --objects in a separate pool, > once you're done, you delete the whole pool). That is what the following patches will be about, you can only free the whole set of parsed objects. So if you want to do a separate rev walk, you may need to instantiate a new repository for it (ideally you'd only need a separate parsed object store). I'd want to have the ability to have separate pools for submodules, such that they can be free'd on a per-repo basis. > -- > Duy
Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
On Tue, May 1, 2018 at 11:34 PM, Stefan Beller wrote: > #include "cache.h" > #include "object.h" > @@ -30,8 +31,25 @@ struct alloc_state { > int count; /* total number of nodes allocated */ > int nr;/* number of nodes left in current allocation */ > void *p; /* first free node in current allocation */ > + > + /* bookkeeping of allocations */ > + void **slabs; Another way to manage this is linked list: you could reserve one "object" in each slab to store the "next" (or "prev") pointer to another slab, then you can just walk through all slabs and free. It's a bit cheaper than reallocating slabs[], but I guess we reallocate so few times that readability matters more (whichever way is chosen). > + int slab_nr, slab_alloc; > }; > > +void *allocate_alloc_state(void) > +{ > + return xcalloc(1, sizeof(struct alloc_state)); > +} > + > +void clear_alloc_state(struct alloc_state *s) > +{ > + while (s->slab_nr > 0) { > + s->slab_nr--; > + free(s->slabs[s->slab_nr]); I think you're leaking memory here. Commit and tree objects may have more allocations in them (especially trees, but I think we have commit_list in struct commit too). Those need to be freed as well. > + } > +} > + > static inline void *alloc_node(struct alloc_state *s, size_t node_size) > { > void *ret; -- Duy
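A minimal sketch of the linked-list alternative Duy describes in words: each slab's first slot stores a pointer to the previously allocated slab, so clearing walks the chain instead of growing a slabs[] array. The names are illustrative, not the ones in the series, and this only covers the slabs themselves; the commit_list and tree buffers mentioned above would still need their own freeing.

struct slab_header {
	struct slab_header *prev;	/* previously allocated slab */
};

struct alloc_state_ll {
	struct slab_header *last_slab;	/* most recently allocated slab */
	int nr;				/* nodes left in current slab */
	void *p;			/* first free node in current slab */
};

static void clear_alloc_state_ll(struct alloc_state_ll *s)
{
	while (s->last_slab) {
		struct slab_header *prev = s->last_slab->prev;
		free(s->last_slab);
		s->last_slab = prev;
	}
	s->nr = 0;
	s->p = NULL;
}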
Re: [PATCH 01/13] repository: introduce object parser field
On Wed, May 2, 2018 at 7:26 PM, Stefan Beller wrote: >> Another suggestion is object_pool, if we keep 'struct object' instead >> of 'struct parsed_object' and also want to keep current allocation >> behavior: no individual deallocation. If you free, you free the whole >> pool (e.g. you could run rev-list --all --objects in a separate pool, >> once you're done, you delete the whole pool). > > That is what the following patches will be about, you can > only free the whole set of parsed objects. > > So if you want to do a separate rev walk, you may need to > instantiate a new repository for it (ideally you'd only need a > separate parsed object store). I'm not sure if it's a good idea to create a separate struct repository just because you want to free this parsed object store. What if updates are made in both repositories? All the cache (well, mostly the delta base cache in sha1_file.c) will double memory usage as well. Yeah not ideal. But I guess making rev-list related code use a separate parsed object store is no small task (and kinda risky as well since we migrate from global lookup_* functions to local ones and need to choose the correct parsed object store to look up from) For your information, there is already a good use case for this wholesale memory free: if we can free the rev-list related memory early in pack-objects (e.g. part of repack operation) then it could lower memory pressure significantly when running on large repos. This has been discussed a bit lately. -- Duy
Re: [PATCH] checkout & worktree: introduce a core.DWIMRemote setting
On Wed, May 2, 2018 at 11:21 AM, Duy Nguyen wrote: > On Wed, May 2, 2018 at 12:54 PM, Ævar Arnfjörð Bjarmason > wrote: >> Introduce a core.DWIMRemote setting which can be used to designate a >> remote to prefer (via core.DWIMRemote=origin) when running e.g. "git >> checkout master" to mean origin/master, even though there's other >> remotes that have the "master" branch. > > Do we anticipate more dwimy customizations to justify a new dwim group > in config (i.e. dwim.remote instead of core.dwimRemote)? A few observations: 1. DWIM has broad meaning; while this certainly falls within DWIM, it's also just a _default_, so should this instead be named "defaultRemote"? 2. Building on #1: How well is the term "DWIM" understood by non-power users? A term, such as "default" is more well known. 3. git-worktree learned --guess-remote and worktree.guessRemote in [1] and [2], which, on the surface, sound confusingly similar to "DWIMRemote". Even though I was well involved in the discussion and repeatedly reviewed the patch series which introduced those, it still took me an embarrassingly long time, and repeated re-reads of all commit messages involved, to grasp (or re-grasp) how "guess remote" and "DWIM remote" differ. If it was that difficult for me, as a person involved in the patch series, to figure out "guess remote" vs. "DWIM remote", then I worry that it may be too confusing in general. If the new config option had been named "defaultRemote", I don't think I'd have been confused at all. It may sound as if I'm advocating the name "defaultRemote", but that wasn't the intention, and I'm not wedded to it; that name simply fell out naturally as I enumerated the above observations. [1]: 71d6682d8c (worktree: add --guess-remote flag to add subcommand, 2017-11-29) [2]: e92445a731 (add worktree.guessRemote config option, 2017-11-29)
RE: [PATCH 00/13] object store: alloc
> -Original Message- > From: Duy Nguyen > Sent: Wednesday, May 2, 2018 1:02 PM > To: Stefan Beller > Cc: Git Mailing List ; Jameson Miller > > Subject: Re: [PATCH 00/13] object store: alloc > > On Tue, May 1, 2018 at 11:33 PM, Stefan Beller wrote: > > I also debated if it is worth converting alloc.c via this patch series > > or if it might make more sense to use the new mem-pool by Jameson[1]. > > > > I vaguely wonder about the performance impact, as the object > > allocation code seemed to be relevant in the past. > > If I remember correctly, alloc.c was added because malloc() has too high > overhead per allocation (and we create like millions of them). As long as you > keep allocation overhead low, it should be ok. Note that we allocate a lot > more > than the mem-pool's main target (cache entries if I remember correctly). We > may have a couple thousands cache entries. We already deal with a couple > million of struct object. The work to move cache entry allocation onto a memory pool was motivated by the fact that we are looking at indexes with millions of entries. If there is scaling concern with the current version of mem-pool, we would like to address it there as well. Or if there is improvements that can be shared, that would be nice as well. > -- > Duy
Re: [PATCH] checkout & worktree: introduce a core.DWIMRemote setting
On Wed, May 2, 2018 at 8:00 PM, Eric Sunshine wrote: > 2. Building on #1: How well is the term "DWIM" understood by non-power > users? A term, such as "default" is more well known. I'm going off topic but I kinda dislike this term. First time I encountered it in the code I didn't even know what it meant. Since it has not been leaked to user documents (I only did a grep on Documentation/) perhaps a better term should be used, preferably not another acronym. -- Duy
Re: [PATCH 00/13] object store: alloc
On Wed, May 2, 2018 at 8:07 PM, Jameson Miller wrote: > > >> -Original Message- >> From: Duy Nguyen >> Sent: Wednesday, May 2, 2018 1:02 PM >> To: Stefan Beller >> Cc: Git Mailing List ; Jameson Miller >> >> Subject: Re: [PATCH 00/13] object store: alloc >> >> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller wrote: >> > I also debated if it is worth converting alloc.c via this patch series >> > or if it might make more sense to use the new mem-pool by Jameson[1]. >> > >> > I vaguely wonder about the performance impact, as the object >> > allocation code seemed to be relevant in the past. >> >> If I remember correctly, alloc.c was added because malloc() has too high >> overhead per allocation (and we create like millions of them). As long as you >> keep allocation overhead low, it should be ok. Note that we allocate a lot >> more >> than the mem-pool's main target (cache entries if I remember correctly). We >> may have a couple thousands cache entries. We already deal with a couple >> million of struct object. > > The work to move cache entry allocation onto a memory pool was motivated by > the fact that we are looking at indexes with millions of entries. If there is > scaling > concern with the current version of mem-pool, we would like to address it > there > as well. Or if there is improvements that can be shared, that would be nice > as well. I think the two have quite different characteristics. alloc.c code is driven by overhead. struct blob is only 24 bytes each and about 1/3 the repo is blobs, and each malloc has 16 bytes overhead or so if I remember correctly. struct cache_entry at minimum in 88 bytes so relative overhead is not that a big deal (but sure reducing it is still very nice). mem-pool is about allocation speed, but I think that's not a concern for alloc.c because when we do full rev walk, I think I/O is always the bottleneck (maybe object lookup as well). I don't see a good way to have the one memory allocator that satisfyies both to be honest. If you could allocate fixed-size cache entries most of the time (e.g. larger entries will be allocated using malloc or something, and should be a small number), then perhaps we can unify. Or if mem-pool can have an option to allocated fixed size pieces with no overhead, that would be great (sorry I still have not read that mem-pool series yet) -- Duy
Re: [PATCH] checkout & worktree: introduce a core.DWIMRemote setting
On Wed, May 02 2018, Eric Sunshine wrote: > On Wed, May 2, 2018 at 11:21 AM, Duy Nguyen wrote: >> On Wed, May 2, 2018 at 12:54 PM, Ævar Arnfjörð Bjarmason >> wrote: >>> Introduce a core.DWIMRemote setting which can be used to designate a >>> remote to prefer (via core.DWIMRemote=origin) when running e.g. "git >>> checkout master" to mean origin/master, even though there's other >>> remotes that have the "master" branch. >> >> Do we anticipate more dwimy customizations to justify a new dwim group >> in config (i.e. dwim.remote instead of core.dwimRemote)? > > A few observations: > > 1. DWIM has broad meaning; while this certainly falls within DWIM, > it's also just a _default_, so should this instead be named > "defaultRemote"? > > 2. Building on #1: How well is the term "DWIM" understood by non-power > users? A term, such as "default" is more well known. I've got no love for the DWIM term. And I think I should change it in v2, I just want some way to enable this functionality since this behavior has annoyed me for a long time. I wonder though if something like "core.defaultRemote" might not bring up connotations about whether this would e.g. affect "push" in the minds of users (not that my initial suggestion is better). So maybe something like checkout.implicitRemote would be better? And we could just break the rule that only git-checkout would use it, since git-worktree could be said to be doing something checkout-like, or just also add a worktree.implicitRemote. > 3. git-worktree learned --guess-remote and worktree.guessRemote in [1] > and [2], which, on the surface, sound confusingly similar to > "DWIMRemote". Even though I was well involved in the discussion and > repeatedly reviewed the patch series which introduced those, it still > took me an embarrassingly long time, and repeated re-reads of all > commit messages involved, to grasp (or re-grasp) how "guess remote" > and "DWIM remote" differ. If it was that difficult for me, as a person > involved in the patch series, to figure out "guess remote" vs. "DWIM > remote", then I worry that it may be too confusing in general. If the > new config option had been named "defaultRemote", I don't think I'd > have been confused at all. I hadn't looked at this at all before today when I wrote the patch, and I've never used git-worktree (but maybe I should...), but my understanding of this --[no-]guess-remote functionality is that it effectively splits up the functionality that the "git checkout" does, and we'll unconditionally check out the branch, but the option controls whether or not we'd set up the equivalent of remote tracking for git-worktree. But maybe I've completely misunderstood it. > It may sound as if I'm advocating the name "defaultRemote", but that > wasn't the intention, and I'm not wedded to it; that name simply fell > out naturally as I enumerated the above observations. > > [1]: 71d6682d8c (worktree: add --guess-remote flag to add subcommand, > 2017-11-29) > [2]: e92445a731 (add worktree.guessRemote config option, 2017-11-29)
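Whichever name wins (core.dwimRemote, checkout.defaultRemote, checkout.implicitRemote, ...), the lookup itself stays a small wrapper around the config API. A minimal sketch, assuming the checkout.defaultRemote spelling, which is only one of the candidates discussed:

static const char *configured_default_remote(void)
{
	const char *remote = NULL;

	/* git_config_get_string_const() returns non-zero when the key is unset */
	if (git_config_get_string_const("checkout.defaultremote", &remote))
		return NULL;
	return remote;
}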
RE: [PATCH 00/13] object store: alloc
> -Original Message- > From: Duy Nguyen > Sent: Wednesday, May 2, 2018 2:23 PM > To: Jameson Miller > Cc: Stefan Beller ; Git Mailing List > Subject: Re: [PATCH 00/13] object store: alloc > > On Wed, May 2, 2018 at 8:07 PM, Jameson Miller > wrote: > > > > > >> -Original Message- > >> From: Duy Nguyen > >> Sent: Wednesday, May 2, 2018 1:02 PM > >> To: Stefan Beller > >> Cc: Git Mailing List ; Jameson Miller > >> > >> Subject: Re: [PATCH 00/13] object store: alloc > >> > >> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller > wrote: > >> > I also debated if it is worth converting alloc.c via this patch > >> > series or if it might make more sense to use the new mem-pool by > Jameson[1]. > >> > > >> > I vaguely wonder about the performance impact, as the object > >> > allocation code seemed to be relevant in the past. > >> > >> If I remember correctly, alloc.c was added because malloc() has too > >> high overhead per allocation (and we create like millions of them). > >> As long as you keep allocation overhead low, it should be ok. Note > >> that we allocate a lot more than the mem-pool's main target (cache > >> entries if I remember correctly). We may have a couple thousands > >> cache entries. We already deal with a couple million of struct object. > > > > The work to move cache entry allocation onto a memory pool was > > motivated by the fact that we are looking at indexes with millions of > > entries. If there is scaling concern with the current version of > > mem-pool, we would like to address it there as well. Or if there is > improvements that can be shared, that would be nice as well. > > I think the two have quite different characteristics. alloc.c code is driven > by > overhead. struct blob is only 24 bytes each and about 1/3 the repo is blobs, > and > each malloc has 16 bytes overhead or so if I remember correctly. struct > cache_entry at minimum in 88 bytes so relative overhead is not that a big deal > (but sure reducing it is still very nice). > > mem-pool is about allocation speed, but I think that's not a concern for > alloc.c > because when we do full rev walk, I think I/O is always the bottleneck (maybe > object lookup as well). I don't see a good way to have the one memory > allocator > that satisfyies both to be honest. If you could allocate fixed-size cache > entries > most of the time (e.g. > larger entries will be allocated using malloc or something, and should be a > small > number), then perhaps we can unify. Or if mem-pool can have an option to > allocated fixed size pieces with no overhead, that would be great (sorry I > still > have not read that mem-pool series yet) > -- > Duy Thank you for the extra details - the extra context was helpful - especially the motivations for each of the areas. I agree with your general analysis, but with the extra point that the memory pool does allocate memory (variable sized) without any overhead, except for possible alignment considerations and differences in bookkeeping the larger "blocks" of memory from which small allocations are made from - but I don't think this would be enough to have a meaningful overall impact. The mem-pool only tracks the pointer to the next available bit of memory, and the end of the available memory range. It has a similar constraint in that individual allocations cannot be freed - you have to free the whole block. 
It may be that the requirements are different enough (or the gains worth it) to have another dedicated pooling allocator, but I think the current design of the memory pool would satisfy both consumers, even if the memory considerations are a bigger motivation for blob structs. I would be interested in your thoughts if you get the opportunity to read the mem-pool series.
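To make the "no per-allocation overhead" point concrete, here is a minimal sketch of the bump allocation described above, under the stated assumption that individual pieces are never freed, only whole blocks. The struct and function names are illustrative and are not the API from the mem-pool series.

struct pool_block {
	struct pool_block *prev;	/* older block, kept for wholesale freeing */
	char *next_free;		/* cursor into this block */
	char *end;			/* one past the usable range */
	/* payload memory follows the header */
};

static void *pool_alloc(struct pool_block **pool, size_t len, size_t block_size)
{
	struct pool_block *b = *pool;

	/* keep pieces pointer-aligned; this padding is the only per-piece cost */
	len = (len + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

	if (!b || (size_t)(b->end - b->next_free) < len) {
		size_t cap = len > block_size ? len : block_size;
		b = xmalloc(sizeof(*b) + cap);
		b->next_free = (char *)(b + 1);
		b->end = b->next_free + cap;
		b->prev = *pool;
		*pool = b;
	}
	b->next_free += len;
	return b->next_free - len;
}

Fixed-size pieces, as alloc.c hands out, are just the special case where len is constant, so the same scheme could in principle serve both consumers.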
Re: [PATCH] git-p4 - Add option --sha1 to submit in p4
On 2 May 2018 at 16:51, Merland Romain wrote: > Thanks Luke, > > Following your comments, I will: > - change the option name to --commit if it suits you Seems like a good name. > - add an option --force-rebase which defaults to false. Setting it to true > will rebase even with --commit option used. Or "--disable-rebase" ? I think there's no reason you couldn't rebase after doing a commit like this is there? And "--disable-rebase" would be useful generally, not just with the --commit option, but also with the regular options. "--force-rebase" is a bit confusing since for the normal flow, it always rebases anyway, and there's no need to force! But you're the one who is using this in anger, so your opinion likely counts for more than mine! > it can be useful too if some commits have been skipped and user wants to > rebase anyway > - add an entry in the documentation > - (learn how to create a test and) add tests for these specific cases To create a test just look in t/ for the t98*.sh files. Cut-n-paste one or add to an existing one. Send an email here if you get stuck (but it's pretty straightforward). You can run only the git-p4 tests with: $ (cd t && make T=t98* -j) > > What is the best practice ? to send a new email with the new diff ? to > continue this discussion ? I think either, probably a new email is easiest. See Documentation/SubmittingPatches for the general process. Regards! Luke
Re: [PATCH 01/13] repository: introduce object parser field
On Tue, 1 May 2018 14:33:51 -0700 Stefan Beller wrote: > Git's object access code can be thought of as containing two layers: > the raw object store provides access to raw object content, while the > higher level obj_hash code parses raw objects and keeps track of > parenthood and other object relationships using 'struct object'. > Keeping these layers separate should make it easier to find relevant > functions and to change the implementation of one without having to > touch the other. I only understood this after reading the code below. I think this paragraph can be removed; I don't see its relevance - of course we need to store metadata about how to load objects somewhere, and caching objects we have already loaded is a good idea: and the metadata and cache are two separate things before and after this patch anyway. > Add an object_parser field to 'struct repository' to prepare obj_hash > to be handled per repository. Callers still only use the_repository > for now --- later patches will adapt them to handle arbitrary > repositories. I think this is better reworded as: Convert the existing global cache for parsed objects (obj_hash) into repository-specific parsed object caches. Existing code that uses obj_hash are modified to use the parsed object cache of the_repository; future patches will use the parsed object caches of other repositories. > +struct object_parser *object_parser_new(void) > +{ > + struct object_parser *o = xmalloc(sizeof(*o)); > + memset(o, 0, sizeof(*o)); > + return o; > +} I'm not sure that I like this style of code (I prefer the strbuf style with _INIT and release(), which I think is more flexible) but I don't feel too strongly about it. > +struct object_parser { > + struct object **obj_hash; > + int nr_objs, obj_hash_size; > +}; object_parser is probably a bad name. I think Duy had some good suggestions in [1]. [1] https://public-inbox.org/git/CACsJy8CgX6BME=ehidbfmrzboydbnhe6ouxv4fzc-gw7rsi...@mail.gmail.com/ > /* > - * Holds any information related to accessing the raw object content. > + * Holds any information needed to retrieve the raw content > + * of objects. The object_parser uses this to get object > + * content which it then parses. >*/ > struct raw_object_store *objects; I think the additional sentence ("The object_parser uses this...") is unnecessary and confusing, especially if its identity is going to be one of storage (like "parsed_objects" implies). > + /* > + * State for the object parser. This owns all parsed objects > + * (struct object) so callers do not have to manage their > + * lifetime. > + */ > + struct object_parser *parsed_objects; Even after all the commits in this patch set, this does not store any state for parsing. Maybe just document as: All objects in this repository that have been parsed. This structure owns all objects it references, so users of "struct object *" generally do not need to free them; instead, when a repository is no longer used, call object_parser_clear() on this structure. (And maybe say that the freeing method on struct repository will automatically call object_parser_clear().)
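The strbuf-style alternative mentioned above would look roughly like this: a static initializer plus a release function instead of a _new() constructor. The macro and function names are illustrative; the fields are the ones in the patch under review.

#define OBJECT_PARSER_INIT { NULL, 0, 0 }

void object_parser_release(struct object_parser *o)
{
	/*
	 * Frees the hash table itself; the objects it points at come from
	 * alloc.c and are not freed here.
	 */
	FREE_AND_NULL(o->obj_hash);
	o->nr_objs = 0;
	o->obj_hash_size = 0;
}

A caller would then embed the struct directly (struct object_parser parsed_objects = OBJECT_PARSER_INIT;) rather than holding a pointer handed out by object_parser_new().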
Re: [PATCH 04/13] alloc: add repository argument to alloc_blob_node
On Tue, 1 May 2018 14:33:54 -0700 Stefan Beller wrote: > Signed-off-by: Stefan Beller Add the same boilerplate explanation to this and subsequent commits. If editing it for every new function name is cumbersome, maybe use this shortened version: This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. Use a macro to catch callers passing a repository other than the_repository at compile time.
Re: [PATCH 12/13] object: allow create_object to handle arbitrary repositories
On Tue, 1 May 2018 14:34:02 -0700 Stefan Beller wrote: > Signed-off-by: Jonathan Nieder > Signed-off-by: Stefan Beller Reviewed-by: Jonathan Tan Downloading this patch set and viewing the whole function modified in this patch shows that globals are no longer referenced, so this is good. Same comment for patch 11.
Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
On Tue, 1 May 2018 14:34:03 -0700 Stefan Beller wrote: > +void *allocate_alloc_state(void) > +{ > + return xcalloc(1, sizeof(struct alloc_state)); > +} > + > +void clear_alloc_state(struct alloc_state *s) > +{ > + while (s->slab_nr > 0) { > + s->slab_nr--; > + free(s->slabs[s->slab_nr]); > + } > +} These functions are asymmetrical. I understand why it is this way (because we use pointers, and we want to use FREE_AND_NULL), but would still prefer _INIT and _release(). > static inline void *alloc_node(struct alloc_state *s, size_t node_size) > { > void *ret; > @@ -45,54 +63,56 @@ static inline void *alloc_node(struct alloc_state *s, > size_t node_size) > ret = s->p; > s->p = (char *)s->p + node_size; > memset(ret, 0, node_size); > + > + ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc); > + s->slabs[s->slab_nr++] = ret; > + > return ret; > } This unconditionally grows the slabs for each node allocation. Shouldn't more than one node fit in each slab? > +extern struct alloc_state the_repository_blob_state; > +extern struct alloc_state the_repository_tree_state; > +extern struct alloc_state the_repository_commit_state; > +extern struct alloc_state the_repository_tag_state; > +extern struct alloc_state the_repository_object_state; (Context: these were in alloc.h) These seem to be used only in object.c, and only in one function - could we declare them "static" inside that function instead? > -struct object_parser *object_parser_new(void) > +struct object_parser *object_parser_new(int is_the_repo) > { > struct object_parser *o = xmalloc(sizeof(*o)); > memset(o, 0, sizeof(*o)); > + > + if (is_the_repo) { > + o->blob_state = &the_repository_blob_state; > + o->tree_state = &the_repository_tree_state; > + o->commit_state = &the_repository_commit_state; > + o->tag_state = &the_repository_tag_state; > + o->object_state = &the_repository_object_state; > + } else { > + o->blob_state = allocate_alloc_state(); > + o->tree_state = allocate_alloc_state(); > + o->commit_state = allocate_alloc_state(); > + o->tag_state = allocate_alloc_state(); > + o->object_state = allocate_alloc_state(); > + } > return o; > } I don't think saving 5 allocations is worth the code complexity (of the additional parameter).
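A sketch of the bookkeeping being suggested: grow slabs[] once per BLOCKING-sized slab, at the moment the slab is carved out, not once per node handed out of it. Relative to the patch only the if-block changes; this is an illustration, not the eventual reroll.

static inline void *alloc_node(struct alloc_state *s, size_t node_size)
{
	void *ret;

	if (!s->nr) {
		s->nr = BLOCKING;
		s->p = xmalloc(BLOCKING * node_size);
		/* record the slab once, when it is allocated */
		ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
		s->slabs[s->slab_nr++] = s->p;
	}
	s->nr--;
	s->count++;
	ret = s->p;
	s->p = (char *)s->p + node_size;
	memset(ret, 0, node_size);
	return ret;
}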
Re: [PATCH 08/41] packfile: abstract away hash constant values
On Wed, May 02, 2018 at 05:26:25PM +0200, Duy Nguyen wrote: > On Wed, May 2, 2018 at 2:11 AM, brian m. carlson > wrote: > > On Tue, May 01, 2018 at 12:22:43PM +0200, Duy Nguyen wrote: > >> While we're abstracting away 20. There's the only 20 left in > >> sha1_file.c that should also be gone. But I guess you could do that > >> later since you need to rename fill_sha1_path to > >> fill_loose_object_path or something. > > > > I'm already working on knocking those out. > > Yeah I checked out your part14 branch after writing this note :P You > still need to rename the function though. I can remind that again when > part14 is sent out. I've made a note in my project notes. > > Unfortunately, I can't, because it's not an object ID. I think the > > decision was made to not transform non-object ID hashes into struct > > object_id, which makes sense. I suppose we could have an equivalent > > struct hash or something for those other uses. > > I probably miss something, is hashcmp() supposed to stay after the > conversion? And will it compare any hash (as configured in the_algo) > or will it for SHA-1 only? Yes, it will stick around for the handful of places where we have hashes like pack checksums. > If hashcmp() will eventually compare the_algo->rawsz then yes this makes > sense. That's my intention, yes. > > I believe this is the pack hash, which isn't an object ID. I will > > transform it to be called something other than "sha1" and allocate more > > memory for it in a future series, though. > > Ah ok, it's the "sha1" in the name that bugged me. I'm all good then. Also noted in my project notes. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
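For reference, a minimal sketch of the end state described here, with hashcmp() comparing the_hash_algo->rawsz bytes, so that pack trailer comparisons such as the one in open_packed_git_1() stay correct for any configured hash size:

static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
	return memcmp(sha1, sha2, the_hash_algo->rawsz);
}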
Re: What's cooking in git.git (Apr 2018, #04; Mon, 30)
On 04/30, Junio C Hamano wrote: > * bw/protocol-v2 (2018-03-15) 35 commits > (merged to 'next' on 2018-04-11 at 23ee234a2c) > + remote-curl: don't request v2 when pushing > + remote-curl: implement stateless-connect command > + http: eliminate "# service" line when using protocol v2 > + http: don't always add Git-Protocol header > + http: allow providing extra headers for http requests > + remote-curl: store the protocol version the server responded with > + remote-curl: create copy of the service name > + pkt-line: add packet_buf_write_len function > + transport-helper: introduce stateless-connect > + transport-helper: refactor process_connect_service > + transport-helper: remove name parameter > + connect: don't request v2 when pushing > + connect: refactor git_connect to only get the protocol version once > + fetch-pack: support shallow requests > + fetch-pack: perform a fetch using v2 > + upload-pack: introduce fetch server command > + push: pass ref prefixes when pushing > + fetch: pass ref prefixes when fetching > + ls-remote: pass ref prefixes when requesting a remote's refs > + transport: convert transport_get_remote_refs to take a list of ref prefixes > + transport: convert get_refs_list to take a list of ref prefixes > + connect: request remote refs using v2 > + ls-refs: introduce ls-refs server command > + serve: introduce git-serve > + test-pkt-line: introduce a packet-line test helper > + protocol: introduce enum protocol_version value protocol_v2 > + transport: store protocol version > + connect: discover protocol version outside of get_remote_heads > + connect: convert get_remote_heads to use struct packet_reader > + transport: use get_refs_via_connect to get refs > + upload-pack: factor out processing lines > + upload-pack: convert to a builtin > + pkt-line: add delim packet support > + pkt-line: allow peeking a packet line without consuming it > + pkt-line: introduce packet_read_with_status > (this branch is used by bw/server-options.) > > The beginning of the next-gen transfer protocol. > > Will merge to 'master'. Awesome, I see you're already planning on merging this! I was just going to drop by and saw that internal testing showed that this *should* be ready to be merged to master :) Thanks! -- Brandon Williams
Re: [PATCH v2 00/42] object_id part 13
On Wed, May 02, 2018 at 05:32:24PM +0200, Duy Nguyen wrote: > Interdiff for people who don't have time to read 42 patches yet Thanks for this. I had intended to include tbdiff output in my series but had forgotten when I reformatted the patches. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Blame / annotate with mixed mac line endings?
I think this file may have been from a mac at one point. Note the lone CR (0x0d) line endings, with CRLFs later. $ cat src/htdocs/hr/ats/rpts/incl/slt_org.cfm | sed 's/[a-zA-Z]/x/g;s/[0-9]/8/g' | hexdump -C 3c 21 2d 2d 2d 0d 78 78 78 78 78 78 78 78 3a 20 |.- | 00c0 78 78 20 78 78 78 20 78 78 20 78 78 78 78 2d 3c |xx xxx xx -<| 00d0 2f 78 78 78 78 78 78 3e 0d 0a 09 3c 78 78 78 78 |/xx>..#xxx_xxx#,xxx">xxx.| 0190 0d 0a 09 3c 78 78 78 78 78 78 20 78 78 78 78 78 |...#xx| 01b0 78 5f 78 78 78 23 3c 2f 78 78 78 78 78 78 3e 0d |x_xxx#.| 01c0 0a 3c 2f 78 78 78 78 78 78 78 78 3e 0d 0a 3c 2f |...| 01db $ git annotate -L 13,17 --line-porcelain 'src/htdocs/hr/ats/rpts/incl/slt_org.cfm' fatal: file src/htdocs/hr/ats/rpts/incl/slt_org.cfm has only 10 lines $ cat src/htdocs/hr/ats/rpts/incl/slt_org.cfm | wc -l 10 $ cat src/htdocs/hr/ats/rpts/incl/slt_org.cfm | mac2unix | wc -l 18 $ git -c core.autocrlf=false -c core.eol=cr annotate -L 13,17 --line-porcelain 'src/htdocs/hr/ats/rpts/incl/slt_org.cfm' fatal: file src/htdocs/hr/ats/rpts/incl/slt_org.cfm has only 10 lines Any way to hit git with a stick to treat lone CR as a line break for blame/annotate? -Jason -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- - - - Jason Pyeron PD Inc. http://www.pdinc.us - - Principal Consultant 10 West 24th Street #100- - +1 (443) 269-1555 x333Baltimore, Maryland 21218 - - - -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Re: [GSoC] Info: new blog series of my work on Git GSoC '18
On 2 May 2018 at 17:12, Johannes Schindelin wrote: > Hi Pratik, > > On Wed, 2 May 2018, Pratik Karki wrote: > >> As promised in my proposal, I've started >> to write a blog series of GSoC '18 with Git. The initial blog is up. >> You can find it here[1]. The initial one is just to get started and >> from next iterations, I'll start detailing of my work towards converting >> rebase to builtin. >> >> [1]: https://prertik.github.io/categories/git/ > > This is awesome! Thanks for doing this, > Dscho Agreed, was fun to read. I'd encourage you to post to the list when you blog, or perhaps include a link to the blog as part of any regular updates you give on your project progress. Would also make for an interesting addition to the newsletter. I know it can be difficult to sit down and write about what you're doing, especially when it feels like you could be focusing on 'real work'. Hopefully you will find the process rewarding; I'm looking forward to reading about what you find easy and hard, how you learn the git developer processes, and the challenges you find in converting shell scripts to a built-in. I'm sure other people are too, and I'll bet the ones who have been there before will have feedback for you as well. I'd find it interesting even if it was a 5-line bullet list of what's going through your mind with respect to the project! Looking forward to following along. Regards, Andrew Ardill
[PATCH 1/5] submodule foreach: correct '$path' in nested submodules from a subdirectory
From: Prathamesh Chavan

When running 'git submodule foreach --recursive' from a subdirectory of
your repository, nested submodules get a bogus value for $path:

For a submodule 'sub' that contains a nested submodule 'nested', running
'git -C dir submodule foreach echo $path' from the root of the
superproject would report path='../nested' for the nested submodule. The
first part '../' is derived from the logic computing the relative path
from $pwd to the root of the superproject; the second part is the path
of the nested submodule inside the submodule 'sub'. This value is of
little use and is hard to document.

There are three different possible solutions that have more value:

(a) The path value is documented as the path from the toplevel of the
    superproject to the mount point of the submodule. If 'the' refers to
    the superproject holding this submodule ('sub' holding 'nested'),
    the path would be expected to be path='nested'.

(b) In case 'the' superproject is referring to the toplevel, which is
    the superproject in which the original command was invoked, then the
    path is expected to be path='sub/nested'.

(c) The documentation explains $path as [...] "relative to the
    superproject", following 091a6eb0fe (submodule: drop the top-level
    requirement, 2013-06-16), such that the nested submodule would be
    expected as path='../sub/nested', when "the" superproject is the
    superproject where the command was run from.

(d) Or the value of path='nested' is expected if we take the
    intermediate superproject into account. [This is the same as (a); it
    highlights that the documentation is not clear, but technically
    correct if we were to revert 091a6eb0fe.]

The behavior for (c) was introduced in 091a6eb0fe (submodule: drop the
top-level requirement, 2013-06-16); the intent for $path seemed to be
the path relative from $cwd to the submodule worktree, but that did not
work for nested submodules, as the intermediate submodules were not
included in the path.

If we were to fix the meaning of $path using (a) or (d), such that
"path" is "the path from the intermediate superproject to the mount
point of the submodule", we would break any existing submodule user that
runs foreach from a non-root directory of the superproject, as the
non-nested submodule '../sub' would change its path to 'sub'.

If we were to fix the meaning of $path using (b), such that "path" is
"the path from the topmost superproject to the mount point of the
submodule", then we would break any user that uses nested submodules
(even from the root directory), as 'nested' would become 'sub/nested'.

If we were to fix the meaning of $path using (c), such that "path" is
"the display path from where the original command was invoked to the
submodule", then we would break users that rely on the assumption that
"$toplevel / $path" is the absolute path of the submodule.

All groups can be found in the wild. The author has no data on whether
one group outweighs another by a large margin, and offending each one
seems equally bad at first. However, in the author's view it is better
to go with (a), as running from a subdirectory sounds like something
carried out by a human rather than by an automation task. With a human
on the keyboard the feedback loop is short and the changed behavior can
be adapted to quickly, unlike automation that can break silently.

Another argument in favor of (a) is the consistency of the variables
provided: "$toplevel / $path" gives the absolute path of the submodule,
with 'toplevel' (both the variable as well as the documentation)
referring to the immediate superproject of the submodule.
Documentation of the variable is adjusted in a follow-up patch.

Discussed-with: Ramsay Jones
Signed-off-by: Prathamesh Chavan
Signed-off-by: Stefan Beller
Signed-off-by: Junio C Hamano
Signed-off-by: Stefan Beller
---
 git-submodule.sh             |  1 -
 t/t7407-submodule-foreach.sh | 36 ++--
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 24914963ca2..331d71c908b 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -345,7 +345,6 @@ cmd_foreach()
 		prefix="$prefix$sm_path/"
 		sanitize_submodule_env
 		cd "$sm_path" &&
-		sm_path=$(git submodule--helper relative-path "$sm_path" "$wt_prefix") &&
 		# we make $path available to scripts ...
 		path=$sm_path &&
 		if test $# -eq 1

diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
index 6ba5daf42ee..5144cc6926b 100755
--- a/t/t7407-submodule-foreach.sh
+++ b/t/t7407-submodule-foreach.sh
@@ -82,9 +82,9 @@ test_expect_success 'test basic "submodule foreach" usage' '
 	cat >expectexpect <
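Not part of the patch, but a minimal way to reproduce the scenario from
the commit message above; the repository names 'super', 'sub' and
'nested' are illustrative only, and the protocol.file.allow setting is
only needed on newer versions of Git (older ones ignore the unknown
key). Without this fix the final command may report '../sub' and
'../nested'; with it, 'sub' and 'nested'.

-- snip --
#!/bin/sh
set -e

# superproject 'super' -> submodule 'sub' -> nested submodule 'nested'
git init nested && git -C nested commit --allow-empty -m init
git init sub && (
	cd sub &&
	git -c protocol.file.allow=always submodule add ../nested nested &&
	git commit -m 'add nested'
)
git init super && (
	cd super &&
	git -c protocol.file.allow=always submodule add ../sub sub &&
	git -c protocol.file.allow=always submodule update --init --recursive &&
	git commit -m 'add sub'
)

# run foreach from a subdirectory of the superproject
mkdir super/dir
git -C super/dir submodule foreach --recursive 'echo "$path"'
-- snap --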
[PATCH 0/5] Rebooting pc/submodule-helper-foreach
The "What's cooking" email carried this series for some time now: > * pc/submodule-helper-foreach (2018-02-02) 5 commits > - submodule: port submodule subcommand 'foreach' from shell to C > - submodule foreach: document variable '$displaypath' > - submodule foreach: clarify the '$toplevel' variable documentation > - submodule foreach: document '$sm_path' instead of '$path' > - submodule foreach: correct '$path' in nested submodules from a subdirectory > > Expecting a response to review comments > e.g. cf. <20180206150044.1bffbb573c088d38c8e44...@google.com> I reworded the commit message of the first patch and nearly confused myself again, as "toplevel" doesn't refer to the "topmost" superproject, just the direct superproject of the submodule. However I think the code of the first patch is correct as originally presented. Just the wording of the commit message was changed to explain the reasoning more extensively. With this series, we get * keep the invariant of $toplevel + $path to be an absolute path that is correctly pointing at the submodule. "git -C $toplevel config" + $name allowing to ask configuration of the submodule. * $displaypath will have the relative path form $pwd to the submodule root. * better documentation. In later patches we could add $topmost, that points at the superproject in which the command was started from, and $path_from_topmost, that would be the relative path from $topmost to the submodule, potentially skipping intermediate superprojects. Thanks, Stefan Prathamesh Chavan (5): submodule foreach: correct '$path' in nested submodules from a subdirectory submodule foreach: document '$sm_path' instead of '$path' submodule foreach: clarify the '$toplevel' variable documentation submodule foreach: document variable '$displaypath' submodule: port submodule subcommand 'foreach' from shell to C Documentation/git-submodule.txt | 15 ++-- builtin/submodule--helper.c | 148 git-submodule.sh| 40 + t/t7407-submodule-foreach.sh| 38 +++- 4 files changed, 194 insertions(+), 47 deletions(-) -- 2.17.0.441.gb46fe60e1d-goog
[PATCH 2/5] submodule foreach: document '$sm_path' instead of '$path'
From: Prathamesh Chavan

Using a variable '$path' may be harmful to users due to capitalization
issues; see 64394e3ae9 (git-submodule.sh: Don't use $path variable in
eval_gettext string, 2012-04-17). Adjust the documentation to advocate
for using $sm_path, which contains the same value. We still make the
'path' variable available and document it as a deprecated synonym of
'sm_path'.

Discussed-with: Ramsay Jones
Signed-off-by: Stefan Beller
Signed-off-by: Prathamesh Chavan
Signed-off-by: Junio C Hamano
---
 Documentation/git-submodule.txt | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 71c5618e82a..755ed695f08 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -183,12 +183,14 @@ information too.
 foreach [--recursive] <command>::
 	Evaluates an arbitrary shell command in each checked out submodule.
-	The command has access to the variables $name, $path, $sha1 and
+	The command has access to the variables $name, $sm_path, $sha1 and
 	$toplevel:
 	$name is the name of the relevant submodule section in `.gitmodules`,
-	$path is the name of the submodule directory relative to the
-	superproject, $sha1 is the commit as recorded in the superproject,
-	and $toplevel is the absolute path to the top-level of the superproject.
+	$sm_path is the path of the submodule as recorded in the superproject,
+	$sha1 is the commit as recorded in the superproject, and
+	$toplevel is the absolute path to the top-level of the superproject.
+	Note that to avoid conflicts with '$PATH' on Windows, the '$path'
+	variable is now a deprecated synonym of '$sm_path' variable.
 	Any submodules defined in the superproject but not checked out are
 	ignored by this command. Unless given `--quiet`, foreach prints the name
 	of each submodule before evaluating the command.
-- 
2.17.0.441.gb46fe60e1d-goog
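A small usage example, not part of the patch; any script that still says
$path can simply switch to $sm_path:

-- snip --
# $path still works but is deprecated; prefer $sm_path, which cannot be
# confused with the (case-insensitive) %PATH% environment variable on
# Windows.
git submodule foreach 'echo "$sm_path is recorded at commit $sha1"'
-- snap --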
[PATCH 3/5] submodule foreach: clarify the '$toplevel' variable documentation
From: Prathamesh Chavan

It does not contain the topmost superproject as the author assumed, but
the direct superproject, such that $toplevel/$sm_path is the actual
absolute path of the submodule.

Discussed-with: Ramsay Jones
Signed-off-by: Stefan Beller
Signed-off-by: Prathamesh Chavan
Signed-off-by: Junio C Hamano
Signed-off-by: Stefan Beller
---
 Documentation/git-submodule.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 755ed695f08..408d5a0387f 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -188,7 +188,8 @@ foreach [--recursive] <command>::
 	$name is the name of the relevant submodule section in `.gitmodules`,
 	$sm_path is the path of the submodule as recorded in the superproject,
 	$sha1 is the commit as recorded in the superproject, and
-	$toplevel is the absolute path to the top-level of the superproject.
+	$toplevel is the absolute path to its direct superproject, such that
+	$toplevel/$sm_path is the absolute path of the submodule.
 	Note that to avoid conflicts with '$PATH' on Windows, the '$path'
 	variable is now a deprecated synonym of '$sm_path' variable.
 	Any submodules defined in the superproject but not checked out are
-- 
2.17.0.441.gb46fe60e1d-goog
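Not part of the patch, but a sketch of how the documented invariant can
be put to work; it assumes the submodules are checked out:

-- snip --
# $toplevel/$sm_path is the submodule's absolute path, so HEAD resolved
# there should match $sha1 whenever the checkout agrees with the
# superproject.
git submodule foreach '
	if test "$(git -C "$toplevel/$sm_path" rev-parse HEAD)" = "$sha1"
	then
		echo "$sm_path: in sync"
	else
		echo "$sm_path: differs from recorded $sha1"
	fi
'
-- snap --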
[PATCH 4/5] submodule foreach: document variable '$displaypath'
From: Prathamesh Chavan

It was observed that the variable '$displaypath' was accessible but
undocumented. Hence, document it.

Discussed-with: Ramsay Jones
Signed-off-by: Stefan Beller
Signed-off-by: Prathamesh Chavan
Signed-off-by: Junio C Hamano
---
 Documentation/git-submodule.txt |  6 --
 t/t7407-submodule-foreach.sh    | 22 +++---
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 408d5a0387f..4372d00c42e 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -183,10 +183,12 @@ information too.
 foreach [--recursive] <command>::
 	Evaluates an arbitrary shell command in each checked out submodule.
-	The command has access to the variables $name, $sm_path, $sha1 and
-	$toplevel:
+	The command has access to the variables $name, $sm_path, $displaypath,
+	$sha1 and $toplevel:
 	$name is the name of the relevant submodule section in `.gitmodules`,
 	$sm_path is the path of the submodule as recorded in the superproject,
+	$displaypath contains the relative path from the current working
+	directory to the submodules root directory,
 	$sha1 is the commit as recorded in the superproject, and
 	$toplevel is the absolute path to its direct superproject, such that
 	$toplevel/$sm_path is the absolute path of the submodule.

diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
index 5144cc6926b..77729ac4aa1 100755
--- a/t/t7407-submodule-foreach.sh
+++ b/t/t7407-submodule-foreach.sh
@@ -82,16 +82,16 @@ test_expect_success 'test basic "submodule foreach" usage' '
 	cat >expect <../../actual
+		git submodule foreach "echo \$toplevel-\$name-\$sm_path-\$displaypath-\$sha1" >../../actual
 	) &&
 	test_i18ncmp expect actual
 '
@@ -206,25 +206,25 @@ submodulesha1=$(cd clone2/nested1/nested2/nested3/submodule && git rev-parse HEA
 	cat >expect <../../actual
+		git submodule foreach --recursive "echo toplevel: \$toplevel name: \$name path: \$sm_path displaypath: \$displaypath hash: \$sha1" >../../actual
 	) &&
 	test_i18ncmp expect actual
 '
-- 
2.17.0.441.gb46fe60e1d-goog
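For illustration only (not part of the patch), with 'dir' standing for
any subdirectory of the superproject:

-- snip --
# $displaypath is relative to where the command was invoked,
# $sm_path is relative to the submodule's own superproject.
git -C dir submodule foreach --recursive \
	'echo "displaypath=$displaypath sm_path=$sm_path"'
-- snap --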
[PATCH 5/5] submodule: port submodule subcommand 'foreach' from shell to C
From: Prathamesh Chavan

This aims to make git-submodule foreach a builtin. 'foreach' is ported
to the submodule--helper, and submodule--helper is called from
git-submodule.sh.

Helped-by: Brandon Williams
Mentored-by: Christian Couder
Mentored-by: Stefan Beller
Signed-off-by: Prathamesh Chavan
Signed-off-by: Stefan Beller
---
 builtin/submodule--helper.c | 148
 git-submodule.sh            |  39 +-
 2 files changed, 149 insertions(+), 38 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index a404df3ea49..bbbea5868de 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -439,6 +439,153 @@ static void for_each_listed_submodule(const struct module_list *list,
 		fn(list->entries[i], cb_data);
 }
 
+struct cb_foreach {
+	int argc;
+	const char **argv;
+	const char *prefix;
+	char *toplevel;
+	int quiet;
+	int recursive;
+};
+#define CB_FOREACH_INIT { 0 }
+
+static void runcommand_in_submodule_cb(const struct cache_entry *list_item,
+				       void *cb_data)
+{
+	struct cb_foreach *info = cb_data;
+	const char *path = list_item->name;
+	const struct object_id *ce_oid = &list_item->oid;
+
+	const struct submodule *sub;
+	struct child_process cp = CHILD_PROCESS_INIT;
+	char *displaypath;
+
+	displaypath = get_submodule_displaypath(path, info->prefix);
+
+	sub = submodule_from_path(&null_oid, path);
+
+	if (!sub)
+		die(_("No url found for submodule path '%s' in .gitmodules"),
+		    displaypath);
+
+	if (!is_submodule_populated_gently(path, NULL))
+		goto cleanup;
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	/*
+	 * For the purpose of executing in the submodule,
+	 * separate shell is used for the purpose of running the
+	 * child process.
+	 */
+	cp.use_shell = 1;
+	cp.dir = path;
+
+	/*
+	 * NEEDSWORK: the command currently has access to the variables $name,
+	 * $sm_path, $displaypath, $sha1 and $toplevel only when the command
+	 * contains a single argument. This is done for maintaining a faithful
+	 * translation from shell script.
+	 */
+	if (info->argc == 1) {
+		char *toplevel = xgetcwd();
+
+		argv_array_pushf(&cp.env_array, "name=%s", sub->name);
+		argv_array_pushf(&cp.env_array, "sm_path=%s", path);
+		argv_array_pushf(&cp.env_array, "displaypath=%s", displaypath);
+		argv_array_pushf(&cp.env_array, "sha1=%s",
+				 oid_to_hex(ce_oid));
+		argv_array_pushf(&cp.env_array, "toplevel=%s", toplevel);
+
+		/*
+		 * Since the path variable was accessible from the script
+		 * before porting, it is also made available after porting.
+		 * The environment variable "PATH" has a very special purpose
+		 * on windows. And since environment variables are
+		 * case-insensitive in windows, it interferes with the
+		 * existing PATH variable. Hence, to avoid that, we expose
+		 * path via the args argv_array and not via env_array.
+		 */
+		argv_array_pushf(&cp.args, "path=%s; %s",
+				 path, info->argv[0]);
+
+		free(toplevel);
+	} else {
+		argv_array_pushv(&cp.args, info->argv);
+	}
+
+	if (!info->quiet)
+		printf(_("Entering '%s'\n"), displaypath);
+
+	if (info->argv[0] && run_command(&cp))
+		die(_("run_command returned non-zero status for %s\n."),
+		    displaypath);
+
+	if (info->recursive) {
+		struct child_process cpr = CHILD_PROCESS_INIT;
+
+		cpr.git_cmd = 1;
+		cpr.dir = path;
+		prepare_submodule_repo_env(&cpr.env_array);
+
+		argv_array_pushl(&cpr.args, "--super-prefix", NULL);
+		argv_array_pushf(&cpr.args, "%s/", displaypath);
+		argv_array_pushl(&cpr.args, "submodule--helper", "foreach", "--recursive",
+				 NULL);
+
+		if (info->quiet)
+			argv_array_push(&cpr.args, "--quiet");
+
+		if (info->toplevel)
+			argv_array_pushf(&cpr.args, "--toplevel=%s", info->toplevel);
+
+		argv_array_pushv(&cpr.args, info->argv);
+
+		if (run_command(&cpr))
+			die(_("run_command returned non-zero status while"
+			      "recursing in the nested submodules of %s\n."),
+			    displaypath);
+	}
+
+cleanup:
+	free(displaypath);
+}
+
+static int module_foreach(int argc, const char **argv, const char *prefix)
+{
+
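A side note on the NEEDSWORK comment in the hunk above, with purely
illustrative commands: the variables are exported to the child only when
the whole command is passed as a single argument, so quoting matters.

-- snip --
# Single argument: the command is run by a shell inside each submodule
# and sees $name, $sm_path, $displaypath, $sha1 and $toplevel.
git submodule foreach 'echo "$name is checked out at $sm_path"'

# Unquoted: the *calling* shell expands $name and $sm_path (most likely
# to nothing) before git ever sees the command.
git submodule foreach echo "$name is checked out at $sm_path"
-- snap --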
Re: [PATCH 5/5] submodule: port submodule subcommand 'foreach' from shell to C
On Wed, May 2, 2018 at 5:53 PM, Stefan Beller wrote:
> +struct cb_foreach {
> +	char *toplevel;
...
> +	OPT_STRING(0, "toplevel", &info.toplevel, N_("path"),
> +		   N_("path from the top level of the invocation")),

This is a leftover from my experimentation that I hinted at in the cover
letter that would be used to implement $topmost. I'll remove this in a
reroll.
Re: Blame / annotate with mixed mac line endings?
"Jason Pyeron" writes: > Any way to hit git with a stick to treat lone CR as a line break for > blame/annotate? I highly suspect that you would get more help from those whose love is Git if your inquiry were about a way to ask Git politely to do what you want to achieve, rather than hitting it with a stick ;-) Perhaps define a textconv filter to fix the incorrect line endings? That way, you do not have to rewrite the history only to run blame, but can pretend as if the contents were all using consistent line endings (either LF or CRLF) retroactively.
Re: [GSoC] Info: new blog series of my work on Git GSoC '18
On Thu, May 3, 2018 at 6:31 AM, Andrew Ardill wrote:
> On 2 May 2018 at 17:12, Johannes Schindelin
> wrote:
>> Hi Pratik,
>>
>> On Wed, 2 May 2018, Pratik Karki wrote:
>>
>>> As promised in my proposal, I've started
>>> to write a blog series of GSoC '18 with Git. The initial blog is up.
>>> You can find it here[1]. The initial one is just to get started and
>>> from next iterations, I'll start detailing of my work towards converting
>>> rebase to builtin.
>>>
>>> [1]: https://prertik.github.io/categories/git/
>>
>> This is awesome! Thanks for doing this,
>> Dscho
>
> Agreed, was fun to read.
>
> I'd encourage you to post to the list when you blog, or perhaps
> include a link to the blog as part of any regular updates you give on
> your project progress.
>
> Would also make for an interesting addition to the newsletter.
>
> I know it can be difficult to sit down and write about what you're
> doing, especially when it feels like you could be focusing on 'real
> work'. Hopefully you will find the process rewarding; I'm looking
> forward to reading about what you find easy and hard, how you learn
> the git developer processes, and the challenges you find in converting
> shell scripts to a built-in. I'm sure other people are too, and I'll
> bet the ones who have been there before will have feedback for you as
> well.
>
> I'd find it interesting even if it was a 5-line bullet list of what's
> going through your mind with respect to the project! Looking forward
> to following along.
>
> Regards,
>
> Andrew Ardill

Thank you. Feedback means a lot to me. I will be writing a weekly blog
of the progress I made during the week. I will try my best to detail the
things I did and intend to do. I will surely notify everyone as soon as
I publish a post.

Cheers,
Pratik Karki
RE: Blame / annotate with mixed mac line endings?
> -----Original Message-----
> From: Junio C Hamano
> Sent: Wednesday, May 2, 2018 21:22
> Subject: Re: Blame / annotate with mixed mac line endings?
>
> "Jason Pyeron" writes:
>
> > Any way to hit git with a stick to treat lone CR as a line
> > break for blame/annotate?
>
> I highly suspect that you would get more help from those whose love
> is Git if your inquiry were about a way to ask Git politely to do
> what you want to achieve, rather than hitting it with a stick ;-)

No offense meant.

> Perhaps define a textconv filter to fix the incorrect line endings?

Worked perfectly!

I read: https://github.com/git/git/blob/master/t/t8006-blame-textconv.sh

Added to .git/config:

[diff "test"]
	textconv = /usr/local/bin/helper.sh

Added to .gitattributes:

*.cfm diff=test

$ cat /usr/local/bin/helper.sh
cat "$1" | mac2unix

The important issue was to not use mac2unix directly, because it
modifies the file itself.

Read: https://git.wiki.kernel.org/index.php/Textconv but it did not help
so much.

Thanks,

Jason

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-                                                               -
- Jason Pyeron                  PD Inc. http://www.pdinc.us     -
- Principal Consultant          10 West 24th Street #100        -
- +1 (443) 269-1555 x333        Baltimore, Maryland 21218       -
-                                                               -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Re: Blame / annotate with mixed mac line endings?
"Jason Pyeron" writes: > $ cat /usr/local/bin/helper.sh > cat "$1" | mac2unix > > The important issue was to not use mac2unix directly, because it > modifies the file itself. I wonder if [diff "test"] textconv = sh -c 'mac2unix <"$1"' -IAmArgv0- works without an extra helper.
Re: [PATCH v3 04/12] cache.h: add comment explaining the order in object_type
Ævar Arnfjörð Bjarmason writes:

> The order in the enum might seem arbitrary, and isn't explained by
> 72518e9c26 ("more lightweight revalidation while reusing deflated
> stream in packing", 2006-09-03) which added it.
>
> Derrick Stolee suggested that it's ordered topologically in
> 5f8b1ec1-258d-1acc-133e-a7c248b40...@gmail.com. Makes sense to me, add
> that as a comment.

When referring to a message-id, please do not omit surrounding <>,
which is part of the message-id string. That's like writing your
e-mail address as avarab gmail.com without at sign.

>  enum object_type {
>  	OBJ_BAD = -1,
>  	OBJ_NONE = 0,
> +	/*
> +	 * Why have our our "real" object types in this order? They're
> +	 * ordered topologically:
> +	 *
> +	 *     tag(4) -> commit(1), tree(2), blob(3)
> +	 *     commit(1) -> tree(2)
> +	 *     tree(2) -> blob(3)
> +	 */

I am not sure if the above makes sense at all in explaining tag. With
all others, a type with a smaller type id can refer to another type with
an equal or larger type id (a tree can refer to another tree). If tag
had the smallest ID among all, it would have made sense, though.

Before anybody who is confused raises noise, a gitlink records a commit
in a tree, which may seem to contradict the rule even more, but it
merely records a commit, without "referring" to it in the same sense as
other references that require connectivity.

>  	OBJ_COMMIT = 1,
>  	OBJ_TREE = 2,
>  	OBJ_BLOB = 3,
Re: [PATCH v3 06/12] get_short_oid: sort ambiguous objects by type, then SHA-1
Ævar Arnfjörð Bjarmason writes:

> +	/*
> +	 * Between object types show tags, then commits, and finally
> +	 * trees and blobs.
> +	 *
> +	 * The object_type enum is commit, tree, blob, tag, but we
> +	 * want tag, commit, tree blob. Cleverly (perhaps too

The missing comma between "tree blob" on the second line made me read
this comment twice, which made me notice the lack of "and" before "tag"
on the previous line.

Assignment is "commit, tree, blob, and then tag" but we want "tag,
commit, tree and then blob", perhaps?
Re: [PATCH v2] wt-status: use rename settings from init_diff_ui_defaults
On Tue, May 01, 2018 at 05:08:27PM -0700, Elijah Newren wrote: > Eckhard, can you add some comments to your commit message mentioning > the email pointed to by Junio about break detection and rename > detection being unsafe to use together, as well as the inconsistencies > in break detection between different commands? I will work on that. Greetings, Eckhard
Re: [PATCH v3 09/12] get_short_oid / peel_onion: ^{tree} should be tree, not treeish
Ævar Arnfjörð Bjarmason writes:

> But ^{tree} shows just the trees, but would previously be equivalent
> to the above:
>
> $ git rev-parse e8f2^{tree}
> error: short SHA1 e8f2 is ambiguous
> hint: The candidates are:
> hint: e8f2093055 tree
> hint: e8f25a3a50 tree
> hint: e8f28d537c tree
> hint: e8f2cf6ec0 tree
> [...]

When a user says "git $cmd e8f2^{tree}", the user is telling Git that
s/he knows e8f2 *is* a tree-ish, but for whatever reason $cmd wants a
tree and does not accept an arbitrary tree-ish---that is the whole point
of appending ^{tree} as a suffix. A useful hint in such a case would be
"oh, you said e8f2 is a tree-ish, but there are more than one tree-ish,
so let me show them to you to help you decide which one among them is
the one you meant".

When $cmd is rev-parse, I would even say that the user is saying "I know
e8f2 is a tree-ish, and I know it is not a tree--it merely is a
tree-ish. I want the tree that e8f2 thing points at". Limiting that hint
to show only real trees does not make any sense to me.

I do not think we care _too_ deeply, because most of the time, a command
line location that expects a tree-ish can be given a tree-ish, so there
is not much reason to use the ^{tree} suffix these days. But in a case
where it _does_ matter, I think this change makes the "hint" almost
useless.

Or am I misreading what you wanted to achieve with this patch?
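For readers following along, a quick illustration of the peeling being
discussed; v2.17.0 is just an example tag in a git.git clone:

-- snip --
# A commit is a tree-ish; ^{tree} peels it to the tree it points at.
git rev-parse HEAD
git rev-parse HEAD^{tree}

# An annotated tag is also a tree-ish: tag -> commit -> tree.
git rev-parse v2.17.0^{tree}
-- snap --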
Re: [PATCH v3 00/12] get_short_oid UI improvements
On Wed, May 2, 2018 at 6:45 AM, Derrick Stolee wrote: > On 5/2/2018 8:42 AM, Derrick Stolee wrote: >> >> On 5/1/2018 2:40 PM, Ævar Arnfjörð Bjarmason wrote: >>> >>> The biggest change in v3 is the no change at all to the code, but a >>> lengthy explanation of why I didn't go for Derrick's simpler >>> implementation. Maybe I'm wrong about that, but I felt uneasy >>> offloading undocumented (or if I documented it, it would only be for >>> this one edge-case) magic on the oid_array API. Instead I'm just >>> making this patch a bit more complex. >> >> >> I think that's fair. Thanks for going along with me on the thought >> experiment. > > > Also, v3 looks good to me. > > Reviewed-by: Derrick Stolee > I also reviewed this, and it looks good to me as well. Thanks, Jake
Re: Generate more revenue from Git
Hello, Following up on my last email, It would be great to setup a call this week. Looking forward to your response. Best Regards, -- Michal Sapozhnikov | Business Manager, Luminati SDK | +972-50-2826778 | Skype: live:michals_43 http://luminati.io/sdk On 15-Apr-18 14:14, 7d (by eremind) wrote: > Hi, > > My name is Michal and I lead the SDK partnerships at Luminati. I assume > your > software earns money by charging users for a premium subscription or by > showing > ads - both models do not pay out much and harm the user experience. > > We now offer you a third option. > > Luminati’s monetization SDK for Windows desktop provides your users the option > to use the software for free, and in exchange we pay you $3,000 USD per month, > for every 100K daily active users. > Luminati is the largest proxy network, having millions of IPs based on other > developers who embedded our SDK. More information is available on > http://luminati.io/sdk_win. > > I’d like to schedule a 15 minute call to let you know how we can start. Are > you > available tomorrow at 12:30pm your local time? > > Best Regards, > Michal >