Re: Surprising use of memory and time when repacking mozilla's gecko repository
Mike Hommey writes: > On Fri, Jul 05, 2019 at 01:14:13AM -0400, Jeff King wrote: >> On Thu, Jul 04, 2019 at 10:13:20PM +0900, Mike Hommey wrote: [...] >> I think I explained all of the memory-usage questions in my earlier >> response, but just for reference: if you have access to it, valgrind's >> "massif" tool is really good for this kind of profiling. Something like: >> >> valgrind --tool=massif git pack-objects ... >> ms_print massif.out.* >> >> which shows heap usage at various times, points out the snapshot with >> peak usage, and shows a backtrace of the main culprits at a few >> snapshots. > > At the expense of time ;) A run would likely last an entire day under > massif (by which I mean a full 24 hours, not a 9-5 day). Valgrind, as I understand it, runs the program under emulation. I wonder if perf / OProfile based solution could help here (gathering memory-based events and metrics). There is also trace2 built-in into Git, but I don't know if it could be used for this purpose or not. Best, -- Jakub Narębski
Handling text files encoded in little-endian UTF-16 with BOM
Hi, Using git version 2.22.0.windows.1 I have a repository with number of .txt files encoded in little-endian UTF-16 with BOM. What are the best practice and recommended configuration to manage such files with Git to avoid unexpected re-encoding to UTF-8 or others? Currently, there is .gitattriuts with entries like resource/*.txt working-tree-encoding=UTF-16LE-BOM -text Despite that some of team members have noticed that the files occacionally get re-encoded to UTF-8. It is unknow what are actual steps leading to that. BTW, there a few Git clients in use: git in Git Bash, VSCode, Fork. What bothers me in the .gitattributes is this `-text` attribute. Is the use of `working-tree-encoding` and `-text` together a valid combination at all? The documentation at https://git-scm.com/docs/gitattributes does not seem to touch on that. I'll appreciate any suggestions on those UTF-16LE-BOM files. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net
Re: Surprising use of memory and time when repacking mozilla's gecko repository
On Fri, Jul 05, 2019 at 02:45:16PM +0900, Mike Hommey wrote: > On Fri, Jul 05, 2019 at 01:09:55AM -0400, Jeff King wrote: > > On Thu, Jul 04, 2019 at 07:05:30PM +0900, Mike Hommey wrote: > > > Finally, with 1 thread, the picture changes greatly. The overall process > > > takes 2.5h: > > > - 50 seconds enumerating and counting objects. > > > - ~2.5h compressing objects. > > > - 3 minutes and 25 seconds writing objects! > > > > That's weird. I'd expect us to find similar amounts of deltas, but we > > don't have the writing slow-down. I wonder if there is some bad > > interaction between the multi-threaded code and the delta cache. > > > > Did you run the second, single-thread run against the exact same > > original repository you had? Or did you re-run it on the result of the > > multi-thread run? Another explanation is that the original repository > > had some poor patterns that made objects expensive to access (say, a ton > > of really deep delta chains). And so the difference between the two runs > > was not the threads, but just the on-disk repository state. > > > > Kind of a long shot, but if that is what happened, try running another > > multi-threaded "repack -f" and see if its writing phase is faster. > > I've run 36-threads, 16-threads and 1-thread in sequence on the same > repo, so 16-threads was repacking what was repacked by the 36-threads, > and 1-thread was repacking what was repacked by the 16-threads. I > assumed it didn't matter, but come to think of it, I guess it can. I tried: - fresh clone -> 36-threads - fresh clone -> 1-thread -> 36-threads The 36-threads gc in the latter was only marginally faster than in the former (between 19 and 20 minutes instead of 22 for both "Compressing" and "Writing"). Mike
Re: Virtual Git Contributor Summit
Johannes Schindelin writes: > Team, > > I kept talking about this idea of a purely online Git Contributor Summit, > and it is finally time for action. > > The idea: just like the Git Contributor Summits we have on the day before > GitMerge, but instead of traveling to the same place, we'll use an online > conferencing system. By the way, would there be GitTogether meeting held in conjunction with Google Summer of Code 2019 Mentors Meeting? > From my point of view, the main benefits of doing this online are: > > - It should make it easier for all contributors to attend (yes, Junio, we > do miss you every single time you're not there). > > - We will leave a substantially reduced carbon footprint. > > - There won't be any jet-lag involved, including the consequences on > health/sleeping. > > - It is substantially cheaper and more sustainable. > > Of course, there are also some challenges: > > - Timezones. My idea here is to have all participants put down their > preferred timezone and then compromising on some sort of Median. > > - Timing. As it is not attached to a conference, we are less bound to a > specific date, and that flexibility might make it harder to agree > on one date. Also, all the challenges of remote-only meeting instead of real-life contact. Though it might be good supplement to the meeting before GitMerge. [...] > To coordinate this event, I started a Google Spreadsheet where we can > decide on timezone, date(s), number of days, and later on try to imitate > the sticky game with virtual stickies for the discussions. > > I will send out an invitation to the top 25 contributors (according to > `git shortlog -nse --since=1.year.ago`) after sending out this mail. > > Obviously, there will be more people interested in participating; Please > do drop me an email if you're interested and I will send you a link to > that spreadsheet (as I do not read my mails continuously, it might take a > day, but I will, promise). I think it would be good idea to have some of us from Git Rev News team to chronicle this Virtual Git Contributors Summit meeting. Best, -- Jakub Narębski
Re: Virtual Git Contributor Summit
Hi Dscho, On 03/07/2019 14:01, Johannes Schindelin wrote: Team, I kept talking about this idea of a purely online Git Contributor Summit, and it is finally time for action. The idea: just like the Git Contributor Summits we have on the day before GitMerge, but instead of traveling to the same place, we'll use an online conferencing system. From my point of view, the main benefits of doing this online are: - It should make it easier for all contributors to attend (yes, Junio, we do miss you every single time you're not there). - We will leave a substantially reduced carbon footprint. - There won't be any jet-lag involved, including the consequences on health/sleeping. - It is substantially cheaper and more sustainable. Of course, there are also some challenges: - Timezones. My idea here is to have all participants put down their preferred timezone and then compromising on some sort of Median. - Timing. As it is not attached to a conference, we are less bound to a specific date, and that flexibility might make it harder to agree on one date. To alleviate both of those points, we might want to consider spreading it out over a couple of days? I already heard some fierce opposition against that idea, though. In any case, I think it would be good to give everybody some time to coordinate and to shift work out of the way, so I propose that we look for dates in the first two weeks of September. Peff kindly agreed to use GitHub's Zoom license for this, which should also make it easy to have a recording (and not having people in the same room will benefit this, too). To coordinate this event, I started a Google Spreadsheet where we can decide on timezone, date(s), number of days, and later on try to imitate the sticky game with virtual stickies for the discussions. I will send out an invitation to the top 25 contributors (according to `git shortlog -nse --since=1.year.ago`) after sending out this mail. Obviously, there will be more people interested in participating; Please do drop me an email if you're interested and I will send you a link to that spreadsheet (as I do not read my mails continuously, it might take a day, but I will, promise). Thanks all, Dscho Will there also be an opportunity for a listen/read -only mode for those of us on the lower reaches of the contributor list? I'd certainly like to at least keep up with progress. -- Philip
Re: mt/dir-iterator-updates, was Re: What's cooking in git.git (Jul 2019, #01; Wed, 3)
On Thu, Jul 4, 2019 at 6:30 PM Johannes Schindelin wrote: > > Hi Matheus, > > On Thu, 4 Jul 2019, Matheus Tavares Bernardino wrote: > > > > I wanted to take a look at the failures to see if I could help, [...] > > Could you point me to the right place, please? [...] > > I usually click on the "Tests" tab in that page: > https://dev.azure.com/gitgitgadget/git/_build/results?buildId=11495&view=ms.vss-test-web.build-test-results-tab > > You can click on any of the 1,384 (!) failing test cases, it will pop up a > pane on the right-hand side that shows the trace of that failing test > case. For the full trace of that test script, go to "Attachments" and > download the `Standard_Error_Output.log` (via the horizontal bread-crumbs > menu you can see when hovering over the file name). Thanks for the explanation! I inspected some of the `Standard_Error_Output.log`'s and it seems the problem is always with local clone (which started to use dir-iterator in this series). It seems all .git/objects/ dirs are being ignored. That makes sense, since st_ino will always be 0 on Windows. But your fixup patch should solve this. Is there any azure build for it? [...] > > > > Hm, I think `stat()` itself uses this strategy of an arbitrary cut-off > > when resolving a path. So we may also "ignore" circular symlinks and > > let the iteration continue until the point where `stat()` will return > > an ELOOP. (But it may be expensive...) > > This would not be equivalent, though, as your code also tried to address > circular references when two or more symlinks are involved, e.g. when > one symlink points to a directory that has another symlink that points to > the directory containing the first symlink. Hm, `stat()` also addresses this case doesn't it? For example: $ mkdir a b $ ln -s ../a b/s2a $ ln -s ../b a/s2b $ stat b/s2a/s2b/s2a/.../s2b Gives me: "too many levels of symbolic links" > > > Do we _have_ to, though? At some stage the path we come up with is beyond > > > `PATH_MAX` and we can stop right then and there. > > > > > > Besides, the way `find_recursive_symlinks()` is implemented adds quadratic > > > behavior. > > > > Yes, indeed. But it only happens when we have a path like this: > > `symlink_to_dir1/symlink_to_dir2/symlink_to_dir3/symlink_to_dir4/...`, > > right? I think this case won't be very usual on actual filesystems, > > thought. > > No, the check is performed in a loop, and that loop is executed whether > you have symlinks or not. That loop is executed for every item in the > iteration, and as we cannot assume a flat directory in general (in fact, > we should assume a directory depth proportional to the total number of > files), that adds that quadratic behavior. Oh, you're right. Sorry for the noise, I forgot this function was not only called for symlinks but for every directory entry :( An alternative solution would be to use `lstat()` together with `stat()` to only feed symlinked dirs to this function. This should reduce a lot the number of calls. Although it'd still be quadratic in the worst case, would that be any good? [...] > > > But I also think there are enough > > > reasons to do away with this function in the first place. > > > > We can delegate the circular symlinks problem to `stat()'s ELOOP` > > Not really. I mean, we _already_ delegate to the `ELOOP` condition, we > cannot even avoid it as long as we keep using `stat()` instead of > `lstat()` Yes, indeed. What I meant is that we may chose to _fully_ delegate to ELOOP. The way it is now, we should detect circular symlinks way before stat() fails. For example, with the case I showed above, we would stop at "b/s2a/s2b" whereas stat() would only fail at a much longer "b/s2a/s2b/s2a/s2b/...", far beyond in the iteration. > but as I demonstrated above, that only catches part of the > circular symlinks. > > > or `path_len > PATH_MAX`. > > This would have the advantage of _not_ adding quadratic behavior. > > And just in case you think quadratic behavior would not matter much: Git > is used to manage the largest repository on this planet, which has over 3 > million index entries when checked out. > > Quadratic behavior matters. > > I don't know where the dir-iterator is used, but we simply should try our > best to aim for the optimal time complexity in the first place. Currently, with the follow symlinks option, dir-iterator is only being used to iterate over .git/objects. As it's rather shallow, perhaps the quadratic complexity wouldn't be a huge deal in this case. But I agree with you that we should take care of performance so that this API may, as well, be used in other places, in the future. > > The only downside is the overhead of iterating through directories which > > will be latter discarded for being in circular symlinks' chains. I mean, > > the overhead at dir-iterator shouldn't be much, but the one on the code > > using this API to do something for each of these directories (and its > > contents), may be. Also we would need to "undo" t
Re: Handling text files encoded in little-endian UTF-16 with BOM
On Fri, Jul 05, 2019 at 01:35:13PM +0200, Mateusz Loskot wrote: > Hi, > > Using git version 2.22.0.windows.1 > > I have a repository with number of .txt files encoded in > little-endian UTF-16 with BOM. > > What are the best practice and recommended configuration to > manage such files with Git to avoid unexpected re-encoding to > UTF-8 or others? > > Currently, there is .gitattriuts with entries like > >resource/*.txt working-tree-encoding=UTF-16LE-BOM -text > > Despite that some of team members have noticed that the files > occacionally get re-encoded to UTF-8. It is unknow what are > actual steps leading to that. BTW, there a few Git clients > in use: git in Git Bash, VSCode, Fork. If possible, I don't want to comment on this kind of "sometimes something happens something on someones computer" thing. A little bit more information could be helpful. > > What bothers me in the .gitattributes is this `-text` attribute. > > Is the use of `working-tree-encoding` and `-text` together a > valid combination at all? Yes, it means that the content re-encoded between the repo and the working tree, (that is what you want) And the "-text" means "leave the line endings" (LF or CRLF) as is, don't convert them. In that sense you can call that a legal combination, but may be not a recommended one. > > The documentation at https://git-scm.com/docs/gitattributes > does not seem to touch on that. > > I'll appreciate any suggestions on those UTF-16LE-BOM files. > My suggestion would be to use the "text" attribute: resource/*.txt working-tree-encoding=UTF-16LE-BOM text And depending on your application: Do the resource files need a special line ending ? The use either resource/*.txt working-tree-encoding=UTF-16LE-BOM text eol=LF or resource/*.txt working-tree-encoding=UTF-16LE-BOM text eol=CRLF I hope that helps a little bit. > Best regards, > -- > Mateusz Loskot, http://mateusz.loskot.net
[RFC/PATCH v1] merge - make squash a 1-step operation
--- builtin/merge.c | 63 + 1 file changed, 32 insertions(+), 31 deletions(-) diff --git a/builtin/merge.c b/builtin/merge.c index 6e99aead46..d5745a7888 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -64,6 +64,7 @@ static int option_edit = -1; static int allow_trivial = 1, have_message, verify_signatures; static int overwrite_ignore = 1; static struct strbuf merge_msg = STRBUF_INIT; +static struct strbuf squash_msg = STRBUF_INIT; static struct strategy **use_strategies; static size_t use_strategies_nr, use_strategies_alloc; static const char **xopts; @@ -390,12 +391,9 @@ static void finish_up_to_date(const char *msg) static void squash_message(struct commit *commit, struct commit_list *remoteheads) { struct rev_info rev; - struct strbuf out = STRBUF_INIT; struct commit_list *j; struct pretty_print_context ctx = {0}; - printf(_("Squash commit -- not updating HEAD\n")); - repo_init_revisions(the_repository, &rev, NULL); rev.ignore_merges = 1; rev.commit_format = CMIT_FMT_MEDIUM; @@ -414,15 +412,13 @@ static void squash_message(struct commit *commit, struct commit_list *remotehead ctx.date_mode = rev.date_mode; ctx.fmt = rev.commit_format; - strbuf_addstr(&out, "Squashed commit of the following:\n"); + strbuf_addstr(&squash_msg, "Squashed commit of the following:\n"); while ((commit = get_revision(&rev)) != NULL) { - strbuf_addch(&out, '\n'); - strbuf_addf(&out, "commit %s\n", + strbuf_addch(&squash_msg, '\n'); + strbuf_addf(&squash_msg, "commit %s\n", oid_to_hex(&commit->object.oid)); - pretty_print_commit(&ctx, commit, &out); + pretty_print_commit(&ctx, commit, &squash_msg); } - write_file_buf(git_path_squash_msg(the_repository), out.buf, out.len); - strbuf_release(&out); } static void finish(struct commit *head_commit, @@ -440,8 +436,12 @@ static void finish(struct commit *head_commit, strbuf_addf(&reflog_message, "%s: %s", getenv("GIT_REFLOG_ACTION"), msg); } - if (squash) { + if (squash && !squash_msg.len) { + // message hasn't been calculated, that means we are stopping the squash process so the user can finish it squash_message(head_commit, remoteheads); + write_file_buf(git_path_squash_msg(the_repository), squash_msg.buf, squash_msg.len); + if (option_commit > 0) + printf(_("Squash conflicts -- not updating HEAD\n")); } else { if (verbosity >= 0 && !merge_msg.len) printf(_("No merge message -- not updating HEAD\n")); @@ -893,13 +893,21 @@ static int finish_automerge(struct commit *head, struct object_id result_commit; free_commit_list(common); - parents = remoteheads; - if (!head_subsumed || fast_forward == FF_NO) - commit_list_insert(head, &parents); - prepare_to_commit(remoteheads); - if (commit_tree(merge_msg.buf, merge_msg.len, result_tree, parents, - &result_commit, NULL, sign_commit)) - die(_("failed to write commit object")); + if (squash) { + squash_message(head, remoteheads); + parents = commit_list_insert(head, &parents); + if (commit_tree(squash_msg.buf, squash_msg.len, result_tree, parents, + &result_commit, NULL, sign_commit)) + die(_("failed to write commit object on squash")); + } else { + parents = remoteheads; + if (!head_subsumed || fast_forward == FF_NO) + commit_list_insert(head, &parents); + prepare_to_commit(remoteheads); + if (commit_tree(merge_msg.buf, merge_msg.len, result_tree, parents, + &result_commit, NULL, sign_commit)) + die(_("failed to write commit object")); + } strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy); finish(head, remoteheads, &result_commit, buf.buf); strbuf_release(&buf); @@ -1342,18 +1350,8 @@ int cmd_merge(int argc, const char **argv, const char *prefix) if (verbosity < 0) show_diffstat = 0; - if (squash) { - if (fast_forward == FF_NO) - die(_("You cannot combine --squash with --no-ff.")); - if (option_commit > 0) - die(_("You cannot combine --squash with --commit.")); - /* -* squash can now silently disable option_commit - this is not -* a problem as it is only overriding the default, not a user -* supplied option. -*/ -
Re: [RFC/PATCH v1] merge - make squash a 1-step operation
On Fri, Jul 5, 2019 at 10:56 AM Edmundo Carmona Antoranz wrote: > > --- > builtin/merge.c | 63 + > 1 file changed, 32 insertions(+), 31 deletions(-) > This would be a first step in order to achieve the dream of having rebase/squash in a single step. Let the flame-fest begin. I'm wondering if it makes sense to support -m with squash or if we should keep the default message with all the revisions that are being squashed (perhaps erroring out if the user mixes --squash and -m, just like mixing --squash and --no-ff) Thanks in advance for your feedback.
[PATCH v2 07/14] apply: make parse_git_header public
Make parse_git_header a "public" function in apply.h, so we can re-use it in range-diff in a subsequent commit. Signed-off-by: Thomas Gummerer --- I considered creating a separate struct for only the metadata here, and embedding that in 'struct patch'. As struct patch is mostly metadata fields though, I decided against that to avoid more code churn here. apply.c | 68 - apply.h | 48 2 files changed, 67 insertions(+), 49 deletions(-) diff --git a/apply.c b/apply.c index 468f1d3fee..04319c233f 100644 --- a/apply.c +++ b/apply.c @@ -207,40 +207,6 @@ struct fragment { #define BINARY_DELTA_DEFLATED 1 #define BINARY_LITERAL_DEFLATED 2 -/* - * This represents a "patch" to a file, both metainfo changes - * such as creation/deletion, filemode and content changes represented - * as a series of fragments. - */ -struct patch { - char *new_name, *old_name, *def_name; - unsigned int old_mode, new_mode; - int is_new, is_delete; /* -1 = unknown, 0 = false, 1 = true */ - int rejected; - unsigned ws_rule; - int lines_added, lines_deleted; - int score; - int extension_linenr; /* first line specifying delete/new/rename/copy */ - unsigned int is_toplevel_relative:1; - unsigned int inaccurate_eof:1; - unsigned int is_binary:1; - unsigned int is_copy:1; - unsigned int is_rename:1; - unsigned int recount:1; - unsigned int conflicted_threeway:1; - unsigned int direct_to_threeway:1; - unsigned int crlf_in_old:1; - struct fragment *fragments; - char *result; - size_t resultsize; - char old_oid_prefix[GIT_MAX_HEXSZ + 1]; - char new_oid_prefix[GIT_MAX_HEXSZ + 1]; - struct patch *next; - - /* three-way fallback result */ - struct object_id threeway_stage[3]; -}; - static void free_fragment_list(struct fragment *list) { while (list) { @@ -1321,11 +1287,13 @@ static int check_header_line(int linenr, struct patch *patch) } /* Verify that we recognize the lines following a git header */ -static int parse_git_header(struct apply_state *state, - const char *line, - int len, - unsigned int size, - struct patch *patch) +int parse_git_header(struct strbuf *root, +int *linenr, +int p_value, +const char *line, +int len, +unsigned int size, +struct patch *patch) { unsigned long offset; struct parse_git_header_state parse_hdr_state; @@ -1340,21 +1308,21 @@ static int parse_git_header(struct apply_state *state, * or removing or adding empty files), so we get * the default name from the header. */ - patch->def_name = git_header_name(state->p_value, line, len); - if (patch->def_name && state->root.len) { - char *s = xstrfmt("%s%s", state->root.buf, patch->def_name); + patch->def_name = git_header_name(p_value, line, len); + if (patch->def_name && root->len) { + char *s = xstrfmt("%s%s", root->buf, patch->def_name); free(patch->def_name); patch->def_name = s; } line += len; size -= len; - state->linenr++; - parse_hdr_state.root = &state->root; - parse_hdr_state.linenr = state->linenr; - parse_hdr_state.p_value = state->p_value; + (*linenr)++; + parse_hdr_state.root = root; + parse_hdr_state.linenr = *linenr; + parse_hdr_state.p_value = p_value; - for (offset = len ; size > 0 ; offset += len, size -= len, line += len, state->linenr++) { + for (offset = len ; size > 0 ; offset += len, size -= len, line += len, (*linenr)++) { static const struct opentry { const char *str; int (*fn)(struct parse_git_header_state *, const char *, struct patch *); @@ -1391,7 +1359,7 @@ static int parse_git_header(struct apply_state *state, res = p->fn(&parse_hdr_state, line + oplen, patch); if (res < 0) return -1; - if (check_header_line(state->linenr, patch)) + if (check_header_line(*linenr, patch)) return -1; if (res > 0) return offset; @@ -1572,7 +1540,9 @@ static int find_header(struct apply_state *state, * or mode change, so we handle that specially */ if (!memcmp("diff --git ", line, 11)) { - int git_hdr_len = parse_git_header(state, line, len, size, patch); + int git_hdr_len = parse_git_header(&stat
[PATCH v2 11/14] range-diff: suppress line count in outer diff
The line count in the outer diff's hunk headers of a range diff is not all that interesting. It merely shows how far along the inner diff are on both sides. That number is of no use for human readers, and range-diffs are not meant to be machine readable. In a subsequent commit we're going to add some more contextual information such as the filename corresponding to the diff to the hunk headers. Remove the unnecessary information, and just keep the "@@" to indicate that a new hunk of the outer diff is starting. Signed-off-by: Thomas Gummerer --- diff.c| 5 - diff.h| 1 + range-diff.c | 1 + t/t3206-range-diff.sh | 16 4 files changed, 14 insertions(+), 9 deletions(-) diff --git a/diff.c b/diff.c index ec5c095199..9c28ff0a92 100644 --- a/diff.c +++ b/diff.c @@ -1672,7 +1672,10 @@ static void emit_hunk_header(struct emit_callback *ecbdata, if (ecbdata->opt->flags.dual_color_diffed_diffs) strbuf_addstr(&msgbuf, reverse); strbuf_addstr(&msgbuf, frag); - strbuf_add(&msgbuf, line, ep - line); + if (ecbdata->opt->flags.suppress_hunk_header_line_count) + strbuf_add(&msgbuf, atat, sizeof(atat)); + else + strbuf_add(&msgbuf, line, ep - line); strbuf_addstr(&msgbuf, reset); /* diff --git a/diff.h b/diff.h index c9db9825bb..49913049f9 100644 --- a/diff.h +++ b/diff.h @@ -98,6 +98,7 @@ struct diff_flags { unsigned stat_with_summary; unsigned suppress_diff_headers; unsigned dual_color_diffed_diffs; + unsigned suppress_hunk_header_line_count; }; static inline void diff_flags_or(struct diff_flags *a, diff --git a/range-diff.c b/range-diff.c index 484b1ec5a9..b31fbab026 100644 --- a/range-diff.c +++ b/range-diff.c @@ -480,6 +480,7 @@ int show_range_diff(const char *range1, const char *range2, opts.output_format = DIFF_FORMAT_PATCH; opts.flags.suppress_diff_headers = 1; opts.flags.dual_color_diffed_diffs = dual_color; + opts.flags.suppress_hunk_header_line_count = 1; opts.output_prefix = output_prefix_cb; strbuf_addstr(&indent, ""); opts.output_prefix_data = &indent; diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh index aebd4e3693..9f89af7178 100755 --- a/t/t3206-range-diff.sh +++ b/t/t3206-range-diff.sh @@ -99,7 +99,7 @@ test_expect_success 'changed commit' ' 1: 4de457d = 1: a4b s/5/A/ 2: fccce22 = 2: f51d370 s/4/A/ 3: 147e64e ! 3: 0559556 s/11/B/ - @@ -10,7 +10,7 @@ + @@ 9 10 -11 @@ -109,7 +109,7 @@ test_expect_success 'changed commit' ' 13 14 4: a63e992 ! 4: d966c5c s/12/B/ - @@ -8,7 +8,7 @@ + @@ @@ A 9 10 @@ -158,7 +158,7 @@ test_expect_success 'changed commit with sm config' ' 1: 4de457d = 1: a4b s/5/A/ 2: fccce22 = 2: f51d370 s/4/A/ 3: 147e64e ! 3: 0559556 s/11/B/ - @@ -10,7 +10,7 @@ + @@ 9 10 -11 @@ -168,7 +168,7 @@ test_expect_success 'changed commit with sm config' ' 13 14 4: a63e992 ! 4: d966c5c s/12/B/ - @@ -8,7 +8,7 @@ + @@ @@ A 9 10 @@ -191,7 +191,7 @@ test_expect_success 'changed message' ' sed s/Z/\ /g >expected <<-EOF && 1: 4de457d = 1: f686024 s/5/A/ 2: fccce22 ! 2: 4ab067d s/4/A/ - @@ -2,6 +2,8 @@ + @@ Z Zs/4/A/ Z @@ -210,7 +210,7 @@ test_expect_success 'dual-coloring' ' sed -e "s|^:||" >expect <<-\EOF && :1: a4b = 1: f686024 s/5/A/ :2: f51d370 ! 2: 4ab067d s/4/A/ - :@@ -2,6 +2,8 @@ + :@@ : : s/4/A/ : @@ -220,7 +220,7 @@ test_expect_success 'dual-coloring' ' : --- a/file : +++ b/file :3: 0559556 ! 3: b9cb956 s/11/B/ - :@@ -10,7 +10,7 @@ + :@@ : 9 : 10 : -11 @@ -230,7 +230,7 @@ test_expect_success 'dual-coloring' ' : 13 : 14 :4: d966c5c ! 4: 8add5f1 s/12/B/ - :@@ -8,7 +8,7 @@ + :@@ : @@ A : 9 : 10 -- 2.22.0.510.g264f2c817a
[PATCH v2 12/14] range-diff: add section header instead of diff header
Currently range-diff keeps the diff header of the inner diff intact (apart from stripping lines starting with index). This diff header is somewhat useful, especially when files get different names in different ranges. However there is no real need to keep the whole diff header for that. The main reason we currently do that is probably because it is easy to do. Introduce a new range diff hunk header, that's enclosed by "##", similar to how line numbers in diff hunks are enclosed by "@@", and give human readable information of what exactly happened to the file, including the file name. This improves the readability of the range-diff by giving more concise information to the users. For example if a file was renamed in one iteration, but not in another, the diff of the headers would be quite noisy. However the diff of a single line is concise and should be easier to understand. Additionaly, this allows us to add these range diff section headers to the outer diffs hunk headers using a custom userdiff pattern, which should help making the range-diff more readable. Signed-off-by: Thomas Gummerer --- range-diff.c | 35 t/t3206-range-diff.sh | 91 +++--- t/t3206/history.export | 84 -- 3 files changed, 193 insertions(+), 17 deletions(-) diff --git a/range-diff.c b/range-diff.c index b31fbab026..cc01f7f573 100644 --- a/range-diff.c +++ b/range-diff.c @@ -10,6 +10,7 @@ #include "commit.h" #include "pretty.h" #include "userdiff.h" +#include "apply.h" struct patch_util { /* For the search for an exact match */ @@ -95,12 +96,36 @@ static int read_patches(const char *range, struct string_list *list) } if (starts_with(line, "diff --git")) { + struct patch patch; + struct strbuf root = STRBUF_INIT; + int linenr = 0; + in_header = 0; strbuf_addch(&buf, '\n'); if (!util->diff_offset) util->diff_offset = buf.len; - strbuf_addch(&buf, ' '); - strbuf_addstr(&buf, line); + memset(&patch, 0, sizeof(patch)); + line[len - 1] = '\n'; + len = parse_git_header(&root, &linenr, 1, line, + len, size, &patch); + if (len < 0) + die(_("could not parse git header")); + strbuf_addstr(&buf, " ## "); + if (patch.is_new > 0) + strbuf_addf(&buf, "%s (new)", patch.new_name); + else if (patch.is_delete > 0) + strbuf_addf(&buf, "%s (deleted)", patch.old_name); + else if (patch.is_rename) + strbuf_addf(&buf, "%s => %s", patch.old_name, patch.new_name); + else + strbuf_addstr(&buf, patch.new_name); + + if (patch.new_mode && patch.old_mode && + patch.old_mode != patch.new_mode) + strbuf_addf(&buf, " (mode change %06o => %06o)", + patch.old_mode, patch.new_mode); + + strbuf_addstr(&buf, " ##"); } else if (in_header) { if (starts_with(line, "Author: ")) { strbuf_addstr(&buf, line); @@ -117,17 +142,13 @@ static int read_patches(const char *range, struct string_list *list) if (!(p = strstr(p, "@@"))) die(_("invalid hunk header in inner diff")); strbuf_addstr(&buf, p); - } else if (!line[0] || starts_with(line, "index ")) + } else if (!line[0]) /* * A completely blank (not ' \n', which is context) * line is not valid in a diff. We skip it * silently, because this neatly handles the blank * separator line between commits in git-log * output. -* -* We also want to ignore the diff's `index` lines -* because they contain exact blob hashes in which -* we are not interested. */ continue; else if (line[0] == '>') { diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh index 9f89af7178..c277756057 100755 --- a/t/t3206-range-diff.sh +++ b/t/t3206-range-diff.sh @@ -181,6 +181,85 @@ test_expect_success 'changed commit with sm config' ' test_cmp expected actual ' +test_expect_s
[PATCH v2 06/14] apply: only pass required data to gitdiff_* functions
Currently the 'gitdiff_*()' functions take 'struct apply_state' as parameter, even though they only needs the root, linenr and p_value from that struct. These functions are in the callchain of 'parse_git_header()', which we want to make more generally useful in a subsequent commit. To make that happen we only want to pass in the required data to 'parse_git_header()', and not the whole 'struct apply_state', and thus we want functions in the callchain of 'parse_git_header()' to only take arguments they really need. Signed-off-by: Thomas Gummerer --- apply.c | 59 ++--- 1 file changed, 35 insertions(+), 24 deletions(-) diff --git a/apply.c b/apply.c index 3cd4e3d3b3..468f1d3fee 100644 --- a/apply.c +++ b/apply.c @@ -22,6 +22,12 @@ #include "rerere.h" #include "apply.h" +struct parse_git_header_state { + struct strbuf *root; + int linenr; + int p_value; +}; + static void git_apply_config(void) { git_config_get_string_const("apply.whitespace", &apply_default_whitespace); @@ -914,7 +920,7 @@ static int parse_traditional_patch(struct apply_state *state, return 0; } -static int gitdiff_hdrend(struct apply_state *state, +static int gitdiff_hdrend(struct parse_git_header_state *state, const char *line, struct patch *patch) { @@ -933,14 +939,14 @@ static int gitdiff_hdrend(struct apply_state *state, #define DIFF_OLD_NAME 0 #define DIFF_NEW_NAME 1 -static int gitdiff_verify_name(struct apply_state *state, +static int gitdiff_verify_name(struct parse_git_header_state *state, const char *line, int isnull, char **name, int side) { if (!*name && !isnull) { - *name = find_name(&state->root, line, NULL, state->p_value, TERM_TAB); + *name = find_name(state->root, line, NULL, state->p_value, TERM_TAB); return 0; } @@ -949,7 +955,7 @@ static int gitdiff_verify_name(struct apply_state *state, if (isnull) return error(_("git apply: bad git-diff - expected /dev/null, got %s on line %d"), *name, state->linenr); - another = find_name(&state->root, line, NULL, state->p_value, TERM_TAB); + another = find_name(state->root, line, NULL, state->p_value, TERM_TAB); if (!another || strcmp(another, *name)) { free(another); return error((side == DIFF_NEW_NAME) ? @@ -965,7 +971,7 @@ static int gitdiff_verify_name(struct apply_state *state, return 0; } -static int gitdiff_oldname(struct apply_state *state, +static int gitdiff_oldname(struct parse_git_header_state *state, const char *line, struct patch *patch) { @@ -974,7 +980,7 @@ static int gitdiff_oldname(struct apply_state *state, DIFF_OLD_NAME); } -static int gitdiff_newname(struct apply_state *state, +static int gitdiff_newname(struct parse_git_header_state *state, const char *line, struct patch *patch) { @@ -992,21 +998,21 @@ static int parse_mode_line(const char *line, int linenr, unsigned int *mode) return 0; } -static int gitdiff_oldmode(struct apply_state *state, +static int gitdiff_oldmode(struct parse_git_header_state *state, const char *line, struct patch *patch) { return parse_mode_line(line, state->linenr, &patch->old_mode); } -static int gitdiff_newmode(struct apply_state *state, +static int gitdiff_newmode(struct parse_git_header_state *state, const char *line, struct patch *patch) { return parse_mode_line(line, state->linenr, &patch->new_mode); } -static int gitdiff_delete(struct apply_state *state, +static int gitdiff_delete(struct parse_git_header_state *state, const char *line, struct patch *patch) { @@ -1016,7 +1022,7 @@ static int gitdiff_delete(struct apply_state *state, return gitdiff_oldmode(state, line, patch); } -static int gitdiff_newfile(struct apply_state *state, +static int gitdiff_newfile(struct parse_git_header_state *state, const char *line, struct patch *patch) { @@ -1026,47 +1032,47 @@ static int gitdiff_newfile(struct apply_state *state, return gitdiff_newmode(state, line, patch); } -static int gitdiff_copysrc(struct apply_state *state, +static int gitdiff_copysrc(struct parse_git_header_state *state, const char *line, struct patch *patch) { patch->is_copy =
[PATCH v2 13/14] range-diff: add filename to inner diff
In a range-diff it's not always clear which file a certain funcname of the inner diff belongs to, because the diff header (or section header as added in a previous commit) is not always visible in the range-diff. Add the filename to the inner diffs header, so it's always visible to users. This also allows us to add the filename + the funcname to the outer diffs hunk headers using a custom userdiff pattern, which will be done in the next commit. Signed-off-by: Thomas Gummerer --- range-diff.c | 14 -- t/t3206-range-diff.sh | 16 ++-- 2 files changed, 22 insertions(+), 8 deletions(-) diff --git a/range-diff.c b/range-diff.c index cc01f7f573..09cb1ddbb1 100644 --- a/range-diff.c +++ b/range-diff.c @@ -46,7 +46,7 @@ static int read_patches(const char *range, struct string_list *list) struct strbuf buf = STRBUF_INIT, file = STRBUF_INIT; struct patch_util *util = NULL; int in_header = 1; - char *line; + char *line, *current_filename = NULL; int offset, len; size_t size; @@ -120,6 +120,12 @@ static int read_patches(const char *range, struct string_list *list) else strbuf_addstr(&buf, patch.new_name); + free(current_filename); + if (patch.is_delete > 0) + current_filename = xstrdup(patch.old_name); + else + current_filename = xstrdup(patch.new_name); + if (patch.new_mode && patch.old_mode && patch.old_mode != patch.new_mode) strbuf_addf(&buf, " (mode change %06o => %06o)", @@ -141,7 +147,10 @@ static int read_patches(const char *range, struct string_list *list) } else if (skip_prefix(line, "@@ ", &p)) { if (!(p = strstr(p, "@@"))) die(_("invalid hunk header in inner diff")); - strbuf_addstr(&buf, p); + strbuf_addstr(&buf, "@@"); + if (current_filename && p[2]) + strbuf_addf(&buf, " %s:", current_filename); + strbuf_addstr(&buf, p + 2); } else if (!line[0]) /* * A completely blank (not ' \n', which is context) @@ -172,6 +181,7 @@ static int read_patches(const char *range, struct string_list *list) if (util) string_list_append(list, buf.buf)->util = util; strbuf_release(&buf); + free(current_filename); if (finish_command(&cp)) return -1; diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh index c277756057..d4de270979 100755 --- a/t/t3206-range-diff.sh +++ b/t/t3206-range-diff.sh @@ -110,7 +110,7 @@ test_expect_success 'changed commit' ' 14 4: a63e992 ! 4: d966c5c s/12/B/ @@ -@@ A +@@ file: A 9 10 - B @@ -169,7 +169,7 @@ test_expect_success 'changed commit with sm config' ' 14 4: a63e992 ! 4: d966c5c s/12/B/ @@ -@@ A +@@ file: A 9 10 - B @@ -203,20 +203,24 @@ test_expect_success 'renamed file' ' Zs/11/B/ Z - ## file ## + -@@ file: A + ## renamed-file ## - Z@@ A + +@@ renamed-file: A Z 8 Z 9 + Z 10 4: a63e992 ! 4: 1e6226b s/12/B/ @@ Z Zs/12/B/ Z - ## file ## + -@@ file: A + ## renamed-file ## - Z@@ A + +@@ renamed-file: A Z 9 Z 10 + Z B EOF test_cmp expected actual ' @@ -248,7 +252,7 @@ test_expect_success 'file added and later removed' ' +s/11/B/ + remove file Z Z ## file ## - Z@@ A + Z@@ file: A @@ Z 12 Z 13 @@ -310,7 +314,7 @@ test_expect_success 'dual-coloring' ' : 14 :4: d966c5c ! 4: 8add5f1 s/12/B/ :@@ - : @@ A + : @@ file: A : 9 : 10 :- BB -- 2.22.0.510.g264f2c817a
[PATCH v2 00/14] output improvements for git range-diff
It's been quite a while since I sent the RFC [1] (thanks all for the comments on that), and the series changed shapes quite a bit since the last round. Since it's been such a long time, just to remind everyone, the goal of this series is to make the range-diff output clearer, by showing information about the filenames to which the current diff belongs. In the previous round, we did this using "section headers" that include information about the current file and adding that to the outer diff's hunk headers. In this round we still keep the section headers (with slightly more information), but in addition we also add the filename to the inner diff hunk headers. In the outer diff hunk headers we then display either the section header or the inner diff hunk header using a userdiff pattern. In terms of code changes the biggest change is that we're now re-using the 'parse_git_header' function from the apply code to parse the diff headers, instead of trying to parse them with some hacky parsing code in range-diff.c. This way we are sure that the diff headers are properly parsed. I had also considered just outputting the section headers directly from 'git log', but then decided against that. Parsing the headers allows a future enhancement of range-diff, where we would allow parsing an mbox file [2]. I split the "only pass required data" commits up, in the hopes of making them easier to review, but I'm also happy to squash them if people feel like that makes it easier to review them. An added advantage of this is that we're also getting rid of things like the similarity index, which are not important in the range-diff, and are thus not represented in the "section header". One thing that did not change is that the new/deleted strings are not translated in this version either. This is similar to the regular diff output, where we also don't translate these. We can still consider translating them in the future though. [1]: https://public-inbox.org/git/20190414210933.20875-1-t.gumme...@gmail.com/ [2]: https://github.com/gitgitgadget/git/issues/207 I'm including the range-diff between this version of the series and the RFC, and a diff between the range diff and the range-diff without these changes below. Probably not useful in reviewing the code, but good to show off the changes made in this series. range-diff: -: -- > 1: ef2245edda apply: replace marc.info link with public-inbox -: -- > 2: 94578fa45c apply: only pass required data to skip_tree_prefix -: -- > 3: 988269a68e apply: only pass required data to git_header_name -: -- > 4: a2c1ef3f5f apply: only pass required data to check_header_line -: -- > 5: 0f4cfe21cb apply: only pass required data to find_name_* -: -- > 6: 7f1d7a4569 apply: only pass required data to gitdiff_* functions -: -- > 7: a5af8b0845 apply: make parse_git_header public 1: 0e678d222c = 8: 1f25bb1002 range-diff: fix function parameter indentation -: -- > 9: 01ed0f2a6a range-diff: split lines manually 2: 48716230fc ! 10: 044a79868b range-diff: don't remove funcname from inner diff @@ range-diff.c: static int read_patches(const char *range, struct string_list *lis strbuf_addch(&buf, '\n'); } continue; -- } else if (starts_with(line.buf, "@@ ")) +- } else if (starts_with(line, "@@ ")) - strbuf_addstr(&buf, "@@"); -- else if (!line.buf[0] || starts_with(line.buf, "index ")) -+ } else if (starts_with(line.buf, "@@ ")) { -+ char *skip_lineno = strstr(line.buf + 3, "@@"); -+ strbuf_remove(&line, 0, skip_lineno - line.buf); -+ strbuf_addch(&buf, ' '); -+ strbuf_addbuf(&buf, &line); -+ } else if (!line.buf[0] || starts_with(line.buf, "index ")) +- else if (!line[0] || starts_with(line, "index ")) ++ } else if (skip_prefix(line, "@@ ", &p)) { ++ if (!(p = strstr(p, "@@"))) ++ die(_("invalid hunk header in inner diff")); ++ strbuf_addstr(&buf, p); ++ } else if (!line[0] || starts_with(line, "index ")) /* * A completely blank (not ' \n', which is context) * line is not valid in a diff. We skip it + + ## t/t3206-range-diff.sh ## +@@ t/t3206-range-diff.sh: test_expect_success 'changed commit' ' + 14 + 4: a63e992 ! 4: d966c5c s/12/B/ + @@ -8,7 +8,7 @@ +- @@ ++ @@ A + 9 + 10 + - B +@@ t/t3206-range-diff.sh: test_expect_success 'changed commit with sm config' ' + 14 + 4: a63e992 ! 4: d966c5c s/12/B/ + @@ -8,7 +8,7 @@ +- @
[PATCH v2 10/14] range-diff: don't remove funcname from inner diff
When postprocessing the inner diff in range-diff, we currently replace the whole hunk header line with just "@@". This matches how 'git tbdiff' used to handle hunk headers as well. Most likely this is being done because line numbers in the hunk header are not relevant without other changes. They can for example easily change if a range is rebased, and lines are added/removed before a change that we actually care about in our ranges. However it can still be useful to have the function name that 'git diff' extracts as additional context for the change. Note that it is not guaranteed that the hunk header actually shows up in the range-diff, and this change only aims to improve the case where a hunk header would already be included in the final output. Signed-off-by: Thomas Gummerer --- range-diff.c | 8 +--- t/t3206-range-diff.sh | 6 +++--- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/range-diff.c b/range-diff.c index 916afa44c0..484b1ec5a9 100644 --- a/range-diff.c +++ b/range-diff.c @@ -113,9 +113,11 @@ static int read_patches(const char *range, struct string_list *list) strbuf_addch(&buf, '\n'); } continue; - } else if (starts_with(line, "@@ ")) - strbuf_addstr(&buf, "@@"); - else if (!line[0] || starts_with(line, "index ")) + } else if (skip_prefix(line, "@@ ", &p)) { + if (!(p = strstr(p, "@@"))) + die(_("invalid hunk header in inner diff")); + strbuf_addstr(&buf, p); + } else if (!line[0] || starts_with(line, "index ")) /* * A completely blank (not ' \n', which is context) * line is not valid in a diff. We skip it diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh index 048feaf6dd..aebd4e3693 100755 --- a/t/t3206-range-diff.sh +++ b/t/t3206-range-diff.sh @@ -110,7 +110,7 @@ test_expect_success 'changed commit' ' 14 4: a63e992 ! 4: d966c5c s/12/B/ @@ -8,7 +8,7 @@ -@@ +@@ A 9 10 - B @@ -169,7 +169,7 @@ test_expect_success 'changed commit with sm config' ' 14 4: a63e992 ! 4: d966c5c s/12/B/ @@ -8,7 +8,7 @@ -@@ +@@ A 9 10 - B @@ -231,7 +231,7 @@ test_expect_success 'dual-coloring' ' : 14 :4: d966c5c ! 4: 8add5f1 s/12/B/ :@@ -8,7 +8,7 @@ - : @@ + : @@ A : 9 : 10 :- BB -- 2.22.0.510.g264f2c817a
[PATCH v2 03/14] apply: only pass required data to git_header_name
Currently the 'git_header_name()' function takes 'struct apply_state' as parameter, even though it only needs the p_value from that struct. This function is in the callchain of 'parse_git_header()', which we want to make more generally useful in a subsequent commit. To make that happen we only want to pass in the required data to 'parse_git_header()', and not the whole 'struct apply_state', and thus we want functions in the callchain of 'parse_git_header()' to only take arguments they really need. Signed-off-by: Thomas Gummerer --- apply.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/apply.c b/apply.c index fc7083fcbc..ac668e754d 100644 --- a/apply.c +++ b/apply.c @@ -1164,7 +1164,7 @@ static const char *skip_tree_prefix(int p_value, * creation or deletion of an empty file. In any of these cases, * both sides are the same name under a/ and b/ respectively. */ -static char *git_header_name(struct apply_state *state, +static char *git_header_name(int p_value, const char *line, int llen) { @@ -1184,7 +1184,7 @@ static char *git_header_name(struct apply_state *state, goto free_and_fail1; /* strip the a/b prefix including trailing slash */ - cp = skip_tree_prefix(state->p_value, first.buf, first.len); + cp = skip_tree_prefix(p_value, first.buf, first.len); if (!cp) goto free_and_fail1; strbuf_remove(&first, 0, cp - first.buf); @@ -1201,7 +1201,7 @@ static char *git_header_name(struct apply_state *state, if (*second == '"') { if (unquote_c_style(&sp, second, NULL)) goto free_and_fail1; - cp = skip_tree_prefix(state->p_value, sp.buf, sp.len); + cp = skip_tree_prefix(p_value, sp.buf, sp.len); if (!cp) goto free_and_fail1; /* They must match, otherwise ignore */ @@ -1212,7 +1212,7 @@ static char *git_header_name(struct apply_state *state, } /* unquoted second */ - cp = skip_tree_prefix(state->p_value, second, line + llen - second); + cp = skip_tree_prefix(p_value, second, line + llen - second); if (!cp) goto free_and_fail1; if (line + llen - cp != first.len || @@ -1227,7 +1227,7 @@ static char *git_header_name(struct apply_state *state, } /* unquoted first name */ - name = skip_tree_prefix(state->p_value, line, llen); + name = skip_tree_prefix(p_value, line, llen); if (!name) return NULL; @@ -1243,7 +1243,7 @@ static char *git_header_name(struct apply_state *state, if (unquote_c_style(&sp, second, NULL)) goto free_and_fail2; - np = skip_tree_prefix(state->p_value, sp.buf, sp.len); + np = skip_tree_prefix(p_value, sp.buf, sp.len); if (!np) goto free_and_fail2; @@ -1287,7 +1287,7 @@ static char *git_header_name(struct apply_state *state, */ if (!name[len + 1]) return NULL; /* no postimage name */ - second = skip_tree_prefix(state->p_value, name + len + 1, + second = skip_tree_prefix(p_value, name + len + 1, line_len - (len + 1)); if (!second) return NULL; @@ -1333,7 +1333,7 @@ static int parse_git_header(struct apply_state *state, * or removing or adding empty files), so we get * the default name from the header. */ - patch->def_name = git_header_name(state, line, len); + patch->def_name = git_header_name(state->p_value, line, len); if (patch->def_name && state->root.len) { char *s = xstrfmt("%s%s", state->root.buf, patch->def_name); free(patch->def_name); -- 2.22.0.510.g264f2c817a
[PATCH v2 05/14] apply: only pass required data to find_name_*
Currently the 'find_name_*()' functions take 'struct apply_state' as parameter, even though they only need the 'root' member from that struct. These functions are in the callchain of 'parse_git_header()', which we want to make more generally useful in a subsequent commit. To make that happen we only want to pass in the required data to 'parse_git_header()', and not the whole 'struct apply_state', and thus we want functions in the callchain of 'parse_git_header()' to only take arguments they really need. Signed-off-by: Thomas Gummerer --- apply.c | 48 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/apply.c b/apply.c index 1602fd5db0..3cd4e3d3b3 100644 --- a/apply.c +++ b/apply.c @@ -469,7 +469,7 @@ static char *squash_slash(char *name) return name; } -static char *find_name_gnu(struct apply_state *state, +static char *find_name_gnu(struct strbuf *root, const char *line, int p_value) { @@ -495,8 +495,8 @@ static char *find_name_gnu(struct apply_state *state, } strbuf_remove(&name, 0, cp - name.buf); - if (state->root.len) - strbuf_insert(&name, 0, state->root.buf, state->root.len); + if (root->len) + strbuf_insert(&name, 0, root->buf, root->len); return squash_slash(strbuf_detach(&name, NULL)); } @@ -659,7 +659,7 @@ static size_t diff_timestamp_len(const char *line, size_t len) return line + len - end; } -static char *find_name_common(struct apply_state *state, +static char *find_name_common(struct strbuf *root, const char *line, const char *def, int p_value, @@ -702,30 +702,30 @@ static char *find_name_common(struct apply_state *state, return squash_slash(xstrdup(def)); } - if (state->root.len) { - char *ret = xstrfmt("%s%.*s", state->root.buf, len, start); + if (root->len) { + char *ret = xstrfmt("%s%.*s", root->buf, len, start); return squash_slash(ret); } return squash_slash(xmemdupz(start, len)); } -static char *find_name(struct apply_state *state, +static char *find_name(struct strbuf *root, const char *line, char *def, int p_value, int terminate) { if (*line == '"') { - char *name = find_name_gnu(state, line, p_value); + char *name = find_name_gnu(root, line, p_value); if (name) return name; } - return find_name_common(state, line, def, p_value, NULL, terminate); + return find_name_common(root, line, def, p_value, NULL, terminate); } -static char *find_name_traditional(struct apply_state *state, +static char *find_name_traditional(struct strbuf *root, const char *line, char *def, int p_value) @@ -734,7 +734,7 @@ static char *find_name_traditional(struct apply_state *state, size_t date_len; if (*line == '"') { - char *name = find_name_gnu(state, line, p_value); + char *name = find_name_gnu(root, line, p_value); if (name) return name; } @@ -742,10 +742,10 @@ static char *find_name_traditional(struct apply_state *state, len = strchrnul(line, '\n') - line; date_len = diff_timestamp_len(line, len); if (!date_len) - return find_name_common(state, line, def, p_value, NULL, TERM_TAB); + return find_name_common(root, line, def, p_value, NULL, TERM_TAB); len -= date_len; - return find_name_common(state, line, def, p_value, line + len, 0); + return find_name_common(root, line, def, p_value, line + len, 0); } /* @@ -759,7 +759,7 @@ static int guess_p_value(struct apply_state *state, const char *nameline) if (is_dev_null(nameline)) return -1; - name = find_name_traditional(state, nameline, NULL, 0); + name = find_name_traditional(&state->root, nameline, NULL, 0); if (!name) return -1; cp = strchr(name, '/'); @@ -883,17 +883,17 @@ static int parse_traditional_patch(struct apply_state *state, if (is_dev_null(first)) { patch->is_new = 1; patch->is_delete = 0; - name = find_name_traditional(state, second, NULL, state->p_value); + name = find_name_traditional(&state->root, second, NULL, state->p_value); patch->new_name = name; } else if (is_dev_null(second)) { patch->is_new = 0; patch->is_delete = 1; - name = find_name_traditional(sta
[PATCH v2 04/14] apply: only pass required data to check_header_line
Currently the 'check_header_line()' function takes 'struct apply_state' as parameter, even though it only needs the linenr from that struct. This function is in the callchain of 'parse_git_header()', which we want to make more generally useful in a subsequent commit. To make that happen we only want to pass in the required data to 'parse_git_header()', and not the whole 'struct apply_state', and thus we want functions in the callchain of 'parse_git_header()' to only take arguments they really need. Signed-off-by: Thomas Gummerer --- apply.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/apply.c b/apply.c index ac668e754d..1602fd5db0 100644 --- a/apply.c +++ b/apply.c @@ -1302,15 +1302,15 @@ static char *git_header_name(int p_value, } } -static int check_header_line(struct apply_state *state, struct patch *patch) +static int check_header_line(int linenr, struct patch *patch) { int extensions = (patch->is_delete == 1) + (patch->is_new == 1) + (patch->is_rename == 1) + (patch->is_copy == 1); if (extensions > 1) return error(_("inconsistent header lines %d and %d"), -patch->extension_linenr, state->linenr); +patch->extension_linenr, linenr); if (extensions && !patch->extension_linenr) - patch->extension_linenr = state->linenr; + patch->extension_linenr = linenr; return 0; } @@ -1380,7 +1380,7 @@ static int parse_git_header(struct apply_state *state, res = p->fn(state, line + oplen, patch); if (res < 0) return -1; - if (check_header_line(state, patch)) + if (check_header_line(state->linenr, patch)) return -1; if (res > 0) return offset; -- 2.22.0.510.g264f2c817a
[PATCH v2 14/14] range-diff: add headers to the outer hunk header
Add the section headers/hunk headers we introduced in the previous commits to the outer diff's hunk headers. This makes it easier to understand which change we are actually looking at. For example an outer hunk header might now look like: @@ Documentation/config/interactive.txt while previously it would have only been @@ which doesn't give a lot of context for the change that follows. For completeness also add section headers for the commit metadata and the commit message, although they are arguably less important. Signed-off-by: Thomas Gummerer --- range-diff.c | 9 ++--- t/t3206-range-diff.sh | 41 ++--- 2 files changed, 28 insertions(+), 22 deletions(-) diff --git a/range-diff.c b/range-diff.c index 09cb1ddbb1..f43b229031 100644 --- a/range-diff.c +++ b/range-diff.c @@ -134,8 +134,10 @@ static int read_patches(const char *range, struct string_list *list) strbuf_addstr(&buf, " ##"); } else if (in_header) { if (starts_with(line, "Author: ")) { + strbuf_addstr(&buf, " ## Metadata ##\n"); strbuf_addstr(&buf, line); strbuf_addstr(&buf, "\n\n"); + strbuf_addstr(&buf, " ## Commit message ##\n"); } else if (starts_with(line, "")) { p = line + len - 2; while (isspace(*p) && p >= line) @@ -396,8 +398,9 @@ static void output_pair_header(struct diff_options *diffopt, fwrite(buf->buf, buf->len, 1, diffopt->file); } -static struct userdiff_driver no_func_name = { - .funcname = { "$^", 0 } +static struct userdiff_driver section_headers = { + .funcname = { "^ ## (.*) ##$\n" + "^.?@@ (.*)$", REG_EXTENDED } }; static struct diff_filespec *get_filespec(const char *name, const char *p) @@ -409,7 +412,7 @@ static struct diff_filespec *get_filespec(const char *name, const char *p) spec->size = strlen(p); spec->should_munmap = 0; spec->is_stdin = 1; - spec->driver = &no_func_name; + spec->driver = §ion_headers; return spec; } diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh index d4de270979..ec548654ce 100755 --- a/t/t3206-range-diff.sh +++ b/t/t3206-range-diff.sh @@ -99,7 +99,7 @@ test_expect_success 'changed commit' ' 1: 4de457d = 1: a4b s/5/A/ 2: fccce22 = 2: f51d370 s/4/A/ 3: 147e64e ! 3: 0559556 s/11/B/ - @@ + @@ file: A 9 10 -11 @@ -109,7 +109,7 @@ test_expect_success 'changed commit' ' 13 14 4: a63e992 ! 4: d966c5c s/12/B/ - @@ + @@ file @@ file: A 9 10 @@ -158,7 +158,7 @@ test_expect_success 'changed commit with sm config' ' 1: 4de457d = 1: a4b s/5/A/ 2: fccce22 = 2: f51d370 s/4/A/ 3: 147e64e ! 3: 0559556 s/11/B/ - @@ + @@ file: A 9 10 -11 @@ -168,7 +168,7 @@ test_expect_success 'changed commit with sm config' ' 13 14 4: a63e992 ! 4: d966c5c s/12/B/ - @@ + @@ file @@ file: A 9 10 @@ -186,9 +186,10 @@ test_expect_success 'renamed file' ' sed s/Z/\ /g >expected <<-EOF && 1: 4de457d = 1: f258d75 s/5/A/ 2: fccce22 ! 2: 017b62d s/4/A/ - @@ + @@ Metadata ZAuthor: Thomas Rast Z + Z ## Commit message ## -s/4/A/ +s/4/A/ + rename file Z @@ -198,8 +199,8 @@ test_expect_success 'renamed file' ' Z 1 Z 2 3: 147e64e ! 3: 3ce7af6 s/11/B/ - @@ - Z + @@ Metadata + Z ## Commit message ## Zs/11/B/ Z - ## file ## @@ -210,8 +211,8 @@ test_expect_success 'renamed file' ' Z 9 Z 10 4: a63e992 ! 4: 1e6226b s/12/B/ - @@ - Z + @@ Metadata + Z ## Commit message ## Zs/12/B/ Z - ## file ## @@ -230,30 +231,32 @@ test_expect_success 'file added and later removed' ' sed s/Z/\ /g >expected <<-EOF && 1: 4de457d = 1: 096b1ba s/5/A/ 2: fccce22 ! 2: d92e698 s/4/A/ - @@ + @@ Metadata ZAuthor: Thomas Rast Z + Z ## Commit message ## -s/4/A/ +s/4/A/ + new-file Z Z ## file ## Z@@ - @@ + @@ file Z A Z 6 Z 7 + + ## new-file (new) ## 3: 147e64e ! 3: 9a
[PATCH v2 01/14] apply: replace marc.info link with public-inbox
public-inbox.org links include the whole message ID by default. This means the message can still be found even if the site goes away, which is not the case with the marc.info link. Replace the marc.info link with a more future proof one. Signed-off-by: Thomas Gummerer --- apply.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/apply.c b/apply.c index f15afa9f6a..599cf8956f 100644 --- a/apply.c +++ b/apply.c @@ -478,7 +478,7 @@ static char *find_name_gnu(struct apply_state *state, /* * Proposed "new-style" GNU patch/diff format; see -* http://marc.info/?l=git&m=112927316408690&w=2 +* https://public-inbox.org/git/7vll0wvb2a@assigned-by-dhcp.cox.net/ */ if (unquote_c_style(&name, line, NULL)) { strbuf_release(&name); -- 2.22.0.510.g264f2c817a
[PATCH v2 08/14] range-diff: fix function parameter indentation
Fix the indentation of the function parameters for a couple of functions, to match the style in the rest of the file. Signed-off-by: Thomas Gummerer --- range-diff.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/range-diff.c b/range-diff.c index 48b0e1b4ce..9242b8975f 100644 --- a/range-diff.c +++ b/range-diff.c @@ -148,7 +148,7 @@ static int read_patches(const char *range, struct string_list *list) } static int patch_util_cmp(const void *dummy, const struct patch_util *a, -const struct patch_util *b, const char *keydata) + const struct patch_util *b, const char *keydata) { return strcmp(a->diff, keydata ? keydata : b->diff); } @@ -373,7 +373,7 @@ static struct diff_filespec *get_filespec(const char *name, const char *p) } static void patch_diff(const char *a, const char *b, - struct diff_options *diffopt) + struct diff_options *diffopt) { diff_queue(&diff_queued_diff, get_filespec("a", a), get_filespec("b", b)); -- 2.22.0.510.g264f2c817a
[PATCH v2 09/14] range-diff: split lines manually
Currently range-diff uses the 'strbuf_getline()' function for doing its line by line processing. In a future patch we want to do parts of that parsing using the 'parse_git_header()' function, which does requires reading parts of the input from that function, which doesn't use strbufs. Switch range-diff to do our own line by line parsing, so we can re-use the parse_git_header function later. Signed-off-by: Thomas Gummerer --- Longer term it might be better to have both range-diff and apply code use strbufs. However I didn't feel it's worth making that change for this patch series. range-diff.c | 69 +--- 1 file changed, 39 insertions(+), 30 deletions(-) diff --git a/range-diff.c b/range-diff.c index 9242b8975f..916afa44c0 100644 --- a/range-diff.c +++ b/range-diff.c @@ -24,6 +24,17 @@ struct patch_util { struct object_id oid; }; +static unsigned long linelen(const char *buffer, unsigned long size) +{ + unsigned long len = 0; + while (size--) { + len++; + if (*buffer++ == '\n') + break; + } + return len; +} + /* * Reads the patches into a string list, with the `util` field being populated * as struct object_id (will need to be free()d). @@ -31,10 +42,12 @@ struct patch_util { static int read_patches(const char *range, struct string_list *list) { struct child_process cp = CHILD_PROCESS_INIT; - FILE *in; - struct strbuf buf = STRBUF_INIT, line = STRBUF_INIT; + struct strbuf buf = STRBUF_INIT, file = STRBUF_INIT; struct patch_util *util = NULL; int in_header = 1; + char *line; + int offset, len; + size_t size; argv_array_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges", "--reverse", "--date-order", "--decorate=no", @@ -54,17 +67,15 @@ static int read_patches(const char *range, struct string_list *list) if (start_command(&cp)) return error_errno(_("could not start `log`")); - in = fdopen(cp.out, "r"); - if (!in) { - error_errno(_("could not read `log` output")); - finish_command(&cp); - return -1; - } + strbuf_read(&file, cp.out, 0); - while (strbuf_getline(&line, in) != EOF) { + line = strbuf_detach(&file, &size); + for (offset = 0; size > 0; offset += len, size -= len, line += len) { const char *p; - if (skip_prefix(line.buf, "commit ", &p)) { + len = linelen(line, size); + line[len - 1] = '\0'; + if (skip_prefix(line, "commit ", &p)) { if (util) { string_list_append(list, buf.buf)->util = util; strbuf_reset(&buf); @@ -75,8 +86,6 @@ static int read_patches(const char *range, struct string_list *list) free(util); string_list_clear(list, 1); strbuf_release(&buf); - strbuf_release(&line); - fclose(in); finish_command(&cp); return -1; } @@ -85,26 +94,28 @@ static int read_patches(const char *range, struct string_list *list) continue; } - if (starts_with(line.buf, "diff --git")) { + if (starts_with(line, "diff --git")) { in_header = 0; strbuf_addch(&buf, '\n'); if (!util->diff_offset) util->diff_offset = buf.len; strbuf_addch(&buf, ' '); - strbuf_addbuf(&buf, &line); + strbuf_addstr(&buf, line); } else if (in_header) { - if (starts_with(line.buf, "Author: ")) { - strbuf_addbuf(&buf, &line); + if (starts_with(line, "Author: ")) { + strbuf_addstr(&buf, line); strbuf_addstr(&buf, "\n\n"); - } else if (starts_with(line.buf, "")) { - strbuf_rtrim(&line); - strbuf_addbuf(&buf, &line); + } else if (starts_with(line, "")) { + p = line + len - 2; + while (isspace(*p) && p >= line) + p--; + strbuf_add(&buf, line, p - line + 1); strbuf_addch(&buf, '\n'); } continue; - } else if (starts_with(line.buf, "@@ ")) + } else if (starts_with(line, "@
[PATCH v2 02/14] apply: only pass required data to skip_tree_prefix
Currently the 'skip_tree_prefix()' function takes 'struct apply_state' as parameter, even though it only needs the p_value from that struct. This function is in the callchain of 'parse_git_header()', which we want to make more generally useful in a subsequent commit. To make that happen we only want to pass in the required data to 'parse_git_header()', and not the whole 'struct apply_state', and thus we want functions in the callchain of 'parse_git_header()' to only take arguments they really need. Signed-off-by: Thomas Gummerer --- apply.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/apply.c b/apply.c index 599cf8956f..fc7083fcbc 100644 --- a/apply.c +++ b/apply.c @@ -1137,17 +1137,17 @@ static int gitdiff_unrecognized(struct apply_state *state, * Skip p_value leading components from "line"; as we do not accept * absolute paths, return NULL in that case. */ -static const char *skip_tree_prefix(struct apply_state *state, +static const char *skip_tree_prefix(int p_value, const char *line, int llen) { int nslash; int i; - if (!state->p_value) + if (!p_value) return (llen && line[0] == '/') ? NULL : line; - nslash = state->p_value; + nslash = p_value; for (i = 0; i < llen; i++) { int ch = line[i]; if (ch == '/' && --nslash <= 0) @@ -1184,7 +1184,7 @@ static char *git_header_name(struct apply_state *state, goto free_and_fail1; /* strip the a/b prefix including trailing slash */ - cp = skip_tree_prefix(state, first.buf, first.len); + cp = skip_tree_prefix(state->p_value, first.buf, first.len); if (!cp) goto free_and_fail1; strbuf_remove(&first, 0, cp - first.buf); @@ -1201,7 +1201,7 @@ static char *git_header_name(struct apply_state *state, if (*second == '"') { if (unquote_c_style(&sp, second, NULL)) goto free_and_fail1; - cp = skip_tree_prefix(state, sp.buf, sp.len); + cp = skip_tree_prefix(state->p_value, sp.buf, sp.len); if (!cp) goto free_and_fail1; /* They must match, otherwise ignore */ @@ -1212,7 +1212,7 @@ static char *git_header_name(struct apply_state *state, } /* unquoted second */ - cp = skip_tree_prefix(state, second, line + llen - second); + cp = skip_tree_prefix(state->p_value, second, line + llen - second); if (!cp) goto free_and_fail1; if (line + llen - cp != first.len || @@ -1227,7 +1227,7 @@ static char *git_header_name(struct apply_state *state, } /* unquoted first name */ - name = skip_tree_prefix(state, line, llen); + name = skip_tree_prefix(state->p_value, line, llen); if (!name) return NULL; @@ -1243,7 +1243,7 @@ static char *git_header_name(struct apply_state *state, if (unquote_c_style(&sp, second, NULL)) goto free_and_fail2; - np = skip_tree_prefix(state, sp.buf, sp.len); + np = skip_tree_prefix(state->p_value, sp.buf, sp.len); if (!np) goto free_and_fail2; @@ -1287,7 +1287,7 @@ static char *git_header_name(struct apply_state *state, */ if (!name[len + 1]) return NULL; /* no postimage name */ - second = skip_tree_prefix(state, name + len + 1, + second = skip_tree_prefix(state->p_value, name + len + 1, line_len - (len + 1)); if (!second) return NULL; -- 2.22.0.510.g264f2c817a
Hi
I need your help
Re: mt/dir-iterator-updates, was Re: What's cooking in git.git (Jul 2019, #01; Wed, 3)
Hi Matheus, On Fri, 5 Jul 2019, Matheus Tavares Bernardino wrote: > On Thu, Jul 4, 2019 at 6:30 PM Johannes Schindelin > wrote: > > > > Hi Matheus, > > > > On Thu, 4 Jul 2019, Matheus Tavares Bernardino wrote: > > > > > > I wanted to take a look at the failures to see if I could help, [...] > > > Could you point me to the right place, please? > [...] > > > > I usually click on the "Tests" tab in that page: > > https://dev.azure.com/gitgitgadget/git/_build/results?buildId=11495&view=ms.vss-test-web.build-test-results-tab > > > > You can click on any of the 1,384 (!) failing test cases, it will pop up a > > pane on the right-hand side that shows the trace of that failing test > > case. For the full trace of that test script, go to "Attachments" and > > download the `Standard_Error_Output.log` (via the horizontal bread-crumbs > > menu you can see when hovering over the file name). > > Thanks for the explanation! I inspected some of the > `Standard_Error_Output.log`'s and it seems the problem is always with > local clone (which started to use dir-iterator in this series). It > seems all .git/objects/ dirs are being ignored. That makes sense, > since st_ino will always be 0 on Windows. But your fixup patch should > solve this. Is there any azure build for it? There is no direct Azure Pipeline run for it, as I have not created a branch with the fixup on top of your branch. But the shears/pu branch in https://github.com/git-for-windows/git/branches does have the fixup, and passes all tests. > [...] > > > > > > Hm, I think `stat()` itself uses this strategy of an arbitrary cut-off > > > when resolving a path. So we may also "ignore" circular symlinks and > > > let the iteration continue until the point where `stat()` will return > > > an ELOOP. (But it may be expensive...) > > > > This would not be equivalent, though, as your code also tried to address > > circular references when two or more symlinks are involved, e.g. when > > one symlink points to a directory that has another symlink that points to > > the directory containing the first symlink. > > Hm, `stat()` also addresses this case doesn't it? For example: > > $ mkdir a b > $ ln -s ../a b/s2a > $ ln -s ../b a/s2b > $ stat b/s2a/s2b/s2a/.../s2b > > Gives me: > "too many levels of symbolic links" Okay, then. Even better. (With the caveat that I don't know how ubiquitous this behavior is, I assume you only tested on Linux.) > > > > Do we _have_ to, though? At some stage the path we come up with is > > > > beyond > > > > `PATH_MAX` and we can stop right then and there. > > > > > > > > Besides, the way `find_recursive_symlinks()` is implemented adds > > > > quadratic > > > > behavior. > > > > > > Yes, indeed. But it only happens when we have a path like this: > > > `symlink_to_dir1/symlink_to_dir2/symlink_to_dir3/symlink_to_dir4/...`, > > > right? I think this case won't be very usual on actual filesystems, > > > thought. > > > > No, the check is performed in a loop, and that loop is executed whether > > you have symlinks or not. That loop is executed for every item in the > > iteration, and as we cannot assume a flat directory in general (in fact, > > we should assume a directory depth proportional to the total number of > > files), that adds that quadratic behavior. > > Oh, you're right. Sorry for the noise, I forgot this function was not > only called for symlinks but for every directory entry :( > > An alternative solution would be to use `lstat()` together with > `stat()` to only feed symlinked dirs to this function. This should > reduce a lot the number of calls. Although it'd still be quadratic in > the worst case, would that be any good? Why not just skip this logic? At least for now? It really blocks the development of this patch series, causing `pu` to be broken until the test failures are resolved. > [...] > > > > But I also think there are enough > > > > reasons to do away with this function in the first place. > > > > > > We can delegate the circular symlinks problem to `stat()'s ELOOP` > > > > Not really. I mean, we _already_ delegate to the `ELOOP` condition, we > > cannot even avoid it as long as we keep using `stat()` instead of > > `lstat()` > > Yes, indeed. What I meant is that we may chose to _fully_ delegate to > ELOOP. The way it is now, we should detect circular symlinks way > before stat() fails. For example, with the case I showed above, we > would stop at "b/s2a/s2b" whereas stat() would only fail at a much > longer "b/s2a/s2b/s2a/s2b/...", far beyond in the iteration. Sounds like the solution to me that I wanted: drop the special code to detect circular symlinks. In other words: I like that idea. > > > The only downside is the overhead of iterating through directories which > > > will be latter discarded for being in circular symlinks' chains. I mean, > > > the overhead at dir-iterator shouldn't be much, but the one on the code > > > using this API to do something for each of these directories (and its > > > contents), m
Re: [PATCH v2 07/14] apply: make parse_git_header public
Hi Thomas, On Fri, 5 Jul 2019, Thomas Gummerer wrote: > Make parse_git_header a "public" function in apply.h, so we can re-use > it in range-diff in a subsequent commit. This function probably wants a different name when it becomes public, as it was relatively obvious inside `apply.c` that it was talking about a Git _diff_ header. Maybe `parse_git_diff_header()`? > @@ -127,6 +161,20 @@ int init_apply_state(struct apply_state *state, > void clear_apply_state(struct apply_state *state); > int check_apply_state(struct apply_state *state, int force_apply); > > +/* > + * Parse a get header, starting at line. Fills the relevant metadata s/get/git/ Ciao, Dscho > + * information in 'struct patch'. > + * > + * Returns -1 on failure, the length of the parsed header otherwise. > + */ > +int parse_git_header(struct strbuf *root, > + int *linenr, > + int p_value, > + const char *line, > + int len, > + unsigned int size, > + struct patch *patch); > + > /* > * Some aspects of the apply behavior are controlled by the following > * bits in the "options" parameter passed to apply_all_patches(). > -- > 2.22.0.510.g264f2c817a > >
Re: [PATCH v2 06/14] apply: only pass required data to gitdiff_* functions
Hi Thomas, On Fri, 5 Jul 2019, Thomas Gummerer wrote: > Currently the 'gitdiff_*()' functions take 'struct apply_state' as > parameter, even though they only needs the root, linenr and p_value > from that struct. > > These functions are in the callchain of 'parse_git_header()', which we > want to make more generally useful in a subsequent commit. To make > that happen we only want to pass in the required data to > 'parse_git_header()', and not the whole 'struct apply_state', and thus > we want functions in the callchain of 'parse_git_header()' to only > take arguments they really need. This commit message follows the exact pattern of the previous patches (and they were pretty convincing to me), but... > diff --git a/apply.c b/apply.c > index 3cd4e3d3b3..468f1d3fee 100644 > --- a/apply.c > +++ b/apply.c > @@ -22,6 +22,12 @@ > #include "rerere.h" > #include "apply.h" > > +struct parse_git_header_state { > + struct strbuf *root; > + int linenr; > + int p_value; > +}; > + > static void git_apply_config(void) > { > git_config_get_string_const("apply.whitespace", > &apply_default_whitespace); > @@ -914,7 +920,7 @@ static int parse_traditional_patch(struct apply_state > *state, > return 0; > } > > -static int gitdiff_hdrend(struct apply_state *state, > +static int gitdiff_hdrend(struct parse_git_header_state *state, ... here we pass a different, newly-invented struct instead of passing the relevant information explicitly. My guess is that the code would look horribly verbose if we started passing three instead of one parameter? If that is the case, I think a little side note in the commit message might be warranted. Ciao, Dscho
Re: Handling text files encoded in little-endian UTF-16 with BOM
On Fri, 5 Jul 2019 at 18:25, Torsten Bögershausen wrote: > On Fri, Jul 05, 2019 at 01:35:13PM +0200, Mateusz Loskot wrote: > > > > Using git version 2.22.0.windows.1 > > > > I have a repository with number of .txt files encoded in > > little-endian UTF-16 with BOM. > > > > What are the best practice and recommended configuration to > > manage such files with Git to avoid unexpected re-encoding to > > UTF-8 or others? > > > > Currently, there is .gitattriuts with entries like > > > >resource/*.txt working-tree-encoding=UTF-16LE-BOM -text > > > > Despite that some of team members have noticed that the files > > occacionally get re-encoded to UTF-8. It is unknow what are > > actual steps leading to that. BTW, there a few Git clients > > in use: git in Git Bash, VSCode, Fork. > > If possible, I don't want to comment on this kind of > "sometimes something happens something on someones computer" thing. Perfectly understood. > A little bit more information could be helpful. If there was more, I'd have provided. > > What bothers me in the .gitattributes is this `-text` attribute. > > > > Is the use of `working-tree-encoding` and `-text` together a > > valid combination at all? > > Yes, it means that the content re-encoded between the repo and the working > tree, > (that is what you want) > And the "-text" means "leave the line endings" (LF or CRLF) as is, don't > convert them. That's quite a useful insight. I understood "-text" means content is not a text, but binary. > In that sense you can call that a legal combination, but may be not a > recommended one. Right. > > The documentation at https://git-scm.com/docs/gitattributes > > does not seem to touch on that. > > > > I'll appreciate any suggestions on those UTF-16LE-BOM files. > > > > My suggestion would be to use the "text" attribute: > resource/*.txt working-tree-encoding=UTF-16LE-BOM text > > And depending on your application: Do the resource files need a special line > ending ? I need CRLF. > The use either > resource/*.txt working-tree-encoding=UTF-16LE-BOM text eol=LF > or > resource/*.txt working-tree-encoding=UTF-16LE-BOM text eol=CRLF This is very helpful. Thanks! -- Mateusz Loskot, http://mateusz.loskot.net
Re: [PATCH v2 09/14] range-diff: split lines manually
Hi Thomas, On Fri, 5 Jul 2019, Thomas Gummerer wrote: > Currently range-diff uses the 'strbuf_getline()' function for doing > its line by line processing. In a future patch we want to do parts of > that parsing using the 'parse_git_header()' function, which does If you like my suggestion in patch 7/14, this commit message needs to talk about the new name, too. > requires reading parts of the input from that function, which doesn't s/requires/require/ > use strbufs. > > Switch range-diff to do our own line by line parsing, so we can re-use > the parse_git_header function later. > > Signed-off-by: Thomas Gummerer > --- > > Longer term it might be better to have both range-diff and apply code > use strbufs. However I didn't feel it's worth making that change for > this patch series. Makes sense. > range-diff.c | 69 +--- > 1 file changed, 39 insertions(+), 30 deletions(-) > > diff --git a/range-diff.c b/range-diff.c > index 9242b8975f..916afa44c0 100644 > --- a/range-diff.c > +++ b/range-diff.c > @@ -24,6 +24,17 @@ struct patch_util { > struct object_id oid; > }; > > +static unsigned long linelen(const char *buffer, unsigned long size) Shouldn't this be `size_t`? > +{ > + unsigned long len = 0; Likewise. > + while (size--) { > + len++; > + if (*buffer++ == '\n') > + break; > + } > + return len; How about const char *eol = memchr(buffer, '\n', size); return !eol ? size : eol + 1 - buffer; instead? For an extra brownie point, you could even rename this function to `find_end_of_line()` and replace the LF by a NUL: if (!eol) return size; *eol = '\0'; return eol + 1 - buffer; > +} > + > /* > * Reads the patches into a string list, with the `util` field being > populated > * as struct object_id (will need to be free()d). > @@ -31,10 +42,12 @@ struct patch_util { > static int read_patches(const char *range, struct string_list *list) > { > struct child_process cp = CHILD_PROCESS_INIT; > - FILE *in; > - struct strbuf buf = STRBUF_INIT, line = STRBUF_INIT; > + struct strbuf buf = STRBUF_INIT, file = STRBUF_INIT; This puzzled me. I'd like to suggest s/file/contents/ > struct patch_util *util = NULL; > int in_header = 1; > + char *line; > + int offset, len; > + size_t size; > > argv_array_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges", > "--reverse", "--date-order", "--decorate=no", > @@ -54,17 +67,15 @@ static int read_patches(const char *range, struct > string_list *list) > > if (start_command(&cp)) > return error_errno(_("could not start `log`")); > - in = fdopen(cp.out, "r"); > - if (!in) { > - error_errno(_("could not read `log` output")); > - finish_command(&cp); > - return -1; > - } > + strbuf_read(&file, cp.out, 0); Shouldn't we handle a negative return value here, erroring out with "could not read `log` output" as before? > > - while (strbuf_getline(&line, in) != EOF) { > + line = strbuf_detach(&file, &size); I strongly suspect this to leak, given that `line` is subsequently advanced, and there is no backup copy. Maybe line = file.buf; size = file.len; would make more sense here? > + for (offset = 0; size > 0; offset += len, size -= len, line += len) { > const char *p; > > - if (skip_prefix(line.buf, "commit ", &p)) { > + len = linelen(line, size); > + line[len - 1] = '\0'; > + if (skip_prefix(line, "commit ", &p)) { > if (util) { > string_list_append(list, buf.buf)->util = util; > strbuf_reset(&buf); > @@ -75,8 +86,6 @@ static int read_patches(const char *range, struct > string_list *list) > free(util); > string_list_clear(list, 1); > strbuf_release(&buf); > - strbuf_release(&line); > - fclose(in); We should release the file contents in `file` (or `contents`, if you like my suggestions) here. > finish_command(&cp); > return -1; > } > @@ -85,26 +94,28 @@ static int read_patches(const char *range, struct > string_list *list) > continue; > } > > - if (starts_with(line.buf, "diff --git")) { > + if (starts_with(line, "diff --git")) { > in_header = 0; > strbuf_addch(&buf, '\n'); > if (!util->diff_offset) > util->diff_offset = buf.len; > strbuf_addch(&buf, ' '); > - st
Re: [PATCH v2 10/14] range-diff: don't remove funcname from inner diff
Hi Thomas, On Fri, 5 Jul 2019, Thomas Gummerer wrote: > When postprocessing the inner diff in range-diff, we currently replace > the whole hunk header line with just "@@". This matches how 'git > tbdiff' used to handle hunk headers as well. Right, that's why I did it this way: `tbdiff` did it. > Most likely this is being done because line numbers in the hunk header > are not relevant without other changes. Yep, and even worse, it also leads to uninteresting differences. > They can for example easily change if a range is rebased, and lines are > added/removed before a change that we actually care about in our ranges. Exactly. > However it can still be useful to have the function name that 'git > diff' extracts as additional context for the change. > > Note that it is not guaranteed that the hunk header actually shows up > in the range-diff, and this change only aims to improve the case where > a hunk header would already be included in the final output. Very important paragraph here, thank you! > Signed-off-by: Thomas Gummerer > --- > range-diff.c | 8 +--- > t/t3206-range-diff.sh | 6 +++--- > 2 files changed, 8 insertions(+), 6 deletions(-) > > diff --git a/range-diff.c b/range-diff.c > index 916afa44c0..484b1ec5a9 100644 > --- a/range-diff.c > +++ b/range-diff.c > @@ -113,9 +113,11 @@ static int read_patches(const char *range, struct > string_list *list) > strbuf_addch(&buf, '\n'); > } > continue; > - } else if (starts_with(line, "@@ ")) > - strbuf_addstr(&buf, "@@"); > - else if (!line[0] || starts_with(line, "index ")) > + } else if (skip_prefix(line, "@@ ", &p)) { > + if (!(p = strstr(p, "@@"))) > + die(_("invalid hunk header in inner diff")); That's a bit more strict than the preimage. How about p = strstr(p, "@@"); strbuf_addstr(&buf, p ? p : "@@"); instead? The rest of the patch looks fine to me, Dscho > + strbuf_addstr(&buf, p); > + } else if (!line[0] || starts_with(line, "index ")) > /* >* A completely blank (not ' \n', which is context) >* line is not valid in a diff. We skip it > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh > index 048feaf6dd..aebd4e3693 100755 > --- a/t/t3206-range-diff.sh > +++ b/t/t3206-range-diff.sh > @@ -110,7 +110,7 @@ test_expect_success 'changed commit' ' > 14 > 4: a63e992 ! 4: d966c5c s/12/B/ > @@ -8,7 +8,7 @@ > - @@ > + @@ A > 9 > 10 > - B > @@ -169,7 +169,7 @@ test_expect_success 'changed commit with sm config' ' > 14 > 4: a63e992 ! 4: d966c5c s/12/B/ > @@ -8,7 +8,7 @@ > - @@ > + @@ A > 9 > 10 > - B > @@ -231,7 +231,7 @@ test_expect_success 'dual-coloring' ' > : 14 > :4: d966c5c ! 4: > 8add5f1 s/12/B/ > :@@ -8,7 +8,7 @@ > - : @@ > + : @@ A > : 9 > : 10 > :- BB > -- > 2.22.0.510.g264f2c817a > >
Re: [PATCH v2 12/14] range-diff: add section header instead of diff header
Hi Thomas, On Fri, 5 Jul 2019, Thomas Gummerer wrote: > Currently range-diff keeps the diff header of the inner diff > intact (apart from stripping lines starting with index). This diff > header is somewhat useful, especially when files get different > names in different ranges. > > However there is no real need to keep the whole diff header for that. > The main reason we currently do that is probably because it is easy to > do. > > Introduce a new range diff hunk header, that's enclosed by "##", > similar to how line numbers in diff hunks are enclosed by "@@", and > give human readable information of what exactly happened to the file, > including the file name. > > This improves the readability of the range-diff by giving more concise > information to the users. For example if a file was renamed in one > iteration, but not in another, the diff of the headers would be quite > noisy. However the diff of a single line is concise and should be > easier to understand. > > Additionaly, this allows us to add these range diff section headers to s/Additionaly/Additionally/ > the outer diffs hunk headers using a custom userdiff pattern, which > should help making the range-diff more readable. Makes sense. > Signed-off-by: Thomas Gummerer > --- > range-diff.c | 35 > t/t3206-range-diff.sh | 91 +++--- > t/t3206/history.export | 84 -- > 3 files changed, 193 insertions(+), 17 deletions(-) > > diff --git a/range-diff.c b/range-diff.c > index b31fbab026..cc01f7f573 100644 > --- a/range-diff.c > +++ b/range-diff.c > @@ -10,6 +10,7 @@ > #include "commit.h" > #include "pretty.h" > #include "userdiff.h" > +#include "apply.h" > > struct patch_util { > /* For the search for an exact match */ > @@ -95,12 +96,36 @@ static int read_patches(const char *range, struct > string_list *list) > } > > if (starts_with(line, "diff --git")) { > + struct patch patch; If you append ` = { 0 }` (or ` = { NULL }`, depending on the first field of that struct, you don't need that `memset()` later. > + struct strbuf root = STRBUF_INIT; > + int linenr = 0; > + > in_header = 0; > strbuf_addch(&buf, '\n'); > if (!util->diff_offset) > util->diff_offset = buf.len; > - strbuf_addch(&buf, ' '); > - strbuf_addstr(&buf, line); > + memset(&patch, 0, sizeof(patch)); > + line[len - 1] = '\n'; I guess `parse_git_header()` cannot handle lines ending in a NUL? > + len = parse_git_header(&root, &linenr, 1, line, > +len, size, &patch); > + if (len < 0) > + die(_("could not parse git header")); Maybe include the line's contents, like ` '%.*s'", (int)len, line`? > + strbuf_addstr(&buf, " ## "); > + if (patch.is_new > 0) > + strbuf_addf(&buf, "%s (new)", patch.new_name); > + else if (patch.is_delete > 0) > + strbuf_addf(&buf, "%s (deleted)", > patch.old_name); > + else if (patch.is_rename) > + strbuf_addf(&buf, "%s => %s", patch.old_name, > patch.new_name); > + else > + strbuf_addstr(&buf, patch.new_name); > + > + if (patch.new_mode && patch.old_mode && > + patch.old_mode != patch.new_mode) > + strbuf_addf(&buf, " (mode change %06o => %06o)", > + patch.old_mode, patch.new_mode); > + > + strbuf_addstr(&buf, " ##"); > } else if (in_header) { > if (starts_with(line, "Author: ")) { > strbuf_addstr(&buf, line); > @@ -117,17 +142,13 @@ static int read_patches(const char *range, struct > string_list *list) > if (!(p = strstr(p, "@@"))) > die(_("invalid hunk header in inner diff")); > strbuf_addstr(&buf, p); > - } else if (!line[0] || starts_with(line, "index ")) > + } else if (!line[0]) > /* >* A completely blank (not ' \n', which is context) >* line is not valid in a diff. We skip it >* silently, because this neatly handles the blank >* separator line between commits in git-log >* output. > - * > - * We also want to ignore the diff's `index` lines > - * because they contain
Re: [PATCH v2 00/14] output improvements for git range-diff
Hi Thomas, On Fri, 5 Jul 2019, Thomas Gummerer wrote: > It's been quite a while since I sent the RFC [1] (thanks all for the > comments on that), and the series changed shapes quite a bit since the > last round. > > Since it's been such a long time, just to remind everyone, the goal of > this series is to make the range-diff output clearer, by showing > information about the filenames to which the current diff belongs. Thank you for that reminder ;-) > In the previous round, we did this using "section headers" that > include information about the current file and adding that to the > outer diff's hunk headers. > > In this round we still keep the section headers (with slightly more > information), but in addition we also add the filename to the inner > diff hunk headers. In the outer diff hunk headers we then display > either the section header or the inner diff hunk header using a > userdiff pattern. I like this idea! > In terms of code changes the biggest change is that we're now re-using > the 'parse_git_header' function from the apply code to parse the diff > headers, instead of trying to parse them with some hacky parsing code > in range-diff.c. This way we are sure that the diff headers are > properly parsed. Yep, very good. > I had also considered just outputting the section headers directly > from 'git log', but then decided against that. Parsing the headers > allows a future enhancement of range-diff, where we would allow > parsing an mbox file [2]. Thanks you for your consideration; I still would like to have the option at some stage to compare a patch series from public-inbox.org/git to the commits in `pu`, without having to fiddle with finding a valid base commit to apply the patches on. > I split the "only pass required data" commits up, in the hopes of > making them easier to review, but I'm also happy to squash them if > people feel like that makes it easier to review them. I found it very easy to review in the current form, thank you for putting in the extra effort. > An added advantage of this is that we're also getting rid of things > like the similarity index, which are not important in the range-diff, > and are thus not represented in the "section header". > > One thing that did not change is that the new/deleted strings are not > translated in this version either. This is similar to the regular > diff output, where we also don't translate these. We can still > consider translating them in the future though. > > [1]: https://public-inbox.org/git/20190414210933.20875-1-t.gumme...@gmail.com/ > [2]: https://github.com/gitgitgadget/git/issues/207 > > I'm including the range-diff between this version of the series and > the RFC, and a diff between the range diff and the range-diff without > these changes below. Probably not useful in reviewing the code, but > good to show off the changes made in this series. Indeed! I very much like the idea, and the current iteration. I offered a couple of minor suggestions, in the hope that you find them helpful. Thank you, Dscho
[PATCH v1] merge - rename a shadowed variable in cmd_merge
variable ret used in cmd_merge introduced in d5a35c114ab was already a local variable used inside a for loop inside the function. for-local variable is being renamed to ret_try_merge to avoid shadow. --- builtin/merge.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/builtin/merge.c b/builtin/merge.c index 6e99aead46..972b6c376a 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -1587,7 +1587,7 @@ int cmd_merge(int argc, const char **argv, const char *prefix) oidclr(&stash); for (i = 0; i < use_strategies_nr; i++) { - int ret; + int ret_try_merge; if (i) { printf(_("Rewinding the tree to pristine...\n")); restore_state(&head_commit->object.oid, &stash); @@ -1601,26 +1601,26 @@ int cmd_merge(int argc, const char **argv, const char *prefix) */ wt_strategy = use_strategies[i]->name; - ret = try_merge_strategy(use_strategies[i]->name, -common, remoteheads, -head_commit); - if (!option_commit && !ret) { + ret_try_merge = try_merge_strategy(use_strategies[i]->name, + common, remoteheads, + head_commit); + if (!option_commit && !ret_try_merge) { merge_was_ok = 1; /* * This is necessary here just to avoid writing * the tree, but later we will *not* exit with * status code 1 because merge_was_ok is set. */ - ret = 1; + ret_try_merge = 1; } - if (ret) { + if (ret_try_merge) { /* * The backend exits with 1 when conflicts are * left to be resolved, with 2 when it does not * handle the given merge at all. */ - if (ret == 1) { + if (ret_try_merge == 1) { int cnt = evaluate_result(); if (best_cnt <= 0 || cnt <= best_cnt) { -- 2.22.0.214.g8dca754b1e
Re: Virtual Git Contributor Summit
Hi Jakub, On Fri, 5 Jul 2019, Jakub Narebski wrote: > Johannes Schindelin writes: > > > I kept talking about this idea of a purely online Git Contributor > > Summit, and it is finally time for action. > > > > The idea: just like the Git Contributor Summits we have on the day > > before GitMerge, but instead of traveling to the same place, we'll use > > an online conferencing system. > > By the way, would there be GitTogether meeting held in conjunction with > Google Summer of Code 2019 Mentors Meeting? I know of no such plans. Please note also that the GSoC summit is in Munich, Germany this year. > > Obviously, there will be more people interested in participating; Please > > do drop me an email if you're interested and I will send you a link to > > that spreadsheet (as I do not read my mails continuously, it might take a > > day, but I will, promise). > > I think it would be good idea to have some of us from Git Rev News team > to chronicle this Virtual Git Contributors Summit meeting. Sure, I forwarded the information to you (Christian already had it). Ciao, Dscho
Fwd: Discovering the new parents in commit hooks
Hi, I’d like to be able to discover the new parents-to-be SHA-1s during the hooks that run before a commit*. Essentially I’d like be able to distinguish the ‘git commit’ case from ‘git commit —amend’. Is there already a way to do this that I have overlooked? (I’ve read ‘man githooks’ and searched the wiki and various other places.) If not, I would propose that perhaps the hooks could be passed a GIT_PARENTS (or perhaps GIT_NEW_PARENTS) environment variable which in the ‘not amend’ case would contain the SHA-1 for HEAD and MERGE_HEAD (if appropriate) and in the ‘—amend' case would contain HEAD^@ (all of the parents of HEAD). [*] Specifically, it seems a useful thing to be able to find out in these hooks: applypatch-msg, pre-applypatch, commit-msg, prepare-commit-msg, commit-msg Thanks.