Re: [PATCH v2 07/21] builtin/receive-pack: convert portions to struct object_id
On Sun, Mar 26, 2017 at 04:01:29PM +, brian m. carlson wrote: > Convert some hardcoded constants into uses of parse_oid_hex. > Additionally, convert all uses of struct command, and miscellaneous > other functions necessary for that. This work is necessary to be able > to convert sha1_array_append later on. > > To avoid needing to specify a constant, reject shallow lines with the > wrong length instead of simply ignoring them. It took me a while to find it. This is the switch from "len == 48" to "len > 8" when matching "shallow" lines. I think this makes sense. > Note that in queue_command we are guaranteed to have a NUL-terminated > buffer or at least one byte of overflow that we can safely read, so the > linelen check can be elided. We would die in such a case, but not read > invalid memory. I think linelen is always just strlen(line). Since the queue_command function no longer cares about it, perhaps we can just omit it? > @@ -1541,12 +1541,12 @@ static struct command *read_head_info(struct > sha1_array *shallow) > if (!line) > break; > > - if (len == 48 && starts_with(line, "shallow ")) { > - unsigned char sha1[20]; > - if (get_sha1_hex(line + 8, sha1)) > + if (len > 8 && starts_with(line, "shallow ")) { > + struct object_id oid; > + if (get_oid_hex(line + 8, &oid)) > die("protocol error: expected shallow sha, got > '%s'", > line + 8); This would be much nicer as: if (skip_prefix(line, "shallow ", &hex)) It's probably best to keep to one type of cleanup at a time here. I'm just making a mental note. -Peff
Re: [PATCH v2 15/21] sha1-array: convert internal storage for struct sha1_array to object_id
On Sun, Mar 26, 2017 at 04:01:37PM +, brian m. carlson wrote: > Make the internal storage for struct sha1_array use an array of struct > object_id internally. Update the users of this struct which inspect its > internals. Yay. I'm happy to see all those gross (*sha1)[20] bits go away. -Peff
Re: [PATCH v2 16/21] Make sha1_array_append take a struct object_id *
On Sun, Mar 26, 2017 at 04:01:38PM +, brian m. carlson wrote: > diff --git a/transport.c b/transport.c > index 8a90b0c29b..e492757726 100644 > --- a/transport.c > +++ b/transport.c > @@ -1027,7 +1027,8 @@ int transport_push(struct transport *transport, > > for (; ref; ref = ref->next) > if (!is_null_oid(&ref->new_oid)) > - sha1_array_append(&commits, > ref->new_oid.hash); > + sha1_array_append(&commits, > + &ref->new_oid); Funny that this line wrapped when it got shorter. :) I think wrapping is the right thing, though (it is longer than 80). -Peff
Re: [PATCH v2 00/21] object_id part 7
On Sun, Mar 26, 2017 at 04:01:22PM +, brian m. carlson wrote: > This is part 7 in the continuing transition to use struct object_id. > > This series focuses on two main areas: adding two constants for the > maximum hash size we'll be using (which will be suitable for allocating > memory) and converting struct sha1_array to struct oid_array. Both changes are very welcome. I do think it's probably worth changing the name of sha1-array.[ch], but it doesn't need to happen immediately. I read through the whole series and didn't find anything objectionable. The pointer-arithmetic fix should perhaps graduate separately. I suggested an additional cleanup around "linelen" in one patch. In the name of keeping the number of re-rolls sane, I'm OK if we skip that for now (the only reason I mentioned it at all is that you have to justify the caveat in the commit message; with the fix, that justification can go away). -Peff
Re: What's cooking in git.git (Mar 2017, #11; Mon, 27)
> On 28 Mar 2017, at 00:35, Junio C Hamano wrote: > ... > > * ls/filter-process-delayed (2017-03-06) 1 commit > - convert: add "status=delayed" to filter process protocol > > What's the status of this one??? > cf. I was about to send out a new round last Sunday but then I discovered a problem. Believe it or not but I am still working on this :-) ... although way too slow :-( --- Completely different topic: Would it be possible to get the travis-ci Windows patch queued in pu for testing? http://public-inbox.org/git/20170324113747.44991-1-larsxschnei...@gmail.com/ Thanks, Lars
Microproject | Add more builtin patterns for userdiff
While I was working buildins shell patterns for user diffs. I noticed that the tests t4034 passes but I can reproduce it manually with: mkdir cpp/ && cd cpp/ && git init cat > pre b ab a>=b a==b a!=b a&b a^b a|b a&&b a||b a?b:z a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b a,y a::b EOF cat > posty xy x>=y x==y x!=y x&y x^y x|y x&&y x||y x?y:z x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y x,y x::y EOF echo '* diff="cpp"' > .gitmodules git diff --no-index --color-words pre post > output Surprisingly, it shows (which is very different from the expected output): ^[[1mdiff --git a/pre b/post^[[m ^[[1mindex 23d5c8a..7e8c026 100644^[[m ^[[1m--- a/pre^[[m ^[[1m+++ b/post^[[m ^[[36m@@ -1,19 +1,19 @@^[[m ^[[31mFoo():x(0&&1){}^[[m^[[32mFoo() : x(0&42) { bar(x); }^[[m cout<<"Hello ^[[31mWorld!\n"b^[[m ^[[31mab a>=b^[[m ^[[31ma==b a!=b^[[m ^[[31ma&b^[[m ^[[31ma^b^[[m ^[[31ma|b^[[m ^[[31ma&&b^[[m ^[[31ma||b^[[m ^[[31ma?b:z^[[m ^[[31ma=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b^[[m ^[[31ma,y^[[m ^[[31ma::b^[[m^[[32mWorld?\n" y^[[m ^[[32mxy x>=y^[[m ^[[32mx==y x!=y^[[m ^[[32mx&y^[[m ^[[32mx^y^[[m ^[[32mx|y^[[m ^[[32mx&&y^[[m ^[[32mx||y^[[m ^[[32mx?y:z^[[m ^[[32mx=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y^[[m ^[[32mx,y^[[m ^[[32mx::y^[[m Instead of: diff --git a/pre b/post index 23d5c8a..7e8c026 100644 --- a/pre +++ b/post @@ -1,19 +1,19 @@ Foo() : x(0&&1&42) { bar(x); } cout<<"Hello World!?\n"<(1) (-1e10) (0xabcdef) 'xy' [ax] ax->b ay x.by !ax ~a ax x++ ax-- ax*b ay x&b ay x*b ay x/b ay x%b ay x+b ay x-b ay x<>b ay xb ay x>=b ay x==b ay x!=b ay x&b ay x^b ay x|b ay x&&b ay x||b ay x?by:z ax=b ay x+=b ay x-=b ay x*=b ay x/=b ay x%=b ay x<<=b ay x>>=b ay x&=b ay x^=b ay x|=b ay x,y ax::by That's does not just happens to cpp builtins, it happens to bibtex as well. Is it that I had missed some configuration since I have tested this on a few machines? I have a workaround for that, which is to run log with --color-words= with regex from the userdiff.c, does it automatically use the builtins diff?
Re: [PATCH v2 00/21] object_id part 7
On Tue, Mar 28, 2017 at 03:31:59AM -0400, Jeff King wrote: > I read through the whole series and didn't find anything objectionable. > The pointer-arithmetic fix should perhaps graduate separately. Junio's welcome to take that patch separately if he likes. > I suggested an additional cleanup around "linelen" in one patch. In the > name of keeping the number of re-rolls sane, I'm OK if we skip that for > now (the only reason I mentioned it at all is that you have to justify > the caveat in the commit message; with the fix, that justification can > go away). Let's leave it as it is, assuming Junio's okay with it. I can send in a few more patches to clean that up and use skip_prefix that we can drop on top and graduate separately. I think the justification is useful as it is, since it explains why we no longer want to check that particular value for historical reasons. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: UNS: Re: cherry-pick --message?
On Tue, 21 Mar 2017 13:33:35 +, Jeff King wrote: ... > Probably "format-patch | sed | am -3" is your best bet if you want to > modify the patches in transit _and_ have the user just use normal git > tools. Except that 'git am' doesn't have --no-commit like cherry-pick does. :-( It's always something. (Perhaps I'm instead going to rewrite the commit before cherry-picking it.) - Andreas -- "Totally trivial. Famous last words." From: Linus Torvalds Date: Fri, 22 Jan 2010 07:29:21 -0800
[PATCH RFC 1/2] strbuf: move strbuf_add_tabexpand into strbuf.c
From: Jacob Keller In commit 7cc13c717b52 ("pretty: expand tabs in indented logs to make things line up properly", 2016-03-16) a new function was added to insert a line into a strbuf while expanding the tabs into spaces. This functionality was used to help show the log message correctly when it has been indented, so as to properly display the expected output. This functionality will be useful in a future patch that adds similar functionality into git diff, so lets move it into strbuf.c and make it a public function. While we're doing this, rename a few of the variables to fix the surrounding strbuf code. Signed-off-by: Jacob Keller --- pretty.c | 50 -- strbuf.c | 50 ++ strbuf.h | 6 ++ 3 files changed, 56 insertions(+), 50 deletions(-) diff --git a/pretty.c b/pretty.c index d0f86f5d85ca..70368509ffea 100644 --- a/pretty.c +++ b/pretty.c @@ -1658,56 +1658,6 @@ void pp_title_line(struct pretty_print_context *pp, strbuf_release(&title); } -static int pp_utf8_width(const char *start, const char *end) -{ - int width = 0; - size_t remain = end - start; - - while (remain) { - int n = utf8_width(&start, &remain); - if (n < 0 || !start) - return -1; - width += n; - } - return width; -} - -static void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth, -const char *line, int linelen) -{ - const char *tab; - - while ((tab = memchr(line, '\t', linelen)) != NULL) { - int width = pp_utf8_width(line, tab); - - /* -* If it wasn't well-formed utf8, or it -* had characters with badly defined -* width (control characters etc), just -* give up on trying to align things. -*/ - if (width < 0) - break; - - /* Output the data .. */ - strbuf_add(sb, line, tab - line); - - /* .. and the de-tabified tab */ - strbuf_addchars(sb, ' ', tabwidth - (width % tabwidth)); - - /* Skip over the printed part .. */ - linelen -= tab + 1 - line; - line = tab + 1; - } - - /* -* Print out everything after the last tab without -* worrying about width - there's nothing more to -* align. -*/ - strbuf_add(sb, line, linelen); -} - /* * pp_handle_indent() prints out the intendation, and * the whole line (without the final newline), after diff --git a/strbuf.c b/strbuf.c index 00457940cfc1..6cecfcadb05b 100644 --- a/strbuf.c +++ b/strbuf.c @@ -275,6 +275,56 @@ void strbuf_commented_addf(struct strbuf *sb, const char *fmt, ...) strbuf_release(&buf); } +static int find_utf8_width(const char *start, const char *end) +{ + int width = 0; + size_t remain = end - start; + + while (remain) { + int n = utf8_width(&start, &remain); + if (n < 0 || !start) + return -1; + width += n; + } + return width; +} + +void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth, + const char *buf, size_t size) +{ + const char *tab; + + while ((tab = memchr(buf, '\t', size)) != NULL) { + int width = find_utf8_width(buf, tab); + + /* +* If it wasn't well-formed utf8, or it +* had characters with badly defined +* width (control characters etc), just +* give up on trying to align things. +*/ + if (width < 0) + break; + + /* Output the data .. */ + strbuf_add(sb, buf, tab - buf); + + /* .. and the de-tabified tab */ + strbuf_addchars(sb, ' ', tabwidth - (width % tabwidth)); + + /* Skip over the printed part .. */ + size -= tab + 1 - buf; + buf = tab + 1; + } + + /* +* Print out everything after the last tab without +* worrying about width - there's nothing more to +* align. +*/ + strbuf_add(sb, buf, size); +} + void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap) { int len; diff --git a/strbuf.h b/strbuf.h index 80047b1bb7b8..17e04911833e 100644 --- a/strbuf.h +++ b/strbuf.h @@ -238,6 +238,12 @@ extern void strbuf_splice(struct strbuf *, size_t pos, size_t len, */ extern void strbuf_add_commented_lines(struct strbuf *out, const char *buf, size_t size); +/** + * Add a NUL-terminated string to the buffer. Tabs will be expanded using the + * provided tabwidth. + */ +extern void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth, +const char *buf, size
[PATCH RFC 2/2] diff: teach diff to expand tabs in output
From: Jacob Keller When creating a diff for contents, we prepend a single character to represent the state of that line. This character can offset how the tabs in the real content are displayed, which may result in weird alignment issues when viewing the diffs. Teach the diff core a new option to expand these tabs, similar to how we already expand log contents. This new option can be used to display the lines so that the reader can see the expected results without the confusion of the offset tabstops caused by the extra character prepended to each line. Because some of the printing location is tied up into the whitespace checking code, we also need to teach this code that it may need to expand tabs as well. We will expand the output at the last moment so that the whitespace checks see the contents before it is expanded. Signed-off-by: Jacob Keller --- I'm really not a fan of how the ws code ended up. It seems pretty ugly and weird to hack in the expand_tabs stuff here. However, I'm really not sure how else I could handle this. Additionally, I'm not 100% sure whether this interacts with format-patch or other machinery which may well want some way to be excluded. Thoughts? Does anyone have a better implementation? I think there also may be some wonky bits when performing the tab expansion during whitespace checks, due to the way we expand, because I don't think that the tabexpand function takes into account the "current" location when adding a string, so it very well may not be correct I am unsure if there is a good way to fix this. Documentation/diff-options.txt | 6 cache.h| 2 +- diff.c | 23 +-- diff.h | 6 t/t4063-diff-expand-tabs.sh| 65 ++ ws.c | 42 ++- 6 files changed, 126 insertions(+), 18 deletions(-) create mode 100755 t/t4063-diff-expand-tabs.sh diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 89cc0f48deef..82e314b20b3d 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -329,6 +329,12 @@ endif::git-format-patch[] the diff-patch output format. Non default number of digits can be specified with `--abbrev=`. +--diff-expand-tabs[=]:: + When showing diff output, expand any tabs into spaces first before + printing. The size of the tabs is determined by which defaults to + 8 if not provided. --no-diff-expand-tabs or a size of 0 will disable + expansion. + -B[][/]:: --break-rewrites[=[][/]]:: Break complete rewrite changes into pairs of delete and diff --git a/cache.h b/cache.h index 5c8078291c47..2e221174edd4 100644 --- a/cache.h +++ b/cache.h @@ -2155,7 +2155,7 @@ extern unsigned whitespace_rule_cfg; extern unsigned whitespace_rule(const char *); extern unsigned parse_whitespace_rule(const char *); extern unsigned ws_check(const char *line, int len, unsigned ws_rule); -extern void ws_check_emit(const char *line, int len, unsigned ws_rule, FILE *stream, const char *set, const char *reset, const char *ws); +extern void ws_check_emit(const char *line, int len, unsigned ws_rule, FILE *stream, const char *set, const char *reset, const char *ws, int expand_tabs); extern char *whitespace_error_string(unsigned ws); extern void ws_fix_copy(struct strbuf *, const char *, int, unsigned, int *); extern int ws_blank_line(const char *line, int len, unsigned ws_rule); diff --git a/diff.c b/diff.c index 58cb72d7e72a..488019335df7 100644 --- a/diff.c +++ b/diff.c @@ -544,7 +544,14 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res fputs(set, file); if (!nofirst) fputc(first, file); - fwrite(line, len, 1, file); + if (o->expand_tabs) { + struct strbuf sb = STRBUF_INIT; + strbuf_add_tabexpand(&sb, o->expand_tabs, line, len); + fwrite(sb.buf, sb.len, 1, file); + strbuf_release(&sb); + } else { + fwrite(line, len, 1, file); + } fputs(reset, file); } if (has_trailing_carriage_return) @@ -595,7 +602,8 @@ static void emit_line_checked(const char *reset, /* Emit just the prefix, then the rest. */ emit_line_0(ecbdata->opt, set, reset, sign, "", 0); ws_check_emit(line, len, ecbdata->ws_rule, - ecbdata->opt->file, set, reset, ws); + ecbdata->opt->file, set, reset, ws, + ecbdata->opt->expand_tabs); } } @@ -2190,7 +2198,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len) free(err); emit_line(data->o, set, reset, line, 1);
Re: Microproject | Add more builtin patterns for userdiff
On Tue, Mar 28, 2017 at 3:46 AM, Pickfire wrote: > While I was working buildins shell patterns for user diffs. I noticed that > the tests t4034 passes but I can reproduce it manually with: > > mkdir cpp/ && cd cpp/ && git init > > cat > pre < Foo():x(0&&1){} > cout<<"Hello World!\n"< 1 -1e10 0xabcdef 'x' > [a] a->b a.b > !a ~a a++ a-- a*b a&b > a*b a/b a%b > a+b a-b > a<>b > ab a>=b > a==b a!=b > a&b > a^b > a|b > a&&b > a||b > a?b:z > a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b > a,y > a::b > EOF > > cat > post < Foo() : x(0&42) { bar(x); } > cout<<"Hello World?\n"< (1) (-1e10) (0xabcdef) 'y' > [x] x->y x.y > !x ~x x++ x-- x*y x&y > x*y x/y x%y > x+y x-y > x<>y > xy x>=y > x==y x!=y > x&y > x^y > x|y > x&&y > x||y > x?y:z > x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y > x,y > x::y > EOF > > echo '* diff="cpp"' > .gitmodules > git diff --no-index --color-words pre post > output > > Surprisingly, it shows (which is very different from the expected output): > The diff test code uses "test_decode_color" function which decodes the color commands into human readable text. From the looks of it, you're trying to reproduce the test outside the test suite. However, you're not decoding the colors using the test library function, so it doesn't look right. Thanks, Jake
[PATCH/RFC v2] WIP configurable options facility
--- FWIW here's the current state of this WIP hack which I worked a bit on yesterday. I converted it to use a hashmap and got rid of the need to s/const // the options struct. I've either converted options that read the config to this already, or left TODO comments on those that are candidates for migration. All tests pass with this, but as the various TODO comments note there's some problems. It's leaking memory currently, and as the comment in parse_options_step() notes due to how this options code is called I've had to sprinkle some boilerplate in several parse_*_opt() functions. This is getting rid of less code than I'd hoped at first, although it'll be made up if we make more stuff configurable. On Tue, Mar 28, 2017 at 7:17 AM, Jeff King wrote: > On Sat, Mar 25, 2017 at 11:32:02PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> > So hopefully it's clear that the two are functionally equivalent, and >> > differ only in syntax (in this case we manually decided which options >> > are safe to pull from the config, but we'd have to parse the options.log >> > string, too, and we could make the same decision there). >> >> I like the simplicity of this approach a lot. I.e. (to paraphrase it >> just to make sure we're on the same page): Skip all the complexity of >> reaching into the getopt guts, and just munge argc/argv given a config >> we can stick ahead of the getopt (struct options) spec, inject some >> options at the beginning if they're in the config, and off we go >> without any further changes to the getopt guts. > > Yep, I think that's an accurate description. > >> There's two practical issues with this that are easy to solve with my >> current approach, but I can't find an easy solution to using this >> method. >> >> The first is that we're replacing the semantics of: >> >> "If you're specifying it on the command-line, we take it from there, >> otherwise we use your config, if set, regardless of how the option >> works" >> >> with: >> >> "We read your config, inject options implicitly at the start of the >> command line, and then append whatever command-line you give us" >> >> These two are not the same. Consider e.g. the commit.verbose config. >> With my current patch if have commit.verbose=1 in your config and do >> "commit --verbose" you just end up with a result equivalent to not >> having it in your config, but since the --verbose option can be >> supplied multiple times to increase verbosity with the injection >> method you'd end up with the equivalent of commit.verbose=2. > > Right, for anything where multiple options are meaningful, they'd have > to give "--no-verbose" to reset the option. In a sense that's less > friendly, because it's more manual. But it's also less magical, because > the semantics are clear: the config option behaves exactly as if you > gave the option on the command line. So for an OPT_STRING_LIST(), you > could append to the list, or reset it to empty, etc, as you see fit. > > But I do agree that it's more manual, and probably would cause some > confusion. And some slight breakage of backwards compatibiilty for things that supply the option on the CLI now, although it wouldn't be a huge deal. >> I can't think of a good way around that with your proposed approach >> that doesn't essentially get us back to something very similar to my >> patch, i.e. we'd need to parse the command-line using the options spec >> before applying our implicit config. > > Yes, the semantics you gave require parsing the options first. I think > it would be sufficient to just give each "struct option" a "seen" flag > (rather than having it understand the config mechanism), having > parse_options() set the flag, and then feeding the result to a separate > config/cmdline mapping mechanism. That keeps the complexity out of the > options code. Right, would require s/const // on the struct though like in my v1, or keeping this info in some datastructure on the side. > It does tie us back in to requiring parse-options, which not all the > options use. I think if we do keep something like this patch it's fair enough to say it won't work for everything. It's just there to make it easier for us to add configuration for options in the common case, but there's going to be various special cases (e.g. currently the options synonyms, and I think I'll leave that) that we won't be able to handle. > In a lot of cases that "seen" flag is effectively a sentinel value in > whatever variable the option value is stored in. But some of the options > don't have reasonable sentinel values (as you noticed with the "revert > -m" handling recently). > >> The second issue is related, i.e. I was going to add some flag an >> option could supply to say "if I'm provided none of these other >> maybe-from-config options get to read their config". This is needed >> for hybrid plumbing/porcelain like "git status --porcelain". > > Yeah, I agree you can't make that decision until you've seen the > command-line options. I think
Re: [PATCH v2 1/2] read-cache: skip index SHA verification
On 3/27/2017 6:44 PM, Jeff King wrote: On Mon, Mar 27, 2017 at 09:09:38PM +, g...@jeffhostetler.com wrote: From: Jeff Hostetler Teach git to skip verification of the index SHA in read_index(). This is a performance optimization. The index file SHA verification can be considered an ancient relic from the early days of git and only useful for detecting disk corruption. For small repositories, this SHA calculation is not that significant, but for gigantic repositories this calculation adds significant time to every command. I added a global "skip_verify_index" variable to control this and allow it to be tested. I did not create a config setting for this because of chicken-n-egg problems with the loading the config and the index. Hrm, there shouldn't be any dependency of the config on the index (and there are a handful of options which impact the index already). Did you try it and run into problems? Yeah, I tried adding a new "core.verifyindex" property and the corresponding global variable. But read_index() and verify_hdr() was being called BEFORE the config was loaded. And it wasn't clear how best to solve that. The issue was in "git status" where cmd_status() called status_init_config() which called gitmodules_config() before git_config(). but gitmodules_config() called read_index(), so my settings weren't loaded yet in verify_hdr(). I tried switching the order in status_init_config(), but that caused errors in t7508 with submodules not being handled properly. https://github.com/jeffhostetler/git/commits/upstream/core_verify_index At this point I decided that it wasn't that important to have this config setting, since we'll probably default it to be faster and be done with it. In general, I'd much rather see us either: 1. Rip the code out entirely if it is not meant to be configurable, and cannot be triggered by the actual git binary. or 2. Make it configurable, even if most people wouldn't use it. And then have a test to exercise it using a git command (unlike the one-off test helper, which isn't run at all). -Peff I'm OK with (1) if everyone else is. Jeff
Re: [PATCH v2 0/2] read-cache: call verify_hdr() in a background thread
On 3/27/2017 6:45 PM, Jeff King wrote: On Mon, Mar 27, 2017 at 09:09:37PM +, g...@jeffhostetler.com wrote: From: Jeff Hostetler Version 2 of this patch series simplifies this to just turn off the hash verification. Independent comments from Linus and Peff suggested that we could just turn this off and not worry about it. So I've updated this patch to do that. I added a global variable to allow the original code path to be used. I also added a t/helper command to demonstrate the differences. On the Linux repo, the effect is rather trivial: $ ~/work/gfw/t/helper/test-skip-verify-index -c 3 0.029884 0 [cache_nr 57994] 0.031035 0 [cache_nr 57994] 0.024308 0 [cache_nr 57994] 0.028409 0 avg 0.018359 1 [cache_nr 57994] 0.017025 1 [cache_nr 57994] 0.011087 1 [cache_nr 57994] 0.015490 1 avg On my Windows source tree (450MB index), I'm seeing a savings of 0.6 seconds -- read_index() went from 1.2 to 0.6 seconds. Very satisfying. I assume that was with OpenSSL as the SHA-1 implementation (sha1dc would have been much slower on 450MB, I think). -Peff Yes, this was with the OpenSSL SHA-1 code in a GfW build. I haven't played with the sha1dc code yet. $ $/work/gh_gfw/t/helper/test-skip-verify-index.exe -c 5 1.276485 0 [cache_nr 3077831] 1.261164 0 [cache_nr 3077831] 1.256012 0 [cache_nr 3077831] 1.261411 0 [cache_nr 3077831] 1.266174 0 [cache_nr 3077831] 1.264249 0 avg 0.672057 1 [cache_nr 3077831] 0.666968 1 [cache_nr 3077831] 0.668725 1 [cache_nr 3077831] 0.675879 1 [cache_nr 3077831] 0.670213 1 [cache_nr 3077831] 0.670768 1 avg Jeff
Re: [PATCHv2 08/14] completion: let 'for-each-ref' and 'ls-remote' filter matching refs
On Fri, Mar 24, 2017 at 8:42 PM, Jeff King wrote: > On Thu, Mar 23, 2017 at 04:29:18PM +0100, SZEDER Gábor wrote: >> case "$cur_" in >> refs|refs/*) >> format="refname" >> - refs="${cur_%/*}" >> + refs=("$match*" "$match*/**") >> track="" > > Working on the aforementioned patch, I noticed that for-each-ref's > matching is a little tricky due to its path semantics. So I wanted to > double-check your patterns. :) I think these should do the right thing. Yeah, I always thought that it's weird that globbing in for-each-ref behaves differently from globbing in ls-remote or refspecs, but there is nothing we can do about it now. Anyway, this is why the tests added in this patch include e.g. both 'matching-branch' and 'matching/branch'.
Re: [PATCH v2 1/2] read-cache: skip index SHA verification
On Tue, Mar 28, 2017 at 11:27:19AM -0400, Jeff Hostetler wrote: > > Hrm, there shouldn't be any dependency of the config on the index (and > > there are a handful of options which impact the index already). Did you > > try it and run into problems? > > Yeah, I tried adding a new "core.verifyindex" property and the > corresponding global variable. But read_index() and verify_hdr() > was being called BEFORE the config was loaded. And it wasn't clear > how best to solve that. > > The issue was in "git status" where cmd_status() called > status_init_config() which called gitmodules_config() before > git_config(). but gitmodules_config() called read_index(), > so my settings weren't loaded yet in verify_hdr(). Ugh, yeah, the callback-oriented interface suffers from these kind of dependency cycles. You can fix it by doing a limited "basic config that should always be loaded" git_config() call before anything else, and then following up with the application-level config. For something low-level that should _always_ be respected, even in plumbing programs, I think we're better off lazy-loading the config inside the function. The configset cache makes them more or less free. I.e., something like: diff --git a/read-cache.c b/read-cache.c index e44775182..89bbf8d1e 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1376,17 +1376,23 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size) git_SHA_CTX c; unsigned char sha1[20]; int hdr_version; + int do_checksum = 0; if (hdr->hdr_signature != htonl(CACHE_SIGNATURE)) return error("bad signature"); hdr_version = ntohl(hdr->hdr_version); if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version) return error("bad index version %d", hdr_version); - git_SHA1_Init(&c); - git_SHA1_Update(&c, hdr, size - 20); - git_SHA1_Final(sha1, &c); - if (hashcmp(sha1, (unsigned char *)hdr + size - 20)) - return error("bad index file sha1 signature"); + + git_config_get_bool("core.checksumindex", &do_checksum); + if (do_checksum) { + git_SHA1_Init(&c); + git_SHA1_Update(&c, hdr, size - 20); + git_SHA1_Final(sha1, &c); + if (hashcmp(sha1, (unsigned char *)hdr + size - 20)) + return error("bad index file sha1 signature"); + } + return 0; } -Peff
Re: [GSoC] Proposal Discussion
On Tue, Mar 28, 2017 at 07:52:42AM +0200, Christian Couder wrote: > Hi, > > On Tue, Mar 28, 2017 at 12:17 AM, Devin Lehmacher wrote: > > Hello everyone, > > > > I am a student studying Computer Science at Cornell University. I > > already completed a microproject, Move ~/.git-credential-cache/socket to > > $XDG_CACHE_HOME/credential/socket a week and a half ago or so. > > Nice. It would be better though if you could provide a link to the > thread where your microproject was discussed. If it has been merged to > master, you could also provide the merge commit. Otherwise please tell > what is its branch name and current status in the last "What's cooking > in git.git" email from Junio. Here is the merge commit into master: 78cf8efec34c419ecea86bc8d1fe47ec0b51ba37 > > I am interested in 2 different projects and would like some advice on > > them, to help me decide which one to submit a proposal for. > > > > 1. `git rebase -i` conversion. > >I was initially the most interested in this project but realize that > >after having a very busy week last week that Ivan Tham started > >[discussion][1] about this project. Would it be appropriate to submit > >a proposal for a project that someone else also wants to work on? > > Yes, it is ok. Obviously only one student/proposal can be selected for > a given project, but as anyway the main constraint for us is usually > the number of available mentors, there is a low chance that this would > prevent us from selecting one more student than we could otherwise > select. > > You could also submit 2 proposals if you have time to work on more than one. Ok! I think I will post rough drafts of both proposals sometime tomorrow or maybe later today if I have time. > > 2. formatting tool improvements. > >There are four different git commands mentioned [here][2] as possible > >tools to improve as can be seen in the email. Of those I think it > >would make the most sense to extend `git name-rev`. It seems best > >suited to the desired behavior. It would need to be extended to > >understand rev's that refer to objects rather than just a commit-ish > >and also add formatting support similar to the information that log > >and for-each-ref can output. Since this doesn't seem like much work, > >would it be feasible to generalize and somewhat standardize all of > >the formatting commands? > > Yeah, I think it would be good. It might involve a lot of discussion > though and this could slow your project. > So if you really want to do it, my advice is to try to start the > discussion as soon as possible, that is now. > > To do that you could for example Cc people involved in the email > discussions, and try to come up with concrete proposals about how to > generalize and standardize the formatting commands. I will try to send out an email later this afternoon with a preliminary plan and to start discussion about how best to rework formatting commands. Thanks, Devin
Re: What's cooking in git.git (Mar 2017, #11; Mon, 27)
> * sb/submodule-short-status (2017-03-27) 7 commits > - submodule.c: correctly handle nested submodules in is_submodule_modified > - short status: improve reporting for submodule changes > - submodule.c: stricter checking for submodules in is_submodule_modified > - submodule.c: port is_submodule_modified to use porcelain 2 > - submodule.c: convert is_submodule_modified to use strbuf_getwholeline > - submodule.c: factor out early loop termination in is_submodule_modified > - submodule.c: use argv_array in is_submodule_modified > > The output from "git status --short" has been extended to show > various kinds of dirtyness in submodules differently; instead of to > "M" for modified, 'm' and '?' can be shown to signal changes only > to the working tree of the submodule but not the commit that is > checked out. > > Waiting for further comments. > The endgame looked mostly OK. I will reroll the top most commit > - submodule.c: correctly handle nested submodules in is_submodule_modified per jrnieder's request to explain itself more (via tests, documentation) Thanks, Stefan
Re: [PATCH v2 06/21] builtin/receive-pack: fix incorrect pointer arithmetic
Jeff King writes: > The patch itself is obviously an improvement. It may be worth graduating > separately from the rest of the series. Yup, I will split it out to bc/push-cert-receive-fix that builds directly on an ancient jc/push-cert topic that was merged at v2.2.0-rc0~64. I'll need to drop the duplicate from bc/object-id topic, of course (which hasn't happend). Thanks.
Re: [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators
On Sat, Mar 25, 2017 at 11:12 AM, Daniel Ferreira wrote: > This is the third version of the GSoC microproject > of refactoring remove_subtree() from recursively using > readdir() to use dir_iterator. Below are the threads for > other versions: > > v1: > https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsah16a...@mail.gmail.com/T/#mae023e7a7d7626f00e0923833c4359f5af493730 > v2: > https://public-inbox.org/git/cacsjy8dxh-qpbblfafwpawusba9gvxa7x+mxljevykhk1zo...@mail.gmail.com/T/#t > > Duy suggested adding features to dir_iterator might go > beyond the intention of a microproject, but I figured I > might go for it to learn more about the project. > > The dir_iterator reimplementation has been tested in a > separate binary I created (and linked with libgit.a) to > reproduce remove_subtree()'s contents. As pointed out in the > last thread, git's tests for this function were unable to > catch a daunting bug I had introduced, and I still haven't > been able to come up with a way to reproduce remove_subtree() > being called. Any help? > I would think a test llike the following would work: test_expect_success 'remove nested subtrees' ' test_commit initial && mkdir -p dir/with/nested/dir && echo content >dir/with/nested/dir/file && echo content >dir/file && git add dir/with/nested/dir/file dir/file && git commit -a -m "commit directory structure" && git checkout initial && ! test dir '
Re: [PATCH v2 16/21] Make sha1_array_append take a struct object_id *
"brian m. carlson" writes: > Convert the callers to pass struct object_id by changing the function > declaration and definition and applying the following semantic patch: > > @@ > expression E1, E2, E3; > @@ > - sha1_array_append(E1, E2[E3].hash) > + sha1_array_append(E1, E2 + E3) > > @@ > expression E1, E2; > @@ > - sha1_array_append(E1, E2.hash) > + sha1_array_append(E1, &E2) I noticed something similar in the change to bisect.c while reading the previous step, and I suspect that the above two rules leave somewhat inconsistent and harder-to-read result. Wouldn't it make the result more readable if the former rule were -sha1_array_append(E1, E2[E3].hash) +sha1_array_append(E1, &E2[E3]) FWIW, the bit that made me read it twice in the previous step was this change - strbuf_addstr(&joined_hexs, sha1_to_hex(array->sha1[i])); + strbuf_addstr(&joined_hexs, oid_to_hex(array->oid + i)); which I would have written &(array->oid[i]) instead. After all, the original written by a human said E2[E3].hash (or array->sha1[i]) because to the human's mind, E2 is a series of things that can be indexed with an int E3, and even though *(E2 + E3) E2[E3] E3[E2] all mean the same thing, the human decided that E2[E3] is the most natural way to express this particular reference to an item in the array. &E2[E3] would keep that intention by the original author better than E2 + E3. The above comment does not affect the correctness of the conversion, but I think it would affect the readability of the resulting code.
Re: [PATCH v2 00/21] object_id part 7
Jeff King writes: > On Sun, Mar 26, 2017 at 04:01:22PM +, brian m. carlson wrote: > >> This is part 7 in the continuing transition to use struct object_id. >> >> This series focuses on two main areas: adding two constants for the >> maximum hash size we'll be using (which will be suitable for allocating >> memory) and converting struct sha1_array to struct oid_array. > > Both changes are very welcome. I do think it's probably worth changing > the name of sha1-array.[ch], but it doesn't need to happen immediately. > > I read through the whole series and didn't find anything objectionable. > The pointer-arithmetic fix should perhaps graduate separately. I didn't see anything incorrect when I queued the series, either, and after I re-read it I saw a few minor readability issues, but modulo that this looks ready. I did split the push-cert parsing fix and applied to an older base independently, though. > I suggested an additional cleanup around "linelen" in one patch. In the > name of keeping the number of re-rolls sane, I'm OK if we skip that for > now (the only reason I mentioned it at all is that you have to justify > the caveat in the commit message; with the fix, that justification can > go away). A follow-up after the dust settles could also mention "we earlier mentioned this caveat but with this fix we no longer have to worry about it", no? Thanks both, anyways.
Re: [PATCH v2 00/21] object_id part 7
On Tue, Mar 28, 2017 at 11:13:15AM +, brian m. carlson wrote: > > I suggested an additional cleanup around "linelen" in one patch. In the > > name of keeping the number of re-rolls sane, I'm OK if we skip that for > > now (the only reason I mentioned it at all is that you have to justify > > the caveat in the commit message; with the fix, that justification can > > go away). > > Let's leave it as it is, assuming Junio's okay with it. I can send in a > few more patches to clean that up and use skip_prefix that we can drop > on top and graduate separately. > > I think the justification is useful as it is, since it explains why we > no longer want to check that particular value for historical reasons. I thought I'd knock this out quickly before I forgot about it. But it actually isn't so simple. The main caller in read_head_info() does indeed just pass strlen(line) as the length in each case. But the cert parser really does need us to respect the line length. So we either have to pass it in, or tie off the string. The latter looks something like the patch below (on top of a minor tweak around "eol" handling). It's sufficiently ugly that it may not count as an actual cleanup, though. I'm OK if we just drop the idea. --- diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index 58de2a1a9..561a982e7 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -1483,13 +1483,10 @@ static void execute_commands(struct command *commands, } static struct command **queue_command(struct command **tail, - const char *line, - int linelen) + const char *line) { struct object_id old_oid, new_oid; struct command *cmd; - const char *refname; - int reflen; const char *p; if (parse_oid_hex(line, &old_oid, &p) || @@ -1498,9 +1495,7 @@ static struct command **queue_command(struct command **tail, *p++ != ' ') die("protocol error: expected old/new/ref, got '%s'", line); - refname = p; - reflen = linelen - (p - line); - FLEX_ALLOC_MEM(cmd, ref_name, refname, reflen); + FLEX_ALLOC_STR(cmd, ref_name, p); oidcpy(&cmd->old_oid, &old_oid); oidcpy(&cmd->new_oid, &new_oid); *tail = cmd; @@ -1510,7 +1505,7 @@ static struct command **queue_command(struct command **tail, static void queue_commands_from_cert(struct command **tail, struct strbuf *push_cert) { - const char *boc, *eoc; + char *boc, *eoc; if (*tail) die("protocol error: got both push certificate and unsigned commands"); @@ -1523,10 +1518,17 @@ static void queue_commands_from_cert(struct command **tail, eoc = push_cert->buf + parse_signature(push_cert->buf, push_cert->len); while (boc < eoc) { - const char *eol = memchr(boc, '\n', eoc - boc); + char *eol = memchr(boc, '\n', eoc - boc); + char tmp; + if (!eol) eol = eoc; - tail = queue_command(tail, boc, eol - boc); + + tmp = *eol; + *eol = '\0'; + tail = queue_command(tail, boc); + *eol = tmp; + boc = eol + 1; } } @@ -1590,7 +1592,7 @@ static struct command *read_head_info(struct oid_array *shallow) continue; } - p = queue_command(p, line, linelen); + p = queue_command(p, line); } if (push_cert.len)
Re: [PATCH v2 00/21] object_id part 7
On Tue, Mar 28, 2017 at 01:35:36PM -0400, Jeff King wrote: > I thought I'd knock this out quickly before I forgot about it. But it > actually isn't so simple. > > The main caller in read_head_info() does indeed just pass strlen(line) > as the length in each case. But the cert parser really does need us to > respect the line length. So we either have to pass it in, or tie off the > string. > > The latter looks something like the patch below (on top of a minor > tweak around "eol" handling). It's sufficiently ugly that it may not > count as an actual cleanup, though. I'm OK if we just drop the idea. Here's that minor tweak, in case anybody is interested. It's less useful without that follow-on that touches "eol" more, but perhaps it increases readability on its own. -- >8 -- Subject: [PATCH] receive-pack: simplify eol handling in cert parsing The queue_commands_from_cert() function wants to handle each line of the cert individually. It looks for "\n" in the to-be-parsed bytes, and special-cases each use of eol (the end-of-line variable) when we didn't find one. Instead, we can just set the end-of-line variable to end-of-cert in the latter case. For advancing to the next line, it's OK for us to move our pointer past end-of-cert, because our loop condition just checks for pointer inequality. And it doesn't even violate the ANSI C "no more than one past the end of an array" rule, because we know in the worst case we've hit the terminating NUL of the strbuf. Signed-off-by: Jeff King --- builtin/receive-pack.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index 5d9e4da0a..58de2a1a9 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -1524,8 +1524,10 @@ static void queue_commands_from_cert(struct command **tail, while (boc < eoc) { const char *eol = memchr(boc, '\n', eoc - boc); - tail = queue_command(tail, boc, eol ? eol - boc : eoc - boc); - boc = eol ? eol + 1 : eoc; + if (!eol) + eol = eoc; + tail = queue_command(tail, boc, eol - boc); + boc = eol + 1; } } -- 2.12.2.845.g55fcf8b10
Re: Re: Microproject | Add more builtin patterns for userdiff
Jacob Keller wrote: > On Tue, Mar 28, 2017 at 3:46 AM, Pickfire wrote: > > While I was working buildins shell patterns for user diffs. I noticed that > > the tests t4034 passes but I can reproduce it manually with: > > > > mkdir cpp/ && cd cpp/ && git init > > > > cat > pre < > Foo():x(0&&1){} > > cout<<"Hello World!\n"< > 1 -1e10 0xabcdef 'x' > > [a] a->b a.b > > !a ~a a++ a-- a*b a&b > > a*b a/b a%b > > a+b a-b > > a<>b > > ab a>=b > > a==b a!=b > > a&b > > a^b > > a|b > > a&&b > > a||b > > a?b:z > > a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b > > a,y > > a::b > > EOF > > > > cat > post < > Foo() : x(0&42) { bar(x); } > > cout<<"Hello World?\n"< > (1) (-1e10) (0xabcdef) 'y' > > [x] x->y x.y > > !x ~x x++ x-- x*y x&y > > x*y x/y x%y > > x+y x-y > > x<>y > > xy x>=y > > x==y x!=y > > x&y > > x^y > > x|y > > x&&y > > x||y > > x?y:z > > x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y > > x,y > > x::y > > EOF > > > > echo '* diff="cpp"' > .gitmodules > > git diff --no-index --color-words pre post > output > > > > Surprisingly, it shows (which is very different from the expected output): > > > > The diff test code uses "test_decode_color" function which decodes the > color commands into human readable text. From the looks of it, you're > trying to reproduce the test outside the test suite. However, you're > not decoding the colors using the test library function, so it doesn't > look right. Yes, I can't reproduce it outside the test suite. I have added the builtin and yet the test fails. test_decode_color gets same output as expect but still it fails, should I send in the patch?
Re: [GSoC] Proposal: turn git-add--interactive.perl into a builtin
On Sat, Mar 25, 2017 at 8:15 PM, Daniel Ferreira (theiostream) wrote: > SYNOPSIS > There are many advantages to converting parts of git that are still > scripts to C builtins, among which execution speed, improved > compatibility and code deduplication. agreed. > git-add--interactive, one of the most useful features of Git. knee jerk reaction: I never used it, so it cannot be that important ;) (I use git-gui, which is essentially the same workflow. There are tons of ways to accomplish a given goal using Git, so I guess we don't want to get in an argument here). > > FEASIBILITY > > There was only one discussion regarding the feasibility of its porting > (https://public-inbox.org/git/CAP8UFD2PcBsU6=FK4OHVrB7E98ycohS_0pYcbCBar=of1hl...@mail.gmail.com/). > It resulted in a consensus that doing it would be a task too large – > although interesting – for GSoC 2015 based on the amount of its lines > of code. It is, however, only a few lines larger than > git-rebase--interactive, which has been considered an appropriate > idea. As such, it looks like a possible project for three months of > full-time work. ok, it sounds a challenging project. (currently counting 1750 lines of code). Scrolling over the source code, there are quite a couple of functions, where the direct equivalent in C springs to mind. run_cmd_pipe -> see run-command.h unquote_path -> unquote_c_style ? refresh -> update_index_if_able() list_modified -> iterate over "const struct cache_entry *ce = active_cache[i];" > PROJECTED TIMELINE > - Prior to May 4 > -- Refine my basic knowledge of Perl > -- Craft one or two small patches to some of Git's Perl components > (preferentially to git-add--interactive itself) to improve my > understanding of the language and of how Git's Perl scripts actually > work > > - May 4 - May 30 > -- Clarify implementation details with my mentor, and work on a more > detailed roadmap for the project > -- Investigate roughly how to replace command invocations from the > script with actual builtin functions; which Git APIs in Perl already > have functional equivalents in C; which parts will require a full > rewrite. There are different approaches for replacing functionality in another language. Examples: * Implement the functionality in C and then have a "flag-day" commit 783d7e865e (builtin-am: remove redirection to git-am.sh, 2015-08-04) This only works when the whole functionality was replaced in prior commits * Implement partial functionality in C and call it via a helper function. 3604242f08 (submodule: port init from shell to C, 2016-04-15) This works well for only partial conversions (the larger the thing to convert the more appealing this is, as it gets code shipped early.) When choosing this strategy, this part of the Project would be to identify parts that could be ported on its own without much additional glue-code. > - May 30 - June 30 (start of coding period) > -- Define the architecture of the builtin within git (which > functions/interfaces will it have? where will its code reside?). > -- Implement a small subset of the builtin (to be defined with my > mentor) and glue it into the existing Perl script. Present this as a > first patch to get feedback early regarding the implementation and > avoid piling up mistakes early. > -- Do necessary changes based on this initial review. > -- Have roughly 1/3 of the script's functionality ported to C. > > - June 30 - July 28 > -- Port the remainder of the script to a builtin. > -- Have a weekly roadmap, sending a part of the patch every 15 days to > the mailing list for review and to avoid massive commits by the end of > GSoC. yeah; send early, send often. ;) > -- Apply suggestions from community reviews when possible; if not, > save them for doing toward the end of GSoC (see below). Please do not underestimate the discussion by community, finding consensus on list consumes a bit of time in some cases. > (Note: due to a previous commitment, during a five-day period of July > I will only be able to work part-time on GSoC. The actual week will be > known over the next weeks.) Maybe you want to shift the schedule up to here by one week then? (e.g. the first period would be April 27 - May 23) > > - July 28 - August 29 > -- By the start of this period, send a patch with the builtin fully > implemented to the mailing list. /a patch/a patch series consisting of many patches/ Experience shows that smaller patches are easier to review as it is more focused. Consider e.g. e86ab2c1cd (wt-status: convert to struct object_id, 2017-02-21) and the parents leading up to this commit. They work on the same big topic, but focus on very regional areas to ease review. > -- Fix bugs, test extensively, possibly extend test coverage for > git-add--interactive. AFAICT ('$ git grep "git add -i"') there is only t3701 testing the interactive add. Maybe we need to add tests first to document current behavior, before attempting a conversion? This could go well i
Re: Re: Microproject | Add more builtin patterns for userdiff
On Tue, Mar 28, 2017 at 10:53 AM, Pickfire wrote: > > Yes, I can't reproduce it outside the test suite. I have added the builtin > and yet the test fails. test_decode_color gets same output as expect but > still it fails, should I send in the patch? You also need to ensure you have the exact same color settings as used by the test scripts. Thanks, Jake
Re: Microproject | Add more builtin patterns for userdiff
On Tue, Mar 28, 2017 at 3:46 AM, Pickfire wrote: > EOF > > echo '* diff="cpp"' > .gitmodules Did you mean .gitattributes?
Re: UNS: Re: cherry-pick --message?
Andreas Krey wrote: > On Tue, 21 Mar 2017 13:33:35 +, Jeff King wrote: >> Probably "format-patch | sed | am -3" is your best bet if you want to >> modify the patches in transit _and_ have the user just use normal git >> tools. > > Except that 'git am' doesn't have --no-commit like cherry-pick does. :-( > It's always something. (Perhaps I'm instead going to rewrite the commit > before cherry-picking it.) 'git apply --index' can do that. I agree that it would be sensible for 'git am' to grow a --no-commit option to do that. Thanks and hope that helps, Jonathan
Re: [PATCH] [GSOC] get_non_kept_pack_filenames(): reimplement using iterators
On Mon, Mar 27, 2017 at 6:39 PM, Robert Stanca wrote: > Replaces recursive traversing of opendir with dir_iterator. > > Signed-off-by: Robert Stanca > --- > builtin/repack.c | 15 ++- > 1 file changed, 6 insertions(+), 9 deletions(-) > > diff --git a/builtin/repack.c b/builtin/repack.c > index 677bc7c..27a5597 100644 > --- a/builtin/repack.c > +++ b/builtin/repack.c > @@ -7,6 +7,8 @@ > #include "strbuf.h" > #include "string-list.h" > #include "argv-array.h" > +#include "iterator.h" > +#include "dir-iterator.h" > > static int delta_base_offset = 1; > static int pack_kept_objects = -1; > @@ -86,26 +88,21 @@ static void remove_pack_on_signal(int signo) > */ > static void get_non_kept_pack_filenames(struct string_list *fname_list) > { > - DIR *dir; > - struct dirent *e; > + struct dir_iterator *diter = dir_iterator_begin(packdir); > char *fname; > > - if (!(dir = opendir(packdir))) > - return; > - > - while ((e = readdir(dir)) != NULL) { > + while (dir_iterator_advance(diter) == ITER_OK) { > size_t len; > - if (!strip_suffix(e->d_name, ".pack", &len)) > + if (!strip_suffix(diter->relative_path, ".pack", &len)) > continue; > > - fname = xmemdupz(e->d_name, len); > + fname = xmemdupz(diter->relative_path, len); > > if (!file_exists(mkpath("%s/%s.keep", packdir, fname))) > string_list_append_nodup(fname_list, fname); > else > free(fname); > } > - closedir(dir); > } > > static void remove_redundant_pack(const char *dir_name, const char > *base_name) > -- > 2.7.4 > > > > > Hi , this is my first patch submission for Git Gsoc. I ran full tests and > local tests with > prove --timer --jobs 15 ./t*pack*.sh . > > Have a great day, > Robert. Hi and welcome to the Git community! The patch looks like a faithful conversion with no side effects to me. Reviewed-by: Stefan Beller Note: mail readers usually collapse long signatures (e.g. everything after "--\n2.7.4" in this email. So I did not know or even read your comments on how you tested this patch before hitting reply. You can also put these lines after the "---" after the sign off, right before the diff stat. A good example is https://public-inbox.org/git/20170324231013.23346-1-ava...@gmail.com/ as the patch is also long enough, such that people may not scroll to the bottom. Once you're used to it, it is very easy to spot the chatter that will not be part of the commit message, though. Thanks, Stefan
[RFC] should these two topics graduate to 'master' soon?
There are two topics that are marked as "Will cook in 'next'" for practically forever in the "What's cooking" reports. The world may have become ready for one or both of them, in which case we should do the merge not too late in the cycle. * jc/merge-drop-old-syntax (2015-04-29) 1 commit This topic stops "git merge HEAD " syntax that has been deprecated since October 2007 (and we have issued a warning message since around v2.5.0 when the ancient syntax was used). * jk/no-looking-at-dotgit-outside-repo-final (2016-10-26) 1 commit This is the endgame of the topic to avoid blindly falling back to ".git" when the setup sequence said we are _not_ in Git repository. A corner case that happens to work right now may be broken by a call to die("BUG"). I am leaning toward including the former in the upcoming release, whose -rc0 is tentatively scheduled to happen on Apr 20th. I think the rest of the system is also ready for the latter (back when we merged it to 'next' and started cooking, there were still a few codepaths that triggered its die(), which have been fixed). Opinions?
Re: [RFC] should these two topics graduate to 'master' soon?
Hi Junio, Junio C Hamano wrote: > There are two topics that are marked as "Will cook in 'next'" for > practically forever in the "What's cooking" reports. The world may > have become ready for one or both of them, in which case we should > do the merge not too late in the cycle. > > * jc/merge-drop-old-syntax (2015-04-29) 1 commit > > This topic stops "git merge HEAD " syntax that > has been deprecated since October 2007 (and we have issued a > warning message since around v2.5.0 when the ancient syntax was > used). > > * jk/no-looking-at-dotgit-outside-repo-final (2016-10-26) 1 commit > > This is the endgame of the topic to avoid blindly falling back to > ".git" when the setup sequence said we are _not_ in Git repository. > A corner case that happens to work right now may be broken by a call > to die("BUG"). > > I am leaning toward including the former in the upcoming release, > whose -rc0 is tentatively scheduled to happen on Apr 20th. I think > the rest of the system is also ready for the latter (back when we > merged it to 'next' and started cooking, there were still a few > codepaths that triggered its die(), which have been fixed). > > Opinions? Google has been running with both of these for a while. Any problems we ran into were already reported and fixed. I would be all for including them in the next release. Thanks and hope that helps, Jonathan
Re: [RFC] should these two topics graduate to 'master' soon?
On Tue, Mar 28, 2017 at 11:35 AM, Junio C Hamano wrote: > There are two topics that are marked as "Will cook in 'next'" for > practically forever in the "What's cooking" reports. The world may > have become ready for one or both of them, in which case we should > do the merge not too late in the cycle. > > * jc/merge-drop-old-syntax (2015-04-29) 1 commit > > This topic stops "git merge HEAD " syntax that > has been deprecated since October 2007 (and we have issued a > warning message since around v2.5.0 when the ancient syntax was > used). git-gui has: 82fbd8a (git-gui: maintain backwards compatibility for merge syntax, 2016-10-04) which was the only blocker IIUC. So this looks good to me. > > * jk/no-looking-at-dotgit-outside-repo-final (2016-10-26) 1 commit > > This is the endgame of the topic to avoid blindly falling back to > ".git" when the setup sequence said we are _not_ in Git repository. > A corner case that happens to work right now may be broken by a call > to die("BUG"). > > I am leaning toward including the former in the upcoming release, > whose -rc0 is tentatively scheduled to happen on Apr 20th. I think > the rest of the system is also ready for the latter (back when we > merged it to 'next' and started cooking, there were still a few > codepaths that triggered its die(), which have been fixed). Just read through the commit messages of that branch and they look reasonable, but I refrain from having an opinion here. Thanks, Stefan
Re: [PATCH RFC 2/2] diff: teach diff to expand tabs in output
Jacob Keller writes: > I'm really not a fan of how the ws code ended up. It seems pretty ugly > and weird to hack in the expand_tabs stuff here. However, I'm really not > sure how else I could handle this. Additionally, I'm not 100% sure > whether this interacts with format-patch or other machinery which may > well want some way to be excluded. Thoughts? As long as you do the same as "do we color the output? no, no, we are format-patch and must not color" logic to refrain from expanding the tabs, you should be OK. > I think there also may be some wonky bits when performing the tab > expansion during whitespace checks, due to the way we expand, because I > don't think that the tabexpand function takes into account the "current" > location when adding a string, so it very well may not be correct I > am unsure if there is a good way to fix this. This "feature" is limited to the diff output, so one way may be to leave the code as-is and pipe the output to a filter that is similar to /usr/bin/expand but knows that the first column is special (this is the part that "this is limited to diff" kicks in). You may even be able to implement it as a new option to "expand(1)" and then people who aren't Git users would also benefit.
[PATCH v3 1/2] read-cache: core.checksumindex
From: Jeff Hostetler Teach git to skip verification of the index SHA in verify_hdr() in read_index(). This is a performance optimization. The index file SHA verification can be considered an ancient relic from the early days of git and only useful for detecting disk corruption. For small repositories, this SHA calculation is not that significant, but for gigantic repositories this calculation adds significant time to every command. Added "core.checksumindex" to enable/disable the SHA verification. Signed-off-by: Jeff Hostetler --- read-cache.c | 12 1 file changed, 12 insertions(+) diff --git a/read-cache.c b/read-cache.c index 9054369..2ab4b74 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1376,12 +1376,24 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size) git_SHA_CTX c; unsigned char sha1[20]; int hdr_version; + int do_checksum = 0; if (hdr->hdr_signature != htonl(CACHE_SIGNATURE)) return error("bad signature"); hdr_version = ntohl(hdr->hdr_version); if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version) return error("bad index version %d", hdr_version); + + /* +* Since we run very early in command startup, git_config() +* may not have been called yet and the various "core_*" +* global variables haven't been set. So look it up +* explicitly. +*/ + git_config_get_bool("core.checksumindex", &do_checksum); + if (!do_checksum) + return 0; + git_SHA1_Init(&c); git_SHA1_Update(&c, hdr, size - 20); git_SHA1_Final(sha1, &c); -- 2.9.3
[PATCH v3 2/2] test-core-checksum-index: core.checksumindex test helper
From: Jeff Hostetler Created test helper to measure read_index() with and without core.checksumindex set and report performance. Signed-off-by: Jeff Hostetler --- Makefile| 1 + t/helper/.gitignore | 1 + t/helper/test-core-checksum-index.c | 77 + 3 files changed, 79 insertions(+) create mode 100644 t/helper/test-core-checksum-index.c diff --git a/Makefile b/Makefile index 9ec6065..6049427 100644 --- a/Makefile +++ b/Makefile @@ -607,6 +607,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS)) TEST_PROGRAMS_NEED_X += test-chmtime TEST_PROGRAMS_NEED_X += test-ctype TEST_PROGRAMS_NEED_X += test-config +TEST_PROGRAMS_NEED_X += test-core-checksum-index TEST_PROGRAMS_NEED_X += test-date TEST_PROGRAMS_NEED_X += test-delta TEST_PROGRAMS_NEED_X += test-dump-cache-tree diff --git a/t/helper/.gitignore b/t/helper/.gitignore index d6e8b36..f651a6b 100644 --- a/t/helper/.gitignore +++ b/t/helper/.gitignore @@ -1,6 +1,7 @@ /test-chmtime /test-ctype /test-config +/test-core-checksum-index /test-date /test-delta /test-dump-cache-tree diff --git a/t/helper/test-core-checksum-index.c b/t/helper/test-core-checksum-index.c new file mode 100644 index 000..0452dc2 --- /dev/null +++ b/t/helper/test-core-checksum-index.c @@ -0,0 +1,77 @@ +#include "cache.h" +#include "parse-options.h" + +uint64_t time_runs(int do_checksum, int count) +{ + uint64_t t0, t1; + uint64_t sum = 0; + uint64_t avg; + int i; + + git_config_set_gently("core.checksumindex", + (do_checksum ? "true" : "false")); + + for (i = 0; i < count; i++) { + t0 = getnanotime(); + read_cache(); + t1 = getnanotime(); + + sum += (t1 - t0); + + printf("%f %d [cache_nr %d]\n", + ((double)(t1 - t0))/10, + do_checksum, + the_index.cache_nr); + fflush(stdout); + + discard_cache(); + } + + avg = sum / count; + if (count > 1) { + printf("%f %d avg\n", + (double)avg/10, + do_checksum); + fflush(stdout); + } + + return avg; +} + +int cmd_main(int argc, const char **argv) +{ + int original_do_checksum; + int count = 1; + const char *usage[] = { + "test-core-checksum-index", + NULL + }; + struct option options[] = { + OPT_INTEGER('c', "count", &count, "number of passes"), + OPT_END(), + }; + const char *prefix; + uint64_t avg_0, avg_1; + + prefix = setup_git_directory(); + argc = parse_options(argc, argv, prefix, options, usage, 0); + + if (count < 1) + die("count must greater than zero"); + + /* throw away call to get the index in the disk cache */ + read_cache(); + discard_cache(); + + git_config_get_bool("core.checksumindex", &original_do_checksum); + + avg_1 = time_runs(1, count); + avg_0 = time_runs(0, count); + + git_config_set_gently("core.checksumindex", + ((original_do_checksum) ? "true" : "false")); + + if (avg_0 >= avg_1) + die("skipping index checksum verification did not help"); + return 0; +} -- 2.9.3
[PATCH v3 0/2] read-cache: call verify_hdr() in a background thread
From: Jeff Hostetler Version 3 of this patch series simplifies this effort to just turn on/off the hash verification using a "core.checksumindex" config variable. I've preserved the original checksum validation code so that we can force it on in fsck if desired. It eliminates the original threading model completely. Jeff Hostetler (2): read-cache: core.checksumindex test-core-checksum-index: core.checksumindex test helper Makefile| 1 + read-cache.c| 12 ++ t/helper/.gitignore | 1 + t/helper/test-core-checksum-index.c | 77 + 4 files changed, 91 insertions(+) create mode 100644 t/helper/test-core-checksum-index.c -- 2.9.3
Re: [PATCH v3 0/2] read-cache: call verify_hdr() in a background thread
On Tue, Mar 28, 2017 at 07:07:30PM +, g...@jeffhostetler.com wrote: > From: Jeff Hostetler > > Version 3 of this patch series simplifies this effort to just turn > on/off the hash verification using a "core.checksumindex" config variable. > > I've preserved the original checksum validation code so that we can > force it on in fsck if desired. > > It eliminates the original threading model completely. > > Jeff Hostetler (2): > read-cache: core.checksumindex > test-core-checksum-index: core.checksumindex test helper > > Makefile| 1 + > read-cache.c| 12 ++ > t/helper/.gitignore | 1 + > t/helper/test-core-checksum-index.c | 77 > + Do we still need test-core-checksum-index? Can we just time ls-files or something in t/perf? -Peff
Re: [RFC] should these two topics graduate to 'master' soon?
On Tue, Mar 28, 2017 at 11:51:49AM -0700, Jonathan Nieder wrote: > > * jc/merge-drop-old-syntax (2015-04-29) 1 commit > > > > This topic stops "git merge HEAD " syntax that > > has been deprecated since October 2007 (and we have issued a > > warning message since around v2.5.0 when the ancient syntax was > > used). > > > > * jk/no-looking-at-dotgit-outside-repo-final (2016-10-26) 1 commit > > > > This is the endgame of the topic to avoid blindly falling back to > > ".git" when the setup sequence said we are _not_ in Git repository. > > A corner case that happens to work right now may be broken by a call > > to die("BUG"). > > > > I am leaning toward including the former in the upcoming release, > > whose -rc0 is tentatively scheduled to happen on Apr 20th. I think > > the rest of the system is also ready for the latter (back when we > > merged it to 'next' and started cooking, there were still a few > > codepaths that triggered its die(), which have been fixed). > > > > Opinions? > > Google has been running with both of these for a while. Any problems > we ran into were already reported and fixed. I would be all for > including them in the next release. Thanks, I was wondering how much exposure the latter got. It might be a good idea to merge it to "master" early in the post-2.13 cycle to get a little more exposure (since the point of it is really to flush out unusual cases, the more people run it before we make a release the better). But I'm also OK if it's merged to master this cycle, as long as it's soon-ish. It's much better to flush out problems in pre-release master than in a released version. -Peff
Re: [PATCH v2 00/21] object_id part 7
Jeff King writes: > Here's that minor tweak, in case anybody is interested. It's less useful > without that follow-on that touches "eol" more, but perhaps it increases > readability on its own. Yup, the only thing that the original (with Brian's fix) appears to be more careful about is it tries very hard to avoid setting boc past eoc. As we are not checking "boc != eoc" but doing the comparison, that "careful" appearance does not give us any benefit in practice, other than having to do an extra "eol ? eol+1 : eoc"; the result of this patch is easier to read. By the way, eoc is "one past the end" of the array that begins at boc, so setting a pointer to eoc+1 may technically be in violation. I do not know how much it matters, though ;-) > -- >8 -- > Subject: [PATCH] receive-pack: simplify eol handling in cert parsing > > The queue_commands_from_cert() function wants to handle each > line of the cert individually. It looks for "\n" in the > to-be-parsed bytes, and special-cases each use of eol (the > end-of-line variable) when we didn't find one. Instead, we > can just set the end-of-line variable to end-of-cert in the > latter case. > > For advancing to the next line, it's OK for us to move our > pointer past end-of-cert, because our loop condition just > checks for pointer inequality. And it doesn't even violate > the ANSI C "no more than one past the end of an array" rule, > because we know in the worst case we've hit the terminating > NUL of the strbuf. > > Signed-off-by: Jeff King > --- > builtin/receive-pack.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c > index 5d9e4da0a..58de2a1a9 100644 > --- a/builtin/receive-pack.c > +++ b/builtin/receive-pack.c > @@ -1524,8 +1524,10 @@ static void queue_commands_from_cert(struct command > **tail, > > while (boc < eoc) { > const char *eol = memchr(boc, '\n', eoc - boc); > - tail = queue_command(tail, boc, eol ? eol - boc : eoc - boc); > - boc = eol ? eol + 1 : eoc; > + if (!eol) > + eol = eoc; > + tail = queue_command(tail, boc, eol - boc); > + boc = eol + 1; > } > }
[PATCH 0/18] snprintf cleanups
Our code base calls snprintf() into a fixed-size buffer in a bunch of places. Sometimes we check the result, and sometimes we accept a silent truncation. In some cases an overflow is easy given long input. In some cases it's impossible. And in some cases it depends on how big PATH_MAX is on your filesystem, and whether it's actually enforced. :) This series attempts to give more predictable and consistent results by removing arbitrary buffer limitations. It also tries to make further audits of snprintf() easier by converting to xsnprintf() where appropriate. There are still some snprintf() calls left after this. A few are in code that's in flux, or is being cleaned up in nearby series (several of my recent cleanup series were split off from this). A few should probably remain (e.g., git-daemon will refuse to consider a repo name larger than PATH_MAX, which may be a reasonable defense against weird memory tricks. I wouldn't be sad to see this turned into a strbuf with an explicit length policy enforced separately, though). And there were a few that I just didn't get around to converting (the dumb-http walker, for example, but I think it may need a pretty involved audit overall). It's a lot of patches, but hopefully they're all pretty straightforward to read. [01/18]: do not check odb_mkstemp return value for errors [02/18]: odb_mkstemp: write filename into strbuf [03/18]: odb_mkstemp: use git_path_buf [04/18]: diff: avoid fixed-size buffer for patch-ids [05/18]: tag: use strbuf to format tag header [06/18]: fetch: use heap buffer to format reflog [07/18]: avoid using fixed PATH_MAX buffers for refs [08/18]: avoid using mksnpath for refs [09/18]: create_branch: move msg setup closer to point of use [10/18]: create_branch: use xstrfmt for reflog message [11/18]: name-rev: replace static buffer with strbuf [12/18]: receive-pack: print --pack-header directly into argv array [13/18]: replace unchecked snprintf calls with heap buffers [14/18]: combine-diff: replace malloc/snprintf with xstrfmt [15/18]: convert unchecked snprintf into xsnprintf [16/18]: transport-helper: replace checked snprintf with xsnprintf [17/18]: gc: replace local buffer with git_path [18/18]: daemon: use an argv_array to exec children bisect.c | 8 +--- branch.c | 16 builtin/checkout.c | 5 ++--- builtin/fetch.c| 6 -- builtin/gc.c | 8 +--- builtin/index-pack.c | 22 -- builtin/ls-remote.c| 10 ++ builtin/name-rev.c | 21 - builtin/notes.c| 9 - builtin/receive-pack.c | 17 ++--- builtin/replace.c | 50 +++--- builtin/rev-parse.c| 5 +++-- builtin/tag.c | 42 ++ cache.h| 7 +-- combine-diff.c | 7 --- daemon.c | 38 +- diff.c | 20 +--- environment.c | 14 ++ fast-import.c | 9 + grep.c | 4 ++-- http.c | 10 +- imap-send.c| 2 +- pack-bitmap-write.c| 14 +++--- pack-write.c | 16 refs.c | 44 ++-- sha1_file.c| 4 ++-- submodule.c| 2 +- transport-helper.c | 5 + 28 files changed, 215 insertions(+), 200 deletions(-)
[PATCH 01/18] do not check odb_mkstemp return value for errors
The odb_mkstemp function does not return an error; it dies on failure instead. But many of its callers compare the resulting descriptor against -1 and die themselves. Mostly this is just pointless, but it does raise a question when looking at the callers: if they show the results of the "template" buffer after a failure, what's in it? The answer is: it doesn't matter, because it cannot happen. So let's make that clear by removing the bogus error checks. In bitmap_writer_finish(), we can drop the error-handling code entirely. In the other two cases, it's shared with the open() in another code path; we can just move the error-check next to that open() call. And while we're at it, let's flesh out the function's docstring a bit to make the error behavior clear. Signed-off-by: Jeff King --- builtin/index-pack.c | 7 --- cache.h | 5 - pack-bitmap-write.c | 2 -- pack-write.c | 4 ++-- 4 files changed, 10 insertions(+), 8 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 88d205f85..49e7175d9 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -311,10 +311,11 @@ static const char *open_pack_file(const char *pack_name) output_fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_pack_XX"); pack_name = xstrdup(tmp_file); - } else + } else { output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600); - if (output_fd < 0) - die_errno(_("unable to create '%s'"), pack_name); + if (output_fd < 0) + die_errno(_("unable to create '%s'"), pack_name); + } nothread_data.pack_fd = output_fd; } else { input_fd = open(pack_name, O_RDONLY); diff --git a/cache.h b/cache.h index db4120c23..acad7078d 100644 --- a/cache.h +++ b/cache.h @@ -1673,7 +1673,10 @@ extern struct packed_git *find_sha1_pack(const unsigned char *sha1, extern void pack_report(void); /* - * Create a temporary file rooted in the object database directory. + * Create a temporary file rooted in the object database directory, or + * die on failure. The filename is taken from "pattern", which should have the + * usual "XX" trailer, and the resulting filename is written into the + * "template" buffer. Returns the open descriptor. */ extern int odb_mkstemp(char *template, size_t limit, const char *pattern); diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 970559601..44492c346 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -517,8 +517,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index, int fd = odb_mkstemp(tmp_file, sizeof(tmp_file), "pack/tmp_bitmap_XX"); - if (fd < 0) - die_errno("unable to create '%s'", tmp_file); f = sha1fd(fd, tmp_file); memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)); diff --git a/pack-write.c b/pack-write.c index 88bc7f9f7..19cb514ea 100644 --- a/pack-write.c +++ b/pack-write.c @@ -77,9 +77,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec } else { unlink(index_name); fd = open(index_name, O_CREAT|O_EXCL|O_WRONLY, 0600); + if (fd < 0) + die_errno("unable to create '%s'", index_name); } - if (fd < 0) - die_errno("unable to create '%s'", index_name); f = sha1fd(fd, index_name); } -- 2.12.2.845.g55fcf8b10
[PATCH 02/18] odb_mkstemp: write filename into strbuf
The odb_mkstemp() function expects the caller to provide a fixed buffer to write the resulting tempfile name into. But it creates the template using snprintf without checking the return value. This means we could silently truncate the filename. In practice, it's unlikely that the truncation would end in the template-pattern that mkstemp needs to open the file. So we'd probably end up failing either way, unless the path was specially crafted. The simplest fix would be to notice the truncation and die. However, we can observe that most callers immediately xstrdup() the result anyway. So instead, let's switch to using a strbuf, which is easier for them (and isn't a big deal for the other 2 callers, who can just strbuf_release when they're done with it). Note that many of the callers used static buffers, but this was purely to avoid putting a large buffer on the stack. We never passed the static buffers out of the function, so there's no complicated memory handling we need to change. Signed-off-by: Jeff King --- builtin/index-pack.c | 6 +++--- cache.h | 2 +- environment.c| 16 fast-import.c| 9 + pack-bitmap-write.c | 12 +++- pack-write.c | 12 ++-- 6 files changed, 30 insertions(+), 27 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 49e7175d9..f4af2ab37 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -307,10 +307,10 @@ static const char *open_pack_file(const char *pack_name) if (from_stdin) { input_fd = 0; if (!pack_name) { - static char tmp_file[PATH_MAX]; - output_fd = odb_mkstemp(tmp_file, sizeof(tmp_file), + struct strbuf tmp_file = STRBUF_INIT; + output_fd = odb_mkstemp(&tmp_file, "pack/tmp_pack_XX"); - pack_name = xstrdup(tmp_file); + pack_name = strbuf_detach(&tmp_file, NULL); } else { output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 0600); if (output_fd < 0) diff --git a/cache.h b/cache.h index acad7078d..1c1889428 100644 --- a/cache.h +++ b/cache.h @@ -1678,7 +1678,7 @@ extern void pack_report(void); * usual "XX" trailer, and the resulting filename is written into the * "template" buffer. Returns the open descriptor. */ -extern int odb_mkstemp(char *template, size_t limit, const char *pattern); +extern int odb_mkstemp(struct strbuf *template, const char *pattern); /* * Generate the filename to be used for a pack file with checksum "sha1" and diff --git a/environment.c b/environment.c index 2fdba7622..88276790d 100644 --- a/environment.c +++ b/environment.c @@ -274,7 +274,7 @@ char *get_object_directory(void) return git_object_dir; } -int odb_mkstemp(char *template, size_t limit, const char *pattern) +int odb_mkstemp(struct strbuf *template, const char *pattern) { int fd; /* @@ -282,18 +282,18 @@ int odb_mkstemp(char *template, size_t limit, const char *pattern) * restrictive except to remove write permission. */ int mode = 0444; - snprintf(template, limit, "%s/%s", -get_object_directory(), pattern); - fd = git_mkstemp_mode(template, mode); + strbuf_reset(template); + strbuf_addf(template, "%s/%s", get_object_directory(), pattern); + fd = git_mkstemp_mode(template->buf, mode); if (0 <= fd) return fd; /* slow path */ /* some mkstemp implementations erase template on failure */ - snprintf(template, limit, "%s/%s", -get_object_directory(), pattern); - safe_create_leading_directories(template); - return xmkstemp_mode(template, mode); + strbuf_reset(template); + strbuf_addf(template, "%s/%s", get_object_directory(), pattern); + safe_create_leading_directories(template->buf); + return xmkstemp_mode(template->buf, mode); } int odb_pack_keep(const char *name) diff --git a/fast-import.c b/fast-import.c index 41a539f97..900b9626f 100644 --- a/fast-import.c +++ b/fast-import.c @@ -890,14 +890,15 @@ static struct tree_content *dup_tree_content(struct tree_content *s) static void start_packfile(void) { - static char tmp_file[PATH_MAX]; + struct strbuf tmp_file = STRBUF_INIT; struct packed_git *p; struct pack_header hdr; int pack_fd; - pack_fd = odb_mkstemp(tmp_file, sizeof(tmp_file), - "pack/tmp_pack_XX"); - FLEX_ALLOC_STR(p, pack_name, tmp_file); + pack_fd = odb_mkstemp(&tmp_file, "pack/tmp_pack_XX"); + FLEX_ALLOC_STR(p, pack_name, tmp_file.buf); + strbuf_release(&tmp_file); + p->pack_fd = pack_fd; p->do_not_close = 1; pack_file = sha1fd(pack_fd, p
[PATCH 03/18] odb_mkstemp: use git_path_buf
Since git_path_buf() is smart enough to replace "objects/" with the correct object path, we can use it instead of manually assembling the path. That's slightly shorter, and will clean up any non-canonical bits in the path. Signed-off-by: Jeff King --- environment.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/environment.c b/environment.c index 88276790d..a9bf5658a 100644 --- a/environment.c +++ b/environment.c @@ -282,16 +282,14 @@ int odb_mkstemp(struct strbuf *template, const char *pattern) * restrictive except to remove write permission. */ int mode = 0444; - strbuf_reset(template); - strbuf_addf(template, "%s/%s", get_object_directory(), pattern); + git_path_buf(template, "objects/%s", pattern); fd = git_mkstemp_mode(template->buf, mode); if (0 <= fd) return fd; /* slow path */ /* some mkstemp implementations erase template on failure */ - strbuf_reset(template); - strbuf_addf(template, "%s/%s", get_object_directory(), pattern); + git_path_buf(template, "objects/%s", pattern); safe_create_leading_directories(template->buf); return xmkstemp_mode(template->buf, mode); } -- 2.12.2.845.g55fcf8b10
[PATCH 09/18] create_branch: move msg setup closer to point of use
In create_branch() we write the reflog msg into a buffer in the main function, but then use it only inside a conditional. If you carefully follow the logic, you can confirm that we never use the buffer uninitialized nor write when it would not be used. But we can make this a lot more obvious by simply moving the write step inside the conditional. Signed-off-by: Jeff King --- branch.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/branch.c b/branch.c index 5c12036b0..6d0ca94cc 100644 --- a/branch.c +++ b/branch.c @@ -234,7 +234,7 @@ void create_branch(const char *name, const char *start_name, { struct commit *commit; unsigned char sha1[20]; - char *real_ref, msg[PATH_MAX + 20]; + char *real_ref; struct strbuf ref = STRBUF_INIT; int forcing = 0; int dont_change_ref = 0; @@ -290,19 +290,20 @@ void create_branch(const char *name, const char *start_name, die(_("Not a valid branch point: '%s'."), start_name); hashcpy(sha1, commit->object.oid.hash); - if (forcing) - snprintf(msg, sizeof msg, "branch: Reset to %s", -start_name); - else if (!dont_change_ref) - snprintf(msg, sizeof msg, "branch: Created from %s", -start_name); - if (reflog) log_all_ref_updates = LOG_REFS_NORMAL; if (!dont_change_ref) { struct ref_transaction *transaction; struct strbuf err = STRBUF_INIT; + char msg[PATH_MAX + 20]; + + if (forcing) + snprintf(msg, sizeof msg, "branch: Reset to %s", +start_name); + else + snprintf(msg, sizeof msg, "branch: Created from %s", +start_name); transaction = ref_transaction_begin(&err); if (!transaction || -- 2.12.2.845.g55fcf8b10
[PATCH 12/18] receive-pack: print --pack-header directly into argv array
After receive-pack reads the pack header from the client, it feeds the already-read part to index-pack and unpack-objects via their --pack-header command-line options. To do so, we format it into a fixed buffer, then duplicate it into the child's argv_array. Our buffer is long enough to handle any possible input, so this isn't wrong. But it's more complicated than it needs to be; we can just argv_array_pushf() the final value and avoid the intermediate copy. This drops the magic number and is more efficient, too. Note that we need to push to the argv_array in order, which means we can't do the push until we are in the "unpack-objects versus index-pack" conditional. Rather than duplicate the slightly complicated format specifier, I pushed it into a helper function. Signed-off-by: Jeff King --- builtin/receive-pack.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index fb2a090a0..2ca93adef 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -1634,12 +1634,17 @@ static const char *parse_pack_header(struct pack_header *hdr) static const char *pack_lockfile; +static void push_header_arg(struct argv_array *args, struct pack_header *hdr) +{ + argv_array_pushf(args, "--pack_header=%"PRIu32",%"PRIu32, + ntohl(hdr->hdr_version), ntohl(hdr->hdr_entries)); +} + static const char *unpack(int err_fd, struct shallow_info *si) { struct pack_header hdr; const char *hdr_err; int status; - char hdr_arg[38]; struct child_process child = CHILD_PROCESS_INIT; int fsck_objects = (receive_fsck_objects >= 0 ? receive_fsck_objects @@ -1653,9 +1658,6 @@ static const char *unpack(int err_fd, struct shallow_info *si) close(err_fd); return hdr_err; } - snprintf(hdr_arg, sizeof(hdr_arg), - "--pack_header=%"PRIu32",%"PRIu32, - ntohl(hdr.hdr_version), ntohl(hdr.hdr_entries)); if (si->nr_ours || si->nr_theirs) { alt_shallow_file = setup_temporary_shallow(si->shallow); @@ -1679,7 +1681,8 @@ static const char *unpack(int err_fd, struct shallow_info *si) tmp_objdir_add_as_alternate(tmp_objdir); if (ntohl(hdr.hdr_entries) < unpack_limit) { - argv_array_pushl(&child.args, "unpack-objects", hdr_arg, NULL); + argv_array_push(&child.args, "unpack-objects"); + push_header_arg(&child.args, &hdr); if (quiet) argv_array_push(&child.args, "-q"); if (fsck_objects) @@ -1697,8 +1700,8 @@ static const char *unpack(int err_fd, struct shallow_info *si) } else { char hostname[256]; - argv_array_pushl(&child.args, "index-pack", -"--stdin", hdr_arg, NULL); + argv_array_pushl(&child.args, "index-pack", "--stdin", NULL); + push_header_arg(&child.args, &hdr); if (gethostname(hostname, sizeof(hostname))) xsnprintf(hostname, sizeof(hostname), "localhost"); -- 2.12.2.845.g55fcf8b10
[PATCH 04/18] diff: avoid fixed-size buffer for patch-ids
To generate a patch id, we format the diff header into a fixed-size buffer, and then feed the result to our sha1 computation. The fixed buffer has size '4*PATH_MAX + 20', which in theory accomodates the four filenames plus some extra data. Except: 1. The filenames may not be constrained to PATH_MAX. The static value may not be a real limit on the current filesystem. Moreover, we may compute patch-ids for names stored only in git, without touching the current filesystem at all. 2. The 20 bytes is not nearly enough to cover the extra content we put in the buffer. As a result, the data we feed to the sha1 computation may be truncated, and it's possible that a commit with a very long filename could erroneously collide in the patch-id space with another commit. For instance, if one commit modified "really-long-filename/foo" and another modified "bar" in the same directory. In practice this is unlikely. Because the filenames are repeated, and because there's a single cutoff at the end of the buffer, the offending filename would have to be on the order of four times larger than PATH_MAX. But it's easy to fix by moving to a strbuf. Technically this may change the output of patch-id for very long filenames, but it's not worth making an exception for this in the --stable output. It was a bug, and one that only affected an unlikely set of paths. And anyway, the exact value would have varied from platform to platform depending on the value of PATH_MAX, so there is no "stable" value. Signed-off-by: Jeff King --- diff.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/diff.c b/diff.c index 58cb72d7e..89b5dc890 100644 --- a/diff.c +++ b/diff.c @@ -4577,7 +4577,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, int i; git_SHA_CTX ctx; struct patch_id_t data; - char buffer[PATH_MAX * 4 + 20]; + struct strbuf buffer = STRBUF_INIT; git_SHA1_Init(&ctx); memset(&data, 0, sizeof(struct patch_id_t)); @@ -4607,10 +4607,11 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, diff_fill_sha1_info(p->one); diff_fill_sha1_info(p->two); + strbuf_reset(&buffer); len1 = remove_space(p->one->path, strlen(p->one->path)); len2 = remove_space(p->two->path, strlen(p->two->path)); if (p->one->mode == 0) - len1 = snprintf(buffer, sizeof(buffer), + strbuf_addf(&buffer, "diff--gita/%.*sb/%.*s" "newfilemode%06o" "---/dev/null" @@ -4620,7 +4621,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, p->two->mode, len2, p->two->path); else if (p->two->mode == 0) - len1 = snprintf(buffer, sizeof(buffer), + strbuf_addf(&buffer, "diff--gita/%.*sb/%.*s" "deletedfilemode%06o" "---a/%.*s" @@ -4630,7 +4631,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, p->one->mode, len1, p->one->path); else - len1 = snprintf(buffer, sizeof(buffer), + strbuf_addf(&buffer, "diff--gita/%.*sb/%.*s" "---a/%.*s" "+++b/%.*s", @@ -4638,14 +4639,16 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, len2, p->two->path, len1, p->one->path, len2, p->two->path); - git_SHA1_Update(&ctx, buffer, len1); + git_SHA1_Update(&ctx, buffer.buf, buffer.len); if (diff_header_only) continue; if (fill_mmfile(&mf1, p->one) < 0 || - fill_mmfile(&mf2, p->two) < 0) + fill_mmfile(&mf2, p->two) < 0) { + strbuf_release(&buffer); return error("unable to read files to diff"); + } if (diff_filespec_is_binary(p->one) || diff_filespec_is_binary(p->two)) { @@ -4660,11 +4663,14 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, xecfg.ctxlen = 3; xecfg.flags = 0; if (xdi_diff_outf(&mf1, &mf2, patch_id_consume, &data, -
[PATCH 13/18] replace unchecked snprintf calls with heap buffers
We'd prefer to avoid unchecked snprintf calls because truncation can lead to unexpected results. These are all cases where truncation shouldn't ever happen, because the input to snprintf is fixed in size. That makes them candidates for xsnprintf(), but it's simpler still to just use the heap, and then nobody has to wonder if "100" is big enough. We'll use xstrfmt() where possible, and a strbuf when we need the resulting size or to reuse the same buffer in a loop. Signed-off-by: Jeff King --- bisect.c | 8 +--- builtin/index-pack.c | 9 + builtin/notes.c | 9 - builtin/rev-parse.c | 5 +++-- 4 files changed, 17 insertions(+), 14 deletions(-) diff --git a/bisect.c b/bisect.c index 30808cadf..d12583eaa 100644 --- a/bisect.c +++ b/bisect.c @@ -200,6 +200,7 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n { struct commit_list *p; struct commit_dist *array = xcalloc(nr, sizeof(*array)); + struct strbuf buf = STRBUF_INIT; int cnt, i; for (p = list, cnt = 0; p; p = p->next) { @@ -217,17 +218,18 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n } QSORT(array, cnt, compare_commit_dist); for (p = list, i = 0; i < cnt; i++) { - char buf[100]; /* enough for dist=%d */ struct object *obj = &(array[i].commit->object); - snprintf(buf, sizeof(buf), "dist=%d", array[i].distance); - add_name_decoration(DECORATION_NONE, buf, obj); + strbuf_reset(&buf); + strbuf_addf(&buf, "dist=%d", array[i].distance); + add_name_decoration(DECORATION_NONE, buf.buf, obj); p->item = array[i].commit; p = p->next; } if (p) p->next = NULL; + strbuf_release(&buf); free(array); return list; } diff --git a/builtin/index-pack.c b/builtin/index-pack.c index f4af2ab37..197c51912 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1443,10 +1443,11 @@ static void final(const char *final_pack_name, const char *curr_pack_name, if (!from_stdin) { printf("%s\n", sha1_to_hex(sha1)); } else { - char buf[48]; - int len = snprintf(buf, sizeof(buf), "%s\t%s\n", - report, sha1_to_hex(sha1)); - write_or_die(1, buf, len); + struct strbuf buf = STRBUF_INIT; + + strbuf_addf(&buf, "%s\t%s\n", report, sha1_to_hex(sha1)); + write_or_die(1, buf.buf, buf.len); + strbuf_release(&buf); /* * Let's just mimic git-unpack-objects here and write diff --git a/builtin/notes.c b/builtin/notes.c index 0513f7455..7b891471c 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -554,7 +554,7 @@ static int append_edit(int argc, const char **argv, const char *prefix) struct notes_tree *t; unsigned char object[20], new_note[20]; const unsigned char *note; - char logmsg[100]; + char *logmsg; const char * const *usage; struct note_data d = { 0, 0, NULL, STRBUF_INIT }; struct option options[] = { @@ -618,17 +618,16 @@ static int append_edit(int argc, const char **argv, const char *prefix) write_note_data(&d, new_note); if (add_note(t, object, new_note, combine_notes_overwrite)) die("BUG: combine_notes_overwrite failed"); - snprintf(logmsg, sizeof(logmsg), "Notes added by 'git notes %s'", - argv[0]); + logmsg = xstrfmt("Notes added by 'git notes %s'", argv[0]); } else { fprintf(stderr, _("Removing note for object %s\n"), sha1_to_hex(object)); remove_note(t, object); - snprintf(logmsg, sizeof(logmsg), "Notes removed by 'git notes %s'", - argv[0]); + logmsg = xstrfmt("Notes removed by 'git notes %s'", argv[0]); } commit_notes(t, logmsg); + free(logmsg); free_note_data(&d); free_notes(t); return 0; diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c index 9e53a1a7c..f54d7b502 100644 --- a/builtin/rev-parse.c +++ b/builtin/rev-parse.c @@ -213,13 +213,14 @@ static int show_abbrev(const unsigned char *sha1, void *cb_data) static void show_datestring(const char *flag, const char *datestr) { - static char buffer[100]; + char *buffer; /* date handling requires both flags and revs */ if ((filter & (DO_FLAGS | DO_REVS)) != (DO_FLAGS | DO_REVS)) return; - snprintf(buffer, sizeof(buffer), "%s%lu", flag, approxidate(datestr)); + buffer = xstrfmt("%s%lu", flag, approxidate(datestr)); show(buffer); + free(buffer); } static int sh
[PATCH 06/18] fetch: use heap buffer to format reflog
Part of the reflog content comes from the environment, which can be much larger than our fixed buffer. Let's use a heap buffer so we avoid truncating it. Signed-off-by: Jeff King --- builtin/fetch.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index b5ad09d04..4ef7a08af 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -421,7 +421,7 @@ static int s_update_ref(const char *action, struct ref *ref, int check_old) { - char msg[1024]; + char *msg; char *rla = getenv("GIT_REFLOG_ACTION"); struct ref_transaction *transaction; struct strbuf err = STRBUF_INIT; @@ -431,7 +431,7 @@ static int s_update_ref(const char *action, return 0; if (!rla) rla = default_rla.buf; - snprintf(msg, sizeof(msg), "%s: %s", rla, action); + msg = xstrfmt("%s: %s", rla, action); transaction = ref_transaction_begin(&err); if (!transaction || @@ -449,11 +449,13 @@ static int s_update_ref(const char *action, ref_transaction_free(transaction); strbuf_release(&err); + free(msg); return 0; fail: ref_transaction_free(transaction); error("%s", err.buf); strbuf_release(&err); + free(msg); return df_conflict ? STORE_REF_ERROR_DF_CONFLICT : STORE_REF_ERROR_OTHER; } -- 2.12.2.845.g55fcf8b10
[PATCH 07/18] avoid using fixed PATH_MAX buffers for refs
Many functions which handle refs use a PATH_MAX-sized buffer to do so. This is mostly reasonable as we have to write loose refs into the filesystem, and at least on Linux the 4K PATH_MAX is big enough that nobody would care. But: 1. The static PATH_MAX is not always the filesystem limit. 2. On other platforms, PATH_MAX may be much smaller. 3. As we move to alternate ref storage, we won't be bound by filesystem limits. Let's convert these to heap buffers so we don't have to worry about truncation or size limits. We may want to eventually constrain ref lengths for sanity and to prevent malicious names, but we should do so consistently across all platforms, and in a central place (like the ref code). Signed-off-by: Jeff King --- builtin/checkout.c | 5 ++--- builtin/ls-remote.c | 10 ++ builtin/replace.c | 50 +++--- builtin/tag.c | 15 ++- 4 files changed, 41 insertions(+), 39 deletions(-) diff --git a/builtin/checkout.c b/builtin/checkout.c index 81f07c3ef..93e8dfc82 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -889,11 +889,10 @@ static int check_tracking_name(struct remote *remote, void *cb_data) static const char *unique_tracking_name(const char *name, struct object_id *oid) { struct tracking_name_data cb_data = { NULL, NULL, NULL, 1 }; - char src_ref[PATH_MAX]; - snprintf(src_ref, PATH_MAX, "refs/heads/%s", name); - cb_data.src_ref = src_ref; + cb_data.src_ref = xstrfmt("refs/heads/%s", name); cb_data.dst_oid = oid; for_each_remote(check_tracking_name, &cb_data); + free(cb_data.src_ref); if (cb_data.unique) return cb_data.dst_ref; free(cb_data.dst_ref); diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c index 66cdd45cc..b2d7d5ce6 100644 --- a/builtin/ls-remote.c +++ b/builtin/ls-remote.c @@ -17,17 +17,19 @@ static const char * const ls_remote_usage[] = { static int tail_match(const char **pattern, const char *path) { const char *p; - char pathbuf[PATH_MAX]; + char *pathbuf; if (!pattern) return 1; /* no restriction */ - if (snprintf(pathbuf, sizeof(pathbuf), "/%s", path) > sizeof(pathbuf)) - return error("insanely long ref %.*s...", 20, path); + pathbuf = xstrfmt("/%s", path); while ((p = *(pattern++)) != NULL) { - if (!wildmatch(p, pathbuf, 0, NULL)) + if (!wildmatch(p, pathbuf, 0, NULL)) { + free(pathbuf); return 1; + } } + free(pathbuf); return 0; } diff --git a/builtin/replace.c b/builtin/replace.c index f83e7b8fc..065515bab 100644 --- a/builtin/replace.c +++ b/builtin/replace.c @@ -93,26 +93,31 @@ typedef int (*each_replace_name_fn)(const char *name, const char *ref, static int for_each_replace_name(const char **argv, each_replace_name_fn fn) { const char **p, *full_hex; - char ref[PATH_MAX]; + struct strbuf ref = STRBUF_INIT; + size_t base_len; int had_error = 0; struct object_id oid; + strbuf_addstr(&ref, git_replace_ref_base); + base_len = ref.len; + for (p = argv; *p; p++) { if (get_oid(*p, &oid)) { error("Failed to resolve '%s' as a valid ref.", *p); had_error = 1; continue; } - full_hex = oid_to_hex(&oid); - snprintf(ref, sizeof(ref), "%s%s", git_replace_ref_base, full_hex); - /* read_ref() may reuse the buffer */ - full_hex = ref + strlen(git_replace_ref_base); - if (read_ref(ref, oid.hash)) { + + strbuf_setlen(&ref, base_len); + strbuf_addstr(&ref, oid_to_hex(&oid)); + full_hex = ref.buf + base_len; + + if (read_ref(ref.buf, oid.hash)) { error("replace ref '%s' not found.", full_hex); had_error = 1; continue; } - if (fn(full_hex, ref, &oid)) + if (fn(full_hex, ref.buf, &oid)) had_error = 1; } return had_error; @@ -129,21 +134,18 @@ static int delete_replace_ref(const char *name, const char *ref, static void check_ref_valid(struct object_id *object, struct object_id *prev, - char *ref, - int ref_size, + struct strbuf *ref, int force) { - if (snprintf(ref, ref_size, -"%s%s", git_replace_ref_base, -oid_to_hex(object)) > ref_size - 1) - die("replace ref name too long: %.*s...", 50, ref); - if (check_refname_format(ref, 0)) - die("'%s' is not a
[PATCH 15/18] convert unchecked snprintf into xsnprintf
These calls to snprintf should always succeed, because their input is small and fixed. Let's use xsnprintf to make sure this is the case (and to make auditing for actual truncation easier). These could be candidates for turning into heap buffers, but they fall into a few broad categories that make it not worth doing: - formatting single numbers is simple enough that we can see the result should fit - the size of a sha1 is likewise well-known, and I didn't want to cause unnecessary conflicts with the ongoing process to convert these constants to GIT_MAX_HEXSZ - the interface for curl_errorstr is dictated by curl Signed-off-by: Jeff King --- grep.c | 4 ++-- http.c | 10 +- imap-send.c | 2 +- sha1_file.c | 4 ++-- submodule.c | 2 +- 5 files changed, 11 insertions(+), 11 deletions(-) diff --git a/grep.c b/grep.c index 0dbdc1d00..39b4b60d2 100644 --- a/grep.c +++ b/grep.c @@ -1164,7 +1164,7 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, } if (opt->linenum) { char buf[32]; - snprintf(buf, sizeof(buf), "%d", lno); + xsnprintf(buf, sizeof(buf), "%d", lno); output_color(opt, buf, strlen(buf), opt->color_lineno); output_sep(opt, sign); } @@ -1651,7 +1651,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle opt->color_filename); output_sep(opt, ':'); } - snprintf(buf, sizeof(buf), "%u\n", count); + xsnprintf(buf, sizeof(buf), "%u\n", count); opt->output(opt, buf, strlen(buf)); return 1; } diff --git a/http.c b/http.c index 96d84bbed..8d94e2c63 100644 --- a/http.c +++ b/http.c @@ -1366,9 +1366,9 @@ static int handle_curl_result(struct slot_results *results) * FAILONERROR it is lost, so we can give only the numeric * status code. */ - snprintf(curl_errorstr, sizeof(curl_errorstr), -"The requested URL returned error: %ld", -results->http_code); + xsnprintf(curl_errorstr, sizeof(curl_errorstr), + "The requested URL returned error: %ld", + results->http_code); } if (results->curl_result == CURLE_OK) { @@ -1410,8 +1410,8 @@ int run_one_slot(struct active_request_slot *slot, { slot->results = results; if (!start_active_slot(slot)) { - snprintf(curl_errorstr, sizeof(curl_errorstr), -"failed to start HTTP request"); + xsnprintf(curl_errorstr, sizeof(curl_errorstr), + "failed to start HTTP request"); return HTTP_START_FAILED; } diff --git a/imap-send.c b/imap-send.c index 5c7e27a89..857591660 100644 --- a/imap-send.c +++ b/imap-send.c @@ -964,7 +964,7 @@ static struct imap_store *imap_open_store(struct imap_server_conf *srvc, char *f int gai; char portstr[6]; - snprintf(portstr, sizeof(portstr), "%d", srvc->port); + xsnprintf(portstr, sizeof(portstr), "%d", srvc->port); memset(&hints, 0, sizeof(hints)); hints.ai_socktype = SOCK_STREAM; diff --git a/sha1_file.c b/sha1_file.c index 71063890f..43990dec7 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3762,8 +3762,8 @@ static int for_each_file_in_obj_subdir(int subdir_nr, char hex[GIT_SHA1_HEXSZ+1]; struct object_id oid; - snprintf(hex, sizeof(hex), "%02x%s", -subdir_nr, de->d_name); + xsnprintf(hex, sizeof(hex), "%02x%s", + subdir_nr, de->d_name); if (!get_oid_hex(hex, &oid)) { if (obj_cb) { r = obj_cb(&oid, path->buf, data); diff --git a/submodule.c b/submodule.c index 3200b7bb2..e11ea7d0a 100644 --- a/submodule.c +++ b/submodule.c @@ -1221,7 +1221,7 @@ static int find_first_merges(struct object_array *result, const char *path, memset(&rev_opts, 0, sizeof(rev_opts)); /* get all revisions that merge commit a */ - snprintf(merged_revision, sizeof(merged_revision), "^%s", + xsnprintf(merged_revision, sizeof(merged_revision), "^%s", oid_to_hex(&a->object.oid)); init_revisions(&revs, NULL); rev_opts.submodule = path; -- 2.12.2.845.g55fcf8b10
[PATCH 08/18] avoid using mksnpath for refs
Like the previous commit, we'd like to avoid the assumption that refs fit into PATH_MAX-sized buffers. These callsites have an extra twist, though: they write the refnames using mksnpath. This does two things beyond a regular snprintf: 1. It quietly writes "/bad-path/" when truncation occurs. This saves the caller having to check the error code, but if you aren't actually feeding the result to a system call (and we aren't here), it's questionable. 2. It calls cleanup_path(), which removes leading instances of "./". That's questionable when dealing with refnames, as we could silently canonicalize a syntactically bogus refname into a valid one. Let's convert each case to use a strbuf. This is preferable to xstrfmt() because we can reuse the same buffer as we loop. Signed-off-by: Jeff King --- refs.c | 44 ++-- 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/refs.c b/refs.c index e7606716d..9de6979ac 100644 --- a/refs.c +++ b/refs.c @@ -429,29 +429,31 @@ int expand_ref(const char *str, int len, unsigned char *sha1, char **ref) { const char **p, *r; int refs_found = 0; + struct strbuf fullref = STRBUF_INIT; *ref = NULL; for (p = ref_rev_parse_rules; *p; p++) { - char fullref[PATH_MAX]; unsigned char sha1_from_ref[20]; unsigned char *this_result; int flag; this_result = refs_found ? sha1_from_ref : sha1; - mksnpath(fullref, sizeof(fullref), *p, len, str); - r = resolve_ref_unsafe(fullref, RESOLVE_REF_READING, + strbuf_reset(&fullref); + strbuf_addf(&fullref, *p, len, str); + r = resolve_ref_unsafe(fullref.buf, RESOLVE_REF_READING, this_result, &flag); if (r) { if (!refs_found++) *ref = xstrdup(r); if (!warn_ambiguous_refs) break; - } else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD")) { - warning("ignoring dangling symref %s.", fullref); - } else if ((flag & REF_ISBROKEN) && strchr(fullref, '/')) { - warning("ignoring broken ref %s.", fullref); + } else if ((flag & REF_ISSYMREF) && strcmp(fullref.buf, "HEAD")) { + warning("ignoring dangling symref %s.", fullref.buf); + } else if ((flag & REF_ISBROKEN) && strchr(fullref.buf, '/')) { + warning("ignoring broken ref %s.", fullref.buf); } } + strbuf_release(&fullref); return refs_found; } @@ -460,21 +462,22 @@ int dwim_log(const char *str, int len, unsigned char *sha1, char **log) char *last_branch = substitute_branch_name(&str, &len); const char **p; int logs_found = 0; + struct strbuf path = STRBUF_INIT; *log = NULL; for (p = ref_rev_parse_rules; *p; p++) { unsigned char hash[20]; - char path[PATH_MAX]; const char *ref, *it; - mksnpath(path, sizeof(path), *p, len, str); - ref = resolve_ref_unsafe(path, RESOLVE_REF_READING, + strbuf_reset(&path); + strbuf_addf(&path, *p, len, str); + ref = resolve_ref_unsafe(path.buf, RESOLVE_REF_READING, hash, NULL); if (!ref) continue; - if (reflog_exists(path)) - it = path; - else if (strcmp(ref, path) && reflog_exists(ref)) + if (reflog_exists(path.buf)) + it = path.buf; + else if (strcmp(ref, path.buf) && reflog_exists(ref)) it = ref; else continue; @@ -485,6 +488,7 @@ int dwim_log(const char *str, int len, unsigned char *sha1, char **log) if (!warn_ambiguous_refs) break; } + strbuf_release(&path); free(last_branch); return logs_found; } @@ -944,6 +948,7 @@ char *shorten_unambiguous_ref(const char *refname, int strict) static char **scanf_fmts; static int nr_rules; char *short_name; + struct strbuf resolved_buf = STRBUF_INIT; if (!nr_rules) { /* @@ -1002,7 +1007,6 @@ char *shorten_unambiguous_ref(const char *refname, int strict) */ for (j = 0; j < rules_to_fail; j++) { const char *rule = ref_rev_parse_rules[j]; - char refname[PATH_MAX]; /* skip matched rule */ if (i == j) @@ -1013,9 +1017,10 @@ char *shorten_unambiguous_ref(const ch
[PATCH 11/18] name-rev: replace static buffer with strbuf
When name-rev needs to format an actual name, we do so into a fixed-size buffer. That includes the actual ref tip, as well as any traversal information. Since refs can exceed 1024 bytes, this means you can get a bogus result. E.g., doing: git tag $(perl -e 'print join("/", 1..1024)') git describe --contains HEAD^ results in ".../282/283", when it should be ".../1023/1024~1". We can solve this by using a heap buffer. We'll use a strbuf, which lets us write into the same buffer from our loop without having to reallocate. Signed-off-by: Jeff King --- builtin/name-rev.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/builtin/name-rev.c b/builtin/name-rev.c index 8bdc3eaa6..92a5d8a5d 100644 --- a/builtin/name-rev.c +++ b/builtin/name-rev.c @@ -238,10 +238,9 @@ static const char *get_exact_ref_match(const struct object *o) return NULL; } -/* returns a static buffer */ -static const char *get_rev_name(const struct object *o) +/* may return a constant string or use "buf" as scratch space */ +static const char *get_rev_name(const struct object *o, struct strbuf *buf) { - static char buffer[1024]; struct rev_name *n; struct commit *c; @@ -258,10 +257,9 @@ static const char *get_rev_name(const struct object *o) int len = strlen(n->tip_name); if (len > 2 && !strcmp(n->tip_name + len - 2, "^0")) len -= 2; - snprintf(buffer, sizeof(buffer), "%.*s~%d", len, n->tip_name, - n->generation); - - return buffer; + strbuf_reset(buf); + strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation); + return buf->buf; } } @@ -271,10 +269,11 @@ static void show_name(const struct object *obj, { const char *name; const struct object_id *oid = &obj->oid; + struct strbuf buf = STRBUF_INIT; if (!name_only) printf("%s ", caller_name ? caller_name : oid_to_hex(oid)); - name = get_rev_name(obj); + name = get_rev_name(obj, &buf); if (name) printf("%s\n", name); else if (allow_undefined) @@ -283,6 +282,7 @@ static void show_name(const struct object *obj, printf("%s\n", find_unique_abbrev(oid->hash, DEFAULT_ABBREV)); else die("cannot describe '%s'", oid_to_hex(oid)); + strbuf_release(&buf); } static char const * const name_rev_usage[] = { @@ -294,6 +294,7 @@ static char const * const name_rev_usage[] = { static void name_rev_line(char *p, struct name_ref_data *data) { + struct strbuf buf = STRBUF_INIT; int forty = 0; char *p_start; for (p_start = p; *p; p++) { @@ -314,7 +315,7 @@ static void name_rev_line(char *p, struct name_ref_data *data) struct object *o = lookup_object(sha1); if (o) - name = get_rev_name(o); + name = get_rev_name(o, &buf); } *(p+1) = c; @@ -332,6 +333,8 @@ static void name_rev_line(char *p, struct name_ref_data *data) /* flush */ if (p_start != p) fwrite(p_start, p - p_start, 1, stdout); + + strbuf_release(&buf); } int cmd_name_rev(int argc, const char **argv, const char *prefix) -- 2.12.2.845.g55fcf8b10
[PATCH 10/18] create_branch: use xstrfmt for reflog message
We generate a reflog message that contains some fixed text plus a branch name, and use a buffer of size PATH_MAX + 20. This mostly works if you assume that refnames are shorter than PATH_MAX, but: 1. That's not necessarily true. PATH_MAX is not always the filesystem's limit. 2. The "20" is not sufficiently large for the fixed text anyway. Let's just switch to a heap buffer so we don't have to even care. Signed-off-by: Jeff King --- branch.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/branch.c b/branch.c index 6d0ca94cc..ad5a2299b 100644 --- a/branch.c +++ b/branch.c @@ -296,14 +296,12 @@ void create_branch(const char *name, const char *start_name, if (!dont_change_ref) { struct ref_transaction *transaction; struct strbuf err = STRBUF_INIT; - char msg[PATH_MAX + 20]; + char *msg; if (forcing) - snprintf(msg, sizeof msg, "branch: Reset to %s", -start_name); + msg = xstrfmt("branch: Reset to %s", start_name); else - snprintf(msg, sizeof msg, "branch: Created from %s", -start_name); + msg = xstrfmt("branch: Created from %s", start_name); transaction = ref_transaction_begin(&err); if (!transaction || @@ -314,6 +312,7 @@ void create_branch(const char *name, const char *start_name, die("%s", err.buf); ref_transaction_free(transaction); strbuf_release(&err); + free(msg); } if (real_ref && track) -- 2.12.2.845.g55fcf8b10
[PATCH 05/18] tag: use strbuf to format tag header
We format the tag header into a fixed 1024-byte buffer. But since the tag-name and tagger ident can be arbitrarily large, we may unceremoniously die with "tag header too big". Let's just use a strbuf instead. Note that it looks at first glance like we can just format this directly into the "buf" strbuf where it will ultimately go. But that buffer may already contain the tag message, and we have no easy way to prepend formatted data to a strbuf (we can only splice in an already-generated buffer). This isn't a performance-critical path, so going through an extra buffer isn't a big deal. Signed-off-by: Jeff King --- builtin/tag.c | 27 --- 1 file changed, 12 insertions(+), 15 deletions(-) diff --git a/builtin/tag.c b/builtin/tag.c index ad29be692..045e7ad23 100644 --- a/builtin/tag.c +++ b/builtin/tag.c @@ -231,26 +231,22 @@ static void create_tag(const unsigned char *object, const char *tag, unsigned char *prev, unsigned char *result) { enum object_type type; - char header_buf[1024]; - int header_len; + struct strbuf header = STRBUF_INIT; char *path = NULL; type = sha1_object_info(object, NULL); if (type <= OBJ_NONE) die(_("bad object type.")); - header_len = snprintf(header_buf, sizeof(header_buf), - "object %s\n" - "type %s\n" - "tag %s\n" - "tagger %s\n\n", - sha1_to_hex(object), - typename(type), - tag, - git_committer_info(IDENT_STRICT)); - - if (header_len > sizeof(header_buf) - 1) - die(_("tag header too big.")); + strbuf_addf(&header, + "object %s\n" + "type %s\n" + "tag %s\n" + "tagger %s\n\n", + sha1_to_hex(object), + typename(type), + tag, + git_committer_info(IDENT_STRICT)); if (!opt->message_given) { int fd; @@ -288,7 +284,8 @@ static void create_tag(const unsigned char *object, const char *tag, if (!opt->message_given && !buf->len) die(_("no tag message?")); - strbuf_insert(buf, 0, header_buf, header_len); + strbuf_insert(buf, 0, header.buf, header.len); + strbuf_release(&header); if (build_tag_object(buf, opt->sign, result) < 0) { if (path) -- 2.12.2.845.g55fcf8b10
[PATCH 14/18] combine-diff: replace malloc/snprintf with xstrfmt
There's no need to use the magic "100" when a strbuf can do it for us. Signed-off-by: Jeff King --- combine-diff.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/combine-diff.c b/combine-diff.c index 59501db99..d3560573a 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -292,9 +292,10 @@ static char *grab_blob(const struct object_id *oid, unsigned int mode, enum object_type type; if (S_ISGITLINK(mode)) { - blob = xmalloc(100); - *size = snprintf(blob, 100, -"Subproject commit %s\n", oid_to_hex(oid)); + struct strbuf buf = STRBUF_INIT; + strbuf_addf(&buf, "Subproject commit %s\n", oid_to_hex(oid)); + *size = buf.len; + blob = strbuf_detach(&buf, NULL); } else if (is_null_oid(oid)) { /* deleted blob */ *size = 0; -- 2.12.2.845.g55fcf8b10
[PATCH 17/18] gc: replace local buffer with git_path
We probe the "17/" loose object directory for auto-gc, and use a local buffer to format the path. We can just use git_path() for this. It handles paths of any length (reducing our error handling). And because we feed the result straight to a system call, we can just use the static variant. Note that git_path also knows the string "objects/" is special, and will replace it with git_object_directory() when necessary. Another alternative would be to use sha1_file_name() for the pretend object "17...", but that ends up being more hassle for no gain, as we have to truncate the final path component. Signed-off-by: Jeff King --- builtin/gc.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/builtin/gc.c b/builtin/gc.c index c2c61a57b..2daede782 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -135,8 +135,6 @@ static int too_many_loose_objects(void) * distributed, we can check only one and get a reasonable * estimate. */ - char path[PATH_MAX]; - const char *objdir = get_object_directory(); DIR *dir; struct dirent *ent; int auto_threshold; @@ -146,11 +144,7 @@ static int too_many_loose_objects(void) if (gc_auto_threshold <= 0) return 0; - if (sizeof(path) <= snprintf(path, sizeof(path), "%s/17", objdir)) { - warning(_("insanely long object directory %.*s"), 50, objdir); - return 0; - } - dir = opendir(path); + dir = opendir(git_path("objects/17")); if (!dir) return 0; -- 2.12.2.845.g55fcf8b10
[PATCH 16/18] transport-helper: replace checked snprintf with xsnprintf
We can use xsnprintf to do our truncation check with less code. The error message isn't as specific, but the point is that this isn't supposed to trigger in the first place (because our buffer is big enough to handle any int). Signed-off-by: Jeff King --- transport-helper.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/transport-helper.c b/transport-helper.c index dc90a1fb7..36408046e 100644 --- a/transport-helper.c +++ b/transport-helper.c @@ -347,14 +347,11 @@ static int set_helper_option(struct transport *transport, static void standard_options(struct transport *t) { char buf[16]; - int n; int v = t->verbose; set_helper_option(t, "progress", t->progress ? "true" : "false"); - n = snprintf(buf, sizeof(buf), "%d", v + 1); - if (n >= sizeof(buf)) - die("impossibly large verbosity value"); + xsnprintf(buf, sizeof(buf), "%d", v + 1); set_helper_option(t, "verbosity", buf); switch (t->family) { -- 2.12.2.845.g55fcf8b10
[PATCH 18/18] daemon: use an argv_array to exec children
Our struct child_process already has its own argv_array. Let's use that to avoid having to format options into separate buffers. Note that we'll need to declare the child process outside of the run_service_command() helper to do this. But that opens up a further simplification, which is that the helper can append to our argument list, saving each caller from specifying "." manually. Signed-off-by: Jeff King --- Of all the patches in this series, this is the one where I was most undecided on whether the result was more readable or not. My ulterior motive, of course, was to get rid of the unchecked snprintf() into timebuf. But it could also just become an xsnprintf() with no further changes. daemon.c | 38 +- 1 file changed, 17 insertions(+), 21 deletions(-) diff --git a/daemon.c b/daemon.c index 473e6b6b6..f70d27b82 100644 --- a/daemon.c +++ b/daemon.c @@ -449,46 +449,42 @@ static void copy_to_log(int fd) fclose(fp); } -static int run_service_command(const char **argv) +static int run_service_command(struct child_process *cld) { - struct child_process cld = CHILD_PROCESS_INIT; - - cld.argv = argv; - cld.git_cmd = 1; - cld.err = -1; - if (start_command(&cld)) + argv_array_push(&cld->args, "."); + cld->git_cmd = 1; + cld->err = -1; + if (start_command(cld)) return -1; close(0); close(1); - copy_to_log(cld.err); + copy_to_log(cld->err); - return finish_command(&cld); + return finish_command(cld); } static int upload_pack(void) { - /* Timeout as string */ - char timeout_buf[64]; - const char *argv[] = { "upload-pack", "--strict", NULL, ".", NULL }; - - argv[2] = timeout_buf; - - snprintf(timeout_buf, sizeof timeout_buf, "--timeout=%u", timeout); - return run_service_command(argv); + struct child_process cld = CHILD_PROCESS_INIT; + argv_array_pushl(&cld.args, "upload-pack", "--strict", NULL); + argv_array_pushf(&cld.args, "--timeout=%u", timeout); + return run_service_command(&cld); } static int upload_archive(void) { - static const char *argv[] = { "upload-archive", ".", NULL }; - return run_service_command(argv); + struct child_process cld = CHILD_PROCESS_INIT; + argv_array_push(&cld.args, "upload-archive"); + return run_service_command(&cld); } static int receive_pack(void) { - static const char *argv[] = { "receive-pack", ".", NULL }; - return run_service_command(argv); + struct child_process cld = CHILD_PROCESS_INIT; + argv_array_push(&cld.args, "receive-pack"); + return run_service_command(&cld); } static struct daemon_service daemon_service[] = { -- 2.12.2.845.g55fcf8b10
Re: [PATCH v3 0/2] read-cache: call verify_hdr() in a background thread
On 3/28/2017 3:16 PM, Jeff King wrote: On Tue, Mar 28, 2017 at 07:07:30PM +, g...@jeffhostetler.com wrote: From: Jeff Hostetler Version 3 of this patch series simplifies this effort to just turn on/off the hash verification using a "core.checksumindex" config variable. I've preserved the original checksum validation code so that we can force it on in fsck if desired. It eliminates the original threading model completely. Jeff Hostetler (2): read-cache: core.checksumindex test-core-checksum-index: core.checksumindex test helper Makefile| 1 + read-cache.c| 12 ++ t/helper/.gitignore | 1 + t/helper/test-core-checksum-index.c | 77 + Do we still need test-core-checksum-index? Can we just time ls-files or something in t/perf? It was a convenient way to isolate, average, and compare read_index() times, but I suppose we could do something like that. I did confirm that a ls-files does show a slight 0.008 second difference on the 58K file Linux tree when toggled on or off. But I'm tempted to suggest that we just omit my helper exe and not worry about a test -- since we don't have any test repos large enough to really demonstrate the differences. My concern is that that 0.008 would be lost in the noise of the rest of the test and make for an unreliable result. Jeff
Re: [PATCH 04/18] diff: avoid fixed-size buffer for patch-ids
On Tue, Mar 28, 2017 at 03:46:18PM -0400, Jeff King wrote: > As a result, the data we feed to the sha1 computation may be > truncated, and it's possible that a commit with a very long > filename could erroneously collide in the patch-id space > with another commit. For instance, if one commit modified > "really-long-filename/foo" and another modified "bar" in the > same directory. > > In practice this is unlikely. Because the filenames are > repeated, and because there's a single cutoff at the end of > the buffer, the offending filename would have to be on the > order of four times larger than PATH_MAX. > > But it's easy to fix by moving to a strbuf. The other obvious solution is to avoid formatting the string entirely, and just feed the pieces to the sha1 computation. Unfortunately that still involves formatting the modes (into a fixed-size buffer!) since we need the sha1 of their string representations. That patch is below for reference. I'm not sure if it's more readable or not. --- diff --git a/diff.c b/diff.c index a628ac3a9..435c734f4 100644 --- a/diff.c +++ b/diff.c @@ -4570,6 +4570,19 @@ static void patch_id_consume(void *priv, char *line, unsigned long len) data->patchlen += new_len; } +static void patch_id_add_string(git_SHA_CTX *ctx, const char *str) +{ + git_SHA1_Update(ctx, str, strlen(str)); +} + +static void patch_id_add_mode(git_SHA_CTX *ctx, unsigned mode) +{ + /* large enough for 2^32 in octal */ + char buf[12]; + int len = xsnprintf(buf, sizeof(buf), "%06o", mode); + git_SHA1_Update(ctx, buf, len); +} + /* returns 0 upon success, and writes result into sha1 */ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, int diff_header_only) { @@ -4577,7 +4590,6 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, int i; git_SHA_CTX ctx; struct patch_id_t data; - char buffer[PATH_MAX * 4 + 20]; git_SHA1_Init(&ctx); memset(&data, 0, sizeof(struct patch_id_t)); @@ -4609,36 +4621,30 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, len1 = remove_space(p->one->path, strlen(p->one->path)); len2 = remove_space(p->two->path, strlen(p->two->path)); - if (p->one->mode == 0) - len1 = snprintf(buffer, sizeof(buffer), - "diff--gita/%.*sb/%.*s" - "newfilemode%06o" - "---/dev/null" - "+++b/%.*s", - len1, p->one->path, - len2, p->two->path, - p->two->mode, - len2, p->two->path); - else if (p->two->mode == 0) - len1 = snprintf(buffer, sizeof(buffer), - "diff--gita/%.*sb/%.*s" - "deletedfilemode%06o" - "---a/%.*s" - "+++/dev/null", - len1, p->one->path, - len2, p->two->path, - p->one->mode, - len1, p->one->path); - else - len1 = snprintf(buffer, sizeof(buffer), - "diff--gita/%.*sb/%.*s" - "---a/%.*s" - "+++b/%.*s", - len1, p->one->path, - len2, p->two->path, - len1, p->one->path, - len2, p->two->path); - git_SHA1_Update(&ctx, buffer, len1); + patch_id_add_string(&ctx, "diff--git"); + patch_id_add_string(&ctx, "a/"); + git_SHA1_Update(&ctx, p->one->path, len1); + patch_id_add_string(&ctx, "b/"); + git_SHA1_Update(&ctx, p->two->path, len2); + + if (p->one->mode == 0) { + patch_id_add_string(&ctx, "newfilemode"); + patch_id_add_mode(&ctx, p->two->mode); + patch_id_add_string(&ctx, "---/dev/null"); + patch_id_add_string(&ctx, "+++b/"); + git_SHA1_Update(&ctx, p->two->path, len2); + } else if (p->two->mode == 0) { + patch_id_add_string(&ctx, "deletedfilemode"); + patch_id_add_mode(&ctx, p->one->mode); + patch_id_add_string(&ctx, "---a/"); + git_SHA1_Update(&ctx, p->one->path, len1); + patch_id_add_string(&c
Re: [PATCH v2 00/21] object_id part 7
On Tue, Mar 28, 2017 at 12:40:29PM -0700, Junio C Hamano wrote: > > Here's that minor tweak, in case anybody is interested. It's less useful > > without that follow-on that touches "eol" more, but perhaps it increases > > readability on its own. > > Yup, the only thing that the original (with Brian's fix) appears to > be more careful about is it tries very hard to avoid setting boc > past eoc. As we are not checking "boc != eoc" but doing the > comparison, that "careful" appearance does not give us any benefit > in practice, other than having to do an extra "eol ? eol+1 : eoc"; > the result of this patch is easier to read. > > By the way, eoc is "one past the end" of the array that begins at > boc, so setting a pointer to eoc+1 may technically be in violation. > I do not know how much it matters, though ;-) I think that is OK. We are reading a strbuf, so eoc must either be the first character of the PGP signature, or the terminating NUL if there was no signature block[1]. So it's actually _inside_ the array, and eoc+1 is our "one past". -Peff [1] Arguably we should bail when parse_signature() does not find a PGP signature at all. We already bail with "malformed push certificate" when there are other syntactic anomalies.
Re: [PATCH RFC 2/2] diff: teach diff to expand tabs in output
On Tue, Mar 28, 2017 at 12:03 PM, Junio C Hamano wrote: > Jacob Keller writes: > >> I'm really not a fan of how the ws code ended up. It seems pretty ugly >> and weird to hack in the expand_tabs stuff here. However, I'm really not >> sure how else I could handle this. Additionally, I'm not 100% sure >> whether this interacts with format-patch or other machinery which may >> well want some way to be excluded. Thoughts? > > As long as you do the same as "do we color the output? no, no, we > are format-patch and must not color" logic to refrain from expanding > the tabs, you should be OK. > >> I think there also may be some wonky bits when performing the tab >> expansion during whitespace checks, due to the way we expand, because I >> don't think that the tabexpand function takes into account the "current" >> location when adding a string, so it very well may not be correct I >> am unsure if there is a good way to fix this. > > This "feature" is limited to the diff output, so one way may be to > leave the code as-is and pipe the output to a filter that is similar > to /usr/bin/expand but knows that the first column is special (this > is the part that "this is limited to diff" kicks in). You may even > be able to implement it as a new option to "expand(1)" and then > people who aren't Git users would also benefit. > That makes more sense. I'll take a look at that. It might even be possible to modify the pager setup so that it does that as part of its process. Thanks, Jake
Re: [PATCH v3 0/2] read-cache: call verify_hdr() in a background thread
On Tue, Mar 28, 2017 at 03:50:34PM -0400, Jeff Hostetler wrote: > It was a convenient way to isolate, average, and compare > read_index() times, but I suppose we could do something > like that. > > I did confirm that a ls-files does show a slight 0.008 > second difference on the 58K file Linux tree when toggled > on or off. Yeah, I agree it helps isolate the change. I'm just not sure we want to carry a bunch of function-specific perf-testing code. And one of the nice things about testing a real command is that it's...a real command. So it's an actual improvement a user might see. > But I'm tempted to suggest that we just omit my helper exe > and not worry about a test -- since we don't have any test > repos large enough to really demonstrate the differences. > My concern is that that 0.008 would be lost in the noise > of the rest of the test and make for an unreliable result. Yeah, I think that would be fine. You _could_ write a t/perf test and then use your 400MB monstrosity as GIT_PERF_LARGE_REPO. But given that most people don't have such a thing, there's not much value over you just showing off the perf improvement in the commit message. We could also have a t/perf test that generates a monstrous index and shows that it's faster. But frankly, I don't think this is all that interesting as a performance regression test. It's not like there's something subtle about the performance improvement; we stopped computing the SHA-1, and (gasp!) it takes exactly one SHA-1 computation's less time. So just mentioning the test case and the improvement in the commit message is sufficient, IMHO. -Peff
Re: [PATCH RFC 2/2] diff: teach diff to expand tabs in output
On Tue, Mar 28, 2017 at 01:05:40PM -0700, Jacob Keller wrote: > On Tue, Mar 28, 2017 at 12:03 PM, Junio C Hamano wrote: > > Jacob Keller writes: > > > >> I'm really not a fan of how the ws code ended up. It seems pretty ugly > >> and weird to hack in the expand_tabs stuff here. However, I'm really not > >> sure how else I could handle this. Additionally, I'm not 100% sure > >> whether this interacts with format-patch or other machinery which may > >> well want some way to be excluded. Thoughts? > > > > As long as you do the same as "do we color the output? no, no, we > > are format-patch and must not color" logic to refrain from expanding > > the tabs, you should be OK. > > > >> I think there also may be some wonky bits when performing the tab > >> expansion during whitespace checks, due to the way we expand, because I > >> don't think that the tabexpand function takes into account the "current" > >> location when adding a string, so it very well may not be correct I > >> am unsure if there is a good way to fix this. > > > > This "feature" is limited to the diff output, so one way may be to > > leave the code as-is and pipe the output to a filter that is similar > > to /usr/bin/expand but knows that the first column is special (this > > is the part that "this is limited to diff" kicks in). You may even > > be able to implement it as a new option to "expand(1)" and then > > people who aren't Git users would also benefit. > > > > That makes more sense. I'll take a look at that. It might even be > possible to modify the pager setup so that it does that as part of its > process. This is similar to how I use diff-highlight, which is basically: [pager] log = diff-highlight | less show = diff-highlight | less diff = diff-highlight | less I've wanted to move diff-highlight inside git, but ran into ugly conflicts with the whitespace-marking code as well. Something like: git show b16a991c1be5681b4b673d4343dfcc0c2f5ad498 | perl -pe 's/^(.)(\t+)/$1 . (" " x (length($2) * 8))/e' But sticking it in your pager pipeline is tough, because you actually need to skip over the color bytes when they are present. You can see in diff-highlight how this is handled. That also means an option to something like "expand" is tough, because "first character" really means "first non-ANSI-code character". -Peff
Re: [PATCH RFC 2/2] diff: teach diff to expand tabs in output
Jeff King writes: > That also means an option to something like "expand" is tough, because > "first character" really means "first non-ANSI-code character". That is true, but once you commit to the mindset that you are extending "expand", then that becomes a mere part of what must be done, i.e. if you want your "expand" to be able to handle coloured input, you'd need to know how wide each segment of the input is. For that matter, you would also want your "expand" to pay attention to the differences between display- vs byte-widths of a string, perhaps reusing utf8_strwidth() or something. Also "the first character is special" does not have to be a "diff specific hack". Your extended "expand" can take a list of tab-widths and the special case for highlighting diff output happens to take 9,8 as the "list of tab-widths" parameter (whose semantics is that each number in the list tells how wide the tab is, and the last number repeats forever, so 9,8 really means 9,8,8,8,8,...).
Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD
On Sun, Mar 26, 2017 at 3:43 PM, René Scharfe wrote: > FreeBSD implements getcwd(3) as a syscall, but falls back to a version > based on readdir(3) if it fails for some reason. The latter requires > permissions to read and execute path components, while the former does > not. That means that if our buffer is too small and we're missing > rights we could get EACCES, but we may succeed with a bigger buffer. > > Keep retrying if getcwd(3) indicates lack of permissions until our > buffer can fit PATH_MAX bytes, as that's the maximum supported by the > syscall on FreeBSD anyway. This way we do what we can to be able to > benefit from the syscall, but we also won't loop forever if there is a > real permission issue. Sorry to be late and maybe I missed something obvious, but the above and the patch seem complex to me compared with something like: diff --git a/strbuf.c b/strbuf.c index ace58e7367..25eadcbedc 100644 --- a/strbuf.c +++ b/strbuf.c @@ -441,7 +441,7 @@ int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint) int strbuf_getcwd(struct strbuf *sb) { size_t oldalloc = sb->alloc; - size_t guessed_len = 128; + size_t guessed_len = PATH_MAX > 128 ? PATH_MAX : 128; for (;; guessed_len *= 2) { strbuf_grow(sb, guessed_len);
Re: [PATCH 7/7] submodule.c: correctly handle nested submodules in is_submodule_modified
On Mon, Mar 27, 2017 at 2:46 PM, Junio C Hamano wrote: > Stefan Beller writes: > >> When a nested submodule has untracked files, it would be reported as >> "modified submodule" in the superproject, because submodules are not >> parsed correctly in is_submodule_modified as they are bucketed into >> the modified pile as "they are not an untracked file". > > I cannot quite parse the above. I tried to describe the example Jonathan gave in his reply in a shorter form. I'll >> + /* regular unmerged and renamed files */ >> + if (buf.buf[5] == 'S' && buf.buf[8] == 'U') >> + /* nested untracked file */ >> + dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; > > OK, we have untracked one. > >> + if (memcmp(buf.buf + 5, "S..U", 4)) >> + /* other change */ >> + dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; > > And for other cases like C at buf.buf[6] or M at buf.buf[7], > i.e. where the submodule not just has untracked files but has real > changes, we say it is truly MODIFIED here. > > If there are changes to paths that is not a submodule but a tracked > file in the submodule in question would have N at buf.buf[5] and is > also caught with the same "not S..U so that's MODIFIED" logic. > > OK. ok, thanks for checking. > > Shouldn't this done as part of 4/7 where is_submodule_modified() > starts reading from the porcelain v2 output? I did that in an earlier version of the series. However the change from porcelain=1 to 2 should not be observable by the end user. > 4/7 does adjust for > the change from double question mark (porcelain v1) to a single one > for untracked, but at the same time it needs to prepare for these > 'u' (unmerged), '1' (normal modification) and '2' (mods with rename) > to appear in the output, no? No, up to patch 5/7 we only had refactors, no user visible changes intended. And until then we had "has untracked files" and "everything else". The nice part of the conversion was to cover the "everything else" part easily. This patch transforms it into "has untracked files or submodule reports untracked files (possibly nested)" and "everything else", but the former is more complicated to compute. > IOW, with 4/7 and 7/7 done as separate steps, isn't the system > broken between these steps? No, see Jonathans answer. We could argue, that 6/7 and 7/7 done as separate steps is broken, because since 6/7 we promise to report untracked files inside submodules as " ?", but we do not report them properly for nested submodules. Suppose I have a superproject 'parent', with two submodules 'parent/sub' and 'parent/sub1'. 'parent/sub' itself contains a submodule 'parent/sub/subsub'. Now suppose I run, from within 'parent': echo hi >sub/subsub/stray-file echo hi >sub1/stray-file with patches 1..5: git status --short M sub M sub1 patch 6: git status --short m sub ? sub1 with patch 7: git status --short ? sub ? sub1 The documentation update in patch 6 is unclear what to expect from nested submodules, so that documentation is technically not wrong, but maybe misleading. I want to resend patch 7 with a documentation update as well as tests. Thanks, Stefan
Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD
On Tue, Mar 28, 2017 at 2:15 PM, Christian Couder wrote: > On Sun, Mar 26, 2017 at 3:43 PM, René Scharfe wrote: >> FreeBSD implements getcwd(3) as a syscall, but falls back to a version >> based on readdir(3) if it fails for some reason. The latter requires >> permissions to read and execute path components, while the former does >> not. That means that if our buffer is too small and we're missing >> rights we could get EACCES, but we may succeed with a bigger buffer. >> >> Keep retrying if getcwd(3) indicates lack of permissions until our >> buffer can fit PATH_MAX bytes, as that's the maximum supported by the >> syscall on FreeBSD anyway. This way we do what we can to be able to >> benefit from the syscall, but we also won't loop forever if there is a >> real permission issue. > > Sorry to be late and maybe I missed something obvious, but the above > and the patch seem complex to me compared with something like: > > diff --git a/strbuf.c b/strbuf.c > index ace58e7367..25eadcbedc 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -441,7 +441,7 @@ int strbuf_readlink(struct strbuf *sb, const char > *path, size_t hint) > int strbuf_getcwd(struct strbuf *sb) > { > size_t oldalloc = sb->alloc; > - size_t guessed_len = 128; > + size_t guessed_len = PATH_MAX > 128 ? PATH_MAX : 128; > > for (;; guessed_len *= 2) { > strbuf_grow(sb, guessed_len); >From f22a76e911 (strbuf: add strbuf_getcwd(), 2014-07-28): Because it doesn't use a fixed-size buffer it supports arbitrarily long paths, provided the platform's getcwd() does as well. At least on Linux and FreeBSD it handles paths longer than PATH_MAX just fine. So with your patch, we'd still see the original issue for paths > PATH_MAX IIUC. Thanks, Stefan
Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD
On Tue, Mar 28, 2017 at 11:24 PM, Stefan Beller wrote: > On Tue, Mar 28, 2017 at 2:15 PM, Christian Couder > wrote: >> On Sun, Mar 26, 2017 at 3:43 PM, René Scharfe wrote: >>> FreeBSD implements getcwd(3) as a syscall, but falls back to a version >>> based on readdir(3) if it fails for some reason. The latter requires >>> permissions to read and execute path components, while the former does >>> not. That means that if our buffer is too small and we're missing >>> rights we could get EACCES, but we may succeed with a bigger buffer. >>> >>> Keep retrying if getcwd(3) indicates lack of permissions until our >>> buffer can fit PATH_MAX bytes, as that's the maximum supported by the >>> syscall on FreeBSD anyway. This way we do what we can to be able to >>> benefit from the syscall, but we also won't loop forever if there is a >>> real permission issue. >> >> Sorry to be late and maybe I missed something obvious, but the above >> and the patch seem complex to me compared with something like: >> >> diff --git a/strbuf.c b/strbuf.c >> index ace58e7367..25eadcbedc 100644 >> --- a/strbuf.c >> +++ b/strbuf.c >> @@ -441,7 +441,7 @@ int strbuf_readlink(struct strbuf *sb, const char >> *path, size_t hint) >> int strbuf_getcwd(struct strbuf *sb) >> { >> size_t oldalloc = sb->alloc; >> - size_t guessed_len = 128; >> + size_t guessed_len = PATH_MAX > 128 ? PATH_MAX : 128; >> >> for (;; guessed_len *= 2) { >> strbuf_grow(sb, guessed_len); > > From f22a76e911 (strbuf: add strbuf_getcwd(), 2014-07-28): > Because it doesn't use a fixed-size buffer it supports > arbitrarily long paths, provided the platform's getcwd() does as well. > At least on Linux and FreeBSD it handles paths longer than PATH_MAX > just fine. Well René's patch says: >>> Keep retrying if getcwd(3) indicates lack of permissions until our >>> buffer can fit PATH_MAX bytes, as that's the maximum supported by the >>> syscall on FreeBSD anyway. So it says that FreeBSD syscall doesn't handle paths longer than PATH_MAX. > So with your patch, we'd still see the original issue for paths > PATH_MAX > IIUC. Also, René's patch just adds: + if (errno == EACCES && guessed_len < PATH_MAX) + continue; so if the length of the path is > PATH_MAX, then guessed_len will have to be > PATH_MAX and then René's patch will do nothing.
Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD
On Tue, Mar 28, 2017 at 11:15:12PM +0200, Christian Couder wrote: > On Sun, Mar 26, 2017 at 3:43 PM, René Scharfe wrote: > > FreeBSD implements getcwd(3) as a syscall, but falls back to a version > > based on readdir(3) if it fails for some reason. The latter requires > > permissions to read and execute path components, while the former does > > not. That means that if our buffer is too small and we're missing > > rights we could get EACCES, but we may succeed with a bigger buffer. > > > > Keep retrying if getcwd(3) indicates lack of permissions until our > > buffer can fit PATH_MAX bytes, as that's the maximum supported by the > > syscall on FreeBSD anyway. This way we do what we can to be able to > > benefit from the syscall, but we also won't loop forever if there is a > > real permission issue. > > Sorry to be late and maybe I missed something obvious, but the above > and the patch seem complex to me compared with something like: > > diff --git a/strbuf.c b/strbuf.c > index ace58e7367..25eadcbedc 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -441,7 +441,7 @@ int strbuf_readlink(struct strbuf *sb, const char > *path, size_t hint) > int strbuf_getcwd(struct strbuf *sb) > { > size_t oldalloc = sb->alloc; > - size_t guessed_len = 128; > + size_t guessed_len = PATH_MAX > 128 ? PATH_MAX : 128; > > for (;; guessed_len *= 2) { > strbuf_grow(sb, guessed_len); I think the main reason is just that we do not have to pay the price to allocate PATH_MAX-sized buffers when they are rarely used. I doubt it matters all that much in practice, though. -Peff
Git hackathon New York / London - call for mentors
Bloomberg would like to host a Git hackathon over a weekend in both New York and London, towards the end of April or the beginning of May. Crucial to the success of the weekend will be having mentors available in both locations who can guide people on the project. Mentors should have some experience with developing for Git and should be familiar with the process and guidelines around contributing. If you are interested in being a mentor or have further questions, then please get in contact with me via email (either to this address or to cbaile...@bloomberg.net) letting me know whether you are closer to New York or London and if you have any date restrictions. Charles. --- Git was the first project for which we hosted an "Open Source Day" and since then we've learned a lot and would like to revisit Git again. The event will involve volunteers who are usually competent programmers but who don't necessarily have experience with contributing to Git, working to contribute to the project over two days. Typically the type of tasks tackled would include documentation improvements, test case improvements and very simple bug fixes that have previously been identified as "low hanging fruit".
Re: [PATCH 0/18] snprintf cleanups
Jeff King writes: > It's a lot of patches, but hopefully they're all pretty straightforward > to read. Yes, quite a lot of changes. I didn't see anything questionable in there. As to the "patch-id" thing, I find the alternate one slightly easier to read. Also, exactly because this is not a performance critical codepath, it may be better if patch_id_add_string() filtered out whitespaces; that would allow the source to express things in more natural way, e.g. patch_id_addf(&ctx, "new file mode"); patch_id_addf(&ctx, "%06o", p->two->mode); patch_id_addf(&ctx, "--- /dev/null"); patch_id_addf(&ctx, "+++ b/%.*s", len2, p->two->path); Or I may be going overboard by bringing "addf" into the mix X-<.
[PATCH 1/2] short status: improve reporting for submodule changes
If I add an untracked file to a submodule or modify a tracked file, currently "git status --short" treats the change in the same way as changes to the current HEAD of the submodule: $ git clone --quiet --recurse-submodules https://gerrit.googlesource.com/gerrit $ echo hello >gerrit/plugins/replication/stray-file $ sed -i -e 's/.*//' gerrit/plugins/replication/.mailmap $ git -C gerrit status --short M plugins/replication This is by analogy with ordinary files, where "M" represents a change that has not been added yet to the index. But this change cannot be added to the index without entering the submodule, "git add"-ing it, and running "git commit", so the analogy is counterproductive. Introduce new status letters " ?" and " m" for this. These are similar to the existing "??" and " M" but mean that the submodule (not the parent project) has new untracked files and modified files, respectively. The user can use "git add" and "git commit" from within the submodule to add them. Changes to the submodule's HEAD commit can be recorded in the index with a plain "git add -u" and are shown with " M", like today. To avoid excessive clutter, show at most one of " ?", " m", and " M" for the submodule. They represent increasing levels of change --- the last one that applies is shown (e.g., " m" if there are both modified files and untracked files in the submodule, or " M" if the submodule's HEAD has been modified and it has untracked files). While making these changes, we need to make sure to not break porcelain level 1, which shares code with "status --short". We only change "git status --short". Non-short "git status" and "git status --porcelain=2" already handle these cases by showing more detail: $ git -C gerrit status --porcelain=2 1 .M S.MU 16 16 16 305c864db28eb0c77c8499bc04c87de3f849cf3c 305c864db28eb0c77c8499bc04c87de3f849cf3c plugins/replication $ git -C gerrit status [...] modified: plugins/replication (modified content, untracked content) Scripts caring about these distinctions should use --porcelain=2. Helped-by: Jonathan Nieder Signed-off-by: Stefan Beller Reviewed-by: Jonathan Nieder --- Documentation/git-status.txt | 9 t/t3600-rm.sh| 18 +--- t/t7506-status-submodule.sh | 102 +++ wt-status.c | 17 +++- 4 files changed, 139 insertions(+), 7 deletions(-) diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt index ba873657cf..01b457c322 100644 --- a/Documentation/git-status.txt +++ b/Documentation/git-status.txt @@ -181,6 +181,13 @@ in which case `XY` are `!!`. ! !ignored - +Submodules have more state and instead report + Mthe submodule has a different HEAD than +recorded in the index + mthe submodule has modified content + ?the submodule has untracked files + + If -b is used the short-format status is preceded by a line ## branchname tracking info @@ -210,6 +217,8 @@ field from the first filename). Third, filenames containing special characters are not specially formatted; no quoting or backslash-escaping is performed. +Any submodule changes are reported as modified `M` instead of `m` or single `?`. + Porcelain Format Version 2 ~~ diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh index 5aa6db584c..a6e5c5bd56 100755 --- a/t/t3600-rm.sh +++ b/t/t3600-rm.sh @@ -268,6 +268,14 @@ cat >expect.modifiedactual && @@ -436,7 +444,7 @@ test_expect_success 'rm of a populated submodule with untracked files fails unle test -d submod && test -f submod/.git && git status -s -uno --ignore-submodules=none >actual && - test_cmp expect.modified actual && + test_cmp expect.modified_untracked actual && git rm -f submod && test ! -d submod && git status -s -uno --ignore-submodules=none >actual && @@ -621,7 +629,7 @@ test_expect_success 'rm of a populated nested submodule with different nested HE test -d submod && test -f submod/.git && git status -s -uno --ignore-submodules=none >actual && - test_cmp expect.modified actual && + test_cmp expect.modified_inside actual && git rm -f submod && test ! -d submod && git status -s -uno --ignore-submodules=none >actual && @@ -636,7 +644,7 @@ test_expect_success 'rm of a populated nested submodule with nested modification test -d submod && test -f submod/.git && git status -s
[PATCH 2/2] submodule.c: correctly handle nested submodules in is_submodule_modified
Suppose I have a superproject 'super', with two submodules 'super/sub' and 'super/sub1'. 'super/sub' itself contains a submodule 'super/sub/subsub'. Now suppose I run, from within 'super': echo hi >sub/subsub/stray-file echo hi >sub1/stray-file Currently we get would see the following output in git-status: git status --short m sub ? sub1 With this patch applied, the untracked file in the nested submodule is turned into an untracked file on the 'super' level as well. git status --short ? sub ? sub1 This doesn't change the output of 'git status --porcelain=1' for nested submodules, because its output is always ' M' for either untracked files or local modifications no matter the nesting level of the submodule. 'git status --porcelain=2' is affected by this change in a nested submodule, though. Without this patch it would report the direct submodule as modified and having no untracked files. With this patch it would report untracked files. Chalk this up as a bug fix. Signed-off-by: Stefan Beller --- Documentation/git-status.txt | 2 ++ submodule.c | 23 +-- t/t3600-rm.sh| 2 +- t/t7506-status-submodule.sh | 2 +- 4 files changed, 25 insertions(+), 4 deletions(-) diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt index 01b457c322..452c6eb875 100644 --- a/Documentation/git-status.txt +++ b/Documentation/git-status.txt @@ -187,6 +187,8 @@ Submodules have more state and instead report mthe submodule has modified content ?the submodule has untracked files +Note that 'm' and '?' are applied recursively, e.g. if a nested submodule +in a submodule contains an untracked file, this is reported as '?' as well. If -b is used the short-format status is preceded by a line diff --git a/submodule.c b/submodule.c index fa21c7bb72..730cc9513a 100644 --- a/submodule.c +++ b/submodule.c @@ -1078,8 +1078,27 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) /* regular untracked files */ if (buf.buf[0] == '?') dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - else - dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; + + if (buf.buf[0] == 'u' || + buf.buf[0] == '1' || + buf.buf[0] == '2') { + /* +* T XY : +* T = line type, XY = status, = submodule state +*/ + if (buf.len < 1 + 1 + 2 + 1 + 4) + die("BUG: invalid status --porcelain=2 line %s", + buf.buf); + + /* regular unmerged and renamed files */ + if (buf.buf[5] == 'S' && buf.buf[8] == 'U') + /* nested untracked file */ + dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (memcmp(buf.buf + 5, "S..U", 4)) + /* other change */ + dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; + } if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh index a6e5c5bd56..b58793448b 100755 --- a/t/t3600-rm.sh +++ b/t/t3600-rm.sh @@ -659,7 +659,7 @@ test_expect_success 'rm of a populated nested submodule with nested untracked fi test -d submod && test -f submod/.git && git status -s -uno --ignore-submodules=none >actual && - test_cmp expect.modified_inside actual && + test_cmp expect.modified_untracked actual && git rm -f submod && test ! -d submod && git status -s -uno --ignore-submodules=none >actual && diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh index fd057751df..4d6d8f6817 100755 --- a/t/t7506-status-submodule.sh +++ b/t/t7506-status-submodule.sh @@ -346,7 +346,7 @@ test_expect_success 'status with untracked file in nested submodule (porcelain=2 git -C super status --porcelain=2 >output && awk -f suppress_hashes.awk output >output2 && diff output2 - <<-\EOF - 1 .M S.M. 16 16 16 HASH HASH sub1 + 1 .M S..U 16 16 16 HASH HASH sub1 1 .M S..U 16 16 16 HASH HASH sub2 EOF ' -- 2.12.1.438.g67623a8358
[PATCH v8 0/7] short status: improve reporting for submodule changes
v8: * This is a resend of the last two patches, i.e. these two patches apply at 5c896f7c3ec (origin/sb/submodule-short-status^^) * below is a diff of this patch series against origin/sb/submodule-short-status * add tests showing the subtle bug fix in case of nesting. * add a bit of documentation v7: previous work at https://public-inbox.org/git/20170325003610.15282-1-sbel...@google.com/ diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt index 01b457c322..452c6eb875 100644 --- a/Documentation/git-status.txt +++ b/Documentation/git-status.txt @@ -187,6 +187,8 @@ Submodules have more state and instead report mthe submodule has modified content ?the submodule has untracked files +Note that 'm' and '?' are applied recursively, e.g. if a nested submodule +in a submodule contains an untracked file, this is reported as '?' as well. If -b is used the short-format status is preceded by a line diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh index ab822c79e6..4d6d8f6817 100755 --- a/t/t7506-status-submodule.sh +++ b/t/t7506-status-submodule.sh @@ -327,20 +327,65 @@ test_expect_success 'setup superproject with untracked file in nested submodule' git add sub1 && git commit -a -m "update sub1 to contain nested sub" ) && - echo untracked >super/sub1/sub2/untracked + echo "{ \$7=\"HASH\"; \$8=\"HASH\"; print }" >suppress_hashes.awk && + echo "suppress_hashes.awk" >>.git/info/exclude && + echo "output2" >>.git/info/exclude && + echo content >super/sub1/sub2/file && + echo content >super/sub2/file ' test_expect_success 'status with untracked file in nested submodule (porcelain)' ' git -C super status --porcelain >output && diff output - <<-\EOF M sub1 +M sub2 + EOF +' + +test_expect_success 'status with untracked file in nested submodule (porcelain=2)' ' + git -C super status --porcelain=2 >output && + awk -f suppress_hashes.awk output >output2 && + diff output2 - <<-\EOF + 1 .M S..U 16 16 16 HASH HASH sub1 + 1 .M S..U 16 16 16 HASH HASH sub2 EOF ' test_expect_success 'status with untracked file in nested submodule (short)' ' git -C super status --short >output && diff output - <<-\EOF -? sub1 +m sub1 +? sub2 + EOF +' + +test_expect_success 'setup superproject with modified file in nested submodule' ' + git -C super/sub1/sub2 add file && + git -C super/sub2 add file +' + +test_expect_success 'status with added file in nested submodule (porcelain)' ' + git -C super status --porcelain >output && + diff output - <<-\EOF +M sub1 +M sub2 + EOF +' + +test_expect_success 'status with added file in nested submodule (porcelain=2)' ' + git -C super status --porcelain=2 >output && + awk -f suppress_hashes.awk output >output2 && + diff output2 - <<-\EOF + 1 .M S.M. 16 16 16 HASH HASH sub1 + 1 .M S.M. 16 16 16 HASH HASH sub2 + EOF +' + +test_expect_success 'status with added file in nested submodule (short)' ' + git -C super status --short >output && + diff output - <<-\EOF +m sub1 +m sub2 EOF '
Git fails to build on Ubuntu Server 16.04
I configured with --enable-pthreads, and LIBS included -lpthread. $ make V=1 gcc -I/usr/local/include -g -O2 -I. -DHAVE_ALLOCA_H -I/usr/local/include -DUSE_CURL_FOR_IMAP_SEND -I/usr/local/include -I/usr/local/include -DHAVE_PATHS_H -DHAVE_STRINGS_H -DHAVE_DEV_TTY -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM -DSHA1_HEADER='' -DNO_STRLCPY -DSHELL_PATH='"/bin/sh"' -DPAGER_ENV='"LESS=FRX LV=-c"' -o git-credential-store -Wl,-rpath,/usr/local/lib -L/usr/local/lib credential-store.o common-main.o libgit.a xdiff/lib.a -L/usr/local/lib -Wl,-rpath,/usr/local/lib -lz -L/usr/local/lib -Wl,-rpath,/usr/local/lib -lcrypto -lrt /usr/bin/ld: libgit.a(run-command.o): undefined reference to symbol 'pthread_sigmask@@GLIBC_2.2.5' //lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line collect2: error: ld returned 1 exit status Makefile:2053: recipe for target 'git-credential-store' failed make: *** [git-credential-store] Error 1 If the makefile is modified after 'make configure': find `pwd` -name Makefile -exec sed -i 's|-lrt|-lrt -lpthread|g' {} \; Then Git builds successfully. It appears the request to use pthreads is not being honored. Thanks.
Re: Proposal for "fetch-any-blob Git protocol" and server design
+cc Ben Peart, who sent "[RFC] Add support for downloading blobs on demand" to the list recently. This proposal here seems like it has the same goal, so maybe your review could go a long way here? Thanks, Stefan On Tue, Mar 14, 2017 at 3:57 PM, Jonathan Tan wrote: > As described in "Background" below, there have been at least 2 patch sets to > support "partial clones" and on-demand blob fetches, where the server part > that supports on-demand blob fetches was treated at least in outline. Here > is a proposal treating that server part in detail. > > == Background > > The desire for Git to support (i) missing blobs and (ii) fetching them as > needed from a remote repository has surfaced on the mailing list a few > times, most recently in the form of RFC patch sets [1] [2]. > > A local repository that supports (i) will be created by a "partial clone", > that is, a clone with some special parameters (exact parameters are still > being discussed) that does not download all blobs normally downloaded. Such > a repository should support (ii), which is what this proposal describes. > > == Design > > A new endpoint "server" is created. The client will send a message in the > following format: > > > fbp-request = PKT-LINE("fetch-blob-pack") > 1*want > flush-pkt > want = PKT-LINE("want" SP obj-id) > > > The client may send one or more SHA-1s for which it wants blobs, then a > flush-pkt. > > The server will then reply: > > > server-reply = flush-pkt | PKT-LINE("ERR" SP message) > > > If there was no error, the server will then send them in a packfile, > formatted like described in "Packfile Data" in pack-protocol.txt with > "side-band-64k" enabled. > > Any server that supports "partial clone" will also support this, and the > client will automatically assume this. (How a client discovers "partial > clone" is not covered by this proposal.) > > The server will perform reachability checks on requested blobs through the > equivalent of "git rev-list --use-bitmap-index" (like "git upload-pack" when > using the allowreachablesha1inwant option), unless configured to suppress > reachability checks through a config option. The server administrator is > highly recommended to regularly regenerate the bitmap (or suppress > reachability checks). > > === Endpoint support for forward compatibility > > This "server" endpoint requires that the first line be understood, but will > ignore any other lines starting with words that it does not understand. This > allows new "commands" to be added (distinguished by their first lines) and > existing commands to be "upgraded" with backwards compatibility. > > === Related improvements possible with new endpoint > > Previous protocol upgrade suggestions have had to face the difficulty of > allowing updated clients to discover the server support while not slowing > down (for example, through extra network round-trips) any client, whether > non-updated or updated. The introduction of "partial clone" allows clients > to rely on the guarantee that any server that supports "partial clone" > supports "fetch-blob-pack", and we can extend the guarantee to other > protocol upgrades that such repos would want. > > One such upgrade is "ref-in-want" [3]. The full details can be obtained from > that email thread, but to summarize, the patch set eliminates the need for > the initial ref advertisement and allows communication in ref name globs, > making it much easier for multiple load-balanced servers to serve large > repos to clients - this is something that would greatly benefit the Android > project, for example, and possibly many others. > > Bundling support for "ref-in-want" with "fetch-blob-pack" simplifies matters > for the client in that a client needs to only handle one "version" of server > (a server that supports both). If "ref-in-want" were added later, instead of > now, clients would need to be able to handle two "versions" (one with only > "fetch-blob-pack" and one with both "fetch-blob-pack" and "ref-in-want"). > > As for its implementation, that email thread already contains a patch set > that makes it work with the existing "upload-pack" endpoint; I can update > that patch set to use the proposed "server" endpoint (with a > "fetch-commit-pack" message) if need be. > > == Client behavior > > This proposal is concerned with server behavior only, but it is useful to > envision how the client would use this to ensure that the server behavior is > useful. > > === Indication to use the proposed endpoint > > The client will probably already record that at least one of its remotes > (the one that it successfully performed a "partial clone" from) supports > this new endpoint (if not, it can’t determine whether a missing blob was > caused by repo corruption or by the "partial clone"). This knowledge can be > used both to know that the server supports "fetch-blob-pack" and > "fetch-commit-pack" (for the latter, the client can fall back to > "fetch-pack"/
Re: [PATCH 1/2] short status: improve reporting for submodule changes
Hi, Stefan Beller wrote: [...] > +++ b/t/t7506-status-submodule.sh [...] > @@ -287,4 +311,82 @@ test_expect_success 'diff --submodule with merge > conflict in .gitmodules' ' > test_cmp diff_submodule_actual diff_submodule_expect > ' > > +test_expect_success 'setup superproject with untracked file in nested > submodule' ' > + ( > + cd super && > + git clean -dfx && > + rm .gitmodules && > + git submodule add -f ./sub1 && > + git submodule add -f ./sub2 && > + git commit -a -m "messy merge in superproject" && > + ( > + cd sub1 && > + git submodule add ../sub2 && > + git commit -a -m "add sub2 to sub1" > + ) && > + git add sub1 && > + git commit -a -m "update sub1 to contain nested sub" > + ) && > + echo "{ \$7=\"HASH\"; \$8=\"HASH\"; print }" >suppress_hashes.awk && > + echo "suppress_hashes.awk" >>.git/info/exclude && I had some trouble following what suppress_hashes.awk does at first. Some other examples that could be worth imitating: - diff-lib.sh has some sanitize_diff functions using 'sed' - t4202 has a sanitize_output functino, also using 'sed' - grepping for $_x40 finds some other examples (these will be fun to change when the hash function changes, but at least they're easy to find). The main general idea I have in mind is that providing a function at the top of the file for the test to use instead of a script that interacts with what git is tracking can make things easier to understand. Also, could there be a comment with a diagram describing the setup (sub1 vs sub2, etc)? [...] > +test_expect_success 'status with untracked file in nested submodule > (porcelain)' ' > + git -C super status --porcelain >output && > + diff output - <<-\EOF > + M sub1 > + M sub2 > + EOF > +' > + > +test_expect_success 'status with untracked file in nested submodule > (porcelain=2)' ' > + git -C super status --porcelain=2 >output && > + awk -f suppress_hashes.awk output >output2 && > + diff output2 - <<-\EOF > + 1 .M S.M. 16 16 16 HASH HASH sub1 > + 1 .M S..U 16 16 16 HASH HASH sub2 > + EOF > +' > + > +test_expect_success 'status with untracked file in nested submodule (short)' > ' > + git -C super status --short >output && > + diff output - <<-\EOF > + m sub1 > + ? sub2 > + EOF > +' > + > +test_expect_success 'setup superproject with modified file in nested > submodule' ' > + git -C super/sub1/sub2 add file && > + git -C super/sub2 add file > +' > + > +test_expect_success 'status with added file in nested submodule (porcelain)' > ' > + git -C super status --porcelain >output && > + diff output - <<-\EOF > + M sub1 > + M sub2 > + EOF > +' > + > +test_expect_success 'status with added file in nested submodule > (porcelain=2)' ' > + git -C super status --porcelain=2 >output && > + awk -f suppress_hashes.awk output >output2 && > + diff output2 - <<-\EOF > + 1 .M S.M. 16 16 16 HASH HASH sub1 > + 1 .M S.M. 16 16 16 HASH HASH sub2 > + EOF > +' > + > +test_expect_success 'status with added file in nested submodule (short)' ' > + git -C super status --short >output && > + diff output - <<-\EOF > + m sub1 > + m sub2 > + EOF How does ordinary non --short "git status" handle these cases? What should happen when there is a new untracked repository within a submodule? What should happen if there is both a modified tracked file and an untracked file in a sub-submodule? > +' > + > test_done Very nice. Thanks --- this was exactly what I was looking for. The rest looks good. Sincerely, Jonathan
Re: [PATCH 2/2] submodule.c: correctly handle nested submodules in is_submodule_modified
Stefan Beller wrote: > Suppose I have a superproject 'super', with two submodules 'super/sub' > and 'super/sub1'. 'super/sub' itself contains a submodule > 'super/sub/subsub'. Now suppose I run, from within 'super': > > echo hi >sub/subsub/stray-file > echo hi >sub1/stray-file > > Currently we get would see the following output in git-status: > > git status --short > m sub > ? sub1 > > With this patch applied, the untracked file in the nested submodule is > turned into an untracked file on the 'super' level as well. s/turned into/displayed as/ > git status --short > ? sub > ? sub1 > > This doesn't change the output of 'git status --porcelain=1' for nested > submodules, because its output is always ' M' for either untracked files > or local modifications no matter the nesting level of the submodule. > > 'git status --porcelain=2' is affected by this change in a nested > submodule, though. Without this patch it would report the direct submodule > as modified and having no untracked files. With this patch it would report > untracked files. Chalk this up as a bug fix. I think that's reasonable, and the documentation does a good job of describing it. Does this have any effect on the default mode of 'git status' (without --short or --porcelain)? [...] > --- a/Documentation/git-status.txt > +++ b/Documentation/git-status.txt > @@ -187,6 +187,8 @@ Submodules have more state and instead report > mthe submodule has modified content > ?the submodule has untracked files > > +Note that 'm' and '?' are applied recursively, e.g. if a nested submodule > +in a submodule contains an untracked file, this is reported as '?' as well. Language nits: * Can simplify by leaving out "Note that ". * s/, e\.g\./. For example,/ Should this say a brief word about rationale? The obvious way to describe it would involve "git add --recurse-submodules", which alas doesn't exist yet. But we could say this is reported as '?' as well, since the change cannot be added using "git add -u" from within any of the submodules. [...] > --- a/submodule.c > +++ b/submodule.c > @@ -1078,8 +1078,27 @@ unsigned is_submodule_modified(const char *path, int > ignore_untracked) > /* regular untracked files */ > if (buf.buf[0] == '?') > dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; > - else > - dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; > + > + if (buf.buf[0] == 'u' || > + buf.buf[0] == '1' || > + buf.buf[0] == '2') { > + /* > + * T XY : > + * T = line type, XY = status, = submodule state > + */ > + if (buf.len < 1 + 1 + 2 + 1 + 4) optional: Can shorten: /* T = line type, XY = status, = submodule state */ if (buf.len < strlen("T XY ")) ... That produces the same code at run time because compilers are able to inline the strlen of a constant. What you already have also seems sensible, though. [...] > + die("BUG: invalid status --porcelain=2 line %s", > + buf.buf); > + > + /* regular unmerged and renamed files */ > + if (buf.buf[5] == 'S' && buf.buf[8] == 'U') > + /* nested untracked file */ > + dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; > + > + if (memcmp(buf.buf + 5, "S..U", 4)) > + /* other change */ > + dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; sanity check: What does this do for a "2" line indicating a sub-submodule that has been renamed that contains an untracked file? Do we need to rely on some other indication to show this as a change? Enumerating some more cases, since I keep finding myself getting lost: - if the HEAD commit of "sub" changes, we show this as " M sub". What should we show if the HEAD commit of "sub/subsub" changes? I think this should be " m". - if "sub" is renamed, we show this as "R sub -> newname". What should we show if "sub/subsub" is renamed? It is tempting to show this as " m". - if "sub" is deleted, we show this as "D sub". What should we show if "sub/subsub" is deleted? I think this is " m". It keeps getting more complicated, but I think this is making sense. Thanks and hope that helps, Jonathan
Re: [PATCH v3 2/5] setup: allow for prefix to be passed to git commands
On Mon, Mar 20, 2017 at 3:34 PM, Brandon Williams wrote: > That the gist of how I'm hoping to solve the problem. Hopefully that was > clear enough to get some feedback on. Junio wrote in "What's cooking in git.git (Mar 2017, #10; Fri, 24)" > I saw no particular issues myself. Do others find this series good > (not just "meh--it is harmless" but I want to hear a positive "yes > these are good changes")? So I reviewed them again, and I think they are good to go. As a followup we may want to consider this diff --git a/setup.c b/setup.c index 56cd68ba93..bdc294091a 100644 --- a/setup.c +++ b/setup.c @@ -944,6 +944,10 @@ const char *setup_git_directory_gently(int *nongit_ok) prefix = setup_git_directory_gently_1(nongit_ok); env_prefix = getenv(GIT_TOPLEVEL_PREFIX_ENVIRONMENT); +if (prefix && env_prefix) +die("BUG: can't invoke command in sub directory with %s set", +GIT_TOPLEVEL_PREFIX_ENVIRONMENT); + if (env_prefix) prefix = env_prefix; -- Thanks, Stefan
Re: [PATCH v2 16/21] Make sha1_array_append take a struct object_id *
On Tue, Mar 28, 2017 at 10:27:41AM -0700, Junio C Hamano wrote: > "brian m. carlson" writes: > > > Convert the callers to pass struct object_id by changing the function > > declaration and definition and applying the following semantic patch: > > > > @@ > > expression E1, E2, E3; > > @@ > > - sha1_array_append(E1, E2[E3].hash) > > + sha1_array_append(E1, E2 + E3) > > > > @@ > > expression E1, E2; > > @@ > > - sha1_array_append(E1, E2.hash) > > + sha1_array_append(E1, &E2) > > I noticed something similar in the change to bisect.c while reading > the previous step, and I suspect that the above two rules leave > somewhat inconsistent and harder-to-read result. Wouldn't it make > the result more readable if the former rule were > > -sha1_array_append(E1, E2[E3].hash) > +sha1_array_append(E1, &E2[E3]) > > > FWIW, the bit that made me read it twice in the previous step was > this change > > - strbuf_addstr(&joined_hexs, sha1_to_hex(array->sha1[i])); > + strbuf_addstr(&joined_hexs, oid_to_hex(array->oid + i)); > > which I would have written &(array->oid[i]) instead. > > After all, the original written by a human said E2[E3].hash (or > array->sha1[i]) because to the human's mind, E2 is a series of > things that can be indexed with an int E3, and even though > > *(E2 + E3) > E2[E3] > E3[E2] > > all mean the same thing, the human decided that E2[E3] is the most > natural way to express this particular reference to an item in the > array. &E2[E3] would keep that intention by the original author > better than E2 + E3. I'm happy to make that change. I'm an experienced C programmer, so a pointer addition seems very readable and natural to me, but if you think it's better or more readable as &E2[E3], I can certainly reroll. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
[PATCH v4 4/5] remove_subtree(): test removing nested directories
Test removing a nested directory when an attempt is made to restore the index to a state where it does not exist. A similar test could be found previously in t/t2000-checkout-cache-clash.sh, but it did not check for nested directories, which could allow a faulty implementation of remove_subtree() pass the tests. Signed-off-by: Daniel Ferreira --- t/t2000-checkout-cache-clash.sh | 11 +++ 1 file changed, 11 insertions(+) diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh index de3edb5..ac10ba3 100755 --- a/t/t2000-checkout-cache-clash.sh +++ b/t/t2000-checkout-cache-clash.sh @@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with --prefix' ' git checkout-index -a -f --prefix=there/ ' +test_expect_success 'git checkout-index -f should remove nested subtrees' ' + echo content >path && + git update-index --add path && + rm path && + mkdir -p path/with/nested/paths && + echo content >path/file1 && + echo content >path/with/nested/paths/file2 && + git checkout-index -f -a && + test ! -d path +' + test_done -- 2.7.4 (Apple Git-66)
[PATCH v4 0/5] [GSoC] remove_subtree(): reimplement using iterators
Hi there, This is the fourth version of the GSoC microproject of refactoring remove_subtree() from recursively using readdir() to use dir_iterator. Below are the threads for other versions: v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsah16a...@mail.gmail.com/T/#t v2: https://public-inbox.org/git/cacsjy8dxh-qpbblfafwpawusba9gvxa7x+mxljevykhk1zo...@mail.gmail.com/T/#t v3: https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3jbfjkhwcdv_lphkcd71zrwm...@mail.gmail.com/T/#t In this version of the patch, I followed Michael's suggestion of splitting the commits responsible for adding the new feature to dir_iterator. His suggestion of using flags instead of an options struct has also been adopted. This version also contains a test that is finally able to test the function decently (not the one Stefan had suggested, which did not result in a call to remove_subtree). I am just unsure about its location in t/. I'd appreciate suggestions to put it in a more decent home. Daniel Ferreira (5): dir_iterator: add helpers to dir_iterator_advance dir_iterator: iterate over dir after its contents remove_subtree(): reimplement using iterators remove_subtree(): test removing nested directories files_reflog_iterator: amend use of dir_iterator dir-iterator.c | 105 +++- dir-iterator.h | 14 -- entry.c | 31 refs/files-backend.c| 2 +- t/t2000-checkout-cache-clash.sh | 11 + 5 files changed, 114 insertions(+), 49 deletions(-) -- 2.7.4 (Apple Git-66)
[PATCH v4 2/5] dir_iterator: iterate over dir after its contents
Create an option for the dir_iterator API to iterate over subdirectories only after having iterated through their contents. This feature was predicted, although not implemented by 0fe5043 ("dir_iterator: new API for iterating over a directory tree", 2016-06-18). Add the "flags" parameter to dir_iterator_create, allowing for the aforementioned "depth-first" iteration mode to be enabled. Currently, the only acceptable flag is DIR_ITERATOR_DEPTH_FIRST. This is useful for recursively removing a directory and calling rmdir() on a directory only after all of its contents have been wiped. Signed-off-by: Daniel Ferreira --- dir-iterator.c | 46 ++ dir-iterator.h | 14 +++--- 2 files changed, 53 insertions(+), 7 deletions(-) diff --git a/dir-iterator.c b/dir-iterator.c index 853c040..545d333 100644 --- a/dir-iterator.c +++ b/dir-iterator.c @@ -48,6 +48,9 @@ struct dir_iterator_int { * that will be included in this iteration. */ struct dir_iterator_level *levels; + + /* Holds the flags passed to dir_iterator_begin(). */ + unsigned flags; }; static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level) @@ -114,12 +117,14 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) } level->initialized = 1; - } else if (S_ISDIR(iter->base.st.st_mode)) { + } else if (S_ISDIR(iter->base.st.st_mode) && + !iter->flags & DIR_ITERATOR_DEPTH_FIRST) { if (level->dir_state == DIR_STATE_ITER) { /* * The directory was just iterated * over; now prepare to iterate into -* it. +* it (unless an option is set for us +* to do otherwise). */ push_dir_level(iter, level); continue; @@ -153,10 +158,27 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) de = readdir(level->dir); if (!de) { - /* This level is exhausted; pop up a level. */ + /* This level is exhausted */ if (errno) { warning("error reading directory %s: %s", iter->base.path.buf, strerror(errno)); + } else if (iter->flags & DIR_ITERATOR_DEPTH_FIRST) { + /* If we are handling dirpaths after their contents, +* we have to iterate over the directory now that we'll +* have finished iterating into it. */ + level->dir = NULL; + + if (pop_dir_level(iter, level) == 0) + return dir_iterator_abort(dir_iterator); + + level = &iter->levels[iter->levels_nr - 1]; + /* Remove a trailing slash */ + strbuf_strip_suffix(&iter->base.path, "/"); + + if (set_iterator_data(iter, level)) + continue; + + return ITER_OK; } else if (closedir(level->dir)) warning("error closing directory %s: %s", iter->base.path.buf, strerror(errno)); @@ -175,8 +197,22 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) if (set_iterator_data(iter, level)) continue; + /* +* If we want to iterate dirs after files, we shall +* begin looking into them *before* we return the dir +* itself. +*/ + if (S_ISDIR(iter->base.st.st_mode) && + iter->flags & DIR_ITERATOR_DEPTH_FIRST) { + push_dir_level(iter, level); + goto continue_outer_loop; + } + return ITER_OK; } + +continue_outer_loop: + ; } } @@ -201,7 +237,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator) return ITER_DONE; } -struct dir_iterator *dir_iterator_begin(const char *path) +struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags) { struct dir_iterator_int
[PATCH v4 5/5] files_reflog_iterator: amend use of dir_iterator
Amend a call to dir_iterator_begin() to pass the flags parameter introduced in 3efb5c0 ("dir_iterator: iterate over dir after its contents", 2017-28-03). Signed-off-by: Daniel Ferreira --- refs/files-backend.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index 50188e9..b4bba74 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3346,7 +3346,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st files_downcast(ref_store, 0, "reflog_iterator_begin"); base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable); - iter->dir_iterator = dir_iterator_begin(git_path("logs")); + iter->dir_iterator = dir_iterator_begin(git_path("logs"), 0); return ref_iterator; } -- 2.7.4 (Apple Git-66)
[PATCH v4 1/5] dir_iterator: add helpers to dir_iterator_advance
Create inline helpers to dir_iterator_advance(). Make dir_iterator_advance()'s code more legible and allow some behavior to be reusable. Signed-off-by: Daniel Ferreira --- dir-iterator.c | 65 +- 1 file changed, 42 insertions(+), 23 deletions(-) diff --git a/dir-iterator.c b/dir-iterator.c index 34182a9..853c040 100644 --- a/dir-iterator.c +++ b/dir-iterator.c @@ -50,6 +50,43 @@ struct dir_iterator_int { struct dir_iterator_level *levels; }; +static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level) +{ + level->dir_state = DIR_STATE_RECURSE; + ALLOC_GROW(iter->levels, iter->levels_nr + 1, + iter->levels_alloc); + level = &iter->levels[iter->levels_nr++]; + level->initialized = 0; +} + +static inline int pop_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level) +{ + return --iter->levels_nr; +} + +static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level) +{ + if (lstat(iter->base.path.buf, &iter->base.st) < 0) { + if (errno != ENOENT) + warning("error reading path '%s': %s", + iter->base.path.buf, + strerror(errno)); + return -1; + } + + /* +* We have to set these each time because +* the path strbuf might have been realloc()ed. +*/ + iter->base.relative_path = + iter->base.path.buf + iter->levels[0].prefix_len; + iter->base.basename = + iter->base.path.buf + level->prefix_len; + level->dir_state = DIR_STATE_ITER; + + return 0; +} + int dir_iterator_advance(struct dir_iterator *dir_iterator) { struct dir_iterator_int *iter = @@ -84,11 +121,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) * over; now prepare to iterate into * it. */ - level->dir_state = DIR_STATE_RECURSE; - ALLOC_GROW(iter->levels, iter->levels_nr + 1, - iter->levels_alloc); - level = &iter->levels[iter->levels_nr++]; - level->initialized = 0; + push_dir_level(iter, level); continue; } else { /* @@ -104,7 +137,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) * This level is exhausted (or wasn't opened * successfully); pop up a level. */ - if (--iter->levels_nr == 0) + if (pop_dir_level(iter, level) == 0) return dir_iterator_abort(dir_iterator); continue; @@ -129,7 +162,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) iter->base.path.buf, strerror(errno)); level->dir = NULL; - if (--iter->levels_nr == 0) + if (pop_dir_level(iter, level) == 0) return dir_iterator_abort(dir_iterator); break; } @@ -138,23 +171,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) continue; strbuf_addstr(&iter->base.path, de->d_name); - if (lstat(iter->base.path.buf, &iter->base.st) < 0) { - if (errno != ENOENT) - warning("error reading path '%s': %s", - iter->base.path.buf, - strerror(errno)); - continue; - } - /* -* We have to set these each time because -* the path strbuf might have been realloc()ed. -*/ - iter->base.relative_path = - iter->base.path.buf + iter->levels[0].prefix_len; - iter->base.basename = - iter->base.path.buf + level->prefix_len; - level->dir_state = DIR_STATE_ITER; + if (set_iterator_data(iter, level)) + continue; return ITER_OK; } -- 2.7.4 (Apple Git-66)
[PATCH v4 3/5] remove_subtree(): reimplement using iterators
Use dir_iterator to traverse through remove_subtree()'s directory tree, avoiding the need for recursive calls to readdir(). Simplify remove_subtree()'s code. A conversion similar in purpose was previously done at 46d092a ("for_each_reflog(): reimplement using iterators", 2016-05-21). Signed-off-by: Daniel Ferreira --- entry.c | 31 ++- 1 file changed, 10 insertions(+), 21 deletions(-) diff --git a/entry.c b/entry.c index c6eea24..bbebd16 100644 --- a/entry.c +++ b/entry.c @@ -2,6 +2,8 @@ #include "blob.h" #include "dir.h" #include "streaming.h" +#include "iterator.h" +#include "dir-iterator.h" static void create_directories(const char *path, int path_len, const struct checkout *state) @@ -46,29 +48,16 @@ static void create_directories(const char *path, int path_len, static void remove_subtree(struct strbuf *path) { - DIR *dir = opendir(path->buf); - struct dirent *de; - int origlen = path->len; - - if (!dir) - die_errno("cannot opendir '%s'", path->buf); - while ((de = readdir(dir)) != NULL) { - struct stat st; - - if (is_dot_or_dotdot(de->d_name)) - continue; - - strbuf_addch(path, '/'); - strbuf_addstr(path, de->d_name); - if (lstat(path->buf, &st)) - die_errno("cannot lstat '%s'", path->buf); - if (S_ISDIR(st.st_mode)) - remove_subtree(path); - else if (unlink(path->buf)) + struct dir_iterator *diter = dir_iterator_begin(path->buf, DIR_ITERATOR_DEPTH_FIRST); + + while (dir_iterator_advance(diter) == ITER_OK) { + if (S_ISDIR(diter->st.st_mode)) { + if (rmdir(diter->path.buf)) + die_errno("cannot rmdir '%s'", path->buf); + } else if (unlink(diter->path.buf)) die_errno("cannot unlink '%s'", path->buf); - strbuf_setlen(path, origlen); } - closedir(dir); + if (rmdir(path->buf)) die_errno("cannot rmdir '%s'", path->buf); } -- 2.7.4 (Apple Git-66)
Can't locate ExtUtils/MakeMaker.pm in @INC
This looks like the last issue with Git 2.12.2. This time the machine is Fedora 25. I configured with PERL_PATH=/usr/local/bin/perl. The local Perl was built specifically for this error, and it includes ExtUtils/MakeMaker.pm: $ find /usr/local -name MakeMaker.pm /usr/local/lib/perl5/5.24.1/ExtUtils/MakeMaker.pm $ make all ... GEN git-bisect GEN git-difftool--helper GEN git-filter-branch GEN git-merge-octopus GEN git-merge-one-file GEN git-merge-resolve GEN git-mergetool GEN git-quiltimport GEN git-rebase GEN git-request-pull GEN git-stash GEN git-submodule GEN git-web--browse SUBDIR perl /usr/bin/perl Makefile.PL PREFIX='/usr/local' INSTALL_BASE='' --localedir='/usr/local/share/locale' GEN git-p4 Can't locate ExtUtils/MakeMaker.pm in @INC (you may need to install the ExtUtils::MakeMaker module) (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at Makefile.PL line 3. BEGIN failed--compilation aborted at Makefile.PL line 3. Makefile:83: recipe for target 'perl.mak' failed make[1]: *** [perl.mak] Error 2 Makefile:1843: recipe for target 'perl/perl.mak' failed make: *** [perl/perl.mak] Error 2 make: *** Waiting for unfinished jobs Failed to build Git /usr/local/bin/perl is on path but Git is using the old one in /usr/bin: $ which perl /usr/local/bin/perl It appears Git is not honoring the request for the updated Perl. Thanks,
Re: Re: Microproject | Add more builtin patterns for userdiff
Stefan Beller wrote: > On Tue, Mar 28, 2017 at 3:46 AM, Pickfire wrote: > > > EOF > > > > echo '* diff="cpp"' > .gitmodules > > Did you mean .gitattributes? Yeah, that's a spelling error.
Re: Re: Re: Microproject | Add more builtin patterns for userdiff
Jacob Keller wrote: > On Tue, Mar 28, 2017 at 10:53 AM, Pickfire wrote: > > > > Yes, I can't reproduce it outside the test suite. I have added the builtin > > and yet the test fails. test_decode_color gets same output as expect but > > still it fails, should I send in the patch? > > You also need to ensure you have the exact same color settings as used > by the test scripts. > > Thanks, > Jake Yes, I used the same color, looks like the block that are failing is: test_must_fail git diff --no-index "$@" pre post >output
Re: Can't locate ExtUtils/MakeMaker.pm in @INC
On Tue, Mar 28, 2017 at 09:03:43PM -0400, Jeffrey Walton wrote: > This looks like the last issue with Git 2.12.2. This time the machine > is Fedora 25. > > I configured with PERL_PATH=/usr/local/bin/perl. The local Perl was > built specifically for this error, and it includes > ExtUtils/MakeMaker.pm: I'm not sure what "configured with PERL_PATH" means exactly. If you did: PERL_PATH=/usr/local/bin/perl ./configure then I don't think that works. The way to tell configure that you want to use a specific version of perl is with a command-line option: ./configure --with-perl=/usr/local/bin/perl When you're running make itself, you can override the default (or what was specified during configure) with: make PERL_PATH=/usr/local/bin/perl Both of the latter two work for me: $ ./configure --with-perl=/perl/from/configure [...] $ make [...] /perl/from/configure Makefile.PL PREFIX='/home/peff/local/git/master' INSTALL_BASE='' --localedir='/home/peff/local/git/master/share/locale' make[1]: /perl/from/configure: Command not found $ make PERL_PATH=/perl/from/make [...] /perl/from/make Makefile.PL PREFIX='/home/peff/local/git/master' INSTALL_BASE='' --localedir='/home/peff/local/git/master/share/locale' make[1]: /perl/from/make: Command not found Obviously those are nonsense, but they quickly show that we're using the requested version of perl. -Peff
Re: Git fails to build on Ubuntu Server 16.04
On Tue, Mar 28, 2017 at 07:17:16PM -0400, Jeffrey Walton wrote: > I configured with --enable-pthreads, and LIBS included -lpthread. > > $ make V=1 > gcc -I/usr/local/include -g -O2 -I. -DHAVE_ALLOCA_H > -I/usr/local/include -DUSE_CURL_FOR_IMAP_SEND -I/usr/local/include > -I/usr/local/include -DHAVE_PATHS_H -DHAVE_STRINGS_H -DHAVE_DEV_TTY > -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM > -DSHA1_HEADER='' -DNO_STRLCPY -DSHELL_PATH='"/bin/sh"' > -DPAGER_ENV='"LESS=FRX LV=-c"' -o git-credential-store > -Wl,-rpath,/usr/local/lib -L/usr/local/lib credential-store.o > common-main.o libgit.a xdiff/lib.a -L/usr/local/lib > -Wl,-rpath,/usr/local/lib -lz -L/usr/local/lib > -Wl,-rpath,/usr/local/lib -lcrypto -lrt > /usr/bin/ld: libgit.a(run-command.o): undefined reference to symbol > 'pthread_sigmask@@GLIBC_2.2.5' > //lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO > missing from command line > collect2: error: ld returned 1 exit status > Makefile:2053: recipe for target 'git-credential-store' failed > make: *** [git-credential-store] Error 1 Hmm. I can reproduce with: LIBS=-lpthread ./configure --enable-pthreads make I think the problem is that $LIBS is meaningful to autoconf, but not to Git's Makefile. So it tricks autoconf into writing a blank PTHREAD_LIBS variable (because it can compile a pthread program without any extra options), but the Makefile does not include $LIBS. Just doing: ./configure --enable-pthreads make works fine. So should: ./configure make which should detect pthreads. Or just: make as building with pthreads is the default on Linux. So depending on your perspective, it's either: - not a bug (because we do not advertise $LIBS as a meaningful input to the build process) - a bug that the configure script respects $LIBS at all, since it is not meaningful to the Makefile - a bug that the configure script does not propagate $LIBS into something the Makefile _does_ understand, like $EXTLIBS - a bug that the Makefile does not care about $LIBS Patches welcome for any of the latter three (I do not have an opinion myself; I don't use autoconf at all). -Peff
Re: [PATCH 0/18] snprintf cleanups
On Tue, Mar 28, 2017 at 03:33:48PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > It's a lot of patches, but hopefully they're all pretty straightforward > > to read. > > Yes, quite a lot of changes. I didn't see anything questionable in > there. > > As to the "patch-id" thing, I find the alternate one slightly easier > to read. Also, exactly because this is not a performance critical > codepath, it may be better if patch_id_add_string() filtered out > whitespaces; that would allow the source to express things in more > natural way, e.g. > > patch_id_addf(&ctx, "new file mode"); > patch_id_addf(&ctx, "%06o", p->two->mode); > patch_id_addf(&ctx, "--- /dev/null"); > patch_id_addf(&ctx, "+++ b/%.*s", len2, p->two->path); > > Or I may be going overboard by bringing "addf" into the mix X-<. I think there are two things going on in your example. One is that obviously patch_id_addf() removes the spaces from the result. But we could do that now by keeping the big strbuf_addf(), and then just walking the result and feeding non-spaces. The second is that your addf means we are back to formatting everything into a buffer again. And it has to be dynamic to handle the final line there, because "len2" isn't bounded. At which point we may as well go back to sticking it all in one big strbuf (your example also breaks it down line by line, but we could do that with separte strbuf_addf calls, too). Or you have to reimplement the printf format-parsing yourself, and write into the sha1 instead of into the buffers. But that's probably insane. I think the "no extra buffer with whitespace" combo is more like: void patch_id_add_buf(git_SHA1_CTX *ctx, const char *buf, size_t len) { for (; len > 0; buf++, len--) { if (!isspace(*buf)) git_SHA1_Update(ctx, buf, 1); } } void patch_id_add_str(git_SHA1_CTX *ctx, const char *str) { patch_id_add_buf(ctx, strlen(str)); } void patch_id_add_mode(git_SHA1_CTX *ctx, unsigned mode) { char buf[16]; /* big enough... */ int len = xsnprintf(buf, "%06o", mode); patch_id_add_buf(ctx, buf, len); } patch_id_add_str(&ctx, "new file mode"); patch_id_add_mode(&ctx, p->two->mode); patch_id_add_str(&ctx, "--- /dev/null"); patch_id_add_str(&ctx, "+++ b/"); patch_id_add_buf(&ctx, p->two->path, len2); I dunno. I wondered if feeding single bytes to the sha1 update might actually be noticeably slower, because I would assume that internally it generally copies data in larger chunks. I didn't measure it, though. -Peff
Re: [PATCH v4 1/5] dir_iterator: add helpers to dir_iterator_advance
On 03/29/2017 02:32 AM, Daniel Ferreira wrote: > Create inline helpers to dir_iterator_advance(). Make > dir_iterator_advance()'s code more legible and allow some behavior to > be reusable. Thanks for breaking up the patches. That makes them a lot easier to review. > Signed-off-by: Daniel Ferreira > --- > dir-iterator.c | 65 > +- > 1 file changed, 42 insertions(+), 23 deletions(-) > > diff --git a/dir-iterator.c b/dir-iterator.c > index 34182a9..853c040 100644 > --- a/dir-iterator.c > +++ b/dir-iterator.c > @@ -50,6 +50,43 @@ struct dir_iterator_int { > struct dir_iterator_level *levels; > }; > > +static inline void push_dir_level(struct dir_iterator_int *iter, struct > dir_iterator_level *level) > +{ > + level->dir_state = DIR_STATE_RECURSE; > + ALLOC_GROW(iter->levels, iter->levels_nr + 1, > +iter->levels_alloc); > + level = &iter->levels[iter->levels_nr++]; > + level->initialized = 0; > +} > + > +static inline int pop_dir_level(struct dir_iterator_int *iter, struct > dir_iterator_level *level) > +{ > + return --iter->levels_nr; > +} `pop_dir_level()` doesn't use its `level` argument; it can be removed. > [...] Michael
Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD
On Tue, Mar 28, 2017 at 11:49 PM, Jeff King wrote: > On Tue, Mar 28, 2017 at 11:15:12PM +0200, Christian Couder wrote: > >> On Sun, Mar 26, 2017 at 3:43 PM, René Scharfe wrote: >> > FreeBSD implements getcwd(3) as a syscall, but falls back to a version >> > based on readdir(3) if it fails for some reason. The latter requires >> > permissions to read and execute path components, while the former does >> > not. That means that if our buffer is too small and we're missing >> > rights we could get EACCES, but we may succeed with a bigger buffer. >> > >> > Keep retrying if getcwd(3) indicates lack of permissions until our >> > buffer can fit PATH_MAX bytes, as that's the maximum supported by the >> > syscall on FreeBSD anyway. This way we do what we can to be able to >> > benefit from the syscall, but we also won't loop forever if there is a >> > real permission issue. >> >> Sorry to be late and maybe I missed something obvious, but the above >> and the patch seem complex to me compared with something like: >> >> diff --git a/strbuf.c b/strbuf.c >> index ace58e7367..25eadcbedc 100644 >> --- a/strbuf.c >> +++ b/strbuf.c >> @@ -441,7 +441,7 @@ int strbuf_readlink(struct strbuf *sb, const char >> *path, size_t hint) >> int strbuf_getcwd(struct strbuf *sb) >> { >> size_t oldalloc = sb->alloc; >> - size_t guessed_len = 128; >> + size_t guessed_len = PATH_MAX > 128 ? PATH_MAX : 128; >> >> for (;; guessed_len *= 2) { >> strbuf_grow(sb, guessed_len); > > I think the main reason is just that we do not have to pay the price to > allocate PATH_MAX-sized buffers when they are rarely used. > > I doubt it matters all that much in practice, though. Yeah, I can understand that. But I also wonder if René's patch relies on PATH_MAX being a multiple of 128 on FreeBSD and what would happen for a path between 129 and PATH_MAX if PATH_MAX was something like 254.