is it "git gc --prune=now" or "git gc --prune=all"?
more pedantry, but digging through "git gc", the man page reads:

    --prune=<date>
        Prune loose objects older than date (default is 2 weeks ago,
        overridable by the config variable gc.pruneExpire).
        --prune=all prunes loose objects regardless of their age
        ^^^

but the code in gc.c contains a check for "now" (which actually makes
more sense semantically):

    static void add_repack_all_option(struct string_list *keep_pack)
    {
            if (prune_expire && !strcmp(prune_expire, "now"))
                    argv_array_push(&repack, "-a");
            else {
            ... snip ...

while the man page does not seem to mention the possible value of
"now". am i misreading something? should the man page mention the
possible value of "now" as opposed to "all"?

rday

-- 
Robert P. J. Day                                 Ottawa, Ontario, CANADA
http://crashcourse.ca/dokuwiki
Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
[PATCH] docs/git-gc: fix typo "--prune=all" to "--prune=now"
Signed-off-by: Robert P. J. Day
---
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index a7442499f6..a7c1b0f60e 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -76,7 +76,7 @@ be performed as well.
 --prune=<date>::
 	Prune loose objects older than date (default is 2 weeks ago,
 	overridable by the config variable `gc.pruneExpire`).
-	--prune=all prunes loose objects regardless of their age and
+	--prune=now prunes loose objects regardless of their age and
 	increases the risk of corruption if another process is writing to
 	the repository concurrently; see "NOTES" below. --prune is on by
 	default.
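For readers following along, the semantics under discussion are easy to observe in a throwaway repository. The sketch below is a hedged illustration (it assumes a reasonably recent `git` on PATH; nothing in it touches an existing repository):

```shell
# Create a scratch repository and write one loose, unreachable blob.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git hash-object -w --stdin </dev/null

# A plain "git gc" keeps this young loose object for the default
# 2-week grace period; --prune=now drops unreachable loose objects
# regardless of their age.
git gc -q --prune=now
git count-objects
```

With the default `--prune`, the freshly written blob would survive the collection; with `--prune=now` the loose-object count drops to zero.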
Re: [PATCH] builtin/config.c: don't print a newline with --color
On Fri, Mar 1, 2019 at 7:41 PM Taylor Blau wrote:
> [...]
> In this case, printing a terminating newline is not desirable. Callers
> may want to print out such a configuration variable in a sub-shell in
> order to color something else, in which case they certainly don't want a
> newline.

It's extra confusing considering that this:

    color=$(git config --get --type=color foo.color)
    echo "${color}hello" >output

works as expected since $(...) swallows the newline, whereas, the case
you cite:

    (
        git config --get --type=color foo.color &&
        echo hello
    ) >output

does not. Illustrating the problem in the commit message by using
example code like the above might (or might not) help the reader
understand the issue more easily. (I had to read the prose description
of the problem a couple times to properly understand it.) Not
necessarily worth a re-roll.

> To do what callers expect, only print a newline when the type is not
> 'color', and print the escape sequence itself for an exact comparison.
>
> Signed-off-by: Taylor Blau
> ---
> diff --git a/builtin/config.c b/builtin/config.c
> @@ -258,7 +258,8 @@ static int format_config(struct strbuf *buf, const char *key_, const char *value
> -	strbuf_addch(buf, term);
> +	if (type != TYPE_COLOR)
> +		strbuf_addch(buf, term);

The reasoning for this conditional is sufficiently subtle that it might
deserve an in-code comment (though it is perhaps not worth a re-roll).
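The newline-swallowing asymmetry described above can be reproduced without git at all. The sketch below uses a stand-in function, `emit_color` (a hypothetical name, standing in for the pre-fix `git config --get --type=color`), that prints an ANSI escape sequence followed by a newline:

```shell
# Stand-in for the pre-fix behavior: escape sequence plus trailing newline.
emit_color() { printf '\033[31m\n'; }

# Case 1: command substitution strips the trailing newline, so the color
# and the text land on one line as intended.
color=$(emit_color)
printf '%shello\n' "$color" >output1

# Case 2: composing in a compound command keeps the newline, pushing
# "hello" onto the next line, which is the problem being fixed.
{ emit_color; echo hello; } >output2
```

`output1` contains one colored line; `output2` contains two lines, with the escape sequence stranded on the first.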
Re: Feeling confused a little bit
On 03/01, Rohit Ashiwal wrote: > Hey! > > I'm a little confused as you never provide a clear indication to > where shall I proceed? :- > > On Fri, 01 Mar 2019 11:51:46 +0900 Junio C Hamano wrote: > > I think you are talking about t3600, which uses an ancient style. > > If this were a real project, then the preferred order would be > > > > - A preliminary patch (or a series of patches) that modernizes > >existing tests in t3600. Just style updates and adding or > >removing nothing else. > > > > - Update test that use "test -f" and friends to use the helpers in > >t3600. > > > > Yes, this is a microproject after all. But I think I can work on this as > if it were a real project, should I proceed according to this plan? (I have > a lot of free time over this weekend) Yes, I think it would be good to make those changes, to try and get this merged. Having the microproject merged is not a requirement (its main purpose is to see how students communicate on the mailing list, and to get them familiar with the workflow ahead of GSoC), but it can be a nice achievement in itself. That said, I would also encourage you to start thinking about a project proposal, as that is another important part that should be done for the application. > > > > If we often see if a path is an non-empty file in our tests (not > > limited to t3600), then it may make sense to add a new helper > > test_path_is_non_empty_file in t/test-lib-functions.sh next to where > > test_path_is_file and friends are defined. > > > > Since my project does not deal with `test-lib-functions.sh`, I think I > should not edit it anyway, but I'd be more than happy to add a new > member to `test_path_is_*` family. It is up to you how far you would like to go with this. If you want to add the helper, to make further cleanups in t3600, I think that would be a good thing to do (after double checking that it would be useful in other test files as well), and should be done in a separate patch. 
Then you can use it in the same patch as using the helpers for "test -f" etc. > Thanks > Rohit
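To make the suggestion concrete, here is a rough sketch of what such a helper could look like, written in the style of the existing `test_path_is_file` family. The name and message wording are assumptions for illustration, not the implementation that was eventually merged:

```shell
# Hypothetical helper: succeed only if the path exists and is non-empty,
# mirroring "test -s" but with a diagnostic on failure.
test_path_is_non_empty_file () {
	if ! test -s "$1"
	then
		echo "File $1 doesn't exist or is empty.${2:+ $2}"
		false
	fi
}

# Example usage, as a test script might do:
echo content >tracked.t
test_path_is_non_empty_file tracked.t
```

A test would then write `test_path_is_non_empty_file blob` instead of the bare `test -s blob`, gaining a useful message when the assertion fails.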
Re: Questions on GSoC 2019 Ideas
On 03/01, Duy Nguyen wrote: > On Fri, Mar 1, 2019 at 5:20 AM Christian Couder > wrote: > > > > Hi Matheus, > > > > On Thu, Feb 28, 2019 at 10:46 PM Matheus Tavares Bernardino > > wrote: > > > > > > I've been in the mailing list for a couple weeks now, mainly working > > > on my gsoc micro-project[1] and in other patches that derived from it. > > > I also have been contributing to the Linux Kernel for half an year, > > > but am now mainly just supporting other students here at USP. > > > > > > I have read the ideas page for the GSoC 2019 and many of them interest > > > me. Also, looking around git-dev materials on the web, I got to the > > > GSoC 2012 ideas page. And this one got my attention: > > > https://github.com/peff/git/wiki/SoC-2012-Ideas#improving-parallelism-in-various-commands > > > > > > I'm interested in parallel computing and that has been my research > > > topic for about an year now. So I would like to ask what's the status > > > of this GSoC idea. I've read git-grep and saw that it is already > > > parallel, but I was wondering if there is any other section in git in > > > which it was already considered to bring parallelism, seeking to > > > achieve greater performance. And also, if this could, perhaps, be a > > > GSoC project. > > > > I vaguely remember that we thought at one point that all the low > > hanging fruits had already been taken in this area but I might be > > wrong. > > We still have to remove some global variables, which is quite easy to > do, before one could actually add mutexes and stuff to allow multiple > pack access. I don't know though if the removing global variables is > that exciting for GSoC, or if both tasks could fit in one GSoC. The > adding parallel access is not that hard, I think, once you know > packfile.c and sha1-file.c relatively well. It's mostly dealing with > caches and all the sliding access windows safely. 
I'm not very familiar with what's required here, but reading the above makes me think it's likely too much for a GSoC project. I think I'd be happy with a project that declares removing the global variables as the main goal, and adding parallelism as a potential bonus. I'm a bit wary of a too large proposal here, as we've historically overestimated what kind of project is achievable over a summer (I've been there myself, as my GSoC project was also more than I was able to do in a summer :)). I'd rather have a project whose goal is rather small and can be expanded later, than having something that could potentially take more than 3 months, where the student (or their mentors) have to finish it after GSoC. > -- > Duy
[GSoC] Thanking
Hey Thomas!

Thank you for replying to my woes.

> It is up to you how far you would like to go with this. If you want
> to add the helper, to make further cleanups in t3600, I think that
> would be a good thing to do (after double checking that it would be
> useful in other test files as well), and should be done in a separate
> patch. Then you can use it in the same patch as using the helpers for
> "test -f" etc.

I guess I should work on this one first. I checked, and around 18 test
files use `test -s`, so it will be useful nonetheless.

> Yes, I think it would be good to make those changes, to try and get
> this merged. Having the microproject merged is not a requirement (its
> main purpose is to see how students communicate on the mailing list,
> and to get them familiar with the workflow ahead of GSoC), but it can
> be a nice achievement in itself.

Yes, it is a nice experience to interact with the people who "run" git,
on which most of the open source community depends for code sharing and
collaboration.

> That said, I would also encourage you to start thinking about a
> project proposal, as that is another important part that should be
> done for the application.

That is really encouraging. I'll try to finish my work as soon as
possible and work on the proposal side by side!

Thanks
Rohit
Re: [PATCH 1/1] Makefile: use `git ls-files` to list header files, if possible
Hi Peff,

On Fri, 1 Mar 2019, Jeff King wrote:

> On Fri, Mar 01, 2019 at 04:54:15PM -0500, Jeff King wrote:
>
> > The one thing we do lose, though, is make's parallelization. It would
> > probably be possible to actually shove this into a sub-make which
> > defined the hdr-check rules, but I don't know how complicated that
> > would become.
>
> This seems to work, though it's kind of horrid.
>
> It costs at least one extra process to run "make hdr-check", and
> probably more for things like $(GIT_VERSION) that the Makefile include
> likely triggers. But when you're not running hdr-check (which is the
> norm), it's zero-cost.

If we want to go that route (and I am not saying we should), we could
easily just add another target (say, `check-headers`) that requires a
list of headers to check to be passed in via a Makefile variable that is
defined via the command-line.

Ciao,
Dscho

> -Peff
>
> diff --git a/Makefile b/Makefile
> index c5240942f2..ab6ecf450e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2735,16 +2735,8 @@ $(SP_OBJ): %.sp: %.c GIT-CFLAGS FORCE
>  .PHONY: sparse $(SP_OBJ)
>  sparse: $(SP_OBJ)
>  
> -GEN_HDRS := command-list.h unicode-width.h
> -EXCEPT_HDRS := $(GEN_HDRS) compat% xdiff%
> -CHK_HDRS = $(filter-out $(EXCEPT_HDRS),$(patsubst ./%,%,$(LIB_H)))
> -HCO = $(patsubst %.h,%.hco,$(CHK_HDRS))
> -
> -$(HCO): %.hco: %.h FORCE
> -	$(QUIET_HDR)$(CC) -include git-compat-util.h -I. -o /dev/null -c -xc $<
> -
> -.PHONY: hdr-check $(HCO)
> -hdr-check: $(HCO)
> +hdr-check:
> +	$(MAKE) -f hdr-check.mak hdr-check
>  
>  .PHONY: style
>  style:
> diff --git a/hdr-check.mak b/hdr-check.mak
> new file mode 100644
> index 00..b8924afa90
> --- /dev/null
> +++ b/hdr-check.mak
> @@ -0,0 +1,12 @@
> +include Makefile
> +
> +GEN_HDRS := command-list.h unicode-width.h
> +EXCEPT_HDRS := $(GEN_HDRS) compat% xdiff%
> +CHK_HDRS = $(filter-out $(EXCEPT_HDRS),$(patsubst ./%,%,$(LIB_H)))
> +HCO = $(patsubst %.h,%.hco,$(CHK_HDRS))
> +
> +$(HCO): %.hco: %.h FORCE
> +	$(QUIET_HDR)$(CC) -include git-compat-util.h -I. -o /dev/null -c -xc $<
> +
> +.PHONY: hdr-check $(HCO)
> +hdr-check: $(HCO)
>
Re: [PATCH 1/1] Makefile: use `git ls-files` to list header files, if possible
Hi Peff,

On Fri, 1 Mar 2019, Jeff King wrote:

> On Fri, Mar 01, 2019 at 04:36:19PM -0500, Jeff King wrote:
>
> > > This has one notable consequence: we no longer include
> > > `command-list.h` in `LIB_H`, as it is a generated file, not a
> > > tracked one.
> >
> > We should be able to add back $(GENERATED_H) as appropriate. I see you
> > did it for the non-computed-dependencies case. Couldn't we do the same
> > for $(LOCALIZED_C) and $(CHK_HDRS)?
>
> Answering my own question somewhat: $(CHK_HDRS) explicitly ignores the
> generated headers, so we wouldn't want to bother re-adding it there
> anyway. Possibly $(LOCALIZED_C) would care, though. The generated file
> does have translatable strings in it.

Yes, and it is defined thusly:

    LOCALIZED_C = $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H)

So it does include the generated headers explicitly anyway, with or
without my patch.

Ciao,
Dscho
Re: [PATCH 1/1] Makefile: use `git ls-files` to list header files, if possible
Hi Peff, On Fri, 1 Mar 2019, Jeff King wrote: > On Fri, Mar 01, 2019 at 11:57:46AM -0800, Johannes Schindelin via > GitGitGadget wrote: > > > In d85b0dff72 (Makefile: use `find` to determine static header > > dependencies, 2014-08-25), we switched from a static list of header > > files to a dynamically-generated one, asking `find` to enumerate them. > > > > Back in those days, we did not use `$(LIB_H)` by default, and many a > > `make` implementation seems smart enough not to run that `find` command > > in that case, so it was deemed okay to run `find` for special targets > > requiring this macro. > > > > However, as of ebb7baf02f (Makefile: add a hdr-check target, > > 2018-09-19), $(LIB_H) is part of a global rule and therefore must be > > expanded. Meaning: this `find` command has to be run upon every > > `make` invocation. In the presence of many a worktree, this can tax the > > developers' patience quite a bit. > > I'm confused about this part. We don't run hdr-check by default. Why > would make need the value of $(LIB_H)? Yet empirically it does run find. > > Worse, it seems to actually run it _three times_. Once for the $(HCO) > target, once for the .PHONY, and once for the hdr-check target. I think > the .PHONY one is unavoidable (it doesn't know which names we might > otherwise be building should be marked), but the other two seem like > bugs in make (or at least pessimisations). > > It makes me wonder if we'd be better off pushing hdr-check into a > separate script. It doesn't actually use make's dependency tree in any > meaningful way. And then regular invocations wouldn't even have to pay > the `ls-files` price. > > If we are going to keep it in the Makefile, we should probably use a > ":=" rule to avoid running it three times. Good point. > > Even in the absence of worktrees or other untracked files and > > directories, the cost of I/O to generate that list of header files is > > simply a lot larger than a simple `git ls-files` call. 
> > > > Therefore, just like in 335339758c (Makefile: ask "ls-files" to list > > source files if available, 2011-10-18), we now prefer to use `git > > ls-files` to enumerate the header files to enumerating them via `find`, > > falling back to the latter if the former failed (which would be the case > > e.g. in a worktree that was extracted from a source .tar file rather > > than from a clone of Git's sources). > > That seems reasonable (regardless of whether it is in a script or in the > Makefile). Another option is to use -maxdepth, but that involves > guessing how deep people might actually put header files. It would also fail to work when somebody clones an unrelated repository that contains header files in the top-level directory into the Git worktree. I know somebody like that: me. > > This has one notable consequence: we no longer include `command-list.h` > > in `LIB_H`, as it is a generated file, not a tracked one. > > We should be able to add back $(GENERATED_H) as appropriate. I see you > did it for the non-computed-dependencies case. Couldn't we do the same > for $(LOCALIZED_C) and $(CHK_HDRS)? As you figured out, CHK_HDRS *specifically* excludes the generated headers, and as I pointed out: LOCALIZED_C includes $(GENERATED_H) explicitly. So I think we're good on that front. > > Likewise, we no longer include not-yet-tracked header files in `LIB_H`. > > I think that's probably OK. It does potentially make developing new patches more challenging. I, for one, do not immediately stage every new file I add, especially not header files. But then, I rarely recompile after only editing such a new header file (I would more likely modify also the source file that includes that header). So while I think this issue could potentially cause problems only *very* rarely, I think that it would be a really terribly confusing thing if it happened. But I probably worry too much about it? Ciao, Dscho
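The tracked-vs-untracked difference being debated above is easy to demonstrate in a scratch repository. A sketch, assuming `git` and `find` are available:

```shell
# Scratch repo with one tracked and one untracked header.
dir=$(mktemp -d)
cd "$dir"
git init -q
echo '#define A 1' >a.h
git add a.h
echo '#define B 1' >b.h   # present on disk, but never "git add"ed

# "git ls-files" only sees what is tracked...
git ls-files '*.h' >tracked

# ...while "find" enumerates whatever is on disk, untracked or not.
find . -name '*.h' | sort >on_disk
```

Here `tracked` lists only `a.h`, while `on_disk` lists both headers: exactly the new-but-unstaged-header scenario Dscho worries about in the last paragraph.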
Re: [PATCH] builtin/config.c: don't print a newline with --color
Hi Taylor,

On Fri, 1 Mar 2019, Taylor Blau wrote:

> Invocations of 'git config ' print a trailing newline after
> writing the value assigned to the given configuration variable.
>
> This has an unexpected interaction with 63e2a0f8e9 (builtin/config:
> introduce `color` type specifier, 2018-04-09), which causes a newline to
> be printed after a color's ANSI escape sequence.
>
> In this case, printing a terminating newline is not desirable. Callers
> may want to print out such a configuration variable in a sub-shell in
> order to color something else, in which case they certainly don't want a
> newline.
>
> This bug has survived because there was never a test that would have
> caught it. The old test used 'test_decode_color', which checks that its
> input begins with a color, but stops parsing once it has parsed a single
> color successfully. In this case, it was ignoring the trailing '\n'.

Isn't the reason rather that our test compared the output to an
expected text *with a newline*? IOW:

> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> index 428177c390..ec1b3a852d 100755
> --- a/t/t1300-config.sh
> +++ b/t/t1300-config.sh
> @@ -907,9 +907,8 @@ test_expect_success 'get --expiry-date' '
>  test_expect_success 'get --type=color' '
>  	rm .git/config &&
>  	git config foo.color "red" &&
> -	git config --get --type=color foo.color >actual.raw &&
> -	test_decode_color <actual.raw >actual &&
> -	echo "" >expect &&

This should do the right thing if you write

	printf "" >expect

instead?

Ciao,
Dscho

> +	printf "\\033[31m" >expect &&
> +	git config --get --type=color foo.color >actual &&
>  	test_cmp expect actual
>  '
>
> --
> 2.20.1
>
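The `printf ""` suggestion hinges on a small but important difference between the two commands, shown in this quick sketch:

```shell
# "echo" always appends a newline, even for an empty string;
# "printf" writes exactly its format string, here nothing at all.
echo "" >expect_echo
printf "" >expect_printf

wc -c <expect_echo     # one byte: the newline
wc -c <expect_printf   # zero bytes
```

So `echo "" >expect` produces an expect file containing a lone newline, which would make `test_cmp` pass as long as the command under test still prints one.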
[PATCH 2/2] tests: introduce --stress-jobs=<N>
From: Johannes Schindelin

The --stress option currently accepts an argument, but it is confusing
to at least this user that the argument does not define the maximal
number of stress iterations, but instead the number of jobs to run in
parallel per stress iteration.

Let's introduce a separate option for that, whose name makes it more
obvious what it is about, and let --stress=<N> error out with a helpful
suggestion about the two options that could possibly have been meant.

Signed-off-by: Johannes Schindelin
---
 t/test-lib.sh | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index ab7f27ec6a..6e557982a2 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -142,10 +142,16 @@ do
 	--stress)
 		stress=t ;;
 	--stress=*)
+		echo "error: --stress does not accept an argument: '$opt'" >&2
+		echo "did you mean --stress-jobs=${opt#*=} or --stress-limit=${opt#*=}?" >&2
+		exit 1
+		;;
+	--stress-jobs=*)
+		stress=t;
 		stress=${opt#--*=}
 		case "$stress" in
 		*[!0-9]*|0*|"")
-			echo "error: --stress=<N> requires the number of jobs to run" >&2
+			echo "error: --stress-jobs=<N> requires the number of jobs to run" >&2
 			exit 1
 			;;
 		*)	# Good.
--
gitgitgadget
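The `case` pattern in the hunk above is what does the argument validation. Pulled out into a standalone function (`validate_jobs` is a hypothetical name for illustration), it reads:

```shell
# Reject anything that is not a positive decimal number: the pattern
# matches strings containing a non-digit, strings with a leading zero
# (which also catches "0" itself), and the empty string.
validate_jobs () {
	case "$1" in
	*[!0-9]*|0*|"")
		return 1 ;;
	*)
		return 0 ;;
	esac
}

validate_jobs 8 && echo "8: ok"
validate_jobs 08 || echo "08: rejected"
validate_jobs jobs || echo "jobs: rejected"
```

This is the same validation applied to both `--stress-jobs=` and `--stress-limit=` in test-lib.sh.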
[PATCH 0/2] tests: some touchups related to the --stress feature
If my mistake using --stress=<N> instead of --stress-limit=<N> is any
indication, then the current options are very confusing. This is my
attempt at making them less confusing.

Johannes Schindelin (2):
  tests: let --stress-limit=<N> imply --stress
  tests: introduce --stress-jobs=<N>

 t/test-lib.sh | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

base-commit: 7d661e5ed16dca303d7898f5ab0cc2ffc69e0499
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-155%2Fdscho%2Fstress-test-extra-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-155/dscho/stress-test-extra-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/155
--
gitgitgadget
[PATCH 1/2] tests: let --stress-limit=<N> imply --stress
From: Johannes Schindelin

It does not make much sense that running a test with --stress-limit=<N>
seemingly ignores that option because it does not stress test at all.

Signed-off-by: Johannes Schindelin
---
 t/test-lib.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 4e7cb52b57..ab7f27ec6a 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -153,6 +153,7 @@ do
 		esac
 		;;
 	--stress-limit=*)
+		stress=t;
 		stress_limit=${opt#--*=}
 		case "$stress_limit" in
 		*[!0-9]*|0*|"")
--
gitgitgadget
Re: t5570-git-daemon fails with SIGPIPE on OSX
Hi Peff,

On Fri, 1 Mar 2019, Jeff King wrote:

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index b620fd54b4..4ba63d5ac6 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1556,7 +1556,9 @@ static int fetch_one(struct remote *remote, int argc, const char **argv, int pru
>
>  	sigchain_push_common(unlock_pack_on_signal);
>  	atexit(unlock_pack);
> +	sigchain_push(SIGPIPE, SIG_IGN);
>  	exit_code = do_fetch(gtransport, &rs);
> +	sigchain_pop(SIGPIPE);
>  	refspec_clear(&rs);
>  	transport_disconnect(gtransport);
>  	gtransport = NULL;

That indeed does the job:
https://git.visualstudio.com/git/_build/results?buildId=358

> That said, I actually think it's kind of pointless for git-fetch to use
> SIGPIPE at all. The purpose of SIGPIPE is that you can write a naive
> program that spews output, and you'll get informed (forcefully) by the
> OS if the process consuming your output stops listening. That makes
> sense for programs like "git log", whose primary purpose is generating
> output.
>
> But for git-fetch, our primary purpose is receiving data and writing it
> to disk. We should be checking all of our write()s already, and SIGPIPE
> is just confusing. The only "big" output we generate is the status table
> at the end. And even if that is going to a pipe that closes, I don't
> think we'd want to fail the whole command (we'd want to finalize any
> writes for what we just fetched, clean up after ourselves, etc).
>
> So I'd actually be fine with just declaring that certain commands (like
> fetch) just ignore SIGPIPE entirely.

That's a bigger change than I'm comfortable with, so I'd like to go with
the diff you gave above. Do you want to turn these two patches into a
proper patch series? Otherwise I can take care of it, probably this
Monday or Tuesday.

Ciao,
Dscho
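Peff's description of SIGPIPE's purpose can be observed directly in the shell. A sketch of what happens to a naive producer when its reader goes away:

```shell
# "yes" spews output forever; "head" reads one line and exits, closing
# the pipe. The next write by "yes" raises SIGPIPE, which kills it.
( yes; echo "$?" >producer_status ) | head -n 1 >/dev/null

# A process terminated by a signal exits with 128 + signo; SIGPIPE is
# signal 13, so the recorded status is typically 141.
cat producer_status
```

This is exactly the behavior that makes sense for output-generating commands like "git log", and exactly what the fetch code path wants to opt out of while `do_fetch()` runs.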
Re: [RFC PATCH 1/4] name-rev: improve name_rev() memory usage
Hi Alban,

On Fri, 1 Mar 2019, Alban Gruin wrote:

> name_rev() is a recursive function. For each commit, it allocates the
> name of its parents, and calls itself. A parent may not use a name for
> multiple reasons, but in any case, the name will not be released. On a
> repository with a lot of branches, tags, remotes, and commits, it can
> use more than 2GB of RAM.
>
> To improve the situation, name_rev() now returns a boolean to its
> caller indicating if it can release the name. The caller may free the
> name if the commit is too old, or if the new name is not better than
> the current name.
>
> There is a condition that will always be false here when name_rev()
> calls itself for the first parent, but it will become useful when
> name_rev() stops naming commits that are not mentioned in the stdin
> buffer. If the current commit should not be named, its parents may
> have to be, but they may not. In this case, name_rev() will tell its
> caller that the current commit and its first parent have not used the
> name, and that it can be released. However, if the current commit has
> been named but not its parent, or the reverse, the name will not be
> released.
Makes sense, and the patch looks mostly good to me, just one suggestion:

> Signed-off-by: Alban Gruin
> ---
>  builtin/name-rev.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index f1cb45c227..0719a9388d 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -77,7 +77,7 @@ static int is_better_name(struct rev_name *name,
>  	return 0;
>  }
>
> -static void name_rev(struct commit *commit,
> +static int name_rev(struct commit *commit,
>  		const char *tip_name, timestamp_t taggerdate,
>  		int generation, int distance, int from_tag,
>  		int deref)
> @@ -86,11 +86,12 @@ static void name_rev(struct commit *commit,
>  	struct commit_list *parents;
>  	int parent_number = 1;
>  	char *to_free = NULL;
> +	int free_alloc = 1;
>
>  	parse_commit(commit);
>
>  	if (commit->date < cutoff)
> -		return;
> +		return 1;
>
>  	if (deref) {
>  		tip_name = to_free = xstrfmt("%s^0", tip_name);
> @@ -111,9 +112,10 @@ static void name_rev(struct commit *commit,
>  		name->generation = generation;
>  		name->distance = distance;
>  		name->from_tag = from_tag;
> +		free_alloc = 0;
>  	} else {
>  		free(to_free);
> -		return;
> +		return 1;
>  	}
>
>  	for (parents = commit->parents;
> @@ -131,15 +133,18 @@ static void name_rev(struct commit *commit,
>  			new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
>  					   parent_number);
>
> -			name_rev(parents->item, new_name, taggerdate, 0,
> -				 distance + MERGE_TRAVERSAL_WEIGHT,
> -				 from_tag, 0);
> +			if (name_rev(parents->item, new_name, taggerdate, 0,
> +				     distance + MERGE_TRAVERSAL_WEIGHT,
> +				     from_tag, 0))
> +				free(new_name);
>  		} else {
> -			name_rev(parents->item, tip_name, taggerdate,
> -				 generation + 1, distance + 1,
> -				 from_tag, 0);
> +			free_alloc &= name_rev(parents->item, tip_name,
> +					       taggerdate, generation + 1,
> +					       distance + 1, from_tag, 0);

This would be easier to read if it avoided the &=, e.g. by turning it
into:

	} else if (!name_rev(parents->item, tip_name, taggerdate,
			     generation + 1, distance + 1, from_tag, 0))
		free_alloc = 0;

Ciao,
Dscho

>  		}
>  	}
> +
> +	return free_alloc;
>  }
>
>  static int subpath_matches(const char *path, const char *filter)
> --
> 2.20.1
>
>
How crc32 value is calculated for deltified objects
Hi all,

I am trying to figure out how the crc32 value is calculated for
deltified objects. It appears that the crc32 value is updated in the
use() function. After calculating the header, the unpack_raw_entry()
function will call unpack_entry_data(). This function inflates the
delta object and calls use().

My question is, does it update the crc32 value with the deltas or just
the undeltified object? And if my question implies I do not understand
how this process works, can you please explain to me how the crc32
value is constructed?

Thank you.

---
Farhan Khan
PGP Fingerprint: 1312 89CE 663E 1EB2 179C 1C83 C41D 2281 F8DA C0DE
Re: [BUG] All files in folder are moved when cherry-picking commit that moves fewer files
Hi Junio, On Thu, Feb 28, 2019 at 6:52 PM Junio C Hamano wrote: > > As you know that I've always been skeptical to this rename-directory > business, this probably won't come as a surprise, but I seriously > think directory renames should be made opt-in, and as a separate > option, making the option three-way. I.e. > > - do we do any renames (yes/no)? > > - if we do do renames, do we base the decision only on the contents >of the thing along, or do we allow neighbouring files affect the >decision? > > That is, in addition to the traditional --renames or -M, we'd have a > separate bool --allow-directory-renames that is by default off and > is a no-op if the former is off. We've gone from always-off (ability wasn't implemented), to always on (assuming normal rename detection was on), and now we're considering making it an option, with you proposing opt-in. While I think opt-out would be a better default, we may want to first consider the range of possibilities; if more safety is needed, other choices might be safer than either opt-in or opt-out. This email reminds me a bit of the recent overlay discussion, where some of your wording made me think about directory handling, but you had smudge filters in mind. We're occasionally on different wavelengths, and I'm worried I might be mis-reading your main point and/or that my responses only address part of the issues you have in mind. Let me know if that's the case. > We had to fix a breakage recently for 3-way fallback case, and we > explained the fix as needed because the case lacks the full > information, but I think even with the full information at hand, the > rename-directory heurisitcs is inherently riskier than the content > based rename detection. > > Suppose you had F1, F2, ... in directory D1, and moved them all to > D1/D2. In the meantime, somebody else adds Fn to directory D1. 
It > may be likely that some variant of Fn would want to go to D1/D2, but > it also is very likely that there should be a difference between > D1/Fn somebody else had, and the contents of D1/D2/Fn in the merge > result. Perhaps D1/F1 in your preimage used to refer to another > path in the top-level directory as "../top", but the reference would > have been rewritten to "../../top" when you moved D1/F1 to D1/D2/F1, > and the person doing the merge should at least check if D1/Fn that > comes from the other party needs a similar adjustment while merged. > > In the above scenario, if there were D1/Fn in _your_ preimage and > all the other party did was to make in-place modification, the story > is vastly different. Most likely you would have made most of, if > not all, the adjustment necessary for D1/Fn to sit in its new > location, while the other party kept the relative reference to other > places intact, so we can say that both parties have say in the > contents of the auto-merged result. The "since neighgours moved, > this must also want to move the same way" heuristics does not give a > chance to the party that is not aware of the move to prepare the > contents appropriate for the new location, by definition, so the > onus is on the person who merges to adjust the contents. There are a few issues here to unpack. First, in regards to your example, the very first "Limitation" imposed by the directory rename heuristics was: * If a given directory still exists on both sides of a merge, we do not consider it to have been renamed. Therefore, your (detailed) example above of "renaming" D1/* to D1/D2/* isn't something the directory rename heuristics supports (the presence of D1/D2/ implies D1/ still exists and thus could not have been renamed; this particular case is explicitly mentioned in t6043 and I brought it up a couple times during the review of the directory rename patch series with Stefan). 
There are too many edge/corner cases that become difficult without this rule and it was considered more likely to (negatively) surprise users, so even with directory rename heuristics on, a new file Fn added to D1/ on the other side of the merge will remain in D1/Fn after the merge. Perhaps this clarification helps assuage your worries, or maybe your example was an unfortunate one for relaying your real underlying concern. Second, you stated that you thought "rename-directory [heuristics are] inherently riskier" than not having them on, and then gave a single (though involved) example. This could be read to imply that you believe directory rename heuristics are for convenience only, that there is no risk involved with not detecting directory renames, and thus all that matters is the cost benefit of the convenience of detecting renames for users vs. whatever risks there are in "detecting" renames that aren't actually wanted by users. I do not know if you intended any of these implications at all, but I do want to point out that not detecting directory renames is not merely a convenience. While that convenience was _part_ of the reason spurring my work on that
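The heuristic (and the "directory must fully disappear" limitation just described) can be seen in a scratch repository. A sketch assuming a Git with directory rename detection (v2.18 or later); identity is set via the environment so no config is touched:

```shell
# Scratch repo with a directory containing two files.
dir=$(mktemp -d)
cd "$dir"
export GIT_AUTHOR_NAME=A GIT_AUTHOR_EMAIL=a@example.com
export GIT_COMMITTER_NAME=A GIT_COMMITTER_EMAIL=a@example.com
git init -q
mkdir d1
echo one >d1/f1
echo two >d1/f2
git add . && git commit -qm base

# Side A: move *every* file out of d1, so d1 no longer exists there.
git checkout -qb moved
git mv d1 d2
git commit -qm "move d1/ to d2/"

# Side B: add a new file to the old directory.
git checkout -q -
echo new >d1/fn
git add . && git commit -qm "add d1/fn"

# Because d1/ disappeared entirely on side A, directory rename
# detection moves the new file along: it ends up at d2/fn, not d1/fn.
git merge -q -m merge moved
ls d2
```

Had side A moved the files to d1/d2/ instead (so d1/ survived), the limitation quoted above would kick in and d1/fn would stay where side B put it.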
Re: fast-import fails with case sensitive tags due to case insensitive lock files
On Fri, Mar 01, 2019 at 06:19:48AM +, Wendeborn, Jonathan wrote: > Hi, > > I have a problem with fast-import on an NTFS drive: If I try to import tags > which are identical apart from their casing a failure due to identical lock > file names occurs. > > I am running git for windows 2.15.1.2 x64 on a Windows 10 machine > (10.0.15063): > $ git --version --build-options > git version 2.15.1.windows.2 > built from commit: 5d5baf91824ec7750b103c8b7c4827ffac202feb > sizeof-long: 4 > machine: x86_64 > > MCVE: > (echo "commit refs/heads/master" && > echo "mark :1" && > echo "committer me <> 0 +" && > echo "data 0" && > echo "" && > echo "tag tag_A" && > echo "from :1" && > echo "tagger me <> 0 +" && > echo "data 0" && > echo "" && > echo "tag tag_a" && > echo "from :1" && > echo "tagger me <> 0 +" && > echo "data 0" && > echo "") | git fast-import > > Instead of having 1 commit with two tags ("tag_A" and "tag_a") I get his > error message: > Unpacking objects: 100% (4/4), done. > error: cannot lock ref 'refs/tags/tag_a': Unable to create > 'C:/tmp/.git/refs/tags/tag_a.lock': File exists. The reason you're seeing this error is because refs can be stored in the file system. In order to update a reference, Git takes a lock on it, and as you've seen, Git can't take a lock on the same reference twice. It's known that multiple references that differ only in case can't be stored in a case-insensitive file system, and there is a design for a different system (reftable) which nobody has yet implemented in Git but does not have this problem. Even if we accepted this situation in fast-import, we'd destroy one of your tags, which would be undesirable. Sometimes this happens to work because when we pack references, we store them in a file instead, which does not suffer from case-sensitivity problems. Right now, you have some choices: • Volunteer to implement reftable. • Since you're on Windows 10, set your Git repository directory as case-sensitive. 
• Use Windows Subsystem for Linux, which is case sensitive and creates directories with that flag (even on NTFS), to do your import. • If you control the fast-export output, adjust the arguments you pass such that the output does not contain one of the offending tags. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
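The collision brian describes can be sketched outside of git entirely. This is a toy illustration only (the `lockname` helper is made up, with `tr` standing in for the filesystem's case folding; git's real locking lives in refs/files-backend.c):

```shell
# Hypothetical sketch, not git code: on a case-insensitive filesystem
# both lock-file paths fold to the same name, so the second attempt to
# take the lock finds the file already in existence.
lockname() { printf '%s' "$1" | tr 'A-Z' 'a-z'; }

a=$(lockname 'refs/tags/tag_A.lock')
b=$(lockname 'refs/tags/tag_a.lock')

if [ "$a" = "$b" ]; then
    echo "collision: both tags map to $a"
fi
```

This also shows why packed refs sometimes sidestep the problem: all references then live inside one file, so no per-ref lock path needs to be created on disk.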
Re: [PATCH] builtin/config.c: don't print a newline with --color
Taylor Blau writes: > Invocations of 'git config ' print a trailing newline after > writing the value assigned to the given configuration variable. > > This has an unexpected interaction with 63e2a0f8e9 (builtin/config: > introduce `color` type specifier, 2018-04-09), which causes a newline to > be printed after a color's ANSI escape sequence. > > In this case, printing a terminating newline is not desirable. Callers > may want to print out such a configuration variable in a sub-shell in > order to color something else, in which case they certainly don't want a > newline. > > This bug has survived because there was never a test that would have > caught it. The old test used 'test_decode_color', which checks that its > input begins with a color, but stops parsing once it has parsed a single > color successfully. In this case, it was ignoring the trailing '\n'.

The output from the "git config" plumbing command was designed to help people writing shell-script Porcelains around it, so the expected use for it has always been

    ERR=$(git config --type=color --default=red ui.color.error)
    ... some time later ...
    echo "${ERR}this is an error message"

where the first assignment will strip the final LF (i.e. the value of the $ERR variable does not have it).

An interesting aspect of the above is that this is *NOT* limited to colors. Regardless of the type you are reading, be it an int or a bool, VAR=$(git config ...) will strip the trailing LF, and existing scripts people have do rely on that, i.e. when people write

    VAR=$(git config ...)
    echo "var setting is $VAR"

they rely on the VAR=$(...) assignment to strip the trailing LF and on echo to add a final LF to the string. So if we are going to change anything, the change MUST NOT single out "color". IOW, the title of the patch already tells us that it is giving a wrong solution.

Whether you limit it to color or not, to Porcelain writers who are writing in shell, I suspect that the code after the proposed change will not be a huge regression. 
VAR=$(git config ...) assignment, when the output from the command ends without the final LF (i.e. an incomplete line), will keep the string intact, so the behaviour of these shell scripts would not change. If an existing Porcelain script were written in Perl and uses chop to strip the last LF coming out of "git config", however, the proposed change WILL BREAK such a script. Needless to say, "using chop in Perl is wrong to begin with" misses the point from two directions---(1) 'chop in Perl' is a mere example---scripts not written in Perl using chop may still rely on the existing behaviour that the output always has the final LF, and (2) even if we agree that using chop in Perl is a bad idea, such a script has happily been working, and suddenly breaking it is a regression no matter what. So, I am not hugely enthused by this change, even though I am somewhat sympathetic to it, as it would help one narrow use case, i.e. "interpolation". cat <
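Junio's point about $(...) stripping can be seen without git at all. A stand-alone sketch, with printf standing in for "git config":

```shell
# $(...) strips trailing newlines, so a shell Porcelain sees the same
# value whether or not the command prints the final LF.  printf here
# is only a stand-in for "git config".
with_lf=$(printf 'red\n')       # complete line, as git config prints today
without_lf=$(printf 'red')      # incomplete line, as after the patch
printf '%s|end\n' "$with_lf"    # -> red|end
printf '%s|end\n' "$without_lf" # -> red|end, identical
```

This is why shell Porcelains would not notice the change, while scripts that strip the final LF themselves (the Perl chop example) would break.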
Re: git commit --verbose shows incorrect diff when pre-commit hook is used to modify files
Fernando Chorney writes: > Hmm looks like I forgot to send my reply to this back to the mailing list. > > "Hmm, so I currently have it set to run vim as my commit editor, and > enter the message in there most of the time. I can definitely see > output from the hook into the shell before my vim editor loads up that > shows me the diff and lets me add in the commit message. This leads me > to believe that the pre-commit hook is being run before the editor > (with the diff) pops up." > > Does anybody else have any insight to this issue?

I do not know if it counts as an insight, but the pre-* hooks, not limited to pre-commit, were invented for the purpose of validating the input to an operation so that the hooks can _reject_ it if the anticipated outcome does not satisfy the project's conventions; they were not invented to let hooks modify the input to the operations. So attempting to modify the tree in a pre-commit hook, for example, is outside the scope of its design, and I wouldn't be surprised if you get an unexpected result.
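A minimal sketch of that validate-and-reject design (the check logic is hypothetical; a real hook would live in .git/hooks/pre-commit and inspect `git diff --cached` rather than a string argument):

```shell
# The hook's job is to exit non-zero to reject the commit, not to
# rewrite the tree behind the user's back.
check_staged() {
    if printf '%s\n' "$1" | grep -q 'TODO'; then
        echo 'pre-commit: staged content still contains TODO' >&2
        return 1
    fi
    return 0
}

check_staged 'fix this TODO later' || echo 'commit would be aborted'
```

A hook that instead edits files at this point is fighting the design: `git commit --verbose` has already computed the diff it will show by the time the hook's modifications land.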
Re: [PATCH 0/1] Drop last MakeMaker reference
"Johannes Schindelin via GitGitGadget" writes: > Back when we stopped using MakeMaker, we forgot one reference... > > Johannes Schindelin (1): > mingw: drop MakeMaker reference > > config.mak.uname | 1 - > 1 file changed, 1 deletion(-) > > > base-commit: 8104ec994ea3849a968b4667d072fedd1e688642 > Published-As: > https://github.com/gitgitgadget/git/releases/tag/pr-146%2Fdscho%2Fno-perl-makemaker-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git > pr-146/dscho/no-perl-makemaker-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/146 Good ;-) Thanks.
Re: [PATCH 1/1] fixup! upload-pack: refactor reading of pack-objects out
"Johannes Schindelin via GitGitGadget" writes: > From: Johannes Schindelin > > This fixes an issue pointed out by clang. > > Signed-off-by: Johannes Schindelin > --- > upload-pack.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/upload-pack.c b/upload-pack.c > index 2365b707bc..534e8951a2 100644 > --- a/upload-pack.c > +++ b/upload-pack.c > @@ -179,7 +179,7 @@ static void create_pack_file(const struct object_array > *have_obj, >const struct string_list *uri_protocols) > { > struct child_process pack_objects = CHILD_PROCESS_INIT; > - struct output_state output_state = {0}; > + struct output_state output_state = {"\0"}; OK, as the structure looks like struct output_state { char buffer[8193]; int used; ... we obviously cannot initialize buffer[] with a single integer 0 ;-) Between "" and "\0" you wondered elsewhere, I have no strong preference, but if I were fixing it, I would probably write it as struct output_state output_state = { { 0 } }; so that readers do not even have to waste time to wonder between the two. Thanks.
Re: t5570-git-daemon fails with SIGPIPE on OSX
Jeff King writes: > But for git-fetch, our primary purpose is receiving data and writing it > to disk. We should be checking all of our write()s already, and SIGPIPE > is just confusing. Yup, sounds like a very sensible argument. > So I'd actually be fine with just declaring that certain commands (like > fetch) just ignore SIGPIPE entirely. Me too (to me, "actually be fine with" would mean "that's a larger hammer alternative, which is OK, while the base solution is fine, too"). Thanks.
Re: [PATCH 1/2] gitk: refresh the colour scheme
On Tue, Feb 26, 2019 at 12:05:34PM +0100, Andrej Shadura wrote: > The colours gitk is currently using are from the basic 16 colour > palette, and are a bit too intensive to be comfortable or pleasant > to work with. > > Adjust the main colours (commit nodes, remotes, tags and one branch > colour) to be slightly softer. > > Signed-off-by: Andrej Shadura

Thanks for the patch, but I disagree. I do like the change you made to the tag colours, but the blue you have for the commit node circles is pretty blah. That needs a more definite colour. Also, the "too intensive to be comfortable or pleasant" in the commit message reflects a personal preference, and the way it is put seems to me to be too intensive to be comfortable or pleasant.

Paul.
Re: [PATCH RFC 0/20] cat-file: start using formatting logic from ref-filter
Jeff King writes: > On Fri, Feb 22, 2019 at 06:50:06PM +0300, Olga Telezhnaya wrote: > >> In my opinion, it still has some issues. I mentioned all of them in >> TODOs in comments. All of them considered to be separate tasks for >> other patches. Some of them sound easy and could be great tasks for >> newbies. > > One other thing I forgot to mention: your patches ended up on the list > in jumbled order. How do you send them? Usually `send-email` would add 1 > second to the timestamp of each, so that threading mail readers sort > them as you'd expect (even if they arrive out of order due to the > vagaries of SMTP servers). Yes, the 1 second increment has served us so well in the entire life of this project, and I am finding a bit irritating that we seem to be seeing topics that are shown in jumbled order more often. I'd love to see why and get them fixed at the source eventually.
Re: [PATCH v3 1/1] worktree add: sanitize worktree names
Jeff King writes: > I agree that sanitize_refname_format() would be nice, but I'm pretty > sure it's going to end up having to duplicate many of the rules from > check_refname_format(). Which is ugly if the two ever get out of sync. > > But if we could write it in a way that keeps the actual policy logic in > one factored-out portion, I think it would be worth doing. Yeah, I do too. In the meantime, let's call v3 sufficient improvement from the current state for now and queue it.
Re: `git status -u no` suppresses modified files too.
Jeff King writes: > This is a pretty horrible UI trap. Most of the time with pathspecs we > require them to be on the right-hand side of a "--" unless the paths > actually exist in the filesystem. But then, in most of those cases we're > making sure they're not ambiguous between revisions and paths. So maybe > it's overkill here. I dunno. But the patch below certainly makes what's > going on in your case less subtle: > > $ git status -u no > fatal: no: no such path in the working tree. > Use 'git <command> -- <path>...' to specify paths that do not exist locally.

Yeah. It was a mistake that we gave a short-and-sweet "-u" too hastily to a relatively useless form that does not take any argument and ended up needing to resort to the optional-arg trick.

> You do still have to figure out why it wasn't stuck to "-u", though.

Sorry, but I am not sure what you mean by this comment. Your illustration patch lets you say "no, 'no' is not a pathspec" with

    git status -u no --

and "I want the unadorned -u and am asking about stuff in the 'no' subdirectory" with

    git status -u -- no

but in the former, it would not make 'no' an arg to '-u' by saying "'no' is not a pathspec", would it? 
> --- > diff --git a/builtin/commit.c b/builtin/commit.c > index 2986553d5f..7177d7d82f 100644 > --- a/builtin/commit.c > +++ b/builtin/commit.c > @@ -1300,6 +1300,16 @@ static int git_status_config(const char *k, const char > *v, void *cb) > return git_diff_ui_config(k, v, NULL); > } > > +static void verify_pathspec(const char *prefix, const char **argv) > +{ > + while (*argv) { > + const char *arg = *argv++; > + if (!strcmp(arg, "--")) > + return; > + verify_filename(prefix, arg, 0); > + } > +} > + > int cmd_status(int argc, const char **argv, const char *prefix) > { > static int no_renames = -1; > @@ -1351,7 +1361,7 @@ int cmd_status(int argc, const char **argv, const char > *prefix) > status_init_config(&s, git_status_config); > argc = parse_options(argc, argv, prefix, >builtin_status_options, > - builtin_status_usage, 0); > + builtin_status_usage, PARSE_OPT_KEEP_DASHDASH); > finalize_colopts(&s.colopts, -1); > finalize_deferred_config(&s); > > @@ -1362,6 +1372,7 @@ int cmd_status(int argc, const char **argv, const char > *prefix) > s.show_untracked_files == SHOW_NO_UNTRACKED_FILES) > die(_("Unsupported combination of ignored and untracked-files > arguments")); > > + verify_pathspec(prefix, argv); > parse_pathspec(&s.pathspec, 0, > PATHSPEC_PREFER_FULL, > prefix, argv);
Re: [PATCH] rebase docs: fix "gitlink" typo
Martin Ågren writes: > On Thu, 28 Feb 2019 at 03:44, Kyle Meyer wrote: > >> Change it to "linkgit" so that the reference is properly rendered. > >> have `` as direct ancestor will keep their original branch point, >> -i.e. commits that would be excluded by gitlink:git-log[1]'s >> +i.e. commits that would be excluded by linkgit:git-log[1]'s >> `--ancestry-path` option will keep their original ancestry by default. If > > Heh, I stumbled on this a few days ago, and have this exact patch in my > w-i-p series. I found it interesting that both Asciidoctor and AsciiDoc > trip on this in quite different ways. > > The patch is correct and tested by me, FWIW. Yup, thanks, both.
Re: [PATCH v13 00/27] Convert "git stash" to C builtin
Thomas Gummerer writes: > One thing that came up in the latest reviews, was to keep the stash > script intact throughout the series, and to not re-introduce it after > deleting it. I did however not do that, as that would make the > range-diff quite a bit harder to read.

Of course, if you start from a suboptimal ordering and reorder to make it right, the range-diff that tries to match and compare the steps will become larger and harder to follow. What else is new? That is refusing to think clearly, hiding behind a tautology.

> In addition removing the > script bit by bit also allowed us to find the precise commit in which > the missing 'discard_cache()' bug was introduced, which made it a bit > easier to pinpoint where the bug comes from imo.

In an ideal ordering, you would do a command line parser first and dispatch the rewritten commands piece by piece to the C reimplementation while diverting the remaining unwritten parts to the scripted "legacy" one. That would allow us to find the precise commit that was faulty the same way.

Having said all that, I do not care too deeply about seeing this particular series done in the right order anymore. Everybody's eyes are tired of looking at it, and I do not mind seeing you guys declare "nobody can review this, so let's keep it in 'next' and patch breakages up as they are found out", which seems to be happening here. We can divert our review cycles to other topics that haven't faced reviewer fatigue; they still have a chance to be done right ;-).
Re: [PATCH v2 0/3] asciidoctor-extensions: fix spurious space after linkgit
"brian m. carlson" writes: > On Wed, Feb 27, 2019 at 07:17:51PM +0100, Martin Ågren wrote: >> Just like v1 [1], this v2 removes a spurious space which shows up in a >> large number of places in our manpages when Asciidoctor expands the >> linkgit:foo[bar] macro. The only difference is a new paragraph in the >> commit message of the first patch to explain why we need to explicitly >> list a file we depend on. >> >> Thanks Eric and brian for your comments on v1. >> >> [1] https://public-inbox.org/git/cover.1551123979.git.martin.ag...@gmail.com/ >> >> Martin Ågren (3): >> Documentation/Makefile: add missing xsl dependencies for manpages >> Documentation/Makefile: add missing dependency on >> asciidoctor-extensions >> asciidoctor-extensions: fix spurious space after linkgit >> >> Documentation/Makefile | 4 ++-- >> Documentation/asciidoctor-extensions.rb | 2 +- >> 2 files changed, 3 insertions(+), 3 deletions(-) > > This version looks good to me. Thanks again for getting this cleaned up. Thanks, all. Will queue.
Re: [PATCH v13 00/27] Convert "git stash" to C builtin
Thomas Gummerer writes: > As I was advocating for this series to go into 'next' without a large > refactor of this series, I'll put my money were my mouth is and try to > make the cleanups and fixes required, though without trying to avoid > further external process calls, or changing the series around too > much. Thanks for a well-written summary. One thing that is missing in the write-up is if you kept the same base (i.e. on top of eacdb4d24f) or rebased the series on top of somewhere else. I'd assume you built on the same base as before in the meantime (I'll know soon enough, when I sit down on a terminal and resume working on the code ;-)
Re: [BUG] All files in folder are moved when cherry-picking commit that moves fewer files
Elijah Newren writes: > Whatever we choose for the default could be tweaked by some new option > (e.g. make it less noisy or don't mark such paths as conflicted if the > user has explicitly stated their preference for or against directory > rename detection). I'm struggling to see directory rename detection > as "risky" (though I certainly feel that lack of it is and thus do not > like the opt-in option), but if others feel there is risk here,

I do not think it matters what "others feel" at all. The simple fact that we are seeing a bug report that started this thread is enough to say that the heuristics kick in when the end users do not expect them to, and when that happens it is harder to explain than other kinds of rename-detection heuristics.
Re: [BUG] completion.commands does not remove multiple commands
SZEDER Gábor wrote: [... lots of good history ...] Thanks for an excellent summary! > Note, however, that for completeness sake, if we choose to update > list_cmds_by_config() to read the repo's config as well, then we > should also update the completion script to run $(__git > --list-cmds=...), note the two leading underscores, so in case of 'git > -C over/there ' it would read 'completion.commands' from the right > repository. Thanks for pointing this out. I'll add that to my local branch for the "respect local config" case. At the moment, I think it only matters for calls where config is in the --list-cmds list. But since the fix Jeff suggested affects all --list-cmds calls, it seems prudent to adjust the 3-4 other uses of --list-cmds in the completion script. Let me know if you see a reason not to do that. Thanks again for such a nice summary, -- Todd
Re: [PATCH 1/4] built-in rebase: no need to check out `onto` twice
Phillip Wood writes: > Thanks for explaining, it all makes sense to me now

It would be necessary to make sure that it all makes sense to all future readers. Are the patches good enough as-is for that, or do they need some updates before I take a look at them to pick them up?
Re: Prevent reset --hard from deleting everything if one doesn't have any commits yet
Jeff King writes: > Wouldn't that mean all of the file data is available in the object > database? Unfortunately without an index, there's nothing to mark which > file was which. But `git fsck --lost-found` should copy out all of the > file content into .git/lost-found.

If we had a hierarchically stored index that stores subdirectory contents as a tree (or some tree-like things), "fsck --lost-found" would have been able to recover even pathnames, but with the current flat index in a fresh directory without any commit, we'd get a bunch of blobs without any clue as to where each of them goes. It would be far better than nothing, but leaves room for improvement.
Re: [PATCH 2/2] tests: introduce --stress-jobs=
On Sat, Mar 2, 2019 at 4:19 PM Johannes Schindelin via GitGitGadget wrote: > The --stress option currently accepts an argument, but it is confusing > to at least this user that the argument does not define the maximal > number of stress iterations, but instead the number of jobs to run in > parallel per stress iteration. > > Let's introduce a separate option for that, whose name makes it more > obvious what it is about, and let --stress= error out with a helpful > suggestion about the two options that could possibly have been meant.

This new option probably deserves documentation in t/README alongside existing documentation for --stress and --stress-limit, and the existing --stress= documentation ought to be updated.

> Signed-off-by: Johannes Schindelin
Re: Questions on GSoC 2019 Ideas
On Sat, Mar 2, 2019 at 4:09 PM Thomas Gummerer wrote: > > On 03/01, Duy Nguyen wrote: > > On Fri, Mar 1, 2019 at 5:20 AM Christian Couder > > wrote: > > > > > > Hi Matheus, > > > > > > On Thu, Feb 28, 2019 at 10:46 PM Matheus Tavares Bernardino > > > wrote: > > > > > > > > I've been in the mailing list for a couple weeks now, mainly working > > > > on my gsoc micro-project[1] and in other patches that derived from it. > > > > I also have been contributing to the Linux Kernel for half an year, > > > > but am now mainly just supporting other students here at USP. > > > > > > > > I have read the ideas page for the GSoC 2019 and many of them interest > > > > me. Also, looking around git-dev materials on the web, I got to the > > > > GSoC 2012 ideas page. And this one got my attention: > > > > https://github.com/peff/git/wiki/SoC-2012-Ideas#improving-parallelism-in-various-commands > > > > > > > > I'm interested in parallel computing and that has been my research > > > > topic for about an year now. So I would like to ask what's the status > > > > of this GSoC idea. I've read git-grep and saw that it is already > > > > parallel, but I was wondering if there is any other section in git in > > > > which it was already considered to bring parallelism, seeking to > > > > achieve greater performance. And also, if this could, perhaps, be a > > > > GSoC project. > > > > > > I vaguely remember that we thought at one point that all the low > > > hanging fruits had already been taken in this area but I might be > > > wrong. > > > > We still have to remove some global variables, which is quite easy to > > do, before one could actually add mutexes and stuff to allow multiple > > pack access. I don't know though if the removing global variables is > > that exciting for GSoC, or if both tasks could fit in one GSoC. The > > adding parallel access is not that hard, I think, once you know > > packfile.c and sha1-file.c relatively well. 
It's mostly dealing with > > caches and all the sliding access windows safely. > > I'm not very familiar with what's required here, but reading the above > makes me think it's likely too much for a GSoC project. I think I'd > be happy with a project that declares removing the global variables as > the main goal, and adding parallelism as a potential bonus. Yeah, I think that the main issue, now that Duy found something that could be a GSoC project, is that the potential mentors are not familiar with the pack access code. It means that Matheus would probably not get a lot of help from his mentors when he would work on adding parallelism. That may not be too big a problem though if Matheus is ok to ask many technical questions on the mailing list. It seems to me that he could succeed. > I'm a bit wary of a too large proposal here, as we've historically > overestimated what kind of project is achievable over a summer (I've > been there myself, as my GSoC project was also more than I was able to > do in a summer :)). I'd rather have a project whose goal is rather > small and can be expanded later, than having something that could > potentially take more than 3 months, where the student (or their > mentors) have to finish it after GSoC. Yeah, I agree with your suggestion about a project that declares removing the global variables as the main goal, and adding parallelism as a potential bonus. One thing I am still worried about is if we are sure that adding parallelism is likely to get us a significant performance improvement or not. If the performance of this code is bounded by disk or memory access, then adding parallelism might not bring any benefit. (It could perhaps decrease performance if memory locality gets worse.) So I'd like some confirmation either by running some tests or by experienced Git developers that it is likely to be a win.