Re: Git in Outreachy Dec-Mar?
Hi Olga,

On Fri, Aug 31, 2018 at 12:30 PM, Оля Тележная wrote:
> Hi everyone,
>
> I was Outreachy intern last winter. I guess I need to speak up: I will
> be happy if my feedback helps you.
> At first, I want to repeat all thanks to Outreachy organizers and Git
> mentors. That was unique experience and I am so proud of being a part
> of this project. But, I need to say that my internship wasn't ideal.
> Mentors, please do not feel guilty: I just want to improve the quality
> of future internships and give some advises.

Thanks a lot for this feedback! I think it can be very useful, and I
don't feel guilty, as I think no one is really to blame in these kinds
of situations. We all have to learn how we can improve.

I think part of the problem was just that the work is really very
difficult for interns, even if it doesn't seem that it should be so
difficult, which makes the intern/mentor relationship tricky.

> I guess some of the problems aren't related to Git, and it's Outreachy
> weak points. Please forward this email to Outreachy organizers if you
> want.
>
> 1. The main problem of Outreachy internship is positioning. I mean, I
> had strong confidence that it's an internship for newbies in
> programming. All my friends had the same confidence, and that's the
> reason why 2 my friends failed in the middle of the Outreachy
> internship. Load was so big for them, noone explained this fact in the
> beginning, noone helped with this situation during the internship. I
> was thinking I could be overqualified and I took someone's place (I
> had 2 other SWE internships before Outreachy). The truth is that my
> skills were barely enough.

Yeah, but will it be better if Outreachy scares too many people away
from applying? I am not really sure.

Also I think success depends not so much on the student's/intern's
technical skills but on their willingness to ask questions and their
ability to get the help they need. Maybe Outreachy and Git should state
that more clearly.

For a long time I thought that it could be enough to just tell
students/interns that mentors are here to help, so they should not be
afraid of asking even the most basic questions. But over time I have
come to realize that mentors should actively try to understand what
help the student/intern needs.

> 2. Please tell more about minimal requirements: write it down on a
> landing page in the beginning and maybe repeat them in every task. I
> guess it would be the same this year: good knowledge of C, gdb, Git
> (as a user: intern needs to know how to work with forks, git remote,
> git rebase -i, etc), Shell, base understanding of Linux terminal,
> being ready to work remotely. It's good idea to mention that it's not
> 100% requirement, but anyway at least 60% from the list must be
> familiar.

On our project page (https://git.github.io/Outreachy-15/) we tried a
bit to do that; for example there are things like:

    Language: C
    Difficulty: medium to hard

for each project. And there were no "easy" tasks.

So yeah, instead of "Language: C" we could have something like:

    Requirements: very good knowledge and practice of C, shell, gdb,
    Git, Linux terminal, ...

and for difficulty we could remove "medium" and just select between
"hard", "very hard" and "impossible" :-)

But this could scare possible students/interns away, and that's not
really what we want. We think that if someone can successfully complete
a micro project, they should have enough basic technical skills to get
started, and then we can teach them what they need. Maybe we could
state that more clearly?
I also think that the Git project doesn't make enough effort to be newcomer friendly. Maybe we could start by adding a document somewhere that could contain basic useful information for newcomers? Perhaps this could be based on the presentation that Peff gave at the beginning of the Bloomberg Hackathon last November? > 3. If you decide to be a mentor - at first, thanks a lot. Please be > ready to spend A LOT OF time on it. You need to explain not only the > task to your intern, but also how to split the task into subtasks, how > to look for solutions, how to work with the terminal, how to debug > better and many other questions. It's not only about solving > internship task. It's about learning something new. And I did not > mention code reviews: there would be many stupid errors and it's a > talent not to be angry about that. I think mentors are ready to do that. Often the problem is that we just don't know how we could help or how we can make the students/interns confident enough to tell us how we could help or what is blocking them. > 4. I fully sure that you need to talk with your intern by the voice. I > mean regular calls, at least once a week. It's good idea to share the > desktop and show how you are working, what are you using, etc. > Ask your intern to share the desktop: you need to feel confident that > they understand how to work with the task. Help them with the > shortcuts. The
[PATCH 0/4] un-breaking pack-objects with bitmaps
On Fri, Aug 31, 2018 at 06:55:58PM -0400, Jeff King wrote:

> On Fri, Aug 31, 2018 at 05:23:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> > On Tue, Aug 21 2018, Jeff King wrote:
> >
> > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
> > > +                                     const unsigned char *sha1)
> > > +{
> > > +	int pos;
> > > +
> > > +	if (!bitmap_git)
> > > +		return 0; /* no bitmap loaded */
> > > +	if (!bitmap_git->result)
> > > +		BUG("failed to perform bitmap walk before querying");
> >
> > Some part of what calls this completely breaks pushing from the "next"
> > branch when you have local bitmaps (we *really* should have some tests
> > for this...).
>
> Yikes, thanks for reporting. I agree we need better tests here.

OK, here is the fix. Since the problem is in 'next', this is done as a
patch on top of jk/pack-delta-reuse-with-bitmap. But since we're set to
rewind 'next' post-release anyway, we could squash it directly into
30cdc33fba from the original series. That would keep later bisections
from running into it, which may be worth it as it's a pretty severe
breakage. Or maybe not:

  1. The test suite doesn't actually fail, because its toy repos are
     too small.

  2. It only triggers in the real world if you have bitmaps turned on,
     which are not the default.

So it may not be that likely in practice to bother a hypothetical
future bisecting developer.

> [1] Actually, there is also prepare_bitmap_git(), but it is not really
>     for general use by callers. It should be made static, or better yet,
>     I suspect it can be folded into its callers.

This actually turned out not to work. There's a caller over in
pack-bitmap-write.c, and it makes things worse to try to expand the
logic there. So it technically _is_ possible to have a bitmap_index
without a "have" field, but it also doesn't make sense to ask about
"uninteresting" objects there. You haven't done (and cannot do) a
traversal on such an object.

Which I think goes back to Stefan's original question: is this just a
crappy API? And the answer is "yes, to some degree". There are really
two uses of bitmaps:

  - you want to do a traverse_commit_list() walk, but faster

  - you want to selectively query the on-disk bitmaps (e.g., you are
    walking for --contains and want to ask "do we have a bitmap for
    this object?")

Those currently use the same struct bitmap_index, but with two
different constructors (prepare_bitmap_git and prepare_bitmap_walk).
It probably ought to be two different ones (with the "walk" variant
using the "query" variant under the hood). I've punted on that full
conversion for now, but did clean up a few confusing bits.

  [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check

        The actual fix. This should get merged to next ASAP (or the
        original topic just reverted).

  [2/4]: t5310: test delta reuse with bitmaps

        I did this separately to give us flexibility to squash or merge
        quickly. But it does find Ævar's bug on a git without patch 1.

  [3/4]: traverse_bitmap_commit_list(): don't free result

        The original assert should have simply been useless, but it was
        the surprising behavior of this function that turned it into a
        bug.

  [4/4]: pack-bitmap: drop "loaded" flag

        And this is just an annoyance I ran into, which is a fallout
        from our conversion to using an allocated bitmap_index struct.

 pack-bitmap.c           | 14 ++-
 pack-bitmap.h           |  2 +-
 t/t5310-pack-bitmaps.sh | 93 +
 3 files changed, 97 insertions(+), 12 deletions(-)

-Peff
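To make that "two constructors" idea a bit more concrete, here is a
sketch of how the split might look; apart from prepare_bitmap_walk()
and free_bitmap_index(), none of these names exist in pack-bitmap.h,
so treat them as hypothetical:

    struct rev_info;
    struct bitmap_index;

    /*
     * Open the on-disk .bitmap for point queries only (e.g. "do we
     * have a bitmap for this object?"); no traversal state is set up.
     */
    struct bitmap_index *prepare_bitmap_query(void);

    /*
     * Do a bitmap-assisted traversal for the given revs; internally
     * this would use the "query" variant to open the bitmap, then
     * fill in the "result" and "haves" bitmaps.
     */
    struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs);

    void free_bitmap_index(struct bitmap_index *);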
[PATCH 1/4] bitmap_has_sha1_in_uninteresting(): drop BUG check
Commit 30cdc33fba (pack-bitmap: save "have" bitmap from walk,
2018-08-21) introduced a new function for looking at the "have" side of
a bitmap walk. Because it only makes sense to do so after we've
finished the walk, we added an extra safety assertion, making sure that
bitmap_git->result is non-NULL.

However, this safety is misguided. It was trying to catch the case
where we had called prepare_bitmap_walk() to give us a "struct
bitmap_index", but had not yet called traverse_bitmap_commit_list() to
walk it. But all of the interesting computation (including setting up
the result and "have" bitmaps) happens in the first function! The
latter function only delivers the result to a callback function.

So the case we were worried about is impossible; if you get a non-NULL
result from prepare_bitmap_walk(), then its "have" field will be fully
formed. But much worse, traverse_bitmap_commit_list() actually frees
the result field as it finishes. Which means that this assertion is
worse than useless: it's almost guaranteed to trigger!

Our test suite didn't catch this because the function isn't actually
exercised at all. The only caller comes from 6a1e32d532 (pack-objects:
reuse on-disk deltas for thin "have" objects, 2018-08-21), and that's
triggered only when you fetch or push history that contains an object
with a base that is found deep in history. Our test suite fetches and
pushes either don't use bitmaps, or use too-small example repositories.
But any reasonably-sized real-world push or fetch (with bitmaps) would
trigger this.

This patch drops the harmful assertion and tweaks the docstring for the
function to make the precondition clear. The tests need to be improved
to exercise this new pack-objects feature, but we'll do that in a
separate commit.

Signed-off-by: Jeff King
---
 pack-bitmap.c | 2 --
 pack-bitmap.h | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index c3231ef9ef..76fd93a3de 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1128,8 +1128,6 @@ int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
 	if (!bitmap_git)
 		return 0; /* no bitmap loaded */
-	if (!bitmap_git->result)
-		BUG("failed to perform bitmap walk before querying");
 	if (!bitmap_git->haves)
 		return 0; /* walk had no "haves" */

diff --git a/pack-bitmap.h b/pack-bitmap.h
index c633bf5238..189dd68ad3 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -54,7 +54,7 @@ int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping
 void free_bitmap_index(struct bitmap_index *);
 /*
- * After a traversal has been performed on the bitmap_index, this can be
+ * After a traversal has been performed by prepare_bitmap_walk(), this can be
  * queried to see if a particular object was reachable from any of the
  * objects flagged as UNINTERESTING.
  */
-- 
2.19.0.rc1.549.g6ce4dc0f08
[PATCH 2/4] t5310: test delta reuse with bitmaps
Commit 6a1e32d532 (pack-objects: reuse on-disk deltas for thin "have" objects, 2018-08-21) taught pack-objects a new optimization trick. Since this wasn't meant to change user-visible behavior, but only produce smaller packs more quickly, testing focused on t/perf/p5311. However, since people don't run perf tests very often, we should make sure that the feature is exercised in the regular test suite. This patch does so. Signed-off-by: Jeff King --- Side note: If we do squash patch 1 into the eariler series, that will invalidate the commit id mentioned in the commit message here. t/t5310-pack-bitmaps.sh | 93 + 1 file changed, 93 insertions(+) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 557bd0d0c0..af38776054 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -342,4 +342,97 @@ test_expect_success 'truncated bitmap fails gracefully' ' test_i18ngrep corrupt stderr ' +# have_delta +# +# Note that because this relies on cat-file, it might find _any_ copy of an +# object in the repository. The caller is responsible for making sure +# there's only one (e.g., via "repack -ad", or having just fetched a copy). +have_delta () { + echo $2 >expect && + echo $1 | git cat-file --batch-check="%(deltabase)" >actual && + test_cmp expect actual +} + +# Create a state of history with these properties: +# +# - refs that allow a client to fetch some new history, while sharing some old +#history with the server; we use branches delta-reuse-old and +#delta-reuse-new here +# +# - the new history contains an object that is stored on the server as a delta +#against a base that is in the old history +# +# - the base object is not immediately reachable from the tip of the old +#history; finding it would involve digging down through history we know the +#other side has +# +# This should result in a state where fetching from old->new would not +# traditionally reuse the on-disk delta (because we'd have to dig to realize +# that the client has it), but we will do so if bitmaps can tell us cheaply +# that the other side has it. +test_expect_success 'set up thin delta-reuse parent' ' + # This first commit contains the buried base object. + test-tool genrandom delta 16384 >file && + git add file && + git commit -m "delta base" && + base=$(git rev-parse --verify HEAD:file) && + + # These intermediate commits bury the base back in history. + # This becomes the "old" state. + for i in 1 2 3 4 5 + do + echo $i >file && + git commit -am "intermediate $i" || return 1 + done && + git branch delta-reuse-old && + + # And now our new history has a delta against the buried base. Note + # that this must be smaller than the original file, since pack-objects + # prefers to create deltas from smaller objects to larger. + test-tool genrandom delta 16300 >file && + git commit -am "delta result" && + delta=$(git rev-parse --verify HEAD:file) && + git branch delta-reuse-new && + + # Repack with bitmaps and double check that we have the expected delta + # relationship. + git repack -adb && + have_delta $delta $base +' + +# Now we can sanity-check the non-bitmap behavior (that the server is not able +# to reuse the delta). This isn't strictly something we care about, so this +# test could be scrapped in the future. But it makes sure that the next test is +# actually triggering the feature we want. +# +# Note that our tools for working with on-the-wire "thin" packs are limited. So +# we actually perform the fetch, retain the resulting pack, and inspect the +# result. 
+test_expect_success 'fetch without bitmaps ignores delta against old base' ' + test_config pack.usebitmaps false && + test_when_finished "rm -rf client.git" && + git init --bare client.git && + ( + cd client.git && + git config transfer.unpackLimit 1 && + git fetch .. delta-reuse-old:delta-reuse-old && + git fetch .. delta-reuse-new:delta-reuse-new && + have_delta $delta $ZERO_OID + ) +' + +# And do the same for the bitmap case, where we do expect to find the delta. +test_expect_success 'fetch with bitmaps can reuse old base' ' + test_config pack.usebitmaps true && + test_when_finished "rm -rf client.git" && + git init --bare client.git && + ( + cd client.git && + git config transfer.unpackLimit 1 && + git fetch .. delta-reuse-old:delta-reuse-old && + git fetch .. delta-reuse-new:delta-reuse-new && + have_delta $delta $base + ) +' + test_done -- 2.19.0.rc1.549.g6ce4dc0f08
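One note for readers of the tests above: cat-file's "%(deltabase)"
format expands to the all-zeros object name when an object is not
stored as a delta on disk, which is why the non-bitmap case expects
$ZERO_OID. For example:

    # prints the null OID if HEAD:file is not stored as a delta on disk
    echo HEAD:file | git cat-file --batch-check='%(deltabase)'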
[PATCH 3/4] traverse_bitmap_commit_list(): don't free result
Since it was introduced in fff42755ef (pack-bitmap: add support for
bitmap indexes, 2013-12-21), this function has freed the result after
traversing it. That is an artifact of the early days of the bitmap
code, when we had a single static "struct bitmap_index". Back then, it
was intended that you would do:

    prepare_bitmap_walk(&revs);
    traverse_bitmap_commit_list(&revs);

Since the actual bitmap_index struct was totally behind the scenes, it
was convenient for traverse_bitmap_commit_list() to clean it up,
clearing the way for another traversal.

But since 3ae5fa0768 (pack-bitmap: remove bitmap_git global variable,
2018-06-07), the caller explicitly manages the bitmap_index struct
itself, like this:

    b = prepare_bitmap_walk(&revs);
    traverse_bitmap_commit_list(b, &revs);
    free_bitmap_index(b);

It no longer makes sense to auto-free the result after the traversal.
If you want to do another traversal, you'd just create a new
bitmap_index. And while nobody tries to call
traverse_bitmap_commit_list() twice, the fact that it throws away the
result might be surprising, and is better avoided.

Note that in the "old" way it was possible for two walks to amortize
the cost of opening the on-disk .bitmap file (since it was stored in
the global bitmap_index), but we lost that in 3ae5fa0768. However, no
code actually does this, so it's not worth addressing now. The solution
might involve a new:

    reset_bitmap_walk(b, &revs);

call. Or we might even attach the bitmap data to its matching
packed_git struct, so that multiple prepare_bitmap_walk() calls could
use it. That can wait until somebody actually has need of the
optimization (and until then, we'll do the correct, unsurprising
thing).

Signed-off-by: Jeff King
---
 pack-bitmap.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 76fd93a3de..8c1af1cca2 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -848,9 +848,6 @@ void traverse_bitmap_commit_list(struct bitmap_index *bitmap_git,
 			    OBJ_TAG, show_reachable);
 	show_extended_objects(bitmap_git, show_reachable);
-
-	bitmap_free(bitmap_git->result);
-	bitmap_git->result = NULL;
 }
 static uint32_t count_object_type(struct bitmap_index *bitmap_git,
-- 
2.19.0.rc1.549.g6ce4dc0f08
[PATCH 4/4] pack-bitmap: drop "loaded" flag
In the early days of the bitmap code, there was a single static
bitmap_index struct that was used behind the scenes, and any
bitmap-related functions could lazily check bitmap_git.loaded to see if
they needed to read the on-disk data.

But since 3ae5fa0768 (pack-bitmap: remove bitmap_git global variable,
2018-06-07), the caller is responsible for the lifetime of the
bitmap_index struct, and we return it from prepare_bitmap_git() and
prepare_bitmap_walk(), both of which load the on-disk data (or return
NULL). So outside of these functions, it's not possible to have a
bitmap_index for which the loaded flag is not true. Nor is it possible
to accidentally pass an already-loaded bitmap_index to the loading
function (which is static-local to the file).

We can drop this unnecessary and confusing flag.

Signed-off-by: Jeff King
---
 pack-bitmap.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8c1af1cca2..40debd5e20 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -91,8 +91,6 @@ struct bitmap_index {
 	/* Version of the bitmap index */
 	unsigned int version;
-
-	unsigned loaded : 1;
 };
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
@@ -306,7 +304,7 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 static int load_pack_bitmap(struct bitmap_index *bitmap_git)
 {
-	assert(bitmap_git->map && !bitmap_git->loaded);
+	assert(bitmap_git->map);
 	bitmap_git->bitmaps = kh_init_sha1();
 	bitmap_git->ext_index.positions = kh_init_sha1_pos();
@@ -321,7 +319,6 @@ static int load_pack_bitmap(struct bitmap_index *bitmap_git)
 	if (load_bitmap_entries_v1(bitmap_git) < 0)
 		goto failed;
-	bitmap_git->loaded = 1;
 	return 0;
 failed:
@@ -336,7 +333,7 @@ static int open_pack_bitmap(struct bitmap_index *bitmap_git)
 	struct packed_git *p;
 	int ret = -1;
-	assert(!bitmap_git->map && !bitmap_git->loaded);
+	assert(!bitmap_git->map);
 	for (p = get_packed_git(the_repository); p; p = p->next) {
 		if (open_pack_bitmap_1(bitmap_git, p) == 0)
@@ -738,7 +735,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 	 * from disk. this is the point of no return; after this the rev_list
 	 * becomes invalidated and we must perform the revwalk through bitmaps
 	 */
-	if (!bitmap_git->loaded && load_pack_bitmap(bitmap_git) < 0)
+	if (load_pack_bitmap(bitmap_git) < 0)
 		goto cleanup;
 	object_array_clear(&revs->pending);
-- 
2.19.0.rc1.549.g6ce4dc0f08
Re: [PATCH 2/4] t5310: test delta reuse with bitmaps
On Sat, Sep 01, 2018 at 03:48:13AM -0400, Jeff King wrote:

> Commit 6a1e32d532 (pack-objects: reuse on-disk deltas for
> thin "have" objects, 2018-08-21) taught pack-objects a new
> optimization trick. Since this wasn't meant to change
> user-visible behavior, but only produce smaller packs more
> quickly, testing focused on t/perf/p5311.
>
> However, since people don't run perf tests very often, we
> should make sure that the feature is exercised in the
> regular test suite. This patch does so.

This, by the way, is the crux of how such an obvious and severe bug
made it to 'next'.

The original series was tested quite extensively via t/perf and in
production at GitHub. When I re-rolled v2, the only change was the
addition of the assertion, so I didn't bother re-doing the perf tests,
since they're slow and there wouldn't be a measurable impact.

I did run the normal test suite (as I'm sure Junio did, too) as a
double-check for correctness, but as we noticed, the code wasn't
actually exercised there.

Nor had I yet backported the revised series to the version we run at
GitHub, so it hadn't been run there, either.

And all of that coupled with the fact that it only triggers with
bitmaps, so day-to-day use of the buggy Git (like Junio trying to push
out the result ;) ) wouldn't show it.

Anyway. Not that exciting, and kind of obviously dumb in retrospect.
But I think it was worth analyzing to see what went wrong. If there's
an immediate lesson, it is probably: add tests even for changes that
aren't really user-visible to make sure the code is exercised.

There may be a larger lesson about tracking code coverage, but I don't
know that most general code coverage tools would have helped (any
overall percentage number would be too large to move). A tool that
looked at the diff and said "of the N lines you added/touched, this
percent is exercised in the test suite" might have been useful.

-Peff
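For what it's worth, a crude version of that could probably be cobbled
together from "git diff -U0" plus gcov output; here is an untested
sketch (the branch name, gcov file naming and paths are assumptions,
and it only looks at top-level *.c files):

    # Assumes you have already produced gcov output for the tree
    # (git.git's Makefile has coverage targets that leave *.c.gcov
    # files around). It will flag comments/blank lines in the diff as
    # "uncovered" too, so treat the output as a signal, not a metric.
    base=origin/master

    # file:line pairs added or modified since $base
    git diff -U0 "$base" -- '*.c' |
    awk '
        /^\+\+\+ b\// { f = substr($2, 3) }       # $2 is "b/<path>"
        /^@@/ {
            split($3, a, ",")                     # $3 is "+<start>[,<len>]"
            start = substr(a[1], 2) + 0
            len = (a[2] == "") ? 1 : a[2] + 0
            for (i = 0; i < len; i++)
                print f ":" (start + i)
        }
    ' | sort >touched

    # file:line pairs that the test suite actually executed, per gcov
    for g in *.c.gcov
    do
        awk -F: -v src="${g%.gcov}" '
            # a positive count in field 1 means the line was executed;
            # "#####" means never executed, "-" means not executable
            $1 ~ /[0-9]/ && $2 + 0 > 0 { print src ":" ($2 + 0) }
        ' "$g"
    done | sort >covered

    # touched lines that were never executed
    comm -23 touched covered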
Re: Git in Outreachy Dec-Mar?
On Fri, Aug 31, 2018 at 01:30:05PM +0300, Оля Тележная wrote: > I was Outreachy intern last winter. I guess I need to speak up: I will > be happy if my feedback helps you. > At first, I want to repeat all thanks to Outreachy organizers and Git > mentors. That was unique experience and I am so proud of being a part > of this project. But, I need to say that my internship wasn't ideal. > Mentors, please do not feel guilty: I just want to improve the quality > of future internships and give some advises. As one of those mentors, let me first say: thank you for this feedback. It's very valuable to get your honest perspective. I more or less agree with everything you said. A few specific comments: > 1. The main problem of Outreachy internship is positioning. I mean, I > had strong confidence that it's an internship for newbies in > programming. All my friends had the same confidence, and that's the > reason why 2 my friends failed in the middle of the Outreachy > internship. Load was so big for them, noone explained this fact in the > beginning, noone helped with this situation during the internship. I > was thinking I could be overqualified and I took someone's place (I > had 2 other SWE internships before Outreachy). The truth is that my > skills were barely enough. Some of this may be due to Outreachy (I'm not very familiar with the materials they use to get applicants). But we propose the individual projects, and I don't think there's anything stopping us from something that might be smaller in scope (i.e., to focus more on "soft" skills like participating in the project, and less time on system design or tricky coding). I think ideally we'd have various project options with a range of difficulties, and part of the application period could involve steering candidates to the right project. > 2. Please tell more about minimal requirements: write it down on a > landing page in the beginning and maybe repeat them in every task. I > guess it would be the same this year: good knowledge of C, gdb, Git > (as a user: intern needs to know how to work with forks, git remote, > git rebase -i, etc), Shell, base understanding of Linux terminal, > being ready to work remotely. It's good idea to mention that it's not > 100% requirement, but anyway at least 60% from the list must be > familiar. Yes, I agree that we don't really communicate the expected skills very well. That's something we should be able to fix pretty immediately for the next round. > 3. If you decide to be a mentor - at first, thanks a lot. Please be > ready to spend A LOT OF time on it. You need to explain not only the > task to your intern, but also how to split the task into subtasks, how > to look for solutions, how to work with the terminal, how to debug > better and many other questions. It's not only about solving > internship task. It's about learning something new. And I did not > mention code reviews: there would be many stupid errors and it's a > talent not to be angry about that. I'd agree with this. I think one of the biggest mistakes I made for your internship was not being focused and spending enough time. Johannes mentioned that he actually does online pair-programming with his GSoC students, and I think that would have helped a lot in our case. Ironically, I was actually worried about being _too_ involved (which is obviously dumb in retrospect). Since there were some interesting design problems, I didn't want to just dictate "here's what your design should look like, go code it and get back to me". 
I wanted to give you the space to explore the problem, maybe even make some mistakes, and be there to "unstick" you when you got stuck. But with basically weekly check-ins, 3 months goes by _really_ fast. I think we probably needed to be talking things through and working in real-time at least an hour a day. > 4. I fully sure that you need to talk with your intern by the voice. I > mean regular calls, at least once a week. It's good idea to share the > desktop and show how you are working, what are you using, etc. > Ask your intern to share the desktop: you need to feel confident that > they understand how to work with the task. Help them with the > shortcuts. Yeah. I think it would have helped a lot to have a real-time session where we're actually working on the problem collaboratively, and not just discussing problems you might have run into. That gives the opportunity to reveal workflow issues: the intern can see how the mentor does things (and ask "how/why did you do that neat thing?") and the mentor can see how the intern does things ("I see you're doing it this way; did you know you can also do it this way, which is easier?"). > 5. In the ideal world, I want to force all mentors to get special > courses (it will solve problems 2-3-4). Great developer is not equal > to great mentor. And, if you work with really newbie, it becomes so > necessary. I definitely agree with the "not equal" thing. It might even be inversely propor
Re: Git in Outreachy Dec-Mar?
On Fri, Aug 31, 2018 at 10:16:49AM +0200, Christian Couder wrote: > > 2. To get our landing page and list of projects in order (and also > > micro-projects for applicants). This can probably build on the > > previous round at: > > > >https://git.github.io/Outreachy-15/ > > > > and on the project/microprojects lists for GSoC (which will need > > some updating and culling). > > Ok to take a look at that. Thanks. I think sooner is better for this (for you or anybody else who's interested in mentoring). The application period opens on September 10th, but I think the (still growing) list of projects is already being looked at by potential candidates. > > 3. To figure out funding (unlike GSoC, the intern stipend comes from > > the projects). I can look into getting outside funds (which is what > > we did last year). Worst case, we do have enough project money to > > cover an intern. Last year[1] opinions were that this was a > > reasonable use of project money, but of course new opinions are > > welcome. > > I can also look at getting outside funds. > > My opinion though is that it is probably better if the Git project can > use its own fund for this, as it makes it easier for possible mentors > if they don't need to look at getting outside funds. I disagree. An internship costs more than we generally take in over the course of a year. So we would eventually run out of money doing this. I also think it doesn't need to be the mentor's responsibility to find the funding. That can be up to an "org admin", and I don't think it should be too big a deal (I had no trouble getting funding from GitHub last year, and I don't expect any this year; I just didn't want to start that process until I knew we were serious about participating). So if you (or anybody else) wants to mentor, please focus on the project list and application materials. -Peff
Re: [PATCH v2 1/2] rerere: mention caveat about unmatched conflict markers
On 08/29, Junio C Hamano wrote:
> Thomas Gummerer writes:
>
> > Yeah that makes sense. Maybe something like this?
> >
> > (replying to here to keep
> > the patches in one thread)
> >
> >  Documentation/technical/rerere.txt | 4
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/Documentation/technical/rerere.txt
> > b/Documentation/technical/rerere.txt
> > index e65ba9b0c6..8fefe51b00 100644
> > --- a/Documentation/technical/rerere.txt
> > +++ b/Documentation/technical/rerere.txt
> > @@ -149,7 +149,10 @@ version, and the sorting the conflict hunks, both for
> > the outer and the
> > inner conflict. This is done recursively, so any number of nested
> > conflicts can be handled.
> >
> > +Note that this only works for conflict markers that "cleanly nest". If
> > +there are any unmatched conflict markers, rerere will fail to handle
> > +the conflict and record a conflict resolution.
> > +
> > The only difference is in how the conflict ID is calculated. For the
> > inner conflict, the conflict markers themselves are not stripped out
> > before calculating the sha1.
>
> Looks good to me except for the line count on the @@ line. The
> preimage ought to have 6 (not 7) lines and adding 4 new lines makes
> it a 10 line postimage. I wonder who miscounted the hunk---it is
> immediately followed by the signature cut mark "-- \n" and some
> tools (including Emacs's patch editing mode) are known to
> misinterpret it as a preimage line that was removed.

Sorry about that. Yeah, Emacs's patch editing mode doing that would
explain it. I did a round of proof-reading in my editor and spotted a
typo. Since it was trivial to fix, I just edited the patch directly,
and Emacs changed the line count. Sorry about that, I'll be more
careful about this in the future.

> What is curious is that your 2/2 counts the preimage lines
> correctly.

I only added some text after the '---' line in 2/2, but did not edit
the patch directly. Emacs's patch editing mode only seems to change the
line numbers of the patch that's being edited, not if anything
surrounding that is changed, so the line count stayed the same as what
format-patch put in the file in the first place.

> In any case, both patches look good. Will apply.

Thanks!

> Thanks.
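Spelled out: with a 6-line preimage and 4 added lines (a 10-line
postimage), the hunk header in the quoted patch should read

    @@ -149,6 +149,10 @@ version, and the sorting the conflict hunks, both for

rather than the "@@ -149,7 +149,10 @@" that was actually sent.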
Re: [PATCH 2/3] Add test for commit --dry-run --short.
On Thu, Aug 30, 2018 at 10:39:20PM -0700, Stephen P. Smith wrote:

> diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
> index 810d4cea7..fc69da816 100755
> --- a/t/t7501-commit.sh
> +++ b/t/t7501-commit.sh
> @@ -682,4 +682,14 @@ test_expect_success '--dry-run with conflicts fixed from
> a merge' '
> 	git commit -m "conflicts fixed from merge."
> '
>
> +test_expect_success '--dry-run --short with conflicts fixed from a merge' '
> +	# setup two branches with conflicting information
> +	# in the same file, resolve the conflict,
> +	# call commit with --dry-run --short

I think the last line of the comment is unnecessary: it doesn't say
anything that is not obvious from the test's last line.

> +	rm -f test-file &&
> +	touch testfile &&

That filename should be 'test-file', i.e. with a dash, shouldn't it?

Anyway, if you want to truncate the file, then please use '>test-file'
instead of 'rm' and 'touch'.

> +	git add test-file &&
> +	git commit --dry-run --short
> +'
> +
> 	test_done
> --
> 2.18.0
>
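For illustration, with both suggestions applied the setup lines would
collapse to something like this (the exact shape is of course up to
the patch author):

    +	>test-file &&
    +	git add test-file &&
    +	git commit --dry-run --short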
Re: non-smooth progress indication for git fsck and git gc
On Thu, Aug 16 2018, Ævar Arnfjörð Bjarmason wrote: > On Thu, Aug 16 2018, Jeff King wrote: > >> On Thu, Aug 16, 2018 at 08:54:25AM +0200, Ulrich Windl wrote: >> >>> I'd like to point out some minor issue observed while processing some >>> 5-object repository with many binary objects, but most are rather >>> small: >>> >>> Between the two phases of "git fsck" (checking directories and >>> checking objects) there was a break of several seconds where no >>> progress was indicated. >>> >>> During "git gc" the writing objects phase did not update for some >>> seconds, but then the percentage counter jumped like from 15% to 42%. >>> >>> I understand that updating the progress output too often can be a >>> performance bottleneck, while upating it too rarely might only bore >>> the user... ;-) >> >> We update the counter integer for every object we process, and then >> actually update the display whenever the percentage increases or a >> second has elapsed, whichever comes first. >> >> What you're seeing is likely an artifact of _what_ we're counting: >> written objects. Not all objects are the same size, so it's not uncommon >> to see thousands/sec when dealing with small ones, and then several >> seconds for one giant blob. >> >> The only way to solve that is to count bytes. We don't have a total byte >> count in most cases, and it wouldn't always make sense (e.g., the >> "Compressing objects" meter can show the same issue, but it's not really >> putting through bytes in a linear way). In some cases we do show >> transmitted size and throughput, but that's just for network operations. >> We could do the same for "gc" with the patch below. But usually >> throughput isn't all that interesting for a filesystem write, because >> bandwidth isn't the bottleneck. >> >> Possibly we could have a "half throughput" mode that counts up the total >> size written, but omits the speed indicator. That's not an unreasonable >> thing to show for a local pack, since you end up with the final pack >> size. The object counter would still be jumpy, but you'd at least have >> one number updated at least once per second as you put through a large >> blob. >> >> If you really want a smooth percentage, then we'd have to start counting >> bytes instead of objects. Two reasons we don't do that are: >> >> - we often don't know the total byte size exactly. E.g., for a >> packfile write, it depends on the result of deflating each object. >> You can make an approximation and just pretend at the end that you >> hit 100%. Or you can count the pre-deflate sizes, but then your >> meter doesn't match the bytes from the throughput counter. >> >> - for something like fsck, we're not actually writing out bytes. So I >> guess you'd be measuring "here's how many bytes of objects I >> fsck-ed". But is that on-disk compressed bytes, or decompressed >> bytes? >> >> If the former, that's only marginally better as a measure of effort, >> since delta compression means that a small number of on-disk bytes >> may require a big effort (imagine processing a 100 byte blob versus >> a 100 byte delta off of a 100MB blob). >> >> The latter is probably more accurate. But it's also not free to >> pre-generate the total. We can get the number of objects or the size >> of the packfile in constant-time, but totaling up the uncompressed >> size of all objects is O(n). So that's extra computation, but it >> also means a potential lag before we can start the progress meter. >> >> I'm also not sure how meaningful a byte count is for a user there. 
>> >> So there. That's probably more than you wanted to know about Git's >> progress code. I think it probably _could_ be improved by counting >> more/different things, but I also think it can be a bit of a rabbit >> hole. Which is why AFAIK nobody's really looked too seriously into >> changing it. >> >> -Peff > > This is all interesting, but I think unrelated to what Ulrich is talking > about. Quote: > > Between the two phases of "git fsck" (checking directories and > checking objects) there was a break of several seconds where no > progress was indicated > > I.e. it's not about the pause you get with your testcase (which is > certainly another issue) but the break between the two progress bars. > > Here's a test case you can clone: > https://github.com/avar/2015-04-03-1M-git (or might already have > "locally" :) > > If you fsck this repository it'll take around (on my spinning rust > server) 30 seconds between 100% of "Checking object directories" before > you get any output from "Checking objects". > > The breakdown of that is (this is from approximate eyeballing): > > * We spend 1-3 seconds just on this: > > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181 > > * We spend the majority of the ~30s on this: > > https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-
Re: non-smooth progress indication for git fsck and git gc
On Sat, Sep 01 2018, Ævar Arnfjörð Bjarmason wrote: > On Thu, Aug 16 2018, Ævar Arnfjörð Bjarmason wrote: > >> On Thu, Aug 16 2018, Jeff King wrote: >> >>> On Thu, Aug 16, 2018 at 08:54:25AM +0200, Ulrich Windl wrote: >>> I'd like to point out some minor issue observed while processing some 5-object repository with many binary objects, but most are rather small: Between the two phases of "git fsck" (checking directories and checking objects) there was a break of several seconds where no progress was indicated. During "git gc" the writing objects phase did not update for some seconds, but then the percentage counter jumped like from 15% to 42%. I understand that updating the progress output too often can be a performance bottleneck, while upating it too rarely might only bore the user... ;-) >>> >>> We update the counter integer for every object we process, and then >>> actually update the display whenever the percentage increases or a >>> second has elapsed, whichever comes first. >>> >>> What you're seeing is likely an artifact of _what_ we're counting: >>> written objects. Not all objects are the same size, so it's not uncommon >>> to see thousands/sec when dealing with small ones, and then several >>> seconds for one giant blob. >>> >>> The only way to solve that is to count bytes. We don't have a total byte >>> count in most cases, and it wouldn't always make sense (e.g., the >>> "Compressing objects" meter can show the same issue, but it's not really >>> putting through bytes in a linear way). In some cases we do show >>> transmitted size and throughput, but that's just for network operations. >>> We could do the same for "gc" with the patch below. But usually >>> throughput isn't all that interesting for a filesystem write, because >>> bandwidth isn't the bottleneck. >>> >>> Possibly we could have a "half throughput" mode that counts up the total >>> size written, but omits the speed indicator. That's not an unreasonable >>> thing to show for a local pack, since you end up with the final pack >>> size. The object counter would still be jumpy, but you'd at least have >>> one number updated at least once per second as you put through a large >>> blob. >>> >>> If you really want a smooth percentage, then we'd have to start counting >>> bytes instead of objects. Two reasons we don't do that are: >>> >>> - we often don't know the total byte size exactly. E.g., for a >>> packfile write, it depends on the result of deflating each object. >>> You can make an approximation and just pretend at the end that you >>> hit 100%. Or you can count the pre-deflate sizes, but then your >>> meter doesn't match the bytes from the throughput counter. >>> >>> - for something like fsck, we're not actually writing out bytes. So I >>> guess you'd be measuring "here's how many bytes of objects I >>> fsck-ed". But is that on-disk compressed bytes, or decompressed >>> bytes? >>> >>> If the former, that's only marginally better as a measure of effort, >>> since delta compression means that a small number of on-disk bytes >>> may require a big effort (imagine processing a 100 byte blob versus >>> a 100 byte delta off of a 100MB blob). >>> >>> The latter is probably more accurate. But it's also not free to >>> pre-generate the total. We can get the number of objects or the size >>> of the packfile in constant-time, but totaling up the uncompressed >>> size of all objects is O(n). So that's extra computation, but it >>> also means a potential lag before we can start the progress meter. 
>>> >>> I'm also not sure how meaningful a byte count is for a user there. >>> >>> So there. That's probably more than you wanted to know about Git's >>> progress code. I think it probably _could_ be improved by counting >>> more/different things, but I also think it can be a bit of a rabbit >>> hole. Which is why AFAIK nobody's really looked too seriously into >>> changing it. >>> >>> -Peff >> >> This is all interesting, but I think unrelated to what Ulrich is talking >> about. Quote: >> >> Between the two phases of "git fsck" (checking directories and >> checking objects) there was a break of several seconds where no >> progress was indicated >> >> I.e. it's not about the pause you get with your testcase (which is >> certainly another issue) but the break between the two progress bars. >> >> Here's a test case you can clone: >> https://github.com/avar/2015-04-03-1M-git (or might already have >> "locally" :) >> >> If you fsck this repository it'll take around (on my spinning rust >> server) 30 seconds between 100% of "Checking object directories" before >> you get any output from "Checking objects". >> >> The breakdown of that is (this is from approximate eyeballing): >> >> * We spend 1-3 seconds just on this: >> >> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21
Re: Possible bug: identical lines added/removed in git diff
On Fri, Aug 31, 2018 at 2:39 PM Johannes Schindelin wrote:
>
> Hi,
>
> On Thu, 30 Aug 2018, Stefan Beller wrote:
>
> > On Thu, Aug 30, 2018 at 12:20 PM Jeff King wrote:
> > >
> > > [...] Myers does not promise to find the absolute minimal diff. [...]
> >
> > The `Myers` (our default) diff algorithm is really the Myers algorithm +
> > a heuristic that cuts off the long tail when it is very costly to compute
> > the minimal diff.
>
> IIRC Myers (the original, which we spell `minimal`)
> promises minimal diffs only for -U0. As soon as you add
> context, Myers might in fact end up with a suboptimal diff, even without
> that heuristic.

Ah, the last 4 words made it clear.

I debated the cost function for diffs some time ago and came to the
conclusion that having one line added/removed cost a flat price of 1
in the search for the "lowest cost diff" is pretty good: it does an OK
job in the broad world and does not "overfit" assumptions onto a
particular problem.

For example, in some patches I pay more attention to the lines removed
than to the lines added, or vice versa, so assigning different costs to
added/removed lines would make sense. (It depends on the type of patch,
but that cannot be deduced at the low level of the diff machinery.)

Starting a new hunk (i.e. adding a cost to -U<n> for n > 0) could be
costly. In fact we have an after-Myers optimization in the xdiff code
that checks if hunks can be coalesced together, but if we could have
that cost at diff computation time, this might make for better diffs
already.

Another example is the number of changes between added/removed/context
parts. If I have to review only one large added part and one large
removed part, it may be more understandable than having interleaved
adds/removes. (I think --patience goes in that direction, but optimizes
for a different metric.)

Stefan
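For anyone who wants to experiment along these lines, the algorithm
(though not the cost function) is already selectable per invocation or
via config:

    git diff --diff-algorithm=myers      # the default greedy variant
    git diff --diff-algorithm=minimal    # spend extra time on a minimal diff
    git diff --diff-algorithm=patience
    git diff --diff-algorithm=histogram
    git config diff.algorithm histogram  # make one of them the default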
Re: [PATCH v3 05/11] t0027: make hash size independent
On Fri, Aug 31, 2018 at 02:40:12PM -0400, Eric Sunshine wrote:
> On Fri, Aug 31, 2018 at 2:21 PM Torsten Bögershausen wrote:
> > Out of interest: why do we use a "tmp" file here?
> > Would it make more sense to chain the 'tr' with 'sed' and skip the
> > tmp file ?
> >
> > 	tr '\015\000abcdef0123456789' QN0 <"$2" |
> > 	sed -e "s/*/$ZERO_OID/" >"$exp" &&
> >
> > Yes, we will loose the exit status of 'tr', I think.
> > How important is the exit status ?
>
> As far as I understand, it is only Git commands for which we worry
> about losing the exit status upstream in pipes. System utilities, on
> the other hand, are presumed to be bug-free, thus we don't mind having
> them upstream.

If that's the case, that's fine, and I can make that change. I know
that we do often want to preserve the exit status of a system command,
but presumably the tr and sed here would exit 0, so I'm happy to assume
that for the test.

> A different question is why does this need to run both 'tr' and 'sed'
> when 'sed' itself could do the entire job, since 'sed' has 'tr'
> functionality built in (see sed's "y" command)?

It doesn't. I went for a minimal change, but I could switch to using
both s/// and y/// in sed instead.

-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
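As a contrived illustration of mixing the two in a single invocation
(this is not the actual t0027 change; the NUL and CR bytes in the real
case are the awkward part, since a NUL byte cannot easily be embedded
in a sed script):

    # y/// transliterates like tr, s/// substitutes, all in one sed run
    sed -e 'y/abcdef/ABCDEF/' -e 's/[0-9A-F]\{40\}/MASKED/' <input >output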
Re: [PATCH 2/4] t5310: test delta reuse with bitmaps
On Sat, Sep 01 2018, Jeff King wrote: > On Sat, Sep 01, 2018 at 03:48:13AM -0400, Jeff King wrote: > >> Commit 6a1e32d532 (pack-objects: reuse on-disk deltas for >> thin "have" objects, 2018-08-21) taught pack-objects a new >> optimization trick. Since this wasn't meant to change >> user-visible behavior, but only produce smaller packs more >> quickly, testing focused on t/perf/p5311. >> >> However, since people don't run perf tests very often, we >> should make sure that the feature is exercised in the >> regular test suite. This patch does so. > > This, by the way, is the crux of how such an obvious and severe bug made > it to 'next'. > > The original series was tested quite extensively via t/perf and in > production at GitHub. When I re-rolled v2, the only change was the > addition of the assertion, so I didn't bother re-doing the perf tests, > since they're slow and there wouldn't be a measurable impact. > > I did run the normal test suite (as I'm sure Junio did, too) as a > double-check for correctness, but as we noticed, the code wasn't > actually exercised there. > > Nor had I yet backported the revised series to the version we run at > GitHub, so it hadn't been run there, either. > > And all of that coupled with the fact that it only triggers with > bitmaps, so day-to-day use of the buggy Git (like Junio trying to push > out the result ;) ) wouldn't show it. > > Anyway. Not that exciting, and kind of obviously dumb in retrospect. But > I think it was worth analyzing to see what went wrong. If there's an > immediate lesson, it is probably: add tests even for changes that aren't > really user-visible to make sure the code is exercised. Test-wise, isn't the problem rather that that we didn't have something like what's described in t/README as "Running tests with special setups" for bitmaps? I.e. stuff like GIT_TEST_SPLIT_INDEX=, or running it with GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all to stress the fsmonitor code. That comment b.t.w. is not meant as a "you should have done that!" blame, but just musings on how we could make things better. Git has things like bitmaps, midx, commit graph, and probably a few other things I'm forgetting which all have their own tests, but really fall more in the category of something like the split index in that they can potentially impact every test in some unexpected way. So we could add some option to the test suite to e.g. run a custom command before every "git push" or "git fetch", and then just do a gc with a repack/commit graph write/midx write etc. in that codepath, along with (in the case of stuff like midx) setting any neede config knobs to turn it on. Of course the utility of that sort of thing is limited unless we have some dedicated smoke testers or CI capacity to run the various combinations of those options. But FWIW when I build our own in-house git I build the package with: # Set "false" to test the build procedure itself if true then export BKNG_GIT_HARNESS_OPTIONS="%{?_smp_mflags} --state=failed,slow,save --timer" echo Testing without any custom options: (cd t && /usr/bin/prove $BKNG_GIT_HARNESS_OPTIONS t[0-9]*.sh) echo Testing while roundtripping everything through the fsmonitor codepath: (cd t && GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all GIT_SKIP_TESTS="t3404.7 t7411.3 t7411.4" /usr/bin/prove $BKNG_GIT_HARNESS_OPTIONS t[0-9]*.sh) echo Testing split index (cd t && GIT_TEST_SPLIT_INDEX=true GIT_SKIP_TESTS="t3903 t4015.77" /usr/bin/prove $BKNG_GIT_HARNESS_OPTIONS t[0-9]*.sh) echo Testing uncommon pack modes. 
See ci/run-tests.sh in git (cd t && GIT_TEST_FULL_IN_PACK_ARRAY=true GIT_TEST_OE_SIZE=10 /usr/bin/prove $BKNG_GIT_HARNESS_OPTIONS t[0-9]*.sh) fi Those skipped tests are various intermittent bugs related to those codpaths which I haven't had time to track down / report yet. So if there was a "test bitmaps everywhere" mode that would have been caught during the build, unless I've misunderstood how this particular bug manifests, but then again, it happened on just a plain git.git after repack, so wasn't any bitmap + push pretty much all that was needed?, I haven't read your patches in any detail. B.t.w. for Ben or anyone else who knows about the fsmonitor part of this: I've long been running the whole test suite with `GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all prove ...` (also along with GIT_TEST_SPLIT_INDEX=) after all the main tests pass as additional stress testing. It's not documented under the "special setups" section. So I was going to add it, but I see that in 5c8cdcfd80 ("fsmonitor: add test cases for fsmonitor extension", 2017-09-22) it's documented that you should also set GIT_FORCE_PRELOAD_TEST=true, is that needed for GIT_FSMONITOR_TEST? Or is it yet another mode, and if so to be combined with fsmonitor in particular, or stand-alone? > There may be a larger lesson about tracking code coverage, but I don't > know that most g
Re: [PATCH] Makefile: enable DEVELOPER by default
On Mon, 2018-08-06 at 13:02 -0400, Randall S. Becker wrote:
>
> Any idea when this is going to be in an official release, and exactly
> what the settings will be for "Not Developer". I assume DEVELOPER=0
> and DEVOPTS=error, which is the current behaviour, correct? I am the
> platform maintainer for HPE NonStop and need to make sure I'm not
> packaging DEV builds to anyone, since I'm the only one doing this for
> the platform. It's another hoop, but hopefully not a bad one. The
> question is the best place to set this, assuming we are using Jenkins
> for our builds, and I'd rather keep the existing config.mak.uname the
> same, since at least it seems stable.

Just an FYI, in case you aren't aware: you could create a "config.mak"
to store your custom configurations. You can be sure it is used because
of the following part of the Makefile:

    ...
    include config.mak.uname
    -include config.mak.autogen
    -include config.mak
    ...

It's just not a hard dependency.

Hope that helps,
Sivaraam
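For example, a config.mak could look something like this (whether an
empty DEVELOPER or DEVELOPER=0 is the right way to force a
non-developer build depends on how the final patch tests the variable,
so double-check against the Makefile you actually ship):

    # config.mak -- local, untracked build settings for this machine
    prefix = /opt/git
    CFLAGS = -O2
    # make sure release packages are not built with the developer flags
    DEVELOPER =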
[BUG] index corruption with git commit -p
Hi,

I've just had a scary error:

    error: index uses $?+? extension, which we do not understand
    fatal: index file corrupt

Things were quickly recovered by deleting the index, but it clearly
looks like a bug to me.

Here are the steps to reproduce it:

    $ git clone git://github.com/lucvoo/sparse-dev.git
    $ cd
    $ git co index-corruption
    $ git rm -r validation/ Documentation/
    $ git commit -m -p
    $ git status
    error: index uses $?+? extension, which we do not understand
    fatal: index file corrupt

The 'extension' pattern '$?+?' can vary a bit; sometimes it's just '',
but it always seems to be 4 chars. If the commit command doesn't use
the '-p' flag, there is no problem. The repository itself is not
corrupted, it's only the index. It happens with git 2.18.0 and 2.17.0.

-- Luc Van Oostenryck
Re: [BUG] index corruption with git commit -p
On Sat, Sep 01 2018, Luc Van Oostenryck wrote:

> Hi,
>
> I've just had a scary error:
> 	error: index uses $?+? extension, which we do not understand
> 	fatal: index file corrupt
>
> Things were quickly recovered by deleting the index but it clearly
> looks like a bug to me.
>
> Here are the steps to reproduce it:
> 	$ git clone git://github.com/lucvoo/sparse-dev.git
> 	$ cd
> 	$ git co index-corruption
> 	$ git rm -r validation/ Documentation/
> 	$ git commit -m -p
> 	$ git status
> 	error: index uses $?+? extension, which we do not understand
> 	fatal: index file corrupt
>
> The 'extension' pattern '$?+?', can vary a bit, sometimes
> it's just '', but always seems 4 chars.
> If the commit command doesn't use the '-p' flag, there is no
> problem. The repository itself is not corrupted, it's only
> the index. It happens with git 2.18.0 and 2.17.0

Yeah, this is a bug. I didn't dig much, but testing with this script
down to 2.8.0:

    #!/bin/sh
    cd ~/g/git
    make -j $(parallel --number-of-cores) USE_LIBPCRE2=YesPlease CFLAGS="-O0 -g -ggdb3" DEVELOPER=1 DEVOPTS=no-error NO_OPENSSL=Y all
    (
        rm -rf /tmp/x;
        ~/g/git/git --exec-path=/home/avar/g/git clone git://github.com/lucvoo/sparse-dev.git /tmp/x &&
        cd /tmp/x &&
        ~/g/git/git --exec-path=/home/avar/g/git checkout index-corruption &&
        ~/g/git/git --exec-path=/home/avar/g/git rm -r validation/ Documentation/ &&
        ~/g/git/git --exec-path=/home/avar/g/git commit -p
    )
    ~/g/git/git --exec-path=/home/avar/g/git -C /tmp/x status
    if ~/g/git/git --exec-path=/home/avar/g/git -C /tmp/x status
    then
        exit 0
    else
        exit 1
    fi

I found that the first bad commit was:

    680ee550d7 ("commit: skip discarding the index if there is no
    pre-commit hook", 2017-08-14)

Now, note the two invocations of "status" in my script. Before that
commit we'd already been complaining about a bad index, but after it is
the first time we started getting a persistent error, and indeed even
reverting it now on top of master makes the error non-persistent.

So not a fix, but a strong signal to see where we should start looking.
I.e. the index file handling around discard_cache() & "interactive" in
commit.c is likely what's broken.
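Since the script exits 0 or 1 depending on whether the final "status"
succeeds, it can also be handed straight to bisect, something like
this (run from the git.git checkout; the script path is made up):

    git bisect start master v2.8.0 &&
    git bisect run sh ~/reproduce-index-corruption.sh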
Re: [PATCH 2/4] t5310: test delta reuse with bitmaps
On 9/1/2018 4:29 PM, Ævar Arnfjörð Bjarmason wrote:
> B.t.w. for Ben or anyone else who knows about the fsmonitor part of
> this: I've long been running the whole test suite with
> `GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all prove ...` (also along
> with GIT_TEST_SPLIT_INDEX=) after all the main tests pass as
> additional stress testing.
>
> It's not documented under the "special setups" section. So I was going
> to add it, but I see that in 5c8cdcfd80 ("fsmonitor: add test cases
> for fsmonitor extension", 2017-09-22) it's documented that you should
> also set GIT_FORCE_PRELOAD_TEST=true, is that needed for
> GIT_FSMONITOR_TEST? Or is it yet another mode, and if so to be
> combined with fsmonitor in particular, or stand-alone?

I was unaware of the "special setups" section until recently, so if you
would add the fsmonitor information that would be great.

I added GIT_FORCE_PRELOAD_TEST because, when I went to test fsmonitor,
I realized there was no way to manually override the default behavior
and force preload to happen. I added tests into
t7519-status-fsmonitor.sh that use this capability to test fsmonitor
with and without preload, to ensure they play nice together, as I had
to add logic into the preload code to mark cache entries as clean for
fsmonitor. GIT_FORCE_PRELOAD_TEST should probably be added to the
"special setups" section as well, so that the entire test suite can be
used to exercise that code path.

Ultimately, GIT_FSMONITOR_TEST and GIT_FORCE_PRELOAD_TEST can be used
to test their respective code paths using the test suite. The challenge
is that the number of potential combinations (including split index,
uncommon pack modes, etc) gets large pretty quickly.

> There may be a larger lesson about tracking code coverage, but I don't
> know that most general code coverage tools would have helped (any
> overall percentage number would be too large to move). A tool that
> looked at the diff and said "of the N lines you added/touched, this
> percent is exercised in the test suite" might have been useful.

This would be very useful.
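A possible wording for the t/README addition being discussed (the
phrasing is mine, adjust as needed):

    GIT_FSMONITOR_TEST=<fsmonitor hook> exercises the fsmonitor code
    path in all tests, e.g. GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all.

    GIT_FORCE_PRELOAD_TEST=<boolean> forces the preload-index code path
    on in all tests, and can be combined with GIT_FSMONITOR_TEST to
    check that the two features work together.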
[PATCH v2 2/3] Add test for commit --dry-run --short.
Add test for commit with --dry-run --short for a new file of zero length. The test demonstrates that the setting of the commitable flag is broken.

Signed-off-by: Stephen P. Smith
---
 t/t7501-commit.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index 4cae92804..5812dc2f8 100755
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -682,4 +682,12 @@ test_expect_success '--dry-run with conflicts fixed from a merge' '
 	git commit -m "conflicts fixed from merge."
 '
 
+test_expect_failure '--dry-run --short' '
+	# setup two branches with conflicting information
+	# in the same file, resolve the conflict
+	>test-file &&
+	git add test-file &&
+	git commit --dry-run --short
+'
+
 test_done
--
2.18.0
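[Editor's aside] For reviewers who want to see the new test in action, a rough sketch using the standard test-lib flags:

    # Sketch only: run just this script from the t/ directory of a built tree;
    # the new test should show up as a known breakage until patch 3/3 lands.
    cd t
    ./t7501-commit.sh -v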
[PATCH v2 3/3] wt-status.c: Set the commitable flag in the collect phase.
In an update to fix a bug with "commit --dry-run", it was found that the commitable flag was broken. The update was, at the time, accepted as it was better than the previous version. [1]

Since the setting of the commitable flag had been done in wt_longstatus_print_updated, move it to wt_status_collect_updated_cb. Set the commitable flag in wt_status_collect_changes_initial to keep from introducing a rebase regression. Instead of setting the commitable flag in show_merge_in_progress, check in wt_status_collect for a merge that has not been committed, and if one is present set the commitable flag.

Change the tests to expect success, since the broken code in wt-status is being fixed.

[1] https://public-inbox.org/git/xmqqr3gcj9i5@gitster.mtv.corp.google.com/

Signed-off-by: Stephen P. Smith
---
 t/t7501-commit.sh |  6 +++---
 wt-status.c       | 13 +++++++++++--
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index 5812dc2f8..4cda088cc 100755
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -99,12 +99,12 @@ test_expect_success '--dry-run with stuff to commit returns ok' '
 	git commit -m next -a --dry-run
 '
 
-test_expect_failure '--short with stuff to commit returns ok' '
+test_expect_success '--short with stuff to commit returns ok' '
 	echo bongo bongo bongo >>file &&
 	git commit -m next -a --short
 '
 
-test_expect_failure '--porcelain with stuff to commit returns ok' '
+test_expect_success '--porcelain with stuff to commit returns ok' '
 	echo bongo bongo bongo >>file &&
 	git commit -m next -a --porcelain
 '
@@ -682,7 +682,7 @@ test_expect_success '--dry-run with conflicts fixed from a merge' '
 	git commit -m "conflicts fixed from merge."
 '
 
-test_expect_failure '--dry-run --short' '
+test_expect_success '--dry-run --short' '
 	# setup two branches with conflicting information
 	# in the same file, resolve the conflict
 	>test-file &&
diff --git a/wt-status.c b/wt-status.c
index 180faf6ba..578328979 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -540,10 +540,12 @@ static void wt_status_collect_updated_cb(struct diff_queue_struct *q,
 			/* Leave {mode,oid}_head zero for an add. */
 			d->mode_index = p->two->mode;
 			oidcpy(&d->oid_index, &p->two->oid);
+			s->commitable = 1;
 			break;
 		case DIFF_STATUS_DELETED:
 			d->mode_head = p->one->mode;
 			oidcpy(&d->oid_head, &p->one->oid);
+			s->commitable = 1;
 			/* Leave {mode,oid}_index zero for a delete. */
 			break;
 
@@ -561,6 +563,7 @@ static void wt_status_collect_updated_cb(struct diff_queue_struct *q,
 			d->mode_index = p->two->mode;
 			oidcpy(&d->oid_head, &p->one->oid);
 			oidcpy(&d->oid_index, &p->two->oid);
+			s->commitable = 1;
 			break;
 		case DIFF_STATUS_UNMERGED:
 			d->stagemask = unmerged_mask(p->two->path);
@@ -665,11 +668,13 @@ static void wt_status_collect_changes_initial(struct wt_status *s)
 			 * code will output the stage values directly and not use the
			 * values in these fields.
			 */
+			s->commitable = 1;
 		} else {
 			d->index_status = DIFF_STATUS_ADDED;
 			/* Leave {mode,oid}_head zero for adds. */
 			d->mode_index = ce->ce_mode;
 			oidcpy(&d->oid_index, &ce->oid);
+			s->commitable = 1;
 		}
 	}
 }
@@ -739,6 +744,7 @@ static int has_unmerged(struct wt_status *s)
 
 void wt_status_collect(struct wt_status *s)
 {
+	struct wt_status_state state;
 	wt_status_collect_changes_worktree(s);
 
 	if (s->is_initial)
@@ -746,6 +752,11 @@ void wt_status_collect(struct wt_status *s)
 	else
 		wt_status_collect_changes_index(s);
 	wt_status_collect_untracked(s);
+
+	memset(&state, 0, sizeof(state));
+	wt_status_get_state(&state, s->branch && !strcmp(s->branch, "HEAD"));
+	if (state.merge_in_progress && !has_unmerged(s))
+		s->commitable = 1;
 }
 
 static void wt_longstatus_print_unmerged(struct wt_status *s)
@@ -786,7 +797,6 @@ static void wt_longstatus_print_updated(struct wt_status *s)
 			continue;
 		if (!shown_header) {
 			wt_longstatus_print_cached_header(s);
-			s->commitable = 1;
 			shown_header = 1;
 		}
 		wt_longstatus_print_change_data(s, WT_STATUS_UPDATED, it);
@@ -1089,7 +1099,6 @@ static void show_merge_in_progress(struct wt_statu
[PATCH v2 0/3] wt-status.c: commitable flag
A couple of years ago, during a patch review, Junio found that the commitable bit as implemented in wt-status.c was broken.

Stephen P. Smith (3):
  Move has_unmerged earlier in the file.
  Add test for commit --dry-run --short.
  wt-status.c: Set the commitable flag in the collect phase.

 t/t7501-commit.sh | 12 ++++++++++--
 wt-status.c       | 39 ++++++++++++++++++++++++---------------
 2 files changed, 34 insertions(+), 17 deletions(-)

--
2.18.0
[PATCH v2 1/3] wt-status.c: Move has_unmerged earlier in the file.
Move has_unmerged up in the file so that has_unmerged can be called in wt_status_collect, where we need to place a merge check.

Signed-off-by: Stephen P. Smith
---
 wt-status.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 5ffab6101..180faf6ba 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -724,6 +724,19 @@ static void wt_status_collect_untracked(struct wt_status *s)
 	s->untracked_in_ms = (getnanotime() - t_begin) / 100;
 }
 
+static int has_unmerged(struct wt_status *s)
+{
+	int i;
+
+	for (i = 0; i < s->change.nr; i++) {
+		struct wt_status_change_data *d;
+		d = s->change.items[i].util;
+		if (d->stagemask)
+			return 1;
+	}
+	return 0;
+}
+
 void wt_status_collect(struct wt_status *s)
 {
 	wt_status_collect_changes_worktree(s);
@@ -1063,19 +1076,6 @@ static void wt_longstatus_print_tracking(struct wt_status *s)
 	strbuf_release(&sb);
 }
 
-static int has_unmerged(struct wt_status *s)
-{
-	int i;
-
-	for (i = 0; i < s->change.nr; i++) {
-		struct wt_status_change_data *d;
-		d = s->change.items[i].util;
-		if (d->stagemask)
-			return 1;
-	}
-	return 0;
-}
-
 static void show_merge_in_progress(struct wt_status *s,
 				   struct wt_status_state *state,
 				   const char *color)
--
2.18.0
Re: [BUG] A part of an edge from an octopus merge gets colored, even with --color=never
On 6 August 2018 at 17:26, Jeff King wrote:

> I suspect it still has a bug, which is that it is handling this
> first-parent-goes-left case, but probably gets the straight-parent case
> wrong. But at least in this form, I think it is obvious to see where
> that bug is (the "three" in the comment is not accurate in that latter
> case, and it should be two).

Yes, thanks, it makes a lot more sense this way. I believe the attached handles both parent types correctly.

From a841a50b016c0cfc9183384e6c3ca85a23d1e11f Mon Sep 17 00:00:00 2001
From: Noam Postavsky
Date: Sat, 1 Sep 2018 20:07:16 -0400
Subject: [PATCH v3] log: Fix coloring of certain octopus merge shapes

For octopus merges where the first parent edge immediately merges into the next column to the left:

	| *-.
	| |\ \
	|/ / /

then the number of columns should be one less than the usual case:

	| *-.
	| |\ \
	| | | *

Also refactor the code to iterate over columns rather than dashes, building on an initial patch suggestion by Jeff King.

Signed-off-by: Noam Postavsky
---
 graph.c | 48 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/graph.c b/graph.c
index e1f6d3bdd..478c86dfb 100644
--- a/graph.c
+++ b/graph.c
@@ -848,21 +848,45 @@ static int graph_draw_octopus_merge(struct git_graph *graph,
 				    struct strbuf *sb)
 {
 	/*
-	 * Here dashless_commits represents the number of parents
-	 * which don't need to have dashes (because their edges fit
-	 * neatly under the commit).
+	 * Here dashless_commits represents the number of parents which don't
+	 * need to have dashes (because their edges fit neatly under the
+	 * commit). And dashful_commits are the remaining ones.
 	 */
 	const int dashless_commits = 2;
-	int col_num, i;
-	int num_dashes =
-		((graph->num_parents - dashless_commits) * 2) - 1;
-	for (i = 0; i < num_dashes; i++) {
-		col_num = (i / 2) + dashless_commits + graph->commit_index;
-		strbuf_write_column(sb, &graph->new_columns[col_num], '-');
+	int dashful_commits = graph->num_parents - dashless_commits;
+
+	/*
+	 * Usually, each parent gets its own column, like this:
+	 *
+	 * | *-.
+	 * | |\ \
+	 * | | | *
+	 *
+	 * Sometimes the first parent goes into an existing column, like this:
+	 *
+	 * | *-.
+	 * | |\ \
+	 * |/ / /
+	 *
+	 */
+	int parent_in_existing_cols = graph->num_parents -
+		(graph->num_new_columns - graph->num_columns);
+
+	/*
+	 * Draw the dashes. We start in the column following the
+	 * dashless_commits, but subtract out the parent which goes to an
+	 * existing column: we've already counted that column in commit_index.
+	 */
+	int first_col = graph->commit_index + dashless_commits
+		- parent_in_existing_cols;
+	int i;
+
+	for (i = 0; i < dashful_commits; i++) {
+		strbuf_write_column(sb, &graph->new_columns[i+first_col], '-');
+		strbuf_write_column(sb, &graph->new_columns[i+first_col],
+				    i == dashful_commits-1 ? '.' : '-');
 	}
-	col_num = (i / 2) + dashless_commits + graph->commit_index;
-	strbuf_write_column(sb, &graph->new_columns[col_num], '.');
-	return num_dashes + 1;
+	return 2 * dashful_commits;
 }
 
 static void graph_output_commit_line(struct git_graph *graph, struct strbuf *sb)
--
2.11.0
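[Editor's aside] For readers who want to see octopus shapes on a real history, here is a rough sketch that builds a small octopus merge in a throwaway repository and renders it. The paths and branch names are arbitrary, and depending on the surrounding history this will not necessarily reproduce the exact first-parent-goes-left shape from the bug report:

    # Sketch only: create an octopus merge and inspect the graph drawing.
    git init /tmp/octopus-demo && cd /tmp/octopus-demo
    git commit --allow-empty -m base
    for b in a b c
    do
        git checkout -b "$b" master &&
        git commit --allow-empty -m "tip of $b" &&
        git checkout master
    done
    git merge -m "octopus merge" a b c             # one merge commit, several parents
    git log --graph --oneline --color=never --all  # exercises graph_draw_octopus_merge()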
Re: [PATCH v2 2/3] Add test for commit --dry-run --short.
On Sat, Sep 1, 2018 at 7:53 PM Stephen P. Smith wrote:
> Add test for commit --dry-run --short.

The first line of a commit message typically mentions the area or module being touched. It's also customary not to capitalize the first word and to omit the final period. So, for instance:

    t7501: add test of "--dry-run --short" setting 'committable' flag

> Add test for commit with --dry-run --short for a new file of zero
> length.
>
> The test demonstrates that the setting of the commitable flag is
> broken.

s/commitable/committable/

> Signed-off-by: Stephen P. Smith
> ---
> diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
> @@ -682,4 +682,12 @@ test_expect_success '--dry-run with conflicts fixed from
> a merge' '
> +test_expect_failure '--dry-run --short' '
> +	# setup two branches with conflicting information
> +	# in the same file, resolve the conflict

What is this comment all about? It doesn't seem to have any relation to what the test itself is actually doing. (I see that it was copied from an earlier test in this script, but that doesn't help me understand what it is trying to say.)

> +	>test-file &&
> +	git add test-file &&
> +	git commit --dry-run --short
> +'
Re: [PATCH v2 2/3] Add test for commit --dry-run --short.
On Sat, Sep 1, 2018 at 10:18 PM Eric Sunshine wrote:
> On Sat, Sep 1, 2018 at 7:53 PM Stephen P. Smith wrote:
> > The test demonstrates that the setting of the commitable flag is
> > broken.
>
> s/commitable/committable/

Looking at patch 3/3, I see that this misspelling exists in the code itself, so I guess my recommended spelling correction isn't needed. It might make sense, though, to quote this word in the commit message to make it more clear that that is the literal spelling in the code. That is:

    ...demonstrates that the setting of the 'commitable' flag...

(Not itself worth a re-roll.)
Re: [BUG] index corruption with git commit -p
On Sun, Sep 02, 2018 at 12:17:53AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Here are the steps to reproduce it:
> >     $ git clone git://github.com/lucvoo/sparse-dev.git
> >     $ cd
> >     $ git co index-corruption
> >     $ git rm -r validation/ Documentation/
> >     $ git commit -m -p
> >     $ git status
> >     error: index uses $?+? extension, which we do not understand
> >     fatal: index file corrupt
> >
> > The 'extension' pattern '$?+?' can vary a bit, sometimes
> > it's just '', but it always seems to be 4 chars.
> > If the commit command doesn't use the '-p' flag, there is no
> > problem. The repository itself is not corrupted, it's only
> > the index. It happens with git 2.18.0 and 2.17.0.
>
> Yeah this is a bug, I didn't dig much but testing with this script down
> to 2.8.0:
> [...]
> I found that the first bad commit was: 680ee550d7 ("commit: skip
> discarding the index if there is no pre-commit hook", 2017-08-14)

I think it's much older than that. I set up my test repo like this:

  git clone git://github.com/lucvoo/sparse-dev.git
  cd sparse-dev
  git checkout --detach

and then bisected with this script:

  cd /path/to/sparse-dev
  rm .git/index
  git reset --hard index-corruption &&
  git rm -q -r validation/ Documentation/ &&
  git commit -qm foo -p &&
  git status

Since a33fc72fe9 (read-cache: force_verify_index_checksum, 2017-04-14), that produces the corrupt extension error. But before that, I consistently get:

  error: bad index file sha1 signature
  fatal: index file corrupt

from git-commit. And that bisects back to 9c4d6c0297 (cache-tree: Write updated cache-tree after commit, 2014-07-13).

If I revert that commit (which takes some untangling, see below), then the problem seems to go away. Here's the patch I tried on top of the current master, though I think it is actually the first hunk that is making the difference.

---
diff --git a/builtin/commit.c b/builtin/commit.c
index 0d9828e29e..779c5e2cb5 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -359,13 +359,6 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 
 		discard_cache();
 		read_cache_from(get_lock_file_path(&index_lock));
-		if (update_main_cache_tree(WRITE_TREE_SILENT) == 0) {
-			if (reopen_lock_file(&index_lock) < 0)
-				die(_("unable to write index file"));
-			if (write_locked_index(&the_index, &index_lock, 0))
-				die(_("unable to update temporary index"));
-		} else
-			warning(_("Failed to update main cache tree"));
 
 		commit_style = COMMIT_NORMAL;
 		ret = get_lock_file_path(&index_lock);
@@ -408,9 +401,6 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 	if (!only && !pathspec.nr) {
 		hold_locked_index(&index_lock, LOCK_DIE_ON_ERROR);
 		refresh_cache_or_die(refresh_flags);
-		if (active_cache_changed
-		    || !cache_tree_fully_valid(active_cache_tree))
-			update_main_cache_tree(WRITE_TREE_SILENT);
 		if (write_locked_index(&the_index, &index_lock,
 				       COMMIT_LOCK | SKIP_IF_UNCHANGED))
 			die(_("unable to write new_index file"));
@@ -457,7 +447,6 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 
 	hold_locked_index(&index_lock, LOCK_DIE_ON_ERROR);
 	add_remove_files(&partial);
 	refresh_cache(REFRESH_QUIET);
-	update_main_cache_tree(WRITE_TREE_SILENT);
 	if (write_locked_index(&the_index, &index_lock, 0))
 		die(_("unable to write new_index file"));

-Peff
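[Editor's aside] The reproduction script above is the kind of thing that plugs straight into "git bisect run". A rough sketch, assuming the recipe is saved as an executable ~/repro.sh and that the freshly built git is the one on $PATH (for example via the bin-wrappers directory of the build tree); the endpoints and paths are just examples:

    # Sketch only: drive the bisection with the reproduction script.
    cd ~/src/git                            # a git.git clone used for building
    git bisect start v2.18.0 v2.8.0         # known-bad first, then known-good
    git bisect run sh -c '
        make -j8 >/dev/null 2>&1 || exit 125   # 125 tells bisect run to skip
        if ~/repro.sh >/dev/null 2>&1
        then
            exit 0   # reproduction passed: this commit is good
        else
            exit 1   # keep the status below 128 so bisect run marks it bad
        fi
    '
    git bisect reset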
Re: [PATCH v2 2/3] Add test for commit --dry-run --short.
On Saturday, September 1, 2018 7:18:34 PM MST Eric Sunshine wrote:
> > diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
> > @@ -682,4 +682,12 @@ test_expect_success '--dry-run with conflicts fixed
> > from a merge' ' +test_expect_failure '--dry-run --short' '
> > +	# setup two branches with conflicting information
> > +	# in the same file, resolve the conflict
>
> What is this comment all about? It doesn't seem to have any relation
> to what the test itself is actually doing. (I see that it was copied
> from an earlier test in this script, but that doesn't help me
> understand what it is trying to say.)

Agreed. I saw your other email about this not being worth a re-roll, but I have made the change locally in case Junio wants me to re-roll anyway. Additionally, if there are other comments, I can wrap them all into a single set.
Re: [PATCH 2/4] t5310: test delta reuse with bitmaps
On Sat, Sep 01, 2018 at 10:29:25PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Anyway. Not that exciting, and kind of obviously dumb in retrospect. But
> > I think it was worth analyzing to see what went wrong. If there's an
> > immediate lesson, it is probably: add tests even for changes that aren't
> > really user-visible to make sure the code is exercised.
>
> Test-wise, isn't the problem rather that we didn't have something
> like what's described in t/README as "Running tests with special setups"
> for bitmaps? I.e. stuff like GIT_TEST_SPLIT_INDEX=, or running it
> with GIT_FSMONITOR_TEST=$PWD/t7519/fsmonitor-all to stress the fsmonitor
> code.

I'm a little skeptical of that as a general approach, for two reasons:

  - the test suite is filled with toy examples, so they often don't
    trigger the interesting cases

  - the tests are often looking for very particular outcomes or
    repository state; munging that state is going to confuse them

> So we could add some option to the test suite to e.g. run a custom
> command before every "git push" or "git fetch", and then just do a gc
> with a repack/commit graph write/midx write etc. in that codepath, along
> with (in the case of stuff like midx) setting any needed config knobs to
> turn it on.

We can try that out with something like this:

diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
index 42dc4da5a1..f40e0b7a04 100644
--- a/builtin/upload-pack.c
+++ b/builtin/upload-pack.c
@@ -30,6 +30,14 @@ int cmd_upload_pack(int argc, const char **argv, const char *prefix)
 		OPT_END()
 	};
 
+	/*
+	 * This environment variable hackery is necessary because repack in a
+	 * partial clone might actually try to fetch, spawning an infinite
+	 * recursion.
+	 */
+	system("test -z \"$SUPPRESS_BITMAP_MAGIC\" && "
+	       "SUPPRESS_BITMAP_MAGIC=1 git repack -adb >/dev/null 2>&1");
+
 	packet_trace_identity("upload-pack");
 	read_replace_refs = 0;

It actually _does_ find the bug in question (which I wasn't at all sure it would). But it also turns up several other unrelated test failures. And it's only triggering in some limited cases (fetches). If we tried to get better coverage of bitmaps in general, I'm not sure where we would slip in such a repack command. But I think you'd get even more false positives.

> Of course the utility of that sort of thing is limited unless we have
> some dedicated smoke testers or CI capacity to run the various
> combinations of those options. But FWIW when I build our own in-house
> git I build the package with:

Yes, it also gets really expensive. That can be helped with more CI machines, but even neglecting cost, I'm not sure our CI workflow is that great right now (for example, I still haven't figured out the simple feature of: if I push up a commit that fails, I'd like to get an email).

> So if there was a "test bitmaps everywhere" mode that would have been
> caught during the build, unless I've misunderstood how this particular
> bug manifests. But then again, it happened on just a plain git.git after
> a repack, so wasn't any bitmap + push pretty much all that was needed? I
> haven't read your patches in any detail.

The test patch describes the minimal scenario you need, which would be pretty common on any decent-sized repo, but rare on toy ones (though as I said above, there are a few cases in the test suite big enough to trigger this).

-Peff
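[Editor's aside] For anyone who wants to poke at this by hand rather than patching upload-pack, a rough sketch of the scenario under discussion: a repository repacked with bitmaps and then fetched from over the normal pack-negotiation path. The paths below are arbitrary examples:

    # Sketch only: repack with a bitmap, then clone in a way that goes
    # through upload-pack instead of the local hardlink/copy shortcut.
    git clone --bare https://github.com/git/git.git /tmp/bitmap-src.git
    git -C /tmp/bitmap-src.git repack -adb          # -b writes a pack .bitmap
    git clone --no-local /tmp/bitmap-src.git /tmp/bitmap-dst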