Re: [PATCH 1/1] documentation: add lab for first contribution
On Thu, Apr 11, 2019 at 8:20 PM Junio C Hamano wrote: > > "Emily Shaffer via GitGitGadget" writes: > > > diff --git a/Documentation/MyFirstContribution > > b/Documentation/MyFirstContribution > > new file mode 100644 > > index 00..9b87e424d6 > > --- /dev/null > > +++ b/Documentation/MyFirstContribution > > @@ -0,0 +1,674 @@ > > +My First Contribution > > += > > + > > +== Summary > > Just a minor nit but only the document title uses the underlined > mark-up format, but not each of its sections? I was basing the format of this document on Documentation/SubmittingPatches, which underlines the document title but prefixes the rest of the headers with /=+/ to indicate level. I can change it, but I like the hierarchical headers offered by ={1,4} prefix. > > Is the goal of this document to help those who want to contribute to > *THIS* project, or to any project that happens to use Git as their > source control tool of choice? I take it to be the former, but if > that is the case, perhaps extend the title to "My First Contribution > to the Git Project" or something, so that those who use us merely as > a tool can tell that it can be ignored more easily? Good point, I'll extend the title. > > > +=== Set Up Your Workspace > > + > > +Let's start by making a development branch to work on our changes. Per > > +`Documentation/SubmittingPatches`, since a brand new command is a new > > feature, > > +it's fine to base your work on `master`. However, in the future for > > bugfixes, > > +etc., you should check that doc and base it on the appropriate branch. > > + > > +For the purposes of this doc, we will base all our work on `master`. Before > > +running the command below, ensure that you are on `master` first so your > > +branch diverges at the right point. > > + > > + > > +git checkout -b psuh > > + > > An obvious and more explicit alternative, which may be better both > for tutorial purposes (as it illustrates what is going on better by > not relying on an implicit default) and for real-life (as it does > not require checking out 'master' only to switch away from a new > branch, saving a entry of two in the HEAD reflog) is > > For the purposes of this doc, we will base all our work on > the `master` branch of the upstream project. Fork the > `psuh` branch you use for your development like so: > > > $ git checkout -b psuh origin/master > > I like this suggestion, but don't like the use of "fork" as it somewhat conflates a GitHub-specific term. I'll take this recommendation but use "create" instead of "fork". > > +=== Adding a new command > > + > > +Lots of the main useful commands are written as builtins, which means they > > are > > Drop 'useful', and possibly 'main' as well. I would have written > "Many of the Git subcommands are..." if I were writing it myself > without reading yours. > > > +implemented in C and compiled into the main `git` executable.. So it is > > +informative to implement `git psuh` as a builtin. > > +Create a new file in `builtin/` called `psuh.c`. > > I am not sure what "informative" in this context means. I was hoping to indicate that it would be an informative exercise for the reader who is following the tutorial. I'll try to reword this and make it more clear; thanks for the example. > > Typically a built-in subcommand is implemented as a function > whose name is "cmd_" followed by the name of the subcommand > and stored in a C source file that is named after the name > of the subcommand inside `builtin/` directory, so it is > natural to implement your `psuh` command as a cmd_psuh() > function in `builtin/psuh.c`. > > > +The entry point of your new command needs to match a certain signature: > > + > > +int cmd_psuh(int argc, const char **argv, const char *prefix) > > + > > + > > +We'll also need to add the extern declaration of psuh; open up `builtin.h`, > > +find the declaration for cmd_push, and add a new line for psuh: > > + > > + > > +extern int cmd_psuh(int argc, const char **argv, const char *prefix); > > + > > I think there is a push to drop the "extern " from decls of > functions in *.h header files. > I'd just as soon change this documentation when that push changes the other decls, rather than spend time explaining a transient shift in how to d
Re: [PATCH 0/1] documentation: add lab for first contribution
On Thu, Apr 11, 2019 at 7:35 PM Junio C Hamano wrote: > > "Emily Shaffer via GitGitGadget" writes: > > > RFC. I am still working on adding a section on handling refs and objects. > > Thanks. Is 'lab' understood widely enough? I _think_ you are > abbreviating what is known as 'codelab' by your colleagues at work, > but would it make it more in line with what we already have in this > project, perhaps, to call it a "tutorial"? > I think you're right; I'll try to modify throughout. As part of that change I will also move the sample to a branch with a more descriptive name and change the URL. Thanks a lot for the detailed review, Junio. I expect to send a new patch later today, including an exercise in examining a specific commit. -- Emily Shaffer
Re: [PATCH 1/1] documentation: add lab for first contribution
> >> ... then leave it in your example, perhaps? > >> > > > > Good point. :) I had wanted to avoid including my own name/email in the > > tutorial; I used a throwaway "Git Contributor " for the > > example. > > ... > >> Keep a sample sign-off by "A U Thor " here. > > > No, use "A U Thor " as I suggested. That's the > author ident the aspiring Git developer MUST become familiar with > while working with our test suite in t/. There you will also find > the counterpart committer ident to use, if needed. Done. > > Just FYI, I rarely give a "do it exactly like this" suggestion; > often I instead stop at giving a general direction and leave it up > to the contributers to perfect it. The "A U Thor" is one of those > rare cases. On the other hand, "fork" was *not*. Sorry about that. I've found it's good practice to "show, don't tell" when I make review comments to avoid confusion, which isn't quite the same as "do it exactly like this" but looks similar on the box. So I guessed wrong this time. :) Will push a fix with it. > > > Do folks on Git project usually engage in test-driven development? I > > would be happy to move the test up towards the front of the document > > and mention the usefulness of TDD, but not if that's not something > > emphasized usually by the group.. > > I have no strong opinion on this myself. > > I suspect that the developer would clean up with "rebase -i" by > squashing before submitting the result of a very fine-grained TDD > workflow, as nobody would want to read printf("hello world") in > [PATCH 1/47] that would become a real feature in a later step. So > if the tutorial wants to go into that tangent (which might be too > much detail), it may be worth showing from the first few commits, > but otherwise a sequence that pretends the reader to be a perfect > developer who does not make any mistake in the middle may be more > concise and more readable. I dunno. In that case, I'd just as soon leave the order as it is. I think that a developer, outside of the context of a tutorial, will end up writing tests in the order they prefer regardless of the order of a tutorial they did one time. Maybe I can add a note about tests being required for incoming patches to discourage readers from glossing over that section. > > Thanks.
Re: [PATCH v2 0/1] documentation: add lab for first contribution
On Tue, Apr 16, 2019 at 1:26 PM Emily Shaffer via GitGitGadget wrote: > > RFC. I am still working on adding a section on handling refs and objects. Sorry for the spam; above line was stale in GitGitGadget. The relevant section has been added just before the help page. I think the change is complete pending review comments. I've also added instructions for using send-email in addition to GitGitGadget. Thanks!
Re: [PATCH v2 1/1] documentation: add lab for first contribution
> > This code lab covers how to add a new command to Git and, in the > > process, everything from cloning git/git to getting reviewed on the mail > > "lab"? I thought we settled on "tutorial". Also the place we are > having conversation we call "mailing list", I think. You're right. My find/replace missed the commit; I'll read through the document again as well. > > +MyFirstContribution.txt: MyFirstContribution > > + $(QUIET_GEN) cp $< $@ > > Hmph. > > Unlike SubmittingPatches that has known as that specific filename > for a long time before we added to the *.txt -> *.html toolchain > (hence people may look for it without the *.txt suffix), I do not > immediately see why the source of this new tutorial needs a variant > without the suffix. Is there a reason why this new file cannot be > created as Documentaiton/MyFirstContribution.txt that I am missing? The only reason is that I wasn't aware of the special case. Fixed. > > +== Getting Started > > + > > +=== Pull the Git codebase > > + > > +Git is mirrored in a number of locations. https://git-scm.com/downloads > > +suggests the best place to clone from is GitHub. > > "suggests that one of the best places ..."? I'll change it. But it seems odd to say that when git-scm.com only mentions the one mirror. > > +Let's start by making a development branch to work on our changes. Per > > +`Documentation/SubmittingPatches`, since a brand new command is a new > > feature, > > +it's fine to base your work on `master`. However, in the future for > > bugfixes, > > +etc., you should check that doc and base it on the appropriate branch. > > Avoid unnecessary abbreviation; s/doc/document/. Same for "lab", if > we are to call this "codelab" instead of "tutorial". I won't repeat > this for brevity, but I see many instances of them. Sure. I'll check across for instances of "lab" and reword them to "tutorial"; same for "doc" vs "document" etc. > > +For the purposes of this doc, we will base all our work on the `master` > > branch > > +of the upstream project. Create the `psuh` branch you will use for > > development > > +like so: > > + > > + > > +git checkout -b psuh origin/master > > + > > > $ git checkout -b psuh origin/master > > > I think both of our existing tutorials spell out the shell prompt to > clarify what these lines are. It would especially help in this > document, where you have other monospaced display material that are > not commands to be typed but code snippets. I appreciate your comment about clarity. But including the prompt also makes it more difficult to copy-paste into the command line... I think if someone is blindly copy-pasting this tutorial, they won't be getting a lot out of it anyway. So I'll add the prompts. > > > + > > +We'll make a number of commits here in order to demonstrate how to send > > many > > +patches up for review simultaneously. > > I'd write "a topic with multiple patches" instead of "many > patches". The point being that we are not sending a group of > unrelated changes, but are focusing on a theme. Good suggestion, thank you. > > +== Code It Up! > > + > > +NOTE: A reference implementation can be found at > > +https://github.com/nasamuffin/git/tree/psuh. > > + > > +=== Adding a new command > > + > > +Lots of the main useful commands are written as builtins, which means they > > are > > I'd say "the subcommands" without "main" or "useful". There are > fringe subcommands that are implemented as built-in, there are main > useful commands that are not built-in, and "useful"-ness is in the > eyes of beholder. Sure, done. > > +implemented in C and compiled into the main `git` executable. Since they > > are so > > +common, it is a useful exercise to implement `git psuh` as a builtin > > subcommand. > > What does "they" refer to in this sentence? Exiting built-in > commands are so common? In what way are they "common"? They are > commonly used? That is not relevant in the choice of making 'git > psuh' a built-in or a standalone. > > Adding a new built-in command, if it were common, may be a good > target for illustration. But it is not all that common. > > Adding a built-in command requires you to understand the start-up > sequence to the exit status, and serves as a good end-to-end > exercise, if this tutorial's main aim is to give a tour of the > codebase and its internal API. An almost no-op "git psuh" built-in > is small enough to serve as a good end-to-end exercise, without > requiring the author to know much about the internal API, and would > be a good material to show how the contributor, the reviewers and > the maintainer work together to add it to the system. > > So "they are so common" is probably not a good excuse, even though > using `git psuh` may be a good exercise for the purpose of this > tutorial. > > Since adding an almost no-op built-in command is relatively > simple, it is a good material to demonstrate how you as an >
[PATCH v3] documentation: add lab for first contribution
This tutorial covers how to add a new command to Git and, in the process, everything from cloning git/git to getting reviewed on the mailing list. It's meant for new contributors to go through interactively, learning the techniques generally used by the git/git development community. Signed-off-by: Emily Shaffer --- Notable changes in v3: - Changed document name from MyFirstContribution to MyFirstContribution.txt; this reduced the changes to Makefile and eliminated changes to .gitignore. - git send-email section now uses the --cover-letter argument to format-patch, and explains single-patch emails - Reworked summary of engaging in code review, moving to its own section "Responding To Reviews" - "Now What?" section now explains when to make a new commit vs when to modify existing commit Other smaller file-wide changes: - Expanded use of "doc" ; reworded references to "lab" and "codelab" - Added $ prompts to bash snippets - monospaced things that should be but weren't Documentation/Makefile|1 + Documentation/MyFirstContribution.txt | 1065 + 2 files changed, 1066 insertions(+) create mode 100644 Documentation/MyFirstContribution.txt diff --git a/Documentation/Makefile b/Documentation/Makefile index 26a2342bea..fddc3c3c95 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -74,6 +74,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica SP_ARTICLES += $(API_DOCS) TECH_DOCS += SubmittingPatches +TECH_DOCS += MyFirstContribution TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/http-protocol TECH_DOCS += technical/index-format diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt new file mode 100644 index 00..d9bed013fc --- /dev/null +++ b/Documentation/MyFirstContribution.txt @@ -0,0 +1,1065 @@ +My First Contribution to the Git Project + + +== Summary + +This is a tutorial demonstrating the end-to-end workflow of creating a change to +the Git tree, sending it for review, and making changes based on comments. + +=== Prerequisites + +This tutorial assumes you're already fairly familiar with using Git to manage +source code. The Git workflow steps will largely remain unexplained. + +=== Related Reading + +This tutorial aims to summarize the following documents, but the reader may find +useful additional context: + +- `Documentation/SubmittingPatches` +- `Documentation/howto/new-command.txt` + +== Getting Started + +=== Pull the Git codebase + +Git is mirrored in a number of locations. https://git-scm.com/downloads +suggests one of the best places to clone from is GitHub. + + +$ git clone https://github.com/git/git git + + +=== Identify Problem to Solve + + +Use + to indicate fixed-width here; couldn't get ` to work nicely with the +quotes around "Pony Saying 'Um, Hello'". + +In this tutorial, we will add a new command, +git psuh+, short for ``Pony Saying +`Um, Hello''' - a feature which has gone unimplemented despite a high frequency +of invocation during users' typical daily workflow. + +(We've seen some other effort in this space with the implementation of popular +commands such as `sl`.) + +=== Set Up Your Workspace + +Let's start by making a development branch to work on our changes. Per +`Documentation/SubmittingPatches`, since a brand new command is a new feature, +it's fine to base your work on `master`. However, in the future for bugfixes, +etc., you should check that document and base it on the appropriate branch. + +For the purposes of this document, we will base all our work on the `master` +branch of the upstream project. Create the `psuh` branch you will use for +development like so: + + +$ git checkout -b psuh origin/master + + +We'll make a number of commits here in order to demonstrate how to send a topic +with multiple patches up for review simultaneously. + +== Code It Up! + +NOTE: A reference implementation can be found at +https://github.com/nasamuffin/git/tree/psuh. + +=== Adding a new command + +Lots of the subcommands are written as builtins, which means they are +implemented in C and compiled into the main `git` executable. Implementing the +very simple `psuh` command as a built-in will demonstrate the structure of the +codebase, the internal API, and the process of working together as a contributor +with the reviewers and maintainer to integrate this change into the system. + +Built-in subcommands are typically implemented in a function named "cmd_" +followed by the name of the subcommand, in a source file named after the +subcommand and contained within `builtin/`. So it makes sense to implement your +command in `builtin/psuh.c`. Create that file, and within it, write the entry +point for your comman
Re: [PATCH v3] documentation: add lab for first contribution
On Sun, Apr 21, 2019 at 3:52 AM Junio C Hamano wrote: > > Emily Shaffer writes: > > > This tutorial covers how to add a new command to Git and, in the > > process, everything from cloning git/git to getting reviewed on the > > mailing list. It's meant for new contributors to go through > > interactively, learning the techniques generally used by the git/git > > development community. > > > > Signed-off-by: Emily Shaffer > > --- > > I think a stray "lab" remains in the title of the patch. They seem > to have disappeared from all the other places, and "tutorial" is > consistently used, which is good. Thanks, finished. > > My eyes have lost freshness on this topic, so my review tonight is > bound to be (much) less thorough than the previous round. I appreciate the thorough reviews you gave til now, thanks very much for all your time. > > > +- `Documentation/SubmittingPatches` > > +- `Documentation/howto/new-command.txt` > > Good (relative to the earlier one, these are set in monospace). > > > + > > +== Getting Started > > + > > +=== Pull the Git codebase > > + > > +Git is mirrored in a number of locations. https://git-scm.com/downloads > > Perhaps URLs should also be set in monospace? This breaks the hyperlink. So I'd rather not? > > > > +NOTE: When you are developing the Git project, it's preferred that you use > > the > > +`DEVELOPER flag`; if there's some reason it doesn't work for you, you can > > turn > > I do not think you want to set 'flag' in monospace, too. > i.e. s| flag`|` flag|; Thanks, good catch. > > + git_config(git_default_config, NULL) > > + if (git_config_get_string_const("user.name", &cfg_name) > 0) > > + { > > + printf(_("No name is found in config\n")); > > + } > > + else > > + { > > + printf(_("Your name: %s\n"), cfg_name); > > + } > > Style. Opening braces "{" for control structures are never be on > its own line, and else comes on the same line as closing "}" of if, > i.e. > > if (...) { > print ... > } else { > print ... > } Took this suggestion. > > Or just get rid of braces if you are not going to extend one (or > both) of if/else blocks into multi-statement blocks. > > > + > > + > > +`git_config()` will grab the configuration from config files known to Git > > and > > +apply standard precedence rules. `git_config_get_string_const()` will look > > up > > +a specific key ("user.name") and give you the value. There are a number of > > +single-key lookup functions like this one; you can see them all (and more > > info > > +about how to use `git_config()`) in > > `Documentation/technical/api-config.txt`. > > + > > +You should see that the name printed matches the one you see when you run: > > + > > + > > +$ git config --get user.name > > + > > + > > +Great! Now we know how to check for values in the Git config. Let's commit > > this > > +too, so we don't lose our progress. > > + > > + > > +$ git add builtin/psuh.c > > +$ git commit -sm "psuh: show parameters & config opts" > > + > > + > > +NOTE: Again, the above is for sake of brevity in this tutorial. In a real > > change > > +you should not use `-m` but instead use the editor to write a verbose > > message. > > We never encourge people to write irrelevant things or obvious > things that do not have to be said. But a single-liner message > rarely is sufficient to convey "what motivated the change, and why > the change is done in the way seen in the patch" in a meaningful > way. > > i.e. s|verbose|meaningful|; Thanks, done. I'll leave out repeating the lecture here as it was given in the full sample commit. > > > +Create your test script and mark it executable: > > + > > + > > +$ touch t/t-psuh-tutorial.sh > > +$ chmod +x t/t-psuh-tutorial.sh > > + > > I never "create an empty file" before editing in real life. Is this > a common workflow in some circles? > > I'd be tempted to suggest s/touch/edit/ here, but I dunno. Looking back, I'm wondering why I wrote it this way - I think I wanted to avoid prescribing an editor and also mention the permissions. I'll try to reword this to mention the executable bit after the content of the test script. > >
[PATCH v4] documentation: add tutorial for first contribution
This tutorial covers how to add a new command to Git and, in the process, everything from cloning git/git to getting reviewed on the mailing list. It's meant for new contributors to go through interactively, learning the techniques generally used by the git/git development community. Signed-off-by: Emily Shaffer --- Only minor changes from v3, correcting the comments Junio made in his review. - Changed commit subject - Stray monospace typos - Curly brace style Documentation/Makefile|1 + Documentation/MyFirstContribution.txt | 1073 + 2 files changed, 1074 insertions(+) create mode 100644 Documentation/MyFirstContribution.txt diff --git a/Documentation/Makefile b/Documentation/Makefile index 26a2342bea..fddc3c3c95 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -74,6 +74,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica SP_ARTICLES += $(API_DOCS) TECH_DOCS += SubmittingPatches +TECH_DOCS += MyFirstContribution TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/http-protocol TECH_DOCS += technical/index-format diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt new file mode 100644 index 00..fc4a59a8c6 --- /dev/null +++ b/Documentation/MyFirstContribution.txt @@ -0,0 +1,1073 @@ +My First Contribution to the Git Project + + +== Summary + +This is a tutorial demonstrating the end-to-end workflow of creating a change to +the Git tree, sending it for review, and making changes based on comments. + +=== Prerequisites + +This tutorial assumes you're already fairly familiar with using Git to manage +source code. The Git workflow steps will largely remain unexplained. + +=== Related Reading + +This tutorial aims to summarize the following documents, but the reader may find +useful additional context: + +- `Documentation/SubmittingPatches` +- `Documentation/howto/new-command.txt` + +== Getting Started + +=== Pull the Git codebase + +Git is mirrored in a number of locations. https://git-scm.com/downloads +suggests one of the best places to clone from is GitHub. + + +$ git clone https://github.com/git/git git + + +=== Identify Problem to Solve + + +Use + to indicate fixed-width here; couldn't get ` to work nicely with the +quotes around "Pony Saying 'Um, Hello'". + +In this tutorial, we will add a new command, +git psuh+, short for ``Pony Saying +`Um, Hello''' - a feature which has gone unimplemented despite a high frequency +of invocation during users' typical daily workflow. + +(We've seen some other effort in this space with the implementation of popular +commands such as `sl`.) + +=== Set Up Your Workspace + +Let's start by making a development branch to work on our changes. Per +`Documentation/SubmittingPatches`, since a brand new command is a new feature, +it's fine to base your work on `master`. However, in the future for bugfixes, +etc., you should check that document and base it on the appropriate branch. + +For the purposes of this document, we will base all our work on the `master` +branch of the upstream project. Create the `psuh` branch you will use for +development like so: + + +$ git checkout -b psuh origin/master + + +We'll make a number of commits here in order to demonstrate how to send a topic +with multiple patches up for review simultaneously. + +== Code It Up! + +NOTE: A reference implementation can be found at +https://github.com/nasamuffin/git/tree/psuh. + +=== Adding a new command + +Lots of the subcommands are written as builtins, which means they are +implemented in C and compiled into the main `git` executable. Implementing the +very simple `psuh` command as a built-in will demonstrate the structure of the +codebase, the internal API, and the process of working together as a contributor +with the reviewers and maintainer to integrate this change into the system. + +Built-in subcommands are typically implemented in a function named "cmd_" +followed by the name of the subcommand, in a source file named after the +subcommand and contained within `builtin/`. So it makes sense to implement your +command in `builtin/psuh.c`. Create that file, and within it, write the entry +point for your command in a function matching the style and signature: + + +int cmd_psuh(int argc, const char **argv, const char *prefix) + + +We'll also need to add the extern declaration of psuh; open up `builtin.h`, +find the declaration for `cmd_push`, and add a new line for `psuh` immediately +before it, in order to keep the declarations sorted: + + +extern int cmd_psuh(int argc, const char **argv, const char *prefix); + + +Be sure to `#include "builtin.h"` in your `psuh.c`. + +Go ahead and add some throwaway printf to that function. This is a decent +starting
[PATCH] gitsubmodules: align html and nroff lists
There appears to be a bug in the toolchain generating manpages from lettered lists. When a list is enumerated with letters, the resulting nroff shows numbers instead. Mostly this is harmless, but in the case of gitsubmodules, the paragraph following the list refers back to each bullet by letter. As a result, reading this documentation via `man gitsubmodules` is hard to parse - readers must infer that a bug exists and a refers to 1, b refers to 2, and c refers to 3 in the list above. The problem specifically was introduced in ad47194; previously rather than generating numerated lists the bulleted area was entirely monospaced in HTML and shown in plaintext in nroff. The bug seems to exist in docbook-xml - I've reported it on May 1 via the docbook-apps mail list - but for now it may make more sense to just work around the issue. Signed-off-by: Emily Shaffer --- Documentation/gitsubmodules.txt | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt index 57999e9f36..0a890205b8 100644 --- a/Documentation/gitsubmodules.txt +++ b/Documentation/gitsubmodules.txt @@ -169,15 +169,15 @@ ACTIVE SUBMODULES A submodule is considered active, - a. if `submodule..active` is set to `true` + 1. if `submodule..active` is set to `true` + or - b. if the submodule's path matches the pathspec in `submodule.active` + 2. if the submodule's path matches the pathspec in `submodule.active` + or - c. if `submodule..url` is set. + 3. if `submodule..url` is set. and these are evaluated in this order. @@ -193,11 +193,11 @@ For example: url = https://example.org/baz In the above config only the submodule 'bar' and 'baz' are active, -'bar' due to (a) and 'baz' due to (c). 'foo' is inactive because -(a) takes precedence over (c) +'bar' due to (1) and 'baz' due to (3). 'foo' is inactive because +(1) takes precedence over (3) -Note that (c) is a historical artefact and will be ignored if the -(a) and (b) specify that the submodule is not active. In other words, +Note that (3) is a historical artefact and will be ignored if the +(1) and (2) specify that the submodule is not active. In other words, if we have a `submodule..active` set to `false` or if the submodule's path is excluded in the pathspec in `submodule.active`, the url doesn't matter whether it is present or not. This is illustrated in -- 2.21.0.1020.gf2820cf01a-goog
Re: [PATCH v4] documentation: add tutorial for first contribution
On Tue, Apr 30, 2019 at 11:59:23AM -0700, Josh Steadmon wrote: > Just a couple typo fixes listed below: > Thanks for the review, Josh. I'll hold these fixes locally until I either get something more significant to fix or Junio asks for them before a merge to next, to reduce spam to the list. - Emily > > On 2019.04.23 12:34, Emily Shaffer wrote: > [snip] > > +=== Implementation > > + > > +It's probably useful to do at least something besides printing out a > > string. > > +Let's start by having a look at everything we get. > > + > > +Modify your `cmd_psuh` implementation to dump the args you're passed: > > + > > + > > + int i; > > + > > + ... > > + > > + printf(Q_("Your args (there is %d):\n", > > + "Your args (there are %d):\n", > > + argc), > > + argc); > > + for (i = 0; i < argc; i++) { > > + printf("%d: %s\n", i, argv[i]); > > + } > > + printf(_("Your current working directory:\n%s%s\n"), > > + prefix ? "/" : "", prefix ? prefix : ""); > > + > > + > > + > > +Build and try it. As you may expect, there's pretty much just whatever we > > give > > +on the command line, including the name of our command. (If `prefix` is > > empty > > +for you, try `cd Documentation/ && ../bin-wrappers/git/ psuh`). That's not > > so > > Looks like you have an errant "/" after "git". Right you are. Thanks. > > > [snip] > > +=== Adding documentation > > + > > +Awesome! You've got a fantastic new command that you're ready to share > > with the > > +community. But hang on just a minute - this isn't very user-friendly. Run > > the > > +following: > > + > > + > > +$ ./bin-wrappers/git help psuh > > + > > + > > +Your new command is undocumented! Let's fix that. > > + > > +Take a look at `Documentation/git-*.txt`. These are the manpages for the > > +subcommands that Git knows about. You can open these up and take a look to > > get > > +acquainted with the format, but then go ahead and make a new file > > +`Documentation/git-psuh.txt`. Like with most of the documentation in the > > Git > > +project, help pages are written with AsciiDoc (see CodingGuidelines, > > "Writing > > +Documentation" section). Use the following template to fill out your own > > +manpage: > > + > > +// Surprisingly difficult to embed AsciiDoc source within AsciiDoc. > > +[listing] > > + > > +git-psuh(1) > > +=== > > + > > +NAME > > + > > +git-psuh - Delight users' typo with a shy horse > > + > > + > > +SYNOPSIS > > + > > +[verse] > > +'git-psuh' > > + > > +DESCRIPTION > > +--- > > +... > > + > > +OPTIONS[[OPTIONS]] > > +-- > > +... > > + > > +OUTPUT > > +-- > > +... > > + > > + > > +GIT > > +--- > > +Part of the linkgit:git[1] suite > > + > > + > > +The most important pieces of this to note are the file header, underlined > > by =, > > +the NAME section, and the SYNOPSIS, which would normally contain the > > grammar if > > +your command took arguments. Try to use well-established manpage headers > > so your > > +documentation is consistent with other Git and UNIX manpages; this makes > > life > > +easier for your user, who can skip to the section they know contains the > > +information they need. > > + > > +Now that you've written your manpage, you'll need to build it explicitly. > > We > > +convert your AsciiDoc to troff which is man-readable like so: > > + > > + > > +$ make all doc > > +$ man Documentation/git-psuh.1 > > + > > + > > +or > > + > > + > > +$ make -C Documentation/git-psuh.1 > > Needs a space after "Documentation/". Done. Thanks much.
Re: [PATCH 2/2] merge: add --quit
They both look fine to me, besides a couple typos in the commit message in this one. On Wed, May 01, 2019 at 08:11:52PM +0700, Nguyễn Thái Ngọc Duy wrote: > This allows to cancel the current merge without reseting worktree/index, "resetting". > which is what --abort is for. Like other --quit(s), this is often used > when you forgot that you're in the middle of a merge and already > switched away, doing different things. By the time you're realize, you "By the time you've realized" ? > can't even continue the merge anymore. Thanks, Emily
Re: [RFE] Allow for "interactive"-like actions in non-interactive rebase
Hi, On Fri, May 03, 2019 at 06:04:15PM +0300, Konstantin Kharlamov wrote: > Interactive rebase (i.e. for example "git rebase -i HEAD~10") is used most > often to apply an action to a single commit, e.g. "rename", "edit", "fixup", > etc… > > As result, people keep coming up with custom scripts and aliases for every > distinct action. > > Instead, it would be nice to have native support in git to start "rebase" > for a given commit, and pass the "interactive action" to use on that commit. I would totally use this. The equivalent workflow right now is a pretty large number of steps for, say, fixing a typo. > Examples: > > $ git rebase -i HEAD~10 --action edit > $ git rebase -i HEAD~10 --action rename > $ git rebase -i HEAD~10 --action fixup Is there an alternative to any of these actions that can already be taken individually? Or, another way of asking, were the interactive rebase commands based on some other Git command which could be pain to do on many commits individually? Emily
Re: [PATCH] commit-graph: fix memory leak
Hi, This change looks good to me, and like good evidence for the benefits of automated tooling :) Thanks! - Emily On Mon, May 06, 2019 at 02:36:58PM -0700, Josh Steadmon wrote: > Free the commit graph when verify_commit_graph_lite() reports an error. > Credit to OSS-Fuzz for finding this leak. > > Signed-off-by: Josh Steadmon > --- > commit-graph.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/commit-graph.c b/commit-graph.c > index 66865acbd7..4bce70d35c 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -267,8 +267,10 @@ struct commit_graph *parse_commit_graph(void *graph_map, > int fd, > last_chunk_offset = chunk_offset; > } > > - if (verify_commit_graph_lite(graph)) > + if (verify_commit_graph_lite(graph)) { > + free(graph); > return NULL; > + } > > return graph; > } > -- > 2.21.0.1020.gf2820cf01a-goog >
Re: [PATCH v3 00/16] Add new command 'restore'
Hiya, On Thu, Apr 25, 2019 at 04:45:44PM +0700, Nguyễn Thái Ngọc Duy wrote: > v3 changes are really small > > - gitcli.txt is updated to mention the --staged/--worktree pair, in > comparison to --cached/--index which is also mention there. > > - The patch 'rm --staged' is dropped. It could come back. But if it > does, it should be in a series where commands that take > --cached/--index will now take both --staged/--worktree. Those are > 'rm', 'apply', 'check-attr', 'grep', 'diff' and maybe 'ls-files'. > > In other words we support --staged/--worktree everywhere while > --cached/--index still remains because of muscle memory. This is > of course only a good move if --staged/--worktree is much better > than --cached/--index. > > - Since there's a chance this series may end up in 'master' and get > released, and I'm still worried that I screwed up somewhere, the last > patch declares the two commands experimental for maybe one or two > release cycles. > > If the reception is good, we revert that patch. If something > horrible is found, we can still change the default behavior without > anybody yelling at us. Or worst case scenario, we remove both > commands and declare a failed experiment. > > PS. the intent-to-add support is still not in. But it should be in > before the experimental status is removed. I've got a usability comment, as we're giving restore a try within Google for now. I found myself in a situation where I had accidentally staged all my changes to tracked files (I think resulting from a stash pop which generated a merge conflict?) and didn't see a good way to unstage everything using restore. I tried out `git restore --staged *` and it tried to restore every build artifact in my working tree, all of which should be ignored, made a lot of noisy errors, and left me with my changes still staged. Then, figuring that I only had a few files, I thought I'd type them each, with the exception of a .c/.h pair: g restore --staged builtin/log.c builtin/walken.c revision.* Because I had build artifacts present, this vomited while trying to unstage revision.o, and again left me with all my changes still staged. revision.o is validly ignored: $ g check-ignore revision.o revision.o Finally explicitly naming each file which I needed to restore worked, but I have very little interest in doing this for more than a handful of files, especially since the output of `git status` is not easy to paste into the command line (due to the "modified:" etc marker). I was definitely still able to make it do what I wanted with `git reset HEAD` but thought you may be interested. It'd be cool to have a hint which advises how you can unstage everything, or else to pay attention to the gitignore and not try to unstage those files, or else to have a command to unstage everything. (I looked for one by trying `git restore --staged -a` but that doesn't exist :) ) Thanks! Emily
Re: [PATCH v3 00/16] Add new command 'restore'
On Tue, May 07, 2019 at 05:36:56PM +0700, Duy Nguyen wrote: > On Tue, May 7, 2019 at 9:21 AM Emily Shaffer wrote: > > I've got a usability comment, as we're giving restore a try within > > Google for now. > > Thanks. I thought I was the only guinea pig :D and obviously not a > good one since I know too much (which is not a good thing as a > tester). > > > I found myself in a situation where I had accidentally > > staged all my changes to tracked files (I think resulting from a stash > > pop which generated a merge conflict?) and didn't see a good way to > > unstage everything using restore. > > > > I tried out `git restore --staged *` and it tried to restore every build > > artifact in my working tree, all of which should be ignored, made a lot of > > noisy errors, and left me with my changes still staged. > > For the record, "git restore --staged :/" should do the trick and it > is documented as an example (but without --staged). Yeah, this worked, and today I also noted `git restore --staged .` works, as does Junio's suggestion on the other mail (`git restore --staged revision.\*`), and quoting the * (`git restore --staged '*'`). So maybe I didn't think outside the box enough before mailing :) > > Either way. I think you raise a good point about "*" (or patterns > matching more than expected in general). I need to sleep on it and see > if the old way of handling pattern matching failure is still a good > way to go. I think it's worth considering, especially as `git reset HEAD *` and `git reset HEAD` both work. (`git restore --staged` complains that it hasn't got any paths, but reset seems to figure you just mean everything.) It was a surprising failure to me coming away from years of using reset. > > > Finally explicitly naming each file which I needed to restore worked, > > but I have very little interest in doing this for more than a handful of > > files, especially since the output of `git status` is not easy to paste > > into the command line (due to the "modified:" etc marker). > > For just a couple files, you could also use "-p" to go through them > and either "a" to restore all or "d" to skip the file. Maybe adding > "-i/--interactive" to git-restore too, just like "git add". Hmm... > -- > Duyh Anyways, maybe it's not a particularly weighty complaint, since there are still ways to mass unstage, and if I had just RTFM I would have found them. :) It's a new workflow, after all, and the old one still works, so it's probably not reasonable to expect it all to just work the same way with s/reset HEAD/restore --staged/. Thanks! - Emily
Re: [PATCH v4] documentation: add tutorial for first contribution
On Thu, May 02, 2019 at 07:11:04PM -0700, Phil Hord wrote: > On Tue, Apr 23, 2019 at 12:35 PM Emily Shaffer > wrote: > > > > This tutorial covers how to add a new command to Git and, in the > > process, everything from cloning git/git to getting reviewed on the > > mailing list. It's meant for new contributors to go through > > interactively, learning the techniques generally used by the git/git > > development community. > > > > Thanks for working on this. It's very nicely done. :) > > +Check it out! You've got a command! Nice work! Let's commit this. > > + > > + > > +$ git add Makefile builtin.h builtin/psuh.c git.c > > +$ git commit -s > > + > > + > > +You will be presented with your editor in order to write a commit message. > > Start > > +the commit with a 50-column or less subject line, including the name of the > > +component you're working on. Remember to be explicit and provide the "Why" > > of > > This part sounds a little ambiguous to me, as I'm expected to include > the "Why" in my 50-column subject line. I don't want to go overboard, > but maybe direct them further to > > After this, insert a blank line (always required) and then some > text describing > your change. Remember to be explicit and ... > Done. I agree it's ambiguous about how much is supposed to go into the subject versus the body, so I've hopefully clarified by explaining that the body "should provide the bulk of the context". > > + > > +$ make all doc > > +$ man Documentation/git-psuh.1 > > + > > + > > +or > > + > > + > > +$ make -C Documentation/git-psuh.1 > > There's an unwanted slash here. This should be `make -C Documentation > git-psuh.1`. Done, thanks. Nice catch. > > +=== Overview of Testing Structure > > + > > +The tests in Git live in `t/` and are named with a 4-decimal digit, > > according to > > This doesn't parse. How about this? > > named with a 4-decimal digit number using the schema shown in ... > I think we've both managed to miss that I've swapped "decimal" and "digit" by accident. :) How about "named with a 4-digit decimal number using the schema"? Very pleased that you caught this, since all the once-overs in the world from the cooked-brain author and the cooked-brain second- or tenth-pass reviewers wouldn't have caught this mistake. Thanks! > > +== Sending Patches via GitGitGadget > > + > > +One option for sending patches is to follow a typical pull request > > workflow and > > +send your patches out via GitGitGadget. GitGitGadget is a tool created by > > +Johannes Schindelin to make life as a Git contributor easier for those > > used to > > +the GitHub PR workflow. It allows contributors to open pull requests > > against its > > +mirror of the Git project, and does some magic to turn the PR into a set of > > +emails and sent them out for you. It also runs the Git continuous > > integration > > nit: "send" them out for you. Done, thanks. > > > +suite for you. It's documented at http://gitgitgadget.github.io. > > + > > +=== Forking git/git on GitHub > > + > > +Before you can send your patch off to be reviewed using GitGitGadget, you > > will > > +need to fork the Git project and upload your changes. First thing - make > > sure > > +you have a GitHub account. > > + > > +Head to the https://github.com/git/git[GitHub mirror] and look for the Fork > > +button. Place your fork wherever you deem appropriate and create it. > > + > > +=== Uploading To Your Own Fork > > I noticed some of your titles Use Capital Initials and others do not. > I suppose either is fine, but consistency is appreciated. > Nice catch. I've gone through and fixed up the titles throughout; as a result I also caught a missed monospace. > > +=== Sending Your Patches > > + > > +Now that your CI is passing and someone has granted you permission to use > > +GitGitGadget with the `/allow` command, sending out for review is as > > simple as > > nit: extra space before "sending" > Done. > > +Next, go look at your pull request against GitGitGadget; you should see > > the CI > > +has been kicked off again. Now while the CI is running is a good time for > > you > > nit: extra spaces before "kicked" > Done. > > +Make sure you retain the ``[PATCH 0/X]'' part; that's what indicates to >
Re: [PATCH v4] documentation: add tutorial for first contribution
On Mon, May 06, 2019 at 03:28:44PM -0700, Jonathan Tan wrote: > Sorry for not looking at this sooner. > > Firstly, I'm not sure if this file should be named without the ".txt", > like SubmittingPatches. I should mention that during this change's life as a GitGitGadget PR, dscho performed a review in GitHub, which included a recommendation to name this SubmittingPatches.txt. This obviates quite a bit of boilerplate within the Makefile as there are rules for handling *.txt documentation generation already. You can check out Johannes's review: https://github.com/gitgitgadget/git/pull/177 I keep forgetting to add a Reviewed-by line to my commit. I'll do that before pushing based on your comments, as well as Josh and Phil's. > > As for my other comments below, the Makefile comment below is the only > one I feel strongly about; feel free to disagree with the rest (which I > think are subjective). > > > diff --git a/Documentation/Makefile b/Documentation/Makefile > > index 26a2342bea..fddc3c3c95 100644 > > --- a/Documentation/Makefile > > +++ b/Documentation/Makefile > > @@ -74,6 +74,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out > > technical/api-index-skel.txt technica > > SP_ARTICLES += $(API_DOCS) > > > > TECH_DOCS += SubmittingPatches > > +TECH_DOCS += MyFirstContribution > > Any reason not to keep this alphabetized? No reason, done. > > +=== Pull the Git codebase > > + > > +Git is mirrored in a number of locations. https://git-scm.com/downloads > > +suggests one of the best places to clone from is GitHub. > > + > > + > > +$ git clone https://github.com/git/git git > > + > > I would rename the header to "Clone the Git repository" instead, since > "pull" has a specific meaning. Also, I think that "one of the best > places" is unnecessary (I would just say "Clone the Git repository from > one of its many mirrors, e.g.:"), but perhaps you want to leave it in > there to maintain the informal tone. I've merged the language from both and added that the GH mirror at git/git is official. "Git is mirrored in a number of locations. Clone the repository from one of them; https://git-scm.com/downloads suggests one of the best places to clone from is the official mirror on GitHub." > > +We'll also need to add the extern declaration of psuh; open up `builtin.h`, > > +find the declaration for `cmd_push`, and add a new line for `psuh` > > immediately > > +before it, in order to keep the declarations sorted: > > + > > + > > +extern int cmd_psuh(int argc, const char **argv, const char *prefix); > > + > > I was going to say to not include the "extern", but I see that builtin.h > has them already, so it's probably better to leave it there for > consistency. > This was brought up in an earlier review and there's also a review pending to remove them, which seems to have turned into a discussion about how to maintain a script to remove them :) I'm going to avoid politics and also remove the extern here, because it looks like that's the direction the codebase is heading anyway. > > +The list of commands lives in `git.c`. We can register a new command by > > adding > > +a `cmd_struct` to the `commands[]` array. `struct cmd_struct` takes a > > string > > +with the command name, a function pointer to the command implementation, > > and a > > +setup option flag. For now, let's keep cheating off of `push`. Find the > > line > > +where `cmd_push` is registered, copy it, and modify it for `cmd_psuh`, > > placing > > +the new line in alphabetical order. > > For an international audience, it might be better to replace "cheating > off" with its literal meaning. It took me a while to understand that > "cheating off" was meant to evoke a so-called cheat sheet. You're right; I leaned too far towards casual voice here and included a colloquialism. I've modified this to "let's keep mimicking `push`" as I feel it means the same thing, without the slang but with a similar tone. I also considered "copying from `push`" but didn't want to indicate we would be copy-pasting lines of code. If anybody's got a better suggestion for a verb here I'm happy to hear it; "cheating from X" is a phrase I'm trying to stop using anyways :) > > +Go ahead and inspect your new commit with `git show`. "psuh:" indicates you > > +have modified mainly the `psuh` command. The subject line gives readers an > > idea > > +of what you've changed. The sign-off line (`-s`) indicates that you agree > > to > > +the Developer's Certificate of Origin 1.1 (see the > > +`Documentation/SubmittingPatches` +++[[dco]]+++ header). If you wish to > > add some > > +context to your change, go ahead with `git commit --amend`. > > I think the last sentence is confusing - didn't we already add the > context? (And if it's meant more along the lines of "if you want to > change your commit message for whatever reason, use --amend", I don't > think that's necessary here, since we are assuming that the user knows > how to use Git.) I think you're
[PATCH v5 2/2] documentation: add anchors to MyFirstContribution
During the course of review for MyFirstContribution.txt, the suggestion came up to include anchors to make it easier for veteran contributors to link specific sections of this documents to newbies. To make life easier for reviewers, add these anchors in their own commit. See review context here: https://public-inbox.org/git/20190507195938.gd220...@google.com/ AsciiDoc does not support :sectanchors: and the anchors are not discoverable, but they are referenceable. So a link to "foo.com/MyFirstContribution.html#prerequisites" will still work if that file was generated with AsciiDoc. The inclusion of :sectanchors: does not create warnings or errors while compiling directly with `asciidoc -b html5 Documentation/MyFirstContribution.txt` or while compiling with `make doc`. AsciiDoctor does support :sectanchors: and displays a paragraph link on mouseover. When the anchor is included above or inline with a section (as in this change), the link provided points to the custom ID contained within [[]] instead of to an autogenerated ID. Practically speaking, this means we have .../MyFirstContribution.html#summary instead of .../MyFirstContribution.html#_summary. In addition to being prettier, the custom IDs also enable anchor linking to work with asciidoc-generated pages. This change compiles with no warnings using `asciidoctor -b html5 Documentation/MyFirstContribution.txt`. Signed-off-by: Emily Shaffer --- Documentation/MyFirstContribution.txt | 35 +++ 1 file changed, 35 insertions(+) diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt index 0f9a7fed93..a0380d5dc8 100644 --- a/Documentation/MyFirstContribution.txt +++ b/Documentation/MyFirstContribution.txt @@ -1,16 +1,20 @@ My First Contribution to the Git Project +:sectanchors: +[[summary]] == Summary This is a tutorial demonstrating the end-to-end workflow of creating a change to the Git tree, sending it for review, and making changes based on comments. +[[prerequisites]] === Prerequisites This tutorial assumes you're already fairly familiar with using Git to manage source code. The Git workflow steps will largely remain unexplained. +[[related-reading]] === Related Reading This tutorial aims to summarize the following documents, but the reader may find @@ -19,8 +23,10 @@ useful additional context: - `Documentation/SubmittingPatches` - `Documentation/howto/new-command.txt` +[[getting-started]] == Getting Started +[[cloning]] === Clone the Git Repository Git is mirrored in a number of locations. Clone the repository from one of them; @@ -31,6 +37,7 @@ the official mirror on GitHub. $ git clone https://github.com/git/git git +[[identify-problem]] === Identify Problem to Solve @@ -44,6 +51,7 @@ of invocation during users' typical daily workflow. (We've seen some other effort in this space with the implementation of popular commands such as `sl`.) +[[setup-workspace]] === Set Up Your Workspace Let's start by making a development branch to work on our changes. Per @@ -62,11 +70,13 @@ $ git checkout -b psuh origin/master We'll make a number of commits here in order to demonstrate how to send a topic with multiple patches up for review simultaneously. +[[code-it-up]] == Code It Up! NOTE: A reference implementation can be found at https://github.com/nasamuffin/git/tree/psuh. +[[add-new-command]] === Adding a New Command Lots of the subcommands are written as builtins, which means they are @@ -195,6 +205,7 @@ For the remainder of the tutorial, the subject line only will be listed for the sake of brevity. However, fully-fleshed example commit messages are available on the reference implementation linked at the top of this document. +[[implementation]] === Implementation It's probably useful to do at least something besides printing out a string. @@ -359,6 +370,7 @@ about. Neat! Let's commit that as well. $ git commit -sm "psuh: display the top of origin/master" +[[add-documentation]] === Adding Documentation Awesome! You've got a fantastic new command that you're ready to share with the @@ -446,6 +458,7 @@ sees that your command has been implemented as well as documented) by running Go ahead and commit your new documentation change. +[[add-usage]] === Adding Usage Text Try and run `./bin-wrappers/git psuh -h`. Your command should crash at the end. @@ -502,6 +515,7 @@ your command terminated before anything else interesting happens. Great! Go ahead and commit this one, too. +[[testing]] == Testing It's important to test your code - even for a little toy command like this one. @@ -516,11 +530,13 @@ So let's write some tests. Related reading: `t/README` +[[overview-test-structure]] === Overview of Testing Structure The tests in Git live in `t/` and are named with a
[PATCH v5 1/2] documentation: add tutorial for first contribution
This tutorial covers how to add a new command to Git and, in the process, everything from cloning git/git to getting reviewed on the mailing list. It's meant for new contributors to go through interactively, learning the techniques generally used by the git/git development community. Reviewed-by: Johannes Schindelin Reviewed-by: Jonathan Tan Signed-off-by: Emily Shaffer --- Documentation/Makefile|1 + Documentation/MyFirstContribution.txt | 1075 + 2 files changed, 1076 insertions(+) create mode 100644 Documentation/MyFirstContribution.txt diff --git a/Documentation/Makefile b/Documentation/Makefile index 6d738f831e..360c8051e2 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -73,6 +73,7 @@ SP_ARTICLES += howto/maintain-git API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technical/api-index.txt, $(wildcard technical/api-*.txt))) SP_ARTICLES += $(API_DOCS) +TECH_DOCS += MyFirstContribution TECH_DOCS += SubmittingPatches TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/http-protocol diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt new file mode 100644 index 00..0f9a7fed93 --- /dev/null +++ b/Documentation/MyFirstContribution.txt @@ -0,0 +1,1075 @@ +My First Contribution to the Git Project + + +== Summary + +This is a tutorial demonstrating the end-to-end workflow of creating a change to +the Git tree, sending it for review, and making changes based on comments. + +=== Prerequisites + +This tutorial assumes you're already fairly familiar with using Git to manage +source code. The Git workflow steps will largely remain unexplained. + +=== Related Reading + +This tutorial aims to summarize the following documents, but the reader may find +useful additional context: + +- `Documentation/SubmittingPatches` +- `Documentation/howto/new-command.txt` + +== Getting Started + +=== Clone the Git Repository + +Git is mirrored in a number of locations. Clone the repository from one of them; +https://git-scm.com/downloads suggests one of the best places to clone from is +the official mirror on GitHub. + + +$ git clone https://github.com/git/git git + + +=== Identify Problem to Solve + + +Use + to indicate fixed-width here; couldn't get ` to work nicely with the +quotes around "Pony Saying 'Um, Hello'". + +In this tutorial, we will add a new command, +git psuh+, short for ``Pony Saying +`Um, Hello''' - a feature which has gone unimplemented despite a high frequency +of invocation during users' typical daily workflow. + +(We've seen some other effort in this space with the implementation of popular +commands such as `sl`.) + +=== Set Up Your Workspace + +Let's start by making a development branch to work on our changes. Per +`Documentation/SubmittingPatches`, since a brand new command is a new feature, +it's fine to base your work on `master`. However, in the future for bugfixes, +etc., you should check that document and base it on the appropriate branch. + +For the purposes of this document, we will base all our work on the `master` +branch of the upstream project. Create the `psuh` branch you will use for +development like so: + + +$ git checkout -b psuh origin/master + + +We'll make a number of commits here in order to demonstrate how to send a topic +with multiple patches up for review simultaneously. + +== Code It Up! + +NOTE: A reference implementation can be found at +https://github.com/nasamuffin/git/tree/psuh. + +=== Adding a New Command + +Lots of the subcommands are written as builtins, which means they are +implemented in C and compiled into the main `git` executable. Implementing the +very simple `psuh` command as a built-in will demonstrate the structure of the +codebase, the internal API, and the process of working together as a contributor +with the reviewers and maintainer to integrate this change into the system. + +Built-in subcommands are typically implemented in a function named "cmd_" +followed by the name of the subcommand, in a source file named after the +subcommand and contained within `builtin/`. So it makes sense to implement your +command in `builtin/psuh.c`. Create that file, and within it, write the entry +point for your command in a function matching the style and signature: + + +int cmd_psuh(int argc, const char **argv, const char *prefix) + + +We'll also need to add the declaration of psuh; open up `builtin.h`, find the +declaration for `cmd_push`, and add a new line for `psuh` immediately before it, +in order to keep the declarations sorted: + + +int cmd_psuh(int argc, const char **argv, const char *prefix); + + +Be sure to `#include "builtin.h"` in your `psuh.c`. + +Go ahead and add some throwaway printf to that function. This is a de
[PATCH v5 0/2] documentation: add lab for first contribution
Since v4, I've made one major change and a handful of minor changes. The major change is contained entirely in patch 2/2; patch 1/2 has only nit fixes compared to v4. Major: - I've added anchors to each section. These anchors are custom-named in order to be compatible with asciidoc as well as asciidoctor. I figured it would be easier to review the naming choices if the anchors stood alone from the main introductory patch. Minor: - Alphabetized Documentation/Makefile - Fixup headers - a few renames for clarity, casing, and a missed monospace, based on Phil Hord & Jonathan Tan comments. - Removed 'extern' from the addition to builtin.h - Replace "cheating off" with "mimicking" in a few places - Clarify instructions for writing commit message to remove ambiguity and include requirement for newline after subject - Remove unnecessary explanation of `git commit --amend` - Remove braces to bring code sample into Git style - Typo, word-reversal, and early line break fixes - Add neglected argument to `git send-email` samples to actually include patches Emily Shaffer (2): documentation: add tutorial for first contribution documentation: add anchors to MyFirstContribution Documentation/Makefile|1 + Documentation/MyFirstContribution.txt | 1110 + 2 files changed, insertions(+) create mode 100644 Documentation/MyFirstContribution.txt -- 2.21.0.1020.gf2820cf01a-goog
Re: [PATCH v5 1/2] documentation: add tutorial for first contribution
> +Add a line to `#include "config.h"`. Then, add the following bits to the > +function body: > + > + > + const char *cfg_name; > + > +... > + > + git_config(git_default_config, NULL) > + if (git_config_get_string_const("user.name", &cfg_name) > 0) { > + printf(_("No name is found in config\n")); > + } else { > + printf(_("Your name: %s\n"), cfg_name); > + } Just noticed the braces here, too. I have removed them from my local copy and will upload again if I hear other comments warranting a v6. - Emily
Re: [PATCH v5 1/2] documentation: add tutorial for first contribution
On Wed, May 08, 2019 at 12:46:47PM +0900, Junio C Hamano wrote: > Emily Shaffer writes: > > > +=== Clone the Git Repository > > + > > +Git is mirrored in a number of locations. Clone the repository from one of > > them; > > +https://git-scm.com/downloads suggests one of the best places to clone > > from is > > +the official mirror on GitHub. > > I didn't want to have to get into fine-grained wordsmithing (let > alone bikeshedding) this late in the iteration of the topic, but > "the official mirror" is not something anybody suggested in the > recent reviews (JTan's rephrasing, which I already said that I am OK > with, said something like "one of the many mirrors"). I should have waited longer on sending this round of patch out; as it stands I sent the patch with this fix a few hours before your response to JTan came out. Sorry for the confusion. > And "official" is a phrase I have trouble with in this context. A > mirror does not have to be blessed as official; that's the point of > a mirror---anybody can make one to help users with better > connectivity or availability, as long as its users trust the mirror > maintainer for mirror's correctness and freshness. You're right and I'll remove it. However, I've seen at least one instance of confusion over Git's lack of an "official" mirror (over on #git on Freenode). I'd like to rephrase this to explain the reasoning behind having multiple mirrors, none of which are official. To that end, I propose replacing the phrase with "one of the best places to clone from is this mirror on GitHub." followed by the clone command to git/git, then followed by: NOTE: As Git is a distributed version control, the Git project also follows a distributed model. There is no central official mirror; any mirror which you can reasonably trust is being kept correct and fresh to the Git source distributed by the Git maintainer is fine to work from. What do we think? Alternatively, if the desire is to just be done with it, I have no problem with Junio rewording however he feels is best and otherwise applying this patch - if there is value in reducing the churn on the mailing list for this patch. > So perhaps "... clone from is the mirror maintained by GitHub folks" > or just simply "is the mirror on GitHub"? > > > +$ git send-email --to=tar...@example.com > > +--in-reply-to="" > > Very nice attention to the detail here. I like it (the earlier > round did not have dq around the message ID). > > > +psuh/v2* > > + > > All other edits relative to the previous round look very sensible to > me. Thanks.
Re: Proposal: Remembering message IDs sent with git send-email
On Wed, May 08, 2019 at 07:10:13PM -0400, Drew DeVault wrote: > I want to gather some thoughts about this. Say you've written a patch > series and are getting ready to send a -v2. If you set > --in-reply-to=ask, it'll show you a list of emails you've recently sent, > and their subject lines, and ask you to pick one to use the message ID > from. It'll set the In-Reply-To header to your selection. It sounds to me like you mean to call this during `git format-patch` - that is, `git format-patch -v2 --cover-letter --in-reply-to=ask master..branch -o branch/`. That should set the In-Reply-To: header on your cover letter. There's also the possibility that you mean `git send-email --in-reply-to=ask branch/v2*` - in which case I imagine the In-Reply-To: is added as the message is sent, but not added to the cover letter text file. > > I'd also like to add a custom header, X-Patch-Supersedes: , > with a similar behavior & purpose. Is the hope to store the message ID you choose from --in-reply-to=ask into the X-Patch-Supersedes: header? I'm not sure I understand what you're trying to solve; if you use `git format-patch --in-reply-to` it sounds like the X-Patch-Supersedes: and In-Reply-To: would be redundant. Is it possible you mean you want (sorry for pseudocode scribblings) [PATCH v2 1/1]->X-Patch-Supersedes = [PATCH 1/1]->Message-Id ? I think that wouldn't look good in a threaded mail client? > > Thoughts? Or maybe I totally misunderstood :) What I think might be useful (and what I was hoping you were going to talk about when I saw the subject line) would be if the Message-Id is conveniently stored during `git send-email` on v1 and somehow saved in a useful place in order to apply to the In-Reply-To field on v2 automatically upon `git format-patch -v2`. I'll admit I didn't know about --in-reply-to=ask and that helps with the pain point I've experienced sending out v2 before. - Emily
Re: Proposal: Remembering message IDs sent with git send-email
On Thu, May 09, 2019 at 12:50:25PM -0400, Drew DeVault wrote: > On 2019-05-08 5:19 PM, Emily Shaffer wrote: > > What I think might be useful (and what I was hoping you were going to > > talk about when I saw the subject line) would be if the Message-Id is > > conveniently stored during `git send-email` on v1 and somehow saved in a > > useful place in order to apply to the In-Reply-To field on v2 > > automatically upon `git format-patch -v2`. I'll admit I didn't know > > about --in-reply-to=ask and that helps with the pain point I've > > experienced sending out v2 before. > > --in-reply-to=ask doesn't exist, that's what I'm looking to add. This > convenient storage mechanism is exactly what I'm talking about. Sorry > for the confusion. Looking at the documentation, I suppose I hadn't realized before that --thread will generate a Message-Id for your cover letter. It does seem like we could teach --thread to check for the previous patch's cover letter in the directory provided by -o. Of course, this wouldn't work if the author was generating v2 and didn't have the v1 files available (i.e. different workstation or different author picking up the set). I'm still not sure I see the value of the extra header proposed here. I'd appreciate an explanation of how you think it would be used, Drew. I don't know much about emailed workflows outside of Git; is this something likely to be useful to other communities? - Emily
Re: [PATCH] git.c: show usage for accessing the git(1) help page
On Tue, May 14, 2019 at 04:24:50PM +0100, Philip Oakley wrote: > It is not immediately obvious how to use the `git help` system > to show the git(1) page, with all its background and ccordinating > material, such as environment variables. > > Let's simply list it as the last few words of the last usage line. > > Signed-off-by: Philip Oakley > --- > This follows from the discussion > <3cd065d1-9db5-f2e6-ddff-aa539746d...@iee.org> > --- > git.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/git.c b/git.c > index 2324ac0b7e..9a852b09c1 100644 > --- a/git.c > +++ b/git.c > @@ -33,7 +33,7 @@ const char git_usage_string[] = > const char git_more_info_string[] = > N_("'git help -a' and 'git help -g' list available subcommands and > some\n" > "concept guides. See 'git help ' or 'git help '\n" > -"to read about a specific subcommand or concept."); > +"to read about a specific subcommand or concept. Or use 'git help > git'."); I'm not sure the wording makes sense here. It sounds like you're saying, "Or use 'git help git' to read about specific subcommands or concepts." which isn't really what I think you're trying to say. What about, "Or, use 'git help git' for a detailed guide of the Git system as a whole." (I'm still not sure that's quite it - since `git help git` mostly details the flags you can pass to git before invoking a subcommand. But I'm not sure that `git --help` is the place to say that...) > > static int use_pager = -1; > > -- > 2.21.0.windows.1.1517.gbad5f960a3.dirty >
[RFC PATCH] grep: provide sane default to grep_source struct
grep_buffer creates a struct grep_source gs and calls grep_source() with it. However, gs.name is null, which eventually produces a segmentation fault in grep_source()->grep_source_1()->show_line() when grep_opt.status_only is not set. This seems to be unreachable from existing commands but is reachable in the event that someone rolls their own revision walk and neglects to set rev_info->grep_filter->status_only. Conceivably, someone might want to print all changed lines of all commits which match some regex. Signed-off-by: Emily Shaffer --- Hiya, I ran into this issue while I was working on a tutorial on rolling your own revision walk. I didn't want to print all the lines associated with a matching change, but it didn't seem good that a seemingly-sane grep_filter config was segfaulting. I don't know if adding a placeholder name is the right answer (hence RFC patch). Jonathan Nieder proposed alternatively adding some check to grep_source() to ensure that if opt->status_only is unset, gs->name must be non-NULL (and yell about it if not), as well as some extra comments indicating what assumptions are made about the data coming into functions like grep_source(). I'm fine with that as well (although I'm not sure it makes sense semantically to require a name which the user probably can't easily set, or else ban the user from printing LOC during grep). Mostly I'm happy with any solution besides a segfault with no error logging :) Thanks in advance for everyone's thoughts. Emily grep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/grep.c b/grep.c index 0d50598acd..fd84454faf 100644 --- a/grep.c +++ b/grep.c @@ -2045,7 +2045,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size) struct grep_source gs; int r; - grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL); + grep_source_init(&gs, GREP_SOURCE_BUF, _("(in memory)"), NULL, NULL); gs.buf = buf; gs.size = size; -- 2.21.0.1020.gf2820cf01a-goog
Re: [RFC PATCH] grep: provide sane default to grep_source struct
On Wed, May 15, 2019 at 08:11:52PM -0700, Jonathan Nieder wrote: > Hi, > > Emily Shaffer wrote: > > > grep_buffer creates a struct grep_source gs and calls grep_source() > > with it. However, gs.name is null, which eventually produces a > > segmentation fault in > > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is > > not set. > > Thanks for catching it. Taking a step back, I think the problem is in > the definition of "struct grep_source": > > struct grep_source { > char *name; > > enum grep_source_type { > GREP_SOURCE_OID, > GREP_SOURCE_FILE, > GREP_SOURCE_BUF, > } type; > void *identifier; > > ... > }; > > What is the difference between a 'name' and an 'identifier'? Who is > responsible for free()ing them? Can they be NULL? This is pretty > underdocumented for a public type. > > If we take the point of view that 'name' should always be non-NULL, > then I wonder: > > - can we document that? > - can grep_source_init enforce that? Today grep_source_init() defaults to NULL. So if we decide that 'name' should be non-NULL it will be somewhat changing the intent. void grep_source_init(struct grep_source *gs, enum grep_source_type type, const char *name, const char *path, const void *identifier) { gs->type = type; gs->name = xstrdup_or_null(name); ... > - can we take advantage of that in grep_source as well, as a sanity > check that the grep_source has been initialized? > - while we're here, can we describe what the field is used for > (prefixing output with context before a ":", I believe)? In general the documentation for grep.[ch] is pretty light. There aren't any header comments and `Documentation/technical/api-grep.txt` is a todo. So I agree that we should document it anywhere we can. > > Jonathan Nieder proposed alternatively adding some check to grep_source() > > to ensure that if opt->status_only is unset, gs->name must be non-NULL > > (and yell about it if not), as well as some extra comments indicating > > what assumptions are made about the data coming into functions like > > grep_source(). I'm fine with that as well (although I'm not sure it > > makes sense semantically to require a name which the user probably can't > > easily set, or else ban the user from printing LOC during grep). Mostly > > I'm happy with any solution besides a segfault with no error logging :) > > Let's compare the two possibilities. > > The advantage of "(in memory)" is that it Just Works, which should > make a nicer development experience with getting a new caller mostly > working on the way to getting them working just the way you want. > > The disadvantage is that if we start outputting that in production, we > and static analyzers are less likely to notice. In other words, > printing "(in memory)" is leaking details to the end user that do not > match what the end user asked for. NULL would instead produce a > crash, prompting the author of the caller to fix it. > > What was particularly pernicious about this example is that the NULL > dereference only occurs if the grep has a match. So I suppose I'm > leaning toward (in addition to adding some comments to document the > struct) adding a check like > > if (!gs->name && !opt->status_only) > BUG("grep calls that could print name require name"); > > to grep_source. Why not both? :) But seriously, I am planning to push a second patch with both, per Junio's reply. I'll consider the documentation out of scope for now since I'm not sure I know enough about grep.[ch]'s intent or history to document anything yet. :) > > That would also sidestep the question of whether this debugging aid > should be translated. :) > > Sensible? > > Thanks, > Jonathan
[PATCH v2] grep: provide sane default to grep_source struct
grep_buffer creates a struct grep_source gs and calls grep_source() with it. However, gs.name is null, which eventually produces a segmentation fault in grep_source()->grep_source_1()->show_line() when grep_opt.status_only is not set. This seems to be unreachable from existing commands but is reachable in the event that someone rolls their own revision walk and neglects to set rev_info->grep_filter->status_only. Conceivably, someone might want to print all changed lines of all commits which match some regex. To futureproof, give developers a heads-up if grep_source() is called and a valid name field is expected but not given. This path should be unreachable now, but in the future may become reachable if developers want to add the ability to use grep_buffer() in prepared in-core data and provide hints to the user about where the data came from. Signed-off-by: Emily Shaffer --- I've added the BUG() line to grep_source(). I considered adding it to grep_source_1() but didn't see much difference. Looks like grep_source_1() exists purely as a helper to grep_source() and is never called by anybody else... but it's the function where the crash would happen. So I'm not sure. I also modified the wording for the BUG("") statement a little from jrn's suggestion to hopefully make it more explicit, but I welcome wordsmithing. grep.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/grep.c b/grep.c index 0d50598acd..6ceb880a8c 100644 --- a/grep.c +++ b/grep.c @@ -2021,6 +2021,9 @@ static int chk_hit_marker(struct grep_expr *x) int grep_source(struct grep_opt *opt, struct grep_source *gs) { + if (!opt->status_only && gs->name == NULL) + BUG("grep call which could print a name requires " + "grep_source.name be non-NULL"); /* * we do not have to do the two-pass grep when we do not check * buffer-wide "all-match". @@ -2045,7 +2048,11 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size) struct grep_source gs; int r; - grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL); + /* TODO: In the future it may become desirable to pass in the name as +* an argument to grep_buffer(). At that time, "(in core)" should be +* replaced. +*/ + grep_source_init(&gs, GREP_SOURCE_BUF, _("(in core)"), NULL, NULL); gs.buf = buf; gs.size = size; -- 2.21.0.1020.gf2820cf01a-goog
[PATCH v6 0/2] documentation: add tutorial for first contribution
Only minor changes since v5. Removed 'official' from the blurb about the Github mirror, and removed some spurious braces around oneline code branches. Emily Shaffer (2): documentation: add tutorial for first contribution documentation: add anchors to MyFirstContribution Documentation/Makefile|1 + Documentation/MyFirstContribution.txt | 1109 + 2 files changed, 1110 insertions(+) create mode 100644 Documentation/MyFirstContribution.txt -- 2.21.0.1020.gf2820cf01a-goog
[PATCH v6 1/2] documentation: add tutorial for first contribution
This tutorial covers how to add a new command to Git and, in the process, everything from cloning git/git to getting reviewed on the mailing list. It's meant for new contributors to go through interactively, learning the techniques generally used by the git/git development community. Reviewed-by: Johannes Schindelin Reviewed-by: Jonathan Tan Signed-off-by: Emily Shaffer --- Documentation/Makefile|1 + Documentation/MyFirstContribution.txt | 1074 + 2 files changed, 1075 insertions(+) create mode 100644 Documentation/MyFirstContribution.txt diff --git a/Documentation/Makefile b/Documentation/Makefile index dbf5a0f276..76f2ecfc1b 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -76,6 +76,7 @@ SP_ARTICLES += howto/maintain-git API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technical/api-index.txt, $(wildcard technical/api-*.txt))) SP_ARTICLES += $(API_DOCS) +TECH_DOCS += MyFirstContribution TECH_DOCS += SubmittingPatches TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/http-protocol diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt new file mode 100644 index 00..bc267c4931 --- /dev/null +++ b/Documentation/MyFirstContribution.txt @@ -0,0 +1,1074 @@ +My First Contribution to the Git Project + + +== Summary + +This is a tutorial demonstrating the end-to-end workflow of creating a change to +the Git tree, sending it for review, and making changes based on comments. + +=== Prerequisites + +This tutorial assumes you're already fairly familiar with using Git to manage +source code. The Git workflow steps will largely remain unexplained. + +=== Related Reading + +This tutorial aims to summarize the following documents, but the reader may find +useful additional context: + +- `Documentation/SubmittingPatches` +- `Documentation/howto/new-command.txt` + +== Getting Started + +=== Clone the Git Repository + +Git is mirrored in a number of locations. Clone the repository from one of them; +https://git-scm.com/downloads suggests one of the best places to clone from is +the mirror on GitHub. + + +$ git clone https://github.com/git/git git + + +=== Identify Problem to Solve + + +Use + to indicate fixed-width here; couldn't get ` to work nicely with the +quotes around "Pony Saying 'Um, Hello'". + +In this tutorial, we will add a new command, +git psuh+, short for ``Pony Saying +`Um, Hello''' - a feature which has gone unimplemented despite a high frequency +of invocation during users' typical daily workflow. + +(We've seen some other effort in this space with the implementation of popular +commands such as `sl`.) + +=== Set Up Your Workspace + +Let's start by making a development branch to work on our changes. Per +`Documentation/SubmittingPatches`, since a brand new command is a new feature, +it's fine to base your work on `master`. However, in the future for bugfixes, +etc., you should check that document and base it on the appropriate branch. + +For the purposes of this document, we will base all our work on the `master` +branch of the upstream project. Create the `psuh` branch you will use for +development like so: + + +$ git checkout -b psuh origin/master + + +We'll make a number of commits here in order to demonstrate how to send a topic +with multiple patches up for review simultaneously. + +== Code It Up! + +NOTE: A reference implementation can be found at +https://github.com/nasamuffin/git/tree/psuh. + +=== Adding a New Command + +Lots of the subcommands are written as builtins, which means they are +implemented in C and compiled into the main `git` executable. Implementing the +very simple `psuh` command as a built-in will demonstrate the structure of the +codebase, the internal API, and the process of working together as a contributor +with the reviewers and maintainer to integrate this change into the system. + +Built-in subcommands are typically implemented in a function named "cmd_" +followed by the name of the subcommand, in a source file named after the +subcommand and contained within `builtin/`. So it makes sense to implement your +command in `builtin/psuh.c`. Create that file, and within it, write the entry +point for your command in a function matching the style and signature: + + +int cmd_psuh(int argc, const char **argv, const char *prefix) + + +We'll also need to add the declaration of psuh; open up `builtin.h`, find the +declaration for `cmd_push`, and add a new line for `psuh` immediately before it, +in order to keep the declarations sorted: + + +int cmd_psuh(int argc, const char **argv, const char *prefix); + + +Be sure to `#include "builtin.h"` in your `psuh.c`. + +Go ahead and add some throwaway printf to that function. This is a decent +sta
[PATCH v6 2/2] documentation: add anchors to MyFirstContribution
During the course of review for MyFirstContribution.txt, the suggestion came up to include anchors to make it easier for veteran contributors to link specific sections of this documents to newbies. To make life easier for reviewers, add these anchors in their own commit. See review context here: https://public-inbox.org/git/20190507195938.gd220...@google.com/ AsciiDoc does not support :sectanchors: and the anchors are not discoverable, but they are referenceable. So a link to "foo.com/MyFirstContribution.html#prerequisites" will still work if that file was generated with AsciiDoc. The inclusion of :sectanchors: does not create warnings or errors while compiling directly with `asciidoc -b html5 Documentation/MyFirstContribution.txt` or while compiling with `make doc`. AsciiDoctor does support :sectanchors: and displays a paragraph link on mouseover. When the anchor is included above or inline with a section (as in this change), the link provided points to the custom ID contained within [[]] instead of to an autogenerated ID. Practically speaking, this means we have .../MyFirstContribution.html#summary instead of .../MyFirstContribution.html#_summary. In addition to being prettier, the custom IDs also enable anchor linking to work with asciidoc-generated pages. This change compiles with no warnings using `asciidoctor -b html5 Documentation/MyFirstContribution.txt`. Signed-off-by: Emily Shaffer --- Documentation/MyFirstContribution.txt | 35 +++ 1 file changed, 35 insertions(+) diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt index bc267c4931..274df8575b 100644 --- a/Documentation/MyFirstContribution.txt +++ b/Documentation/MyFirstContribution.txt @@ -1,16 +1,20 @@ My First Contribution to the Git Project +:sectanchors: +[[summary]] == Summary This is a tutorial demonstrating the end-to-end workflow of creating a change to the Git tree, sending it for review, and making changes based on comments. +[[prerequisites]] === Prerequisites This tutorial assumes you're already fairly familiar with using Git to manage source code. The Git workflow steps will largely remain unexplained. +[[related-reading]] === Related Reading This tutorial aims to summarize the following documents, but the reader may find @@ -19,8 +23,10 @@ useful additional context: - `Documentation/SubmittingPatches` - `Documentation/howto/new-command.txt` +[[getting-started]] == Getting Started +[[cloning]] === Clone the Git Repository Git is mirrored in a number of locations. Clone the repository from one of them; @@ -31,6 +37,7 @@ the mirror on GitHub. $ git clone https://github.com/git/git git +[[identify-problem]] === Identify Problem to Solve @@ -44,6 +51,7 @@ of invocation during users' typical daily workflow. (We've seen some other effort in this space with the implementation of popular commands such as `sl`.) +[[setup-workspace]] === Set Up Your Workspace Let's start by making a development branch to work on our changes. Per @@ -62,11 +70,13 @@ $ git checkout -b psuh origin/master We'll make a number of commits here in order to demonstrate how to send a topic with multiple patches up for review simultaneously. +[[code-it-up]] == Code It Up! NOTE: A reference implementation can be found at https://github.com/nasamuffin/git/tree/psuh. +[[add-new-command]] === Adding a New Command Lots of the subcommands are written as builtins, which means they are @@ -195,6 +205,7 @@ For the remainder of the tutorial, the subject line only will be listed for the sake of brevity. However, fully-fleshed example commit messages are available on the reference implementation linked at the top of this document. +[[implementation]] === Implementation It's probably useful to do at least something besides printing out a string. @@ -358,6 +369,7 @@ about. Neat! Let's commit that as well. $ git commit -sm "psuh: display the top of origin/master" +[[add-documentation]] === Adding Documentation Awesome! You've got a fantastic new command that you're ready to share with the @@ -445,6 +457,7 @@ sees that your command has been implemented as well as documented) by running Go ahead and commit your new documentation change. +[[add-usage]] === Adding Usage Text Try and run `./bin-wrappers/git psuh -h`. Your command should crash at the end. @@ -501,6 +514,7 @@ your command terminated before anything else interesting happens. Great! Go ahead and commit this one, too. +[[testing]] == Testing It's important to test your code - even for a little toy command like this one. @@ -515,11 +529,13 @@ So let's write some tests. Related reading: `t/README` +[[overview-test-structure]] === Overview of Testing Structure The tests in Git live in `t/` and are named with a 4-dig
[PATCH] doc: hint about GIT_DEBUGGER
We check for a handy environment variable GIT_DEBUGGER when running via bin-wrappers/, but this feature is undocumented. Add a hint to how to use it into the CodingGuidelines (which is where other useful environment settings like DEVELOPER are documented). It looks like you can use GIT_DEBUGGER to pick gdb by default, or you can hand it your own debugger if you like to use something else (or if you want custom flags for gdb). Hopefully document that intent within CodingGuidelines. Signed-off-by: Emily Shaffer --- Maybe this isn't the right place for this patch. But right now git grep reveals that GIT_DEBUGGER is completely undocumented. Alternatively, it might make sense to only add a short blurb about using GIT_DEBUGGER flag to CodingGuidelines and then documenting how to use it inside of wrap-for-bin.sh. Documentation/CodingGuidelines | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines index 32210a4386..e17cd75b50 100644 --- a/Documentation/CodingGuidelines +++ b/Documentation/CodingGuidelines @@ -412,6 +412,11 @@ For C programs: must be declared with "extern" in header files. However, function declarations should not use "extern", as that is already the default. + - You can launch gdb around your program using the shorthand GIT_DEBUGGER. + Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or + run `GIT_DEBUGGER=debugger-binary some-args ./bin-wrappers/git foo` to + use your own debugger and arguments. (See `wrap-for-bin.sh`.) + For Perl programs: - Most of the C guidelines above apply. -- 2.21.0.1020.gf2820cf01a-goog
Re: [PATCH] doc: hint about GIT_DEBUGGER
[snip] > > + - You can launch gdb around your program using the shorthand GIT_DEBUGGER. > > + Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or > > + run `GIT_DEBUGGER=debugger-binary some-args ./bin-wrappers/git foo` to > > Missing some quotes around debugger-binary and some-args: > + run `GIT_DEBUGGER="debugger-binary some-args" ./bin-wrappers/git foo` to > > Also, one thing I always wonder about with command documentation like > this is whether people will understand that "debugger-binary", > "some-args", and "foo" are just placeholders rather than literal text > -- and that everything else is literal text and not meant to be > placeholders. Does it make since to include a couple examples, or > perhaps modify the text somehow to avoid confusion between > placeholders and literals, or maybe just tell me I overthinking this? > (I've been bit by similar problems in other contexts, so I'm just > flagging it for you to consider). It's a good point. I like to use placeholders that don't sound like a real command (and failed a little here), for example, `GIT_DEBUGGER=my-cool-debugger some-various-args`. It can be a challenge to choose a placeholder that sounds fake but also doesn't sound too informal (the above feels informal to me). I think the best way is to show an example, that's a good idea. I'll come up with one and send another round this week. Thanks a bunch for having a look. - Emily > > Elijah
[PATCH v2] doc: hint about GIT_DEBUGGER
We check for a handy environment variable GIT_DEBUGGER when running via bin-wrappers/, but this feature is undocumented. Add a hint to how to use it into the CodingGuidelines (which is where other useful environment settings like DEVELOPER are documented). It looks like you can use GIT_DEBUGGER to pick gdb by default, or you can hand it your own debugger if you like to use something else (or if you want custom flags for gdb). Hopefully document that intent within CodingGuidelines. Signed-off-by: Emily Shaffer --- Documentation/CodingGuidelines | 6 ++ 1 file changed, 6 insertions(+) diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines index 32210a4386..e99af36df9 100644 --- a/Documentation/CodingGuidelines +++ b/Documentation/CodingGuidelines @@ -412,6 +412,12 @@ For C programs: must be declared with "extern" in header files. However, function declarations should not use "extern", as that is already the default. + - You can launch gdb around your program using the shorthand GIT_DEBUGGER. + Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or + run `GIT_DEBUGGER=my-debugger-binary my-args ./bin-wrappers/git foo` to + use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb" + ./bin-wrappers/git log` (See `wrap-for-bin.sh`.) + For Perl programs: - Most of the C guidelines above apply. -- 2.21.0.1020.gf2820cf01a-goog
Re: [PATCH v2] doc: hint about GIT_DEBUGGER
On Tue, May 21, 2019 at 09:06:18AM -0700, Elijah Newren wrote: > On Mon, May 20, 2019 at 6:01 PM Emily Shaffer wrote: > > > > We check for a handy environment variable GIT_DEBUGGER when running via > > bin-wrappers/, but this feature is undocumented. Add a hint to how to > > use it into the CodingGuidelines (which is where other useful > > environment settings like DEVELOPER are documented). > > > > Two very minor nits: > > > It looks like you can use GIT_DEBUGGER to pick gdb by default, or you > > I think it'd sound better without 'It looks like'; perhaps drop that part? > > > can hand it your own debugger if you like to use something else (or if > > you want custom flags for gdb). Hopefully document that intent within > > CodingGuidelines. > > Maybe just leave out 'Hopefully'? > > > > > Signed-off-by: Emily Shaffer > > --- > > Documentation/CodingGuidelines | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines > > index 32210a4386..e99af36df9 100644 > > --- a/Documentation/CodingGuidelines > > +++ b/Documentation/CodingGuidelines > > @@ -412,6 +412,12 @@ For C programs: > > must be declared with "extern" in header files. However, function > > declarations should not use "extern", as that is already the default. > > > > + - You can launch gdb around your program using the shorthand GIT_DEBUGGER. > > + Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or > > + run `GIT_DEBUGGER=my-debugger-binary my-args ./bin-wrappers/git foo` to > > + use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb" > > + ./bin-wrappers/git log` (See `wrap-for-bin.sh`.) > > + > > Other than the minor nits above: > Acked-by: Elijah Newren Made both the changes you recommended; I'll hang onto it locally until this afternoon in case anybody else has something to say. Thanks for looking, Elijah. - Emily
Re: [PATCH v2] grep: provide sane default to grep_source struct
On Thu, May 16, 2019 at 06:02:54PM -0400, Jeff King wrote: > On Thu, May 16, 2019 at 02:44:44PM -0700, Emily Shaffer wrote: > > > grep_buffer creates a struct grep_source gs and calls grep_source() > > with it. However, gs.name is null, which eventually produces a > > segmentation fault in > > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is > > not set. > > Funny wrapping here. > > > This seems to be unreachable from existing commands but is reachable in > > the event that someone rolls their own revision walk and neglects to set > > rev_info->grep_filter->status_only. Conceivably, someone might want to > > print all changed lines of all commits which match some regex. > > > > To futureproof, give developers a heads-up if grep_source() is called > > and a valid name field is expected but not given. This path should be > > unreachable now, but in the future may become reachable if developers > > want to add the ability to use grep_buffer() in prepared in-core data > > and provide hints to the user about where the data came from. > > So I guess this is probably my fault, as I was the one who factored out > the grep_source bits long ago (though I would also not be surprised if > it could be triggered before, too). > > I think adding a BUG() is the right thing to do. > > > I've added the BUG() line to grep_source(). I considered adding it to > > grep_source_1() but didn't see much difference. Looks like > > grep_source_1() exists purely as a helper to grep_source() and is never > > called by anybody else... but it's the function where the crash would > > happen. So I'm not sure. > > I agree it probably doesn't matter. I'd probably stick at at the top of > grep_source_1(), just with the rationale that it could possibly catch > more cases if somebody ever refactors grep_source() to have two > different callers (e.g., imagine we later add a variant that takes more > options, but leave the old one to avoid disrupting existing callers). I think this combined with needing a history lesson to know not to call grep_source_1() makes a pretty strong case for moving it. I'll put the BUG() line at the top of grep_source_1() instead. > > > I also modified the wording for the BUG("") statement a little from > > jrn's suggestion to hopefully make it more explicit, but I welcome > > wordsmithing. > > [...] > > + BUG("grep call which could print a name requires " > > + "grep_source.name be non-NULL"); > > It looks fine to me. The point is that nobody should ever see this. And > if they do, we should already get a file/line number telling us where > the problem is (and a coredump to poke at). So as long as it's enough to > convince a regular user that they should make a report to the mailing > list, it will have done its job. > > > diff --git a/grep.c b/grep.c > > index 0d50598acd..6ceb880a8c 100644 > > --- a/grep.c > > +++ b/grep.c > > @@ -2021,6 +2021,9 @@ static int chk_hit_marker(struct grep_expr *x) > > > > int grep_source(struct grep_opt *opt, struct grep_source *gs) > > { > > + if (!opt->status_only && gs->name == NULL) > > + BUG("grep call which could print a name requires " > > + "grep_source.name be non-NULL"); > > /* > > * we do not have to do the two-pass grep when we do not check > > * buffer-wide "all-match". > > Minor nit, but can we put a blank line between the new block and the > comment? I made sure to include blank lines surrounding this block in its new home. > > > @@ -2045,7 +2048,11 @@ int grep_buffer(struct grep_opt *opt, char *buf, > > unsigned long size) > > struct grep_source gs; > > int r; > > > > - grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL); > > + /* TODO: In the future it may become desirable to pass in the name as > > +* an argument to grep_buffer(). At that time, "(in core)" should be > > +* replaced. > > +*/ > > + grep_source_init(&gs, GREP_SOURCE_BUF, _("(in core)"), NULL, NULL); > > Hmm. I don't see much point in this one, as it would just avoid > triggering our BUG(). If somebody is adding new grep_buffer() callers > that don't use status_only, wouldn't we want them to see the BUG() to > know that they need to refactor grep_buffer() to provide a name? For me, I'd rather have *some* functionality available without doing a large refactor. The line now isn't
[PATCH v3] grep: provide sane default to grep_source struct
grep_buffer creates a struct grep_source gs and calls grep_source() with it. However, gs.name is null, which eventually produces a segmentation fault in grep_source()->grep_source_1()->show_line() when grep_opt.status_only is not set. This seems to be unreachable from existing commands but is reachable in the event that someone rolls their own revision walk and neglects to set rev_info->grep_filter->status_only. Conceivably, someone might want to print all changed lines of all commits which match some regex. To futureproof, give developers a heads-up if grep_source() is called and a valid name field is expected but not given. This path should be unreachable now, but in the future may become reachable if developers want to add the ability to use grep_buffer() in prepared in-core data and provide hints to the user about where the data came from. Signed-off-by: Emily Shaffer --- I think Peff and Jonathan are right. If someone does want to hack away on something in the very early stages, the BUG() line gives a pretty clear indicator of what to modify to make it work. I also moved the BUG() to grep_source_1() to keep it close to the error itself and reflowed the commit message. - Emily grep.c | 4 1 file changed, 4 insertions(+) diff --git a/grep.c b/grep.c index 0d50598acd..f7c3a5803e 100644 --- a/grep.c +++ b/grep.c @@ -1780,6 +1780,10 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle enum grep_context ctx = GREP_CONTEXT_HEAD; xdemitconf_t xecfg; + if (!opt->status_only && gs->name == NULL) + BUG("grep call which could print a name requires " + "grep_source.name be non-NULL"); + if (!opt->output) opt->output = std_output; -- 2.21.0.1020.gf2820cf01a-goog
[PATCH v3] doc: hint about GIT_DEBUGGER in CodingGuidelines
We check for a handy environment variable GIT_DEBUGGER when running via bin-wrappers/, but this feature is undocumented. Add a hint to how to use it into the CodingGuidelines (which is where other useful environment settings like DEVELOPER are documented). You can use GIT_DEBUGGER to pick gdb by default, or you can hand it your own debugger if you like to use something else (or if you want custom flags for gdb). This commit documents that intent within CodingGuidelines. Signed-off-by: Emily Shaffer --- Since v2, only the commit message has changed. Removed some weak language to make the commit message read more easily. Documentation/CodingGuidelines | 6 ++ 1 file changed, 6 insertions(+) diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines index 32210a4386..e99af36df9 100644 --- a/Documentation/CodingGuidelines +++ b/Documentation/CodingGuidelines @@ -412,6 +412,12 @@ For C programs: must be declared with "extern" in header files. However, function declarations should not use "extern", as that is already the default. + - You can launch gdb around your program using the shorthand GIT_DEBUGGER. + Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or + run `GIT_DEBUGGER=my-debugger-binary my-args ./bin-wrappers/git foo` to + use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb" + ./bin-wrappers/git log` (See `wrap-for-bin.sh`.) + For Perl programs: - Most of the C guidelines above apply. -- 2.21.0.1020.gf2820cf01a-goog
[PATCH v4] grep: fail if call could output and name is null
grep_source(), which performs much of the work for Git's grep library, allows passing an arbitrary struct grep_source which represents the text which grep_source() should search to match a pattern in the provided struct grep_opt. In most callers, the grep_source::name field is set to an appropriate prefix to print before a colon when a result matches: README:Git is an Open Source project covered by the GNU General One caller, grep_buffer(), leaves the grep_source::name field set to NULL because there isn't enough context to determine an appropriate name for this kind of output line. In practice, this has been fine: the only caller of grep_buffer() is "git log --grep", and that caller sets grep_opt::status_only, which disables output and only checks whether a match exists. But this is brittle: a future caller can call grep_buffer() without grep_opt::status_only set, and as soon as it hits a match, grep_source() will try to print the match and segfault: (null):Git is an Open Source project covered by the GNU General For example, a future caller might want to print all matching lines from commits which match a regex. Futureproof by diagnosing early a use of the API that could trigger that condition, before we know whether the pattern matches: BUG: grep.c:1783: grep call which could print a name requires grep_source.name be non-NULL Aborted This way, the caller's author gets an indication of how to fix the issue - by providing grep_source::name or setting grep_opt::status_only - and they are warned of the potential for a segfault unconditionally, rather than only if there is a match. Noticed while adding such a call to a tutorial on revision walks. Signed-off-by: Emily Shaffer Reviewed-by: Jonathan Nieder --- Since v3, only the commit message has changed. Reworked based on Jonathan Nieder's suggestions (with some modifications for readability). - Emily grep.c | 4 1 file changed, 4 insertions(+) diff --git a/grep.c b/grep.c index 0d50598acd..f7c3a5803e 100644 --- a/grep.c +++ b/grep.c @@ -1780,6 +1780,10 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle enum grep_context ctx = GREP_CONTEXT_HEAD; xdemitconf_t xecfg; + if (!opt->status_only && gs->name == NULL) + BUG("grep call which could print a name requires " + "grep_source.name be non-NULL"); + if (!opt->output) opt->output = std_output; -- 2.22.0.rc1.257.g3120a18244-goog
Re: [PATCH v1 1/5] list-objects-filter: refactor into a context struct
On Tue, May 21, 2019 at 05:21:50PM -0700, Matthew DeVore wrote: > The next patch will create and manage filters in a new way, which means > that this bundle of data will have to be managed at a new callsite. Make > this bundle of data more manageable by putting it in a struct and > making it part of the list-objects-filter module's API. This commit message might read more easily on its own if you define "this bundle of data" at least once. Since there are things being moved from both list-objects-filter.c (filter_blobs_none_data) and list-objects-filter.h (list_objects_filter_result and filter_free_fn) into the new struct in list-objects-filter.h, it's not immediately clear to me from the diff what's going on. [snip] > -static void *filter_blobs_none__init( > - struct oidset *omitted, > +static void filter_blobs_none__init( > struct list_objects_filter_options *filter_options, > - filter_object_fn *filter_fn, > - filter_free_fn *filter_free_fn) > + struct filter_context *ctx) > { > - struct filter_blobs_none_data *d = xcalloc(1, sizeof(*d)); > - d->omits = omitted; > - > - *filter_fn = filter_blobs_none; > - *filter_free_fn = free; > - return d; > + ctx->filter_fn = filter_blobs_none; I think you want to set ctx->free_fn here too, right? It seems like you're not setting ctx->omitted anymore because you'd be reading that information in from ctx->omitted (so it's redundant). > } [snip] > -/* > - * A filter for list-objects to omit large blobs. > - * And to OPTIONALLY collect a list of the omitted OIDs. > - */ > +/* A filter for list-objects to omit large blobs. */ > struct filter_blobs_limit_data { > - struct oidset *omits; > unsigned long max_bytes; I suppose I don't have a good enough grasp of the architecture here to follow why you want to move 'omits' but not 'max_bytes' into the new struct. Maybe it will become clear as I look at the rest of your patches :) > }; Most of this patch looks like a pretty straightforward conversion. Thanks. - Emily
Re: [PATCH v1 2/5] list-objects-filter-options: error is localizeable
On Tue, May 21, 2019 at 05:21:51PM -0700, Matthew DeVore wrote: > The "invalid filter-spec" message is user-facing and not a BUG, so make > it localizeable. What does it look like? Is it human-readable in its current form? I ask because (without having looked ahead) I think you're going to move it from a BUG to a user-facing error in a later patchset (which I don't think the commit message reflects well). If I'm wrong, and today it's a possibly user-facing string, then I think this patch could stand on its own outside of this patchset, which would probably save you some effort keeping a simple change in limbo for a long time. But if I'm right, I think the commit message could use some work. - Emily > > Signed-off-by: Matthew DeVore > --- > list-objects-filter-options.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c > index c0036f7378..e46ea467bc 100644 > --- a/list-objects-filter-options.c > +++ b/list-objects-filter-options.c > @@ -81,21 +81,21 @@ static int gently_parse_list_objects_filter( > filter_options->choice = LOFC_SPARSE_PATH; > filter_options->sparse_path_value = strdup(v0); > return 0; > } > /* >* Please update _git_fetch() in git-completion.bash when you >* add new filters >*/ > > if (errbuf) > - strbuf_addf(errbuf, "invalid filter-spec '%s'", arg); > + strbuf_addf(errbuf, _("invalid filter-spec '%s'"), arg); > > memset(filter_options, 0, sizeof(*filter_options)); > return 1; > } > > int parse_list_objects_filter(struct list_objects_filter_options > *filter_options, > const char *arg) > { > struct strbuf buf = STRBUF_INIT; > if (gently_parse_list_objects_filter(filter_options, arg, &buf)) > -- > 2.21.0 >
Re: [PATCH v3] doc: hint about GIT_DEBUGGER in CodingGuidelines
On Thu, May 23, 2019 at 06:09:17AM -0400, Eric Sunshine wrote: > On Wed, May 22, 2019 at 8:56 PM Emily Shaffer wrote: > > We check for a handy environment variable GIT_DEBUGGER when running via > > bin-wrappers/, but this feature is undocumented. Add a hint to how to > > use it into the CodingGuidelines (which is where other useful > > environment settings like DEVELOPER are documented). > > > > You can use GIT_DEBUGGER to pick gdb by default, or you can hand it your > > own debugger if you like to use something else (or if you want custom > > flags for gdb). This commit documents that intent within > > CodingGuidelines. > > This last sentence is repeating what is already stated in the first > paragraph, thus doesn't seem to add value. In fact, the remainder of > the second paragraph seems to be repeating what is in the patch > proper, thus could likely be dropped. Yes, you're right. Dropped. > > > Signed-off-by: Emily Shaffer > > --- > > diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines > > @@ -412,6 +412,12 @@ For C programs: > > + - You can launch gdb around your program using the shorthand GIT_DEBUGGER. > > + Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or > > + run `GIT_DEBUGGER=my-debugger-binary my-args ./bin-wrappers/git foo` to > > Don't you need to bind my-debugger-binary and my-args together with > shell quotes? Also, placeholders like these are often ensconced in > angle brackets, so perhaps: > > ... `GIT_DEBUGGER=" " ./bin-wrappers/git ... Fixed. I did get the dq in the example but missed it with the placeholders. :) Thanks, good catch. > > + use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb" > > + ./bin-wrappers/git log` (See `wrap-for-bin.sh`.) Patch update to follow. Thanks. - Emily
[PATCH v4] doc: hint about GIT_DEBUGGER in CodingGuidelines
We check for a handy environment variable GIT_DEBUGGER when running via bin-wrappers/, but this feature is undocumented. Add a hint to how to use it into the CodingGuidelines (which is where other useful environment settings like DEVELOPER are documented). Signed-off-by: Emily Shaffer --- Documentation/CodingGuidelines | 6 ++ 1 file changed, 6 insertions(+) diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines index 32210a4386..1169ff6c8e 100644 --- a/Documentation/CodingGuidelines +++ b/Documentation/CodingGuidelines @@ -412,6 +412,12 @@ For C programs: must be declared with "extern" in header files. However, function declarations should not use "extern", as that is already the default. + - You can launch gdb around your program using the shorthand GIT_DEBUGGER. + Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or + run `GIT_DEBUGGER=" " ./bin-wrappers/git foo` to + use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb" + ./bin-wrappers/git log` (See `wrap-for-bin.sh`.) + For Perl programs: - Most of the C guidelines above apply. -- 2.22.0.rc1.257.g3120a18244-goog
Re: [PATCH v1 3/5] list-objects-filter: implement composite filters
On Tue, May 21, 2019 at 05:21:52PM -0700, Matthew DeVore wrote: > Allow combining filters such that only objects accepted by all filters > are shown. The motivation for this is to allow getting directory > listings without also fetching blobs. This can be done by combining > blob:none with tree:. There are massive repositories that have > larger-than-expected trees - even if you include only a single commit. > > The current usage requires passing the filter to rev-list, or sending > it over the wire, as: > > combine:+ > > (i.e.: git rev-list --filter=combine:tree:2+blob:limit=32k). This is > potentially awkward because individual filters must be URL-encoded if > they contain + or %. This can potentially be improved by supporting a > repeated flag syntax, e.g.: > > $ git rev-list --filter=tree:2 --filter=blob:limit=32k > > Such usage is currently an error, so giving it a meaning is backwards- > compatible. > > Signed-off-by: Matthew DeVore > --- > Documentation/rev-list-options.txt | 12 ++ > contrib/completion/git-completion.bash | 2 +- > list-objects-filter-options.c | 161 - > list-objects-filter-options.h | 14 ++- > list-objects-filter.c | 114 + > t/t6112-rev-list-filters-objects.sh| 159 +++- > 6 files changed, 455 insertions(+), 7 deletions(-) > > diff --git a/Documentation/rev-list-options.txt > b/Documentation/rev-list-options.txt > index ddbc1de43f..4fb0c4fbb0 100644 > --- a/Documentation/rev-list-options.txt > +++ b/Documentation/rev-list-options.txt > @@ -730,20 +730,32 @@ specification contained in . > + > The form '--filter=tree:' omits all blobs and trees whose depth > from the root tree is >= (minimum depth if an object is located > at multiple depths in the commits traversed). =0 will not include > any trees or blobs unless included explicitly in the command-line (or > standard input when --stdin is used). =1 will include only the > tree and blobs which are referenced directly by a commit reachable from > or an explicitly-given object. =2 is like =1 > while also including trees and blobs one more level removed from an > explicitly-given commit or tree. > ++ > +The form '--filter=combine:++...' combines > +several filters. Only objects which are accepted by every filter are > +included. Filters are joined by '{plus}' and individual filters are %-encoded > +(i.e. URL-encoded). Besides the '{plus}' and '%' characters, the following > +characters are reserved and also must be encoded: > +`~!@#$^&*()[]{}\;",<>?`+'`+ as well as all characters with ASCII code > +<= `0x20`, which includes space and newline. > ++ > +Other arbitrary characters can also be encoded. For instance, > +'combine:tree:3+blob:none' and 'combine:tree%3A2+blob%3Anone' are > +equivalent. > > --no-filter:: > Turn off any previous `--filter=` argument. > > --filter-print-omitted:: > Only useful with `--filter=`; prints a list of the objects omitted > by the filter. Object IDs are prefixed with a ``~'' character. > > --missing=:: > A debug option to help with future "partial clone" development. > diff --git a/contrib/completion/git-completion.bash > b/contrib/completion/git-completion.bash > index 3eefbabdb1..0fd0a10d0c 100644 > --- a/contrib/completion/git-completion.bash > +++ b/contrib/completion/git-completion.bash > @@ -1529,21 +1529,21 @@ _git_difftool () > __git_fetch_recurse_submodules="yes on-demand no" > > _git_fetch () > { > case "$cur" in > --recurse-submodules=*) > __gitcomp "$__git_fetch_recurse_submodules" "" > "${cur##--recurse-submodules=}" > return > ;; > --filter=*) > - __gitcomp "blob:none blob:limit= sparse:oid= sparse:path=" "" > "${cur##--filter=}" > + __gitcomp "blob:none blob:limit= sparse:oid= sparse:path= > combine: tree:" "" "${cur##--filter=}" > return > ;; > --*) > __gitcomp_builtin fetch > return > ;; > esac > __git_complete_remote_or_refspec > } > > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c > index e46ea467bc..d7a1516188 100644 > --- a/list-objects-filter-options.c > +++ b/list-objects-filter-options.c > @@ -1,19 +1,24 @@ > #include "cache.h" > #include "commit.h" > #include "config.h" > #include "revision.h" > #include "argv-array.h" > #include "list-objects.h" > #include "list-objects-filter.h" > #include "list-objects-filter-options.h" > > +static int parse_combine_filter( > + struct list_objects_filter_options *filter_options, > + const char *arg, > + struct strbuf *errbuf); > + > /* > * Parse value of the argument to the "filter" keyword. > * On the command line this looks like: > * --filter= > * and in the pack protocol as: > * "filter" SP > * > * The filter
Re: [PATCH v6 1/2] documentation: add tutorial for first contribution
On Sun, May 26, 2019 at 09:48:26AM +0200, Christian Couder wrote: > On Fri, May 17, 2019 at 11:48 PM Emily Shaffer > wrote: > > > > This tutorial covers how to add a new command to Git and, in the > > process, everything from cloning git/git to getting reviewed on the > > mailing list. It's meant for new contributors to go through > > interactively, learning the techniques generally used by the git/git > > development community. > > Very nice, thanks! I tried it and I liked it very much. > > I noted a few nits that might help improve it a bit. > > > + > > +$ git clone https://github.com/git/git git > > Nit: maybe add "$ cd git" after that. Sure, done. > > > +Check it out! You've got a command! Nice work! Let's commit this. > > + > > + > > +$ git add Makefile builtin.h builtin/psuh.c git.c > > +$ git commit -s > > + > > Nit: when building a "git-psuh" binary is created at the root of the > repo which will pollute the `git status` output. The usual way we deal > with that is by adding "/git-psuh" to the ".gitignore" at the root of > the repo. Right you are - good catch. I'll add a paragraph about adding to the gitignore. > > > +=== Implementation > > + > > +It's probably useful to do at least something besides printing out a > > string. > > +Let's start by having a look at everything we get. > > + > > +Modify your `cmd_psuh` implementation to dump the args you're passed: > > Nit: maybe it could be a bit more clear that the previous printf() > call should be kept as is, otherwise the test added later will fail. Done. > > > + > > + const char *cfg_name; > > + > > +... > > + > > + git_config(git_default_config, NULL) > > Nit: a ";" is missing at the end of the above line. Yikes, done. > > > +Let's commit this as well. > > + > > + > > +$ git commit -sm "psuh: print the current branch" > > Nit: maybe add "builtin/psuh.c" at the end of the above line, so that > a `git add builtin/psuh.c` is not needed. This is purely personal preference, but I prefer manually adding files first. I didn't add any indication about staging the changes to psuh.c though, so I'm adding a line to `git add builtin/psuh.c`. I found one other place where the commit line was shown without the add line, so I included the add there too. > > > + > > +git-psuh(1) > > +=== > > + > > +NAME > > + > > +git-psuh - Delight users' typo with a shy horse > > + > > + > > +SYNOPSIS > > + > > +[verse] > > +'git-psuh' > > + > > +DESCRIPTION > > +--- > > +... > > + > > +OPTIONS[[OPTIONS]] > > +-- > > +... > > + > > +OUTPUT > > +-- > > +... > > + > > + > > Nit: it seems that the above newline could be removed. Sure, why not. > > Thanks, > Christian. Thanks for trying it out and for your thorough review, Christian. I appreciate it! Since this is checked into next already, I'll be sending a follow-on patch in reply to my last version in this thread which is based on the tip of next. - Emily
[PATCH] doc: add some nit fixes to MyFirstContribution
A trial run-through of the tutorial revealed a few typos and missing commands in the tutorial itself. This commit fixes typos, clarifies which lines to keep or modify in some places, and adds a section on putting the git-psuh binary into the gitignore. Signed-off-by: Emily Shaffer --- This patch is based on next, as the doc hasn't made it to master yet. - Missing `cd git` after cloning the repo - Documented how to add git-psuh to the gitignore - Documented the need to leave prior printfs in place during the tutorial - Typos: missing ;, stray newline - Missing `git add builtin/psuh.c` in a couple of places; this could also have been done by adding the filename to the end of the commit call, but I don't think that's a good habit (as opposed to staging all changes, inspecting the wt state, and then committing). Open for debate. Big thanks to Christian for the trial run and review. - Emily Documentation/MyFirstContribution.txt | 31 +++ 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt index 274df8575b..895b7cfd4f 100644 --- a/Documentation/MyFirstContribution.txt +++ b/Documentation/MyFirstContribution.txt @@ -35,6 +35,7 @@ the mirror on GitHub. $ git clone https://github.com/git/git git +$ cd git [[identify-problem]] @@ -164,8 +165,28 @@ $ ./bin-wrappers/git psuh Check it out! You've got a command! Nice work! Let's commit this. +`git status` reveals modified `Makefile`, `builtin.h`, and `git.c` as well as +untracked `builtin/psuh.c` and `git-psuh`. First, let's take care of the binary, +which should be ignored. Open `.gitignore` in your editor, find `/git-push`, and +add an entry for your new command in alphabetical order: + + +... +/git-prune-packed +/git-psuh +/git-pull +/git-push +/git-quiltimport +/git-range-diff +... + + +Checking `git status` again should show that `git-psuh` has been removed from +the untracked list and `.gitignore` has been added to the modified list. Now we +can stage and commit: + -$ git add Makefile builtin.h builtin/psuh.c git.c +$ git add Makefile builtin.h builtin/psuh.c git.c .gitignore $ git commit -s @@ -211,7 +232,8 @@ on the reference implementation linked at the top of this document. It's probably useful to do at least something besides printing out a string. Let's start by having a look at everything we get. -Modify your `cmd_psuh` implementation to dump the args you're passed: +Modify your `cmd_psuh` implementation to dump the args you're passed, keeping +existing `printf()` calls in place: int i; @@ -243,7 +265,7 @@ function body: ... - git_config(git_default_config, NULL) + git_config(git_default_config, NULL); if (git_config_get_string_const("user.name", &cfg_name) > 0) printf(_("No name is found in config\n")); else @@ -315,6 +337,7 @@ Run it again. Check it out - here's the (verbose) name of your current branch! Let's commit this as well. +$ git add builtin/psuh.c $ git commit -sm "psuh: print the current branch" @@ -366,6 +389,7 @@ see the subject line of the most recent commit in `origin/master` that you know about. Neat! Let's commit that as well. +$ git add builtin/psuh.c $ git commit -sm "psuh: display the top of origin/master" @@ -418,7 +442,6 @@ OUTPUT -- ... - GIT --- Part of the linkgit:git[1] suite -- 2.22.0.rc1.257.g3120a18244-goog
Is --filter-print-omitted correct/used/needed?
Yesterday while down a rabbit hole, I discovered an interesting thing about the 'omitted' object in many filters in list-objects-filter-options.h. It appears that when we call list-objects.h:traverse_commit_list_filtered() with a non-NULL 'omitted' argument, we still perform a walk of all objects - that is, the filtered walk is no more efficient than the unfiltered walk from the same place via 'traverse_commit_list()'. I verified by calling each and counting the objects: 161 if (0) { 162 /* Unfiltered: */ 163 printf(_("Unfiltered object walk.\n")); 164 traverse_commit_list(rev, walken_show_commit, 165 walken_show_object, NULL); 166 } else { 167 printf(_("Filtered object walk with filterspec 'tree:1'.\n")); 168 /* 169 * We can parse a tree depth of 1 to demonstrate the kind of 170 * filtering that could occur eg during shallow cloning. 171 */ 172 parse_list_objects_filter(&filter_options, "tree:1"); 173 174 traverse_commit_list_filtered(&filter_options, rev, 175 walken_show_commit, walken_show_object, NULL, &omitted); 176 } 177 178 /* Count the omitted objects. */ 179 oidset_iter_init(&omitted, &oit); 180 181 while ((oid = oidset_iter_next(&oit))) 182 omitted_count++; 183 184 printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, " 185"and %d trees; %d omitted objects.\n"), commit_count, 186blob_count, tag_count, tree_count, omitted_count); I found that omitted_count was always equal to the difference between sum(blob_count, tag_count, tree_count, commit_count) in the unfiltered and filtered walks. I also found that the length of time required to perform the unfiltered walk and the filtered-with-non-NULL-omitted walk was the same, while the time required to perform the filtered walk with NULL omitted was significantly shorter. (The walk in question was over the latest release of Git master, plus the ten or so commits in my feature branch.) I was surprised! I figured that with filter "tree:1" that "omitted" would contain only the objects on the "border" of the filter - that is, I assumed it would contain the blobs and trees in the root git dir, but none of those trees' blobs and trees. After talking with jrnieder at length, it sounds like neither of us were clear on why this "omitted" list would be needed beyond the initial development stage of a new filter... Jonathan's impression was also that if we do need the "omitted" list, it may be a bug that we're still traversing objects which are only reachable from objects already omitted. I grepped the Git source and found that we only provide a non-NULL "omitted" when someone calls "git rev-list --filter-print-omitted", which we verify with a simple test case for "blobs:none", in which case the "border" objects which were omitted must be the same as all objects which were omitted (since blobs aren't pointing to anything else). I think if we had written a similar test case with some trees we expect to omit we might have noticed sooner. Since I was already in the rabbit hole, out of curiosity I did a search on Github and only found one project referencing --filter-print-omitted which wasn't a mirror of Git: https://github.com/search?l=Python&q=%22--filter-print-omitted%22&type=Code So, what do we use --filter-print-omitted for? Is anybody needing it? Or do we just use it to verify this one test case? Should we fix it, or get rid of it, or neither? - Emily
[PATCH] documentation: add tutorial for revision walking
Existing documentation on revision walks seems to be primarily intended as a reference for those already familiar with the procedure. This tutorial attempts to give an entry-level guide to a couple of bare-bones revision walks so that new Git contributors can learn the concepts without having to wade through options parsing or special casing. The target audience is a Git contributor who is just getting started with the concept of revision walking. The goal is to prepare this contributor to be able to understand and modify existing commands which perform revision walks more easily, although it will also prepare contributors to create new commands which perform walks. The tutorial covers a basic overview of the structs involved during revision walk, setting up a basic commit walk, setting up a basic all-object walk, and adding some configuration changes to both walk types. It intentionally does not cover how to create new commands or search for options from the command line or gitconfigs. There is an associated patchset at https://github.com/nasamuffin/git/tree/revwalk that contains a reference implementation of the code generated by this tutorial. Signed-off-by: Emily Shaffer --- This one is longer than the MyFirstContribution one, thanks in advance to anybody with the wherewithal to review this. I'll also be mailing an RFC patchset In-Reply-To this message; the RFC patchset should not be merged to Git, as I intend to host it in my own mirror as an example. I hosted a similar example for the MyFirstContribution tutorial; it's visible at https://github.com/nasamuffin/git/tree/psuh. There might be a better place to host these so I don't "own" them but I'm not sure what it is; keeping them as a live branch somewhere struck me as an okay way to keep them from getting stale. Looking forward to hearing everyone's comments! - Emily Documentation/.gitignore | 1 + Documentation/Makefile | 1 + Documentation/MyFirstRevWalk.txt | 826 +++ 3 files changed, 828 insertions(+) create mode 100644 Documentation/MyFirstRevWalk.txt diff --git a/Documentation/.gitignore b/Documentation/.gitignore index 9022d48355..0e3df737c5 100644 --- a/Documentation/.gitignore +++ b/Documentation/.gitignore @@ -12,6 +12,7 @@ cmds-*.txt mergetools-*.txt manpage-base-url.xsl SubmittingPatches.txt +MyFirstRevWalk.txt tmp-doc-diff/ GIT-ASCIIDOCFLAGS /GIT-EXCLUDED-PROGRAMS diff --git a/Documentation/Makefile b/Documentation/Makefile index dbf5a0f276..d57b80962f 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -77,6 +77,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica SP_ARTICLES += $(API_DOCS) TECH_DOCS += SubmittingPatches +TECH_DOCS += MyFirstRevWalk TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/http-protocol TECH_DOCS += technical/index-format diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt new file mode 100644 index 00..494c09d1fa --- /dev/null +++ b/Documentation/MyFirstRevWalk.txt @@ -0,0 +1,826 @@ +My First Revision Walk +== + +== What's a Revision Walk? + +The revision walk is a key concept in Git - this is the process that underpins +operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the +list of objects is found by walking parent relationships between objects. The +revision walk can also be usedto determine whether or not a given object is +reachable from the current HEAD pointer. + +=== Related Reading + +- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of + the revision walker in its various incarnations. +- `Documentation/technical/api-revision-walking.txt` +- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists] + gives a good overview of the types of objects in Git and what your revision + walk is really describing. + +== Setting Up + +Create a new branch from `master`. + + +git checkout -b revwalk origin/master + + +We'll put our fiddling into a new command. For fun, let's name it `git walken`. +Open up a new file `builtin/walken.c` and set up the command handler: + + +/* + * "git walken" + * + * Part of the "My First Revision Walk" tutorial. + */ + +#include +#include "builtin.h" + +int cmd_walken(int argc, const char **argv, const char *prefix) +{ +printf(_("cmd_walken incoming...\n")); +return 0; +} + + +Add usage text and `-h` handling, in order to pass the test suite: + + +static const char * const walken_usage[] = { + N_("git walken"), + NULL, +} + +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + struct option options[] = { + OPT_END() + }; + + argc = parse_options(argc, argv, prefix, options, walken_usage, 0
[RFC PATCH 01/13] walken: add infrastructure for revwalk demo
Begin to add scaffolding for `git walken`, a toy command which we will teach to perform a number of revision walks, in order to demonstrate the mechanics of revision walking for developers new to the Git project. This commit is the beginning of an educational series which correspond to the tutorial in Documentation/MyFirstRevWalk.txt. Signed-off-by: Emily Shaffer --- Makefile | 1 + builtin.h| 1 + builtin/walken.c | 14 ++ 3 files changed, 16 insertions(+) create mode 100644 builtin/walken.c diff --git a/Makefile b/Makefile index 8a7e235352..a25d46c7a3 100644 --- a/Makefile +++ b/Makefile @@ -1143,6 +1143,7 @@ BUILTIN_OBJS += builtin/var.o BUILTIN_OBJS += builtin/verify-commit.o BUILTIN_OBJS += builtin/verify-pack.o BUILTIN_OBJS += builtin/verify-tag.o +BUILTIN_OBJS += builtin/walken.o BUILTIN_OBJS += builtin/worktree.o BUILTIN_OBJS += builtin/write-tree.o diff --git a/builtin.h b/builtin.h index ec7e0954c4..c919736c36 100644 --- a/builtin.h +++ b/builtin.h @@ -242,6 +242,7 @@ int cmd_var(int argc, const char **argv, const char *prefix); int cmd_verify_commit(int argc, const char **argv, const char *prefix); int cmd_verify_tag(int argc, const char **argv, const char *prefix); int cmd_version(int argc, const char **argv, const char *prefix); +int cmd_walken(int argc, const char **argv, const char *prefix); int cmd_whatchanged(int argc, const char **argv, const char *prefix); int cmd_worktree(int argc, const char **argv, const char *prefix); int cmd_write_tree(int argc, const char **argv, const char *prefix); diff --git a/builtin/walken.c b/builtin/walken.c new file mode 100644 index 00..bfeaa5188d --- /dev/null +++ b/builtin/walken.c @@ -0,0 +1,14 @@ +/* + * "git walken" + * + * Part of the "My First Revision Walk" tutorial. + */ + +#include +#include "builtin.h" + +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + printf(_("cmd_walken incoming...\n")); + return 0; +} -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 00/13] example implementation of revwalk tutorial
This patchset is NOT intended to be merged to the Git project! This patchset should indicate what a contributor would generate by following the MyFirstRevWalk tutorial. I intend to push a feature branch with these patches to my own mirror of Git on Github (github.com/nasamuffin/git/tree/revwalk). I'm sending them for review by the list to check for consistency with the Git codebase, so they aren't a bad example for new contributors. Thanks for any reviews, all! - Emily Emily Shaffer (13): walken: add infrastructure for revwalk demo walken: add usage to enable -h walken: add placeholder to initialize defaults walken: add handler to git_config walken: configure rev_info and prepare for walk walken: perform our basic revision walk walken: filter for authors from gmail address walken: demonstrate various topographical sorts walken: demonstrate reversing a revision walk list walken: add unfiltered object walk from HEAD walken: add filtered object walk walken: count omitted objects walken: reverse the object walk order Makefile | 1 + builtin.h| 1 + builtin/walken.c | 263 +++ git.c| 1 + 4 files changed, 266 insertions(+) create mode 100644 builtin/walken.c -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 05/13] walken: configure rev_info and prepare for walk
`struct rev_info` is what's used by the struct itself. `repo_init_revisions()` initializes the struct; then we need to set it up for the walk we want to perform, which is done in `final_rev_info_setup()`. The most important step here is adding the first object we want to walk to the pending array. Here, we take the easy road and use `add_head_to_pending()`; there is also a way to do it with `setup_revision_opt()` and `setup_revisions()` which we demonstrate but do not use. If we were to forget this step, the walk would do nothing - the pending queue would be checked, determined to be empty, and the walk would terminate immediately. Signed-off-by: Emily Shaffer --- builtin/walken.c | 42 ++ 1 file changed, 42 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index 5d1666a5da..c101db38c7 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -6,6 +6,7 @@ #include #include "builtin.h" +#include "revision.h" #include "config.h" #include "parse-options.h" @@ -26,6 +27,35 @@ static void init_walken_defaults(void) */ } +/* + * cmd_log calls a second set of init after the repo_init_revisions call. We'll + * mirror those settings in post_repo_init_init. + */ +static void final_rev_info_setup(int argc, const char **argv, const char *prefix, + struct rev_info *rev) +{ + struct setup_revision_opt opt; + + /* setup_revision_opt is used to pass options to the setup_revisions() +* call. It's got some special items for submodules and other types of +* optimizations, but for now, we'll just point it to HEAD and call it +* good. First we should make sure to reset it. TODO: This is useful for +* more complicated stuff revisions, but a decent shortcut for the first +* pass is add_head_to_pending(). +*/ + memset(&opt, 0, sizeof(opt)); + opt.def = "HEAD"; + opt.revarg_opt = REVARG_COMMITTISH; + //setup_revisions(argc, argv, rev, &opt); + + /* Let's force oneline format. */ + get_commit_format("oneline", rev); + rev->verbose_header = 1; + + /* add the HEAD to pending so we can start */ + add_head_to_pending(rev); +} + /* * This method will be called back by git_config(). It is used to gather values * from the configuration files available to Git. @@ -54,12 +84,24 @@ int cmd_walken(int argc, const char **argv, const char *prefix) OPT_END() }; + struct rev_info rev; + argc = parse_options(argc, argv, prefix, options, walken_usage, 0); init_walken_defaults(); git_config(git_walken_config, NULL); + /* Time to set up the walk. repo_init_revisions sets up rev_info with +* the defaults, but then you need to make some configuration settings +* to make it do what's special about your walk. +*/ + repo_init_revisions(the_repository, &rev, prefix); + + /* Before we do the walk, we need to set a starting point. It's not +* coming from opt. */ + final_rev_info_setup(argc, argv, prefix, &rev); + printf(_("cmd_walken incoming...\n")); return 0; } -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 04/13] walken: add handler to git_config
For now, we have no configuration options we want to set up for ourselves, but in the future we may need to. At the very least, we should invoke git_default_config() for each config option; we will do so inside of a skeleton config callback so that we know where to add configuration handling later on when we need it. Signed-off-by: Emily Shaffer --- builtin/walken.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index dcee906556..5d1666a5da 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -6,6 +6,7 @@ #include #include "builtin.h" +#include "config.h" #include "parse-options.h" static const char * const walken_usage[] = { @@ -25,6 +26,28 @@ static void init_walken_defaults(void) */ } +/* + * This method will be called back by git_config(). It is used to gather values + * from the configuration files available to Git. + * + * Each time git_config() finds a configuration file entry, it calls this + * callback. Then, this function should compare it to entries which concern us, + * and make settings changes as necessary. + * + * If we are called with a config setting we care about, we should use one of + * the helpers which exist in config.h to pull out the value for ourselves, i.e. + * git_config_string(...) or git_config_bool(...). + * + * If we don't match anything, we should pass it along to another stakeholder + * who may otherwise care - in log's case, grep, gpg, and diff-ui. For our case, + * we'll ignore everybody else. + */ +static int git_walken_config(const char *var, const char *value, void *cb) +{ + /* For now, let's not bother with anything. */ + return git_default_config(var, value, cb); +} + int cmd_walken(int argc, const char **argv, const char *prefix) { struct option options[] = { @@ -34,6 +57,9 @@ int cmd_walken(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, options, walken_usage, 0); init_walken_defaults(); + + git_config(git_walken_config, NULL); + printf(_("cmd_walken incoming...\n")); return 0; } -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 08/13] walken: demonstrate various topographical sorts
Order the revision walk by author or commit dates, to demonstrate how to apply topo_sort to a revision walk. While following the tutorial, new contributors are guided to run a walk with each sort and compare the results. Signed-off-by: Emily Shaffer --- builtin/walken.c | 5 + 1 file changed, 5 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index 6c0f4e7b7a..716d31f04e 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -61,6 +61,11 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix /* add the HEAD to pending so we can start */ add_head_to_pending(rev); + + /* Let's play with the sort order. */ + rev->topo_order = 1; + rev->sort_order = REV_SORT_BY_COMMIT_DATE; + /* rev->sort_order = REV_SORT_BY_AUTHOR_DATE; */ } /* -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 13/13] walken: reverse the object walk order
Demonstrate that just like commit walks, object walks can have their order reversed. Additionally, add verbose logging of objects encountered in order to let contributors prove to themselves that the walk has actually been reversed. With this commit, `git walken` becomes extremely chatty - it's recommended to pipe the output through `head` or `tail` or to redirect it into a file. Signed-off-by: Emily Shaffer --- builtin/walken.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index d93725ee88..4bfee3a2d7 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -102,11 +102,13 @@ static int git_walken_config(const char *var, const char *value, void *cb) static void walken_show_commit(struct commit *cmt, void *buf) { + printf(_("commit: %s\n"), oid_to_hex(&cmt->object.oid)); commit_count++; } static void walken_show_object(struct object *obj, const char *str, void *buf) { + printf(_("%s: %s\n"), type_name(obj->type), oid_to_hex(&obj->oid)); switch (obj->type) { case OBJ_TREE: tree_count++; @@ -149,6 +151,7 @@ static int walken_object_walk(struct rev_info *rev) rev->tag_objects = 1; rev->tree_blobs_in_commit_order = 1; rev->exclude_promisor_objects = 1; + rev->reverse = 1; if (prepare_revision_walk(rev)) die(_("revision walk setup failed")); -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 09/13] walken: demonstrate reversing a revision walk list
The final installment in the tutorial about sorting revision walk outputs. This commit reverses the commit list, so that we see newer commits last (handy since we aren't using a pager). It's important to note that rev->reverse needs to be set after add_head_to_pending() or before setup_revisions(). (This is mentioned in the accompanying tutorial.) Signed-off-by: Emily Shaffer --- builtin/walken.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index 716d31f04e..86c8d29c48 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -61,6 +61,9 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix /* add the HEAD to pending so we can start */ add_head_to_pending(rev); + + /* Reverse the order */ + rev->reverse = 1; /* Let's play with the sort order. */ rev->topo_order = 1; -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 06/13] walken: perform our basic revision walk
Add the final steps needed and implement the walk loop itself. We add a method walken_commit_walk() which performs the final setup to revision.c and then iterates over commits from get_revision(). This basic walk only prints the subject line of each commit in the history. It is nearly equivalent to `git log --oneline`. Signed-off-by: Emily Shaffer --- builtin/walken.c | 41 + 1 file changed, 41 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index c101db38c7..9cf19a24ab 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -7,8 +7,11 @@ #include #include "builtin.h" #include "revision.h" +#include "commit.h" #include "config.h" #include "parse-options.h" +#include "pretty.h" +#include "line-log.h" static const char * const walken_usage[] = { N_("git walken"), @@ -78,6 +81,39 @@ static int git_walken_config(const char *var, const char *value, void *cb) return git_default_config(var, value, cb); } +/* + * walken_commit_walk() is invoked by cmd_walken() after initialization. It + * does the commit walk only. + */ +static int walken_commit_walk(struct rev_info *rev) +{ + struct commit *commit; + struct strbuf prettybuf; + + strbuf_init(&prettybuf, 0); + + + /* prepare_revision_walk() gets the final steps ready for a revision +* walk. We check the return value for errors. */ + if (prepare_revision_walk(rev)) { + die(_("revision walk setup failed")); + } + + /* Now we can start the real commit walk. get_revision grabs the next +* revision based on the contents of rev. +*/ + rev->diffopt.close_file = 0; + while ((commit = get_revision(rev)) != NULL) { + if (commit == NULL) + continue; + strbuf_reset(&prettybuf); + pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf); + printf(_("%s\n"), prettybuf.buf); + + } + return 0; +} + int cmd_walken(int argc, const char **argv, const char *prefix) { struct option options[] = { @@ -98,10 +134,15 @@ int cmd_walken(int argc, const char **argv, const char *prefix) */ repo_init_revisions(the_repository, &rev, prefix); + /* We can set our traversal flags here. */ + rev.always_show_header = 1; + /* Before we do the walk, we need to set a starting point. It's not * coming from opt. */ final_rev_info_setup(argc, argv, prefix, &rev); + walken_commit_walk(&rev); + printf(_("cmd_walken incoming...\n")); return 0; } -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 02/13] walken: add usage to enable -h
One requirement of the Git test suite is that all commands support '-h', which is captured by parse_options(). In order to support this flag, add a short usage text to walken.c and invoke parse_options(). With this change, we can now add cmd_walken to the builtins set and expect tests to pass, so we'll do so - cmd_walken is now open for business. Signed-off-by: Emily Shaffer --- builtin/walken.c | 12 git.c| 1 + 2 files changed, 13 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index bfeaa5188d..5ae7c7d93f 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -6,9 +6,21 @@ #include #include "builtin.h" +#include "parse-options.h" + +static const char * const walken_usage[] = { + N_("git walken"), + NULL, +}; int cmd_walken(int argc, const char **argv, const char *prefix) { + struct option options[] = { + OPT_END() + }; + + argc = parse_options(argc, argv, prefix, options, walken_usage, 0); + printf(_("cmd_walken incoming...\n")); return 0; } diff --git a/git.c b/git.c index 1bf9c94550..209c42836f 100644 --- a/git.c +++ b/git.c @@ -600,6 +600,7 @@ static struct cmd_struct commands[] = { { "verify-pack", cmd_verify_pack }, { "verify-tag", cmd_verify_tag, RUN_SETUP }, { "version", cmd_version }, + { "walken", cmd_walken, RUN_SETUP }, { "whatchanged", cmd_whatchanged, RUN_SETUP }, { "worktree", cmd_worktree, RUN_SETUP | NO_PARSEOPT }, { "write-tree", cmd_write_tree, RUN_SETUP }, -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 10/13] walken: add unfiltered object walk from HEAD
Provide a demonstration of a revision walk which traverses all types of object, not just commits. This type of revision walk is used for operations such as creating packfiles and performing fetches or clones, so it's useful to teach new developers how it works. For starters, only demonstrate the unfiltered version, as this will make the tutorial easier to follow. This commit is part of a tutorial on revision walking. Signed-off-by: Emily Shaffer --- builtin/walken.c | 79 ++-- 1 file changed, 77 insertions(+), 2 deletions(-) diff --git a/builtin/walken.c b/builtin/walken.c index 86c8d29c48..408af6c841 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -12,6 +12,7 @@ #include "parse-options.h" #include "pretty.h" #include "line-log.h" +#include "list-objects.h" #include "grep.h" static const char * const walken_usage[] = { @@ -19,6 +20,11 @@ static const char * const walken_usage[] = { NULL, }; +static int commit_count; +static int tag_count; +static int blob_count; +static int tree_count; + /* * Within init_walken_defaults() we can call into other useful defaults to set * in the global scope or on the_repository. It's okay to borrow from other @@ -93,6 +99,70 @@ static int git_walken_config(const char *var, const char *value, void *cb) return git_default_config(var, value, cb); } +static void walken_show_commit(struct commit *cmt, void *buf) +{ + commit_count++; +} + +static void walken_show_object(struct object *obj, const char *str, void *buf) +{ + switch (obj->type) { + case OBJ_TREE: + tree_count++; + break; + case OBJ_BLOB: + blob_count++; + break; + case OBJ_TAG: + tag_count++; + break; + case OBJ_COMMIT: + printf(_("Unexpectedly encountered a commit in " +"walken_show_object!\n")); + commit_count++; + break; + default: + printf(_("Unexpected object type %s!\n"), + type_name(obj->type)); + break; + } +} + +/* + * walken_object_walk() is invoked by cmd_walken() after initialization. It does + * a walk of all object types. + */ +static int walken_object_walk(struct rev_info *rev) +{ + struct list_objects_filter_options filter_options = {}; + struct oidset omitted; + oidset_init(&omitted, 0); + + printf("walken_object_walk beginning...\n"); + + rev->tree_objects = 1; + rev->blob_objects = 1; + rev->tag_objects = 1; + rev->tree_blobs_in_commit_order = 1; + rev->exclude_promisor_objects = 1; + + if (prepare_revision_walk(rev)) + die(_("revision walk setup failed")); + + commit_count = 0; + tag_count = 0; + blob_count = 0; + tree_count = 0; + + traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL); + + printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, " + "and %d trees.\n"), commit_count, blob_count, tag_count, + tree_count); + + return 0; +} + /* * walken_commit_walk() is invoked by cmd_walken() after initialization. It * does the commit walk only. @@ -151,9 +221,14 @@ int cmd_walken(int argc, const char **argv, const char *prefix) /* Before we do the walk, we need to set a starting point. It's not * coming from opt. */ - final_rev_info_setup(argc, argv, prefix, &rev); - walken_commit_walk(&rev); + if (1) { + add_head_to_pending(&rev); + walken_object_walk(&rev); + } else { + final_rev_info_setup(argc, argv, prefix, &rev); + walken_commit_walk(&rev); + } printf(_("cmd_walken incoming...\n")); return 0; -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 07/13] walken: filter for authors from gmail address
In order to demonstrate how to create grep filters for revision walks, filter the walk performed by cmd_walken() to print only commits which are authored by someone with a gmail address. This commit demonstrates how to append a grep pattern to a rev_info.grep_filter, to teach new contributors how to create their own more generalized grep filters during revision walks. Signed-off-by: Emily Shaffer --- builtin/walken.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/builtin/walken.c b/builtin/walken.c index 9cf19a24ab..6c0f4e7b7a 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -12,6 +12,7 @@ #include "parse-options.h" #include "pretty.h" #include "line-log.h" +#include "grep.h" static const char * const walken_usage[] = { N_("git walken"), @@ -25,9 +26,8 @@ static const char * const walken_usage[] = { */ static void init_walken_defaults(void) { - /* We don't actually need the same components `git log` does; leave this -* empty for now. -*/ + /* Needed by our grep filter. */ + init_grep_defaults(the_repository); } /* @@ -51,6 +51,10 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix opt.revarg_opt = REVARG_COMMITTISH; //setup_revisions(argc, argv, rev, &opt); + /* Add a grep pattern to the author line in the header. */ + append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, "gmail"); + compile_grep_patterns(&rev->grep_filter); + /* Let's force oneline format. */ get_commit_format("oneline", rev); rev->verbose_header = 1; @@ -77,7 +81,7 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix */ static int git_walken_config(const char *var, const char *value, void *cb) { - /* For now, let's not bother with anything. */ + grep_config(var, value, cb); return git_default_config(var, value, cb); } -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 12/13] walken: count omitted objects
It may be illuminating to see which objects were not included within a given filter. This also demonstrates, since filter-spec "tree:1" is used, that the 'omitted' list contains all objects which are omitted, not just the first objects which were omitted - that is, it continues to dereference omitted trees and commits. This is part of a tutorial on performing revision walks. Signed-off-by: Emily Shaffer --- builtin/walken.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/builtin/walken.c b/builtin/walken.c index f2c98bcd6b..d93725ee88 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -137,6 +137,9 @@ static int walken_object_walk(struct rev_info *rev) { struct list_objects_filter_options filter_options = {}; struct oidset omitted; + struct oidset_iter oit; + struct object_id *oid = NULL; + int omitted_count = 0; oidset_init(&omitted, 0); printf("walken_object_walk beginning...\n"); @@ -172,9 +175,15 @@ static int walken_object_walk(struct rev_info *rev) walken_show_commit, walken_show_object, NULL, &omitted); } + /* Count the omitted objects. */ + oidset_iter_init(&omitted, &oit); + + while ((oid = oidset_iter_next(&oit))) + omitted_count++; + printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, " - "and %d trees.\n"), commit_count, blob_count, tag_count, - tree_count); + "and %d trees; %d omitted objects.\n"), commit_count, + blob_count, tag_count, tree_count, omitted_count); return 0; } -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 03/13] walken: add placeholder to initialize defaults
Eventually, we will want a good place to initialize default variables for use during our revision walk(s) in `git walken`. For now, there's nothing to do here, but let's add the scaffolding so that it's easy to tell where to put the setup later on. Signed-off-by: Emily Shaffer --- builtin/walken.c | 13 + 1 file changed, 13 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index 5ae7c7d93f..dcee906556 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -13,6 +13,18 @@ static const char * const walken_usage[] = { NULL, }; +/* + * Within init_walken_defaults() we can call into other useful defaults to set + * in the global scope or on the_repository. It's okay to borrow from other + * functions which are doing something relatively similar to yours. + */ +static void init_walken_defaults(void) +{ + /* We don't actually need the same components `git log` does; leave this +* empty for now. +*/ +} + int cmd_walken(int argc, const char **argv, const char *prefix) { struct option options[] = { @@ -21,6 +33,7 @@ int cmd_walken(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, options, walken_usage, 0); + init_walken_defaults(); printf(_("cmd_walken incoming...\n")); return 0; } -- 2.22.0.rc1.311.g5d7573a151-goog
[RFC PATCH 11/13] walken: add filtered object walk
Demonstrate how filter specs can be used when performing a revision walk of all object types. In this case, tree depth is used. Contributors who are following the revision walking tutorial will be encouraged to run the revision walk with and without the filter in order to compare the number of objects seen in each case. Signed-off-by: Emily Shaffer --- builtin/walken.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/builtin/walken.c b/builtin/walken.c index 408af6c841..f2c98bcd6b 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -13,6 +13,7 @@ #include "pretty.h" #include "line-log.h" #include "list-objects.h" +#include "list-objects-filter-options.h" #include "grep.h" static const char * const walken_usage[] = { @@ -154,7 +155,22 @@ static int walken_object_walk(struct rev_info *rev) blob_count = 0; tree_count = 0; - traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL); + if (1) { + /* Unfiltered: */ + printf(_("Unfiltered object walk.\n")); + traverse_commit_list(rev, walken_show_commit, + walken_show_object, NULL); + } else { + printf(_("Filtered object walk with filterspec 'tree:1'.\n")); + /* +* We can parse a tree depth of 1 to demonstrate the kind of +* filtering that could occur eg during shallow cloning. +*/ + parse_list_objects_filter(&filter_options, "tree:1"); + + traverse_commit_list_filtered(&filter_options, rev, + walken_show_commit, walken_show_object, NULL, &omitted); + } printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, " "and %d trees.\n"), commit_count, blob_count, tag_count, -- 2.22.0.rc1.311.g5d7573a151-goog
Re: Is --filter-print-omitted correct/used/needed?
On Fri, Jun 07, 2019 at 02:57:01PM -0400, Jeff Hostetler wrote: > > > On 6/7/2019 2:38 AM, Christian Couder wrote: > > On Thu, Jun 6, 2019 at 10:18 PM Emily Shaffer > > wrote: > > > > > > > > > I grepped the Git source and found that we only provide a non-NULL > > > "omitted" when someone calls "git rev-list --filter-print-omitted", > > > which we verify with a simple test case for "blobs:none", in which > > > case the "border" objects which were omitted must be the same as all > > > objects which were omitted (since blobs aren't pointing to anything > > > else). I think if we had written a similar test case with some trees > > > we expect to omit we might have noticed sooner. > > > > It seems that --filter-print-omitted was introduced in caf3827e2f > > (rev-list: add list-objects filtering support, 2017-11-21) so I cc'ed > > Jeff. > > > > [...] > > The --filter-print-omitted was intended to print the complete list > of omitted objects. For example, a packfile built from a filtered > command and a packfile build from the unfiltered command would differ > by exactly that set of objects. > > So the discrepancy reported by the tree:1 example is incorrect. > The omitted set is the full set, not the frontier. So when > --filter-print-omitted is used, we still have to do the full tree walk. > When not specified, we do get the perf boost because we can terminate > the tree walk early. > > > > > So, what do we use --filter-print-omitted for? Is anybody needing it? > > > Or do we just use it to verify this one test case? Should we fix it, > > > or get rid of it, or neither? > > > > In caf3827e2f there is: > > > > This patch introduces handling of missing objects to help > > debugging and development of the "partial clone" mechanism, > > and once the mechanism is implemented, for a power user to > > perform operations that are missing-object aware without > > incurring the cost of checking if a missing link is expected. > > > > So I would say that if you think that --filter-print-omitted doesn't > > help in debugging or development, and can even be confusing, and that > > it also doesn't help performance for power users or anyone else, then > > it would make sense to remove it, unless you find a way to make it > > fulfill its original goals, or maybe other worthwhile goals. > > I don't currently have a use for that (other than the existing test > cases), but we could use that in the future as a guide for the server > to put the omitted objects on a CDN, for example. > > So I'd say let's leave it as is for now. Thanks for the input, Jeff. I wasn't sure from looking at it whether it was intended behavior with a plan for use; looks like it is. I'll leave it alone. > > > Jeff > > >
[PATCH] revision: remove stray whitespace when name empty
Teach show_object_with_name() to avoid writing a space before a name which is empty. Also teach tests for rev-list --objects --filter to not require a space between the object ID and name. show_object_with_name() inserts a space between an object's OID and name regardless of whether the name is empty or not. This causes 'git cat-file (--batch | --batch-check)' to fail to discover the type of root directories: git rev-list --objects --filter=tree:1 --max-count=1 HEAD \ | git cat-file --batch-check git rev-parse HEAD: | xargs -I% git cat-file -t % git rev-list --objects --filter=tree:1 --max-count=1 HEAD \ | xargs -I% echo "AA%AA" git rev-list --objects --filter=tree:1 --max-count=1 HEAD \ | cut -f 1 -d ' ' | git cat-file --batch-check The extra space provided by 'show_object_with_name()' sticks to the output of 'git rev-list' even if it's piped into a file. Signed-off-by: Emily Shaffer --- I don't see any reason _not_ to remove this stray whitespace at the end, since it seems like it just gets in the way of easy scripting. I also think this case will only present itself for root trees. Not sure if making the whitespace optional is the right solution for the test, although I couldn't come up with a cleaner approach. Maybe something like this would be better, even though handling it in the regex is shorter? if [[ -z "$name" ]] then grep "^$hash" actual else grep "^$hash $name" actual fi - Emily revision.c | 3 ++- t/t6112-rev-list-filters-objects.sh | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/revision.c b/revision.c index d4aaf0ef25..260258b371 100644 --- a/revision.c +++ b/revision.c @@ -40,7 +40,8 @@ void show_object_with_name(FILE *out, struct object *obj, const char *name) { const char *p; - fprintf(out, "%s ", oid_to_hex(&obj->oid)); + fprintf(out, "%s%s", oid_to_hex(&obj->oid), + strcmp(name, "") == 0 ? "" : " "); for (p = name; *p && *p != '\n'; p++) fputc(*p, out); fputc('\n', out); diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh index acd7f5ab80..e05faa7a67 100755 --- a/t/t6112-rev-list-filters-objects.sh +++ b/t/t6112-rev-list-filters-objects.sh @@ -288,7 +288,7 @@ expect_has () { name=$2 && hash=$(git -C r3 rev-parse $commit:$name) && - grep "^$hash $name$" actual + grep "^$hash \?$name$" actual } test_expect_success 'verify tree:1 includes root trees' ' -- 2.22.0.rc2.383.gf4fbbf30c2-goog
Re: [PATCH] revision: remove stray whitespace when name empty
On Fri, Jun 07, 2019 at 08:42:45PM -0400, Eric Sunshine wrote: > On Fri, Jun 7, 2019 at 6:59 PM Emily Shaffer wrote: > > Teach show_object_with_name() to avoid writing a space before a name > > which is empty. Also teach tests for rev-list --objects --filter to not > > require a space between the object ID and name. > > [...] > > Signed-off-by: Emily Shaffer > > --- > > Not sure if making the whitespace optional is the right solution for the > > test, although I couldn't come up with a cleaner approach. Maybe > > something like this would be better, even though handling it in the > > regex is shorter? > > > > if [[ -z "$name" ]] then > > In Git, we avoid Bash-isms. Just use 'test'. And, style is to place > 'then' on its own line. > > if test -z "$name" > then > > > diff --git a/revision.c b/revision.c > > @@ -40,7 +40,8 @@ void show_object_with_name(FILE *out, struct object *obj, > > const char *name) > > { > > - fprintf(out, "%s ", oid_to_hex(&obj->oid)); > > + fprintf(out, "%s%s", oid_to_hex(&obj->oid), > > + strcmp(name, "") == 0 ? "" : " "); > > for (p = name; *p && *p != '\n'; p++) > > fputc(*p, out); > > fputc('\n', out); > > It's subjective, but this starts to be a bit too noisy and unreadable. > An alternative: > > fputs(oid_to_hex(...), out); > if (*name) > putc(' ', out); > > > diff --git a/t/t6112-rev-list-filters-objects.sh > > b/t/t6112-rev-list-filters-objects.sh > > @@ -288,7 +288,7 @@ expect_has () { > > hash=$(git -C r3 rev-parse $commit:$name) && > > - grep "^$hash $name$" actual > > + grep "^$hash \?$name$" actual > > This would be the first use of \? with 'grep' in the test suite. I > would worry about portability. Your suggestion earlier to tailor > 'grep' invocation based upon whether $name is empty would be safer. Thanks for the review, Eric. I'm going to try again from scratch with Peff and Junio's suggestion about adding an option to remove object names, and I'll try to keep your general style concerns in mind when I do so. - Emily
Re: [PATCH] revision: remove stray whitespace when name empty
On Mon, Jun 10, 2019 at 09:29:14AM -0700, Junio C Hamano wrote: > Jeff King writes: > > > Your patch only helps with this at all because you're using the "tree:1" > > ... > > because there you'll have actual names which cat-file will choke on. So > > it seems like this is helping only a very limited use case. > > ... > > Alternatively, it would be reasonable to me to have an option for > > "rev-list --objects" to have an option to suppress the filename (and > > space) entirely. > > Yup, I think that is a more reasonable short-term change compared to > the patch being discussed, and ... I like this too. Starting work on it today. > > > I think in the longer run along those lines that "--objects" should > > allow cat-file style pretty-print formats, which would eliminate the > > need to pipe to cat-file in the first place. That makes this parsing > > problem go away entirely, and it's way more efficient to boot (rev-list > > already knows the types!). > > ... of course this makes tons of sense. Probably not going to start work on this now, but I will make a note to myself to have a look if I have some spare time soon, and keep an eye out for reviews in that area. > > Thanks. Thanks all for the feedback, I'll have a reroll soon. - Emily
[PATCH v2] rev-list: teach --oid-only to enable piping
Allow easier parsing by cat-file by giving rev-list an option to print only the OID of an object without any additional information. This is a short-term shim; later on, rev-list should be taught how to print the types of objects it finds in a format similar to cat-file's. Before this commit, the output from rev-list needed to be massaged before being piped to cat-file, like so: git rev-list --objects HEAD | cut -f 1 -d ' ' \ | git cat-file --batch-check This was especially unexpected when dealing with root trees, as an invisible whitespace exists at the end of the OID: git rev-list --objects --filter=tree:1 --max-count=1 HEAD \ | xargs -I% echo "AA%AA" Now, it can be piped directly, as in the added test case: git rev-list --objects --oid-only HEAD | git cat-file --batch-check Signed-off-by: Emily Shaffer Change-Id: I489bdf0a8215532e54017513ff7541d70e1b --- It didn't appear that using an existing --pretty string would do the trick, as the formatting options for --pretty apply specifically to commit objects (you can specify the commit hash and the tree hash, but I didn't see a way to more generally pretty-print all types of objects). rev-list doesn't appear to use parse-options-h, so I added a new option as simply as I could see without breaking existing style; it seemed wrong to add a flag to the `struct rev_list_info` as the definition of struct and flag values alike are contained in bisect.h. There are a couple other options to rev-list which are stored as globals. At the moment, this doesn't work with --abbrev, and although I think it'd be a good fit, right now --abbrev doesn't seem to work without --abbrev-commit, and both those options end up being eaten by setup_revisions() rather than by rev-list itself. - Emily builtin/rev-list.c | 21 - t/t6000-rev-list-misc.sh | 18 ++ 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 9f31837d30..ff07ea9564 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -48,7 +48,7 @@ static const char rev_list_usage[] = "--children\n" "--objects | --objects-edge\n" "--unpacked\n" -"--header | --pretty\n" +"--header | --pretty | --oid-only\n" "--abbrev= | --no-abbrev\n" "--abbrev-commit\n" "--left-right\n" @@ -75,6 +75,8 @@ enum missing_action { }; static enum missing_action arg_missing_action; +static int arg_oid_only; /* display only the oid of each object encountered */ + #define DEFAULT_OIDSET_SIZE (16*1024) static void finish_commit(struct commit *commit, void *data); @@ -90,6 +92,11 @@ static void show_commit(struct commit *commit, void *data) return; } + if (arg_oid_only) { + printf("%s\n", oid_to_hex(&commit->object.oid)); + return; + } + graph_show_commit(revs->graph); if (revs->count) { @@ -255,6 +262,10 @@ static void show_object(struct object *obj, const char *name, void *cb_data) display_progress(progress, ++progress_counter); if (info->flags & REV_LIST_QUIET) return; + if (arg_oid_only) { + printf("%s\n", oid_to_hex(&obj->oid)); + return; + } show_object_with_name(stdout, obj, name); } @@ -484,6 +495,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) if (skip_prefix(arg, "--missing=", &arg)) continue; /* already handled above */ + if (!strcmp(arg, ("--oid-only"))) { + arg_oid_only = 1; + continue; + } + usage(rev_list_usage); } @@ -499,6 +515,9 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) /* Only --header was specified */ revs.commit_format = CMIT_FMT_RAW; + if (arg_oid_only && revs.commit_format != CMIT_FMT_UNSPECIFIED) + die(_("cannot combine --oid-only and --pretty or --header")); + if ((!revs.commits && reflog_walk_empty(revs.reflog_info) && (!(revs.tag_objects || revs.tree_objects || revs.blob_objects) && !revs.pending.nr) && diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 0507999729..996ba1799b 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -48,6 +48,24 @@ test_expect_success 'rev-list --objects with pathspecs and copied files' ' ! grep one output ' +test_expect_success 'rev-list --objects --oid-only has no whitespace/names' ' + git rev-list --objects --oid-only HEAD >output &a
[RFC PATCH] rev-list: clarify --abbrev and --abbrev-commit usage
Indicate that --abbrev only works with --abbrev-commit also specified. It seems that simply running `git rev-list --abbrev=5` doesn't abbreviate commit OIDs. But the combination of `git rev-list --abbrev-commit --abbrev=5` works as expected. Clarify in the documentation by indicating that --abbrev is an optional addition to the --abbrev-commit option. --no-abbrev remains on a separate line as it can still be used to disable OID abbreviation even if --abbrev-commit has been specified. Signed-off-by: Emily Shaffer Change-Id: If9b1198938e1a3515ae6740241f7b791fb7a88bd --- I thought this was odd when I was working on the other rev-list changes - --abbrev doesn't do anything on its own. It looks like it does work by itself in other commands, but apparently not in rev-list. Listed this patch as RFC because maybe instead it's better to fix something so --abbrev can be used alone, or teach --abbrev-commit=. It looks like `git log --abbrev=5` also doesn't work the way one might expect, which makes sense to me, as they use the same internals for option parsing (parse_revisions()). The manpages for log and rev-list both correctly indicate that --abbrev= is an optional addition to --abbrev-commit. `git log -h` is generated by parse-options tooling and doesn't cover --abbrev-commit at all, but `git rev-list` doesn't use an option parser on its own and the usage is hardcoded. - Emily builtin/rev-list.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 9f31837d30..6ae0087b01 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -49,8 +49,8 @@ static const char rev_list_usage[] = "--objects | --objects-edge\n" "--unpacked\n" "--header | --pretty\n" -"--abbrev= | --no-abbrev\n" -"--abbrev-commit\n" +"--abbrev-commit [--abbrev=]\n" +"--no-abbrev\n" "--left-right\n" "--count\n" " special purpose:\n" -- 2.22.0.rc2.383.gf4fbbf30c2-goog
Re: [RFC PATCH] rev-list: clarify --abbrev and --abbrev-commit usage
On Fri, Jun 14, 2019 at 12:18:41PM -0400, Jeff King wrote: > On Thu, Jun 13, 2019 at 03:15:41PM -0700, Emily Shaffer wrote: > > > I thought this was odd when I was working on the other rev-list changes > > - --abbrev doesn't do anything on its own. It looks like it does work by > > itself in other commands, but apparently not in rev-list. > > > > Listed this patch as RFC because maybe instead it's better to fix > > something so --abbrev can be used alone, or teach --abbrev-commit=. > > It looks like `git log --abbrev=5` also doesn't work the way one might > > expect, which makes sense to me, as they use the same internals for > > option parsing (parse_revisions()). > > > > The manpages for log and rev-list both correctly indicate that > > --abbrev= is an optional addition to --abbrev-commit. `git log -h` is > > generated by parse-options tooling and doesn't cover --abbrev-commit at > > all, but `git rev-list` doesn't use an option parser on its own and the > > usage is hardcoded. > > Yeah, "--abbrev" is a bit tricky here. It is really "when you abbrev, do > it to this level". In "log", that means that "git log --abbrev=5 --raw" > _does_ do something useful (it abbreviates the blob hashes). And then > you may add "--abbrev-commit" on top to ask to abbreviate the commit > ids. > > And rev-list follows that same pattern, except that rev-list _never_ > shows diff output. You'd traditionally do (and this is how log was > implemented once upon a time): > > git rev-list HEAD | git diff-tree --stdin --abbrev=5 --raw > > But even there, we are not seeing the commit id output by rev-list. It > goes to diff-tree, which then formats it separately. So if you wanted > abbreviated commits there, you'd add "--abbrev-commit" to the diff-tree > invocation, not rev-list! > > So no, I cannot see a way in which "rev-list --abbrev" is useful without > "--abbrev-commit". Which means that perhaps the former should imply the > latter. Maybe it should; maybe this patch is a good excuse to do so, and enforce mutual exclusion of --abbrev-commit/--abbrev and --no-abbrev. Maybe it's also a good time to add --abbrev-commit=? > > And as you noticed in your other patch, there is no way to abbreviate > "--objects" output at all. I am not sure I have ever seen a good use for > that. Though to be honest, I am not sure that "--abbrev" is really all > that useful in the first place. Machine-readable output should never > abbreviate, and human-readable ones generally already do. > > But at any rate, before making any behavior changes it may make sense to > think about how they'd interact with "rev-list --objects" abbreviation, > if it were to be added. > > As for the patch itself: > > > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > > index 9f31837d30..6ae0087b01 100644 > > --- a/builtin/rev-list.c > > +++ b/builtin/rev-list.c > > @@ -49,8 +49,8 @@ static const char rev_list_usage[] = > > "--objects | --objects-edge\n" > > "--unpacked\n" > > "--header | --pretty\n" > > -"--abbrev= | --no-abbrev\n" > > -"--abbrev-commit\n" > > +"--abbrev-commit [--abbrev=]\n" > > +"--no-abbrev\n" > > So --no-abbrev clears both --abbrev and --abbrev-commit. That sort of > makes sense, though I did not expect it. But it also means that: > > --abbrev-commit [--abbrev= | --no-abbrev] > > is not right. Possibly: > > --abbrev-commit [--abbrev=] | --no-abbrev > > would show the interaction more clearly, but I don't have a strong > opinion. I did consider demonstrating it this way, but when both --abbrev-commit and --no-abbrev are used together, we don't complain or reject the invocation - which I would expect if the usage states the two options are mutually exclusive. I've been trying to think of good reasons not to enforce their mutual exclusion, and the one I keep coming back to is that --no-abbrev might be desired to override a git config'd abbreviation length - although I didn't check to see whether we have one, maybe we would want one later. And even in that case, I suppose that --abbrev-commit would not be explicitly added to the call (because we'd infer from the config), or that if it did need to be explicitly added (like if we need the user to say they want abbreviation, but we want to use their configured preferred length) then we could still reject the addition of --no-abbrev. So maybe it makes even more sense to take this patch as an opportunity to make these options mutually exclusive... although that checking I think would wind up in revision.c, and therefore widen the impact of the change significantly. >From a practical standpoint, I guess you could consider --abbrev-commit and --no-abbrev mutually exclusive, but since it's not enforced, I have a little preference not to document it as though it is... - Emily
Re: [RFC PATCH] rev-list: clarify --abbrev and --abbrev-commit usage
On Fri, Jun 14, 2019 at 05:27:14PM -0400, Jeff King wrote: > On Fri, Jun 14, 2019 at 01:59:50PM -0700, Emily Shaffer wrote: > > > > So no, I cannot see a way in which "rev-list --abbrev" is useful without > > > "--abbrev-commit". Which means that perhaps the former should imply the > > > latter. > > > > Maybe it should; maybe this patch is a good excuse to do so, and enforce > > mutual exclusion of --abbrev-commit/--abbrev and --no-abbrev. Maybe it's > > also a good time to add --abbrev-commit=? > > Hmm, yeah, I think that would reduce confusion quite a bit. If it were > "--abbrev-commit=", then "--abbrev" would not be useful for > anything in rev-list. It would still work as a historical item, but we > would not need or want to advertise it in the usage at all. Good > suggestion. Given your comments below, I think rather than enforcing mutual exclusion it makes more sense to enforce last-one-wins. But the thinking is essentially the same. > > > > is not right. Possibly: > > > > > > --abbrev-commit [--abbrev=] | --no-abbrev > > > > > > would show the interaction more clearly, but I don't have a strong > > > opinion. > > > > I did consider demonstrating it this way, but when both --abbrev-commit > > and --no-abbrev are used together, we don't complain or reject the > > invocation - which I would expect if the usage states the two options > > are mutually exclusive. > > Ah, I see. I don't consider "|" to indicate an exclusion to the point > that the options are rejected. Only that you wouldn't want to use both, > because one counteracts the other. So every "--no-foo" is mutually > exclusive with "--foo" in the sense that one override the other. But the > outcome is "last one wins", and not "whoops, we cannot figure out what > you meant". And that's what the original: > > --abbrev= | --no-abbrev > > before your patch was trying to say (and I suspect there are many other > cases of "|" with this kind of last-one-wins behavior). For what it's worth, in this case it's not last-one-wins - --no-abbrev always wins: emilyshaffer@podkayne:~/git [master]$ g rev-list --abbrev-commit --no-abbrev --max-count=5 --pretty=oneline HEAD b697d92f56511e804b8ba20ccbe7bdc85dc66810 Git 2.22 6ee1eaca3e996e69691f515742129645f453e0e8 Merge tag 'l10n-2.22.0-rnd3' of git://github.com/git-l10n/git-po 0cdb8d2db2f39d1a29636975168c457d2dc0d466 Merge branch 'fr_review' of git://github.com/jnavila/git d0149200792f579151166a4a5bfae7e66c5d998b Merge branch 'master' of git://github.com/alshopov/git-po 82eb147dbbbd0221980883e87ca7efd16a939a6f l10n: fr.po: Review French translation emilyshaffer@podkayne:~/git [master]$ g rev-list --no-abbrev --abbrev-commit --max-count=5 --pretty=oneline HEAD b697d92f56511e804b8ba20ccbe7bdc85dc66810 Git 2.22 6ee1eaca3e996e69691f515742129645f453e0e8 Merge tag 'l10n-2.22.0-rnd3' of git://github.com/git-l10n/git-po 0cdb8d2db2f39d1a29636975168c457d2dc0d466 Merge branch 'fr_review' of git://github.com/jnavila/git d0149200792f579151166a4a5bfae7e66c5d998b Merge branch 'master' of git://github.com/alshopov/git-po 82eb147dbbbd0221980883e87ca7efd16a939a6f l10n: fr.po: Review French translation emilyshaffer@podkayne:~/git [master]$ g rev-list --abbrev-commit --max-count=5 --pretty=oneline HEAD b697d92f56 Git 2.22 6ee1eaca3e Merge tag 'l10n-2.22.0-rnd3' of git://github.com/git-l10n/git-po 0cdb8d2db2 Merge branch 'fr_review' of git://github.com/jnavila/git d014920079 Merge branch 'master' of git://github.com/alshopov/git-po 82eb147dbb l10n: fr.po: Review French translation > > > I've been trying to think of good reasons not to enforce their mutual > > exclusion, and the one I keep coming back to is that --no-abbrev might > > be desired to override a git config'd abbreviation length - although I > > didn't check to see whether we have one, maybe we would want one later. > > And even in that case, I suppose that --abbrev-commit would not be > > explicitly added to the call (because we'd infer from the config), or > > that if it did need to be explicitly added (like if we need the user to > > say they want abbreviation, but we want to use their configured > > preferred length) then we could still reject the addition of > > --no-abbrev. > > > > So maybe it makes even more sense to take this patch as an opportunity > > to make these options mutually exclusive... although that checking I > > think would
Re: [PATCH v2] rev-list: teach --oid-only to enable piping
On Fri, Jun 14, 2019 at 01:25:59PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > But I wonder if things would be simpler if we did not touch the commit > > code path at all. I.e., if this were simply "--no-object-names", and it > > touched only show_object(). > > Yeah, that sounds more tempting. And the refined code structure you > suggested ... > > >> @@ -255,6 +262,10 @@ static void show_object(struct object *obj, const > >> char *name, void *cb_data) > >>display_progress(progress, ++progress_counter); > >>if (info->flags & REV_LIST_QUIET) > >>return; > >> + if (arg_oid_only) { > >> + printf("%s\n", oid_to_hex(&obj->oid)); > >> + return; > >> + } > >>show_object_with_name(stdout, obj, name); > >> } > >> > > > > A minor style point, but I think this might be easier to follow without > > the early return, since we are really choosing to do A or B. Writing: > > > > if (arg_oid_only) > > printf(...); > > else > > show_object_with_name(...); > > > > shows that more clearly, I think. > > ... is a good way to clearly show that intention, I would think. Sounds good. Thanks, both; I'll reroll that quickly today.
Re: [PATCH v2] rev-list: teach --oid-only to enable piping
On Fri, Jun 14, 2019 at 12:07:28PM -0400, Jeff King wrote: > On Thu, Jun 13, 2019 at 02:51:03PM -0700, Emily Shaffer wrote: > > +test_expect_success 'rev-list --objects --oid-only is usable by cat-file' ' > > + git rev-list --objects --oid-only --all >list-output && > > + git cat-file --batch-check cat-output && > > + ! grep missing cat-output > > +' > > Usually we prefer to look for the expected output, rather than making > sure we did not find the unexpected. But I'm not sure if that might be > burdensome in this case (i.e., if there's a bunch of cruft coming out of > "rev-list" that would be annoying to match, and might even change as > people add more tests). So I'm OK with it either way. My (newbie) opinion is that in this case, we specifically want to know that cat-file didn't choke on objects which we know exist (since they came from rev-list). I have the feeling that checking for the exact objects returned instead (or a sample of them) would be more brittle and would also make the wording of the test less direct. So if there's no complaint either way, I'd prefer to leave it the way it is. By the way, rev-list-misc.sh has a number of other existing "! grep ..." lines. - Emily
[PATCH v3] rev-list: teach --no-object-names to enable piping
Allow easier parsing by cat-file by giving rev-list an option to print only the OID of a non-commit object without any additional information. This is a short-term shim; later on, rev-list should be taught how to print the types of objects it finds in a format similar to cat-file's. Before this commit, the output from rev-list needed to be massaged before being piped to cat-file, like so: git rev-list --objects HEAD | cut -f 1 -d ' ' \ | git cat-file --batch-check This was especially unexpected when dealing with root trees, as an invisible whitespace exists at the end of the OID: git rev-list --objects --filter=tree:1 --max-count=1 HEAD \ | xargs -I% echo "AA%AA" Now, it can be piped directly, as in the added test case: git rev-list --objects --no-object-names HEAD | git cat-file --batch-check Signed-off-by: Emily Shaffer Change-Id: I489bdf0a8215532e54017513ff7541d70e1b --- Based on Peff and Junio's comments, made following changes since v2: - Removed interaction with commit objects - Renamed option to "no-object-names" - Removed warnings when new option is combined with commit-formatting options (and reflected in usage) - Simplified logic in show_object() Thanks for the thoughts, all. - Emily builtin/rev-list.c | 14 +- t/t6000-rev-list-misc.sh | 18 ++ 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 660172b014..7e2598fd22 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -49,6 +49,7 @@ static const char rev_list_usage[] = "--objects | --objects-edge\n" "--unpacked\n" "--header | --pretty\n" +"--no-object-names\n" "--abbrev= | --no-abbrev\n" "--abbrev-commit\n" "--left-right\n" @@ -75,6 +76,9 @@ enum missing_action { }; static enum missing_action arg_missing_action; +/* display only the oid of each object encountered */ +static int arg_no_object_names; + #define DEFAULT_OIDSET_SIZE (16*1024) static void finish_commit(struct commit *commit); @@ -255,7 +259,10 @@ static void show_object(struct object *obj, const char *name, void *cb_data) display_progress(progress, ++progress_counter); if (info->flags & REV_LIST_QUIET) return; - show_object_with_name(stdout, obj, name); + if (arg_no_object_names) + printf("%s\n", oid_to_hex(&obj->oid)); + else + show_object_with_name(stdout, obj, name); } static void show_edge(struct commit *commit) @@ -484,6 +491,11 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) if (skip_prefix(arg, "--missing=", &arg)) continue; /* already handled above */ + if (!strcmp(arg, ("--no-object-names"))) { + arg_no_object_names = 1; + continue; + } + usage(rev_list_usage); } diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 0507999729..5d87171b99 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -48,6 +48,24 @@ test_expect_success 'rev-list --objects with pathspecs and copied files' ' ! grep one output ' +test_expect_success 'rev-list --objects --no-object-names has no space/names' ' + git rev-list --objects --no-object-names HEAD >output && + ! grep wanted_file output && + ! grep unwanted_file output && + ! grep " " output +' + +test_expect_success 'rev-list --objects --no-object-names works with cat-file' ' + git rev-list --objects --no-object-names --all >list-output && + git cat-file --batch-check cat-output && + ! grep missing cat-output +' + +test_expect_success 'rev-list --oid-only is incompatible with --pretty' ' + test_must_fail git rev-list --objects --oid-only --pretty HEAD && + test_must_fail git rev-list --objects --oid-only --header HEAD +' + test_expect_success 'rev-list A..B and rev-list ^A B are the same' ' git commit --allow-empty -m another && git tag -a -m "annotated" v1.0 && -- 2.22.0.410.gd8fdbe21b5-goog
Re: [RFC PATCH 11/13] walken: add filtered object walk
On Fri, Jun 07, 2019 at 03:15:53PM -0400, Jeff Hostetler wrote: > > > On 6/6/2019 9:08 PM, Emily Shaffer wrote: > > Demonstrate how filter specs can be used when performing a revision walk > > of all object types. In this case, tree depth is used. Contributors who > > are following the revision walking tutorial will be encouraged to run > > the revision walk with and without the filter in order to compare the > > number of objects seen in each case. > > > > Signed-off-by: Emily Shaffer > > --- > > builtin/walken.c | 18 +- > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > diff --git a/builtin/walken.c b/builtin/walken.c > > index 408af6c841..f2c98bcd6b 100644 > > --- a/builtin/walken.c > > +++ b/builtin/walken.c > > @@ -13,6 +13,7 @@ > > #include "pretty.h" > > #include "line-log.h" > > #include "list-objects.h" > > +#include "list-objects-filter-options.h" > > #include "grep.h" > > static const char * const walken_usage[] = { > > @@ -154,7 +155,22 @@ static int walken_object_walk(struct rev_info *rev) > > blob_count = 0; > > tree_count = 0; > > - traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL); > > + if (1) { > > + /* Unfiltered: */ > > + printf(_("Unfiltered object walk.\n")); > > + traverse_commit_list(rev, walken_show_commit, > > + walken_show_object, NULL); > > + } else { > > + printf(_("Filtered object walk with filterspec 'tree:1'.\n")); > > + /* > > +* We can parse a tree depth of 1 to demonstrate the kind of > > +* filtering that could occur eg during shallow cloning. > > +*/ > > I think I'd avoid the term "shallow clone" here. Shallow clone > refers to getting a limited commit history. That's orthogonal from > partial clone and the filtered tree walk that operates *within* a commit > or a series of commits. > > Granted, a user might want to do both a shallow and partial clone (and > then later partial fetches), but I wouldn't mix the concepts here. It's a valid complaint. I removed the mention of shallow cloning and replaced it with a reference to the documentation for --filter in rev-list. Thanks. - Emily
Re: [PATCH] documentation: add tutorial for revision walking
On Fri, Jun 07, 2019 at 02:21:07AM -0400, Eric Sunshine wrote: > On Thu, Jun 6, 2019 at 9:08 PM Emily Shaffer wrote: > > [...] > > The tutorial covers a basic overview of the structs involved during > > revision walk, setting up a basic commit walk, setting up a basic > > all-object walk, and adding some configuration changes to both walk > > types. It intentionally does not cover how to create new commands or > > search for options from the command line or gitconfigs. > > [...] > > Signed-off-by: Emily Shaffer > > --- > > diff --git a/Documentation/.gitignore b/Documentation/.gitignore > > @@ -12,6 +12,7 @@ cmds-*.txt > > SubmittingPatches.txt > > +MyFirstRevWalk.txt > > The new file itself is named Documentation/MyFirstRevWalk.txt, so why > add it to .gitignore? Yep, fixed. Holdover from an initial attempt which named the file MyFirstRevWalk (no extension), which was then corrected for the earlier tutorial I sent. Thanks. > > > diff --git a/Documentation/MyFirstRevWalk.txt > > b/Documentation/MyFirstRevWalk.txt > > @@ -0,0 +1,826 @@ > > +== What's a Revision Walk? > > + > > +The revision walk is a key concept in Git - this is the process that > > underpins > > +operations like `git log`, `git blame`, and `git reflog`. Beginning at > > HEAD, the > > +list of objects is found by walking parent relationships between objects. > > The > > +revision walk can also be usedto determine whether or not a given object is > > s/usedto/used to/ Done. > > > +reachable from the current HEAD pointer. > > + > > +We'll put our fiddling into a new command. For fun, let's name it `git > > walken`. > > +Open up a new file `builtin/walken.c` and set up the command handler: > > + > > + > > +/* > > + * "git walken" > > + * > > + * Part of the "My First Revision Walk" tutorial. > > + */ > > + > > +#include > > +#include "builtin.h" > > Git source files must always include cache.h or git-compat-util.h (or, > for builtins, builtin.h) as the very first header since those headers > take care of differences which might crop up as problems with system > headers on various platforms. System headers are included after Git > headers. So, stdio.h should be included after builtin.h. In this case, > however, stdio.h will get pulled in by git-compat-util.h anyhow, so > you need not include it here. Done. > > > +Add usage text and `-h` handling, in order to pass the test suite: > > + > > + > > +static const char * const walken_usage[] = { > > + N_("git walken"), > > + NULL, > > +} > > Unless you plan on referencing this from functions other than > cmd_walken(), it need not be global. Done; bad C++ habits sneaking in. :) > > > +int cmd_walken(int argc, const char **argv, const char *prefix) > > +{ > > + struct option options[] = { > > + OPT_END() > > + }; > > + > > + argc = parse_options(argc, argv, prefix, options, walken_usage, 0); > > + > > + ... > > Perhaps comment out the "..." or remove it altogether to avoid having > the compiler barf when the below instructions tell the reader to build > the command. Hmm. That part I'm not so sure about. I like to use the "..." to indicate where the code in the snippet should be added around the other code already in the file - which I suppose it does just as clearly if it's commented - but I also hope folks are not simply copy-pasting blindly from the tutorial. It seems like including uncommented "..." in code tutorials is pretty common. I don't think I have a good reason to push back on this except that I think "/* ... */" is ugly :) I'll go through and replace "..." with some actual hints about what's supposed to go there; for example, here I'll replace with "/* print and return */". > > > +} > > + > > +Also add the relevant line in builtin.h near `cmd_whatchanged()`: > > s/builtin.h/`&`/ Done. > > > +Build and test out your command, without forgetting to ensure the > > `DEVELOPER` > > +flag is set: > > + > > + > > +echo DEVELOPER=1 >config.mak > > This will blast existing content of 'config.mak' which could be > dangerous. It might be better to suggest >> instead. Done. > > > +`name` is the SHA-1 of the object - a 40-digit hex string you may be > > familiar > > +with from using Git to organize your source in the past. Chec
Re: [PATCH] documentation: add tutorial for revision walking
On Mon, Jun 10, 2019 at 01:49:41PM -0700, Junio C Hamano wrote: > Emily Shaffer writes: > > > +My First Revision Walk > > +== > > + > > +== What's a Revision Walk? > > + > > +The revision walk is a key concept in Git - this is the process that > > underpins > > +operations like `git log`, `git blame`, and `git reflog`. Beginning at > > HEAD, the > > +list of objects is found by walking parent relationships between objects. > > The > > +revision walk can also be usedto determine whether or not a given object is > > +reachable from the current HEAD pointer. > > s/usedto/used to/; Done. > > > +We'll put our fiddling into a new command. For fun, let's name it `git > > walken`. > > +Open up a new file `builtin/walken.c` and set up the command handler: > > + > > + > > +/* > > + * "git walken" > > + * > > + * Part of the "My First Revision Walk" tutorial. > > + */ > > + > > +#include > > Bad idea. In the generic part of the codebase, system headers are > supposed to be supplied by including git-compat-util.h (or cache.h > or builtin.h, that are common header files that begin by including > it and are allowed by CodingGuidelines to be used as such). Done. > > > +#include "builtin.h" > > + > > +int cmd_walken(int argc, const char **argv, const char *prefix) > > +{ > > +printf(_("cmd_walken incoming...\n")); > > +return 0; > > +} > > + > > I wonder if it makes sense to use trace instead of printf, as our > reader has already seen the psuh example for doing the above. Hmmm. I will think about it and look into the intended use of each. I hadn't considered using a different logging method. > > > +Add usage text and `-h` handling, in order to pass the test suite: > > It is not wrong per-se, and it indeed is a very good practice to > make sure that our subcommands consistently gives usage text and > short usage. Encouraging them early is a good idea. > > But "in order to pass the test suite" invites "eh, the test suite > does not pass without usage and -h? why?". > > Either drop the mention of "the test suite", or perhaps say > something like > > Add usage text and `-h` handling, like all the subcommands > should consistently do (our test suite will notice and > complain if you fail to do so). > > i.e. the real purpose is consistency and usability; test suite is > merely an enforcement mechanism. Yeah, you're right. I'll reword this. > > > + > > +{ "walken", cmd_walken, RUN_SETUP }, > > + > > + > > +Add it to the `Makefile` near the line for `builtin\worktree.o`: > > Backslash intended? Nope, typo. Thanks for the comments, Junio. - Emily
Re: [PATCH] documentation: add tutorial for revision walking
On Mon, Jun 10, 2019 at 01:25:14PM -0700, Junio C Hamano wrote: > Emily Shaffer writes: > > > I'll also be mailing an RFC patchset In-Reply-To this message; the RFC > > patchset should not be merged to Git, as I intend to host it in my own > > mirror as an example. I hosted a similar example for the > > MyFirstContribution tutorial; it's visible at > > https://github.com/nasamuffin/git/tree/psuh. There might be a better > > place to host these so I don't "own" them but I'm not sure what it is; > > keeping them as a live branch somewhere struck me as an okay way to keep > > them from getting stale. > > Yes, writing the initial version is one thing, but keeping it alive > is more work and more important. As the underlying API changes over > time, it will become necessary to update the sample implementation, > but for a newbie who wants to learn by building "walken" on top of > the then-current codebase and API, it would not be so helpful to > show "these 7 patches were for older codebase, and the tip 2 are > incremental updates to adjust to the newer API", so the maintenance > of these sample patches may need different paradigm than the norm > for our main codebase that values incremental polishing. > I'm trying to think of how it would end up working if I tried to use a Github workflow. I think it wouldn't - someone would open a PR, and then I'd have to rewrite that change into the appropriate commit in the live branch and push the entire branch anew. Considering that workflow leaves me doubly convinced that leaving it in my personal fork indefinitely might not be wise (what if I become unable to continue maintaining it)? I wonder if this is something that might fit well in one of the more closely-associated mirrors, like gitster/git or gitgitgadget/git - although I wonder if those count as "owned" by Junio and Johannes, respectively. H. Maybe there's a case for storing them as a set of patch files that are revision-controlled somewhere within Documentation/? There was some discussion on the IRC a few weeks ago about trying to organize these tutorials into their own directory to form a sort of "Git Contribution 101" course, maybe it makes sense to store there? Documentation/contributing/myfirstcontrib/MyFirstContrib.txt Documentation/contributing/myfirstcontrib/sample/*.patch Documentation/contributing/myfirstrevwalk/MyFirstRevWalk.txt Documentation/contributing/myfirstrevwalk/sample/*.patch I don't love the idea of maintaining text patches with the expectation that they should cleanly apply always, but it might make the idea that they shouldn't contain 2 patches on the tip for API adjustment more clear. And it would be probably pretty easy to inflate and build them with a build target or something. Hm. - Emily
Re: [PATCH v3] rev-list: teach --no-object-names to enable piping
On Mon, Jun 17, 2019 at 03:32:34PM -0700, Junio C Hamano wrote: > Emily Shaffer writes: > > > Allow easier parsing by cat-file by giving rev-list an option to print > > only the OID of a non-commit object without any additional information. > > This is a short-term shim; later on, rev-list should be taught how to > > print the types of objects it finds in a format similar to cat-file's. > > > > Before this commit, the output from rev-list needed to be massaged > > before being piped to cat-file, like so: > > > > git rev-list --objects HEAD | cut -f 1 -d ' ' \ > > | git cat-file --batch-check > > Write this with '|' at the end of the first line; that way the shell > knows you haven't finished your sentence without any backslash. Oh cool. Will do. > > > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > > index 660172b014..7e2598fd22 100644 > > --- a/builtin/rev-list.c > > +++ b/builtin/rev-list.c > > @@ -49,6 +49,7 @@ static const char rev_list_usage[] = > > "--objects | --objects-edge\n" > > "--unpacked\n" > > "--header | --pretty\n" > > +"--no-object-names\n" > > Ideally, this should be "--[no-]object-names", i.e. --object-names > which may be the default when using --objects could be tweaked with > --no-object-names and then again turned on with a --object-names > later on the command line, following the usual "last one wins". Sure, good point. Will fix. > > > @@ -75,6 +76,9 @@ enum missing_action { > > }; > > static enum missing_action arg_missing_action; > > > > +/* display only the oid of each object encountered */ > > +static int arg_no_object_names; > > And this would become > > static int show_object_names = 1; > > that can be turned off via --no-object-names (and --object-names > would flip it on). It would reduce the double negation brain > twister, hopefully. > I'll send a new patch today. Thanks. - Emily
[PATCH v4] rev-list: teach --no-object-names to enable piping
Allow easier parsing by cat-file by giving rev-list an option to print only the OID of a non-commit object without any additional information. This is a short-term shim; later on, rev-list should be taught how to print the types of objects it finds in a format similar to cat-file's. Before this commit, the output from rev-list needed to be massaged before being piped to cat-file, like so: git rev-list --objects HEAD | cut -f 1 -d ' ' | git cat-file --batch-check This was especially unexpected when dealing with root trees, as an invisible whitespace exists at the end of the OID: git rev-list --objects --filter=tree:1 --max-count=1 HEAD | xargs -I% echo "AA%AA" Now, it can be piped directly, as in the added test case: git rev-list --objects --no-object-names HEAD | git cat-file --batch-check Signed-off-by: Emily Shaffer Change-Id: I489bdf0a8215532e54017513ff7541d70e1b --- Since v3, added a corresponding "--object-names" arg to pair with "--no-object-names", and "last-one-wins" logic. Also added a test to validate this new arg and the logic. I did not take Junio's suggestion of naming the arg "show_object_names" as "arg_show_object_names" better matches the existing style of builtin/revlist.c. In adding the test, I noticed that I had left in a test about --oid-only that doesn't apply after the changes from v2->v3; that test is removed. - Emily builtin/rev-list.c | 19 ++- t/t6000-rev-list-misc.sh | 20 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 660172b014..301ccb970b 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -49,6 +49,7 @@ static const char rev_list_usage[] = "--objects | --objects-edge\n" "--unpacked\n" "--header | --pretty\n" +"--[no-]object-names\n" "--abbrev= | --no-abbrev\n" "--abbrev-commit\n" "--left-right\n" @@ -75,6 +76,9 @@ enum missing_action { }; static enum missing_action arg_missing_action; +/* display only the oid of each object encountered */ +static int arg_show_object_names = 1; + #define DEFAULT_OIDSET_SIZE (16*1024) static void finish_commit(struct commit *commit); @@ -255,7 +259,10 @@ static void show_object(struct object *obj, const char *name, void *cb_data) display_progress(progress, ++progress_counter); if (info->flags & REV_LIST_QUIET) return; - show_object_with_name(stdout, obj, name); + if (arg_show_object_names) + show_object_with_name(stdout, obj, name); + else + printf("%s\n", oid_to_hex(&obj->oid)); } static void show_edge(struct commit *commit) @@ -484,6 +491,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) if (skip_prefix(arg, "--missing=", &arg)) continue; /* already handled above */ + if (!strcmp(arg, ("--no-object-names"))) { + arg_show_object_names = 0; + continue; + } + + if (!strcmp(arg, ("--object-names"))) { + arg_show_object_names = 1; + continue; + } + usage(rev_list_usage); } diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 0507999729..52a9e38d66 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -48,6 +48,26 @@ test_expect_success 'rev-list --objects with pathspecs and copied files' ' ! grep one output ' +test_expect_success 'rev-list --objects --no-object-names has no space/names' ' + git rev-list --objects --no-object-names HEAD >output && + ! grep wanted_file output && + ! grep unwanted_file output && + ! grep " " output +' + +test_expect_success 'rev-list --objects --no-object-names works with cat-file' ' + git rev-list --objects --no-object-names --all >list-output && + git cat-file --batch-check cat-output && + ! grep missing cat-output +' + +test_expect_success '--no-object-names and --object-names are last-one-wins' ' + git rev-list --objects --no-object-names --object-names --all >output && + grep wanted_file output && + git rev-list --objects --object-names --no-object-names --all >output && + ! grep wanted_file output +' + test_expect_success 'rev-list A..B and rev-list ^A B are the same' ' git commit --allow-empty -m another && git tag -a -m "annotated" v1.0 && -- 2.22.0.410.gd8fdbe21b5-goog
Re: [PATCH v4] rev-list: teach --no-object-names to enable piping
On Wed, Jun 19, 2019 at 07:08:14AM -0700, Junio C Hamano wrote: > Emily Shaffer writes: > > > Since v3, added a corresponding "--object-names" arg to pair with > > "--no-object-names", and "last-one-wins" logic. Also added a test to > > validate this new arg and the logic. > > Thanks for a quick turnaround (unfortunately, I was OOO yesterday > and I am half-sick today, so please expect slow responses---sorry > about that). > > > In adding the test, I noticed that I had left in a test about --oid-only > > that doesn't apply after the changes from v2->v3; that test is removed. > > I noticed in range-diff, too. So now --object-names can be used > with --pretty (not that "rev-list --pretty --objects" makes much > sense in the first place, so no point in testing that it works). Yeah, it works. It looks weird, but it works pretty much as you'd expect: emilyshaffer@podkayne:~/git-second [stray-whitespace]$ tg2 rev-list --object-names --objects --pretty=short --max-count=1 HEAD | head -n 20 commit 701c66d5f2fafe163892fa0968ce8bca041dbc92 Author: Emily Shaffer rev-list: teach --no-object-names to enable piping d4b1d372d16aaff35b221afce017f90542fd9293 41d4cd23fd97f599053a19555a173894da71e560 .clang-format 42cdc4bbfb05934bb9c3ed2fe0e0d45212c32d7a .editorconfig emilyshaffer@podkayne:~/git-second [stray-whitespace]$ tg2 rev-list --no-object-names --objects --pretty=short --max-count=1 HEAD | head -n 20 commit 701c66d5f2fafe163892fa0968ce8bca041dbc92 Author: Emily Shaffer rev-list: teach --no-object-names to enable piping d4b1d372d16aaff35b221afce017f90542fd9293 41d4cd23fd97f599053a19555a173894da71e560 42cdc4bbfb05934bb9c3ed2fe0e0d45212c32d7a Ah, but when I was grabbing these samples, I noticed that I didn't update the manpage for rev-list. So please wait while I reroll again... sorry! - Emily
[PATCH v5] rev-list: teach --no-object-names to enable piping
Allow easier parsing by cat-file by giving rev-list an option to print only the OID of a non-commit object without any additional information. This is a short-term shim; later on, rev-list should be taught how to print the types of objects it finds in a format similar to cat-file's. Before this commit, the output from rev-list needed to be massaged before being piped to cat-file, like so: git rev-list --objects HEAD | cut -f 1 -d ' ' | git cat-file --batch-check This was especially unexpected when dealing with root trees, as an invisible whitespace exists at the end of the OID: git rev-list --objects --filter=tree:1 --max-count=1 HEAD | xargs -I% echo "AA%AA" Now, it can be piped directly, as in the added test case: git rev-list --objects --no-object-names HEAD | git cat-file --batch-check Signed-off-by: Emily Shaffer Change-Id: I489bdf0a8215532e54017513ff7541d70e1b --- Since v4, added the new options to `git help rev-list`. Documentation/git-rev-list.txt | 1 + Documentation/rev-list-options.txt | 10 ++ builtin/rev-list.c | 19 ++- t/t6000-rev-list-misc.sh | 20 4 files changed, 49 insertions(+), 1 deletion(-) diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt index 88609ff435..9392760b25 100644 --- a/Documentation/git-rev-list.txt +++ b/Documentation/git-rev-list.txt @@ -48,6 +48,7 @@ SYNOPSIS [ --date=] [ [ --objects | --objects-edge | --objects-edge-aggressive ] [ --unpacked ] + [ --object-names | --no-object-names ] [ --filter= [ --filter-print-omitted ] ] ] [ --missing= ] [ --pretty | --header ] diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 71a1fcc093..286fc163f1 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -708,6 +708,16 @@ ifdef::git-rev-list[] Only useful with `--objects`; print the object IDs that are not in packs. +--object-names:: + Only useful with `--objects`; print the names of the object IDs + that are found. This is the default behavior. + +--no-object-names:: + Only useful with `--objects`; does not print the names of the object + IDs that are found. This inverts `--object-names`. This flag allows + the output to be more easily parsed by commands such as + linkgit:git-cat-file[1]. + --filter=:: Only useful with one of the `--objects*`; omits objects (usually blobs) from the list of printed objects. The '' diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 660172b014..301ccb970b 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -49,6 +49,7 @@ static const char rev_list_usage[] = "--objects | --objects-edge\n" "--unpacked\n" "--header | --pretty\n" +"--[no-]object-names\n" "--abbrev= | --no-abbrev\n" "--abbrev-commit\n" "--left-right\n" @@ -75,6 +76,9 @@ enum missing_action { }; static enum missing_action arg_missing_action; +/* display only the oid of each object encountered */ +static int arg_show_object_names = 1; + #define DEFAULT_OIDSET_SIZE (16*1024) static void finish_commit(struct commit *commit); @@ -255,7 +259,10 @@ static void show_object(struct object *obj, const char *name, void *cb_data) display_progress(progress, ++progress_counter); if (info->flags & REV_LIST_QUIET) return; - show_object_with_name(stdout, obj, name); + if (arg_show_object_names) + show_object_with_name(stdout, obj, name); + else + printf("%s\n", oid_to_hex(&obj->oid)); } static void show_edge(struct commit *commit) @@ -484,6 +491,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix) if (skip_prefix(arg, "--missing=", &arg)) continue; /* already handled above */ + if (!strcmp(arg, ("--no-object-names"))) { + arg_show_object_names = 0; + continue; + } + + if (!strcmp(arg, ("--object-names"))) { + arg_show_object_names = 1; + continue; + } + usage(rev_list_usage); } diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 0507999729..52a9e38d66 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -48,6 +48,26 @@ test_expect_success 'rev-list --objects with pathspecs and copied files' ' ! grep one output ' +test_expect_success 'rev-list --objects --no-object-names has no space/names' ' + git rev-list --objects -
Re: [RFC PATCH] rev-list: clarify --abbrev and --abbrev-commit usage
On Wed, Jun 19, 2019 at 05:21:59PM -0400, Jeff King wrote: > On Fri, Jun 14, 2019 at 03:56:54PM -0700, Emily Shaffer wrote: > > > > Ah, I see. I don't consider "|" to indicate an exclusion to the point > > > that the options are rejected. Only that you wouldn't want to use both, > > > because one counteracts the other. So every "--no-foo" is mutually > > > exclusive with "--foo" in the sense that one override the other. But the > > > outcome is "last one wins", and not "whoops, we cannot figure out what > > > you meant". And that's what the original: > > > > > > --abbrev= | --no-abbrev > > > > > > before your patch was trying to say (and I suspect there are many other > > > cases of "|" with this kind of last-one-wins behavior). > > > > For what it's worth, in this case it's not last-one-wins - --no-abbrev > > always wins: > > Ah, thanks for pointing that; I hadn't noticed. That _is_ unlike most of > the rest of Git. I'm tempted to say it's a bug and should be fixed, but > I worry slightly that it could have an unexpected effect. > > > I think a good solution here is to go and add --abbrev-commit= > > without breaking support for --abbrev=; I'm a little more worried > > about changing --no-abbrev to last-one-wins but I'll take a crack at it > > and see what the test suite says. While I'm at it, I'll check for > > last-one-wins with multiple instances of --abbrev[-commit]=. > > I think --abbrev-commit= sounds safe enough, though I worry it may > get a bit complicated because we'd presumably want to fall back to the > from --abbrev=. I'll see how your patch turns out. :) > > I like the idea of changing --no-abbrev to last-one-wins, as above, but > the test suite may not give us that much confidence. These kinds of > cases are often not well-covered, and we're really worried about the > wider world of random scripts people have grown over the last 10 years. > Of course if the test suite does break horribly that might give us extra > caution, but I'm not sure "the test suite does not break" gives us much > confidence. I think at that point, based on what I've seen on the list, there's a tendency to include a config option to enable the new behavior. And I think that might pass the tipping point of "wasted effort" for me. The change isn't really enabling anything that was impossible before (last-one-wins is handy, but it's legacy public behavior now for --abbrev). So I'm kind of starting to lean towards doing nothing at all. It's a lot of work, for a not-very-functional change, which needs to be specially configured to even happen... So maybe I'll save my time and work on actual bugs instead. :) > > > Having done so, I'll also change the documentation here in rev-list to: > > --abbrev-commit[=] [--abbrev=] | --no-abbrev > > Yeah, that makes sense. > > -Peff Thanks for the input, all. I think I'll drop this. - Emily
Re: [PATCH] documentation: add tutorial for revision walking
On Wed, Jun 19, 2019 at 04:13:35AM -0400, Eric Sunshine wrote: > On Mon, Jun 17, 2019 at 7:20 PM Emily Shaffer wrote: > > On Fri, Jun 07, 2019 at 02:21:07AM -0400, Eric Sunshine wrote: > > > On Thu, Jun 6, 2019 at 9:08 PM Emily Shaffer > > > wrote: > > > > +int cmd_walken(int argc, const char **argv, const char *prefix) > > > > +{ > > > > + struct option options[] = { > > > > + OPT_END() > > > > + }; > > > > + > > > > + argc = parse_options(argc, argv, prefix, options, walken_usage, > > > > 0); > > > > + > > > > + ... > > > > > > Perhaps comment out the "..." or remove it altogether to avoid having > > > the compiler barf when the below instructions tell the reader to build > > > the command. > > > > Hmm. That part I'm not so sure about. I like to use the "..." to > > indicate where the code in the snippet should be added around the other > > code already in the file - which I suppose it does just as clearly if > > it's commented - but I also hope folks are not simply copy-pasting > > blindly from the tutorial. > > > > It seems like including uncommented "..." in code tutorials is pretty > > common. > > You're right, and that's not what I was "complaining" about. Looking > back at your original email, I see that I somehow got confused and > didn't realize or (quickly) forgot that you had already presented a > _complete_ cmd_walken() snippet just above that spot, and that the > cmd_walken() snippet upon which I was commenting was _incomplete_, > thus the "..." was perfectly justified. Not realizing that the > incomplete cmd_walken() example was just that (incomplete), I > "complained" that the following "compile the project" instructions > would barf on "...". > > Maybe I got confused because the tiny cmd_walken() snippets followed > one another so closely (or because I got interrupted several times > during the review), but one way to avoid that would be to present a > single _complete_ snippet from the start, followed by a bit of > explanation. That is, something like this: > > Open up a new file `builtin/walken.c` and set up the command handler: > > > /* "git walken" -- Part of the "My First Revision Walk" tutorial. */ > #include "builtin.h" > > int cmd_walken(int argc, const char **argv, const char *prefix) > { > const char * const usage[] = { > N_("git walken"), > NULL, > } > struct option options[] = { > OPT_END() > }; > > argc = parse_options(argc, argv, prefix, options, usage, 0); > > printf(_("cmd_walken incoming...\n")); > return 0; > } > > > `usage` is the usage message presented by `git -h walken`, and > `options` will eventually specify command-line options. Hmm. I can say that I personally would find that much more difficult to follow interactively, and I'd be tempted to copy-and-paste and skim through the wall of text if I was presented with such a snippet. However, I could also imagine the reverse - someone becoming tired of having their hand held through a fairly straightforward implementation, when they're perfectly capable of reading a long description and would just like to get on with it. I'm really curious about what others think in this scenario, since I imagine it boils down to individual learning styles. (Maybe we can split the difference and present a complete patch or new function, followed by a breakdown? That would end up even more verbose than the current approach, though.) ... Now that I'm thinking more about this, and reading some of your later comments on this mail, I think it might be valuable to lean on the sample patchset for complete code samples, especially if we figure a good way to distribute the patchset near the tutorial (as Junio and I are discussing in another branch of this thread). Then we can keep the tutorial concise, but have the complete code available for those who prefer to look there. > > > I don't think I have a good reason to push back on this except that I > > think "/* ... */" is ugly :) > > > > I'll go through and replace "..." with some actual hints about what's > > supposed to go there; for example, here I'll replace with "/* print and > > return */". > > Seeing as my initial review comment was in error, I'm not sure that >
Re: [PATCH] documentation: add tutorial for revision walking
On Wed, Jun 19, 2019 at 08:17:29AM -0700, Junio C Hamano wrote: > Emily Shaffer writes: > > > Maybe there's a case for storing them as a set of patch files that are > > revision-controlled somewhere within Documentation/? There was some > > discussion on the IRC a few weeks ago about trying to organize these > > tutorials into their own directory to form a sort of "Git Contribution > > 101" course, maybe it makes sense to store there? > > > > Documentation/contributing/myfirstcontrib/MyFirstContrib.txt > > Documentation/contributing/myfirstcontrib/sample/*.patch > > Documentation/contributing/myfirstrevwalk/MyFirstRevWalk.txt > > Documentation/contributing/myfirstrevwalk/sample/*.patch > > > > I don't love the idea of maintaining text patches with the expectation > > that they should cleanly apply always,... > > Well, I actually think the above organization does match the intent > of the "My first contribution codelab" perfectly. When the codebase, > the workflow used by the project, and/or the coding or documentation > guideline gets updated, the text that documents how to contribute to > the project as well as the sample patches must be updated to match > the updated reality. > > I agree with you that maintaining the *.patch files to always > cleanly apply is less than ideal. A topic to update the sample > patches and tutorial text may be competing with another topic that > updates the very API the tutorials are teaching, and the sample > patches may not apply cleanly when two topics are merged together, > even if the "update sample patches and tutorial text" topic does > update them to match the API at the tip of the topic branch itself. > One thing we _could_ do is to pin the target version of the codebase > for the sake of tutorial. IOW, the sample/*.patch may not apply > cleanly to the version of the tree these patches were taken from, > but would always apply cleanly to the most recent released version > before the last update to the tutorial, or something like that. > > Also having to review the patch to sample/*.patch files will be > unpleasant. I wonder if we can ease some pain for both of the above issues by including some scripts to "inflate" the patch files into a topic branch, or figure out some more easily-reviewed (but more complicated, I suppose) method for sending updates to the sample/*.patch files. Imagining workflows like this: Doing the tutorial: - In worktree a/. - Run a magic script which creates a worktree with the sample code, b/. - Read through a/Documentation/MyFirstContribution.txt and generate a/builtins/psuh.c, referring to b/builtins/psuh.c if confused. Rebasing the tutorial patches: - In worktree a/. - Run a magic script which checks out a new branch at the last known good base for the patchset, then applies all the patches. - Now faced with, likely, a topic branch based on v (where n is latest release). - `git rebase v -x (make && ./bin-wrappers/git psuh)` - Interactively fix conflicts - Run a script to generate a magic interdiff from the old version of patches - Mail out magic interdiff to list and get approval - (Maybe maintainer does this when interdiff is happy? Maybe updater does this when review looks good?) Run a magic script to regenerate patches from rebased branch, and note somewhere they are based on v - Mail sample/*.patch (based on v) to list (if maintainer rolled the patches after interdiff approval, this step can be skipped) (This seems to still be a lot of steps, even with the magic script..) Alternatively, for the same process: Updater: Run a magic script to create topic branch based on v (like before) U: `git rebase v -x (make && ./bin-wrappers/git psuh)` U: Interactively fix conflicts U: Run a script to turn topic branch back into sample/*.patch U: Send email with changes to sample/*.patch (this will be ugly and unreadable) - message ID Reviewer: Run a magic script, providing argument, which grabs the diff-of-.patch and generates an interdiff, or a topic branch based on v R: Send comments explaining where issue is (tricky to find where to inline in the diff-of-.patch) U: Reroll diff-of-.patch email R: Accepts Maintainer: Applies diff-of-.patch email normally I suppose for the first suggestion, there ends up being quite a lot of onus on the maintainer, and a lot of trust that there is no difference between the RFC easy-to-read interdiff patchset. For the second suggestion, there ends up being onus on the reviewers to run some magical script. Maybe we can split the difference by expecting Updater to provide the interdiff below the --- line? Maybe in practice the diff-of-.patch isn't so unreadable, if it's only minor changes needed to bring the tutorial up to latest? I'm not sure there's a way to make this totally painless using email tools. - Emily
Re: [PATCH] doc: improve usage string in MyFirstContribution
On Thu, Jun 20, 2019 at 12:13:15AM +0200, Christian Couder wrote: > SYNOPSIS > > [verse] > -'git-psuh' > +'git-psuh ...' It doesn't require 1 or more args - you can run it with no args. So it might be better suited to state the args as optional: 'git psuh [arg]...' > > static const char * const psuh_usage[] = { > - N_("git psuh"), > + N_("git psuh ..."), ...and here too. - Emily
[PATCH v2] documentation: add tutorial for revision walking
Existing documentation on revision walks seems to be primarily intended as a reference for those already familiar with the procedure. This tutorial attempts to give an entry-level guide to a couple of bare-bones revision walks so that new Git contributors can learn the concepts without having to wade through options parsing or special casing. The target audience is a Git contributor who is just getting started with the concept of revision walking. The goal is to prepare this contributor to be able to understand and modify existing commands which perform revision walks more easily, although it will also prepare contributors to create new commands which perform walks. The tutorial covers a basic overview of the structs involved during revision walk, setting up a basic commit walk, setting up a basic all-object walk, and adding some configuration changes to both walk types. It intentionally does not cover how to create new commands or search for options from the command line or gitconfigs. There is an associated patchset at https://github.com/nasamuffin/git/tree/revwalk that contains a reference implementation of the code generated by this tutorial. Signed-off-by: Emily Shaffer Helped-by: Eric Sunshine --- Significant changes since r1 related to Eric and Junio's comments: - Formatting fixes (multiline comments, whitespaces, etc.) - Clarifications: - oneline options - `buf` arg in object walk callback - topo_order - placement of reverse setting - '-h' handling - Converted patchset to plumbing tool, using trace where appropriate - Removed unnecessary return values in walken_commit_walk and walken_object_walk - Added an entire section devoted to the `omitted` list - Tried to remove unnecessary explanatory comments in tutorial (and added more explanatory comments in sample code) There's still the question of how the samples should be source-controlled and kept fresh. I think we left it at "Maybe checking in a stack of *.patch files would be tolerable", but it didn't sound like anybody was quite happy with that solution. - Emily Documentation/Makefile | 1 + Documentation/MyFirstRevWalk.txt | 910 +++ 2 files changed, 911 insertions(+) create mode 100644 Documentation/MyFirstRevWalk.txt diff --git a/Documentation/Makefile b/Documentation/Makefile index 76f2ecfc1b..91e5da67c4 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -78,6 +78,7 @@ SP_ARTICLES += $(API_DOCS) TECH_DOCS += MyFirstContribution TECH_DOCS += SubmittingPatches +TECH_DOCS += MyFirstRevWalk TECH_DOCS += technical/hash-function-transition TECH_DOCS += technical/http-protocol TECH_DOCS += technical/index-format diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt new file mode 100644 index 00..ea2f16c4b1 --- /dev/null +++ b/Documentation/MyFirstRevWalk.txt @@ -0,0 +1,910 @@ +My First Revision Walk +== + +== What's a Revision Walk? + +The revision walk is a key concept in Git - this is the process that underpins +operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the +list of objects is found by walking parent relationships between objects. The +revision walk can also be used to determine whether or not a given object is +reachable from the current HEAD pointer. + +=== Related Reading + +- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of + the revision walker in its various incarnations. +- `Documentation/technical/api-revision-walking.txt` +- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists] + gives a good overview of the types of objects in Git and what your revision + walk is really describing. + +== Setting Up + +Create a new branch from `master`. + + +git checkout -b revwalk origin/master + + +We'll put our fiddling into a new command. For fun, let's name it `git walken`. +Open up a new file `builtin/walken.c` and set up the command handler: + + +/* + * "git walken" + * + * Part of the "My First Revision Walk" tutorial. + */ + +#include "builtin.h" + +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + trace_printf(_("cmd_walken incoming...\n")); + return 0; +} + + +NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or +off at runtime. For the purposes of this tutorial, we will write `walken` as +though it is intended for use as a "plumbing" command: that is, a command which +is used primarily in scripts, rather than interactively by humans (a "porcelain" +command). So we will send our debug output to `trace_printf()` instead. When +running, enable trace output by setting the environment variable `GIT_TRACE`. + +Add usage text and `-h` handling, like all subcommands should consistently do +(our test sui
[RFC PATCH v2 04/13] walken: add handler to git_config
For now, we have no configuration options we want to set up for ourselves, but in the future we may need to. At the very least, we should invoke git_default_config() for each config option; we will do so inside of a skeleton config callback so that we know where to add configuration handling later on when we need it. Signed-off-by: Emily Shaffer --- builtin/walken.c | 32 ++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/builtin/walken.c b/builtin/walken.c index daae4f811a..2474a0d7b2 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -5,6 +5,7 @@ */ #include "builtin.h" +#include "config.h" #include "parse-options.h" /* @@ -24,11 +25,36 @@ const char * const walken_usage[] = { static void init_walken_defaults(void) { /* -* We don't actually need the same components `git log` does; leave this -* empty for now. +* We don't use any other components or have settings to initialize, so +* leave this empty. */ } +/* + * This method will be called back by git_config(). It is used to gather values + * from the configuration files available to Git. + * + * Each time git_config() finds a configuration file entry, it calls this + * callback. Then, this function should compare it to entries which concern us, + * and make settings changes as necessary. + * + * If we are called with a config setting we care about, we should use one of + * the helpers which exist in config.h to pull out the value for ourselves, i.e. + * git_config_string(...) or git_config_bool(...). + * + * If we don't match anything, we should pass it along to another stakeholder + * who may otherwise care - in log's case, grep, gpg, and diff-ui. For our case, + * we'll ignore everybody else. + */ +static int git_walken_config(const char *var, const char *value, void *cb) +{ + /* +* For now, we don't have any custom configuration, so fall back on the +* default config. +*/ + return git_default_config(var, value, cb); +} + int cmd_walken(int argc, const char **argv, const char *prefix) { struct option options[] = { @@ -43,6 +69,8 @@ int cmd_walken(int argc, const char **argv, const char *prefix) init_walken_defaults(); + git_config(git_walken_config, NULL); + /* * This line is "human-readable" and we are writing a plumbing command, * so we localize it and use the trace library to print only when -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 00/13] example implementation of revwalk tutorial
Since r1, made some significant changes. - Added a commit for counting the 'omitted' list, to match the new section added to the tutorial. - Added significant comments to allow the sample to better stand on its own. - Fixed style issues (die() formatting, etc) - Distinguished between human- and machine-readable output with trace_printf() and printf(), to turn the command into plumbing. - More changes as mentioned in the tutorial patch. Thanks. - Emily Emily Shaffer (13): walken: add infrastructure for revwalk demo walken: add usage to enable -h walken: add placeholder to initialize defaults walken: add handler to git_config walken: configure rev_info and prepare for walk walken: perform our basic revision walk walken: filter for authors from gmail address walken: demonstrate various topographical sorts walken: demonstrate reversing a revision walk list walken: add unfiltered object walk from HEAD walken: add filtered object walk walken: count omitted objects walken: reverse the object walk order Makefile | 1 + builtin.h| 1 + builtin/walken.c | 290 +++ git.c| 1 + 4 files changed, 293 insertions(+) create mode 100644 builtin/walken.c -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 01/13] walken: add infrastructure for revwalk demo
Begin to add scaffolding for `git walken`, a toy command which we will teach to perform a number of revision walks, in order to demonstrate the mechanics of revision walking for developers new to the Git project. This commit is the beginning of an educational series which correspond to the tutorial in Documentation/MyFirstRevWalk.txt. Signed-off-by: Emily Shaffer Change-Id: I64297621919412f54701e111366e99c4ef0feae3 --- Makefile | 1 + builtin.h| 1 + builtin/walken.c | 13 + 3 files changed, 15 insertions(+) create mode 100644 builtin/walken.c diff --git a/Makefile b/Makefile index f58bf14c7b..5bac1dbf8d 100644 --- a/Makefile +++ b/Makefile @@ -1137,6 +1137,7 @@ BUILTIN_OBJS += builtin/var.o BUILTIN_OBJS += builtin/verify-commit.o BUILTIN_OBJS += builtin/verify-pack.o BUILTIN_OBJS += builtin/verify-tag.o +BUILTIN_OBJS += builtin/walken.o BUILTIN_OBJS += builtin/worktree.o BUILTIN_OBJS += builtin/write-tree.o diff --git a/builtin.h b/builtin.h index ec7e0954c4..c919736c36 100644 --- a/builtin.h +++ b/builtin.h @@ -242,6 +242,7 @@ int cmd_var(int argc, const char **argv, const char *prefix); int cmd_verify_commit(int argc, const char **argv, const char *prefix); int cmd_verify_tag(int argc, const char **argv, const char *prefix); int cmd_version(int argc, const char **argv, const char *prefix); +int cmd_walken(int argc, const char **argv, const char *prefix); int cmd_whatchanged(int argc, const char **argv, const char *prefix); int cmd_worktree(int argc, const char **argv, const char *prefix); int cmd_write_tree(int argc, const char **argv, const char *prefix); diff --git a/builtin/walken.c b/builtin/walken.c new file mode 100644 index 00..d2772a0e8f --- /dev/null +++ b/builtin/walken.c @@ -0,0 +1,13 @@ +/* + * "git walken" + * + * Part of the "My First Revision Walk" tutorial. + */ + +#include "builtin.h" + +int cmd_walken(int argc, const char **argv, const char *prefix) +{ + trace_printf(_("cmd_walken incoming...\n")); + return 0; +} -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 02/13] walken: add usage to enable -h
It's expected that Git commands support '-h' in order to provide a consistent user experience (and this expectation is enforced by the test suite). '-h' is captured by parse_options() by default; in order to support this flag, we add a short usage text to walken.c and invoke parse_options(). With this change, we can now add cmd_walken to the builtins set and expect tests to pass, so we'll do so - cmd_walken is now open for business. Signed-off-by: Emily Shaffer Change-Id: I2919dc1efadb82acb335617ea24371c84b03bbce --- builtin/walken.c | 25 + git.c| 1 + 2 files changed, 26 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index d2772a0e8f..9eea51ff70 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -5,9 +5,34 @@ */ #include "builtin.h" +#include "parse-options.h" + +/* + * All builtins are expected to provide a usage to provide a consistent user + * experience. + */ +const char * const walken_usage[] = { + N_("git walken"), + NULL, +}; int cmd_walken(int argc, const char **argv, const char *prefix) { + struct option options[] = { + OPT_END() + }; + + /* +* parse_options() handles showing usage if incorrect options are +* provided, or if '-h' is passed. +*/ + argc = parse_options(argc, argv, prefix, options, walken_usage, 0); + + /* +* This line is "human-readable" and we are writing a plumbing command, +* so we localize it and use the trace library to print only when +* the GIT_TRACE environment variable is set. +*/ trace_printf(_("cmd_walken incoming...\n")); return 0; } diff --git a/git.c b/git.c index c2eec470c9..2a7fb9714f 100644 --- a/git.c +++ b/git.c @@ -601,6 +601,7 @@ static struct cmd_struct commands[] = { { "verify-pack", cmd_verify_pack }, { "verify-tag", cmd_verify_tag, RUN_SETUP }, { "version", cmd_version }, + { "walken", cmd_walken, RUN_SETUP }, { "whatchanged", cmd_whatchanged, RUN_SETUP }, { "worktree", cmd_worktree, RUN_SETUP | NO_PARSEOPT }, { "write-tree", cmd_write_tree, RUN_SETUP }, -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 07/13] walken: filter for authors from gmail address
In order to demonstrate how to create grep filters for revision walks, filter the walk performed by cmd_walken() to print only commits which are authored by someone with a gmail address. This commit demonstrates how to append a grep pattern to a rev_info.grep_filter, to teach new contributors how to create their own more generalized grep filters during revision walks. Signed-off-by: Emily Shaffer --- builtin/walken.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/builtin/walken.c b/builtin/walken.c index 335dcb6b21..da2d197914 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -11,6 +11,7 @@ #include "parse-options.h" #include "pretty.h" #include "line-log.h" +#include "grep.h" /* * All builtins are expected to provide a usage to provide a consistent user @@ -28,10 +29,8 @@ const char * const walken_usage[] = { */ static void init_walken_defaults(void) { - /* -* We don't use any other components or have settings to initialize, so -* leave this empty. -*/ + /* Needed by our grep filter. */ + init_grep_defaults(the_repository); } /* @@ -60,6 +59,10 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix * setup_revisions(argc, argv, rev, &opt); */ + /* Add a grep pattern to the author line in the header. */ + append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, "gmail"); + compile_grep_patterns(&rev->grep_filter); + /* Let's force oneline format. */ get_commit_format("oneline", rev); rev->verbose_header = 1; @@ -86,10 +89,7 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix */ static int git_walken_config(const char *var, const char *value, void *cb) { - /* -* For now, we don't have any custom configuration, so fall back on the -* default config. -*/ + grep_config(var, value, cb); return git_default_config(var, value, cb); } -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 03/13] walken: add placeholder to initialize defaults
Eventually, we will want a good place to initialize default variables for use during our revision walk(s) in `git walken`. For now, there's nothing to do here, but let's add the scaffolding so that it's easy to tell where to put the setup later on. Signed-off-by: Emily Shaffer --- builtin/walken.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index 9eea51ff70..daae4f811a 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -16,6 +16,19 @@ const char * const walken_usage[] = { NULL, }; +/* + * Within init_walken_defaults() we can call into other useful defaults to set + * in the global scope or on the_repository. It's okay to borrow from other + * functions which are doing something relatively similar to yours. + */ +static void init_walken_defaults(void) +{ + /* +* We don't actually need the same components `git log` does; leave this +* empty for now. +*/ +} + int cmd_walken(int argc, const char **argv, const char *prefix) { struct option options[] = { @@ -28,6 +41,8 @@ int cmd_walken(int argc, const char **argv, const char *prefix) */ argc = parse_options(argc, argv, prefix, options, walken_usage, 0); + init_walken_defaults(); + /* * This line is "human-readable" and we are writing a plumbing command, * so we localize it and use the trace library to print only when -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 12/13] walken: count omitted objects
It may be illuminating to see which objects were not included within a given filter. This also demonstrates, since filter-spec "tree:1" is used, that the 'omitted' list contains all objects which are omitted, not just the first objects which were omitted - that is, it continues to dereference omitted trees and commits. This is part of a tutorial on performing revision walks. Signed-off-by: Emily Shaffer --- builtin/walken.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/builtin/walken.c b/builtin/walken.c index a744d042d8..dc59ff5009 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -45,7 +45,7 @@ static void init_walken_defaults(void) * mirror those settings in post_repo_init_init. */ static void final_rev_info_setup(int argc, const char **argv, const char *prefix, - struct rev_info *rev) +struct rev_info *rev) { /* * Optional: @@ -145,6 +145,11 @@ static void walken_show_object(struct object *obj, const char *str, void *buf) static void walken_object_walk(struct rev_info *rev) { struct list_objects_filter_options filter_options = {}; + struct oidset omitted; + struct oidset_iter oit; + struct object_id *oid = NULL; + int omitted_count = 0; + oidset_init(&omitted, 0); printf("walken_object_walk beginning...\n"); @@ -181,13 +186,19 @@ static void walken_object_walk(struct rev_info *rev) walken_show_commit, walken_show_object, NULL, NULL); } + /* Count the omitted objects. */ + oidset_iter_init(&omitted, &oit); + + while ((oid = oidset_iter_next(&oit))) + omitted_count++; + /* * This print statement is designed to be script-parseable. Script * authors will rely on the output not to change, so we will not * localize this string. It will go to stdout directly. */ - printf("commits %d\n blobs %d\n tags %d\n trees %d\n", commit_count, - blob_count, tag_count, tree_count); + printf("commits %d\n blobs %d\n tags %d\n trees %d omitted %d\n", + commit_count, blob_count, tag_count, tree_count, omitted_count); } /* -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 08/13] walken: demonstrate various topographical sorts
Order the revision walk by author or commit dates, to demonstrate how to apply topo_sort to a revision walk. While following the tutorial, new contributors are guided to run a walk with each sort and compare the results. Signed-off-by: Emily Shaffer Change-Id: I7ce2f3e8a77c42001293637ae209087afec4ce2c --- builtin/walken.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index da2d197914..6cc451324a 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -69,6 +69,13 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix /* add the HEAD to pending so we can start */ add_head_to_pending(rev); + + /* Let's play with the sort order. */ + rev->topo_order = 1; + + /* Toggle between these and observe the difference. */ + rev->sort_order = REV_SORT_BY_COMMIT_DATE; + /* rev->sort_order = REV_SORT_BY_AUTHOR_DATE; */ } /* -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 10/13] walken: add unfiltered object walk from HEAD
Provide a demonstration of a revision walk which traverses all types of object, not just commits. This type of revision walk is used for operations such as creating packfiles and performing fetches or clones, so it's useful to teach new developers how it works. For starters, only demonstrate the unfiltered version, as this will make the tutorial easier to follow. This commit is part of a tutorial on revision walking. Signed-off-by: Emily Shaffer Change-Id: If3b11652ba011b28d29b1c3984dac4a3f80a5f53 --- builtin/walken.c | 88 +++- 1 file changed, 79 insertions(+), 9 deletions(-) diff --git a/builtin/walken.c b/builtin/walken.c index 958923c172..42b23ba1ec 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -11,6 +11,7 @@ #include "parse-options.h" #include "pretty.h" #include "line-log.h" +#include "list-objects.h" #include "grep.h" /* @@ -22,6 +23,11 @@ const char * const walken_usage[] = { NULL, }; +static int commit_count; +static int tag_count; +static int blob_count; +static int tree_count; + /* * Within init_walken_defaults() we can call into other useful defaults to set * in the global scope or on the_repository. It's okay to borrow from other @@ -103,6 +109,65 @@ static int git_walken_config(const char *var, const char *value, void *cb) return git_default_config(var, value, cb); } +static void walken_show_commit(struct commit *cmt, void *buf) +{ + commit_count++; +} + +static void walken_show_object(struct object *obj, const char *str, void *buf) +{ + switch (obj->type) { + case OBJ_TREE: + tree_count++; + break; + case OBJ_BLOB: + blob_count++; + break; + case OBJ_TAG: + tag_count++; + break; + case OBJ_COMMIT: + die(_("unexpectedly encountered a commit in " + "walken_show_object\n")); + commit_count++; + break; + default: + die(_("unexpected object type %s\n"), type_name(obj->type)); + break; + } +} + +/* + * walken_object_walk() is invoked by cmd_walken() after initialization. It does + * a walk of all object types. + */ +static void walken_object_walk(struct rev_info *rev) +{ + rev->tree_objects = 1; + rev->blob_objects = 1; + rev->tag_objects = 1; + rev->tree_blobs_in_commit_order = 1; + rev->exclude_promisor_objects = 1; + + if (prepare_revision_walk(rev)) + die(_("revision walk setup failed")); + + commit_count = 0; + tag_count = 0; + blob_count = 0; + tree_count = 0; + + traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL); + + /* +* This print statement is designed to be script-parseable. Script +* authors will rely on the output not to change, so we will not +* localize this string. It will go to stdout directly. +*/ + printf("commits %d\n blobs %d\n tags %d\n trees %d\n", commit_count, + blob_count, tag_count, tree_count); +} + /* * walken_commit_walk() is invoked by cmd_walken() after initialization. It * does the commit walk only. @@ -113,15 +178,15 @@ static void walken_commit_walk(struct rev_info *rev) struct strbuf prettybuf = STRBUF_INIT; /* - * prepare_revision_walk() gets the final steps ready for a revision +* prepare_revision_walk() gets the final steps ready for a revision * walk. We check the return value for errors. - */ +*/ if (prepare_revision_walk(rev)) { die(_("revision walk setup failed")); } /* - * Now we can start the real commit walk. get_revision grabs the next +* Now we can start the real commit walk. get_revision grabs the next * revision based on the contents of rev. */ rev->diffopt.close_file = 0; @@ -166,13 +231,18 @@ int cmd_walken(int argc, const char **argv, const char *prefix) /* We can set our traversal flags here. */ rev.always_show_header = 1; - /* -* Before we do the walk, we need to set a starting point by giving it -* something to go in `pending` - that happens in here -*/ - final_rev_info_setup(argc, argv, prefix, &rev); - walken_commit_walk(&rev); + if (1) { + add_head_to_pending(&rev); + walken_object_walk(&rev); + } else { + /* +* Before we do the walk, we need to set a starting point by giving it +* something to go in `pending` - that happens in here +*/ + final_rev_info_setup(argc, argv, prefix, &r
[RFC PATCH v2 13/13] walken: reverse the object walk order
Demonstrate that just like commit walks, object walks can have their order reversed. Additionally, add verbose logging of objects encountered in order to let contributors prove to themselves that the walk has actually been reversed. With this commit, `git walken` becomes extremely chatty - it's recommended to pipe the output through `head` or `tail` or to redirect it into a file. Signed-off-by: Emily Shaffer Change-Id: I91883b209a61ae4d87855878291e487fe36220c4 --- builtin/walken.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index dc59ff5009..37b02887a5 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -112,11 +112,13 @@ static int git_walken_config(const char *var, const char *value, void *cb) static void walken_show_commit(struct commit *cmt, void *buf) { + printf("commit: %s\n", oid_to_hex(&cmt->object.oid)); commit_count++; } static void walken_show_object(struct object *obj, const char *str, void *buf) { + printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid)); switch (obj->type) { case OBJ_TREE: tree_count++; @@ -158,6 +160,7 @@ static void walken_object_walk(struct rev_info *rev) rev->tag_objects = 1; rev->tree_blobs_in_commit_order = 1; rev->exclude_promisor_objects = 1; + rev->reverse = 1; if (prepare_revision_walk(rev)) die(_("revision walk setup failed")); -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 11/13] walken: add filtered object walk
Demonstrate how filter specs can be used when performing a revision walk of all object types. In this case, tree depth is used. Contributors who are following the revision walking tutorial will be encouraged to run the revision walk with and without the filter in order to compare the number of objects seen in each case. Signed-off-by: Emily Shaffer Change-Id: I6d22ba153c1afbc780c261c47f1fa03ea478b5ed --- builtin/walken.c | 24 +++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/builtin/walken.c b/builtin/walken.c index 42b23ba1ec..a744d042d8 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -12,6 +12,7 @@ #include "pretty.h" #include "line-log.h" #include "list-objects.h" +#include "list-objects-filter-options.h" #include "grep.h" /* @@ -143,6 +144,10 @@ static void walken_show_object(struct object *obj, const char *str, void *buf) */ static void walken_object_walk(struct rev_info *rev) { + struct list_objects_filter_options filter_options = {}; + + printf("walken_object_walk beginning...\n"); + rev->tree_objects = 1; rev->blob_objects = 1; rev->tag_objects = 1; @@ -157,7 +162,24 @@ static void walken_object_walk(struct rev_info *rev) blob_count = 0; tree_count = 0; - traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL); + if (1) { + /* Unfiltered: */ + trace_printf(_("Unfiltered object walk.\n")); + traverse_commit_list(rev, walken_show_commit, + walken_show_object, NULL); + } else { + trace_printf(_("Filtered object walk with filterspec " + "'tree:1'.\n")); + /* +* We can parse a tree depth of 1 to demonstrate the kind of +* filtering that could occur during various operations (see +* `git help rev-list` and read the entry on `--filter`). +*/ + parse_list_objects_filter(&filter_options, "tree:1"); + + traverse_commit_list_filtered(&filter_options, rev, + walken_show_commit, walken_show_object, NULL, NULL); + } /* * This print statement is designed to be script-parseable. Script -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 05/13] walken: configure rev_info and prepare for walk
`struct rev_info` is what's used by the struct itself. `repo_init_revisions()` initializes the struct; then we need to set it up for the walk we want to perform, which is done in `final_rev_info_setup()`. The most important step here is adding the first object we want to walk to the pending array. Here, we take the easy road and use `add_head_to_pending()`; there is also a way to do it with `setup_revision_opt()` and `setup_revisions()` which we demonstrate but do not use. If we were to forget this step, the walk would do nothing - the pending queue would be checked, determined to be empty, and the walk would terminate immediately. Signed-off-by: Emily Shaffer Change-Id: I76754b740227cf17a449f3f536dbbe37031e6f9a --- builtin/walken.c | 50 1 file changed, 50 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index 2474a0d7b2..c463eca843 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -5,6 +5,7 @@ */ #include "builtin.h" +#include "revision.h" #include "config.h" #include "parse-options.h" @@ -30,6 +31,40 @@ static void init_walken_defaults(void) */ } +/* + * cmd_log calls a second set of init after the repo_init_revisions call. We'll + * mirror those settings in post_repo_init_init. + */ +static void final_rev_info_setup(int argc, const char **argv, const char *prefix, + struct rev_info *rev) +{ + /* +* Optional: +* setup_revision_opt is used to pass options to the setup_revisions() +* call. It's got some special items for submodules and other types of +* optimizations, but for now, we'll just point it to HEAD and call it +* good. First we should make sure to reset it. This is useful for more +* complicated stuff but a decent shortcut for the first pass is +* add_head_to_pending(). +*/ + + /* +* struct setup_revision_opt opt; + +* memset(&opt, 0, sizeof(opt)); +* opt.def = "HEAD"; +* opt.revarg_opt = REVARG_COMMITTISH; +* setup_revisions(argc, argv, rev, &opt); +*/ + + /* Let's force oneline format. */ + get_commit_format("oneline", rev); + rev->verbose_header = 1; + + /* add the HEAD to pending so we can start */ + add_head_to_pending(rev); +} + /* * This method will be called back by git_config(). It is used to gather values * from the configuration files available to Git. @@ -61,6 +96,8 @@ int cmd_walken(int argc, const char **argv, const char *prefix) OPT_END() }; + struct rev_info rev; + /* * parse_options() handles showing usage if incorrect options are * provided, or if '-h' is passed. @@ -71,6 +108,19 @@ int cmd_walken(int argc, const char **argv, const char *prefix) git_config(git_walken_config, NULL); + /* +* Time to set up the walk. repo_init_revisions sets up rev_info with +* the defaults, but then you need to make some configuration settings +* to make it do what's special about your walk. +*/ + repo_init_revisions(the_repository, &rev, prefix); + + /* +* Before we do the walk, we need to set a starting point by giving it +* something to go in `pending` - that happens in here +*/ + final_rev_info_setup(argc, argv, prefix, &rev); + /* * This line is "human-readable" and we are writing a plumbing command, * so we localize it and use the trace library to print only when -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 06/13] walken: perform our basic revision walk
Add the final steps needed and implement the walk loop itself. We add a method walken_commit_walk() which performs the final setup to revision.c and then iterates over commits from get_revision(). This basic walk only prints the subject line of each commit in the history. It is nearly equivalent to `git log --oneline`. Signed-off-by: Emily Shaffer Change-Id: If6dc5f3c9d14df077b99e42806cf790c96191582 --- builtin/walken.c | 43 +++ 1 file changed, 43 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index c463eca843..335dcb6b21 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -6,8 +6,11 @@ #include "builtin.h" #include "revision.h" +#include "commit.h" #include "config.h" #include "parse-options.h" +#include "pretty.h" +#include "line-log.h" /* * All builtins are expected to provide a usage to provide a consistent user @@ -90,6 +93,41 @@ static int git_walken_config(const char *var, const char *value, void *cb) return git_default_config(var, value, cb); } +/* + * walken_commit_walk() is invoked by cmd_walken() after initialization. It + * does the commit walk only. + */ +static void walken_commit_walk(struct rev_info *rev) +{ + struct commit *commit; + struct strbuf prettybuf = STRBUF_INIT; + + /* + * prepare_revision_walk() gets the final steps ready for a revision +* walk. We check the return value for errors. + */ + if (prepare_revision_walk(rev)) { + die(_("revision walk setup failed")); + } + + /* + * Now we can start the real commit walk. get_revision grabs the next +* revision based on the contents of rev. +*/ + rev->diffopt.close_file = 0; + while ((commit = get_revision(rev))) { + if (!commit) + continue; + strbuf_reset(&prettybuf); + pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf); + /* +* We expect this part of the output to be machine-parseable - +* one commit message per line - so we must not localize it. +*/ + puts(prettybuf.buf); + } +} + int cmd_walken(int argc, const char **argv, const char *prefix) { struct option options[] = { @@ -115,12 +153,17 @@ int cmd_walken(int argc, const char **argv, const char *prefix) */ repo_init_revisions(the_repository, &rev, prefix); + /* We can set our traversal flags here. */ + rev.always_show_header = 1; + /* * Before we do the walk, we need to set a starting point by giving it * something to go in `pending` - that happens in here */ final_rev_info_setup(argc, argv, prefix, &rev); + walken_commit_walk(&rev); + /* * This line is "human-readable" and we are writing a plumbing command, * so we localize it and use the trace library to print only when -- 2.22.0.410.gd8fdbe21b5-goog
[RFC PATCH v2 09/13] walken: demonstrate reversing a revision walk list
The final installment in the tutorial about sorting revision walk outputs. This commit reverses the commit list, so that we see newer commits last (handy since we aren't using a pager). It's important to note that rev->reverse needs to be set after add_head_to_pending() or before setup_revisions(). (This is mentioned in the accompanying tutorial.) Signed-off-by: Emily Shaffer --- builtin/walken.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/builtin/walken.c b/builtin/walken.c index 6cc451324a..958923c172 100644 --- a/builtin/walken.c +++ b/builtin/walken.c @@ -69,6 +69,9 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix /* add the HEAD to pending so we can start */ add_head_to_pending(rev); + + /* Reverse the order */ + rev->reverse = 1; /* Let's play with the sort order. */ rev->topo_order = 1; -- 2.22.0.410.gd8fdbe21b5-goog