Re: [PATCH 1/1] documentation: add lab for first contribution

2019-04-12 Thread Emily Shaffer
On Thu, Apr 11, 2019 at 8:20 PM Junio C Hamano  wrote:
>
> "Emily Shaffer via GitGitGadget"  writes:
>
> > diff --git a/Documentation/MyFirstContribution 
> > b/Documentation/MyFirstContribution
> > new file mode 100644
> > index 00..9b87e424d6
> > --- /dev/null
> > +++ b/Documentation/MyFirstContribution
> > @@ -0,0 +1,674 @@
> > +My First Contribution
> > +=
> > +
> > +== Summary
>
> Just a minor nit but only the document title uses the underlined
> mark-up format, but not each of its sections?

I was basing the format of this document on
Documentation/SubmittingPatches, which underlines the document
title but prefixes the rest of the headers with /=+/ to indicate level. I
can change it, but I like the hierarchical headers offered by ={1,4}
prefix.

>
> Is the goal of this document to help those who want to contribute to
> *THIS* project, or to any project that happens to use Git as their
> source control tool of choice?  I take it to be the former, but if
> that is the case, perhaps extend the title to "My First Contribution
> to the Git Project" or something, so that those who use us merely as
> a tool can tell that it can be ignored more easily?

Good point, I'll extend the title.

>
> > +=== Set Up Your Workspace
> > +
> > +Let's start by making a development branch to work on our changes. Per
> > +`Documentation/SubmittingPatches`, since a brand new command is a new 
> > feature,
> > +it's fine to base your work on `master`. However, in the future for 
> > bugfixes,
> > +etc., you should check that doc and base it on the appropriate branch.
> > +
> > +For the purposes of this doc, we will base all our work on `master`. Before
> > +running the command below, ensure that you are on `master` first so your
> > +branch diverges at the right point.
> > +
> > +
> > +git checkout -b psuh
> > +
>
> An obvious and more explicit alternative, which may be better both
> for tutorial purposes (as it illustrates what is going on better by
> not relying on an implicit default) and for real-life (as it does
> not require checking out 'master' only to switch away from a new
> branch, saving a entry of two in the HEAD reflog) is
>
> For the purposes of this doc, we will base all our work on
> the `master` branch of the upstream project.  Fork the
> `psuh` branch you use for your development like so:
>
> 
> $ git checkout -b psuh origin/master
> 
>

I like this suggestion, but don't like the use of "fork" as it
somewhat conflates
a GitHub-specific term. I'll take this recommendation but use "create" instead
of "fork".

> > +=== Adding a new command
> > +
> > +Lots of the main useful commands are written as builtins, which means they 
> > are
>
> Drop 'useful', and possibly 'main' as well.  I would have written
> "Many of the Git subcommands are..." if I were writing it myself
> without reading yours.
>
> > +implemented in C and compiled into the main `git` executable.. So it is
> > +informative to implement `git psuh` as a builtin.
> > +Create a new file in `builtin/` called `psuh.c`.
>
> I am not sure what "informative" in this context means.

I was hoping to indicate that it would be an informative exercise for the reader
who is following the tutorial. I'll try to reword this and make it
more clear; thanks
for the example.

>
> Typically a built-in subcommand is implemented as a function
> whose name is "cmd_" followed by the name of the subcommand
> and stored in a C source file that is named after the name
> of the subcommand inside `builtin/` directory, so it is
> natural to implement your `psuh` command as a cmd_psuh()
> function in `builtin/psuh.c`.
>
> > +The entry point of your new command needs to match a certain signature:
> > +
> > +int cmd_psuh(int argc, const char **argv, const char *prefix)
> > +
> > +
> > +We'll also need to add the extern declaration of psuh; open up `builtin.h`,
> > +find the declaration for cmd_push, and add a new line for psuh:
> > +
> > +
> > +extern int cmd_psuh(int argc, const char **argv, const char *prefix);
> > +
>
> I think there is a push to drop the "extern " from decls of
> functions in *.h header files.
>

I'd just as soon change this documentation when that push changes the
other decls,
rather than spend time explaining a transient shift in how to d

Re: [PATCH 0/1] documentation: add lab for first contribution

2019-04-12 Thread Emily Shaffer
On Thu, Apr 11, 2019 at 7:35 PM Junio C Hamano  wrote:
>
> "Emily Shaffer via GitGitGadget"  writes:
>
> > RFC. I am still working on adding a section on handling refs and objects.
>
> Thanks.  Is 'lab' understood widely enough?  I _think_ you are
> abbreviating what is known as 'codelab' by your colleagues at work,
> but would it make it more in line with what we already have in this
> project, perhaps, to call it a "tutorial"?
>
I think you're right; I'll try to modify throughout. As part of that
change I will also move the sample to a branch with a more descriptive
name and change the URL.

Thanks a lot for the detailed review, Junio. I expect to send a new
patch later today, including an exercise in examining a specific
commit.

-- 
Emily Shaffer


Re: [PATCH 1/1] documentation: add lab for first contribution

2019-04-15 Thread Emily Shaffer
> >> ... then leave it in your example, perhaps?
> >>
> >
> > Good point. :)  I had wanted to avoid including my own name/email in the
> > tutorial; I used a throwaway "Git Contributor " for the 
> > example.
> > ...
> >> Keep a sample sign-off by "A U Thor " here.
>
>
> No, use "A U Thor " as I suggested.  That's the
> author ident the aspiring Git developer MUST become familiar with
> while working with our test suite in t/.  There you will also find
> the counterpart committer ident to use, if needed.

Done.

>
> Just FYI, I rarely give a "do it exactly like this" suggestion;
> often I instead stop at giving a general direction and leave it up
> to the contributers to perfect it.  The "A U Thor" is one of those
> rare cases.  On the other hand, "fork" was *not*.

Sorry about that. I've found it's good practice to "show, don't tell"
when I make review
comments to avoid confusion, which isn't quite the same as "do it
exactly like this"
but looks similar on the box. So I guessed wrong this time. :) Will
push a fix with it.

>
> > Do folks on Git project usually engage in test-driven development? I
> > would be happy to move the test up towards the front of the document
> > and mention the usefulness of TDD, but not if that's not something
> > emphasized usually by the group..
>
> I have no strong opinion on this myself.
>
> I suspect that the developer would clean up with "rebase -i" by
> squashing before submitting the result of a very fine-grained TDD
> workflow, as nobody would want to read printf("hello world") in
> [PATCH 1/47] that would become a real feature in a later step.  So
> if the tutorial wants to go into that tangent (which might be too
> much detail), it may be worth showing from the first few commits,
> but otherwise a sequence that pretends the reader to be a perfect
> developer who does not make any mistake in the middle may be more
> concise and more readable.  I dunno.

In that case, I'd just as soon leave the order as it is. I think that
a developer,
outside of the context of a tutorial, will end up writing tests in the
order they
prefer regardless of the order of a tutorial they did one time. Maybe I can add
a note about tests being required for incoming patches to discourage readers
from glossing over that section.

>
> Thanks.


Re: [PATCH v2 0/1] documentation: add lab for first contribution

2019-04-16 Thread Emily Shaffer
On Tue, Apr 16, 2019 at 1:26 PM Emily Shaffer via GitGitGadget
 wrote:
>
> RFC. I am still working on adding a section on handling refs and objects.

Sorry for the spam; above line was stale in GitGitGadget. The relevant
section has been added
just before the help page. I think the change is complete pending
review comments.

I've also added instructions for using send-email in addition to GitGitGadget.

Thanks!


Re: [PATCH v2 1/1] documentation: add lab for first contribution

2019-04-17 Thread Emily Shaffer
> > This code lab covers how to add a new command to Git and, in the
> > process, everything from cloning git/git to getting reviewed on the mail
>
> "lab"?  I thought we settled on "tutorial".  Also the place we are
> having conversation we call "mailing list", I think.

You're right. My find/replace missed the commit; I'll read through the document
again as well.

> > +MyFirstContribution.txt: MyFirstContribution
> > + $(QUIET_GEN) cp $< $@
>
> Hmph.
>
> Unlike SubmittingPatches that has known as that specific filename
> for a long time before we added to the *.txt -> *.html toolchain
> (hence people may look for it without the *.txt suffix), I do not
> immediately see why the source of this new tutorial needs a variant
> without the suffix.  Is there a reason why this new file cannot be
> created as Documentaiton/MyFirstContribution.txt that I am missing?

The only reason is that I wasn't aware of the special case. Fixed.

> > +== Getting Started
> > +
> > +=== Pull the Git codebase
> > +
> > +Git is mirrored in a number of locations. https://git-scm.com/downloads
> > +suggests the best place to clone from is GitHub.
>
> "suggests that one of the best places ..."?

I'll change it. But it seems odd to say that when git-scm.com only mentions
the one mirror.

> > +Let's start by making a development branch to work on our changes. Per
> > +`Documentation/SubmittingPatches`, since a brand new command is a new 
> > feature,
> > +it's fine to base your work on `master`. However, in the future for 
> > bugfixes,
> > +etc., you should check that doc and base it on the appropriate branch.
>
> Avoid unnecessary abbreviation; s/doc/document/.  Same for "lab", if
> we are to call this "codelab" instead of "tutorial".  I won't repeat
> this for brevity, but I see many instances of them.

Sure. I'll check across for instances of "lab" and reword them to
"tutorial"; same for "doc"
vs "document" etc.

> > +For the purposes of this doc, we will base all our work on the `master` 
> > branch
> > +of the upstream project. Create the `psuh` branch you will use for 
> > development
> > +like so:
> > +
> > +
> > +git checkout -b psuh origin/master
> > +
>
> 
> $ git checkout -b psuh origin/master
> 
>
> I think both of our existing tutorials spell out the shell prompt to
> clarify what these lines are.  It would especially help in this
> document, where you have other monospaced display material that are
> not commands to be typed but code snippets.

I appreciate your comment about clarity. But including the prompt also
makes it more difficult to copy-paste into the command line...
I think if someone is blindly copy-pasting this tutorial, they won't
be getting a lot out of it anyway. So I'll add the prompts.

>
> > +
> > +We'll make a number of commits here in order to demonstrate how to send 
> > many
> > +patches up for review simultaneously.
>
> I'd write "a topic with multiple patches" instead of "many
> patches".  The point being that we are not sending a group of
> unrelated changes, but are focusing on a theme.

Good suggestion, thank you.

> > +== Code It Up!
> > +
> > +NOTE: A reference implementation can be found at
> > +https://github.com/nasamuffin/git/tree/psuh.
> > +
> > +=== Adding a new command
> > +
> > +Lots of the main useful commands are written as builtins, which means they 
> > are
>
> I'd say "the subcommands" without "main" or "useful".  There are
> fringe subcommands that are implemented as built-in, there are main
> useful commands that are not built-in, and "useful"-ness is in the
> eyes of beholder.

 Sure, done.

> > +implemented in C and compiled into the main `git` executable. Since they 
> > are so
> > +common, it is a useful exercise to implement `git psuh` as a builtin 
> > subcommand.
>
> What does "they" refer to in this sentence?  Exiting built-in
> commands are so common?  In what way are they "common"?  They are
> commonly used?  That is not relevant in the choice of making 'git
> psuh' a built-in or a standalone.
>
> Adding a new built-in command, if it were common, may be a good
> target for illustration.  But it is not all that common.
>
> Adding a built-in command requires you to understand the start-up
> sequence to the exit status, and serves as a good end-to-end
> exercise, if this tutorial's main aim is to give a tour of the
> codebase and its internal API.  An almost no-op "git psuh" built-in
> is small enough to serve as a good end-to-end exercise, without
> requiring the author to know much about the internal API, and would
> be a good material to show how the contributor, the reviewers and
> the maintainer work together to add it to the system.
>
> So "they are so common" is probably not a good excuse, even though
> using `git psuh` may be a good exercise for the purpose of this
> tutorial.
>
> Since adding an almost no-op built-in command is relatively
> simple, it is a good material to demonstrate how you as an
>   

[PATCH v3] documentation: add lab for first contribution

2019-04-19 Thread Emily Shaffer
This tutorial covers how to add a new command to Git and, in the
process, everything from cloning git/git to getting reviewed on the
mailing list. It's meant for new contributors to go through
interactively, learning the techniques generally used by the git/git
development community.

Signed-off-by: Emily Shaffer 
---
Notable changes in v3:
- Changed document name from MyFirstContribution to
  MyFirstContribution.txt; this reduced the changes to Makefile and
  eliminated changes to .gitignore.
- git send-email section now uses the --cover-letter argument to
  format-patch, and explains single-patch emails
- Reworked summary of engaging in code review, moving to its own section
  "Responding To Reviews"
- "Now What?" section now explains when to make a new commit vs when to
  modify existing commit

Other smaller file-wide changes:
- Expanded use of "doc" ; reworded references to "lab" and "codelab"
- Added $ prompts to bash snippets
- monospaced things that should be but weren't

 Documentation/Makefile|1 +
 Documentation/MyFirstContribution.txt | 1065 +
 2 files changed, 1066 insertions(+)
 create mode 100644 Documentation/MyFirstContribution.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 26a2342bea..fddc3c3c95 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -74,6 +74,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out 
technical/api-index-skel.txt technica
 SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += MyFirstContribution
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
diff --git a/Documentation/MyFirstContribution.txt 
b/Documentation/MyFirstContribution.txt
new file mode 100644
index 00..d9bed013fc
--- /dev/null
+++ b/Documentation/MyFirstContribution.txt
@@ -0,0 +1,1065 @@
+My First Contribution to the Git Project
+
+
+== Summary
+
+This is a tutorial demonstrating the end-to-end workflow of creating a change 
to
+the Git tree, sending it for review, and making changes based on comments.
+
+=== Prerequisites
+
+This tutorial assumes you're already fairly familiar with using Git to manage
+source code.  The Git workflow steps will largely remain unexplained.
+
+=== Related Reading
+
+This tutorial aims to summarize the following documents, but the reader may 
find
+useful additional context:
+
+- `Documentation/SubmittingPatches`
+- `Documentation/howto/new-command.txt`
+
+== Getting Started
+
+=== Pull the Git codebase
+
+Git is mirrored in a number of locations. https://git-scm.com/downloads
+suggests one of the best places to clone from is GitHub.
+
+
+$ git clone https://github.com/git/git git
+
+
+=== Identify Problem to Solve
+
+
+Use + to indicate fixed-width here; couldn't get ` to work nicely with the
+quotes around "Pony Saying 'Um, Hello'".
+
+In this tutorial, we will add a new command, +git psuh+, short for ``Pony 
Saying
+`Um, Hello''' - a feature which has gone unimplemented despite a high frequency
+of invocation during users' typical daily workflow.
+
+(We've seen some other effort in this space with the implementation of popular
+commands such as `sl`.)
+
+=== Set Up Your Workspace
+
+Let's start by making a development branch to work on our changes. Per
+`Documentation/SubmittingPatches`, since a brand new command is a new feature,
+it's fine to base your work on `master`. However, in the future for bugfixes,
+etc., you should check that document and base it on the appropriate branch.
+
+For the purposes of this document, we will base all our work on the `master`
+branch of the upstream project. Create the `psuh` branch you will use for
+development like so:
+
+
+$ git checkout -b psuh origin/master
+
+
+We'll make a number of commits here in order to demonstrate how to send a topic
+with multiple patches up for review simultaneously.
+
+== Code It Up!
+
+NOTE: A reference implementation can be found at
+https://github.com/nasamuffin/git/tree/psuh.
+
+=== Adding a new command
+
+Lots of the subcommands are written as builtins, which means they are
+implemented in C and compiled into the main `git` executable. Implementing the
+very simple `psuh` command as a built-in will demonstrate the structure of the
+codebase, the internal API, and the process of working together as a 
contributor
+with the reviewers and maintainer to integrate this change into the system.
+
+Built-in subcommands are typically implemented in a function named "cmd_"
+followed by the name of the subcommand, in a source file named after the
+subcommand and contained within `builtin/`. So it makes sense to implement your
+command in `builtin/psuh.c`. Create that file, and within it, write the entry
+point for your comman

Re: [PATCH v3] documentation: add lab for first contribution

2019-04-22 Thread Emily Shaffer
On Sun, Apr 21, 2019 at 3:52 AM Junio C Hamano  wrote:
>
> Emily Shaffer  writes:
>
> > This tutorial covers how to add a new command to Git and, in the
> > process, everything from cloning git/git to getting reviewed on the
> > mailing list. It's meant for new contributors to go through
> > interactively, learning the techniques generally used by the git/git
> > development community.
> >
> > Signed-off-by: Emily Shaffer 
> > ---
>
> I think a stray "lab" remains in the title of the patch.  They seem
> to have disappeared from all the other places, and "tutorial" is
> consistently used, which is good.

Thanks, finished.

>
> My eyes have lost freshness on this topic, so my review tonight is
> bound to be (much) less thorough than the previous round.

I appreciate the thorough reviews you gave til now, thanks very much
for all your time.

>
> > +- `Documentation/SubmittingPatches`
> > +- `Documentation/howto/new-command.txt`
>
> Good (relative to the earlier one, these are set in monospace).
>
> > +
> > +== Getting Started
> > +
> > +=== Pull the Git codebase
> > +
> > +Git is mirrored in a number of locations. https://git-scm.com/downloads
>
> Perhaps URLs should also be set in monospace?

This breaks the hyperlink. So I'd rather not?

>
>
> > +NOTE: When you are developing the Git project, it's preferred that you use 
> > the
> > +`DEVELOPER flag`; if there's some reason it doesn't work for you, you can 
> > turn
>
> I do not think you want to set 'flag' in monospace, too.
> i.e. s| flag`|` flag|;

Thanks, good catch.

> > + git_config(git_default_config, NULL)
> > + if (git_config_get_string_const("user.name", &cfg_name) > 0)
> > + {
> > + printf(_("No name is found in config\n"));
> > + }
> > + else
> > + {
> > + printf(_("Your name: %s\n"), cfg_name);
> > + }
>
> Style.  Opening braces "{" for control structures are never be on
> its own line, and else comes on the same line as closing "}" of if,
> i.e.
>
> if (...) {
> print ...
> } else {
> print ...
> }

Took this suggestion.

>
> Or just get rid of braces if you are not going to extend one (or
> both) of if/else blocks into multi-statement blocks.
>
> > +
> > +
> > +`git_config()` will grab the configuration from config files known to Git 
> > and
> > +apply standard precedence rules. `git_config_get_string_const()` will look 
> > up
> > +a specific key ("user.name") and give you the value. There are a number of
> > +single-key lookup functions like this one; you can see them all (and more 
> > info
> > +about how to use `git_config()`) in 
> > `Documentation/technical/api-config.txt`.
> > +
> > +You should see that the name printed matches the one you see when you run:
> > +
> > +
> > +$ git config --get user.name
> > +
> > +
> > +Great! Now we know how to check for values in the Git config. Let's commit 
> > this
> > +too, so we don't lose our progress.
> > +
> > +
> > +$ git add builtin/psuh.c
> > +$ git commit -sm "psuh: show parameters & config opts"
> > +
> > +
> > +NOTE: Again, the above is for sake of brevity in this tutorial. In a real 
> > change
> > +you should not use `-m` but instead use the editor to write a verbose 
> > message.
>
> We never encourge people to write irrelevant things or obvious
> things that do not have to be said.  But a single-liner message
> rarely is sufficient to convey "what motivated the change, and why
> the change is done in the way seen in the patch" in a meaningful
> way.
>
> i.e. s|verbose|meaningful|;

Thanks, done. I'll leave out repeating the lecture here as it was
given in the full sample commit.

>
> > +Create your test script and mark it executable:
> > +
> > +
> > +$ touch t/t-psuh-tutorial.sh
> > +$ chmod +x t/t-psuh-tutorial.sh
> > +
>
> I never "create an empty file" before editing in real life.  Is this
> a common workflow in some circles?
>
> I'd be tempted to suggest s/touch/edit/ here, but I dunno.

Looking back, I'm wondering why I wrote it this way - I think I wanted
to avoid prescribing an editor and also mention the permissions.

I'll try to reword this to mention the executable bit after the
content of the test script.

>
>

[PATCH v4] documentation: add tutorial for first contribution

2019-04-23 Thread Emily Shaffer
This tutorial covers how to add a new command to Git and, in the
process, everything from cloning git/git to getting reviewed on the
mailing list. It's meant for new contributors to go through
interactively, learning the techniques generally used by the git/git
development community.

Signed-off-by: Emily Shaffer 
---
Only minor changes from v3, correcting the comments Junio made in his
review.

- Changed commit subject
- Stray monospace typos
- Curly brace style

 Documentation/Makefile|1 +
 Documentation/MyFirstContribution.txt | 1073 +
 2 files changed, 1074 insertions(+)
 create mode 100644 Documentation/MyFirstContribution.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 26a2342bea..fddc3c3c95 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -74,6 +74,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out 
technical/api-index-skel.txt technica
 SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += MyFirstContribution
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
diff --git a/Documentation/MyFirstContribution.txt 
b/Documentation/MyFirstContribution.txt
new file mode 100644
index 00..fc4a59a8c6
--- /dev/null
+++ b/Documentation/MyFirstContribution.txt
@@ -0,0 +1,1073 @@
+My First Contribution to the Git Project
+
+
+== Summary
+
+This is a tutorial demonstrating the end-to-end workflow of creating a change 
to
+the Git tree, sending it for review, and making changes based on comments.
+
+=== Prerequisites
+
+This tutorial assumes you're already fairly familiar with using Git to manage
+source code.  The Git workflow steps will largely remain unexplained.
+
+=== Related Reading
+
+This tutorial aims to summarize the following documents, but the reader may 
find
+useful additional context:
+
+- `Documentation/SubmittingPatches`
+- `Documentation/howto/new-command.txt`
+
+== Getting Started
+
+=== Pull the Git codebase
+
+Git is mirrored in a number of locations. https://git-scm.com/downloads
+suggests one of the best places to clone from is GitHub.
+
+
+$ git clone https://github.com/git/git git
+
+
+=== Identify Problem to Solve
+
+
+Use + to indicate fixed-width here; couldn't get ` to work nicely with the
+quotes around "Pony Saying 'Um, Hello'".
+
+In this tutorial, we will add a new command, +git psuh+, short for ``Pony 
Saying
+`Um, Hello''' - a feature which has gone unimplemented despite a high frequency
+of invocation during users' typical daily workflow.
+
+(We've seen some other effort in this space with the implementation of popular
+commands such as `sl`.)
+
+=== Set Up Your Workspace
+
+Let's start by making a development branch to work on our changes. Per
+`Documentation/SubmittingPatches`, since a brand new command is a new feature,
+it's fine to base your work on `master`. However, in the future for bugfixes,
+etc., you should check that document and base it on the appropriate branch.
+
+For the purposes of this document, we will base all our work on the `master`
+branch of the upstream project. Create the `psuh` branch you will use for
+development like so:
+
+
+$ git checkout -b psuh origin/master
+
+
+We'll make a number of commits here in order to demonstrate how to send a topic
+with multiple patches up for review simultaneously.
+
+== Code It Up!
+
+NOTE: A reference implementation can be found at
+https://github.com/nasamuffin/git/tree/psuh.
+
+=== Adding a new command
+
+Lots of the subcommands are written as builtins, which means they are
+implemented in C and compiled into the main `git` executable. Implementing the
+very simple `psuh` command as a built-in will demonstrate the structure of the
+codebase, the internal API, and the process of working together as a 
contributor
+with the reviewers and maintainer to integrate this change into the system.
+
+Built-in subcommands are typically implemented in a function named "cmd_"
+followed by the name of the subcommand, in a source file named after the
+subcommand and contained within `builtin/`. So it makes sense to implement your
+command in `builtin/psuh.c`. Create that file, and within it, write the entry
+point for your command in a function matching the style and signature:
+
+
+int cmd_psuh(int argc, const char **argv, const char *prefix)
+
+
+We'll also need to add the extern declaration of psuh; open up `builtin.h`,
+find the declaration for `cmd_push`, and add a new line for `psuh` immediately
+before it, in order to keep the declarations sorted:
+
+
+extern int cmd_psuh(int argc, const char **argv, const char *prefix);
+
+
+Be sure to `#include "builtin.h"` in your `psuh.c`.
+
+Go ahead and add some throwaway printf to that function. This is a decent
+starting

[PATCH] gitsubmodules: align html and nroff lists

2019-05-01 Thread Emily Shaffer
There appears to be a bug in the toolchain generating manpages from
lettered lists. When a list is enumerated with letters, the resulting
nroff shows numbers instead. Mostly this is harmless, but in the case of
gitsubmodules, the paragraph following the list refers back to each
bullet by letter. As a result, reading this documentation via `man
gitsubmodules` is hard to parse - readers must infer that a bug exists
and a refers to 1, b refers to 2, and c refers to 3 in the list above.

The problem specifically was introduced in ad47194; previously rather
than generating numerated lists the bulleted area was entirely
monospaced in HTML and shown in plaintext in nroff.

The bug seems to exist in docbook-xml - I've reported it on May 1 via
the docbook-apps mail list - but for now it may make more sense to just
work around the issue.

Signed-off-by: Emily Shaffer 
---
 Documentation/gitsubmodules.txt | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/gitsubmodules.txt b/Documentation/gitsubmodules.txt
index 57999e9f36..0a890205b8 100644
--- a/Documentation/gitsubmodules.txt
+++ b/Documentation/gitsubmodules.txt
@@ -169,15 +169,15 @@ ACTIVE SUBMODULES
 
 A submodule is considered active,
 
-  a. if `submodule..active` is set to `true`
+  1. if `submodule..active` is set to `true`
 +
 or
 
-  b. if the submodule's path matches the pathspec in `submodule.active`
+  2. if the submodule's path matches the pathspec in `submodule.active`
 +
 or
 
-  c. if `submodule..url` is set.
+  3. if `submodule..url` is set.
 
 and these are evaluated in this order.
 
@@ -193,11 +193,11 @@ For example:
 url = https://example.org/baz
 
 In the above config only the submodule 'bar' and 'baz' are active,
-'bar' due to (a) and 'baz' due to (c). 'foo' is inactive because
-(a) takes precedence over (c)
+'bar' due to (1) and 'baz' due to (3). 'foo' is inactive because
+(1) takes precedence over (3)
 
-Note that (c) is a historical artefact and will be ignored if the
-(a) and (b) specify that the submodule is not active. In other words,
+Note that (3) is a historical artefact and will be ignored if the
+(1) and (2) specify that the submodule is not active. In other words,
 if we have a `submodule..active` set to `false` or if the
 submodule's path is excluded in the pathspec in `submodule.active`, the
 url doesn't matter whether it is present or not. This is illustrated in
-- 
2.21.0.1020.gf2820cf01a-goog



Re: [PATCH v4] documentation: add tutorial for first contribution

2019-05-01 Thread Emily Shaffer
On Tue, Apr 30, 2019 at 11:59:23AM -0700, Josh Steadmon wrote:
> Just a couple typo fixes listed below:
> 

Thanks for the review, Josh.

I'll hold these fixes locally until I either get something more
significant to fix or Junio asks for them before a merge to next, to
reduce spam to the list.

- Emily
> 
> On 2019.04.23 12:34, Emily Shaffer wrote:
> [snip]
> > +=== Implementation
> > +
> > +It's probably useful to do at least something besides printing out a 
> > string.
> > +Let's start by having a look at everything we get.
> > +
> > +Modify your `cmd_psuh` implementation to dump the args you're passed:
> > +
> > +
> > +   int i;
> > +
> > +   ...
> > +
> > +   printf(Q_("Your args (there is %d):\n",
> > + "Your args (there are %d):\n",
> > + argc),
> > +  argc);
> > +   for (i = 0; i < argc; i++) {
> > +   printf("%d: %s\n", i, argv[i]);
> > +   }
> > +   printf(_("Your current working directory:\n%s%s\n"),
> > +  prefix ? "/" : "", prefix ? prefix : "");
> > +
> > +
> > +
> > +Build and try it. As you may expect, there's pretty much just whatever we 
> > give
> > +on the command line, including the name of our command. (If `prefix` is 
> > empty
> > +for you, try `cd Documentation/ && ../bin-wrappers/git/ psuh`). That's not 
> > so
> 
> Looks like you have an errant "/" after "git".

Right you are. Thanks.
> 
> 
> [snip]
> > +=== Adding documentation
> > +
> > +Awesome! You've got a fantastic new command that you're ready to share 
> > with the
> > +community. But hang on just a minute - this isn't very user-friendly. Run 
> > the
> > +following:
> > +
> > +
> > +$ ./bin-wrappers/git help psuh
> > +
> > +
> > +Your new command is undocumented! Let's fix that.
> > +
> > +Take a look at `Documentation/git-*.txt`. These are the manpages for the
> > +subcommands that Git knows about. You can open these up and take a look to 
> > get
> > +acquainted with the format, but then go ahead and make a new file
> > +`Documentation/git-psuh.txt`. Like with most of the documentation in the 
> > Git
> > +project, help pages are written with AsciiDoc (see CodingGuidelines, 
> > "Writing
> > +Documentation" section). Use the following template to fill out your own
> > +manpage:
> > +
> > +// Surprisingly difficult to embed AsciiDoc source within AsciiDoc.
> > +[listing]
> > +
> > +git-psuh(1)
> > +===
> > +
> > +NAME
> > +
> > +git-psuh - Delight users' typo with a shy horse
> > +
> > +
> > +SYNOPSIS
> > +
> > +[verse]
> > +'git-psuh'
> > +
> > +DESCRIPTION
> > +---
> > +...
> > +
> > +OPTIONS[[OPTIONS]]
> > +--
> > +...
> > +
> > +OUTPUT
> > +--
> > +...
> > +
> > +
> > +GIT
> > +---
> > +Part of the linkgit:git[1] suite
> > +
> > +
> > +The most important pieces of this to note are the file header, underlined 
> > by =,
> > +the NAME section, and the SYNOPSIS, which would normally contain the 
> > grammar if
> > +your command took arguments. Try to use well-established manpage headers 
> > so your
> > +documentation is consistent with other Git and UNIX manpages; this makes 
> > life
> > +easier for your user, who can skip to the section they know contains the
> > +information they need.
> > +
> > +Now that you've written your manpage, you'll need to build it explicitly. 
> > We
> > +convert your AsciiDoc to troff which is man-readable like so:
> > +
> > +
> > +$ make all doc
> > +$ man Documentation/git-psuh.1
> > +
> > +
> > +or
> > +
> > +
> > +$ make -C Documentation/git-psuh.1
> 
> Needs a space after "Documentation/".

Done. Thanks much.


Re: [PATCH 2/2] merge: add --quit

2019-05-02 Thread Emily Shaffer
They both look fine to me, besides a couple typos in the commit message
in this one.

On Wed, May 01, 2019 at 08:11:52PM +0700, Nguyễn Thái Ngọc Duy wrote:
> This allows to cancel the current merge without reseting worktree/index,

"resetting".

> which is what --abort is for. Like other --quit(s), this is often used
> when you forgot that you're in the middle of a merge and already
> switched away, doing different things. By the time you're realize, you
"By the time you've realized" ?

> can't even continue the merge anymore.

Thanks,
Emily


Re: [RFE] Allow for "interactive"-like actions in non-interactive rebase

2019-05-06 Thread Emily Shaffer
Hi,

On Fri, May 03, 2019 at 06:04:15PM +0300, Konstantin Kharlamov wrote:
> Interactive rebase (i.e. for example "git rebase -i HEAD~10") is used most
> often to apply an action to a single commit, e.g. "rename", "edit", "fixup",
> etc…
> 
> As result, people keep coming up with custom scripts and aliases for every
> distinct action.
> 
> Instead, it would be nice to have native support in git to start "rebase"
> for a given commit, and pass the "interactive action" to use on that commit.

I would totally use this. The equivalent workflow right now is a pretty
large number of steps for, say, fixing a typo.

> Examples:
> 
> $ git rebase -i HEAD~10 --action edit
> $ git rebase -i HEAD~10 --action rename
> $ git rebase -i HEAD~10 --action fixup

Is there an alternative to any of these actions that can already be
taken individually? Or, another way of asking, were the interactive
rebase commands based on some other Git command which could be pain to
do on many commits individually?

Emily


Re: [PATCH] commit-graph: fix memory leak

2019-05-06 Thread Emily Shaffer
Hi,

This change looks good to me, and like good evidence for the benefits of
automated tooling :)

Thanks!
 - Emily

On Mon, May 06, 2019 at 02:36:58PM -0700, Josh Steadmon wrote:
> Free the commit graph when verify_commit_graph_lite() reports an error.
> Credit to OSS-Fuzz for finding this leak.
> 
> Signed-off-by: Josh Steadmon 
> ---
>  commit-graph.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 66865acbd7..4bce70d35c 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -267,8 +267,10 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
> int fd,
>   last_chunk_offset = chunk_offset;
>   }
>  
> - if (verify_commit_graph_lite(graph))
> + if (verify_commit_graph_lite(graph)) {
> + free(graph);
>   return NULL;
> + }
>  
>   return graph;
>  }
> -- 
> 2.21.0.1020.gf2820cf01a-goog
> 


Re: [PATCH v3 00/16] Add new command 'restore'

2019-05-06 Thread Emily Shaffer
Hiya,

On Thu, Apr 25, 2019 at 04:45:44PM +0700, Nguyễn Thái Ngọc Duy wrote:
> v3 changes are really small
> 
> - gitcli.txt is updated to mention the --staged/--worktree pair, in
>   comparison to --cached/--index which is also mention there.
> 
> - The patch 'rm --staged' is dropped. It could come back. But if it
>   does, it should be in a series where commands that take
>   --cached/--index will now take both --staged/--worktree. Those are
>   'rm', 'apply', 'check-attr', 'grep', 'diff' and maybe 'ls-files'.
>   
>   In other words we support --staged/--worktree everywhere while
>   --cached/--index still remains because of muscle memory. This is
>   of course only a good move if --staged/--worktree is much better
>   than --cached/--index.
> 
> - Since there's a chance this series may end up in 'master' and get
>   released, and I'm still worried that I screwed up somewhere, the last
>   patch declares the two commands experimental for maybe one or two
>   release cycles.
>   
>   If the reception is good, we revert that patch. If something
>   horrible is found, we can still change the default behavior without
>   anybody yelling at us. Or worst case scenario, we remove both
>   commands and declare a failed experiment.
> 
> PS. the intent-to-add support is still not in. But it should be in
> before the experimental status is removed.

I've got a usability comment, as we're giving restore a try within
Google for now. I found myself in a situation where I had accidentally
staged all my changes to tracked files (I think resulting from a stash
pop which generated a merge conflict?) and didn't see a good way to
unstage everything using restore.

I tried out `git restore --staged *` and it tried to restore every build
artifact in my working tree, all of which should be ignored, made a lot of
noisy errors, and left me with my changes still staged.

Then, figuring that I only had a few files, I thought I'd type them
each, with the exception of a .c/.h pair:

g restore --staged builtin/log.c builtin/walken.c revision.*

Because I had build artifacts present, this vomited while trying to
unstage revision.o, and again left me with all my changes still staged.
revision.o is validly ignored:

$ g check-ignore revision.o   
revision.o

Finally explicitly naming each file which I needed to restore worked,
but I have very little interest in doing this for more than a handful of
files, especially since the output of `git status` is not easy to paste
into the command line (due to the "modified:" etc marker).

I was definitely still able to make it do what I wanted with `git reset
HEAD` but thought you may be interested. It'd be cool to have a hint
which advises how you can unstage everything, or else to pay attention
to the gitignore and not try to unstage those files, or else to have a
command to unstage everything. (I looked for one by trying `git restore
--staged -a` but that doesn't exist :) )

Thanks!
Emily


Re: [PATCH v3 00/16] Add new command 'restore'

2019-05-07 Thread Emily Shaffer
On Tue, May 07, 2019 at 05:36:56PM +0700, Duy Nguyen wrote:
> On Tue, May 7, 2019 at 9:21 AM Emily Shaffer  wrote:
> > I've got a usability comment, as we're giving restore a try within
> > Google for now.
> 
> Thanks. I thought I was the only guinea pig :D and obviously not a
> good one since I know too much (which is not a good thing as a
> tester).
> 
> > I found myself in a situation where I had accidentally
> > staged all my changes to tracked files (I think resulting from a stash
> > pop which generated a merge conflict?) and didn't see a good way to
> > unstage everything using restore.
> >
> > I tried out `git restore --staged *` and it tried to restore every build
> > artifact in my working tree, all of which should be ignored, made a lot of
> > noisy errors, and left me with my changes still staged.
> 
> For the record, "git restore --staged :/" should do the trick and it
> is documented as an example (but without --staged).

Yeah, this worked, and today I also noted `git restore --staged .`
works, as does Junio's suggestion on the other mail (`git restore
--staged revision.\*`), and quoting the * (`git restore --staged '*'`).
So maybe I didn't think outside the box enough before mailing :)

> 
> Either way. I think you raise a good point about "*" (or patterns
> matching more than expected in general). I need to sleep on it and see
> if the old way of handling pattern matching failure is still a good
> way to go.

I think it's worth considering, especially as `git reset HEAD *` and
`git reset HEAD` both work. (`git restore --staged` complains that it
hasn't got any paths, but reset seems to figure you just mean
everything.) It was a surprising failure to me coming away from years of
using reset.

> 
> > Finally explicitly naming each file which I needed to restore worked,
> > but I have very little interest in doing this for more than a handful of
> > files, especially since the output of `git status` is not easy to paste
> > into the command line (due to the "modified:" etc marker).
> 
> For just a couple files, you could also use "-p" to go through them
> and either "a" to restore all or "d" to skip the file. Maybe adding
> "-i/--interactive" to git-restore too, just like "git add". Hmm...
> -- 
> Duyh

Anyways, maybe it's not a particularly weighty complaint, since there
are still ways to mass unstage, and if I had just RTFM I would have
found them. :) It's a new workflow, after all, and the old one still
works, so it's probably not reasonable to expect it all to just work the
same way with s/reset HEAD/restore --staged/.

Thanks!
 - Emily


Re: [PATCH v4] documentation: add tutorial for first contribution

2019-05-07 Thread Emily Shaffer
On Thu, May 02, 2019 at 07:11:04PM -0700, Phil Hord wrote:
> On Tue, Apr 23, 2019 at 12:35 PM Emily Shaffer  
> wrote:
> >
> > This tutorial covers how to add a new command to Git and, in the
> > process, everything from cloning git/git to getting reviewed on the
> > mailing list. It's meant for new contributors to go through
> > interactively, learning the techniques generally used by the git/git
> > development community.
> >
> 
> Thanks for working on this.  It's very nicely done.

:)

> > +Check it out! You've got a command! Nice work! Let's commit this.
> > +
> > +
> > +$ git add Makefile builtin.h builtin/psuh.c git.c
> > +$ git commit -s
> > +
> > +
> > +You will be presented with your editor in order to write a commit message. 
> > Start
> > +the commit with a 50-column or less subject line, including the name of the
> > +component you're working on. Remember to be explicit and provide the "Why" 
> > of
> 
> This part sounds a little ambiguous to me, as I'm expected to include
> the "Why" in my 50-column subject line.  I don't want to go overboard,
> but maybe direct them further to
> 
> After this, insert a blank line (always required) and then some
> text describing
> your change.  Remember to be explicit and ...
> 

Done. I agree it's ambiguous about how much is supposed to go into the
subject versus the body, so I've hopefully clarified by explaining that
the body "should provide the bulk of the context".

> > +
> > +$ make all doc
> > +$ man Documentation/git-psuh.1
> > +
> > +
> > +or
> > +
> > +
> > +$ make -C Documentation/git-psuh.1
> 
> There's an unwanted slash here. This should be `make -C Documentation
> git-psuh.1`.

Done, thanks. Nice catch.

> > +=== Overview of Testing Structure
> > +
> > +The tests in Git live in `t/` and are named with a 4-decimal digit, 
> > according to
> 
> This doesn't parse.  How about this?
> 
> named with a 4-decimal digit number using the schema shown in ...
> 

I think we've both managed to miss that I've swapped "decimal" and
"digit" by accident. :) How about "named with a 4-digit decimal number
using the schema"?

Very pleased that you caught this, since all the once-overs in the world
from the cooked-brain author and the cooked-brain second- or tenth-pass
reviewers wouldn't have caught this mistake. Thanks!

> > +== Sending Patches via GitGitGadget
> > +
> > +One option for sending patches is to follow a typical pull request 
> > workflow and
> > +send your patches out via GitGitGadget. GitGitGadget is a tool created by
> > +Johannes Schindelin to make life as a Git contributor easier for those 
> > used to
> > +the GitHub PR workflow. It allows contributors to open pull requests 
> > against its
> > +mirror of the Git project, and does some magic to turn the PR into a set of
> > +emails and sent them out for you. It also runs the Git continuous 
> > integration
> 
> nit: "send" them out for you.

Done, thanks.

> 
> > +suite for you. It's documented at http://gitgitgadget.github.io.
> > +
> > +=== Forking git/git on GitHub
> > +
> > +Before you can send your patch off to be reviewed using GitGitGadget, you 
> > will
> > +need to fork the Git project and upload your changes. First thing - make 
> > sure
> > +you have a GitHub account.
> > +
> > +Head to the https://github.com/git/git[GitHub mirror] and look for the Fork
> > +button. Place your fork wherever you deem appropriate and create it.
> > +
> > +=== Uploading To Your Own Fork
> 
> I noticed some of your titles Use Capital Initials and others do not.
> I suppose either is fine, but consistency is appreciated.
>

Nice catch. I've gone through and fixed up the titles throughout; as a
result I also caught a missed monospace.

> > +=== Sending Your Patches
> > +
> > +Now that your CI is passing and someone has granted you permission to use
> > +GitGitGadget with the `/allow` command,  sending out for review is as 
> > simple as
> 
> nit: extra space before "sending"
> 
Done.

> > +Next, go look at your pull request against GitGitGadget; you should see 
> > the CI
> > +has been  kicked off again. Now while the CI is running is a good time for 
> > you
> 
> nit: extra spaces before "kicked"
> 
Done.

> > +Make sure you retain the ``[PATCH 0/X]'' part; that's what indicates to 
>

Re: [PATCH v4] documentation: add tutorial for first contribution

2019-05-07 Thread Emily Shaffer
On Mon, May 06, 2019 at 03:28:44PM -0700, Jonathan Tan wrote:
> Sorry for not looking at this sooner. 
> 
> Firstly, I'm not sure if this file should be named without the ".txt",
> like SubmittingPatches.

I should mention that during this change's life as a GitGitGadget PR,
dscho performed a review in GitHub, which included a recommendation to
name this SubmittingPatches.txt. This obviates quite a bit of
boilerplate within the Makefile as there are rules for handling *.txt
documentation generation already. You can check out Johannes's review:
https://github.com/gitgitgadget/git/pull/177

I keep forgetting to add a Reviewed-by line to my commit. I'll do that
before pushing based on your comments, as well as Josh and Phil's.

> 
> As for my other comments below, the Makefile comment below is the only
> one I feel strongly about; feel free to disagree with the rest (which I
> think are subjective).
> 
> > diff --git a/Documentation/Makefile b/Documentation/Makefile
> > index 26a2342bea..fddc3c3c95 100644
> > --- a/Documentation/Makefile
> > +++ b/Documentation/Makefile
> > @@ -74,6 +74,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out 
> > technical/api-index-skel.txt technica
> >  SP_ARTICLES += $(API_DOCS)
> >  
> >  TECH_DOCS += SubmittingPatches
> > +TECH_DOCS += MyFirstContribution
> 
> Any reason not to keep this alphabetized?

No reason, done.

> > +=== Pull the Git codebase
> > +
> > +Git is mirrored in a number of locations. https://git-scm.com/downloads
> > +suggests one of the best places to clone from is GitHub.
> > +
> > +
> > +$ git clone https://github.com/git/git git
> > +
> 
> I would rename the header to "Clone the Git repository" instead, since
> "pull" has a specific meaning. Also, I think that "one of the best
> places" is unnecessary (I would just say "Clone the Git repository from
> one of its many mirrors, e.g.:"), but perhaps you want to leave it in
> there to maintain the informal tone.

I've merged the language from both and added that the GH mirror at
git/git is official.

"Git is mirrored in a number of locations. Clone the repository from one
of them; https://git-scm.com/downloads suggests one of the best places
to clone from is the official mirror on GitHub."

> > +We'll also need to add the extern declaration of psuh; open up `builtin.h`,
> > +find the declaration for `cmd_push`, and add a new line for `psuh` 
> > immediately
> > +before it, in order to keep the declarations sorted:
> > +
> > +
> > +extern int cmd_psuh(int argc, const char **argv, const char *prefix);
> > +
> 
> I was going to say to not include the "extern", but I see that builtin.h
> has them already, so it's probably better to leave it there for
> consistency.
> 

This was brought up in an earlier review and there's also a review
pending to remove them, which seems to have turned into a discussion
about how to maintain a script to remove them :) I'm going to avoid
politics and also remove the extern here, because it looks like that's
the direction the codebase is heading anyway.

> > +The list of commands lives in `git.c`. We can register a new command by 
> > adding
> > +a `cmd_struct` to the `commands[]` array. `struct cmd_struct` takes a 
> > string
> > +with the command name, a function pointer to the command implementation, 
> > and a
> > +setup option flag. For now, let's keep cheating off of `push`. Find the 
> > line
> > +where `cmd_push` is registered, copy it, and modify it for `cmd_psuh`, 
> > placing
> > +the new line in alphabetical order.
> 
> For an international audience, it might be better to replace "cheating
> off" with its literal meaning. It took me a while to understand that
> "cheating off" was meant to evoke a so-called cheat sheet.

You're right; I leaned too far towards casual voice here and included a
colloquialism. I've modified this to "let's keep mimicking `push`" as I
feel it means the same thing, without the slang but with a similar tone.

I also considered "copying from `push`" but didn't want to indicate we
would be copy-pasting lines of code. If anybody's got a better
suggestion for a verb here I'm happy to hear it; "cheating from X" is a
phrase I'm trying to stop using anyways :)

> > +Go ahead and inspect your new commit with `git show`. "psuh:" indicates you
> > +have modified mainly the `psuh` command. The subject line gives readers an 
> > idea
> > +of what you've changed. The sign-off line (`-s`) indicates that you agree 
> > to
> > +the Developer's Certificate of Origin 1.1 (see the
> > +`Documentation/SubmittingPatches` +++[[dco]]+++ header). If you wish to 
> > add some
> > +context to your change, go ahead with `git commit --amend`.
> 
> I think the last sentence is confusing - didn't we already add the
> context? (And if it's meant more along the lines of "if you want to
> change your commit message for whatever reason, use --amend", I don't
> think that's necessary here, since we are assuming that the user knows
> how to use Git.)

I think you're 

[PATCH v5 2/2] documentation: add anchors to MyFirstContribution

2019-05-07 Thread Emily Shaffer
During the course of review for MyFirstContribution.txt, the suggestion
came up to include anchors to make it easier for veteran contributors to
link specific sections of this documents to newbies. To make life easier
for reviewers, add these anchors in their own commit. See review context
here: https://public-inbox.org/git/20190507195938.gd220...@google.com/

AsciiDoc does not support :sectanchors: and the anchors are not
discoverable, but they are referenceable. So a link to
"foo.com/MyFirstContribution.html#prerequisites" will still work if that
file was generated with AsciiDoc. The inclusion of :sectanchors: does
not create warnings or errors while compiling directly with `asciidoc -b
html5 Documentation/MyFirstContribution.txt` or while compiling with
`make doc`.

AsciiDoctor does support :sectanchors: and displays a paragraph link on
mouseover. When the anchor is included above or inline with a section
(as in this change), the link provided points to the custom ID contained
within [[]] instead of to an autogenerated ID. Practically speaking,
this means we have .../MyFirstContribution.html#summary instead of
.../MyFirstContribution.html#_summary. In addition to being prettier,
the custom IDs also enable anchor linking to work with
asciidoc-generated pages. This change compiles with no warnings using
`asciidoctor -b html5 Documentation/MyFirstContribution.txt`.

Signed-off-by: Emily Shaffer 
---
 Documentation/MyFirstContribution.txt | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/Documentation/MyFirstContribution.txt 
b/Documentation/MyFirstContribution.txt
index 0f9a7fed93..a0380d5dc8 100644
--- a/Documentation/MyFirstContribution.txt
+++ b/Documentation/MyFirstContribution.txt
@@ -1,16 +1,20 @@
 My First Contribution to the Git Project
 
+:sectanchors:
 
+[[summary]]
 == Summary
 
 This is a tutorial demonstrating the end-to-end workflow of creating a change 
to
 the Git tree, sending it for review, and making changes based on comments.
 
+[[prerequisites]]
 === Prerequisites
 
 This tutorial assumes you're already fairly familiar with using Git to manage
 source code.  The Git workflow steps will largely remain unexplained.
 
+[[related-reading]]
 === Related Reading
 
 This tutorial aims to summarize the following documents, but the reader may 
find
@@ -19,8 +23,10 @@ useful additional context:
 - `Documentation/SubmittingPatches`
 - `Documentation/howto/new-command.txt`
 
+[[getting-started]]
 == Getting Started
 
+[[cloning]]
 === Clone the Git Repository
 
 Git is mirrored in a number of locations. Clone the repository from one of 
them;
@@ -31,6 +37,7 @@ the official mirror on GitHub.
 $ git clone https://github.com/git/git git
 
 
+[[identify-problem]]
 === Identify Problem to Solve
 
 
@@ -44,6 +51,7 @@ of invocation during users' typical daily workflow.
 (We've seen some other effort in this space with the implementation of popular
 commands such as `sl`.)
 
+[[setup-workspace]]
 === Set Up Your Workspace
 
 Let's start by making a development branch to work on our changes. Per
@@ -62,11 +70,13 @@ $ git checkout -b psuh origin/master
 We'll make a number of commits here in order to demonstrate how to send a topic
 with multiple patches up for review simultaneously.
 
+[[code-it-up]]
 == Code It Up!
 
 NOTE: A reference implementation can be found at
 https://github.com/nasamuffin/git/tree/psuh.
 
+[[add-new-command]]
 === Adding a New Command
 
 Lots of the subcommands are written as builtins, which means they are
@@ -195,6 +205,7 @@ For the remainder of the tutorial, the subject line only 
will be listed for the
 sake of brevity. However, fully-fleshed example commit messages are available
 on the reference implementation linked at the top of this document.
 
+[[implementation]]
 === Implementation
 
 It's probably useful to do at least something besides printing out a string.
@@ -359,6 +370,7 @@ about. Neat! Let's commit that as well.
 $ git commit -sm "psuh: display the top of origin/master"
 
 
+[[add-documentation]]
 === Adding Documentation
 
 Awesome! You've got a fantastic new command that you're ready to share with the
@@ -446,6 +458,7 @@ sees that your command has been implemented as well as 
documented) by running
 
 Go ahead and commit your new documentation change.
 
+[[add-usage]]
 === Adding Usage Text
 
 Try and run `./bin-wrappers/git psuh -h`. Your command should crash at the end.
@@ -502,6 +515,7 @@ your command terminated before anything else interesting 
happens. Great!
 
 Go ahead and commit this one, too.
 
+[[testing]]
 == Testing
 
 It's important to test your code - even for a little toy command like this one.
@@ -516,11 +530,13 @@ So let's write some tests.
 
 Related reading: `t/README`
 
+[[overview-test-structure]]
 === Overview of Testing Structure
 
 The tests in Git live in `t/` and are named with a

[PATCH v5 1/2] documentation: add tutorial for first contribution

2019-05-07 Thread Emily Shaffer
This tutorial covers how to add a new command to Git and, in the
process, everything from cloning git/git to getting reviewed on the
mailing list. It's meant for new contributors to go through
interactively, learning the techniques generally used by the git/git
development community.

Reviewed-by: Johannes Schindelin 
Reviewed-by: Jonathan Tan 
Signed-off-by: Emily Shaffer 
---
 Documentation/Makefile|1 +
 Documentation/MyFirstContribution.txt | 1075 +
 2 files changed, 1076 insertions(+)
 create mode 100644 Documentation/MyFirstContribution.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6d738f831e..360c8051e2 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -73,6 +73,7 @@ SP_ARTICLES += howto/maintain-git
 API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt 
technical/api-index.txt, $(wildcard technical/api-*.txt)))
 SP_ARTICLES += $(API_DOCS)
 
+TECH_DOCS += MyFirstContribution
 TECH_DOCS += SubmittingPatches
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
diff --git a/Documentation/MyFirstContribution.txt 
b/Documentation/MyFirstContribution.txt
new file mode 100644
index 00..0f9a7fed93
--- /dev/null
+++ b/Documentation/MyFirstContribution.txt
@@ -0,0 +1,1075 @@
+My First Contribution to the Git Project
+
+
+== Summary
+
+This is a tutorial demonstrating the end-to-end workflow of creating a change 
to
+the Git tree, sending it for review, and making changes based on comments.
+
+=== Prerequisites
+
+This tutorial assumes you're already fairly familiar with using Git to manage
+source code.  The Git workflow steps will largely remain unexplained.
+
+=== Related Reading
+
+This tutorial aims to summarize the following documents, but the reader may 
find
+useful additional context:
+
+- `Documentation/SubmittingPatches`
+- `Documentation/howto/new-command.txt`
+
+== Getting Started
+
+=== Clone the Git Repository
+
+Git is mirrored in a number of locations. Clone the repository from one of 
them;
+https://git-scm.com/downloads suggests one of the best places to clone from is
+the official mirror on GitHub.
+
+
+$ git clone https://github.com/git/git git
+
+
+=== Identify Problem to Solve
+
+
+Use + to indicate fixed-width here; couldn't get ` to work nicely with the
+quotes around "Pony Saying 'Um, Hello'".
+
+In this tutorial, we will add a new command, +git psuh+, short for ``Pony 
Saying
+`Um, Hello''' - a feature which has gone unimplemented despite a high frequency
+of invocation during users' typical daily workflow.
+
+(We've seen some other effort in this space with the implementation of popular
+commands such as `sl`.)
+
+=== Set Up Your Workspace
+
+Let's start by making a development branch to work on our changes. Per
+`Documentation/SubmittingPatches`, since a brand new command is a new feature,
+it's fine to base your work on `master`. However, in the future for bugfixes,
+etc., you should check that document and base it on the appropriate branch.
+
+For the purposes of this document, we will base all our work on the `master`
+branch of the upstream project. Create the `psuh` branch you will use for
+development like so:
+
+
+$ git checkout -b psuh origin/master
+
+
+We'll make a number of commits here in order to demonstrate how to send a topic
+with multiple patches up for review simultaneously.
+
+== Code It Up!
+
+NOTE: A reference implementation can be found at
+https://github.com/nasamuffin/git/tree/psuh.
+
+=== Adding a New Command
+
+Lots of the subcommands are written as builtins, which means they are
+implemented in C and compiled into the main `git` executable. Implementing the
+very simple `psuh` command as a built-in will demonstrate the structure of the
+codebase, the internal API, and the process of working together as a 
contributor
+with the reviewers and maintainer to integrate this change into the system.
+
+Built-in subcommands are typically implemented in a function named "cmd_"
+followed by the name of the subcommand, in a source file named after the
+subcommand and contained within `builtin/`. So it makes sense to implement your
+command in `builtin/psuh.c`. Create that file, and within it, write the entry
+point for your command in a function matching the style and signature:
+
+
+int cmd_psuh(int argc, const char **argv, const char *prefix)
+
+
+We'll also need to add the declaration of psuh; open up `builtin.h`, find the
+declaration for `cmd_push`, and add a new line for `psuh` immediately before 
it,
+in order to keep the declarations sorted:
+
+
+int cmd_psuh(int argc, const char **argv, const char *prefix);
+
+
+Be sure to `#include "builtin.h"` in your `psuh.c`.
+
+Go ahead and add some throwaway printf to that function. This is a de

[PATCH v5 0/2] documentation: add lab for first contribution

2019-05-07 Thread Emily Shaffer
Since v4, I've made one major change and a handful of minor changes. The
major change is contained entirely in patch 2/2; patch 1/2 has only nit
fixes compared to v4.

Major:
- I've added anchors to each section. These anchors are custom-named in
  order to be compatible with asciidoc as well as asciidoctor. I figured
  it would be easier to review the naming choices if the anchors stood
  alone from the main introductory patch.

Minor:
- Alphabetized Documentation/Makefile
- Fixup headers - a few renames for clarity, casing, and a missed
  monospace, based on Phil Hord & Jonathan Tan comments.
- Removed 'extern' from the addition to builtin.h
- Replace "cheating off" with "mimicking" in a few places
- Clarify instructions for writing commit message to remove ambiguity
  and include requirement for newline after subject
- Remove unnecessary explanation of `git commit --amend`
- Remove braces to bring code sample into Git style
- Typo, word-reversal, and early line break fixes
- Add neglected argument to `git send-email` samples to actually include
  patches

Emily Shaffer (2):
  documentation: add tutorial for first contribution
  documentation: add anchors to MyFirstContribution

 Documentation/Makefile|1 +
 Documentation/MyFirstContribution.txt | 1110 +
 2 files changed,  insertions(+)
 create mode 100644 Documentation/MyFirstContribution.txt

-- 
2.21.0.1020.gf2820cf01a-goog



Re: [PATCH v5 1/2] documentation: add tutorial for first contribution

2019-05-07 Thread Emily Shaffer
> +Add a line to `#include "config.h"`. Then, add the following bits to the
> +function body:
> +
> +
> + const char *cfg_name;
> +
> +...
> +
> + git_config(git_default_config, NULL)
> + if (git_config_get_string_const("user.name", &cfg_name) > 0) {
> + printf(_("No name is found in config\n"));
> + } else {
> + printf(_("Your name: %s\n"), cfg_name);
> + }

Just noticed the braces here, too. I have removed them from my local
copy and will upload again if I hear other comments warranting a v6.

 - Emily


Re: [PATCH v5 1/2] documentation: add tutorial for first contribution

2019-05-08 Thread Emily Shaffer
On Wed, May 08, 2019 at 12:46:47PM +0900, Junio C Hamano wrote:
> Emily Shaffer  writes:
> 
> > +=== Clone the Git Repository
> > +
> > +Git is mirrored in a number of locations. Clone the repository from one of 
> > them;
> > +https://git-scm.com/downloads suggests one of the best places to clone 
> > from is
> > +the official mirror on GitHub.
> 
> I didn't want to have to get into fine-grained wordsmithing (let
> alone bikeshedding) this late in the iteration of the topic, but
> "the official mirror" is not something anybody suggested in the
> recent reviews (JTan's rephrasing, which I already said that I am OK
> with, said something like "one of the many mirrors").

I should have waited longer on sending this round of patch out; as it
stands I sent the patch with this fix a few hours before your response
to JTan came out. Sorry for the confusion.

> And "official" is a phrase I have trouble with in this context.  A
> mirror does not have to be blessed as official; that's the point of
> a mirror---anybody can make one to help users with better
> connectivity or availability, as long as its users trust the mirror
> maintainer for mirror's correctness and freshness.

You're right and I'll remove it. However, I've seen at least one
instance of confusion over Git's lack of an "official" mirror (over on
#git on Freenode). I'd like to rephrase this to explain the reasoning
behind having multiple mirrors, none of which are official.

To that end, I propose replacing the phrase with "one of the best places
to clone from is this mirror on GitHub." followed by the clone command
to git/git, then followed by:

NOTE: As Git is a distributed version control, the Git project also
follows a distributed model. There is no central official mirror; any
mirror which you can reasonably trust is being kept correct and fresh to
the Git source distributed by the Git maintainer is fine to work from.

What do we think?

Alternatively, if the desire is to just be done with it, I have no
problem with Junio rewording however he feels is best and otherwise
applying this patch - if there is value in reducing the churn on the
mailing list for this patch.

> So perhaps "... clone from is the mirror maintained by GitHub folks"
> or just simply "is the mirror on GitHub"?
> 
> > +$ git send-email --to=tar...@example.com
> > +--in-reply-to=""
> 
> Very nice attention to the detail here.  I like it (the earlier
> round did not have dq around the message ID).
> 
> > +psuh/v2*
> > +
> 
> All other edits relative to the previous round look very sensible to
> me.  Thanks.


Re: Proposal: Remembering message IDs sent with git send-email

2019-05-08 Thread Emily Shaffer
On Wed, May 08, 2019 at 07:10:13PM -0400, Drew DeVault wrote:
> I want to gather some thoughts about this. Say you've written a patch
> series and are getting ready to send a -v2. If you set
> --in-reply-to=ask, it'll show you a list of emails you've recently sent,
> and their subject lines, and ask you to pick one to use the message ID
> from. It'll set the In-Reply-To header to your selection.

It sounds to me like you mean to call this during `git format-patch` -
that is, `git format-patch -v2 --cover-letter --in-reply-to=ask master..branch
-o branch/`. That should set the In-Reply-To: header on your cover
letter.

There's also the possibility that you mean `git send-email
--in-reply-to=ask branch/v2*` - in which case I imagine the In-Reply-To:
is added as the message is sent, but not added to the cover letter text
file.

> 
> I'd also like to add a custom header, X-Patch-Supersedes: ,
> with a similar behavior & purpose.

Is the hope to store the message ID you choose from --in-reply-to=ask
into the X-Patch-Supersedes: header? I'm not sure I understand what
you're trying to solve; if you use `git format-patch --in-reply-to` it
sounds like the X-Patch-Supersedes: and In-Reply-To: would be redundant.

Is it possible you mean you want (sorry for pseudocode scribblings)
[PATCH v2 1/1]->X-Patch-Supersedes = [PATCH 1/1]->Message-Id ? I think that
wouldn't look good in a threaded mail client?

> 
> Thoughts?

Or maybe I totally misunderstood :)

What I think might be useful (and what I was hoping you were going to
talk about when I saw the subject line) would be if the Message-Id is
conveniently stored during `git send-email` on v1 and somehow saved in a
useful place in order to apply to the In-Reply-To field on v2
automatically upon `git format-patch -v2`. I'll admit I didn't know
about --in-reply-to=ask and that helps with the pain point I've
experienced sending out v2 before.

 - Emily


Re: Proposal: Remembering message IDs sent with git send-email

2019-05-09 Thread Emily Shaffer
On Thu, May 09, 2019 at 12:50:25PM -0400, Drew DeVault wrote:
> On 2019-05-08  5:19 PM, Emily Shaffer wrote:
> > What I think might be useful (and what I was hoping you were going to
> > talk about when I saw the subject line) would be if the Message-Id is
> > conveniently stored during `git send-email` on v1 and somehow saved in a
> > useful place in order to apply to the In-Reply-To field on v2
> > automatically upon `git format-patch -v2`. I'll admit I didn't know
> > about --in-reply-to=ask and that helps with the pain point I've
> > experienced sending out v2 before.
> 
> --in-reply-to=ask doesn't exist, that's what I'm looking to add. This
> convenient storage mechanism is exactly what I'm talking about. Sorry
> for the confusion.

Looking at the documentation, I suppose I hadn't realized before that
--thread will generate a Message-Id for your cover letter. It does seem
like we could teach --thread to check for the previous patch's cover
letter in the directory provided by -o. Of course, this wouldn't work
if the author was generating v2 and didn't have the v1 files available
(i.e. different workstation or different author picking up the set).

I'm still not sure I see the value of the extra header proposed here.
I'd appreciate an explanation of how you think it would be used, Drew.

I don't know much about emailed workflows outside of Git; is this
something likely to be useful to other communities?

 - Emily


Re: [PATCH] git.c: show usage for accessing the git(1) help page

2019-05-14 Thread Emily Shaffer
On Tue, May 14, 2019 at 04:24:50PM +0100, Philip Oakley wrote:
> It is not immediately obvious how to use the `git help` system
> to show the git(1) page, with all its background and ccordinating
> material, such as environment variables.
> 
> Let's simply list it as the last few words of the last usage line.
> 
> Signed-off-by: Philip Oakley 
> ---
> This follows from the discussion 
> <3cd065d1-9db5-f2e6-ddff-aa539746d...@iee.org>
> ---
>  git.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/git.c b/git.c
> index 2324ac0b7e..9a852b09c1 100644
> --- a/git.c
> +++ b/git.c
> @@ -33,7 +33,7 @@ const char git_usage_string[] =
>  const char git_more_info_string[] =
>   N_("'git help -a' and 'git help -g' list available subcommands and 
> some\n"
>  "concept guides. See 'git help ' or 'git help '\n"
> -"to read about a specific subcommand or concept.");
> +"to read about a specific subcommand or concept. Or use 'git help 
> git'.");

I'm not sure the wording makes sense here. It sounds like you're saying,
"Or use 'git help git' to read about specific subcommands or concepts."
which isn't really what I think you're trying to say.

What about, "Or, use 'git help git' for a detailed guide of the Git
system as a whole."

(I'm still not sure that's quite it - since `git help git` mostly
details the flags you can pass to git before invoking a subcommand. But
I'm not sure that `git --help` is the place to say that...)

>  
>  static int use_pager = -1;
>  
> -- 
> 2.21.0.windows.1.1517.gbad5f960a3.dirty
> 


[RFC PATCH] grep: provide sane default to grep_source struct

2019-05-15 Thread Emily Shaffer
grep_buffer creates a struct grep_source gs and calls grep_source()
with it. However, gs.name is null, which eventually produces a
segmentation fault in
grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
not set.

This seems to be unreachable from existing commands but is reachable in
the event that someone rolls their own revision walk and neglects to set
rev_info->grep_filter->status_only. Conceivably, someone might want to
print all changed lines of all commits which match some regex.

Signed-off-by: Emily Shaffer 
---
Hiya,

I ran into this issue while I was working on a tutorial on rolling your
own revision walk. I didn't want to print all the lines associated with
a matching change, but it didn't seem good that a seemingly-sane
grep_filter config was segfaulting.

I don't know if adding a placeholder name is the right answer (hence RFC
patch).

Jonathan Nieder proposed alternatively adding some check to grep_source()
to ensure that if opt->status_only is unset, gs->name must be non-NULL
(and yell about it if not), as well as some extra comments indicating
what assumptions are made about the data coming into functions like
grep_source(). I'm fine with that as well (although I'm not sure it
makes sense semantically to require a name which the user probably can't
easily set, or else ban the user from printing LOC during grep). Mostly
I'm happy with any solution besides a segfault with no error logging :)

Thanks in advance for everyone's thoughts.
Emily


 grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 0d50598acd..fd84454faf 100644
--- a/grep.c
+++ b/grep.c
@@ -2045,7 +2045,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned 
long size)
struct grep_source gs;
int r;
 
-   grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
+   grep_source_init(&gs, GREP_SOURCE_BUF, _("(in memory)"), NULL, NULL);
gs.buf = buf;
gs.size = size;
 
-- 
2.21.0.1020.gf2820cf01a-goog



Re: [RFC PATCH] grep: provide sane default to grep_source struct

2019-05-16 Thread Emily Shaffer
On Wed, May 15, 2019 at 08:11:52PM -0700, Jonathan Nieder wrote:
> Hi,
> 
> Emily Shaffer wrote:
> 
> > grep_buffer creates a struct grep_source gs and calls grep_source()
> > with it. However, gs.name is null, which eventually produces a
> > segmentation fault in
> > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
> > not set.
> 
> Thanks for catching it.  Taking a step back, I think the problem is in
> the definition of "struct grep_source":
> 
>   struct grep_source {
>   char *name;
> 
>   enum grep_source_type {
>   GREP_SOURCE_OID,
>   GREP_SOURCE_FILE,
>   GREP_SOURCE_BUF,
>   } type;
>   void *identifier;
> 
>   ...
>   };
> 
> What is the difference between a 'name' and an 'identifier'?  Who is
> responsible for free()ing them?  Can they be NULL?  This is pretty
> underdocumented for a public type.
> 
> If we take the point of view that 'name' should always be non-NULL,
> then I wonder:
> 
> - can we document that?
> - can grep_source_init enforce that?

Today grep_source_init() defaults to NULL. So if we decide that 'name'
should be non-NULL it will be somewhat changing the intent.

void grep_source_init(struct grep_source *gs, enum grep_source_type 
type,
  const char *name, const char *path,   
 
  const void *identifier)   
 
{   
 
gs->type = type;
 
gs->name = xstrdup_or_null(name); 
...

> - can we take advantage of that in grep_source as well, as a sanity
>   check that the grep_source has been initialized?
> - while we're here, can we describe what the field is used for
>   (prefixing output with context before a ":", I believe)?

In general the documentation for grep.[ch] is pretty light. There aren't
any header comments and `Documentation/technical/api-grep.txt` is a
todo. So I agree that we should document it anywhere we can.

> > Jonathan Nieder proposed alternatively adding some check to grep_source()
> > to ensure that if opt->status_only is unset, gs->name must be non-NULL
> > (and yell about it if not), as well as some extra comments indicating
> > what assumptions are made about the data coming into functions like
> > grep_source(). I'm fine with that as well (although I'm not sure it
> > makes sense semantically to require a name which the user probably can't
> > easily set, or else ban the user from printing LOC during grep). Mostly
> > I'm happy with any solution besides a segfault with no error logging :)
> 
> Let's compare the two possibilities.
> 
> The advantage of "(in memory)" is that it Just Works, which should
> make a nicer development experience with getting a new caller mostly
> working on the way to getting them working just the way you want.
> 
> The disadvantage is that if we start outputting that in production, we
> and static analyzers are less likely to notice.  In other words,
> printing "(in memory)" is leaking details to the end user that do not
> match what the end user asked for.  NULL would instead produce a
> crash, prompting the author of the caller to fix it.
> 
> What was particularly pernicious about this example is that the NULL
> dereference only occurs if the grep has a match.  So I suppose I'm
> leaning toward (in addition to adding some comments to document the
> struct) adding a check like
> 
>   if (!gs->name && !opt->status_only)
>   BUG("grep calls that could print name require name");
> 
> to grep_source.

Why not both? :)

But seriously, I am planning to push a second patch with both, per
Junio's reply.

I'll consider the documentation out of scope for now since I'm not sure
I know enough about grep.[ch]'s intent or history to document anything
yet. :)

> 
> That would also sidestep the question of whether this debugging aid
> should be translated. :)
> 
> Sensible?
> 
> Thanks,
> Jonathan


[PATCH v2] grep: provide sane default to grep_source struct

2019-05-16 Thread Emily Shaffer
grep_buffer creates a struct grep_source gs and calls grep_source()
with it. However, gs.name is null, which eventually produces a
segmentation fault in
grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
not set.

This seems to be unreachable from existing commands but is reachable in
the event that someone rolls their own revision walk and neglects to set
rev_info->grep_filter->status_only. Conceivably, someone might want to
print all changed lines of all commits which match some regex.

To futureproof, give developers a heads-up if grep_source() is called
and a valid name field is expected but not given. This path should be
unreachable now, but in the future may become reachable if developers
want to add the ability to use grep_buffer() in prepared in-core data
and provide hints to the user about where the data came from.

Signed-off-by: Emily Shaffer 
---
I've added the BUG() line to grep_source(). I considered adding it to
grep_source_1() but didn't see much difference. Looks like
grep_source_1() exists purely as a helper to grep_source() and is never
called by anybody else... but it's the function where the crash would
happen. So I'm not sure.

I also modified the wording for the BUG("") statement a little from
jrn's suggestion to hopefully make it more explicit, but I welcome
wordsmithing.

 grep.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 0d50598acd..6ceb880a8c 100644
--- a/grep.c
+++ b/grep.c
@@ -2021,6 +2021,9 @@ static int chk_hit_marker(struct grep_expr *x)
 
 int grep_source(struct grep_opt *opt, struct grep_source *gs)
 {
+   if (!opt->status_only && gs->name == NULL)
+   BUG("grep call which could print a name requires "
+   "grep_source.name be non-NULL");
/*
 * we do not have to do the two-pass grep when we do not check
 * buffer-wide "all-match".
@@ -2045,7 +2048,11 @@ int grep_buffer(struct grep_opt *opt, char *buf, 
unsigned long size)
struct grep_source gs;
int r;
 
-   grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
+   /* TODO: In the future it may become desirable to pass in the name as
+* an argument to grep_buffer(). At that time, "(in core)" should be
+* replaced.
+*/
+   grep_source_init(&gs, GREP_SOURCE_BUF, _("(in core)"), NULL, NULL);
gs.buf = buf;
gs.size = size;
 
-- 
2.21.0.1020.gf2820cf01a-goog



[PATCH v6 0/2] documentation: add tutorial for first contribution

2019-05-17 Thread Emily Shaffer
Only minor changes since v5. Removed 'official' from the blurb about the
Github mirror, and removed some spurious braces around oneline code
branches.

Emily Shaffer (2):
  documentation: add tutorial for first contribution
  documentation: add anchors to MyFirstContribution

 Documentation/Makefile|1 +
 Documentation/MyFirstContribution.txt | 1109 +
 2 files changed, 1110 insertions(+)
 create mode 100644 Documentation/MyFirstContribution.txt

-- 
2.21.0.1020.gf2820cf01a-goog



[PATCH v6 1/2] documentation: add tutorial for first contribution

2019-05-17 Thread Emily Shaffer
This tutorial covers how to add a new command to Git and, in the
process, everything from cloning git/git to getting reviewed on the
mailing list. It's meant for new contributors to go through
interactively, learning the techniques generally used by the git/git
development community.

Reviewed-by: Johannes Schindelin 
Reviewed-by: Jonathan Tan 
Signed-off-by: Emily Shaffer 
---
 Documentation/Makefile|1 +
 Documentation/MyFirstContribution.txt | 1074 +
 2 files changed, 1075 insertions(+)
 create mode 100644 Documentation/MyFirstContribution.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index dbf5a0f276..76f2ecfc1b 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -76,6 +76,7 @@ SP_ARTICLES += howto/maintain-git
 API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt 
technical/api-index.txt, $(wildcard technical/api-*.txt)))
 SP_ARTICLES += $(API_DOCS)
 
+TECH_DOCS += MyFirstContribution
 TECH_DOCS += SubmittingPatches
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
diff --git a/Documentation/MyFirstContribution.txt 
b/Documentation/MyFirstContribution.txt
new file mode 100644
index 00..bc267c4931
--- /dev/null
+++ b/Documentation/MyFirstContribution.txt
@@ -0,0 +1,1074 @@
+My First Contribution to the Git Project
+
+
+== Summary
+
+This is a tutorial demonstrating the end-to-end workflow of creating a change 
to
+the Git tree, sending it for review, and making changes based on comments.
+
+=== Prerequisites
+
+This tutorial assumes you're already fairly familiar with using Git to manage
+source code.  The Git workflow steps will largely remain unexplained.
+
+=== Related Reading
+
+This tutorial aims to summarize the following documents, but the reader may 
find
+useful additional context:
+
+- `Documentation/SubmittingPatches`
+- `Documentation/howto/new-command.txt`
+
+== Getting Started
+
+=== Clone the Git Repository
+
+Git is mirrored in a number of locations. Clone the repository from one of 
them;
+https://git-scm.com/downloads suggests one of the best places to clone from is
+the mirror on GitHub.
+
+
+$ git clone https://github.com/git/git git
+
+
+=== Identify Problem to Solve
+
+
+Use + to indicate fixed-width here; couldn't get ` to work nicely with the
+quotes around "Pony Saying 'Um, Hello'".
+
+In this tutorial, we will add a new command, +git psuh+, short for ``Pony 
Saying
+`Um, Hello''' - a feature which has gone unimplemented despite a high frequency
+of invocation during users' typical daily workflow.
+
+(We've seen some other effort in this space with the implementation of popular
+commands such as `sl`.)
+
+=== Set Up Your Workspace
+
+Let's start by making a development branch to work on our changes. Per
+`Documentation/SubmittingPatches`, since a brand new command is a new feature,
+it's fine to base your work on `master`. However, in the future for bugfixes,
+etc., you should check that document and base it on the appropriate branch.
+
+For the purposes of this document, we will base all our work on the `master`
+branch of the upstream project. Create the `psuh` branch you will use for
+development like so:
+
+
+$ git checkout -b psuh origin/master
+
+
+We'll make a number of commits here in order to demonstrate how to send a topic
+with multiple patches up for review simultaneously.
+
+== Code It Up!
+
+NOTE: A reference implementation can be found at
+https://github.com/nasamuffin/git/tree/psuh.
+
+=== Adding a New Command
+
+Lots of the subcommands are written as builtins, which means they are
+implemented in C and compiled into the main `git` executable. Implementing the
+very simple `psuh` command as a built-in will demonstrate the structure of the
+codebase, the internal API, and the process of working together as a 
contributor
+with the reviewers and maintainer to integrate this change into the system.
+
+Built-in subcommands are typically implemented in a function named "cmd_"
+followed by the name of the subcommand, in a source file named after the
+subcommand and contained within `builtin/`. So it makes sense to implement your
+command in `builtin/psuh.c`. Create that file, and within it, write the entry
+point for your command in a function matching the style and signature:
+
+
+int cmd_psuh(int argc, const char **argv, const char *prefix)
+
+
+We'll also need to add the declaration of psuh; open up `builtin.h`, find the
+declaration for `cmd_push`, and add a new line for `psuh` immediately before 
it,
+in order to keep the declarations sorted:
+
+
+int cmd_psuh(int argc, const char **argv, const char *prefix);
+
+
+Be sure to `#include "builtin.h"` in your `psuh.c`.
+
+Go ahead and add some throwaway printf to that function. This is a decent
+sta

[PATCH v6 2/2] documentation: add anchors to MyFirstContribution

2019-05-17 Thread Emily Shaffer
During the course of review for MyFirstContribution.txt, the suggestion
came up to include anchors to make it easier for veteran contributors to
link specific sections of this documents to newbies. To make life easier
for reviewers, add these anchors in their own commit. See review context
here: https://public-inbox.org/git/20190507195938.gd220...@google.com/

AsciiDoc does not support :sectanchors: and the anchors are not
discoverable, but they are referenceable. So a link to
"foo.com/MyFirstContribution.html#prerequisites" will still work if that
file was generated with AsciiDoc. The inclusion of :sectanchors: does
not create warnings or errors while compiling directly with `asciidoc -b
html5 Documentation/MyFirstContribution.txt` or while compiling with
`make doc`.

AsciiDoctor does support :sectanchors: and displays a paragraph link on
mouseover. When the anchor is included above or inline with a section
(as in this change), the link provided points to the custom ID contained
within [[]] instead of to an autogenerated ID. Practically speaking,
this means we have .../MyFirstContribution.html#summary instead of
.../MyFirstContribution.html#_summary. In addition to being prettier,
the custom IDs also enable anchor linking to work with
asciidoc-generated pages. This change compiles with no warnings using
`asciidoctor -b html5 Documentation/MyFirstContribution.txt`.

Signed-off-by: Emily Shaffer 
---
 Documentation/MyFirstContribution.txt | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/Documentation/MyFirstContribution.txt 
b/Documentation/MyFirstContribution.txt
index bc267c4931..274df8575b 100644
--- a/Documentation/MyFirstContribution.txt
+++ b/Documentation/MyFirstContribution.txt
@@ -1,16 +1,20 @@
 My First Contribution to the Git Project
 
+:sectanchors:
 
+[[summary]]
 == Summary
 
 This is a tutorial demonstrating the end-to-end workflow of creating a change 
to
 the Git tree, sending it for review, and making changes based on comments.
 
+[[prerequisites]]
 === Prerequisites
 
 This tutorial assumes you're already fairly familiar with using Git to manage
 source code.  The Git workflow steps will largely remain unexplained.
 
+[[related-reading]]
 === Related Reading
 
 This tutorial aims to summarize the following documents, but the reader may 
find
@@ -19,8 +23,10 @@ useful additional context:
 - `Documentation/SubmittingPatches`
 - `Documentation/howto/new-command.txt`
 
+[[getting-started]]
 == Getting Started
 
+[[cloning]]
 === Clone the Git Repository
 
 Git is mirrored in a number of locations. Clone the repository from one of 
them;
@@ -31,6 +37,7 @@ the mirror on GitHub.
 $ git clone https://github.com/git/git git
 
 
+[[identify-problem]]
 === Identify Problem to Solve
 
 
@@ -44,6 +51,7 @@ of invocation during users' typical daily workflow.
 (We've seen some other effort in this space with the implementation of popular
 commands such as `sl`.)
 
+[[setup-workspace]]
 === Set Up Your Workspace
 
 Let's start by making a development branch to work on our changes. Per
@@ -62,11 +70,13 @@ $ git checkout -b psuh origin/master
 We'll make a number of commits here in order to demonstrate how to send a topic
 with multiple patches up for review simultaneously.
 
+[[code-it-up]]
 == Code It Up!
 
 NOTE: A reference implementation can be found at
 https://github.com/nasamuffin/git/tree/psuh.
 
+[[add-new-command]]
 === Adding a New Command
 
 Lots of the subcommands are written as builtins, which means they are
@@ -195,6 +205,7 @@ For the remainder of the tutorial, the subject line only 
will be listed for the
 sake of brevity. However, fully-fleshed example commit messages are available
 on the reference implementation linked at the top of this document.
 
+[[implementation]]
 === Implementation
 
 It's probably useful to do at least something besides printing out a string.
@@ -358,6 +369,7 @@ about. Neat! Let's commit that as well.
 $ git commit -sm "psuh: display the top of origin/master"
 
 
+[[add-documentation]]
 === Adding Documentation
 
 Awesome! You've got a fantastic new command that you're ready to share with the
@@ -445,6 +457,7 @@ sees that your command has been implemented as well as 
documented) by running
 
 Go ahead and commit your new documentation change.
 
+[[add-usage]]
 === Adding Usage Text
 
 Try and run `./bin-wrappers/git psuh -h`. Your command should crash at the end.
@@ -501,6 +514,7 @@ your command terminated before anything else interesting 
happens. Great!
 
 Go ahead and commit this one, too.
 
+[[testing]]
 == Testing
 
 It's important to test your code - even for a little toy command like this one.
@@ -515,11 +529,13 @@ So let's write some tests.
 
 Related reading: `t/README`
 
+[[overview-test-structure]]
 === Overview of Testing Structure
 
 The tests in Git live in `t/` and are named with a 4-dig

[PATCH] doc: hint about GIT_DEBUGGER

2019-05-17 Thread Emily Shaffer
We check for a handy environment variable GIT_DEBUGGER when running via
bin-wrappers/, but this feature is undocumented. Add a hint to how to
use it into the CodingGuidelines (which is where other useful
environment settings like DEVELOPER are documented).

It looks like you can use GIT_DEBUGGER to pick gdb by default, or you
can hand it your own debugger if you like to use something else (or if
you want custom flags for gdb). Hopefully document that intent within
CodingGuidelines.

Signed-off-by: Emily Shaffer 
---
Maybe this isn't the right place for this patch. But right now git grep
reveals that GIT_DEBUGGER is completely undocumented.

Alternatively, it might make sense to only add a short blurb about using
GIT_DEBUGGER flag to CodingGuidelines and then documenting how to use it
inside of wrap-for-bin.sh.

 Documentation/CodingGuidelines | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 32210a4386..e17cd75b50 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -412,6 +412,11 @@ For C programs:
must be declared with "extern" in header files. However, function
declarations should not use "extern", as that is already the default.
 
+ - You can launch gdb around your program using the shorthand GIT_DEBUGGER.
+   Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or
+   run `GIT_DEBUGGER=debugger-binary some-args ./bin-wrappers/git foo` to
+   use your own debugger and arguments. (See `wrap-for-bin.sh`.)
+
 For Perl programs:
 
  - Most of the C guidelines above apply.
-- 
2.21.0.1020.gf2820cf01a-goog



Re: [PATCH] doc: hint about GIT_DEBUGGER

2019-05-20 Thread Emily Shaffer
[snip]

> > + - You can launch gdb around your program using the shorthand GIT_DEBUGGER.
> > +   Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or
> > +   run `GIT_DEBUGGER=debugger-binary some-args ./bin-wrappers/git foo` to
> 
> Missing some quotes around debugger-binary and some-args:
> +   run `GIT_DEBUGGER="debugger-binary some-args" ./bin-wrappers/git foo` to
> 
> Also, one thing I always wonder about with command documentation like
> this is whether people will understand that "debugger-binary",
> "some-args", and "foo" are just placeholders rather than literal text
> -- and that everything else is literal text and not meant to be
> placeholders.  Does it make since to include a couple examples, or
> perhaps modify the text somehow to avoid confusion between
> placeholders and literals, or maybe just tell me I overthinking this?
> (I've been bit by similar problems in other contexts, so I'm just
> flagging it for you to consider).

It's a good point. I like to use placeholders that don't sound like a
real command (and failed a little here), for example,
`GIT_DEBUGGER=my-cool-debugger some-various-args`. It can be a challenge
to choose a placeholder that sounds fake but also doesn't sound too
informal (the above feels informal to me). I think the best way is to
show an example, that's a good idea. I'll come up with one and send
another round this week.

Thanks a bunch for having a look.

 - Emily

> 
> Elijah


[PATCH v2] doc: hint about GIT_DEBUGGER

2019-05-20 Thread Emily Shaffer
We check for a handy environment variable GIT_DEBUGGER when running via
bin-wrappers/, but this feature is undocumented. Add a hint to how to
use it into the CodingGuidelines (which is where other useful
environment settings like DEVELOPER are documented).

It looks like you can use GIT_DEBUGGER to pick gdb by default, or you
can hand it your own debugger if you like to use something else (or if
you want custom flags for gdb). Hopefully document that intent within
CodingGuidelines.

Signed-off-by: Emily Shaffer 
---
 Documentation/CodingGuidelines | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 32210a4386..e99af36df9 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -412,6 +412,12 @@ For C programs:
must be declared with "extern" in header files. However, function
declarations should not use "extern", as that is already the default.
 
+ - You can launch gdb around your program using the shorthand GIT_DEBUGGER.
+   Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or
+   run `GIT_DEBUGGER=my-debugger-binary my-args ./bin-wrappers/git foo` to
+   use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb"
+   ./bin-wrappers/git log` (See `wrap-for-bin.sh`.)
+
 For Perl programs:
 
  - Most of the C guidelines above apply.
-- 
2.21.0.1020.gf2820cf01a-goog



Re: [PATCH v2] doc: hint about GIT_DEBUGGER

2019-05-21 Thread Emily Shaffer
On Tue, May 21, 2019 at 09:06:18AM -0700, Elijah Newren wrote:
> On Mon, May 20, 2019 at 6:01 PM Emily Shaffer  wrote:
> >
> > We check for a handy environment variable GIT_DEBUGGER when running via
> > bin-wrappers/, but this feature is undocumented. Add a hint to how to
> > use it into the CodingGuidelines (which is where other useful
> > environment settings like DEVELOPER are documented).
> >
> 
> Two very minor nits:
> 
> > It looks like you can use GIT_DEBUGGER to pick gdb by default, or you
> 
> I think it'd sound better without 'It looks like'; perhaps drop that part?
> 
> > can hand it your own debugger if you like to use something else (or if
> > you want custom flags for gdb). Hopefully document that intent within
> > CodingGuidelines.
> 
> Maybe just leave out 'Hopefully'?
> 
> >
> > Signed-off-by: Emily Shaffer 
> > ---
> >  Documentation/CodingGuidelines | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> > index 32210a4386..e99af36df9 100644
> > --- a/Documentation/CodingGuidelines
> > +++ b/Documentation/CodingGuidelines
> > @@ -412,6 +412,12 @@ For C programs:
> > must be declared with "extern" in header files. However, function
> > declarations should not use "extern", as that is already the default.
> >
> > + - You can launch gdb around your program using the shorthand GIT_DEBUGGER.
> > +   Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or
> > +   run `GIT_DEBUGGER=my-debugger-binary my-args ./bin-wrappers/git foo` to
> > +   use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb"
> > +   ./bin-wrappers/git log` (See `wrap-for-bin.sh`.)
> > +
> 
> Other than the minor nits above:
> Acked-by: Elijah Newren 

Made both the changes you recommended; I'll hang onto it locally until
this afternoon in case anybody else has something to say. Thanks for
looking, Elijah.

 - Emily


Re: [PATCH v2] grep: provide sane default to grep_source struct

2019-05-21 Thread Emily Shaffer
On Thu, May 16, 2019 at 06:02:54PM -0400, Jeff King wrote:
> On Thu, May 16, 2019 at 02:44:44PM -0700, Emily Shaffer wrote:
> 
> > grep_buffer creates a struct grep_source gs and calls grep_source()
> > with it. However, gs.name is null, which eventually produces a
> > segmentation fault in
> > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
> > not set.
> 
> Funny wrapping here.
> 
> > This seems to be unreachable from existing commands but is reachable in
> > the event that someone rolls their own revision walk and neglects to set
> > rev_info->grep_filter->status_only. Conceivably, someone might want to
> > print all changed lines of all commits which match some regex.
> > 
> > To futureproof, give developers a heads-up if grep_source() is called
> > and a valid name field is expected but not given. This path should be
> > unreachable now, but in the future may become reachable if developers
> > want to add the ability to use grep_buffer() in prepared in-core data
> > and provide hints to the user about where the data came from.
> 
> So I guess this is probably my fault, as I was the one who factored out
> the grep_source bits long ago (though I would also not be surprised if
> it could be triggered before, too).
> 
> I think adding a BUG() is the right thing to do.
> 
> > I've added the BUG() line to grep_source(). I considered adding it to
> > grep_source_1() but didn't see much difference. Looks like
> > grep_source_1() exists purely as a helper to grep_source() and is never
> > called by anybody else... but it's the function where the crash would
> > happen. So I'm not sure.
> 
> I agree it probably doesn't matter. I'd probably stick at at the top of
> grep_source_1(), just with the rationale that it could possibly catch
> more cases if somebody ever refactors grep_source() to have two
> different callers (e.g., imagine we later add a variant that takes more
> options, but leave the old one to avoid disrupting existing callers).

I think this combined with needing a history lesson to know not to call
grep_source_1() makes a pretty strong case for moving it. I'll put the
BUG() line at the top of grep_source_1() instead.

> 
> > I also modified the wording for the BUG("") statement a little from
> > jrn's suggestion to hopefully make it more explicit, but I welcome
> > wordsmithing.
> > [...]
> > +   BUG("grep call which could print a name requires "
> > +   "grep_source.name be non-NULL");
> 
> It looks fine to me. The point is that nobody should ever see this. And
> if they do, we should already get a file/line number telling us where
> the problem is (and a coredump to poke at). So as long as it's enough to
> convince a regular user that they should make a report to the mailing
> list, it will have done its job.
> 
> > diff --git a/grep.c b/grep.c
> > index 0d50598acd..6ceb880a8c 100644
> > --- a/grep.c
> > +++ b/grep.c
> > @@ -2021,6 +2021,9 @@ static int chk_hit_marker(struct grep_expr *x)
> >  
> >  int grep_source(struct grep_opt *opt, struct grep_source *gs)
> >  {
> > +   if (!opt->status_only && gs->name == NULL)
> > +   BUG("grep call which could print a name requires "
> > +   "grep_source.name be non-NULL");
> > /*
> >  * we do not have to do the two-pass grep when we do not check
> >  * buffer-wide "all-match".
> 
> Minor nit, but can we put a blank line between the new block and the
> comment?

I made sure to include blank lines surrounding this block in its new
home.

> 
> > @@ -2045,7 +2048,11 @@ int grep_buffer(struct grep_opt *opt, char *buf, 
> > unsigned long size)
> > struct grep_source gs;
> > int r;
> >  
> > -   grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
> > +   /* TODO: In the future it may become desirable to pass in the name as
> > +* an argument to grep_buffer(). At that time, "(in core)" should be
> > +* replaced.
> > +*/
> > +   grep_source_init(&gs, GREP_SOURCE_BUF, _("(in core)"), NULL, NULL);
> 
> Hmm. I don't see much point in this one, as it would just avoid
> triggering our BUG(). If somebody is adding new grep_buffer() callers
> that don't use status_only, wouldn't we want them to see the BUG() to
> know that they need to refactor grep_buffer() to provide a name?

For me, I'd rather have *some* functionality available without doing a
large refactor. The line now isn't 

[PATCH v3] grep: provide sane default to grep_source struct

2019-05-21 Thread Emily Shaffer
grep_buffer creates a struct grep_source gs and calls grep_source()
with it. However, gs.name is null, which eventually produces a
segmentation fault in grep_source()->grep_source_1()->show_line()
when grep_opt.status_only is not set.

This seems to be unreachable from existing commands but is reachable in
the event that someone rolls their own revision walk and neglects to set
rev_info->grep_filter->status_only. Conceivably, someone might want to
print all changed lines of all commits which match some regex.

To futureproof, give developers a heads-up if grep_source() is called
and a valid name field is expected but not given. This path should be
unreachable now, but in the future may become reachable if developers
want to add the ability to use grep_buffer() in prepared in-core data
and provide hints to the user about where the data came from.

Signed-off-by: Emily Shaffer 
---

I think Peff and Jonathan are right. If someone does want to hack away
on something in the very early stages, the BUG() line gives a pretty
clear indicator of what to modify to make it work.

I also moved the BUG() to grep_source_1() to keep it close to the error
itself and reflowed the commit message.

 - Emily

 grep.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/grep.c b/grep.c
index 0d50598acd..f7c3a5803e 100644
--- a/grep.c
+++ b/grep.c
@@ -1780,6 +1780,10 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
enum grep_context ctx = GREP_CONTEXT_HEAD;
xdemitconf_t xecfg;
 
+   if (!opt->status_only && gs->name == NULL)
+   BUG("grep call which could print a name requires "
+   "grep_source.name be non-NULL");
+
if (!opt->output)
opt->output = std_output;
 
-- 
2.21.0.1020.gf2820cf01a-goog



[PATCH v3] doc: hint about GIT_DEBUGGER in CodingGuidelines

2019-05-22 Thread Emily Shaffer
We check for a handy environment variable GIT_DEBUGGER when running via
bin-wrappers/, but this feature is undocumented. Add a hint to how to
use it into the CodingGuidelines (which is where other useful
environment settings like DEVELOPER are documented).

You can use GIT_DEBUGGER to pick gdb by default, or you can hand it your
own debugger if you like to use something else (or if you want custom
flags for gdb). This commit documents that intent within
CodingGuidelines.

Signed-off-by: Emily Shaffer 
---

Since v2, only the commit message has changed. Removed some weak
language to make the commit message read more easily.

 Documentation/CodingGuidelines | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 32210a4386..e99af36df9 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -412,6 +412,12 @@ For C programs:
must be declared with "extern" in header files. However, function
declarations should not use "extern", as that is already the default.
 
+ - You can launch gdb around your program using the shorthand GIT_DEBUGGER.
+   Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or
+   run `GIT_DEBUGGER=my-debugger-binary my-args ./bin-wrappers/git foo` to
+   use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb"
+   ./bin-wrappers/git log` (See `wrap-for-bin.sh`.)
+
 For Perl programs:
 
  - Most of the C guidelines above apply.
-- 
2.21.0.1020.gf2820cf01a-goog



[PATCH v4] grep: fail if call could output and name is null

2019-05-23 Thread Emily Shaffer
grep_source(), which performs much of the work for Git's grep library,
allows passing an arbitrary struct grep_source which represents the text
which grep_source() should search to match a pattern in the provided
struct grep_opt. In most callers, the grep_source::name field is set to
an appropriate prefix to print before a colon when a result matches:

README:Git is an Open Source project covered by the GNU General

One caller, grep_buffer(), leaves the grep_source::name field set to
NULL because there isn't enough context to determine an appropriate name
for this kind of output line. In practice, this has been fine: the only
caller of grep_buffer() is "git log --grep", and that caller sets
grep_opt::status_only, which disables output and only checks whether a
match exists. But this is brittle: a future caller can call
grep_buffer() without grep_opt::status_only set, and as soon as it hits
a match, grep_source() will try to print the match and segfault:

(null):Git is an Open Source project covered by the GNU General

For example, a future caller might want to print all matching lines from
commits which match a regex.

Futureproof by diagnosing early a use of the API that could trigger that
condition, before we know whether the pattern matches:

BUG: grep.c:1783: grep call which could print a name requires
grep_source.name be non-NULL
Aborted

This way, the caller's author gets an indication of how to fix the issue
- by providing grep_source::name or setting grep_opt::status_only - and
they are warned of the potential for a segfault unconditionally, rather
than only if there is a match.

Noticed while adding such a call to a tutorial on revision walks.

Signed-off-by: Emily Shaffer 
Reviewed-by: Jonathan Nieder 
---
Since v3, only the commit message has changed. Reworked based on
Jonathan Nieder's suggestions (with some modifications for readability).

 - Emily

 grep.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/grep.c b/grep.c
index 0d50598acd..f7c3a5803e 100644
--- a/grep.c
+++ b/grep.c
@@ -1780,6 +1780,10 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
enum grep_context ctx = GREP_CONTEXT_HEAD;
xdemitconf_t xecfg;
 
+   if (!opt->status_only && gs->name == NULL)
+   BUG("grep call which could print a name requires "
+   "grep_source.name be non-NULL");
+
if (!opt->output)
opt->output = std_output;
 
-- 
2.22.0.rc1.257.g3120a18244-goog



Re: [PATCH v1 1/5] list-objects-filter: refactor into a context struct

2019-05-23 Thread Emily Shaffer
On Tue, May 21, 2019 at 05:21:50PM -0700, Matthew DeVore wrote:
> The next patch will create and manage filters in a new way, which means
> that this bundle of data will have to be managed at a new callsite. Make
> this bundle of data more manageable by putting it in a struct and
> making it part of the list-objects-filter module's API.

This commit message might read more easily on its own if you define
"this bundle of data" at least once. Since there are things being moved
from both list-objects-filter.c (filter_blobs_none_data) and
list-objects-filter.h (list_objects_filter_result and filter_free_fn)
into the new struct in list-objects-filter.h, it's not immediately clear
to me from the diff what's going on.

[snip]
> -static void *filter_blobs_none__init(
> - struct oidset *omitted,
> +static void filter_blobs_none__init(
>   struct list_objects_filter_options *filter_options,
> - filter_object_fn *filter_fn,
> - filter_free_fn *filter_free_fn)
> + struct filter_context *ctx)
>  {
> - struct filter_blobs_none_data *d = xcalloc(1, sizeof(*d));
> - d->omits = omitted;
> -
> - *filter_fn = filter_blobs_none;
> - *filter_free_fn = free;
> - return d;
> + ctx->filter_fn = filter_blobs_none;

I think you want to set ctx->free_fn here too, right? It seems like
you're not setting ctx->omitted anymore because you'd be reading that
information in from ctx->omitted (so it's redundant).

>  }

[snip]
> -/*
> - * A filter for list-objects to omit large blobs.
> - * And to OPTIONALLY collect a list of the omitted OIDs.
> - */
> +/* A filter for list-objects to omit large blobs. */
>  struct filter_blobs_limit_data {
> - struct oidset *omits;
>   unsigned long max_bytes;

I suppose I don't have a good enough grasp of the architecture here to
follow why you want to move 'omits' but not 'max_bytes' into the new
struct. Maybe it will become clear as I look at the rest of your patches
:)

>  };

Most of this patch looks like a pretty straightforward conversion.
Thanks.

 - Emily


Re: [PATCH v1 2/5] list-objects-filter-options: error is localizeable

2019-05-23 Thread Emily Shaffer
On Tue, May 21, 2019 at 05:21:51PM -0700, Matthew DeVore wrote:
> The "invalid filter-spec" message is user-facing and not a BUG, so make
> it localizeable.

What does it look like? Is it human-readable in its current form? I ask
because (without having looked ahead) I think you're going to move it
from a BUG to a user-facing error in a later patchset (which I don't
think the commit message reflects well). If I'm wrong, and today it's a
possibly user-facing string, then I think this patch could stand on its
own outside of this patchset, which would probably save you some effort
keeping a simple change in limbo for a long time. But if I'm right, I
think the commit message could use some work.

 - Emily

> 
> Signed-off-by: Matthew DeVore 
> ---
>  list-objects-filter-options.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index c0036f7378..e46ea467bc 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -81,21 +81,21 @@ static int gently_parse_list_objects_filter(
>   filter_options->choice = LOFC_SPARSE_PATH;
>   filter_options->sparse_path_value = strdup(v0);
>   return 0;
>   }
>   /*
>* Please update _git_fetch() in git-completion.bash when you
>* add new filters
>*/
>  
>   if (errbuf)
> - strbuf_addf(errbuf, "invalid filter-spec '%s'", arg);
> + strbuf_addf(errbuf, _("invalid filter-spec '%s'"), arg);
>  
>   memset(filter_options, 0, sizeof(*filter_options));
>   return 1;
>  }
>  
>  int parse_list_objects_filter(struct list_objects_filter_options 
> *filter_options,
> const char *arg)
>  {
>   struct strbuf buf = STRBUF_INIT;
>   if (gently_parse_list_objects_filter(filter_options, arg, &buf))
> -- 
> 2.21.0
> 


Re: [PATCH v3] doc: hint about GIT_DEBUGGER in CodingGuidelines

2019-05-28 Thread Emily Shaffer
On Thu, May 23, 2019 at 06:09:17AM -0400, Eric Sunshine wrote:
> On Wed, May 22, 2019 at 8:56 PM Emily Shaffer  wrote:
> > We check for a handy environment variable GIT_DEBUGGER when running via
> > bin-wrappers/, but this feature is undocumented. Add a hint to how to
> > use it into the CodingGuidelines (which is where other useful
> > environment settings like DEVELOPER are documented).
> >
> > You can use GIT_DEBUGGER to pick gdb by default, or you can hand it your
> > own debugger if you like to use something else (or if you want custom
> > flags for gdb). This commit documents that intent within
> > CodingGuidelines.
> 
> This last sentence is repeating what is already stated in the first
> paragraph, thus doesn't seem to add value. In fact, the remainder of
> the second paragraph seems to be repeating what is in the patch
> proper, thus could likely be dropped.

Yes, you're right. Dropped.

> 
> > Signed-off-by: Emily Shaffer 
> > ---
> > diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> > @@ -412,6 +412,12 @@ For C programs:
> > + - You can launch gdb around your program using the shorthand GIT_DEBUGGER.
> > +   Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or
> > +   run `GIT_DEBUGGER=my-debugger-binary my-args ./bin-wrappers/git foo` to
> 
> Don't you need to bind my-debugger-binary and my-args together with
> shell quotes? Also, placeholders like these are often ensconced in
> angle brackets, so perhaps:
> 
> ... `GIT_DEBUGGER=" " ./bin-wrappers/git ...

Fixed. I did get the dq in the example but missed it with the
placeholders. :) Thanks, good catch.

> > +   use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb"
> > +   ./bin-wrappers/git log` (See `wrap-for-bin.sh`.)

Patch update to follow. Thanks.

 - Emily


[PATCH v4] doc: hint about GIT_DEBUGGER in CodingGuidelines

2019-05-28 Thread Emily Shaffer
We check for a handy environment variable GIT_DEBUGGER when running via
bin-wrappers/, but this feature is undocumented. Add a hint to how to
use it into the CodingGuidelines (which is where other useful
environment settings like DEVELOPER are documented).

Signed-off-by: Emily Shaffer 
---
 Documentation/CodingGuidelines | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 32210a4386..1169ff6c8e 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -412,6 +412,12 @@ For C programs:
must be declared with "extern" in header files. However, function
declarations should not use "extern", as that is already the default.
 
+ - You can launch gdb around your program using the shorthand GIT_DEBUGGER.
+   Run `GIT_DEBUGGER=1 ./bin-wrappers/git foo` to simply use gdb as is, or
+   run `GIT_DEBUGGER=" " ./bin-wrappers/git foo` to
+   use your own debugger and arguments. Example: `GIT_DEBUGGER="ddd --gdb"
+   ./bin-wrappers/git log` (See `wrap-for-bin.sh`.)
+
 For Perl programs:
 
  - Most of the C guidelines above apply.
-- 
2.22.0.rc1.257.g3120a18244-goog



Re: [PATCH v1 3/5] list-objects-filter: implement composite filters

2019-05-28 Thread Emily Shaffer
On Tue, May 21, 2019 at 05:21:52PM -0700, Matthew DeVore wrote:
> Allow combining filters such that only objects accepted by all filters
> are shown. The motivation for this is to allow getting directory
> listings without also fetching blobs. This can be done by combining
> blob:none with tree:. There are massive repositories that have
> larger-than-expected trees - even if you include only a single commit.
> 
> The current usage requires passing the filter to rev-list, or sending
> it over the wire, as:
> 
>   combine:+
> 
> (i.e.: git rev-list --filter=combine:tree:2+blob:limit=32k). This is
> potentially awkward because individual filters must be URL-encoded if
> they contain + or %. This can potentially be improved by supporting a
> repeated flag syntax, e.g.:
> 
>   $ git rev-list --filter=tree:2 --filter=blob:limit=32k
> 
> Such usage is currently an error, so giving it a meaning is backwards-
> compatible.
> 
> Signed-off-by: Matthew DeVore 
> ---
>  Documentation/rev-list-options.txt |  12 ++
>  contrib/completion/git-completion.bash |   2 +-
>  list-objects-filter-options.c  | 161 -
>  list-objects-filter-options.h  |  14 ++-
>  list-objects-filter.c  | 114 +
>  t/t6112-rev-list-filters-objects.sh| 159 +++-
>  6 files changed, 455 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/rev-list-options.txt 
> b/Documentation/rev-list-options.txt
> index ddbc1de43f..4fb0c4fbb0 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -730,20 +730,32 @@ specification contained in .
>  +
>  The form '--filter=tree:' omits all blobs and trees whose depth
>  from the root tree is >=  (minimum depth if an object is located
>  at multiple depths in the commits traversed). =0 will not include
>  any trees or blobs unless included explicitly in the command-line (or
>  standard input when --stdin is used). =1 will include only the
>  tree and blobs which are referenced directly by a commit reachable from
>   or an explicitly-given object. =2 is like =1
>  while also including trees and blobs one more level removed from an
>  explicitly-given commit or tree.
> ++
> +The form '--filter=combine:++...' combines
> +several filters. Only objects which are accepted by every filter are
> +included. Filters are joined by '{plus}' and individual filters are %-encoded
> +(i.e. URL-encoded). Besides the '{plus}' and '%' characters, the following
> +characters are reserved and also must be encoded:
> +`~!@#$^&*()[]{}\;",<>?`+'`+ as well as all characters with ASCII code
> +<= `0x20`, which includes space and newline.
> ++
> +Other arbitrary characters can also be encoded. For instance,
> +'combine:tree:3+blob:none' and 'combine:tree%3A2+blob%3Anone' are
> +equivalent.
>  
>  --no-filter::
>   Turn off any previous `--filter=` argument.
>  
>  --filter-print-omitted::
>   Only useful with `--filter=`; prints a list of the objects omitted
>   by the filter.  Object IDs are prefixed with a ``~'' character.
>  
>  --missing=::
>   A debug option to help with future "partial clone" development.
> diff --git a/contrib/completion/git-completion.bash 
> b/contrib/completion/git-completion.bash
> index 3eefbabdb1..0fd0a10d0c 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1529,21 +1529,21 @@ _git_difftool ()
>  __git_fetch_recurse_submodules="yes on-demand no"
>  
>  _git_fetch ()
>  {
>   case "$cur" in
>   --recurse-submodules=*)
>   __gitcomp "$__git_fetch_recurse_submodules" "" 
> "${cur##--recurse-submodules=}"
>   return
>   ;;
>   --filter=*)
> - __gitcomp "blob:none blob:limit= sparse:oid= sparse:path=" "" 
> "${cur##--filter=}"
> + __gitcomp "blob:none blob:limit= sparse:oid= sparse:path= 
> combine: tree:" "" "${cur##--filter=}"
>   return
>   ;;
>   --*)
>   __gitcomp_builtin fetch
>   return
>   ;;
>   esac
>   __git_complete_remote_or_refspec
>  }
>  
> diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
> index e46ea467bc..d7a1516188 100644
> --- a/list-objects-filter-options.c
> +++ b/list-objects-filter-options.c
> @@ -1,19 +1,24 @@
>  #include "cache.h"
>  #include "commit.h"
>  #include "config.h"
>  #include "revision.h"
>  #include "argv-array.h"
>  #include "list-objects.h"
>  #include "list-objects-filter.h"
>  #include "list-objects-filter-options.h"
>  
> +static int parse_combine_filter(
> + struct list_objects_filter_options *filter_options,
> + const char *arg,
> + struct strbuf *errbuf);
> +
>  /*
>   * Parse value of the argument to the "filter" keyword.
>   * On the command line this looks like:
>   *   --filter=
>   * and in the pack protocol as:
>   *   "filter" SP 
>   *
>   * The filter

Re: [PATCH v6 1/2] documentation: add tutorial for first contribution

2019-05-29 Thread Emily Shaffer
On Sun, May 26, 2019 at 09:48:26AM +0200, Christian Couder wrote:
> On Fri, May 17, 2019 at 11:48 PM Emily Shaffer  
> wrote:
> >
> > This tutorial covers how to add a new command to Git and, in the
> > process, everything from cloning git/git to getting reviewed on the
> > mailing list. It's meant for new contributors to go through
> > interactively, learning the techniques generally used by the git/git
> > development community.
> 
> Very nice, thanks! I tried it and I liked it very much.
> 
> I noted a few nits that might help improve it a bit.
> 
> > +
> > +$ git clone https://github.com/git/git git
> 
> Nit: maybe add "$ cd git" after that.

Sure, done.

> 
> > +Check it out! You've got a command! Nice work! Let's commit this.
> > +
> > +
> > +$ git add Makefile builtin.h builtin/psuh.c git.c
> > +$ git commit -s
> > +
> 
> Nit: when building a "git-psuh" binary is created at the root of the
> repo which will pollute the `git status` output. The usual way we deal
> with that is by adding "/git-psuh" to the ".gitignore" at the root of
> the repo.

Right you are - good catch. I'll add a paragraph about adding to the
gitignore.
> 
> > +=== Implementation
> > +
> > +It's probably useful to do at least something besides printing out a 
> > string.
> > +Let's start by having a look at everything we get.
> > +
> > +Modify your `cmd_psuh` implementation to dump the args you're passed:
> 
> Nit: maybe it could be a bit more clear that the previous printf()
> call should be kept as is, otherwise the test added later will fail.

Done.

> 
> > +
> > +   const char *cfg_name;
> > +
> > +...
> > +
> > +   git_config(git_default_config, NULL)
> 
> Nit: a ";" is missing at the end of the above line.

Yikes, done.

> 
> > +Let's commit this as well.
> > +
> > +
> > +$ git commit -sm "psuh: print the current branch"
> 
> Nit: maybe add "builtin/psuh.c" at the end of the above line, so that
> a `git add builtin/psuh.c` is not needed.

This is purely personal preference, but I prefer manually adding files
first. I didn't add any indication about staging the changes to psuh.c
though, so I'm adding a line to `git add builtin/psuh.c`. I found one
other place where the commit line was shown without the add line, so I
included the add there too.

> 
> > +
> > +git-psuh(1)
> > +===
> > +
> > +NAME
> > +
> > +git-psuh - Delight users' typo with a shy horse
> > +
> > +
> > +SYNOPSIS
> > +
> > +[verse]
> > +'git-psuh'
> > +
> > +DESCRIPTION
> > +---
> > +...
> > +
> > +OPTIONS[[OPTIONS]]
> > +--
> > +...
> > +
> > +OUTPUT
> > +--
> > +...
> > +
> > +
> 
> Nit: it seems that the above newline could be removed.


Sure, why not.

> 
> Thanks,
> Christian.

Thanks for trying it out and for your thorough review, Christian. I
appreciate it! Since this is checked into next already, I'll be sending
a follow-on patch in reply to my last version in this thread which is
based on the tip of next.

 - Emily


[PATCH] doc: add some nit fixes to MyFirstContribution

2019-05-29 Thread Emily Shaffer
A trial run-through of the tutorial revealed a few typos and missing
commands in the tutorial itself. This commit fixes typos, clarifies
which lines to keep or modify in some places, and adds a section on
putting the git-psuh binary into the gitignore.

Signed-off-by: Emily Shaffer 
---
This patch is based on next, as the doc hasn't made it to master yet.

- Missing `cd git` after cloning the repo
- Documented how to add git-psuh to the gitignore
- Documented the need to leave prior printfs in place during the
  tutorial
- Typos: missing ;, stray newline
- Missing `git add builtin/psuh.c` in a couple of places; this could
  also have been done by adding the filename to the end of the commit
  call, but I don't think that's a good habit (as opposed to staging all
  changes, inspecting the wt state, and then committing). Open for
  debate.

Big thanks to Christian for the trial run and review.

 - Emily

 Documentation/MyFirstContribution.txt | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/Documentation/MyFirstContribution.txt 
b/Documentation/MyFirstContribution.txt
index 274df8575b..895b7cfd4f 100644
--- a/Documentation/MyFirstContribution.txt
+++ b/Documentation/MyFirstContribution.txt
@@ -35,6 +35,7 @@ the mirror on GitHub.
 
 
 $ git clone https://github.com/git/git git
+$ cd git
 
 
 [[identify-problem]]
@@ -164,8 +165,28 @@ $ ./bin-wrappers/git psuh
 
 Check it out! You've got a command! Nice work! Let's commit this.
 
+`git status` reveals modified `Makefile`, `builtin.h`, and `git.c` as well as
+untracked `builtin/psuh.c` and `git-psuh`. First, let's take care of the 
binary,
+which should be ignored. Open `.gitignore` in your editor, find `/git-push`, 
and
+add an entry for your new command in alphabetical order:
+
+
+...
+/git-prune-packed
+/git-psuh
+/git-pull
+/git-push
+/git-quiltimport
+/git-range-diff
+...
+
+
+Checking `git status` again should show that `git-psuh` has been removed from
+the untracked list and `.gitignore` has been added to the modified list. Now we
+can stage and commit:
+
 
-$ git add Makefile builtin.h builtin/psuh.c git.c
+$ git add Makefile builtin.h builtin/psuh.c git.c .gitignore
 $ git commit -s
 
 
@@ -211,7 +232,8 @@ on the reference implementation linked at the top of this 
document.
 It's probably useful to do at least something besides printing out a string.
 Let's start by having a look at everything we get.
 
-Modify your `cmd_psuh` implementation to dump the args you're passed:
+Modify your `cmd_psuh` implementation to dump the args you're passed, keeping
+existing `printf()` calls in place:
 
 
int i;
@@ -243,7 +265,7 @@ function body:
 
 ...
 
-   git_config(git_default_config, NULL)
+   git_config(git_default_config, NULL);
if (git_config_get_string_const("user.name", &cfg_name) > 0)
printf(_("No name is found in config\n"));
else
@@ -315,6 +337,7 @@ Run it again. Check it out - here's the (verbose) name of 
your current branch!
 Let's commit this as well.
 
 
+$ git add builtin/psuh.c
 $ git commit -sm "psuh: print the current branch"
 
 
@@ -366,6 +389,7 @@ see the subject line of the most recent commit in 
`origin/master` that you know
 about. Neat! Let's commit that as well.
 
 
+$ git add builtin/psuh.c
 $ git commit -sm "psuh: display the top of origin/master"
 
 
@@ -418,7 +442,6 @@ OUTPUT
 --
 ...
 
-
 GIT
 ---
 Part of the linkgit:git[1] suite
-- 
2.22.0.rc1.257.g3120a18244-goog



Is --filter-print-omitted correct/used/needed?

2019-06-06 Thread Emily Shaffer
Yesterday while down a rabbit hole, I discovered an interesting thing
about the 'omitted' object in many filters in
list-objects-filter-options.h.  It appears that when we call
list-objects.h:traverse_commit_list_filtered() with a non-NULL
'omitted' argument, we still perform a walk of all objects - that is,
the filtered walk is no more efficient than the unfiltered walk from
the same place via 'traverse_commit_list()'. I verified by calling
each and counting the objects:

161 if (0) {
162 /* Unfiltered: */
163 printf(_("Unfiltered object walk.\n"));
164 traverse_commit_list(rev, walken_show_commit,
165 walken_show_object, NULL);
166 } else {
167 printf(_("Filtered object walk with filterspec
'tree:1'.\n"));
168 /*
169  * We can parse a tree depth of 1 to demonstrate
the kind of
170  * filtering that could occur eg during shallow
cloning.
171  */
172 parse_list_objects_filter(&filter_options,
"tree:1");
173
174 traverse_commit_list_filtered(&filter_options,
rev,
175 walken_show_commit, walken_show_object,
NULL, &omitted);
176 }
177
178 /* Count the omitted objects. */
179 oidset_iter_init(&omitted, &oit);
180
181 while ((oid = oidset_iter_next(&oit)))
182 omitted_count++;
183
184 printf(_("Object walk completed. Found %d commits, %d
blobs, %d tags, "
185"and %d trees; %d omitted objects.\n"),
commit_count,
186blob_count, tag_count, tree_count, omitted_count);

I found that omitted_count was always equal to the difference between
sum(blob_count, tag_count, tree_count, commit_count) in the unfiltered
and filtered walks. I also found that the length of time required to
perform the unfiltered walk and the filtered-with-non-NULL-omitted
walk was the same, while the time required to perform the filtered
walk with NULL omitted was significantly shorter. (The walk in
question was over the latest release of Git master, plus the ten or so
commits in my feature branch.)

I was surprised! I figured that with filter "tree:1" that "omitted"
would contain only the objects on the "border" of the filter - that
is, I assumed it would contain the blobs and trees in the root git
dir, but none of those trees' blobs and trees. After talking with
jrnieder at length, it sounds like neither of us were clear on why
this "omitted" list would be needed beyond the initial development
stage of a new filter... Jonathan's impression was also that if we do
need the "omitted" list, it may be a bug that we're still traversing
objects which are only reachable from objects already omitted.

I grepped the Git source and found that we only provide a non-NULL
"omitted" when someone calls "git rev-list --filter-print-omitted",
which we verify with a simple test case for "blobs:none", in which
case the "border" objects which were omitted must be the same as all
objects which were omitted (since blobs aren't pointing to anything
else). I think if we had written a similar test case with some trees
we expect to omit we might have noticed sooner.

Since I was already in the rabbit hole, out of curiosity I did a
search on Github and only found one project referencing
--filter-print-omitted which wasn't a mirror of Git:
https://github.com/search?l=Python&q=%22--filter-print-omitted%22&type=Code

So, what do we use --filter-print-omitted for? Is anybody needing it?
Or do we just use it to verify this one test case? Should we fix it,
or get rid of it, or neither?

 - Emily


[PATCH] documentation: add tutorial for revision walking

2019-06-06 Thread Emily Shaffer
Existing documentation on revision walks seems to be primarily intended
as a reference for those already familiar with the procedure. This
tutorial attempts to give an entry-level guide to a couple of bare-bones
revision walks so that new Git contributors can learn the concepts
without having to wade through options parsing or special casing.

The target audience is a Git contributor who is just getting started
with the concept of revision walking. The goal is to prepare this
contributor to be able to understand and modify existing commands which
perform revision walks more easily, although it will also prepare
contributors to create new commands which perform walks.

The tutorial covers a basic overview of the structs involved during
revision walk, setting up a basic commit walk, setting up a basic
all-object walk, and adding some configuration changes to both walk
types. It intentionally does not cover how to create new commands or
search for options from the command line or gitconfigs.

There is an associated patchset at
https://github.com/nasamuffin/git/tree/revwalk that contains a reference
implementation of the code generated by this tutorial.

Signed-off-by: Emily Shaffer 
---

This one is longer than the MyFirstContribution one, thanks in advance
to anybody with the wherewithal to review this.

I'll also be mailing an RFC patchset In-Reply-To this message; the RFC
patchset should not be merged to Git, as I intend to host it in my own
mirror as an example. I hosted a similar example for the
MyFirstContribution tutorial; it's visible at
https://github.com/nasamuffin/git/tree/psuh. There might be a better
place to host these so I don't "own" them but I'm not sure what it is;
keeping them as a live branch somewhere struck me as an okay way to keep
them from getting stale.

Looking forward to hearing everyone's comments!
 - Emily

 Documentation/.gitignore |   1 +
 Documentation/Makefile   |   1 +
 Documentation/MyFirstRevWalk.txt | 826 +++
 3 files changed, 828 insertions(+)
 create mode 100644 Documentation/MyFirstRevWalk.txt

diff --git a/Documentation/.gitignore b/Documentation/.gitignore
index 9022d48355..0e3df737c5 100644
--- a/Documentation/.gitignore
+++ b/Documentation/.gitignore
@@ -12,6 +12,7 @@ cmds-*.txt
 mergetools-*.txt
 manpage-base-url.xsl
 SubmittingPatches.txt
+MyFirstRevWalk.txt
 tmp-doc-diff/
 GIT-ASCIIDOCFLAGS
 /GIT-EXCLUDED-PROGRAMS
diff --git a/Documentation/Makefile b/Documentation/Makefile
index dbf5a0f276..d57b80962f 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -77,6 +77,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out 
technical/api-index-skel.txt technica
 SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += MyFirstRevWalk
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt
new file mode 100644
index 00..494c09d1fa
--- /dev/null
+++ b/Documentation/MyFirstRevWalk.txt
@@ -0,0 +1,826 @@
+My First Revision Walk
+==
+
+== What's a Revision Walk?
+
+The revision walk is a key concept in Git - this is the process that underpins
+operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, 
the
+list of objects is found by walking parent relationships between objects. The
+revision walk can also be usedto determine whether or not a given object is
+reachable from the current HEAD pointer.
+
+=== Related Reading
+
+- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
+  the revision walker in its various incarnations.
+- `Documentation/technical/api-revision-walking.txt`
+- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer 
Scientists]
+  gives a good overview of the types of objects in Git and what your revision
+  walk is really describing.
+
+== Setting Up
+
+Create a new branch from `master`.
+
+
+git checkout -b revwalk origin/master
+
+
+We'll put our fiddling into a new command. For fun, let's name it `git walken`.
+Open up a new file `builtin/walken.c` and set up the command handler:
+
+
+/*
+ * "git walken"
+ *
+ * Part of the "My First Revision Walk" tutorial.
+ */
+
+#include 
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+printf(_("cmd_walken incoming...\n"));
+return 0;
+}
+
+
+Add usage text and `-h` handling, in order to pass the test suite:
+
+
+static const char * const walken_usage[] = {
+   N_("git walken"),
+   NULL,
+}
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+   struct option options[] = {
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options, walken_usage, 0

[RFC PATCH 01/13] walken: add infrastructure for revwalk demo

2019-06-06 Thread Emily Shaffer
Begin to add scaffolding for `git walken`, a toy command which we will
teach to perform a number of revision walks, in order to demonstrate the
mechanics of revision walking for developers new to the Git project.

This commit is the beginning of an educational series which correspond
to the tutorial in Documentation/MyFirstRevWalk.txt.

Signed-off-by: Emily Shaffer 
---
 Makefile |  1 +
 builtin.h|  1 +
 builtin/walken.c | 14 ++
 3 files changed, 16 insertions(+)
 create mode 100644 builtin/walken.c

diff --git a/Makefile b/Makefile
index 8a7e235352..a25d46c7a3 100644
--- a/Makefile
+++ b/Makefile
@@ -1143,6 +1143,7 @@ BUILTIN_OBJS += builtin/var.o
 BUILTIN_OBJS += builtin/verify-commit.o
 BUILTIN_OBJS += builtin/verify-pack.o
 BUILTIN_OBJS += builtin/verify-tag.o
+BUILTIN_OBJS += builtin/walken.o
 BUILTIN_OBJS += builtin/worktree.o
 BUILTIN_OBJS += builtin/write-tree.o
 
diff --git a/builtin.h b/builtin.h
index ec7e0954c4..c919736c36 100644
--- a/builtin.h
+++ b/builtin.h
@@ -242,6 +242,7 @@ int cmd_var(int argc, const char **argv, const char 
*prefix);
 int cmd_verify_commit(int argc, const char **argv, const char *prefix);
 int cmd_verify_tag(int argc, const char **argv, const char *prefix);
 int cmd_version(int argc, const char **argv, const char *prefix);
+int cmd_walken(int argc, const char **argv, const char *prefix);
 int cmd_whatchanged(int argc, const char **argv, const char *prefix);
 int cmd_worktree(int argc, const char **argv, const char *prefix);
 int cmd_write_tree(int argc, const char **argv, const char *prefix);
diff --git a/builtin/walken.c b/builtin/walken.c
new file mode 100644
index 00..bfeaa5188d
--- /dev/null
+++ b/builtin/walken.c
@@ -0,0 +1,14 @@
+/*
+ * "git walken"
+ *
+ * Part of the "My First Revision Walk" tutorial.
+ */
+
+#include 
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+   printf(_("cmd_walken incoming...\n"));
+   return 0;
+}
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 00/13] example implementation of revwalk tutorial

2019-06-06 Thread Emily Shaffer
This patchset is NOT intended to be merged to the Git project!

This patchset should indicate what a contributor would generate by following the
MyFirstRevWalk tutorial.

I intend to push a feature branch with these patches to my own mirror of
Git on Github (github.com/nasamuffin/git/tree/revwalk). I'm sending them
for review by the list to check for consistency with the Git codebase,
so they aren't a bad example for new contributors.

Thanks for any reviews, all!
 - Emily

Emily Shaffer (13):
  walken: add infrastructure for revwalk demo
  walken: add usage to enable -h
  walken: add placeholder to initialize defaults
  walken: add handler to git_config
  walken: configure rev_info and prepare for walk
  walken: perform our basic revision walk
  walken: filter for authors from gmail address
  walken: demonstrate various topographical sorts
  walken: demonstrate reversing a revision walk list
  walken: add unfiltered object walk from HEAD
  walken: add filtered object walk
  walken: count omitted objects
  walken: reverse the object walk order

 Makefile |   1 +
 builtin.h|   1 +
 builtin/walken.c | 263 +++
 git.c|   1 +
 4 files changed, 266 insertions(+)
 create mode 100644 builtin/walken.c

-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 05/13] walken: configure rev_info and prepare for walk

2019-06-06 Thread Emily Shaffer
`struct rev_info` is what's used by the struct itself.
`repo_init_revisions()` initializes the struct; then we need to set it
up for the walk we want to perform, which is done in
`final_rev_info_setup()`.

The most important step here is adding the first object we want to walk
to the pending array. Here, we take the easy road and use
`add_head_to_pending()`; there is also a way to do it with
`setup_revision_opt()` and `setup_revisions()` which we demonstrate but
do not use. If we were to forget this step, the walk would do nothing -
the pending queue would be checked, determined to be empty, and the walk
would terminate immediately.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 5d1666a5da..c101db38c7 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -6,6 +6,7 @@
 
 #include 
 #include "builtin.h"
+#include "revision.h"
 #include "config.h"
 #include "parse-options.h"
 
@@ -26,6 +27,35 @@ static void init_walken_defaults(void)
 */
 }
 
+/*
+ * cmd_log calls a second set of init after the repo_init_revisions call. We'll
+ * mirror those settings in post_repo_init_init.
+ */
+static void final_rev_info_setup(int argc, const char **argv, const char 
*prefix,
+   struct rev_info *rev)
+{
+   struct setup_revision_opt opt;
+
+   /* setup_revision_opt is used to pass options to the setup_revisions()
+* call. It's got some special items for submodules and other types of
+* optimizations, but for now, we'll just point it to HEAD and call it
+* good. First we should make sure to reset it. TODO: This is useful for
+* more complicated stuff revisions, but a decent shortcut for the first
+* pass is add_head_to_pending().
+*/
+   memset(&opt, 0, sizeof(opt));
+   opt.def = "HEAD";
+   opt.revarg_opt = REVARG_COMMITTISH;
+   //setup_revisions(argc, argv, rev, &opt);
+
+   /* Let's force oneline format. */
+   get_commit_format("oneline", rev);
+   rev->verbose_header = 1;
+   
+   /* add the HEAD to pending so we can start */
+   add_head_to_pending(rev);
+}
+
 /*
  * This method will be called back by git_config(). It is used to gather values
  * from the configuration files available to Git.
@@ -54,12 +84,24 @@ int cmd_walken(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
 
+   struct rev_info rev;
+
argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
 
init_walken_defaults();
 
git_config(git_walken_config, NULL);
 
+   /* Time to set up the walk. repo_init_revisions sets up rev_info with
+* the defaults, but then you need to make some configuration settings
+* to make it do what's special about your walk.
+*/
+   repo_init_revisions(the_repository, &rev, prefix);
+
+   /* Before we do the walk, we need to set a starting point. It's not
+* coming from opt. */
+   final_rev_info_setup(argc, argv, prefix, &rev);
+
printf(_("cmd_walken incoming...\n"));
return 0;
 }
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 04/13] walken: add handler to git_config

2019-06-06 Thread Emily Shaffer
For now, we have no configuration options we want to set up for
ourselves, but in the future we may need to. At the very least, we
should invoke git_default_config() for each config option; we will do so
inside of a skeleton config callback so that we know where to add
configuration handling later on when we need it.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index dcee906556..5d1666a5da 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -6,6 +6,7 @@
 
 #include 
 #include "builtin.h"
+#include "config.h"
 #include "parse-options.h"
 
 static const char * const walken_usage[] = {
@@ -25,6 +26,28 @@ static void init_walken_defaults(void)
 */
 }
 
+/*
+ * This method will be called back by git_config(). It is used to gather values
+ * from the configuration files available to Git.
+ *
+ * Each time git_config() finds a configuration file entry, it calls this
+ * callback. Then, this function should compare it to entries which concern us,
+ * and make settings changes as necessary.
+ *
+ * If we are called with a config setting we care about, we should use one of
+ * the helpers which exist in config.h to pull out the value for ourselves, 
i.e.
+ * git_config_string(...) or git_config_bool(...).
+ *
+ * If we don't match anything, we should pass it along to another stakeholder
+ * who may otherwise care - in log's case, grep, gpg, and diff-ui. For our 
case,
+ * we'll ignore everybody else.
+ */
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+   /* For now, let's not bother with anything. */
+   return git_default_config(var, value, cb);
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
struct option options[] = {
@@ -34,6 +57,9 @@ int cmd_walken(int argc, const char **argv, const char 
*prefix)
argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
 
init_walken_defaults();
+
+   git_config(git_walken_config, NULL);
+
printf(_("cmd_walken incoming...\n"));
return 0;
 }
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 08/13] walken: demonstrate various topographical sorts

2019-06-06 Thread Emily Shaffer
Order the revision walk by author or commit dates, to demonstrate how to
apply topo_sort to a revision walk.

While following the tutorial, new contributors are guided to run a walk
with each sort and compare the results.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 6c0f4e7b7a..716d31f04e 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -61,6 +61,11 @@ static void final_rev_info_setup(int argc, const char 
**argv, const char *prefix

/* add the HEAD to pending so we can start */
add_head_to_pending(rev);
+   
+   /* Let's play with the sort order. */
+   rev->topo_order = 1;
+   rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+   /* rev->sort_order = REV_SORT_BY_AUTHOR_DATE; */
 }
 
 /*
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 13/13] walken: reverse the object walk order

2019-06-06 Thread Emily Shaffer
Demonstrate that just like commit walks, object walks can have their
order reversed. Additionally, add verbose logging of objects encountered
in order to let contributors prove to themselves that the walk has
actually been reversed. With this commit, `git walken` becomes extremely
chatty - it's recommended to pipe the output through `head` or `tail` or
to redirect it into a file.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index d93725ee88..4bfee3a2d7 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -102,11 +102,13 @@ static int git_walken_config(const char *var, const char 
*value, void *cb)
 
 static void walken_show_commit(struct commit *cmt, void *buf)
 {
+   printf(_("commit: %s\n"), oid_to_hex(&cmt->object.oid));
commit_count++;
 }
 
 static void walken_show_object(struct object *obj, const char *str, void *buf)
 {
+   printf(_("%s: %s\n"), type_name(obj->type), oid_to_hex(&obj->oid));
switch (obj->type) {
case OBJ_TREE:
tree_count++;
@@ -149,6 +151,7 @@ static int walken_object_walk(struct rev_info *rev)
rev->tag_objects = 1;
rev->tree_blobs_in_commit_order = 1;
rev->exclude_promisor_objects = 1;
+   rev->reverse = 1;
 
if (prepare_revision_walk(rev))
die(_("revision walk setup failed"));
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 09/13] walken: demonstrate reversing a revision walk list

2019-06-06 Thread Emily Shaffer
The final installment in the tutorial about sorting revision walk
outputs. This commit reverses the commit list, so that we see newer
commits last (handy since we aren't using a pager).

It's important to note that rev->reverse needs to be set after
add_head_to_pending() or before setup_revisions(). (This is mentioned in
the accompanying tutorial.)

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 716d31f04e..86c8d29c48 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -61,6 +61,9 @@ static void final_rev_info_setup(int argc, const char **argv, 
const char *prefix

/* add the HEAD to pending so we can start */
add_head_to_pending(rev);
+
+   /* Reverse the order */
+   rev->reverse = 1;

/* Let's play with the sort order. */
rev->topo_order = 1;
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 06/13] walken: perform our basic revision walk

2019-06-06 Thread Emily Shaffer
Add the final steps needed and implement the walk loop itself. We add a
method walken_commit_walk() which performs the final setup to revision.c
and then iterates over commits from get_revision().

This basic walk only prints the subject line of each commit in the
history. It is nearly equivalent to `git log --oneline`.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index c101db38c7..9cf19a24ab 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -7,8 +7,11 @@
 #include 
 #include "builtin.h"
 #include "revision.h"
+#include "commit.h"
 #include "config.h"
 #include "parse-options.h"
+#include "pretty.h"
+#include "line-log.h"
 
 static const char * const walken_usage[] = {
N_("git walken"),
@@ -78,6 +81,39 @@ static int git_walken_config(const char *var, const char 
*value, void *cb)
return git_default_config(var, value, cb);
 }
 
+/*
+ * walken_commit_walk() is invoked by cmd_walken() after initialization. It
+ * does the commit walk only.
+ */
+static int walken_commit_walk(struct rev_info *rev)
+{
+   struct commit *commit;
+   struct strbuf prettybuf;
+
+   strbuf_init(&prettybuf, 0);
+
+
+   /* prepare_revision_walk() gets the final steps ready for a revision
+* walk. We check the return value for errors. */
+   if (prepare_revision_walk(rev)) {
+   die(_("revision walk setup failed"));
+   }
+
+   /* Now we can start the real commit walk. get_revision grabs the next
+* revision based on the contents of rev.
+*/
+   rev->diffopt.close_file = 0;
+   while ((commit = get_revision(rev)) != NULL) {
+   if (commit == NULL)
+   continue;
+   strbuf_reset(&prettybuf);
+   pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+   printf(_("%s\n"), prettybuf.buf);
+
+   }
+   return 0;
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
struct option options[] = {
@@ -98,10 +134,15 @@ int cmd_walken(int argc, const char **argv, const char 
*prefix)
 */
repo_init_revisions(the_repository, &rev, prefix);
 
+   /* We can set our traversal flags here. */
+   rev.always_show_header = 1;
+
/* Before we do the walk, we need to set a starting point. It's not
 * coming from opt. */
final_rev_info_setup(argc, argv, prefix, &rev);
 
+   walken_commit_walk(&rev);
+
printf(_("cmd_walken incoming...\n"));
return 0;
 }
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 02/13] walken: add usage to enable -h

2019-06-06 Thread Emily Shaffer
One requirement of the Git test suite is that all commands support '-h',
which is captured by parse_options(). In order to support this flag, add
a short usage text to walken.c and invoke parse_options().

With this change, we can now add cmd_walken to the builtins set and
expect tests to pass, so we'll do so - cmd_walken is now open for
business.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 12 
 git.c|  1 +
 2 files changed, 13 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index bfeaa5188d..5ae7c7d93f 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -6,9 +6,21 @@
 
 #include 
 #include "builtin.h"
+#include "parse-options.h"
+
+static const char * const walken_usage[] = {
+   N_("git walken"),
+   NULL,
+};
 
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
+   struct option options[] = {
+   OPT_END()
+   };
+
+   argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
printf(_("cmd_walken incoming...\n"));
return 0;
 }
diff --git a/git.c b/git.c
index 1bf9c94550..209c42836f 100644
--- a/git.c
+++ b/git.c
@@ -600,6 +600,7 @@ static struct cmd_struct commands[] = {
{ "verify-pack", cmd_verify_pack },
{ "verify-tag", cmd_verify_tag, RUN_SETUP },
{ "version", cmd_version },
+   { "walken", cmd_walken, RUN_SETUP },
{ "whatchanged", cmd_whatchanged, RUN_SETUP },
{ "worktree", cmd_worktree, RUN_SETUP | NO_PARSEOPT },
{ "write-tree", cmd_write_tree, RUN_SETUP },
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 10/13] walken: add unfiltered object walk from HEAD

2019-06-06 Thread Emily Shaffer
Provide a demonstration of a revision walk which traverses all types of
object, not just commits. This type of revision walk is used for
operations such as creating packfiles and performing fetches or clones,
so it's useful to teach new developers how it works. For starters, only
demonstrate the unfiltered version, as this will make the tutorial
easier to follow.

This commit is part of a tutorial on revision walking.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 79 ++--
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 86c8d29c48..408af6c841 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -12,6 +12,7 @@
 #include "parse-options.h"
 #include "pretty.h"
 #include "line-log.h"
+#include "list-objects.h"
 #include "grep.h"
 
 static const char * const walken_usage[] = {
@@ -19,6 +20,11 @@ static const char * const walken_usage[] = {
NULL,
 };
 
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+
 /*
  * Within init_walken_defaults() we can call into other useful defaults to set
  * in the global scope or on the_repository. It's okay to borrow from other
@@ -93,6 +99,70 @@ static int git_walken_config(const char *var, const char 
*value, void *cb)
return git_default_config(var, value, cb);
 }
 
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+   commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+   switch (obj->type) {
+   case OBJ_TREE:
+   tree_count++;
+   break;
+   case OBJ_BLOB:
+   blob_count++;
+   break;
+   case OBJ_TAG:
+   tag_count++;
+   break;
+   case OBJ_COMMIT:
+   printf(_("Unexpectedly encountered a commit in "
+"walken_show_object!\n"));
+   commit_count++;
+   break;
+   default:
+   printf(_("Unexpected object type %s!\n"),
+  type_name(obj->type));
+   break;
+   }
+}
+
+/*
+ * walken_object_walk() is invoked by cmd_walken() after initialization. It 
does
+ * a walk of all object types.
+ */
+static int walken_object_walk(struct rev_info *rev)
+{
+   struct list_objects_filter_options filter_options = {};
+   struct oidset omitted;
+   oidset_init(&omitted, 0);
+
+   printf("walken_object_walk beginning...\n");
+
+   rev->tree_objects = 1;
+   rev->blob_objects = 1;
+   rev->tag_objects = 1;
+   rev->tree_blobs_in_commit_order = 1;
+   rev->exclude_promisor_objects = 1;
+
+   if (prepare_revision_walk(rev))
+   die(_("revision walk setup failed"));
+
+   commit_count = 0;
+   tag_count = 0;
+   blob_count = 0;
+   tree_count = 0;
+
+   traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+   printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
+  "and %d trees.\n"), commit_count, blob_count, tag_count,
+  tree_count);
+
+   return 0;
+}
+
 /*
  * walken_commit_walk() is invoked by cmd_walken() after initialization. It
  * does the commit walk only.
@@ -151,9 +221,14 @@ int cmd_walken(int argc, const char **argv, const char 
*prefix)
 
/* Before we do the walk, we need to set a starting point. It's not
 * coming from opt. */
-   final_rev_info_setup(argc, argv, prefix, &rev);
 
-   walken_commit_walk(&rev);
+   if (1) {
+   add_head_to_pending(&rev);
+   walken_object_walk(&rev);
+   } else {
+   final_rev_info_setup(argc, argv, prefix, &rev);
+   walken_commit_walk(&rev);
+   }
 
printf(_("cmd_walken incoming...\n"));
return 0;
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 07/13] walken: filter for authors from gmail address

2019-06-06 Thread Emily Shaffer
In order to demonstrate how to create grep filters for revision walks,
filter the walk performed by cmd_walken() to print only commits which
are authored by someone with a gmail address.

This commit demonstrates how to append a grep pattern to a
rev_info.grep_filter, to teach new contributors how to create their own
more generalized grep filters during revision walks.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 9cf19a24ab..6c0f4e7b7a 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -12,6 +12,7 @@
 #include "parse-options.h"
 #include "pretty.h"
 #include "line-log.h"
+#include "grep.h"
 
 static const char * const walken_usage[] = {
N_("git walken"),
@@ -25,9 +26,8 @@ static const char * const walken_usage[] = {
  */
 static void init_walken_defaults(void)
 {
-   /* We don't actually need the same components `git log` does; leave this
-* empty for now.
-*/
+   /* Needed by our grep filter. */
+   init_grep_defaults(the_repository);
 }
 
 /*
@@ -51,6 +51,10 @@ static void final_rev_info_setup(int argc, const char 
**argv, const char *prefix
opt.revarg_opt = REVARG_COMMITTISH;
//setup_revisions(argc, argv, rev, &opt);
 
+   /* Add a grep pattern to the author line in the header. */
+   append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, 
"gmail");
+   compile_grep_patterns(&rev->grep_filter);
+
/* Let's force oneline format. */
get_commit_format("oneline", rev);
rev->verbose_header = 1;
@@ -77,7 +81,7 @@ static void final_rev_info_setup(int argc, const char **argv, 
const char *prefix
  */
 static int git_walken_config(const char *var, const char *value, void *cb)
 {
-   /* For now, let's not bother with anything. */
+   grep_config(var, value, cb);
return git_default_config(var, value, cb);
 }
 
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 12/13] walken: count omitted objects

2019-06-06 Thread Emily Shaffer
It may be illuminating to see which objects were not included within a
given filter. This also demonstrates, since filter-spec "tree:1" is
used, that the 'omitted' list contains all objects which are omitted,
not just the first objects which were omitted - that is, it continues to
dereference omitted trees and commits.

This is part of a tutorial on performing revision walks.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index f2c98bcd6b..d93725ee88 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -137,6 +137,9 @@ static int walken_object_walk(struct rev_info *rev)
 {
struct list_objects_filter_options filter_options = {};
struct oidset omitted;
+   struct oidset_iter oit;
+   struct object_id *oid = NULL;
+   int omitted_count = 0;
oidset_init(&omitted, 0);
 
printf("walken_object_walk beginning...\n");
@@ -172,9 +175,15 @@ static int walken_object_walk(struct rev_info *rev)
walken_show_commit, walken_show_object, NULL, &omitted);
}
 
+   /* Count the omitted objects. */
+   oidset_iter_init(&omitted, &oit);
+
+   while ((oid = oidset_iter_next(&oit)))
+   omitted_count++;
+
printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
-  "and %d trees.\n"), commit_count, blob_count, tag_count,
-  tree_count);
+  "and %d trees; %d omitted objects.\n"), commit_count,
+  blob_count, tag_count, tree_count, omitted_count);
 
return 0;
 }
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 03/13] walken: add placeholder to initialize defaults

2019-06-06 Thread Emily Shaffer
Eventually, we will want a good place to initialize default variables
for use during our revision walk(s) in `git walken`. For now, there's
nothing to do here, but let's add the scaffolding so that it's easy to
tell where to put the setup later on.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 5ae7c7d93f..dcee906556 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -13,6 +13,18 @@ static const char * const walken_usage[] = {
NULL,
 };
 
+/*
+ * Within init_walken_defaults() we can call into other useful defaults to set
+ * in the global scope or on the_repository. It's okay to borrow from other
+ * functions which are doing something relatively similar to yours.
+ */
+static void init_walken_defaults(void)
+{
+   /* We don't actually need the same components `git log` does; leave this
+* empty for now.
+*/
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
struct option options[] = {
@@ -21,6 +33,7 @@ int cmd_walken(int argc, const char **argv, const char 
*prefix)
 
argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
 
+   init_walken_defaults();
printf(_("cmd_walken incoming...\n"));
return 0;
 }
-- 
2.22.0.rc1.311.g5d7573a151-goog



[RFC PATCH 11/13] walken: add filtered object walk

2019-06-06 Thread Emily Shaffer
Demonstrate how filter specs can be used when performing a revision walk
of all object types. In this case, tree depth is used. Contributors who
are following the revision walking tutorial will be encouraged to run
the revision walk with and without the filter in order to compare the
number of objects seen in each case.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 408af6c841..f2c98bcd6b 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -13,6 +13,7 @@
 #include "pretty.h"
 #include "line-log.h"
 #include "list-objects.h"
+#include "list-objects-filter-options.h"
 #include "grep.h"
 
 static const char * const walken_usage[] = {
@@ -154,7 +155,22 @@ static int walken_object_walk(struct rev_info *rev)
blob_count = 0;
tree_count = 0;
 
-   traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+   if (1) {
+   /* Unfiltered: */
+   printf(_("Unfiltered object walk.\n"));
+   traverse_commit_list(rev, walken_show_commit,
+   walken_show_object, NULL);
+   } else {
+   printf(_("Filtered object walk with filterspec 'tree:1'.\n"));
+   /*
+* We can parse a tree depth of 1 to demonstrate the kind of
+* filtering that could occur eg during shallow cloning.
+*/
+   parse_list_objects_filter(&filter_options, "tree:1");
+
+   traverse_commit_list_filtered(&filter_options, rev,
+   walken_show_commit, walken_show_object, NULL, &omitted);
+   }
 
printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
   "and %d trees.\n"), commit_count, blob_count, tag_count,
-- 
2.22.0.rc1.311.g5d7573a151-goog



Re: Is --filter-print-omitted correct/used/needed?

2019-06-07 Thread Emily Shaffer
On Fri, Jun 07, 2019 at 02:57:01PM -0400, Jeff Hostetler wrote:
> 
> 
> On 6/7/2019 2:38 AM, Christian Couder wrote:
> > On Thu, Jun 6, 2019 at 10:18 PM Emily Shaffer  
> > wrote:
> > 
> > 
> > 
> > > I grepped the Git source and found that we only provide a non-NULL
> > > "omitted" when someone calls "git rev-list --filter-print-omitted",
> > > which we verify with a simple test case for "blobs:none", in which
> > > case the "border" objects which were omitted must be the same as all
> > > objects which were omitted (since blobs aren't pointing to anything
> > > else). I think if we had written a similar test case with some trees
> > > we expect to omit we might have noticed sooner.
> > 
> > It seems that --filter-print-omitted was introduced in caf3827e2f
> > (rev-list: add list-objects filtering support, 2017-11-21) so I cc'ed
> > Jeff.
> > 
> > [...]
> 
> The --filter-print-omitted was intended to print the complete list
> of omitted objects.  For example, a packfile built from a filtered
> command and a packfile build from the unfiltered command would differ
> by exactly that set of objects.
> 
> So the discrepancy reported by the tree:1 example is incorrect.
> The omitted set is the full set, not the frontier.  So when
> --filter-print-omitted is used, we still have to do the full tree walk.
> When not specified, we do get the perf boost because we can terminate
> the tree walk early.
> 
> 
> > > So, what do we use --filter-print-omitted for? Is anybody needing it?
> > > Or do we just use it to verify this one test case? Should we fix it,
> > > or get rid of it, or neither?
> > 
> > In caf3827e2f there is:
> > 
> >  This patch introduces handling of missing objects to help
> >  debugging and development of the "partial clone" mechanism,
> >  and once the mechanism is implemented, for a power user to
> >  perform operations that are missing-object aware without
> >  incurring the cost of checking if a missing link is expected.
> > 
> > So I would say that if you think that --filter-print-omitted doesn't
> > help in debugging or development, and can even be confusing, and that
> > it also doesn't help performance for power users or anyone else, then
> > it would make sense to remove it, unless you find a way to make it
> > fulfill its original goals, or maybe other worthwhile goals.
> 
> I don't currently have a use for that (other than the existing test
> cases), but we could use that in the future as a guide for the server
> to put the omitted objects on a CDN, for example.
> 
> So I'd say let's leave it as is for now.

Thanks for the input, Jeff. I wasn't sure from looking at it whether it
was intended behavior with a plan for use; looks like it is. I'll leave
it alone.

> 
> 
> Jeff
> 
> 
> 


[PATCH] revision: remove stray whitespace when name empty

2019-06-07 Thread Emily Shaffer
Teach show_object_with_name() to avoid writing a space before a name
which is empty. Also teach tests for rev-list --objects --filter to not
require a space between the object ID and name.

show_object_with_name() inserts a space between an object's OID and name
regardless of whether the name is empty or not. This causes 'git
cat-file (--batch | --batch-check)' to fail to discover the type of root
directories:

  git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
| git cat-file --batch-check
  git rev-parse HEAD: | xargs -I% git cat-file -t %
  git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
| xargs -I% echo "AA%AA"
  git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
| cut -f 1 -d ' ' | git cat-file --batch-check

The extra space provided by 'show_object_with_name()' sticks to the
output of 'git rev-list' even if it's piped into a file.

Signed-off-by: Emily Shaffer 
---
I don't see any reason _not_ to remove this stray whitespace at the end,
since it seems like it just gets in the way of easy scripting. I also
think this case will only present itself for root trees.

Not sure if making the whitespace optional is the right solution for the
test, although I couldn't come up with a cleaner approach. Maybe
something like this would be better, even though handling it in the
regex is shorter?

  if [[ -z "$name" ]] then
grep "^$hash" actual
  else
grep "^$hash $name" actual
  fi

 - Emily

 revision.c  | 3 ++-
 t/t6112-rev-list-filters-objects.sh | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/revision.c b/revision.c
index d4aaf0ef25..260258b371 100644
--- a/revision.c
+++ b/revision.c
@@ -40,7 +40,8 @@ void show_object_with_name(FILE *out, struct object *obj, 
const char *name)
 {
const char *p;
 
-   fprintf(out, "%s ", oid_to_hex(&obj->oid));
+   fprintf(out, "%s%s", oid_to_hex(&obj->oid),
+   strcmp(name, "") == 0 ? "" : " ");
for (p = name; *p && *p != '\n'; p++)
fputc(*p, out);
fputc('\n', out);
diff --git a/t/t6112-rev-list-filters-objects.sh 
b/t/t6112-rev-list-filters-objects.sh
index acd7f5ab80..e05faa7a67 100755
--- a/t/t6112-rev-list-filters-objects.sh
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -288,7 +288,7 @@ expect_has () {
name=$2 &&
 
hash=$(git -C r3 rev-parse $commit:$name) &&
-   grep "^$hash $name$" actual
+   grep "^$hash \?$name$" actual
 }
 
 test_expect_success 'verify tree:1 includes root trees' '
-- 
2.22.0.rc2.383.gf4fbbf30c2-goog



Re: [PATCH] revision: remove stray whitespace when name empty

2019-06-12 Thread Emily Shaffer
On Fri, Jun 07, 2019 at 08:42:45PM -0400, Eric Sunshine wrote:
> On Fri, Jun 7, 2019 at 6:59 PM Emily Shaffer  wrote:
> > Teach show_object_with_name() to avoid writing a space before a name
> > which is empty. Also teach tests for rev-list --objects --filter to not
> > require a space between the object ID and name.
> > [...]
> > Signed-off-by: Emily Shaffer 
> > ---
> > Not sure if making the whitespace optional is the right solution for the
> > test, although I couldn't come up with a cleaner approach. Maybe
> > something like this would be better, even though handling it in the
> > regex is shorter?
> >
> >   if [[ -z "$name" ]] then
> 
> In Git, we avoid Bash-isms. Just use 'test'. And, style is to place
> 'then' on its own line.
> 
> if test -z "$name"
> then
> 
> > diff --git a/revision.c b/revision.c
> > @@ -40,7 +40,8 @@ void show_object_with_name(FILE *out, struct object *obj, 
> > const char *name)
> >  {
> > -   fprintf(out, "%s ", oid_to_hex(&obj->oid));
> > +   fprintf(out, "%s%s", oid_to_hex(&obj->oid),
> > +   strcmp(name, "") == 0 ? "" : " ");
> > for (p = name; *p && *p != '\n'; p++)
> > fputc(*p, out);
> > fputc('\n', out);
> 
> It's subjective, but this starts to be a bit too noisy and unreadable.
> An alternative:
> 
> fputs(oid_to_hex(...), out);
> if (*name)
> putc(' ', out);
> 
> > diff --git a/t/t6112-rev-list-filters-objects.sh 
> > b/t/t6112-rev-list-filters-objects.sh
> > @@ -288,7 +288,7 @@ expect_has () {
> > hash=$(git -C r3 rev-parse $commit:$name) &&
> > -   grep "^$hash $name$" actual
> > +   grep "^$hash \?$name$" actual
> 
> This would be the first use of \? with 'grep' in the test suite. I
> would worry about portability. Your suggestion earlier to tailor
> 'grep' invocation based upon whether $name is empty would be safer.


Thanks for the review, Eric. I'm going to try again from scratch with
Peff and Junio's suggestion about adding an option to remove object
names, and I'll try to keep your general style concerns in mind when I
do so.

 - Emily


Re: [PATCH] revision: remove stray whitespace when name empty

2019-06-12 Thread Emily Shaffer
On Mon, Jun 10, 2019 at 09:29:14AM -0700, Junio C Hamano wrote:
> Jeff King  writes:
> 
> > Your patch only helps with this at all because you're using the "tree:1"
> > ...
> > because there you'll have actual names which cat-file will choke on. So
> > it seems like this is helping only a very limited use case.
> > ...
> > Alternatively, it would be reasonable to me to have an option for
> > "rev-list --objects" to have an option to suppress the filename (and
> > space) entirely.
> 
> Yup, I think that is a more reasonable short-term change compared to
> the patch being discussed, and ...

I like this too. Starting work on it today.

> 
> > I think in the longer run along those lines that "--objects" should
> > allow cat-file style pretty-print formats, which would eliminate the
> > need to pipe to cat-file in the first place. That makes this parsing
> > problem go away entirely, and it's way more efficient to boot (rev-list
> > already knows the types!).
> 
> ... of course this makes tons of sense.

Probably not going to start work on this now, but I will make a note to
myself to have a look if I have some spare time soon, and keep an eye
out for reviews in that area.

> 
> Thanks.

Thanks all for the feedback, I'll have a reroll soon.

 - Emily


[PATCH v2] rev-list: teach --oid-only to enable piping

2019-06-13 Thread Emily Shaffer
Allow easier parsing by cat-file by giving rev-list an option to print
only the OID of an object without any additional information. This is a
short-term shim; later on, rev-list should be taught how to print the
types of objects it finds in a format similar to cat-file's.

Before this commit, the output from rev-list needed to be massaged
before being piped to cat-file, like so:

  git rev-list --objects HEAD | cut -f 1 -d ' ' \
| git cat-file --batch-check

This was especially unexpected when dealing with root trees, as an
invisible whitespace exists at the end of the OID:

  git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
| xargs -I% echo "AA%AA"

Now, it can be piped directly, as in the added test case:

  git rev-list --objects --oid-only HEAD | git cat-file --batch-check

Signed-off-by: Emily Shaffer 
Change-Id: I489bdf0a8215532e54017513ff7541d70e1b
---
It didn't appear that using an existing --pretty string would do the
trick, as the formatting options for --pretty apply specifically to
commit objects (you can specify the commit hash and the tree hash, but
I didn't see a way to more generally pretty-print all types of objects).
rev-list doesn't appear to use parse-options-h, so I added a new option
as simply as I could see without breaking existing style; it seemed
wrong to add a flag to the `struct rev_list_info` as the definition of
struct and flag values alike are contained in bisect.h. There are a
couple other options to rev-list which are stored as globals.

At the moment, this doesn't work with --abbrev, and although I think
it'd be a good fit, right now --abbrev doesn't seem to work without
--abbrev-commit, and both those options end up being eaten by
setup_revisions() rather than by rev-list itself.

 - Emily

 builtin/rev-list.c   | 21 -
 t/t6000-rev-list-misc.sh | 18 ++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 9f31837d30..ff07ea9564 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -48,7 +48,7 @@ static const char rev_list_usage[] =
 "--children\n"
 "--objects | --objects-edge\n"
 "--unpacked\n"
-"--header | --pretty\n"
+"--header | --pretty | --oid-only\n"
 "--abbrev= | --no-abbrev\n"
 "--abbrev-commit\n"
 "--left-right\n"
@@ -75,6 +75,8 @@ enum missing_action {
 };
 static enum missing_action arg_missing_action;
 
+static int arg_oid_only; /* display only the oid of each object encountered */
+
 #define DEFAULT_OIDSET_SIZE (16*1024)
 
 static void finish_commit(struct commit *commit, void *data);
@@ -90,6 +92,11 @@ static void show_commit(struct commit *commit, void *data)
return;
}
 
+   if (arg_oid_only) {
+   printf("%s\n", oid_to_hex(&commit->object.oid));
+   return;
+   }
+
graph_show_commit(revs->graph);
 
if (revs->count) {
@@ -255,6 +262,10 @@ static void show_object(struct object *obj, const char 
*name, void *cb_data)
display_progress(progress, ++progress_counter);
if (info->flags & REV_LIST_QUIET)
return;
+   if (arg_oid_only) {
+   printf("%s\n", oid_to_hex(&obj->oid));
+   return;
+   }
show_object_with_name(stdout, obj, name);
 }
 
@@ -484,6 +495,11 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
if (skip_prefix(arg, "--missing=", &arg))
continue; /* already handled above */
 
+   if (!strcmp(arg, ("--oid-only"))) {
+   arg_oid_only = 1;
+   continue;
+   }
+
usage(rev_list_usage);
 
}
@@ -499,6 +515,9 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
/* Only --header was specified */
revs.commit_format = CMIT_FMT_RAW;
 
+   if (arg_oid_only && revs.commit_format != CMIT_FMT_UNSPECIFIED)
+   die(_("cannot combine --oid-only and --pretty or --header"));
+
if ((!revs.commits && reflog_walk_empty(revs.reflog_info) &&
 (!(revs.tag_objects || revs.tree_objects || revs.blob_objects) &&
  !revs.pending.nr) &&
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 0507999729..996ba1799b 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -48,6 +48,24 @@ test_expect_success 'rev-list --objects with pathspecs and 
copied files' '
! grep one output
 '
 
+test_expect_success 'rev-list --objects --oid-only has no whitespace/names' '
+   git rev-list --objects --oid-only HEAD >output &a

[RFC PATCH] rev-list: clarify --abbrev and --abbrev-commit usage

2019-06-13 Thread Emily Shaffer
Indicate that --abbrev only works with --abbrev-commit also specified.
It seems that simply running `git rev-list --abbrev=5` doesn't
abbreviate commit OIDs. But the combination of `git rev-list
--abbrev-commit --abbrev=5` works as expected. Clarify in the
documentation by indicating that --abbrev is an optional addition to the
--abbrev-commit option. --no-abbrev remains on a separate line as it can
still be used to disable OID abbreviation even if --abbrev-commit has
been specified.

Signed-off-by: Emily Shaffer 
Change-Id: If9b1198938e1a3515ae6740241f7b791fb7a88bd
---
I thought this was odd when I was working on the other rev-list changes
- --abbrev doesn't do anything on its own. It looks like it does work by
itself in other commands, but apparently not in rev-list.

Listed this patch as RFC because maybe instead it's better to fix
something so --abbrev can be used alone, or teach --abbrev-commit=.
It looks like `git log --abbrev=5` also doesn't work the way one might
expect, which makes sense to me, as they use the same internals for
option parsing (parse_revisions()).

The manpages for log and rev-list both correctly indicate that
--abbrev= is an optional addition to --abbrev-commit. `git log -h` is
generated by parse-options tooling and doesn't cover --abbrev-commit at
all, but `git rev-list` doesn't use an option parser on its own and the
usage is hardcoded.

 - Emily

 builtin/rev-list.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 9f31837d30..6ae0087b01 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -49,8 +49,8 @@ static const char rev_list_usage[] =
 "--objects | --objects-edge\n"
 "--unpacked\n"
 "--header | --pretty\n"
-"--abbrev= | --no-abbrev\n"
-"--abbrev-commit\n"
+"--abbrev-commit [--abbrev=]\n"
+"--no-abbrev\n"
 "--left-right\n"
 "--count\n"
 "  special purpose:\n"
-- 
2.22.0.rc2.383.gf4fbbf30c2-goog



Re: [RFC PATCH] rev-list: clarify --abbrev and --abbrev-commit usage

2019-06-14 Thread Emily Shaffer
On Fri, Jun 14, 2019 at 12:18:41PM -0400, Jeff King wrote:
> On Thu, Jun 13, 2019 at 03:15:41PM -0700, Emily Shaffer wrote:
> 
> > I thought this was odd when I was working on the other rev-list changes
> > - --abbrev doesn't do anything on its own. It looks like it does work by
> > itself in other commands, but apparently not in rev-list.
> > 
> > Listed this patch as RFC because maybe instead it's better to fix
> > something so --abbrev can be used alone, or teach --abbrev-commit=.
> > It looks like `git log --abbrev=5` also doesn't work the way one might
> > expect, which makes sense to me, as they use the same internals for
> > option parsing (parse_revisions()).
> > 
> > The manpages for log and rev-list both correctly indicate that
> > --abbrev= is an optional addition to --abbrev-commit. `git log -h` is
> > generated by parse-options tooling and doesn't cover --abbrev-commit at
> > all, but `git rev-list` doesn't use an option parser on its own and the
> > usage is hardcoded.
> 
> Yeah, "--abbrev" is a bit tricky here. It is really "when you abbrev, do
> it to this level". In "log", that means that "git log --abbrev=5 --raw"
> _does_ do something useful (it abbreviates the blob hashes). And then
> you may add "--abbrev-commit" on top to ask to abbreviate the commit
> ids.
> 
> And rev-list follows that same pattern, except that rev-list _never_
> shows diff output. You'd traditionally do (and this is how log was
> implemented once upon a time):
> 
>   git rev-list HEAD | git diff-tree --stdin --abbrev=5 --raw
> 
> But even there, we are not seeing the commit id output by rev-list. It
> goes to diff-tree, which then formats it separately. So if you wanted
> abbreviated commits there, you'd add "--abbrev-commit" to the diff-tree
> invocation, not rev-list!
> 
> So no, I cannot see a way in which "rev-list --abbrev" is useful without
> "--abbrev-commit". Which means that perhaps the former should imply the
> latter.

Maybe it should; maybe this patch is a good excuse to do so, and enforce
mutual exclusion of --abbrev-commit/--abbrev and --no-abbrev. Maybe it's
also a good time to add --abbrev-commit=?

> 
> And as you noticed in your other patch, there is no way to abbreviate
> "--objects" output at all. I am not sure I have ever seen a good use for
> that. Though to be honest, I am not sure that "--abbrev" is really all
> that useful in the first place. Machine-readable output should never
> abbreviate, and human-readable ones generally already do.
> 
> But at any rate, before making any behavior changes it may make sense to
> think about how they'd interact with "rev-list --objects" abbreviation,
> if it were to be added.
> 
> As for the patch itself:
> 
> > diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> > index 9f31837d30..6ae0087b01 100644
> > --- a/builtin/rev-list.c
> > +++ b/builtin/rev-list.c
> > @@ -49,8 +49,8 @@ static const char rev_list_usage[] =
> >  "--objects | --objects-edge\n"
> >  "--unpacked\n"
> >  "--header | --pretty\n"
> > -"--abbrev= | --no-abbrev\n"
> > -"--abbrev-commit\n"
> > +"--abbrev-commit [--abbrev=]\n"
> > +"--no-abbrev\n"
> 
> So --no-abbrev clears both --abbrev and --abbrev-commit. That sort of
> makes sense, though I did not expect it. But it also means that:
> 
>   --abbrev-commit [--abbrev= | --no-abbrev]
> 
> is not right. Possibly:
> 
>   --abbrev-commit [--abbrev=] | --no-abbrev
> 
> would show the interaction more clearly, but I don't have a strong
> opinion.

I did consider demonstrating it this way, but when both --abbrev-commit
and --no-abbrev are used together, we don't complain or reject the
invocation - which I would expect if the usage states the two options
are mutually exclusive.

I've been trying to think of good reasons not to enforce their mutual
exclusion, and the one I keep coming back to is that --no-abbrev might
be desired to override a git config'd abbreviation length - although I
didn't check to see whether we have one, maybe we would want one later.
And even in that case, I suppose that --abbrev-commit would not be
explicitly added to the call (because we'd infer from the config), or
that if it did need to be explicitly added (like if we need the user to
say they want abbreviation, but we want to use their configured
preferred length) then we could still reject the addition of
--no-abbrev.

So maybe it makes even more sense to take this patch as an opportunity
to make these options mutually exclusive... although that checking I
think would wind up in revision.c, and therefore widen the impact of
the change significantly.

>From a practical standpoint, I guess you could consider --abbrev-commit
and --no-abbrev mutually exclusive, but since it's not enforced, I have
a little preference not to document it as though it is...

 - Emily


Re: [RFC PATCH] rev-list: clarify --abbrev and --abbrev-commit usage

2019-06-14 Thread Emily Shaffer
On Fri, Jun 14, 2019 at 05:27:14PM -0400, Jeff King wrote:
> On Fri, Jun 14, 2019 at 01:59:50PM -0700, Emily Shaffer wrote:
> 
> > > So no, I cannot see a way in which "rev-list --abbrev" is useful without
> > > "--abbrev-commit". Which means that perhaps the former should imply the
> > > latter.
> > 
> > Maybe it should; maybe this patch is a good excuse to do so, and enforce
> > mutual exclusion of --abbrev-commit/--abbrev and --no-abbrev. Maybe it's
> > also a good time to add --abbrev-commit=?
> 
> Hmm, yeah, I think that would reduce confusion quite a bit. If it were
> "--abbrev-commit=", then "--abbrev" would not be useful for
> anything in rev-list. It would still work as a historical item, but we
> would not need or want to advertise it in the usage at all. Good
> suggestion.

Given your comments below, I think rather than enforcing mutual
exclusion it makes more sense to enforce last-one-wins. But the thinking
is essentially the same.

> 
> > > is not right. Possibly:
> > > 
> > >   --abbrev-commit [--abbrev=] | --no-abbrev
> > > 
> > > would show the interaction more clearly, but I don't have a strong
> > > opinion.
> > 
> > I did consider demonstrating it this way, but when both --abbrev-commit
> > and --no-abbrev are used together, we don't complain or reject the
> > invocation - which I would expect if the usage states the two options
> > are mutually exclusive.
> 
> Ah, I see. I don't consider "|" to indicate an exclusion to the point
> that the options are rejected. Only that you wouldn't want to use both,
> because one counteracts the other. So every "--no-foo" is mutually
> exclusive with "--foo" in the sense that one override the other. But the
> outcome is "last one wins", and not "whoops, we cannot figure out what
> you meant". And that's what the original:
> 
>   --abbrev= | --no-abbrev
> 
> before your patch was trying to say (and I suspect there are many other
> cases of "|" with this kind of last-one-wins behavior).

For what it's worth, in this case it's not last-one-wins - --no-abbrev
always wins:

  emilyshaffer@podkayne:~/git [master]$ g rev-list --abbrev-commit
  --no-abbrev --max-count=5 --pretty=oneline HEAD
  b697d92f56511e804b8ba20ccbe7bdc85dc66810 Git 2.22
  6ee1eaca3e996e69691f515742129645f453e0e8 Merge tag 'l10n-2.22.0-rnd3' of
git://github.com/git-l10n/git-po
  0cdb8d2db2f39d1a29636975168c457d2dc0d466 Merge branch 'fr_review' of
git://github.com/jnavila/git
  d0149200792f579151166a4a5bfae7e66c5d998b Merge branch 'master' of
git://github.com/alshopov/git-po
  82eb147dbbbd0221980883e87ca7efd16a939a6f l10n: fr.po: Review French
translation
  emilyshaffer@podkayne:~/git [master]$ g rev-list --no-abbrev
  --abbrev-commit --max-count=5 --pretty=oneline HEAD
  b697d92f56511e804b8ba20ccbe7bdc85dc66810 Git 2.22
  6ee1eaca3e996e69691f515742129645f453e0e8 Merge tag 'l10n-2.22.0-rnd3' of
git://github.com/git-l10n/git-po
  0cdb8d2db2f39d1a29636975168c457d2dc0d466 Merge branch 'fr_review' of
git://github.com/jnavila/git
  d0149200792f579151166a4a5bfae7e66c5d998b Merge branch 'master' of
git://github.com/alshopov/git-po
  82eb147dbbbd0221980883e87ca7efd16a939a6f l10n: fr.po: Review French
translation
  emilyshaffer@podkayne:~/git [master]$ g rev-list --abbrev-commit
  --max-count=5 --pretty=oneline HEAD
  b697d92f56 Git 2.22
  6ee1eaca3e Merge tag 'l10n-2.22.0-rnd3' of
git://github.com/git-l10n/git-po
  0cdb8d2db2 Merge branch 'fr_review' of git://github.com/jnavila/git
  d014920079 Merge branch 'master' of git://github.com/alshopov/git-po
  82eb147dbb l10n: fr.po: Review French translation

> 
> > I've been trying to think of good reasons not to enforce their mutual
> > exclusion, and the one I keep coming back to is that --no-abbrev might
> > be desired to override a git config'd abbreviation length - although I
> > didn't check to see whether we have one, maybe we would want one later.
> > And even in that case, I suppose that --abbrev-commit would not be
> > explicitly added to the call (because we'd infer from the config), or
> > that if it did need to be explicitly added (like if we need the user to
> > say they want abbreviation, but we want to use their configured
> > preferred length) then we could still reject the addition of
> > --no-abbrev.
> >
> > So maybe it makes even more sense to take this patch as an opportunity
> > to make these options mutually exclusive... although that checking I
> > think would 

Re: [PATCH v2] rev-list: teach --oid-only to enable piping

2019-06-14 Thread Emily Shaffer
On Fri, Jun 14, 2019 at 01:25:59PM -0700, Junio C Hamano wrote:
> Jeff King  writes:
> 
> > But I wonder if things would be simpler if we did not touch the commit
> > code path at all. I.e., if this were simply "--no-object-names", and it
> > touched only show_object().
> 
> Yeah, that sounds more tempting.  And the refined code structure you
> suggested ...
> 
> >> @@ -255,6 +262,10 @@ static void show_object(struct object *obj, const 
> >> char *name, void *cb_data)
> >>display_progress(progress, ++progress_counter);
> >>if (info->flags & REV_LIST_QUIET)
> >>return;
> >> +  if (arg_oid_only) {
> >> +  printf("%s\n", oid_to_hex(&obj->oid));
> >> +  return;
> >> +  }
> >>show_object_with_name(stdout, obj, name);
> >>  }
> >>  
> >
> > A minor style point, but I think this might be easier to follow without
> > the early return, since we are really choosing to do A or B. Writing:
> >
> >   if (arg_oid_only)
> > printf(...);
> >   else
> > show_object_with_name(...);
> >
> > shows that more clearly, I think.
> 
> ... is a good way to clearly show that intention, I would think.

Sounds good. Thanks, both; I'll reroll that quickly today.


Re: [PATCH v2] rev-list: teach --oid-only to enable piping

2019-06-14 Thread Emily Shaffer
On Fri, Jun 14, 2019 at 12:07:28PM -0400, Jeff King wrote:
> On Thu, Jun 13, 2019 at 02:51:03PM -0700, Emily Shaffer wrote:

> > +test_expect_success 'rev-list --objects --oid-only is usable by cat-file' '
> > +   git rev-list --objects --oid-only --all >list-output &&
> > +   git cat-file --batch-check cat-output &&
> > +   ! grep missing cat-output
> > +'
> 
> Usually we prefer to look for the expected output, rather than making
> sure we did not find the unexpected. But I'm not sure if that might be
> burdensome in this case (i.e., if there's a bunch of cruft coming out of
> "rev-list" that would be annoying to match, and might even change as
> people add more tests). So I'm OK with it either way.

My (newbie) opinion is that in this case, we specifically want to know
that cat-file didn't choke on objects which we know exist (since they
came from rev-list). I have the feeling that checking for the exact
objects returned instead (or a sample of them) would be more brittle and
would also make the wording of the test less direct.

So if there's no complaint either way, I'd prefer to leave it the way it
is.

By the way, rev-list-misc.sh has a number of other existing "! grep ..."
lines.

 - Emily


[PATCH v3] rev-list: teach --no-object-names to enable piping

2019-06-14 Thread Emily Shaffer
Allow easier parsing by cat-file by giving rev-list an option to print
only the OID of a non-commit object without any additional information.
This is a short-term shim; later on, rev-list should be taught how to
print the types of objects it finds in a format similar to cat-file's.

Before this commit, the output from rev-list needed to be massaged
before being piped to cat-file, like so:

  git rev-list --objects HEAD | cut -f 1 -d ' ' \
| git cat-file --batch-check

This was especially unexpected when dealing with root trees, as an
invisible whitespace exists at the end of the OID:

  git rev-list --objects --filter=tree:1 --max-count=1 HEAD \
| xargs -I% echo "AA%AA"

Now, it can be piped directly, as in the added test case:

  git rev-list --objects --no-object-names HEAD | git cat-file --batch-check

Signed-off-by: Emily Shaffer 
Change-Id: I489bdf0a8215532e54017513ff7541d70e1b
---
Based on Peff and Junio's comments, made following changes since v2:

 - Removed interaction with commit objects
 - Renamed option to "no-object-names"
 - Removed warnings when new option is combined with commit-formatting
   options (and reflected in usage)
 - Simplified logic in show_object()

Thanks for the thoughts, all.

 - Emily

 builtin/rev-list.c   | 14 +-
 t/t6000-rev-list-misc.sh | 18 ++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 660172b014..7e2598fd22 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -49,6 +49,7 @@ static const char rev_list_usage[] =
 "--objects | --objects-edge\n"
 "--unpacked\n"
 "--header | --pretty\n"
+"--no-object-names\n"
 "--abbrev= | --no-abbrev\n"
 "--abbrev-commit\n"
 "--left-right\n"
@@ -75,6 +76,9 @@ enum missing_action {
 };
 static enum missing_action arg_missing_action;
 
+/* display only the oid of each object encountered */
+static int arg_no_object_names;
+
 #define DEFAULT_OIDSET_SIZE (16*1024)
 
 static void finish_commit(struct commit *commit);
@@ -255,7 +259,10 @@ static void show_object(struct object *obj, const char 
*name, void *cb_data)
display_progress(progress, ++progress_counter);
if (info->flags & REV_LIST_QUIET)
return;
-   show_object_with_name(stdout, obj, name);
+   if (arg_no_object_names)
+   printf("%s\n", oid_to_hex(&obj->oid));
+   else
+   show_object_with_name(stdout, obj, name);
 }
 
 static void show_edge(struct commit *commit)
@@ -484,6 +491,11 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
if (skip_prefix(arg, "--missing=", &arg))
continue; /* already handled above */
 
+   if (!strcmp(arg, ("--no-object-names"))) {
+   arg_no_object_names = 1;
+   continue;
+   }
+
usage(rev_list_usage);
 
}
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 0507999729..5d87171b99 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -48,6 +48,24 @@ test_expect_success 'rev-list --objects with pathspecs and 
copied files' '
! grep one output
 '
 
+test_expect_success 'rev-list --objects --no-object-names has no space/names' '
+   git rev-list --objects --no-object-names HEAD >output &&
+   ! grep wanted_file output &&
+   ! grep unwanted_file output &&
+   ! grep " " output
+'
+
+test_expect_success 'rev-list --objects --no-object-names works with cat-file' 
'
+   git rev-list --objects --no-object-names --all >list-output &&
+   git cat-file --batch-check cat-output &&
+   ! grep missing cat-output
+'
+
+test_expect_success 'rev-list --oid-only is incompatible with --pretty' '
+   test_must_fail git rev-list --objects --oid-only --pretty HEAD &&
+   test_must_fail git rev-list --objects --oid-only --header HEAD
+'
+
 test_expect_success 'rev-list A..B and rev-list ^A B are the same' '
git commit --allow-empty -m another &&
git tag -a -m "annotated" v1.0 &&
-- 
2.22.0.410.gd8fdbe21b5-goog



Re: [RFC PATCH 11/13] walken: add filtered object walk

2019-06-17 Thread Emily Shaffer
On Fri, Jun 07, 2019 at 03:15:53PM -0400, Jeff Hostetler wrote:
> 
> 
> On 6/6/2019 9:08 PM, Emily Shaffer wrote:
> > Demonstrate how filter specs can be used when performing a revision walk
> > of all object types. In this case, tree depth is used. Contributors who
> > are following the revision walking tutorial will be encouraged to run
> > the revision walk with and without the filter in order to compare the
> > number of objects seen in each case.
> > 
> > Signed-off-by: Emily Shaffer 
> > ---
> >   builtin/walken.c | 18 +-
> >   1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/builtin/walken.c b/builtin/walken.c
> > index 408af6c841..f2c98bcd6b 100644
> > --- a/builtin/walken.c
> > +++ b/builtin/walken.c
> > @@ -13,6 +13,7 @@
> >   #include "pretty.h"
> >   #include "line-log.h"
> >   #include "list-objects.h"
> > +#include "list-objects-filter-options.h"
> >   #include "grep.h"
> >   static const char * const walken_usage[] = {
> > @@ -154,7 +155,22 @@ static int walken_object_walk(struct rev_info *rev)
> > blob_count = 0;
> > tree_count = 0;
> > -   traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
> > +   if (1) {
> > +   /* Unfiltered: */
> > +   printf(_("Unfiltered object walk.\n"));
> > +   traverse_commit_list(rev, walken_show_commit,
> > +   walken_show_object, NULL);
> > +   } else {
> > +   printf(_("Filtered object walk with filterspec 'tree:1'.\n"));
> > +   /*
> > +* We can parse a tree depth of 1 to demonstrate the kind of
> > +* filtering that could occur eg during shallow cloning.
> > +*/
> 
> I think I'd avoid the term "shallow clone" here.  Shallow clone
> refers to getting a limited commit history.  That's orthogonal from
> partial clone and the filtered tree walk that operates *within* a commit
> or a series of commits.
> 
> Granted, a user might want to do both a shallow and partial clone (and
> then later partial fetches), but I wouldn't mix the concepts here.

It's a valid complaint. I removed the mention of shallow cloning and
replaced it with a reference to the documentation for --filter in
rev-list. Thanks.

 - Emily


Re: [PATCH] documentation: add tutorial for revision walking

2019-06-17 Thread Emily Shaffer
On Fri, Jun 07, 2019 at 02:21:07AM -0400, Eric Sunshine wrote:
> On Thu, Jun 6, 2019 at 9:08 PM Emily Shaffer  wrote:
> > [...]
> > The tutorial covers a basic overview of the structs involved during
> > revision walk, setting up a basic commit walk, setting up a basic
> > all-object walk, and adding some configuration changes to both walk
> > types. It intentionally does not cover how to create new commands or
> > search for options from the command line or gitconfigs.
> > [...]
> > Signed-off-by: Emily Shaffer 
> > ---
> > diff --git a/Documentation/.gitignore b/Documentation/.gitignore
> > @@ -12,6 +12,7 @@ cmds-*.txt
> >  SubmittingPatches.txt
> > +MyFirstRevWalk.txt
> 
> The new file itself is named Documentation/MyFirstRevWalk.txt, so why
> add it to .gitignore?

Yep, fixed. Holdover from an initial attempt which named the file
MyFirstRevWalk (no extension), which was then corrected for the earlier
tutorial I sent. Thanks.

> 
> > diff --git a/Documentation/MyFirstRevWalk.txt 
> > b/Documentation/MyFirstRevWalk.txt
> > @@ -0,0 +1,826 @@
> > +== What's a Revision Walk?
> > +
> > +The revision walk is a key concept in Git - this is the process that 
> > underpins
> > +operations like `git log`, `git blame`, and `git reflog`. Beginning at 
> > HEAD, the
> > +list of objects is found by walking parent relationships between objects. 
> > The
> > +revision walk can also be usedto determine whether or not a given object is
> 
> s/usedto/used to/

Done.

> 
> > +reachable from the current HEAD pointer.
> > +
> > +We'll put our fiddling into a new command. For fun, let's name it `git 
> > walken`.
> > +Open up a new file `builtin/walken.c` and set up the command handler:
> > +
> > +
> > +/*
> > + * "git walken"
> > + *
> > + * Part of the "My First Revision Walk" tutorial.
> > + */
> > +
> > +#include 
> > +#include "builtin.h"
> 
> Git source files must always include cache.h or git-compat-util.h (or,
> for builtins, builtin.h) as the very first header since those headers
> take care of differences which might crop up as problems with system
> headers on various platforms. System headers are included after Git
> headers. So, stdio.h should be included after builtin.h. In this case,
> however, stdio.h will get pulled in by git-compat-util.h anyhow, so
> you need not include it here.

Done.

> 
> > +Add usage text and `-h` handling, in order to pass the test suite:
> > +
> > +
> > +static const char * const walken_usage[] = {
> > +   N_("git walken"),
> > +   NULL,
> > +}
> 
> Unless you plan on referencing this from functions other than
> cmd_walken(), it need not be global.

Done; bad C++ habits sneaking in. :)

> 
> > +int cmd_walken(int argc, const char **argv, const char *prefix)
> > +{
> > +   struct option options[] = {
> > +   OPT_END()
> > +   };
> > +
> > +   argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
> > +
> > +   ...
> 
> Perhaps comment out the "..." or remove it altogether to avoid having
> the compiler barf when the below instructions tell the reader to build
> the command.

Hmm. That part I'm not so sure about. I like to use the "..." to
indicate where the code in the snippet should be added around the other
code already in the file - which I suppose it does just as clearly if
it's commented - but I also hope folks are not simply copy-pasting
blindly from the tutorial.

It seems like including uncommented "..." in code tutorials is pretty
common.

I don't think I have a good reason to push back on this except that I
think "/* ... */" is ugly :)

I'll go through and replace "..." with some actual hints about what's
supposed to go there; for example, here I'll replace with "/* print and
return */".

> 
> > +}
> > +
> > +Also add the relevant line in builtin.h near `cmd_whatchanged()`:
> 
> s/builtin.h/`&`/

Done.

> 
> > +Build and test out your command, without forgetting to ensure the 
> > `DEVELOPER`
> > +flag is set:
> > +
> > +
> > +echo DEVELOPER=1 >config.mak
> 
> This will blast existing content of 'config.mak' which could be
> dangerous. It might be better to suggest >> instead.

Done.

> 
> > +`name` is the SHA-1 of the object - a 40-digit hex string you may be 
> > familiar
> > +with from using Git to organize your source in the past. Chec

Re: [PATCH] documentation: add tutorial for revision walking

2019-06-17 Thread Emily Shaffer
On Mon, Jun 10, 2019 at 01:49:41PM -0700, Junio C Hamano wrote:
> Emily Shaffer  writes:
> 
> > +My First Revision Walk
> > +==
> > +
> > +== What's a Revision Walk?
> > +
> > +The revision walk is a key concept in Git - this is the process that 
> > underpins
> > +operations like `git log`, `git blame`, and `git reflog`. Beginning at 
> > HEAD, the
> > +list of objects is found by walking parent relationships between objects. 
> > The
> > +revision walk can also be usedto determine whether or not a given object is
> > +reachable from the current HEAD pointer.
> 
> s/usedto/used to/;
Done.
> 
> > +We'll put our fiddling into a new command. For fun, let's name it `git 
> > walken`.
> > +Open up a new file `builtin/walken.c` and set up the command handler:
> > +
> > +
> > +/*
> > + * "git walken"
> > + *
> > + * Part of the "My First Revision Walk" tutorial.
> > + */
> > +
> > +#include 
> 
> Bad idea.  In the generic part of the codebase, system headers are
> supposed to be supplied by including git-compat-util.h (or cache.h
> or builtin.h, that are common header files that begin by including
> it and are allowed by CodingGuidelines to be used as such).
Done.
> 
> > +#include "builtin.h"
> > +
> > +int cmd_walken(int argc, const char **argv, const char *prefix)
> > +{
> > +printf(_("cmd_walken incoming...\n"));
> > +return 0;
> > +}
> > +
> 
> I wonder if it makes sense to use trace instead of printf, as our
> reader has already seen the psuh example for doing the above.

Hmmm. I will think about it and look into the intended use of each. I
hadn't considered using a different logging method.

> 
> > +Add usage text and `-h` handling, in order to pass the test suite:
> 
> It is not wrong per-se, and it indeed is a very good practice to
> make sure that our subcommands consistently gives usage text and
> short usage.  Encouraging them early is a good idea.
> 
> But "in order to pass the test suite" invites "eh, the test suite
> does not pass without usage and -h?  why?".
> 
> Either drop the mention of "the test suite", or perhaps say
> something like
> 
>   Add usage text and `-h` handling, like all the subcommands
>   should consistently do (our test suite will notice and
>   complain if you fail to do so).
> 
> i.e. the real purpose is consistency and usability; test suite is
> merely an enforcement mechanism.

Yeah, you're right. I'll reword this.

> 
> > +
> > +{ "walken", cmd_walken, RUN_SETUP },
> > +
> > +
> > +Add it to the `Makefile` near the line for `builtin\worktree.o`:
> 
> Backslash intended?

Nope, typo.


Thanks for the comments, Junio.

 - Emily


Re: [PATCH] documentation: add tutorial for revision walking

2019-06-17 Thread Emily Shaffer
On Mon, Jun 10, 2019 at 01:25:14PM -0700, Junio C Hamano wrote:
> Emily Shaffer  writes:
> 
> > I'll also be mailing an RFC patchset In-Reply-To this message; the RFC
> > patchset should not be merged to Git, as I intend to host it in my own
> > mirror as an example. I hosted a similar example for the
> > MyFirstContribution tutorial; it's visible at
> > https://github.com/nasamuffin/git/tree/psuh. There might be a better
> > place to host these so I don't "own" them but I'm not sure what it is;
> > keeping them as a live branch somewhere struck me as an okay way to keep
> > them from getting stale.
> 
> Yes, writing the initial version is one thing, but keeping it alive
> is more work and more important.  As the underlying API changes over
> time, it will become necessary to update the sample implementation,
> but for a newbie who wants to learn by building "walken" on top of
> the then-current codebase and API, it would not be so helpful to
> show "these 7 patches were for older codebase, and the tip 2 are
> incremental updates to adjust to the newer API", so the maintenance
> of these sample patches may need different paradigm than the norm
> for our main codebase that values incremental polishing.
>
I'm trying to think of how it would end up working if I tried to use a
Github workflow. I think it wouldn't - someone would open a PR, and then
I'd have to rewrite that change into the appropriate commit in the live
branch and push the entire branch anew. Considering that workflow leaves
me doubly convinced that leaving it in my personal fork indefinitely
might not be wise (what if I become unable to continue maintaining it)?

I wonder if this is something that might fit well in
one of the more closely-associated mirrors, like gitster/git or
gitgitgadget/git - although I wonder if those count as "owned" by Junio
and Johannes, respectively. H.

Maybe there's a case for storing them as a set of patch files that are
revision-controlled somewhere within Documentation/? There was some
discussion on the IRC a few weeks ago about trying to organize these
tutorials into their own directory to form a sort of "Git Contribution
101" course, maybe it makes sense to store there?

  Documentation/contributing/myfirstcontrib/MyFirstContrib.txt
  Documentation/contributing/myfirstcontrib/sample/*.patch
  Documentation/contributing/myfirstrevwalk/MyFirstRevWalk.txt
  Documentation/contributing/myfirstrevwalk/sample/*.patch

I don't love the idea of maintaining text patches with the expectation
that they should cleanly apply always, but it might make the idea that
they shouldn't contain 2 patches on the tip for API adjustment more
clear. And it would be probably pretty easy to inflate and build them
with a build target or something. Hm.

 - Emily


Re: [PATCH v3] rev-list: teach --no-object-names to enable piping

2019-06-18 Thread Emily Shaffer
On Mon, Jun 17, 2019 at 03:32:34PM -0700, Junio C Hamano wrote:
> Emily Shaffer  writes:
> 
> > Allow easier parsing by cat-file by giving rev-list an option to print
> > only the OID of a non-commit object without any additional information.
> > This is a short-term shim; later on, rev-list should be taught how to
> > print the types of objects it finds in a format similar to cat-file's.
> >
> > Before this commit, the output from rev-list needed to be massaged
> > before being piped to cat-file, like so:
> >
> >   git rev-list --objects HEAD | cut -f 1 -d ' ' \
> > | git cat-file --batch-check
> 
> Write this with '|' at the end of the first line; that way the shell
> knows you haven't finished your sentence without any backslash.

Oh cool. Will do.

> 
> > diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> > index 660172b014..7e2598fd22 100644
> > --- a/builtin/rev-list.c
> > +++ b/builtin/rev-list.c
> > @@ -49,6 +49,7 @@ static const char rev_list_usage[] =
> >  "--objects | --objects-edge\n"
> >  "--unpacked\n"
> >  "--header | --pretty\n"
> > +"--no-object-names\n"
> 
> Ideally, this should be "--[no-]object-names", i.e. --object-names
> which may be the default when using --objects could be tweaked with
> --no-object-names and then again turned on with a --object-names
> later on the command line, following the usual "last one wins".

Sure, good point. Will fix.

> 
> > @@ -75,6 +76,9 @@ enum missing_action {
> >  };
> >  static enum missing_action arg_missing_action;
> >  
> > +/* display only the oid of each object encountered */
> > +static int arg_no_object_names;
> 
> And this would become
> 
> static int show_object_names = 1;
> 
> that can be turned off via --no-object-names (and --object-names
> would flip it on).  It would reduce the double negation brain
> twister, hopefully.
> 

I'll send a new patch today. Thanks.

 - Emily


[PATCH v4] rev-list: teach --no-object-names to enable piping

2019-06-18 Thread Emily Shaffer
Allow easier parsing by cat-file by giving rev-list an option to print
only the OID of a non-commit object without any additional information.
This is a short-term shim; later on, rev-list should be taught how to
print the types of objects it finds in a format similar to cat-file's.

Before this commit, the output from rev-list needed to be massaged
before being piped to cat-file, like so:

  git rev-list --objects HEAD | cut -f 1 -d ' ' |
git cat-file --batch-check

This was especially unexpected when dealing with root trees, as an
invisible whitespace exists at the end of the OID:

  git rev-list --objects --filter=tree:1 --max-count=1 HEAD |
xargs -I% echo "AA%AA"

Now, it can be piped directly, as in the added test case:

  git rev-list --objects --no-object-names HEAD | git cat-file --batch-check

Signed-off-by: Emily Shaffer 
Change-Id: I489bdf0a8215532e54017513ff7541d70e1b
---
Since v3, added a corresponding "--object-names" arg to pair with
"--no-object-names", and "last-one-wins" logic. Also added a test to
validate this new arg and the logic.

I did not take Junio's suggestion of naming the arg "show_object_names"
as "arg_show_object_names" better matches the existing style of
builtin/revlist.c.

In adding the test, I noticed that I had left in a test about --oid-only
that doesn't apply after the changes from v2->v3; that test is removed.

 - Emily

 builtin/rev-list.c   | 19 ++-
 t/t6000-rev-list-misc.sh | 20 
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 660172b014..301ccb970b 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -49,6 +49,7 @@ static const char rev_list_usage[] =
 "--objects | --objects-edge\n"
 "--unpacked\n"
 "--header | --pretty\n"
+"--[no-]object-names\n"
 "--abbrev= | --no-abbrev\n"
 "--abbrev-commit\n"
 "--left-right\n"
@@ -75,6 +76,9 @@ enum missing_action {
 };
 static enum missing_action arg_missing_action;
 
+/* display only the oid of each object encountered */
+static int arg_show_object_names = 1;
+
 #define DEFAULT_OIDSET_SIZE (16*1024)
 
 static void finish_commit(struct commit *commit);
@@ -255,7 +259,10 @@ static void show_object(struct object *obj, const char 
*name, void *cb_data)
display_progress(progress, ++progress_counter);
if (info->flags & REV_LIST_QUIET)
return;
-   show_object_with_name(stdout, obj, name);
+   if (arg_show_object_names)
+   show_object_with_name(stdout, obj, name);
+   else
+   printf("%s\n", oid_to_hex(&obj->oid));
 }
 
 static void show_edge(struct commit *commit)
@@ -484,6 +491,16 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
if (skip_prefix(arg, "--missing=", &arg))
continue; /* already handled above */
 
+   if (!strcmp(arg, ("--no-object-names"))) {
+   arg_show_object_names = 0;
+   continue;
+   }
+
+   if (!strcmp(arg, ("--object-names"))) {
+   arg_show_object_names = 1;
+   continue;
+   }
+
usage(rev_list_usage);
 
}
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 0507999729..52a9e38d66 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -48,6 +48,26 @@ test_expect_success 'rev-list --objects with pathspecs and 
copied files' '
! grep one output
 '
 
+test_expect_success 'rev-list --objects --no-object-names has no space/names' '
+   git rev-list --objects --no-object-names HEAD >output &&
+   ! grep wanted_file output &&
+   ! grep unwanted_file output &&
+   ! grep " " output
+'
+
+test_expect_success 'rev-list --objects --no-object-names works with cat-file' 
'
+   git rev-list --objects --no-object-names --all >list-output &&
+   git cat-file --batch-check cat-output &&
+   ! grep missing cat-output
+'
+
+test_expect_success '--no-object-names and --object-names are last-one-wins' '
+   git rev-list --objects --no-object-names --object-names --all >output &&
+   grep wanted_file output &&
+   git rev-list --objects --object-names --no-object-names --all >output &&
+   ! grep wanted_file output
+'
+
 test_expect_success 'rev-list A..B and rev-list ^A B are the same' '
git commit --allow-empty -m another &&
git tag -a -m "annotated" v1.0 &&
-- 
2.22.0.410.gd8fdbe21b5-goog



Re: [PATCH v4] rev-list: teach --no-object-names to enable piping

2019-06-19 Thread Emily Shaffer
On Wed, Jun 19, 2019 at 07:08:14AM -0700, Junio C Hamano wrote:
> Emily Shaffer  writes:
> 
> > Since v3, added a corresponding "--object-names" arg to pair with
> > "--no-object-names", and "last-one-wins" logic. Also added a test to
> > validate this new arg and the logic.
> 
> Thanks for a quick turnaround (unfortunately, I was OOO yesterday
> and I am half-sick today, so please expect slow responses---sorry
> about that).
> 
> > In adding the test, I noticed that I had left in a test about --oid-only
> > that doesn't apply after the changes from v2->v3; that test is removed.
> 
> I noticed in range-diff, too.  So now --object-names can be used
> with --pretty (not that "rev-list --pretty --objects" makes much
> sense in the first place, so no point in testing that it works).

Yeah, it works. It looks weird, but it works pretty much as you'd
expect:

emilyshaffer@podkayne:~/git-second [stray-whitespace]$ tg2 rev-list
--object-names --objects --pretty=short --max-count=1 HEAD | head -n 20
commit 701c66d5f2fafe163892fa0968ce8bca041dbc92
Author: Emily Shaffer 

rev-list: teach --no-object-names to enable piping

d4b1d372d16aaff35b221afce017f90542fd9293 
41d4cd23fd97f599053a19555a173894da71e560 .clang-format
42cdc4bbfb05934bb9c3ed2fe0e0d45212c32d7a .editorconfig

emilyshaffer@podkayne:~/git-second [stray-whitespace]$ tg2 rev-list
--no-object-names --objects --pretty=short --max-count=1 HEAD | head -n
20
commit 701c66d5f2fafe163892fa0968ce8bca041dbc92
Author: Emily Shaffer 

rev-list: teach --no-object-names to enable piping

d4b1d372d16aaff35b221afce017f90542fd9293
41d4cd23fd97f599053a19555a173894da71e560
42cdc4bbfb05934bb9c3ed2fe0e0d45212c32d7a

Ah, but when I was grabbing these samples, I noticed that I didn't
update the manpage for rev-list. So please wait while I reroll again...
sorry!

 - Emily


[PATCH v5] rev-list: teach --no-object-names to enable piping

2019-06-19 Thread Emily Shaffer
Allow easier parsing by cat-file by giving rev-list an option to print
only the OID of a non-commit object without any additional information.
This is a short-term shim; later on, rev-list should be taught how to
print the types of objects it finds in a format similar to cat-file's.

Before this commit, the output from rev-list needed to be massaged
before being piped to cat-file, like so:

  git rev-list --objects HEAD | cut -f 1 -d ' ' |
git cat-file --batch-check

This was especially unexpected when dealing with root trees, as an
invisible whitespace exists at the end of the OID:

  git rev-list --objects --filter=tree:1 --max-count=1 HEAD |
xargs -I% echo "AA%AA"

Now, it can be piped directly, as in the added test case:

  git rev-list --objects --no-object-names HEAD | git cat-file --batch-check

Signed-off-by: Emily Shaffer 
Change-Id: I489bdf0a8215532e54017513ff7541d70e1b
---
Since v4, added the new options to `git help rev-list`.

 Documentation/git-rev-list.txt |  1 +
 Documentation/rev-list-options.txt | 10 ++
 builtin/rev-list.c | 19 ++-
 t/t6000-rev-list-misc.sh   | 20 
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index 88609ff435..9392760b25 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -48,6 +48,7 @@ SYNOPSIS
 [ --date=]
 [ [ --objects | --objects-edge | --objects-edge-aggressive ]
   [ --unpacked ]
+  [ --object-names | --no-object-names ]
   [ --filter= [ --filter-print-omitted ] ] ]
 [ --missing= ]
 [ --pretty | --header ]
diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 71a1fcc093..286fc163f1 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -708,6 +708,16 @@ ifdef::git-rev-list[]
Only useful with `--objects`; print the object IDs that are not
in packs.
 
+--object-names::
+   Only useful with `--objects`; print the names of the object IDs
+   that are found. This is the default behavior.
+
+--no-object-names::
+   Only useful with `--objects`; does not print the names of the object
+   IDs that are found. This inverts `--object-names`. This flag allows
+   the output to be more easily parsed by commands such as
+   linkgit:git-cat-file[1].
+
 --filter=::
Only useful with one of the `--objects*`; omits objects (usually
blobs) from the list of printed objects.  The ''
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 660172b014..301ccb970b 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -49,6 +49,7 @@ static const char rev_list_usage[] =
 "--objects | --objects-edge\n"
 "--unpacked\n"
 "--header | --pretty\n"
+"--[no-]object-names\n"
 "--abbrev= | --no-abbrev\n"
 "--abbrev-commit\n"
 "--left-right\n"
@@ -75,6 +76,9 @@ enum missing_action {
 };
 static enum missing_action arg_missing_action;
 
+/* display only the oid of each object encountered */
+static int arg_show_object_names = 1;
+
 #define DEFAULT_OIDSET_SIZE (16*1024)
 
 static void finish_commit(struct commit *commit);
@@ -255,7 +259,10 @@ static void show_object(struct object *obj, const char 
*name, void *cb_data)
display_progress(progress, ++progress_counter);
if (info->flags & REV_LIST_QUIET)
return;
-   show_object_with_name(stdout, obj, name);
+   if (arg_show_object_names)
+   show_object_with_name(stdout, obj, name);
+   else
+   printf("%s\n", oid_to_hex(&obj->oid));
 }
 
 static void show_edge(struct commit *commit)
@@ -484,6 +491,16 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
if (skip_prefix(arg, "--missing=", &arg))
continue; /* already handled above */
 
+   if (!strcmp(arg, ("--no-object-names"))) {
+   arg_show_object_names = 0;
+   continue;
+   }
+
+   if (!strcmp(arg, ("--object-names"))) {
+   arg_show_object_names = 1;
+   continue;
+   }
+
usage(rev_list_usage);
 
}
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 0507999729..52a9e38d66 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -48,6 +48,26 @@ test_expect_success 'rev-list --objects with pathspecs and 
copied files' '
! grep one output
 '
 
+test_expect_success 'rev-list --objects --no-object-names has no space/names' '
+   git rev-list --objects -

Re: [RFC PATCH] rev-list: clarify --abbrev and --abbrev-commit usage

2019-06-19 Thread Emily Shaffer
On Wed, Jun 19, 2019 at 05:21:59PM -0400, Jeff King wrote:
> On Fri, Jun 14, 2019 at 03:56:54PM -0700, Emily Shaffer wrote:
> 
> > > Ah, I see. I don't consider "|" to indicate an exclusion to the point
> > > that the options are rejected. Only that you wouldn't want to use both,
> > > because one counteracts the other. So every "--no-foo" is mutually
> > > exclusive with "--foo" in the sense that one override the other. But the
> > > outcome is "last one wins", and not "whoops, we cannot figure out what
> > > you meant". And that's what the original:
> > > 
> > >   --abbrev= | --no-abbrev
> > > 
> > > before your patch was trying to say (and I suspect there are many other
> > > cases of "|" with this kind of last-one-wins behavior).
> > 
> > For what it's worth, in this case it's not last-one-wins - --no-abbrev
> > always wins:
> 
> Ah, thanks for pointing that; I hadn't noticed. That _is_ unlike most of
> the rest of Git. I'm tempted to say it's a bug and should be fixed, but
> I worry slightly that it could have an unexpected effect.
> 
> > I think a good solution here is to go and add --abbrev-commit=
> > without breaking support for --abbrev=; I'm a little more worried
> > about changing --no-abbrev to last-one-wins but I'll take a crack at it
> > and see what the test suite says. While I'm at it, I'll check for
> > last-one-wins with multiple instances of --abbrev[-commit]=.
> 
> I think --abbrev-commit= sounds safe enough, though I worry it may
> get a bit complicated because we'd presumably want to fall back to the
>  from --abbrev=. I'll see how your patch turns out. :)
> 
> I like the idea of changing --no-abbrev to last-one-wins, as above, but
> the test suite may not give us that much confidence. These kinds of
> cases are often not well-covered, and we're really worried about the
> wider world of random scripts people have grown over the last 10 years.
> Of course if the test suite does break horribly that might give us extra
> caution, but I'm not sure "the test suite does not break" gives us much
> confidence.

I think at that point, based on what I've seen on the list, there's a
tendency to include a config option to enable the new behavior. And I
think that might pass the tipping point of "wasted effort" for me.
The change isn't really enabling anything that was impossible before
(last-one-wins is handy, but it's legacy public behavior now for
--abbrev). So I'm kind of starting to lean towards doing nothing at all.
It's a lot of work, for a not-very-functional change, which needs to be
specially configured to even happen... So maybe I'll save my time and
work on actual bugs instead. :)

> 
> > Having done so, I'll also change the documentation here in rev-list to:
> >  --abbrev-commit[=] [--abbrev=] | --no-abbrev
> 
> Yeah, that makes sense.
> 
> -Peff

Thanks for the input, all. I think I'll drop this.

 - Emily


Re: [PATCH] documentation: add tutorial for revision walking

2019-06-19 Thread Emily Shaffer
On Wed, Jun 19, 2019 at 04:13:35AM -0400, Eric Sunshine wrote:
> On Mon, Jun 17, 2019 at 7:20 PM Emily Shaffer  wrote:
> > On Fri, Jun 07, 2019 at 02:21:07AM -0400, Eric Sunshine wrote:
> > > On Thu, Jun 6, 2019 at 9:08 PM Emily Shaffer  
> > > wrote:
> > > > +int cmd_walken(int argc, const char **argv, const char *prefix)
> > > > +{
> > > > +   struct option options[] = {
> > > > +   OPT_END()
> > > > +   };
> > > > +
> > > > +   argc = parse_options(argc, argv, prefix, options, walken_usage, 
> > > > 0);
> > > > +
> > > > +   ...
> > >
> > > Perhaps comment out the "..." or remove it altogether to avoid having
> > > the compiler barf when the below instructions tell the reader to build
> > > the command.
> >
> > Hmm. That part I'm not so sure about. I like to use the "..." to
> > indicate where the code in the snippet should be added around the other
> > code already in the file - which I suppose it does just as clearly if
> > it's commented - but I also hope folks are not simply copy-pasting
> > blindly from the tutorial.
> >
> > It seems like including uncommented "..." in code tutorials is pretty
> > common.
> 
> You're right, and that's not what I was "complaining" about. Looking
> back at your original email, I see that I somehow got confused and
> didn't realize or (quickly) forgot that you had already presented a
> _complete_ cmd_walken() snippet just above that spot, and that the
> cmd_walken() snippet upon which I was commenting was _incomplete_,
> thus the "..." was perfectly justified. Not realizing that the
> incomplete cmd_walken() example was just that (incomplete), I
> "complained" that the following "compile the project" instructions
> would barf on "...".
> 
> Maybe I got confused because the tiny cmd_walken() snippets followed
> one another so closely (or because I got interrupted several times
> during the review), but one way to avoid that would be to present a
> single _complete_ snippet from the start, followed by a bit of
> explanation. That is, something like this:
> 
> Open up a new file `builtin/walken.c` and set up the command handler:
> 
> 
> /* "git walken" -- Part of the "My First Revision Walk" tutorial. */
> #include "builtin.h"
> 
> int cmd_walken(int argc, const char **argv, const char *prefix)
> {
> const char * const usage[] = {
> N_("git walken"),
> NULL,
> }
> struct option options[] = {
> OPT_END()
> };
> 
> argc = parse_options(argc, argv, prefix, options, usage, 0);
> 
> printf(_("cmd_walken incoming...\n"));
> return 0;
> }
> 
> 
> `usage` is the usage message presented by `git -h walken`, and
> `options` will eventually specify command-line options.

Hmm. I can say that I personally would find that much more difficult to
follow interactively, and I'd be tempted to copy-and-paste and skim
through the wall of text if I was presented with such a snippet.
However, I could also imagine the reverse - someone becoming tired of
having their hand held through a fairly straightforward implementation,
when they're perfectly capable of reading a long description and would
just like to get on with it.

I'm really curious about what others think in this scenario, since I
imagine it boils down to individual learning styles.

(Maybe we can split the difference and present a complete patch or new
function, followed by a breakdown? That would end up even more verbose
than the current approach, though.)

... Now that I'm thinking more about this, and reading some of your
later comments on this mail, I think it might be valuable to lean on the
sample patchset for complete code samples, especially if we figure a
good way to distribute the patchset near the tutorial (as Junio and I
are discussing in another branch of this thread). Then we can keep the
tutorial concise, but have the complete code available for those who
prefer to look there.

> 
> > I don't think I have a good reason to push back on this except that I
> > think "/* ... */" is ugly :)
> >
> > I'll go through and replace "..." with some actual hints about what's
> > supposed to go there; for example, here I'll replace with "/* print and
> > return */".
> 
> Seeing as my initial review comment was in error, I'm not sure that
>

Re: [PATCH] documentation: add tutorial for revision walking

2019-06-20 Thread Emily Shaffer
On Wed, Jun 19, 2019 at 08:17:29AM -0700, Junio C Hamano wrote:
> Emily Shaffer  writes:
> 
> > Maybe there's a case for storing them as a set of patch files that are
> > revision-controlled somewhere within Documentation/? There was some
> > discussion on the IRC a few weeks ago about trying to organize these
> > tutorials into their own directory to form a sort of "Git Contribution
> > 101" course, maybe it makes sense to store there?
> >
> >   Documentation/contributing/myfirstcontrib/MyFirstContrib.txt
> >   Documentation/contributing/myfirstcontrib/sample/*.patch
> >   Documentation/contributing/myfirstrevwalk/MyFirstRevWalk.txt
> >   Documentation/contributing/myfirstrevwalk/sample/*.patch
> >
> > I don't love the idea of maintaining text patches with the expectation
> > that they should cleanly apply always,...
> 
> Well, I actually think the above organization does match the intent
> of the "My first contribution codelab" perfectly.  When the codebase,
> the workflow used by the project, and/or the coding or documentation
> guideline gets updated, the text that documents how to contribute to
> the project as well as the sample patches must be updated to match
> the updated reality.
> 
> I agree with you that maintaining the *.patch files to always
> cleanly apply is less than ideal.  A topic to update the sample
> patches and tutorial text may be competing with another topic that
> updates the very API the tutorials are teaching, and the sample
> patches may not apply cleanly when two topics are merged together,
> even if the "update sample patches and tutorial text" topic does
> update them to match the API at the tip of the topic branch itself.
> One thing we _could_ do is to pin the target version of the codebase
> for the sake of tutorial.  IOW, the sample/*.patch may not apply
> cleanly to the version of the tree these patches were taken from,
> but would always apply cleanly to the most recent released version
> before the last update to the tutorial, or something like that.
> 
> Also having to review the patch to sample/*.patch files will be
> unpleasant.

I wonder if we can ease some pain for both of the above issues by
including some scripts to "inflate" the patch files into a topic branch,
or figure out some more easily-reviewed (but more complicated, I
suppose) method for sending updates to the sample/*.patch files.

Imagining workflows like this:

Doing the tutorial:
 - In worktree a/.
 - Run a magic script which creates a worktree with the sample code, b/.
 - Read through a/Documentation/MyFirstContribution.txt and generate
   a/builtins/psuh.c, referring to b/builtins/psuh.c if confused.

Rebasing the tutorial patches:
 - In worktree a/.
 - Run a magic script which checks out a new branch at the last known
   good base for the patchset, then applies all the patches.
 - Now faced with, likely, a topic branch based on v (where n is
   latest release).
 - `git rebase v -x (make && ./bin-wrappers/git psuh)`
 - Interactively fix conflicts
 - Run a script to generate a magic interdiff from the old version of
   patches
 - Mail out magic interdiff to list and get approval
 - (Maybe maintainer does this when interdiff is happy? Maybe updater
   does this when review looks good?) Run a magic script to regenerate
   patches from rebased branch, and note somewhere they are based on
   v
 - Mail sample/*.patch (based on v) to list (if maintainer rolled the
   patches after interdiff approval, this step can be skipped)

(This seems to still be a lot of steps, even with the magic script..)

Alternatively, for the same process:
 Updater: Run a magic script to create topic branch based on v
   (like before)
 U: `git rebase v -x (make && ./bin-wrappers/git psuh)`
 U: Interactively fix conflicts
 U: Run a script to turn topic branch back into sample/*.patch
 U: Send email with changes to sample/*.patch (this will be ugly and
unreadable) - message ID 
 Reviewer: Run a magic script, providing  argument, which grabs the
diff-of-.patch and generates an interdiff, or a topic branch based
on v
 R: Send comments explaining where issue is (tricky to find where to
inline in the diff-of-.patch)
 U: Reroll diff-of-.patch email
 R: Accepts
 Maintainer: Applies diff-of-.patch email normally

 I suppose for the first suggestion, there ends up being quite a lot of
 onus on the maintainer, and a lot of trust that there is no difference
 between the RFC easy-to-read interdiff patchset. For the second
 suggestion, there ends up being onus on the reviewers to run some
 magical script. Maybe we can split the difference by expecting Updater
 to provide the interdiff below the --- line? Maybe in practice the
 diff-of-.patch isn't so unreadable, if it's only minor changes needed
 to bring the tutorial up to latest?

 I'm not sure there's a way to make this totally painless using email
 tools.

  - Emily


Re: [PATCH] doc: improve usage string in MyFirstContribution

2019-06-20 Thread Emily Shaffer
On Thu, Jun 20, 2019 at 12:13:15AM +0200, Christian Couder wrote:
>  SYNOPSIS
>  
>  [verse]
> -'git-psuh'
> +'git-psuh ...'

It doesn't require 1 or more args - you can run it with no args. So it
might be better suited to state the args as optional:

  'git psuh [arg]...'

>  
>  static const char * const psuh_usage[] = {
> - N_("git psuh"),
> + N_("git psuh ..."),

...and here too.

 - Emily


[PATCH v2] documentation: add tutorial for revision walking

2019-06-26 Thread Emily Shaffer
Existing documentation on revision walks seems to be primarily intended
as a reference for those already familiar with the procedure. This
tutorial attempts to give an entry-level guide to a couple of bare-bones
revision walks so that new Git contributors can learn the concepts
without having to wade through options parsing or special casing.

The target audience is a Git contributor who is just getting started
with the concept of revision walking. The goal is to prepare this
contributor to be able to understand and modify existing commands which
perform revision walks more easily, although it will also prepare
contributors to create new commands which perform walks.

The tutorial covers a basic overview of the structs involved during
revision walk, setting up a basic commit walk, setting up a basic
all-object walk, and adding some configuration changes to both walk
types. It intentionally does not cover how to create new commands or
search for options from the command line or gitconfigs.

There is an associated patchset at
https://github.com/nasamuffin/git/tree/revwalk that contains a reference
implementation of the code generated by this tutorial.

Signed-off-by: Emily Shaffer 
Helped-by: Eric Sunshine 
---
Significant changes since r1 related to Eric and Junio's comments:
 - Formatting fixes (multiline comments, whitespaces, etc.)
 - Clarifications:
   - oneline options
   - `buf` arg in object walk callback
   - topo_order
   - placement of reverse setting
   - '-h' handling
 - Converted patchset to plumbing tool, using trace where appropriate
 - Removed unnecessary return values in walken_commit_walk and
   walken_object_walk
 - Added an entire section devoted to the `omitted` list
 - Tried to remove unnecessary explanatory comments in tutorial (and
   added more explanatory comments in sample code)

There's still the question of how the samples should be
source-controlled and kept fresh. I think we left it at "Maybe checking
in a stack of *.patch files would be tolerable", but it didn't sound
like anybody was quite happy with that solution.

 - Emily

 Documentation/Makefile   |   1 +
 Documentation/MyFirstRevWalk.txt | 910 +++
 2 files changed, 911 insertions(+)
 create mode 100644 Documentation/MyFirstRevWalk.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 76f2ecfc1b..91e5da67c4 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -78,6 +78,7 @@ SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += MyFirstContribution
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += MyFirstRevWalk
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt
new file mode 100644
index 00..ea2f16c4b1
--- /dev/null
+++ b/Documentation/MyFirstRevWalk.txt
@@ -0,0 +1,910 @@
+My First Revision Walk
+==
+
+== What's a Revision Walk?
+
+The revision walk is a key concept in Git - this is the process that underpins
+operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, 
the
+list of objects is found by walking parent relationships between objects. The
+revision walk can also be used to determine whether or not a given object is
+reachable from the current HEAD pointer.
+
+=== Related Reading
+
+- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
+  the revision walker in its various incarnations.
+- `Documentation/technical/api-revision-walking.txt`
+- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer 
Scientists]
+  gives a good overview of the types of objects in Git and what your revision
+  walk is really describing.
+
+== Setting Up
+
+Create a new branch from `master`.
+
+
+git checkout -b revwalk origin/master
+
+
+We'll put our fiddling into a new command. For fun, let's name it `git walken`.
+Open up a new file `builtin/walken.c` and set up the command handler:
+
+
+/*
+ * "git walken"
+ *
+ * Part of the "My First Revision Walk" tutorial.
+ */
+
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+   trace_printf(_("cmd_walken incoming...\n"));
+   return 0;
+}
+
+
+NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
+off at runtime. For the purposes of this tutorial, we will write `walken` as
+though it is intended for use as a "plumbing" command: that is, a command which
+is used primarily in scripts, rather than interactively by humans (a 
"porcelain"
+command). So we will send our debug output to `trace_printf()` instead. When
+running, enable trace output by setting the environment variable `GIT_TRACE`.
+
+Add usage text and `-h` handling, like all subcommands should consistently do
+(our test sui

[RFC PATCH v2 04/13] walken: add handler to git_config

2019-06-26 Thread Emily Shaffer
For now, we have no configuration options we want to set up for
ourselves, but in the future we may need to. At the very least, we
should invoke git_default_config() for each config option; we will do so
inside of a skeleton config callback so that we know where to add
configuration handling later on when we need it.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index daae4f811a..2474a0d7b2 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -5,6 +5,7 @@
  */
 
 #include "builtin.h"
+#include "config.h"
 #include "parse-options.h"
 
 /*
@@ -24,11 +25,36 @@ const char * const walken_usage[] = {
 static void init_walken_defaults(void)
 {
/*
-* We don't actually need the same components `git log` does; leave this
-* empty for now.
+* We don't use any other components or have settings to initialize, so
+* leave this empty.
 */
 }
 
+/*
+ * This method will be called back by git_config(). It is used to gather values
+ * from the configuration files available to Git.
+ *
+ * Each time git_config() finds a configuration file entry, it calls this
+ * callback. Then, this function should compare it to entries which concern us,
+ * and make settings changes as necessary.
+ *
+ * If we are called with a config setting we care about, we should use one of
+ * the helpers which exist in config.h to pull out the value for ourselves, 
i.e.
+ * git_config_string(...) or git_config_bool(...).
+ *
+ * If we don't match anything, we should pass it along to another stakeholder
+ * who may otherwise care - in log's case, grep, gpg, and diff-ui. For our 
case,
+ * we'll ignore everybody else.
+ */
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+   /*
+* For now, we don't have any custom configuration, so fall back on the
+* default config.
+*/
+   return git_default_config(var, value, cb);
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
struct option options[] = {
@@ -43,6 +69,8 @@ int cmd_walken(int argc, const char **argv, const char 
*prefix)
 
init_walken_defaults();
 
+   git_config(git_walken_config, NULL);
+
/*
 * This line is "human-readable" and we are writing a plumbing command,
 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 00/13] example implementation of revwalk tutorial

2019-06-26 Thread Emily Shaffer
Since r1, made some significant changes.

 - Added a commit for counting the 'omitted' list, to match the new
   section added to the tutorial.
 - Added significant comments to allow the sample to better stand on its
   own.
 - Fixed style issues (die() formatting, etc)
 - Distinguished between human- and machine-readable output with
   trace_printf() and printf(), to turn the command into plumbing.
 - More changes as mentioned in the tutorial patch.

Thanks.
 - Emily

Emily Shaffer (13):
  walken: add infrastructure for revwalk demo
  walken: add usage to enable -h
  walken: add placeholder to initialize defaults
  walken: add handler to git_config
  walken: configure rev_info and prepare for walk
  walken: perform our basic revision walk
  walken: filter for authors from gmail address
  walken: demonstrate various topographical sorts
  walken: demonstrate reversing a revision walk list
  walken: add unfiltered object walk from HEAD
  walken: add filtered object walk
  walken: count omitted objects
  walken: reverse the object walk order

 Makefile |   1 +
 builtin.h|   1 +
 builtin/walken.c | 290 +++
 git.c|   1 +
 4 files changed, 293 insertions(+)
 create mode 100644 builtin/walken.c

-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 01/13] walken: add infrastructure for revwalk demo

2019-06-26 Thread Emily Shaffer
Begin to add scaffolding for `git walken`, a toy command which we will
teach to perform a number of revision walks, in order to demonstrate the
mechanics of revision walking for developers new to the Git project.

This commit is the beginning of an educational series which correspond
to the tutorial in Documentation/MyFirstRevWalk.txt.

Signed-off-by: Emily Shaffer 
Change-Id: I64297621919412f54701e111366e99c4ef0feae3
---
 Makefile |  1 +
 builtin.h|  1 +
 builtin/walken.c | 13 +
 3 files changed, 15 insertions(+)
 create mode 100644 builtin/walken.c

diff --git a/Makefile b/Makefile
index f58bf14c7b..5bac1dbf8d 100644
--- a/Makefile
+++ b/Makefile
@@ -1137,6 +1137,7 @@ BUILTIN_OBJS += builtin/var.o
 BUILTIN_OBJS += builtin/verify-commit.o
 BUILTIN_OBJS += builtin/verify-pack.o
 BUILTIN_OBJS += builtin/verify-tag.o
+BUILTIN_OBJS += builtin/walken.o
 BUILTIN_OBJS += builtin/worktree.o
 BUILTIN_OBJS += builtin/write-tree.o
 
diff --git a/builtin.h b/builtin.h
index ec7e0954c4..c919736c36 100644
--- a/builtin.h
+++ b/builtin.h
@@ -242,6 +242,7 @@ int cmd_var(int argc, const char **argv, const char 
*prefix);
 int cmd_verify_commit(int argc, const char **argv, const char *prefix);
 int cmd_verify_tag(int argc, const char **argv, const char *prefix);
 int cmd_version(int argc, const char **argv, const char *prefix);
+int cmd_walken(int argc, const char **argv, const char *prefix);
 int cmd_whatchanged(int argc, const char **argv, const char *prefix);
 int cmd_worktree(int argc, const char **argv, const char *prefix);
 int cmd_write_tree(int argc, const char **argv, const char *prefix);
diff --git a/builtin/walken.c b/builtin/walken.c
new file mode 100644
index 00..d2772a0e8f
--- /dev/null
+++ b/builtin/walken.c
@@ -0,0 +1,13 @@
+/*
+ * "git walken"
+ *
+ * Part of the "My First Revision Walk" tutorial.
+ */
+
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+   trace_printf(_("cmd_walken incoming...\n"));
+   return 0;
+}
-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 02/13] walken: add usage to enable -h

2019-06-26 Thread Emily Shaffer
It's expected that Git commands support '-h' in order to provide a
consistent user experience (and this expectation is enforced by the
test suite). '-h' is captured by parse_options() by default; in order to
support this flag, we add a short usage text to walken.c and invoke
parse_options().

With this change, we can now add cmd_walken to the builtins set and
expect tests to pass, so we'll do so - cmd_walken is now open for
business.

Signed-off-by: Emily Shaffer 
Change-Id: I2919dc1efadb82acb335617ea24371c84b03bbce
---
 builtin/walken.c | 25 +
 git.c|  1 +
 2 files changed, 26 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index d2772a0e8f..9eea51ff70 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -5,9 +5,34 @@
  */
 
 #include "builtin.h"
+#include "parse-options.h"
+
+/*
+ * All builtins are expected to provide a usage to provide a consistent user
+ * experience.
+ */
+const char * const walken_usage[] = {
+   N_("git walken"),
+   NULL,
+};
 
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
+   struct option options[] = {
+   OPT_END()
+   };
+
+   /*
+* parse_options() handles showing usage if incorrect options are
+* provided, or if '-h' is passed.
+*/
+   argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
+   /*
+* This line is "human-readable" and we are writing a plumbing command,
+* so we localize it and use the trace library to print only when
+* the GIT_TRACE environment variable is set.
+*/
trace_printf(_("cmd_walken incoming...\n"));
return 0;
 }
diff --git a/git.c b/git.c
index c2eec470c9..2a7fb9714f 100644
--- a/git.c
+++ b/git.c
@@ -601,6 +601,7 @@ static struct cmd_struct commands[] = {
{ "verify-pack", cmd_verify_pack },
{ "verify-tag", cmd_verify_tag, RUN_SETUP },
{ "version", cmd_version },
+   { "walken", cmd_walken, RUN_SETUP },
{ "whatchanged", cmd_whatchanged, RUN_SETUP },
{ "worktree", cmd_worktree, RUN_SETUP | NO_PARSEOPT },
{ "write-tree", cmd_write_tree, RUN_SETUP },
-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 07/13] walken: filter for authors from gmail address

2019-06-26 Thread Emily Shaffer
In order to demonstrate how to create grep filters for revision walks,
filter the walk performed by cmd_walken() to print only commits which
are authored by someone with a gmail address.

This commit demonstrates how to append a grep pattern to a
rev_info.grep_filter, to teach new contributors how to create their own
more generalized grep filters during revision walks.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 335dcb6b21..da2d197914 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -11,6 +11,7 @@
 #include "parse-options.h"
 #include "pretty.h"
 #include "line-log.h"
+#include "grep.h"
 
 /*
  * All builtins are expected to provide a usage to provide a consistent user
@@ -28,10 +29,8 @@ const char * const walken_usage[] = {
  */
 static void init_walken_defaults(void)
 {
-   /*
-* We don't use any other components or have settings to initialize, so
-* leave this empty.
-*/
+   /* Needed by our grep filter. */
+   init_grep_defaults(the_repository);
 }
 
 /*
@@ -60,6 +59,10 @@ static void final_rev_info_setup(int argc, const char 
**argv, const char *prefix
 * setup_revisions(argc, argv, rev, &opt);
 */
 
+   /* Add a grep pattern to the author line in the header. */
+   append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, 
"gmail");
+   compile_grep_patterns(&rev->grep_filter);
+
/* Let's force oneline format. */
get_commit_format("oneline", rev);
rev->verbose_header = 1;
@@ -86,10 +89,7 @@ static void final_rev_info_setup(int argc, const char 
**argv, const char *prefix
  */
 static int git_walken_config(const char *var, const char *value, void *cb)
 {
-   /*
-* For now, we don't have any custom configuration, so fall back on the
-* default config.
-*/
+   grep_config(var, value, cb);
return git_default_config(var, value, cb);
 }
 
-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 03/13] walken: add placeholder to initialize defaults

2019-06-26 Thread Emily Shaffer
Eventually, we will want a good place to initialize default variables
for use during our revision walk(s) in `git walken`. For now, there's
nothing to do here, but let's add the scaffolding so that it's easy to
tell where to put the setup later on.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 9eea51ff70..daae4f811a 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -16,6 +16,19 @@ const char * const walken_usage[] = {
NULL,
 };
 
+/*
+ * Within init_walken_defaults() we can call into other useful defaults to set
+ * in the global scope or on the_repository. It's okay to borrow from other
+ * functions which are doing something relatively similar to yours.
+ */
+static void init_walken_defaults(void)
+{
+   /*
+* We don't actually need the same components `git log` does; leave this
+* empty for now.
+*/
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
struct option options[] = {
@@ -28,6 +41,8 @@ int cmd_walken(int argc, const char **argv, const char 
*prefix)
 */
argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
 
+   init_walken_defaults();
+
/*
 * This line is "human-readable" and we are writing a plumbing command,
 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 12/13] walken: count omitted objects

2019-06-26 Thread Emily Shaffer
It may be illuminating to see which objects were not included within a
given filter. This also demonstrates, since filter-spec "tree:1" is
used, that the 'omitted' list contains all objects which are omitted,
not just the first objects which were omitted - that is, it continues to
dereference omitted trees and commits.

This is part of a tutorial on performing revision walks.

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index a744d042d8..dc59ff5009 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -45,7 +45,7 @@ static void init_walken_defaults(void)
  * mirror those settings in post_repo_init_init.
  */
 static void final_rev_info_setup(int argc, const char **argv, const char 
*prefix,
-   struct rev_info *rev)
+struct rev_info *rev)
 {
/*
 * Optional:
@@ -145,6 +145,11 @@ static void walken_show_object(struct object *obj, const 
char *str, void *buf)
 static void walken_object_walk(struct rev_info *rev)
 {
struct list_objects_filter_options filter_options = {};
+   struct oidset omitted;
+   struct oidset_iter oit;
+   struct object_id *oid = NULL;
+   int omitted_count = 0;
+   oidset_init(&omitted, 0);
 
printf("walken_object_walk beginning...\n");
 
@@ -181,13 +186,19 @@ static void walken_object_walk(struct rev_info *rev)
walken_show_commit, walken_show_object, NULL, NULL);
}
 
+   /* Count the omitted objects. */
+   oidset_iter_init(&omitted, &oit);
+
+   while ((oid = oidset_iter_next(&oit)))
+   omitted_count++;
+
/*
 * This print statement is designed to be script-parseable. Script
 * authors will rely on the output not to change, so we will not
 * localize this string. It will go to stdout directly.
 */
-   printf("commits %d\n blobs %d\n tags %d\n trees %d\n", commit_count,
-  blob_count, tag_count, tree_count);
+   printf("commits %d\n blobs %d\n tags %d\n trees %d omitted %d\n",
+  commit_count, blob_count, tag_count, tree_count, omitted_count);
 }
 
 /*
-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 08/13] walken: demonstrate various topographical sorts

2019-06-26 Thread Emily Shaffer
Order the revision walk by author or commit dates, to demonstrate how to
apply topo_sort to a revision walk.

While following the tutorial, new contributors are guided to run a walk
with each sort and compare the results.

Signed-off-by: Emily Shaffer 
Change-Id: I7ce2f3e8a77c42001293637ae209087afec4ce2c
---
 builtin/walken.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index da2d197914..6cc451324a 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -69,6 +69,13 @@ static void final_rev_info_setup(int argc, const char 
**argv, const char *prefix
 
/* add the HEAD to pending so we can start */
add_head_to_pending(rev);
+   
+   /* Let's play with the sort order. */
+   rev->topo_order = 1;
+
+   /* Toggle between these and observe the difference. */
+   rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+   /* rev->sort_order = REV_SORT_BY_AUTHOR_DATE; */
 }
 
 /*
-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 10/13] walken: add unfiltered object walk from HEAD

2019-06-26 Thread Emily Shaffer
Provide a demonstration of a revision walk which traverses all types of
object, not just commits. This type of revision walk is used for
operations such as creating packfiles and performing fetches or clones,
so it's useful to teach new developers how it works. For starters, only
demonstrate the unfiltered version, as this will make the tutorial
easier to follow.

This commit is part of a tutorial on revision walking.

Signed-off-by: Emily Shaffer 
Change-Id: If3b11652ba011b28d29b1c3984dac4a3f80a5f53
---
 builtin/walken.c | 88 +++-
 1 file changed, 79 insertions(+), 9 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 958923c172..42b23ba1ec 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -11,6 +11,7 @@
 #include "parse-options.h"
 #include "pretty.h"
 #include "line-log.h"
+#include "list-objects.h"
 #include "grep.h"
 
 /*
@@ -22,6 +23,11 @@ const char * const walken_usage[] = {
NULL,
 };
 
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+
 /*
  * Within init_walken_defaults() we can call into other useful defaults to set
  * in the global scope or on the_repository. It's okay to borrow from other
@@ -103,6 +109,65 @@ static int git_walken_config(const char *var, const char 
*value, void *cb)
return git_default_config(var, value, cb);
 }
 
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+   commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+   switch (obj->type) {
+   case OBJ_TREE:
+   tree_count++;
+   break;
+   case OBJ_BLOB:
+   blob_count++;
+   break;
+   case OBJ_TAG:
+   tag_count++;
+   break;
+   case OBJ_COMMIT:
+   die(_("unexpectedly encountered a commit in "
+ "walken_show_object\n"));
+   commit_count++;
+   break;
+   default:
+   die(_("unexpected object type %s\n"), type_name(obj->type));
+   break;
+   }
+}
+
+/*
+ * walken_object_walk() is invoked by cmd_walken() after initialization. It 
does
+ * a walk of all object types.
+ */
+static void walken_object_walk(struct rev_info *rev)
+{
+   rev->tree_objects = 1;
+   rev->blob_objects = 1;
+   rev->tag_objects = 1;
+   rev->tree_blobs_in_commit_order = 1;
+   rev->exclude_promisor_objects = 1;
+
+   if (prepare_revision_walk(rev))
+   die(_("revision walk setup failed"));
+
+   commit_count = 0;
+   tag_count = 0;
+   blob_count = 0;
+   tree_count = 0;
+
+   traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+   /*
+* This print statement is designed to be script-parseable. Script
+* authors will rely on the output not to change, so we will not
+* localize this string. It will go to stdout directly.
+*/
+   printf("commits %d\n blobs %d\n tags %d\n trees %d\n", commit_count,
+  blob_count, tag_count, tree_count);
+}
+
 /*
  * walken_commit_walk() is invoked by cmd_walken() after initialization. It
  * does the commit walk only.
@@ -113,15 +178,15 @@ static void walken_commit_walk(struct rev_info *rev)
struct strbuf prettybuf = STRBUF_INIT;
 
/*
- * prepare_revision_walk() gets the final steps ready for a revision
+* prepare_revision_walk() gets the final steps ready for a revision
 * walk. We check the return value for errors.
- */
+*/
if (prepare_revision_walk(rev)) {
die(_("revision walk setup failed"));
}
 
/*
- * Now we can start the real commit walk. get_revision grabs the next
+* Now we can start the real commit walk. get_revision grabs the next
 * revision based on the contents of rev.
 */
rev->diffopt.close_file = 0;
@@ -166,13 +231,18 @@ int cmd_walken(int argc, const char **argv, const char 
*prefix)
/* We can set our traversal flags here. */
rev.always_show_header = 1;
 
-   /*
-* Before we do the walk, we need to set a starting point by giving it
-* something to go in `pending` - that happens in here
-*/
-   final_rev_info_setup(argc, argv, prefix, &rev);
 
-   walken_commit_walk(&rev);
+   if (1) {
+   add_head_to_pending(&rev);
+   walken_object_walk(&rev);
+   } else {
+   /*
+* Before we do the walk, we need to set a starting point by 
giving it
+* something to go in `pending` - that happens in here
+*/
+   final_rev_info_setup(argc, argv, prefix, &r

[RFC PATCH v2 13/13] walken: reverse the object walk order

2019-06-26 Thread Emily Shaffer
Demonstrate that just like commit walks, object walks can have their
order reversed. Additionally, add verbose logging of objects encountered
in order to let contributors prove to themselves that the walk has
actually been reversed. With this commit, `git walken` becomes extremely
chatty - it's recommended to pipe the output through `head` or `tail` or
to redirect it into a file.

Signed-off-by: Emily Shaffer 
Change-Id: I91883b209a61ae4d87855878291e487fe36220c4
---
 builtin/walken.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index dc59ff5009..37b02887a5 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -112,11 +112,13 @@ static int git_walken_config(const char *var, const char 
*value, void *cb)
 
 static void walken_show_commit(struct commit *cmt, void *buf)
 {
+   printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
commit_count++;
 }
 
 static void walken_show_object(struct object *obj, const char *str, void *buf)
 {
+   printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
switch (obj->type) {
case OBJ_TREE:
tree_count++;
@@ -158,6 +160,7 @@ static void walken_object_walk(struct rev_info *rev)
rev->tag_objects = 1;
rev->tree_blobs_in_commit_order = 1;
rev->exclude_promisor_objects = 1;
+   rev->reverse = 1;
 
if (prepare_revision_walk(rev))
die(_("revision walk setup failed"));
-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 11/13] walken: add filtered object walk

2019-06-26 Thread Emily Shaffer
Demonstrate how filter specs can be used when performing a revision walk
of all object types. In this case, tree depth is used. Contributors who
are following the revision walking tutorial will be encouraged to run
the revision walk with and without the filter in order to compare the
number of objects seen in each case.

Signed-off-by: Emily Shaffer 
Change-Id: I6d22ba153c1afbc780c261c47f1fa03ea478b5ed
---
 builtin/walken.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 42b23ba1ec..a744d042d8 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -12,6 +12,7 @@
 #include "pretty.h"
 #include "line-log.h"
 #include "list-objects.h"
+#include "list-objects-filter-options.h"
 #include "grep.h"
 
 /*
@@ -143,6 +144,10 @@ static void walken_show_object(struct object *obj, const 
char *str, void *buf)
  */
 static void walken_object_walk(struct rev_info *rev)
 {
+   struct list_objects_filter_options filter_options = {};
+
+   printf("walken_object_walk beginning...\n");
+
rev->tree_objects = 1;
rev->blob_objects = 1;
rev->tag_objects = 1;
@@ -157,7 +162,24 @@ static void walken_object_walk(struct rev_info *rev)
blob_count = 0;
tree_count = 0;
 
-   traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+   if (1) {
+   /* Unfiltered: */
+   trace_printf(_("Unfiltered object walk.\n"));
+   traverse_commit_list(rev, walken_show_commit,
+   walken_show_object, NULL);
+   } else {
+   trace_printf(_("Filtered object walk with filterspec "
+   "'tree:1'.\n"));
+   /*
+* We can parse a tree depth of 1 to demonstrate the kind of
+* filtering that could occur during various operations (see
+* `git help rev-list` and read the entry on `--filter`).
+*/
+   parse_list_objects_filter(&filter_options, "tree:1");
+
+   traverse_commit_list_filtered(&filter_options, rev,
+   walken_show_commit, walken_show_object, NULL, NULL);
+   }
 
/*
 * This print statement is designed to be script-parseable. Script
-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 05/13] walken: configure rev_info and prepare for walk

2019-06-26 Thread Emily Shaffer
`struct rev_info` is what's used by the struct itself.
`repo_init_revisions()` initializes the struct; then we need to set it
up for the walk we want to perform, which is done in
`final_rev_info_setup()`.

The most important step here is adding the first object we want to walk
to the pending array. Here, we take the easy road and use
`add_head_to_pending()`; there is also a way to do it with
`setup_revision_opt()` and `setup_revisions()` which we demonstrate but
do not use. If we were to forget this step, the walk would do nothing -
the pending queue would be checked, determined to be empty, and the walk
would terminate immediately.

Signed-off-by: Emily Shaffer 
Change-Id: I76754b740227cf17a449f3f536dbbe37031e6f9a
---
 builtin/walken.c | 50 
 1 file changed, 50 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 2474a0d7b2..c463eca843 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -5,6 +5,7 @@
  */
 
 #include "builtin.h"
+#include "revision.h"
 #include "config.h"
 #include "parse-options.h"
 
@@ -30,6 +31,40 @@ static void init_walken_defaults(void)
 */
 }
 
+/*
+ * cmd_log calls a second set of init after the repo_init_revisions call. We'll
+ * mirror those settings in post_repo_init_init.
+ */
+static void final_rev_info_setup(int argc, const char **argv, const char 
*prefix,
+   struct rev_info *rev)
+{
+   /*
+* Optional:
+* setup_revision_opt is used to pass options to the setup_revisions()
+* call. It's got some special items for submodules and other types of
+* optimizations, but for now, we'll just point it to HEAD and call it
+* good. First we should make sure to reset it. This is useful for more
+* complicated stuff but a decent shortcut for the first pass is
+* add_head_to_pending().
+*/
+
+   /*
+* struct setup_revision_opt opt;
+
+* memset(&opt, 0, sizeof(opt));
+* opt.def = "HEAD";
+* opt.revarg_opt = REVARG_COMMITTISH;
+* setup_revisions(argc, argv, rev, &opt);
+*/
+
+   /* Let's force oneline format. */
+   get_commit_format("oneline", rev);
+   rev->verbose_header = 1;
+
+   /* add the HEAD to pending so we can start */
+   add_head_to_pending(rev);
+}
+
 /*
  * This method will be called back by git_config(). It is used to gather values
  * from the configuration files available to Git.
@@ -61,6 +96,8 @@ int cmd_walken(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
 
+   struct rev_info rev;
+
/*
 * parse_options() handles showing usage if incorrect options are
 * provided, or if '-h' is passed.
@@ -71,6 +108,19 @@ int cmd_walken(int argc, const char **argv, const char 
*prefix)
 
git_config(git_walken_config, NULL);
 
+   /*
+* Time to set up the walk. repo_init_revisions sets up rev_info with
+* the defaults, but then you need to make some configuration settings
+* to make it do what's special about your walk.
+*/
+   repo_init_revisions(the_repository, &rev, prefix);
+
+   /*
+* Before we do the walk, we need to set a starting point by giving it
+* something to go in `pending` - that happens in here
+*/
+   final_rev_info_setup(argc, argv, prefix, &rev);
+
/*
 * This line is "human-readable" and we are writing a plumbing command,
 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 06/13] walken: perform our basic revision walk

2019-06-26 Thread Emily Shaffer
Add the final steps needed and implement the walk loop itself. We add a
method walken_commit_walk() which performs the final setup to revision.c
and then iterates over commits from get_revision().

This basic walk only prints the subject line of each commit in the
history. It is nearly equivalent to `git log --oneline`.

Signed-off-by: Emily Shaffer 
Change-Id: If6dc5f3c9d14df077b99e42806cf790c96191582
---
 builtin/walken.c | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index c463eca843..335dcb6b21 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -6,8 +6,11 @@
 
 #include "builtin.h"
 #include "revision.h"
+#include "commit.h"
 #include "config.h"
 #include "parse-options.h"
+#include "pretty.h"
+#include "line-log.h"
 
 /*
  * All builtins are expected to provide a usage to provide a consistent user
@@ -90,6 +93,41 @@ static int git_walken_config(const char *var, const char 
*value, void *cb)
return git_default_config(var, value, cb);
 }
 
+/*
+ * walken_commit_walk() is invoked by cmd_walken() after initialization. It
+ * does the commit walk only.
+ */
+static void walken_commit_walk(struct rev_info *rev)
+{
+   struct commit *commit;
+   struct strbuf prettybuf = STRBUF_INIT;
+
+   /*
+ * prepare_revision_walk() gets the final steps ready for a revision
+* walk. We check the return value for errors.
+ */
+   if (prepare_revision_walk(rev)) {
+   die(_("revision walk setup failed"));
+   }
+
+   /*
+ * Now we can start the real commit walk. get_revision grabs the next
+* revision based on the contents of rev.
+*/
+   rev->diffopt.close_file = 0;
+   while ((commit = get_revision(rev))) {
+   if (!commit)
+   continue;
+   strbuf_reset(&prettybuf);
+   pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+   /*
+* We expect this part of the output to be machine-parseable -
+* one commit message per line - so we must not localize it.
+*/
+   puts(prettybuf.buf);
+   }
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
struct option options[] = {
@@ -115,12 +153,17 @@ int cmd_walken(int argc, const char **argv, const char 
*prefix)
 */
repo_init_revisions(the_repository, &rev, prefix);
 
+   /* We can set our traversal flags here. */
+   rev.always_show_header = 1;
+
/*
 * Before we do the walk, we need to set a starting point by giving it
 * something to go in `pending` - that happens in here
 */
final_rev_info_setup(argc, argv, prefix, &rev);
 
+   walken_commit_walk(&rev);
+
/*
 * This line is "human-readable" and we are writing a plumbing command,
 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog



[RFC PATCH v2 09/13] walken: demonstrate reversing a revision walk list

2019-06-26 Thread Emily Shaffer
The final installment in the tutorial about sorting revision walk
outputs. This commit reverses the commit list, so that we see newer
commits last (handy since we aren't using a pager).

It's important to note that rev->reverse needs to be set after
add_head_to_pending() or before setup_revisions(). (This is mentioned in
the accompanying tutorial.)

Signed-off-by: Emily Shaffer 
---
 builtin/walken.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 6cc451324a..958923c172 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -69,6 +69,9 @@ static void final_rev_info_setup(int argc, const char **argv, 
const char *prefix
 
/* add the HEAD to pending so we can start */
add_head_to_pending(rev);
+
+   /* Reverse the order */
+   rev->reverse = 1;

/* Let's play with the sort order. */
rev->topo_order = 1;
-- 
2.22.0.410.gd8fdbe21b5-goog



  1   2   >