On Wed, May 15, 2019 at 08:11:52PM -0700, Jonathan Nieder wrote:
> Hi,
>
> Emily Shaffer wrote:
>
> > grep_buffer creates a struct grep_source gs and calls grep_source()
> > with it. However, gs.name is null, which eventually produces a
> > segmentation fault in
> > grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
> > not set.
>
> Thanks for catching it. Taking a step back, I think the problem is in
> the definition of "struct grep_source":
>
> 	struct grep_source {
> 		char *name;
>
> 		enum grep_source_type {
> 			GREP_SOURCE_OID,
> 			GREP_SOURCE_FILE,
> 			GREP_SOURCE_BUF,
> 		} type;
> 		void *identifier;
>
> 		...
> 	};
>
> What is the difference between a 'name' and an 'identifier'? Who is
> responsible for free()ing them? Can they be NULL? This is pretty
> underdocumented for a public type.
>
> If we take the point of view that 'name' should always be non-NULL,
> then I wonder:
>
> - can we document that?
> - can grep_source_init enforce that?

Today grep_source_init() stores whatever 'name' it is given, and a NULL
'name' stays NULL, since xstrdup_or_null() passes NULL through. So if we
decide that 'name' must be non-NULL, we will be changing the existing
intent somewhat:

	void grep_source_init(struct grep_source *gs, enum grep_source_type type,
			      const char *name, const char *path,
			      const void *identifier)
	{
		gs->type = type;
		gs->name = xstrdup_or_null(name);
		...

> - can we take advantage of that in grep_source as well, as a sanity
> check that the grep_source has been initialized?
> - while we're here, can we describe what the field is used for
> (prefixing output with context before a ":", I believe)?

In general the documentation for grep.[ch] is pretty light. There aren't
any header comments and `Documentation/technical/api-grep.txt` is a
todo. So I agree that we should document it anywhere we can.

> > Jonathan Nieder proposed alternatively adding some check to grep_source()
> > to ensure that if opt->status_only is unset, gs->name must be non-NULL
> > (and yell about it if not), as well as some extra comments indicating
> > what assumptions are made about the data coming into functions like
> > grep_source(). I'm fine with that as well (although I'm not sure it
> > makes sense semantically to require a name which the user probably can't
> > easily set, or else ban the user from printing LOC during grep). Mostly
> > I'm happy with any solution besides a segfault with no error logging :)
>
> Let's compare the two possibilities.
>
> The advantage of "(in memory)" is that it Just Works, which should
> make a nicer development experience with getting a new caller mostly
> working on the way to getting them working just the way you want.
>
> The disadvantage is that if we start outputting that in production, we
> and static analyzers are less likely to notice. In other words,
> printing "(in memory)" is leaking details to the end user that do not
> match what the end user asked for. NULL would instead produce a
> crash, prompting the author of the caller to fix it.
>
> What was particularly pernicious about this example is that the NULL
> dereference only occurs if the grep has a match. So I suppose I'm
> leaning toward (in addition to adding some comments to document the
> struct) adding a check like
>
> 	if (!gs->name && !opt->status_only)
> 		BUG("grep calls that could print name require name");
>
> to grep_source.

Why not both? :)

But seriously, I am planning to send a second patch that does both, per
Junio's reply: keep the "(in memory)" placeholder name and add the BUG()
check to grep_source().

I'll consider the documentation out of scope for now, since I'm not sure
I know enough about the intent or history of grep.[ch] to document
anything yet. :)
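
To make "both" concrete, here is a rough standalone sketch of the two ideas side by side. It is not the actual patch and does not use git's real types: `sketch_source`, `sketch_opt`, `dup_or_null()`, `sketch_buffer_source_init()`, and `sketch_name_ok()` are all hypothetical stand-ins, and placing the "(in memory)" fallback in the grep_buffer()-style initializer is an assumption. In git itself the failing branch would call BUG() rather than return 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct grep_source: only the field discussed. */
struct sketch_source {
	char *name;	/* label printed before ":" in output; may be NULL */
};

/* Hypothetical stand-in for struct grep_opt. */
struct sketch_opt {
	int status_only;	/* nonzero: nothing is printed, so no name is needed */
};

/* Mimics xstrdup_or_null(): copy the string, or pass NULL through. */
static char *dup_or_null(const char *s)
{
	char *copy;

	if (!s)
		return NULL;
	copy = malloc(strlen(s) + 1);
	assert(copy != NULL);
	strcpy(copy, s);
	return copy;
}

/* Like grep_source_init() today: a NULL name stays NULL. */
static void sketch_source_init(struct sketch_source *gs, const char *name)
{
	gs->name = dup_or_null(name);
}

/* Idea 1 (assumed placement): the buffer path supplies a placeholder name. */
static void sketch_buffer_source_init(struct sketch_source *gs, const char *name)
{
	sketch_source_init(gs, name ? name : "(in memory)");
}

/*
 * Idea 2: the sanity check. Returns 1 when it is safe to print matches,
 * 0 when a name would be needed but is missing. The real check would
 * call BUG() instead of returning 0.
 */
static int sketch_name_ok(const struct sketch_source *gs,
			  const struct sketch_opt *opt)
{
	return gs->name != NULL || opt->status_only;
}
```

With both in place, a caller that goes through the buffer path always gets a printable name, while a caller that initializes a source by hand with a NULL name is caught up front instead of only when a match happens to be printed.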
>
> That would also sidestep the question of whether this debugging aid
> should be translated. :)
>
> Sensible?
>
> Thanks,
> Jonathan