On Thu, Apr 14, 2016 at 11:07:09AM -0700, Junio C Hamano wrote:

> Even though a Git commit object is designed to be capable of storing
> any binary data as its payload, in practice people use it to describe
> the changes in textual form, and tools like "git log" are designed to
> treat the payload as text.
> 
> Detect and warn when we see any commit object with a NUL byte in
> it.
> 
> Note that a NUL byte in the header part is already detected as a
> grave error.  This change is purely about the message part.
> 
> Signed-off-by: Junio C Hamano <gits...@pobox.com>

Thanks, I was just reading over some of the old threads, and wondering
if it was time to resurrect this idea.

> @@ -610,6 +611,7 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
>       struct commit_graft *graft;
>       unsigned parent_count, parent_line_count = 0, author_count;
>       int err;
> +     const char *buffer_begin = buffer;
>  
>       if (verify_headers(buffer, size, &commit->object, options))
>               return -1;

You need this "buffer_begin" because we move the "buffer" pointer
forward as we parse. But perhaps whole-buffer checks should simply go at
the top (next to verify_headers) before we start advancing the pointer.
To me, that makes the function's flow more natural.

But alternatively...

> @@ -671,6 +673,12 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
>               if (err)
>                       return err;
>       }
> +     if (memchr(buffer_begin, '\0', size)) {
> +             err = report(options, &commit->object, FSCK_MSG_NUL_IN_COMMIT,
> +                          "NUL byte in the commit object body");
> +             if (err)
> +                     return err;
> +     }

Here we've parsed to the end of the headers we know about. We know
there's no NUL there, because verify_headers() would have complained,
and so would the individual header parsers. So I actually think we
could check from "buffer" (though we'd still need to record the
beginning of the buffer in order to adjust "size" appropriately).

That's a little more efficient (we don't have to memchr over the same
bytes again). But I'd worry a little that doing it that way would
introduce coupling between this check and verify_headers() (so that if
the latter ever changes, our check may start missing cases).
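
A minimal sketch of that second approach, to make the "size"
adjustment concrete. The helper name has_nul_after() and its
parameters are hypothetical and not part of fsck.c; it assumes the
caller kept a pointer to the start of the buffer before the header
parsers advanced "buffer".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of fsck.c: look for a NUL in the part
 * of the object that has not been parsed yet.  "begin" is the start of
 * the buffer, "cur" is how far the header parsers have advanced, and
 * "size" is the length of the whole buffer.
 */
static int has_nul_after(const char *begin, const char *cur, size_t size)
{
	size_t consumed;

	assert(cur >= begin);
	consumed = (size_t)(cur - begin);
	assert(consumed <= size);

	/* Only the unparsed tail is scanned; the headers are trusted. */
	return memchr(cur, '\0', size - consumed) != NULL;
}
```

Note that a NUL before "cur" goes unnoticed by this helper, which is
exactly the coupling described above: it is only safe as long as
something else has already vetted the header bytes.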

So yet another alternative would be to include this check in
verify_headers(). It would parse to the end of the headers as now, and
then from there additionally look for a NUL in the body.

Of the three approaches, I think I like that third one. It's the most
efficient, and I think the flow is pretty clear. We'd probably want to
rename verify_headers(), though. :)
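
Roughly, the third approach could look like the sketch below. It is a
standalone illustration, not the actual verify_headers(): the function
name scan_object() and the enum are made up for the example, it
returns a result code instead of calling report(), and it treats any
buffer without a blank line as unterminated, which is a
simplification.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, for illustration only. */
enum scan_result {
	SCAN_OK,
	SCAN_NUL_IN_HEADER,
	SCAN_NUL_IN_BODY,
	SCAN_UNTERMINATED_HEADER
};

/*
 * Walk the headers one byte at a time; once the blank line that ends
 * them is found, scan the remaining bytes (the body) for a NUL.  Each
 * byte of the object is looked at only once.
 */
static enum scan_result scan_object(const char *buf, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++) {
		if (buf[i] == '\0')
			return SCAN_NUL_IN_HEADER;
		if (buf[i] == '\n' && i + 1 < size && buf[i + 1] == '\n') {
			/* Headers end here; the body starts after "\n\n". */
			const char *body = buf + i + 2;
			size_t body_len = size - (i + 2);

			return memchr(body, '\0', body_len) ?
				SCAN_NUL_IN_BODY : SCAN_OK;
		}
	}
	return SCAN_UNTERMINATED_HEADER;
}
```

Keeping the two kinds of NUL as distinct results would let the caller
keep the header case a grave error while only warning about the body
case.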

-Peff