Re: crash on git diff-tree -Ganything for new files with textconv filter

Jeff King Tue, 30 Oct 2012 05:18:56 -0700

On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote:

> On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:
> 
> > The patch below fixes it, but it's terribly inefficient (it just detects
> > the situation and reallocates). It would be much better to disable the
> > reuse_worktree_file mmap when we populate the filespec, but it is too
> > late to pass an option; we may have already populated from an earlier
> > diffcore stage.
> > 
> > I guess if we teach the whole diff code that "-G" (and --pickaxe-regex)
> > is brittle, we can disable the optimization from the beginning based on
> > the diff options. I'll take a look.
> 
> Hmm. That is problematic for two reasons.
> 
>   1. The whole diff call chain will have to be modified to pass the
>      options around, so they can make it down to the
>      diff_populate_filespec level. Alternatively, we could do some kind
>      of global hack, which is ugly but would work OK in practice.
> 
>   2. Reusing a working tree file is only half of the reason a filespec
>      might be mmap'd. It might also be because we are literally diffing
>      the working tree. "-G" was meant to be used to limit log traversal,
>      but it also works to reduce the diff output for something like "git
>      diff HEAD^".
> 
> I really wish there were an alternate regexec interface we could use
> that took a pointer/size pair. Bleh.


Thinking on it more, my patch, hacky though it seems, may not be the
worst solution. Here are the options that I see:

  1. Use a regex library that does not require NUL termination. If we
     are bound by the regular regexec interface, this is not feasible.
     But the GNU implementation works on arbitrary-length buffers (you
     just have to use a slightly different interface), and we already
     carry it in compat. It would mean platforms which provide a working
     but non-GNU regexec would have to start defining NO_REGEX.

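  2. Figure out a way to get one extra zero byte via mmap. If the
     requested size does not fall on a page boundary, the rest of the
     last page is zero-filled, so you get the extra zero bytes for free.
     Unfortunately, when the size is an exact multiple of the page size,
     requesting an extra byte does not do what we want; that byte lands
     on a page past the end of the file, and you get SIGBUS accessing it.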

  3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That
     way we only incur the cost when we need it.

  4. Avoid mmap-ing in the first place when we are using -G or
     --pickaxe-regex (e.g., by doing a big read()). At first glance,
     this sounds more efficient than loading the data one way and then
     making another copy. But mmap+memcpy, aside from the momentary
     doubled memory requirement, is probably just as fast as, or faster
     than, calling read() repeatedly.
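To make the page arithmetic behind (2) concrete, here is a minimal C
sketch assuming POSIX mmap semantics (the part of the last mapped page
beyond end-of-file reads as zero). The helper name mmap_has_zero_tail
is hypothetical, not something in git. It shows that the "free" NUL
exists only when the size is not a multiple of the page size:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of git): report whether an mmap of a
 * file that is `size` bytes long is followed by at least one zero byte
 * inside the same mapping.
 *
 * Assumes POSIX semantics: the part of the last page beyond end-of-file
 * is zero-filled. If `size` is an exact multiple of `page_size`, there
 * is no such tail: the byte after the data is outside the mapping, and
 * mapping one extra byte would put it on a page past end-of-file, where
 * an access raises SIGBUS.
 */
static bool mmap_has_zero_tail(size_t size, size_t page_size)
{
	return size % page_size != 0;
}
```

In practice page_size would come from sysconf(_SC_PAGESIZE); with
4096-byte pages, a file of exactly 4096 (or 8192, ...) bytes gets no
trailing NUL, which is why (2) cannot be relied on.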

I am really tempted by (1).

Given that (2) does not work (unless somebody comes up with something
clever there), that would make (3) the next best choice.
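For illustration, here is a minimal C sketch of (3), assuming only the
POSIX <regex.h> interface. regexec_sized is a hypothetical name, not
git's API: it takes a pointer/size pair, copies the bytes into a
temporary NUL-terminated buffer, and runs regexec() on the copy, so the
copy is paid for only where a regex is actually matched:

```c
#include <regex.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): run regexec() over `size` bytes
 * at `buf`, which need not be NUL-terminated (e.g. an mmap'd file).
 * The bytes are copied into a temporary buffer with a terminating NUL,
 * so regexec() never reads past the end of the original data.
 *
 * Returns regexec()'s result, or REG_ESPACE if the copy cannot be made.
 */
static int regexec_sized(const regex_t *re, const char *buf, size_t size,
			 size_t nmatch, regmatch_t pmatch[], int eflags)
{
	char *copy;
	int ret;

	if (size == SIZE_MAX)
		return REG_ESPACE;	/* size + 1 would overflow */
	copy = malloc(size + 1);
	if (!copy)
		return REG_ESPACE;
	if (size)
		memcpy(copy, buf, size);
	copy[size] = '\0';		/* the terminator regexec() relies on */

	ret = regexec(re, copy, nmatch, pmatch, eflags);
	free(copy);
	return ret;
}
```

Offsets reported in pmatch are relative to the start of the data, so
they apply to the original buffer as well. Like plain regexec(),
matching still stops at any NUL embedded in the data.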

-Peff