On Mon, Oct 29, 2012 at 06:47:05PM -0400, Jeff King wrote:
> On Mon, Oct 29, 2012 at 06:35:21PM -0400, Jeff King wrote:
>
> > The patch below fixes it, but it's terribly inefficient (it just detects
> > the situation and reallocates). It would be much better to disable the
> > reuse_worktree_file mmap when we populate the filespec, but it is too
> > late to pass an option; we may have already populated from an earlier
> > diffcore stage.
> >
> > I guess if we teach the whole diff code that "-G" (and --pickaxe-regex)
> > is brittle, we can disable the optimization from the beginning based on
> > the diff options. I'll take a look.
>
> Hmm. That is problematic for two reasons.
>
>   1. The whole diff call chain will have to be modified to pass the
>      options around, so they can make it down to the
>      diff_populate_filespec level. Alternatively, we could do some kind
>      of global hack, which is ugly but would work OK in practice.
>
>   2. Reusing a working tree file is only half of the reason a filespec
>      might be mmap'd. It might also be because we are literally diffing
>      the working tree. "-G" was meant to be used to limit log traversal,
>      but it also works to reduce the diff output for something like "git
>      diff HEAD^".
>
> I really wish there were an alternate regexec interface we could use
> that took a pointer/size pair. Bleh.

Thinking on it more, my patch, hacky though it seems, may not be the
worst solution. Here are the options that I see:

  1. Use a regex library that does not require NUL termination. If we
     are bound by the regular regexec interface, this is not feasible.
     But the GNU implementation works on arbitrary-length buffers (you
     just have to use a slightly different interface), and we already
     carry it in compat. It would mean platforms which provide a working
     but non-GNU regexec would have to start defining NO_REGEX.

  2. Figure out a way to get one extra zero byte via mmap. If the
     requested size does not fall on a page boundary, you get extra
     zero-ed bytes. Unfortunately, when the size does fall on a page
     boundary, requesting an extra byte does not do what we want; you
     get SIGBUS accessing it.

  3. Copy mmap'd data at point-of-use into a NUL-terminated buffer. That
     way we only incur the cost when we need it.

  4. Avoid mmap-ing in the first place when we are using -G or
     --pickaxe-regex (e.g., by doing a big read()). At first glance,
     this sounds more efficient than loading the data one way and then
     making another copy. But mmap+memcpy, aside from the momentary
     doubled memory requirement, is probably just as fast or faster than
     calling read() repeatedly.

I am really tempted by (1).
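
For illustration, here is a minimal sketch of matching against a
pointer/size pair with no NUL terminator. Note that it does not use the
GNU-specific interface mentioned in (1). It swaps in REG_STARTEND, a
non-POSIX regexec flag that the BSD and glibc implementations provide,
so treat its availability as an assumption about the platform.
buffer_matches() is a hypothetical helper, not anything in git.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from git): return 1 if `pattern` matches
 * somewhere in the `len` bytes at `buf`, 0 if it does not, and -1 on
 * error or where REG_STARTEND is unavailable. `buf` does not need to
 * be NUL-terminated.
 */
static int buffer_matches(const char *pattern, const char *buf, size_t len)
{
#ifdef REG_STARTEND
	regex_t re;
	regmatch_t m[1];
	int ret;

	assert(pattern != NULL && buf != NULL);

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	/* With REG_STARTEND, m[0] delimits the input instead of a NUL. */
	m[0].rm_so = 0;
	m[0].rm_eo = (regoff_t)len;
	ret = regexec(&re, buf, 1, m, REG_STARTEND);
	regfree(&re);

	if (ret == 0)
		return 1;
	return ret == REG_NOMATCH ? 0 : -1;
#else
	/* Assumption not met: this regexec has no pointer/size mode. */
	(void)pattern;
	(void)buf;
	(void)len;
	return -1;
#endif
}
```

With this, buffer_matches("XYZ", "foobarXYZ", 6) only ever looks at
"foobar", which is the property we would want for an mmap'd region.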

Given that (2) does not work (unless somebody comes up with something
clever there), (3) would be the next best choice.
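
Roughly, (3) could look like the sketch below: copy the region into a
temporary NUL-terminated buffer only at the point where regexec needs
it. copy_and_match() is a hypothetical name for illustration, and
compiling the pattern on every call is a simplification; real code
would presumably compile once and reuse the regex_t.

```c
#include <assert.h>
#include <regex.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from git): return 1 if `pattern` matches
 * within the `len` bytes at `buf`, 0 if it does not, -1 on error.
 * The bytes are copied so that plain regexec() sees a NUL terminator.
 * Caveat: a NUL byte inside `buf` still ends the search early.
 */
static int copy_and_match(const char *pattern, const char *buf, size_t len)
{
	regex_t re;
	char *tmp;
	int ret;

	assert(pattern != NULL && buf != NULL);

	if (len == SIZE_MAX)
		return -1; /* len + 1 would overflow */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;

	tmp = malloc(len + 1);
	if (!tmp) {
		regfree(&re);
		return -1;
	}
	memcpy(tmp, buf, len);
	tmp[len] = '\0'; /* the one extra byte the mmap cannot give us */

	ret = regexec(&re, tmp, 0, NULL, 0);

	free(tmp);
	regfree(&re);

	if (ret == 0)
		return 1;
	return ret == REG_NOMATCH ? 0 : -1;
}
```

The cost is one malloc and one memcpy per searched buffer, paid only on
the -G / --pickaxe-regex code path.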

-Peff