tags 446270 + patch
thanks

On Sun, Nov 23, 2008 at 05:36:48PM +0100, Peter De Wachter wrote:
>The root cause of this bug is the use of mbtowc in 64-egf-speedup.patch
>and 67-w.patch. These patches try to use mbtowc to look at the
>character before and after the match to check if the match is a whole
>word. But when a binary file is being grepped, mbtowc gets passed
>random junk rather than a valid UTF-8 character. As a consequence, its
>internal state gets messed up, and you get nonsense for the following
>matches. The fix is to use mbrtowc so you can reset its state. A patch
>is attached.
>
>65-dfa-optional.patch is a red herring. I guess that patch just exposes
>the bug because it causes grep to use a different code path. But you
>get the same bug with grep -F, which is not touched by that patch.
>
>--
>Peter De Wachter
>--- a/build-tree/grep-2.5.3/src/search.c
>+++ b/build-tree/grep-2.5.3/src/search.c
>@@ -502,7 +502,7 @@
> }
> else
> s = last_char;
>- mr = mbtowc (&pwc, s, match - s);
>+ mr = mbrtowc (&pwc, s, match - s, &mbs);
> if (mr <= 0)
> {
> memset (&mbs, '\0', sizeof (mbstate_t));
>@@ -531,8 +531,8 @@
> wchar_t nwc;
> int mr;
>
>- mr = mbtowc (&nwc, buf + start + len,
>- end - buf - start - len - 1);
>+ mr = mbrtowc (&nwc, buf + start + len,
>+ end - buf - start - len - 1, &mbs);
> if (mr <= 0)
> {
> memset (&mbs, '\0', sizeof (mbstate_t));
>@@ -941,7 +941,7 @@
> }
> else
> s = last_char;
>- mr = mbtowc (&pwc, s, beg - s);
>+ mr = mbrtowc (&pwc, s, beg - s, &mbs);
> if (mr <= 0)
> memset (&mbs, '\0', sizeof (mbstate_t));
> else if ((iswalnum (pwc) || pwc == L'_')
>@@ -959,7 +959,7 @@
> wchar_t nwc;
> int mr;
>
>- mr = mbtowc (&nwc, beg + len, buf + size - beg - len);
>+ mr = mbrtowc (&nwc, beg + len, buf + size - beg - len, &mbs);
> if (mr <= 0)
> {
> memset (&mbs, '\0', sizeof (mbstate_t));
Peter De Wachter, thank you for the patch.
Nicolas François, when you have a chance, please review the patch above.
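
For anyone reviewing, here is a minimal standalone sketch of the pattern the patch relies on: decode with mbrtowc against a caller-owned mbstate_t, and clear that state whenever the input turns out to be junk. The helper names decode_one and is_word_char_at are hypothetical and do not exist in grep. Unlike the patch, which stores the result in an int and tests mr <= 0, the sketch compares against (size_t) -1 and (size_t) -2 explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of grep: decode one character from S
   (at most N bytes) using the caller-owned state ST.  Returns the
   number of bytes consumed, or 0 if the bytes are invalid, incomplete,
   empty, or encode the null character.  */
size_t
decode_one (wchar_t *wc, const char *s, size_t n, mbstate_t *st)
{
  size_t r;

  assert (wc != NULL && s != NULL && st != NULL);
  r = mbrtowc (wc, s, n, st);
  if (r == (size_t) -1 || r == (size_t) -2 || r == 0)
    {
      /* Junk or truncated input: put the state back to the initial
         state so the next call is not affected by this one.  */
      memset (st, 0, sizeof *st);
      return 0;
    }
  return r;
}

/* Hypothetical helper: is the character starting at S a word
   constituent (alphanumeric or underscore)?  Undecodable bytes
   count as non-word characters.  */
int
is_word_char_at (const char *s, size_t n, mbstate_t *st)
{
  wchar_t wc;

  if (decode_one (&wc, s, n, st) == 0)
    return 0;
  return iswalnum ((wint_t) wc) || wc == L'_';
}
```

A caller would zero an mbstate_t once with memset, then pass it to every call. With plain mbtowc there is no such object to reset, since the conversion state is kept inside the C library, which fits the behaviour Peter describes.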

