Jim Meyering wrote: > Norihirio Tanaka wrote: >> URL: >> <http://savannah.gnu.org/patch/?7134> >> >> Summary: Patch for is_mb_middle in searchutil.c >> Project: grep >> Submitted by: noritnk >> Submitted on: 2010年03月26日 13時00分00秒 >> Category: None >> Priority: 5 - Normal >> Status: None >> Privacy: Public >> Assigned to: None >> Originator Email: >> Open/Closed: Open >> Discussion Lock: Any >> >> Details: >> >> I seem is_mb_middle has two bugs. >> This patch (merged) will be corrected both their bugs. >> >> 1. fgrep (2.6.1) hangs on a pattern with invalid sequence. >> >> ==> See attachment `fgrep-hang'. >> >> 2. A complete multibyte string matches with incomplate >> multibyte pattern with truncated octet character in fgrep (2.6.1). >> >> ==> See attachment `truncated-character. > > Thank you for the patch and test cases! > > I've adjusted your test scripts slightly, hooked them > up via tests/Makefile.am and added most of the commit logs. > I confirm that each test exercises a bug that is fixed by the patch. > I'll study your patch over the weekend. > Hmm... or maybe not. Continue reading...
I could not risk studying your patch (don't want to have to wait for the copyright assignment process), so wrote my own. Since I've had to modify the test case (with your patch, grep printed the line, with mine it does not), I gather that our solutions are fundamentally different. In addition, I've changed the test to timeout after 10 seconds when run against a version of grep with the flaw. >From 4767f2f87399987afed8787de5cd4973358f76cd Mon Sep 17 00:00:00 2001 From: Jim Meyering <[email protected]> Date: Sun, 28 Mar 2010 15:33:10 +0200 Subject: [PATCH 1/2] grep -F: avoid infinite loop when searching for incomplete MB character Searching for an incomplete non-prefix of a multi-byte character should find no match. Just as these print nothing, printf '\357\274\241\357\274\241\n' \ | perl -CIO -ne '/\241\357/ and print' printf '\357\274\241\n' | perl -CIO -ne '/\274\241/ and print' printf '\357\274\241\n' | perl -CIO -ne '/\241/ and print' printf '\357\274\241\n' | perl -CIO -ne '/\274/ and print' These should also print nothing, but with grep-2.6 and grep-2.6.1, they would infloop: printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\241' printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\274' printf '\357\274\241\n' | LC_ALL=en_US.UTF-8 src/grep -F $'\274\241' * src/kwsearch.c (Fexecute): Don't infloop when searching for an incomplete non-prefix part of a multi-byte character. * NEWS (Bug fixes): Mention it. Reported and diagnosed by Norihiro Tanaka. --- NEWS | 3 +++ src/kwsearch.c | 3 +++ 2 files changed, 6 insertions(+), 0 deletions(-) diff --git a/NEWS b/NEWS index 6c9943e..91a0b42 100644 --- a/NEWS +++ b/NEWS @@ -4,6 +4,9 @@ GNU grep NEWS -*- outline -*- ** Bug fixes + grep -F no longer goes into an infinite loop when it finds a match + for an incomplete (non-prefix of a) multibyte character + The --include option works once again. [bug introduced in 2.6] Using any of the --include or --exclude* options would cause a NULL diff --git a/src/kwsearch.c b/src/kwsearch.c index fa801e6..85a28bc 100644 --- a/src/kwsearch.c +++ b/src/kwsearch.c @@ -104,8 +104,11 @@ Fexecute (char const *buf, size_t size, size_t *match_size, if (offset == (size_t) -1) goto failure; #ifdef MBS_SUPPORT + char const *s0 = mb_start; if (MB_CUR_MAX > 1 && is_mb_middle (&mb_start, beg + offset, buf + size)) { + if (mb_start == s0) + goto failure; beg = mb_start - 1; continue; /* It is a part of multibyte character. */ } -- 1.7.0.3.448.g82eeb >From f68aa3d7d9d74407d02a1f3552903a0495910015 Mon Sep 17 00:00:00 2001 From: Norihiro Tanaka <[email protected]> Date: Sun, 28 Mar 2010 17:56:22 +0200 Subject: [PATCH 2/2] tests: add tests for the fgrep-infloop bug * tests/init.cfg (require_timeout_): New function. * tests/fgrep-infloop: New file. Test for the above fix. * tests/Makefile.am (TESTS): Add it. --- NEWS | 4 ++-- tests/Makefile.am | 1 + tests/fgrep-infloop | 20 ++++++++++++++++++++ tests/init.cfg | 6 ++++++ 4 files changed, 29 insertions(+), 2 deletions(-) create mode 100644 tests/fgrep-infloop diff --git a/NEWS b/NEWS index 91a0b42..2e8ef15 100644 --- a/NEWS +++ b/NEWS @@ -4,8 +4,8 @@ GNU grep NEWS -*- outline -*- ** Bug fixes - grep -F no longer goes into an infinite loop when it finds a match - for an incomplete (non-prefix of a) multibyte character + grep -F no longer goes into an infinite loop when it finds a match for an + incomplete (non-prefix of a) multibyte character. [bug introduced in 2.6] The --include option works once again. [bug introduced in 2.6] diff --git a/tests/Makefile.am b/tests/Makefile.am index f9ba21b..9e0224f 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -29,6 +29,7 @@ TESTS = \ ere.sh \ euc-mb \ fedora \ + fgrep-infloop \ file.sh \ fmbtest \ foad1 \ diff --git a/tests/fgrep-infloop b/tests/fgrep-infloop new file mode 100644 index 0000000..159baca --- /dev/null +++ b/tests/fgrep-infloop @@ -0,0 +1,20 @@ +#!/bin/sh +# This would infloop for grep-2.6.1 +: ${srcdir=.} +. "$srcdir/init.sh"; path_prepend_ ../src + +require_timeout_ + +encode() { echo "$1" | tr ABC '\357\274\241'; } + +fail=0 + +for LOC in en_US.UTF-8 $LOCALE_FR_UTF8; do + out=out1-$LOC + encode ABC \ + | LC_ALL=$LOC timeout 10s grep -F "$(encode BC)" > $out 2>&1 + test $? = 1 || fail=1 + compare $out /dev/null || fail=1 +done + +Exit $fail diff --git a/tests/init.cfg b/tests/init.cfg index 0ec60f1..6f957b3 100644 --- a/tests/init.cfg +++ b/tests/init.cfg @@ -32,3 +32,9 @@ do done test "$envvar_check_fail" = 1 && fail_ "failed to unset the above envvars" + +require_timeout_() +{ + ( timeout --version ) > /dev/null 2>&1 \ + || skip_ your system lacks the timeout program +} -- 1.7.0.3.448.g82eeb
