This adds a failing test. I was flabbergasted to see that it fails. The only reason it doesn't fail with Debian's patches is that they've turned off the DFA matcher altogether. Set the GREP_USE_DFA envvar to show it fail there, too:

    $ echo AA| GREP_USE_DFA=1 LC_ALL=en_US.UTF-8 /bin/grep '\([A]\|[B]\)\{2\}'
    [Exit 1]
    $

Thanks to Paolo for the test.

I've confirmed that using gawk's dfaexec solves the problem, and breaks nothing else, assuming we omit the bit where it appends a newline-sentinel to what should be a const* pointer buffer. To allow the latter, I had to make one more small change.

Once I merge with recent changes and tease apart the minimal fix, I'll post the patch. In the meantime, please defer invasive changes to dfaexec.

From 720850a595f7196a778ef5ac7b9a167196df0631 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyer...@redhat.com>
Date: Sun, 7 Mar 2010 20:07:30 +0100
Subject: [PATCH] tests: add test case for dfaexec bug

* tests/dfaexec-multibyte: New test.
* tests/Makefile.am (TESTS): Add it.
Reported by Paolo Bonzini in http://bugzilla.redhat.com/544407
---
 tests/Makefile.am       |    1 +
 tests/dfaexec-multibyte |   25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 0 deletions(-)
 create mode 100644 tests/dfaexec-multibyte

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 1f54ec7..06460bc 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -19,6 +19,7 @@ TESTS = \
 	bre.sh \
 	case-fold-char-class \
 	case-fold-char-type \
+	dfaexec-multibyte \
 	empty.sh \
 	ere.sh \
 	file.sh \
diff --git a/tests/dfaexec-multibyte b/tests/dfaexec-multibyte
new file mode 100644
index 0000000..9b035fd
--- /dev/null
+++ b/tests/dfaexec-multibyte
@@ -0,0 +1,25 @@
+#!/bin/sh
+# This would fail for grep-2.5.3
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+printf 'aa\n' > exp-aa || framework_failure
+printf 'ab\n' > exp-ab || framework_failure
+
+fail=0
+
+for LOC in en_US.UTF-8 zh_CN $LOCALE_FR_UTF8; do
+  echo aa | LC_ALL=$LOC grep -E '([a]|[b]){2}' > out || fail=1
+  compare out exp-aa || fail=1
+
+  echo aa | LC_ALL=$LOC grep -E '([b]|[a]){2}' > out || fail=1
+  compare out exp-aa || fail=1
+
+  echo ab | LC_ALL=$LOC grep -E '([b]|[a]){2}' > out || fail=1
+  compare out exp-ab || fail=1
+
+  echo ab | LC_ALL=$LOC grep -E '([a]|[b]){2}' > out || fail=1
+  compare out exp-ab || fail=1
+done
+
+Exit $fail
-- 
1.7.0.2.329.gdaec6
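
For anyone who wants to try the same four checks by hand, outside the test framework, here is a rough standalone sketch. It is not part of the patch. It assumes the grep under test is the first one on PATH and that the locale named in LOC is installed; the default of en_US.UTF-8 and the variable names input, re and out are only illustrative choices. It drops init.sh, compare and framework_failure in favor of a plain string comparison.

```shell
#!/bin/sh
# Standalone sketch of the four checks in tests/dfaexec-multibyte.
# Assumption: "grep" on PATH is the binary being tested.
# Assumption: LOC names an installed multibyte locale; the default
# below is only an example.
LOC=${LOC:-en_US.UTF-8}

fail=0
for input in aa ab; do
  for re in '([a]|[b]){2}' '([b]|[a]){2}'; do
    # Each two-letter input should match either ordering of the
    # alternation, so grep should succeed and echo the line back.
    out=$(echo "$input" | LC_ALL=$LOC grep -E "$re") || fail=1
    test "$out" = "$input" || fail=1
  done
done

echo "fail=$fail"
```

A grep without the dfaexec bug should leave fail at 0. Running it once per locale (for example LOC=zh_CN) mirrors the loop in the real test.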