The correct course of action for grep is to defer range interpretation to regex, because otherwise you can get mismatches between regexes with backreferences and those without.
For example, [A-Z]. will use the rational range interpretation (RRI)
but ([A-Z])\1 won't, with the confusing result that the first regex
won't match a superset of the language described by the second regex.

The source of the confusion is that, even though grep's dfa.c was
changed to use range checking instead of strcoll, that code is only
invoked if dfaexec is called with backref = NULL, and that never
happens for grep!

In the end, all that's needed for RRI is building with
--with-included-regex, and in that case the patch is almost a no-op.
Almost, because there are corner cases that aren't handled correctly
(e.g. [a-[.e.]], or regular expressions that include a NUL character),
but these can be handled separately.

* NEWS: Revert paragraph introduced by commit 1078b64302.
* src/dfa.c (parse_bracket_exp): Revert back to regcomp/regexec.

Signed-off-by: Paolo Bonzini <bonz...@gnu.org>
---
 NEWS      |  9 ---------
 src/dfa.c | 20 ++++++++++++++++++--
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/NEWS b/NEWS
index 2ff7272..0130b90 100644
--- a/NEWS
+++ b/NEWS
@@ -10,15 +10,6 @@ GNU grep NEWS -*- outline -*-
   grep (without -i) in a multibyte locale is now up to 7 times faster
   when processing many matched lines.
 
-  Range expressions in unibyte locales now ordinarily use the rational
-  range interpretation, in which [a-z] matches only lower-case ASCII
-  letters regardless of locale, and similarly for other ranges.  (This
-  was already true for multibyte locales.)  Portable programs should
-  continue to specify the C locale when using range expressions, since
-  these expressions have unspecified behavior in non-GNU systems and
-  are not yet guaranteed to use the rational range interpretation even
-  in GNU systems.
-
 ** Maintenance
 
   grep's --mmap option was disabled in March of 2010, and began to
diff --git a/src/dfa.c b/src/dfa.c
index f7453c7..a133e03 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1106,14 +1106,30 @@ parse_bracket_exp (void)
                 }
               else
                 {
+                  /* Defer to the system regex library about the meaning
+                     of range expressions.  */
+                  regex_t re;
+                  char pattern[6] = { '[', 0, '-', 0, ']', 0 };
+                  char subject[2] = { 0, 0 };
                   c1 = c;
                   if (case_fold)
                     {
                       c1 = tolower (c1);
                       c2 = tolower (c2);
                     }
-                  for (c = c1; c <= c2; c++)
-                    setbit_case_fold_c (c, ccl);
+
+                  pattern[1] = c1;
+                  pattern[3] = c2;
+                  regcomp (&re, pattern, REG_NOSUB);
+                  for (c = 0; c < NOTCHAR; ++c)
+                    {
+                      if ((case_fold && isupper (c)))
+                        continue;
+                      subject[0] = c;
+                      if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
+                        setbit_case_fold_c (c, ccl);
+                    }
+                  regfree (&re);
                 }
 
               colon_warning_state |= 8;
-- 
1.8.5.3
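
For readers who want to try the probing technique outside of dfa.c, here
is a minimal standalone sketch of the same idea: compile the two-endpoint
bracket expression with the system regex library, then test every
single-byte subject against it.  The helper name probe_range and the
member array are hypothetical and not part of grep.  Unlike the hunk
above, this sketch checks the return value of regcomp and leaves case
folding out.

```c
#include <assert.h>
#include <limits.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of grep: set member[c] to 1 for every
   byte c that the system regex library accepts for the bracket range
   [lo-hi], and to 0 otherwise.  Returns the number of matching bytes,
   or -1 if regcomp rejects the pattern.  */
static int
probe_range (unsigned char lo, unsigned char hi,
             unsigned char member[UCHAR_MAX + 1])
{
  regex_t re;
  char pattern[6] = { '[', 0, '-', 0, ']', 0 };
  char subject[2] = { 0, 0 };
  int count = 0;

  assert (member != NULL);
  pattern[1] = (char) lo;
  pattern[3] = (char) hi;

  /* Endpoints that are special inside brackets (NUL, ']', '^') are
     not handled here; regcomp may reject or misread them.  */
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;

  for (int c = 0; c <= UCHAR_MAX; c++)
    {
      /* For c == 0 the subject is the empty string, which a bracket
         expression cannot match.  */
      subject[0] = (char) c;
      member[c] = regexec (&re, subject, 0, NULL, 0) == 0;
      count += member[c];
    }

  regfree (&re);
  return count;
}
```

In the C locale, calling probe_range ('a', 'z', set) should mark only
the 26 lower-case ASCII letters.  In other locales the result depends
on how the regex library in use interprets ranges, which is exactly the
decision this patch hands back to it.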