On Wed, Mar 22, 2017 at 2:58 PM, John P. Linderman <jpl....@gmail.com> wrote:
> I used to use LC_ALL=C, but, as I vaguely recall, it got in the way of
> dealing with UNICODE. I tried a couple LC values aimed at UNICODE and the
> US, but something always went pear-shaped. I finally give up. I am perfectly
> happy to suffer a tiny bit of performance, to have most things work without
> thinking. A factor of 6, or 35, is not tiny, since I use grep and friends
> intensely. That's how I discovered the performance problem to begin with.
> Anyway, thank you for fixing my problem. I suspect that many of us pioneers
> (using UNIX since 1973) have '[0-9]' wired into our fingers.
>
> On Wed, Mar 22, 2017 at 2:01 PM, Paul Eggert <egg...@cs.ucla.edu> wrote:
>>
>> On 03/22/2017 05:44 AM, John P. Linderman wrote:
>>>
>>> That puts the runtimes on equal footing:
>>>
>> In my measurements, P[0-9] is still a tiny bit slower if one is using
>> glibc regex, due to a performance problem in glibc. You can work around it
>> by configuring --with-included-regex. It's probably not worth worrying
>> about, though.
>>
>> By the way, using LC_ALL=C should help avoid performance problems like
>> these in the future, if all you're doing is something where single-byte
>> pattern matching suffices.
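For anyone who wants to try the LC_ALL=C suggestion quoted above, here is a rough sketch rather than a benchmark. It only checks that '[0-9]' and '[[:digit:]]' select the same lines in the C locale and in a UTF-8 locale on ASCII input. The generated test data, the locale name en_US.UTF-8, and the UTF8_LOCALE override variable are all assumptions made up for this illustration and do not come from the thread.

```shell
#!/bin/sh
# Scratch input: 200000 lines of the form "P<number>" (arbitrary test data).
f=$(mktemp) || exit 1
trap 'rm -f "$f"' EXIT
awk 'BEGIN { for (i = 1; i <= 200000; i++) print "P" i }' > "$f"

# Assumption: a UTF-8 locale with this name is installed.
# UTF8_LOCALE is a hypothetical override variable, not something grep reads.
utf8_locale=${UTF8_LOCALE:-en_US.UTF-8}

# Count matching lines for a given locale ($1) and pattern ($2).
count() {
  LC_ALL=$1 grep -c -- "$2" "$f"
}

c_range=$(count C 'P[0-9]')
c_class=$(count C 'P[[:digit:]]')
u_range=$(count "$utf8_locale" 'P[0-9]')
u_class=$(count "$utf8_locale" 'P[[:digit:]]')

echo "C:     [0-9]=$c_range [[:digit:]]=$c_class"
echo "UTF-8: [0-9]=$u_range [[:digit:]]=$u_class"
```

To compare runtimes rather than match counts, run the individual grep commands on a file of your own under time(1), once with LC_ALL=C and once with your usual locale. The numbers will depend on the grep build, the regex implementation in use, and the input.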
I've just pulled that gnulib change into grep's repository with the attached patch, which also adds a NEWS entry:
From e2b7253524b92c316e2f51fc1998b8595554e777 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyer...@fb.com>
Date: Tue, 21 Mar 2017 20:19:49 -0700
Subject: [PATCH] gnulib: update to latest for dfa [0-9] performance improvement

This pulls in the following change that is very relevant to grep:

  commit 6afba02d7869d39ed7f61981045ddbdcb2814101
  Author: Paul Eggert <egg...@cs.ucla.edu>

    dfa: make [0-9] faster in non-C locales

* gnulib: Update to latest.
* NEWS (Improvements): Describe the effect on grep.
---
 NEWS   | 5 +++++
 gnulib | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index c387aef..0a8cbbc 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,11 @@ GNU grep NEWS -*- outline -*-

 * Noteworthy changes in release ?.? (????-??-??) [?]

+** Improvements
+
+  grep '[0-9]' is now just as fast as grep '[[:digit:]]' when run
+  in a multi-byte locale. Before, it was several times slower.
+
 ** Changes in behavior

   The following changes affect only MS-Windows platforms. First, the
diff --git a/gnulib b/gnulib
index fad631e..bd78ca3 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit fad631e74dd6c1f0c7fadd99bad1b4b732f6eeb8
+Subproject commit bd78ca3d3d7b5ec2679af1ea3d23e278c81e01d8
--
2.9.3