On Wed, Mar 22, 2017 at 2:58 PM, John P. Linderman <jpl....@gmail.com> wrote:
> I used to use LC_ALL=C, but, as I vaguely recall, it got in the way of
> dealing with UNICODE. I tried a couple of LC values aimed at UNICODE and the
> US, but something always went pear-shaped. I finally gave up. I am perfectly
> happy to suffer a tiny bit of performance, to have most things work without
> thinking. A factor of 6, or 35, is not tiny, since I use grep and friends
> intensely. That's how I discovered the performance problem to begin with.
> Anyway, thank you for fixing my problem. I suspect that many of us pioneers
> (using UNIX since 1973) have '[0-9]' wired into our fingers.
>
> On Wed, Mar 22, 2017 at 2:01 PM, Paul Eggert <egg...@cs.ucla.edu> wrote:
>>
>> On 03/22/2017 05:44 AM, John P. Linderman wrote:
>>>
>>> That puts the runtimes on equal footing:
>>>
>> In my measurements, P[0-9] is still a tiny bit slower if one is using
>> glibc regex, due to a performance problem in glibc. You can work around it
>> by configuring --with-included-regex. It's probably not worth worrying
>> about, though.
>>
>> By the way, using LC_ALL=C should help avoid performance problems like
>> these in the future, if all you're doing is something where single-byte
>> pattern matching suffices.

I've just pulled that gnulib change into grep's repository with the
attached patch, along with a NEWS update:
From e2b7253524b92c316e2f51fc1998b8595554e777 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyer...@fb.com>
Date: Tue, 21 Mar 2017 20:19:49 -0700
Subject: [PATCH] gnulib: update to latest for dfa [0-9] performance
 improvement

This pulls in the following change that is very relevant to grep:

    commit 6afba02d7869d39ed7f61981045ddbdcb2814101
    Author: Paul Eggert <egg...@cs.ucla.edu>
    dfa: make [0-9] faster in non-C locales

* gnulib: Update to latest.
* NEWS (Improvements): Describe the effect on grep.
---
 NEWS   | 5 +++++
 gnulib | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index c387aef..0a8cbbc 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,11 @@ GNU grep NEWS                                    -*- outline -*-

 * Noteworthy changes in release ?.? (????-??-??) [?]

+** Improvements
+
+  grep '[0-9]' is now just as fast as grep '[[:digit:]]' when run
+  in a multi-byte locale.  Before, it was several times slower.
+
 ** Changes in behavior

   The following changes affect only MS-Windows platforms.  First, the
diff --git a/gnulib b/gnulib
index fad631e..bd78ca3 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit fad631e74dd6c1f0c7fadd99bad1b4b732f6eeb8
+Subproject commit bd78ca3d3d7b5ec2679af1ea3d23e278c81e01d8
-- 
2.9.3
