On Sun, 17 Jul 2016 12:21:46 +0200 (CEST) Jens Schleusener <jens.schleuse...@t-online.de> wrote:
> First, as I mentioned in my mail, my "tests" are non-representative > and done on a server system that runs also other jobs just to get a first > impression. > > Currently I have redone some of the tests here are the more detailed results > (hopefully readable in this mail). > > OS: OpenSUSE Leap 42.1 (64-bit) > gcc: version 4.8.5 (SUSE Linux) > > Main testfile was an Apache access log file (in nearly combined log format) > with a size of 157 MB and 673623 lines that looks like: > > 66.249.78.85 - - [16/Jul/2016:00:00:02 +0200] "GET > /dox/phpMyAdmin-4.6.1-all-languages/namespacePMA_1_1libraries_1_1properties_1_1options_1_1items_1_1TextPropertyItem.html > HTTP/1.1" 410 1977 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; > +http://www.google.com/bot.html)" 0 - - > > The test command was > > time -p env LC_ALL=en_US.UTF-8 grep pattern logfile > /dev/shm/out > > The tested grep versions: > > 1: GNU grep 2.16 (OpenSuSE Leap 42.1) > 2: GNU grep 2.25 (self-compiled) > 3: GNU grep 2.25 (self-compiled with 0001-grep-use-fastmap-in-regex.patch) > > "self-compiled" means just "./configure; make; make install". > > Times are in seconds (rounded; so the sum user+sys sometimes is different > from the real value) and at least 2 measurements were done. Naturally the > output time and the current load may have had an influence (but probably not > a drastic one). > > pattern vrs real user sys > ------------ --- ---- ---- --- > \([a-b]\)\1 > 1: 24.3 23.6 0.6 > 2: 18.2 18.2 0.0 > 3: 1.9 1.9 0.0 > > \([a-b]\) > 1: 9.4 8.8 0.6 > 2: 3.4 3.4 0.0 > 3: 0.7 0.6 0.1 > > [a-b] > 1: 8.8 8.1 0.7 > 2: 3.2 3.1 0.0 > 3: 0.4 0.4 0.1 > > "GET /dox/.*-[0-9\.]*.*/.*\.html.* HTTP/1.1" 410 > 1: 7.62 7.60 0.02 > 2: 0.33 0.32 0.01 > 3: 0.29 0.28 0.01 > > No idea if that values are meaningful but as a layman I have the impression > grep version 3 is faster than 2 and 2 is faster than 1 ;-) > > >> By the way I had to remove one of the two "=" in your patch otherwise >> > >> gcc issued an error (but caution, I am a C-layman). > > > > Thanks, I fixed it. I made a mistake before sending the patch. Of > > course, "=" should be one. > > No problem, I used exact your new patch version. > > Regards > > Jens > > P.S.: OT and you are probably the wrong address: I would like to see some > "agrep" functionality in GNU grep. Thanks. I expect that this patch improves pattern starting with range, character class, collating element and equivalent class in a multibyte locale as three of until third case. forth case seems to be mainly improved by other optimization. > However, if grep uses fastmap, fails in case-fold-titlecase test. It > means that grep's behavior differ from sed and gawk, as they use fastmap, > although it seems to be a bug in regex. However, I think this patch should be suspended because of this issue. I reported it to glibc developers. https://sourceware.org/bugzilla/show_bug.cgi?id=20381