bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

2014-09-27 Thread Jim Meyering
On Thu, Sep 25, 2014 at 5:23 PM, Paul Eggert wrote:
> Thanks for looking into that. The attached patches solve those performance
> problems for me.

I've pushed this follow-up patch to suppress a new warning:
0001-maint-suppress-a-false-positive-Wcast-align-warning.patch
Description: Binary dat

bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

2014-09-27 Thread Zoltán Herczeg
Hi,

>Sorry, I assume you meant \x9c here? Anyway, the point is that
>conceptually you walk through the input byte sequence left-to-right,
>converting it to characters as you go, and if you encounter an encoding
>error in the process you convert the error to the corresponding
>"character" outs
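
The left-to-right model described in the quoted text can be illustrated with a short Python sketch. This is not grep or PCRE code. It relies on Python's `surrogateescape` error handler, which maps each undecodable byte 0xNN to the lone surrogate U+DCNN, as one concrete way of turning an encoding error into a distinguishable pseudo-character. The function names are hypothetical.

```python
def decode_with_error_markers(data: bytes) -> list[int]:
    """Decode UTF-8 left to right; each invalid byte 0xNN becomes U+DCNN."""
    # surrogateescape is Python's scheme, used here only as an illustration.
    text = data.decode("utf-8", errors="surrogateescape")
    return [ord(ch) for ch in text]


def is_encoding_error(code_point: int) -> bool:
    """True if the code point stands for an undecodable input byte."""
    return 0xDC80 <= code_point <= 0xDCFF


if __name__ == "__main__":
    # \x9c is a stray continuation byte, so it cannot start a character.
    for cp in decode_with_error_markers(b"a\x9cb"):
        kind = "error byte" if is_encoding_error(cp) else "character"
        print(f"U+{cp:04X} {kind}")
```

A matcher working on this representation can treat the marker range as "never matches" while still advancing one input byte per marker.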

bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

2014-09-27 Thread Paul Eggert
Zoltán Herczeg wrote:
> He said 'I still want "." to match a single (valid) UTF-8 character.'

That's what the GNU matchers do, yes. '.' does not match an invalid byte. It's a reasonable default. If you have some users who want '.' to match an invalid byte, you can add a flag for them, just a
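
The rule that '.' matches one valid UTF-8 character and never an invalid byte can be sketched as follows. This is a minimal illustration in Python, not the GNU matcher or PCRE implementation. The function name is hypothetical, and excluding newline from '.' is an assumption made here for the example.

```python
def dot_match_length(data: bytes, pos: int) -> int:
    """Return the byte length of the character '.' matches at pos, or 0.

    Only a single, strictly valid UTF-8 character matches; an invalid
    byte yields 0. Newline is excluded (an assumption of this sketch).
    """
    for length in range(1, 5):  # a UTF-8 character is 1 to 4 bytes long
        chunk = data[pos:pos + length]
        if len(chunk) < length:
            break  # ran off the end of the input
        try:
            decoded = chunk.decode("utf-8")  # strict: rejects invalid input
        except UnicodeDecodeError:
            continue  # not a complete valid character at this length
        if len(decoded) == 1:
            return 0 if decoded == "\n" else length
    return 0


if __name__ == "__main__":
    sample = b"a\x9cb"
    print([dot_match_length(sample, i) for i in range(len(sample))])
```

For the sample input, positions 0 and 2 hold ASCII characters and position 1 holds a stray continuation byte, so only positions 0 and 2 give a nonzero length.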

bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

2014-09-27 Thread Paul Eggert
Jim Meyering wrote:
> I've pushed this follow-up patch to suppress a new warning:

Thanks, I expect I didn't get that warning because I built on x86-64, which allows unaligned accesses so GCC doesn't complain. (Incidentally, I had already tried modifying the code to exploit the fact that unali

bug#17576: [PATCH] dfa: speed-up at initial state

2014-09-27 Thread Paul Eggert
Norihiro Tanaka wrote:
> The test case "k" is 50% faster and "l" is also about 16% faster with GCC 4.8.2 on my platform by two changes.

Thanks, I finally got around to looking at this and got similar performance results to yours. That __attribute__((noinline)) bothers me, though, as it's not p