bug#17576: [PATCH] dfa: speed-up at initial state

2014-09-28 Thread Norihiro Tanaka
Paul Eggert wrote:
> Thanks, I finally got around to looking at this and got similar performance results to yours. That __attribute__((noinline)) bothers me, though, as it's not portable and is a bit inelegant. I figured out a different way to avoid the inlining, and tweaked the commentary

bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

2014-09-28 Thread Zoltán Herczeg
>> In the regex world, matching performance is the key aspect of an engine
> Absolutely. That's why we're having this discussion: libpcre is slow when matching binary data.
For me the question is whether binary search needs to be supported at the PCRE level. There are other questions like this. Peop

bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

2014-09-28 Thread Paul Eggert
Zoltán Herczeg wrote:
> For me the question is whether binary search needs to be supported at the PCRE level.
It's purely a performance question. GNU grep already uses libpcre to search binary data, and it works now. It's just slow, that's all. I'm willing to live with this, and tell users "Sorry, b

bug#18454: Improve performance when -P (PCRE) is used in UTF-8 locales

2014-09-28 Thread Jim Meyering
Nice! I didn't know about _Pragma. It's much better to encapsulate that, keeping the #pragma directives out of function bodies.
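
As an illustration of the encapsulation described above, here is a minimal sketch and not the code from the patch under discussion. The macro names IGNORE_TYPE_LIMITS_BEGIN and IGNORE_TYPE_LIMITS_END, the choice of -Wtype-limits, and the function is_nonnegative are all assumptions made for the example. The C99 _Pragma operator can appear inside a macro expansion, which a #pragma directive cannot, so the directives can live in one place instead of inside each function body.

```c
/* Hypothetical macro names for illustration; not taken from the grep
   sources.  _Pragma ("...") is the C99 operator form of #pragma and,
   unlike the directive, may be produced by a macro.  */
#if defined __GNUC__
# define IGNORE_TYPE_LIMITS_BEGIN \
    _Pragma ("GCC diagnostic push") \
    _Pragma ("GCC diagnostic ignored \"-Wtype-limits\"")
# define IGNORE_TYPE_LIMITS_END \
    _Pragma ("GCC diagnostic pop")
#else
/* Other compilers: expand to nothing.  */
# define IGNORE_TYPE_LIMITS_BEGIN
# define IGNORE_TYPE_LIMITS_END
#endif

/* Return 1 if N is non-negative.  For an unsigned argument the
   comparison is always true, which is the kind of test that
   -Wtype-limits may flag; the macros silence it for this one
   statement only.  */
static int
is_nonnegative (unsigned int n)
{
  int result;
  IGNORE_TYPE_LIMITS_BEGIN
  result = n >= 0;
  IGNORE_TYPE_LIMITS_END
  return result;
}
```

The function body contains no #pragma lines; changing or removing the suppression means editing only the two macro definitions.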

bug#18580: [PATCH] dfa: check end of an input buffer after a transition in non-UTF8 multibyte locales

2014-09-28 Thread Norihiro Tanaka
If a state has neither ANYCHAR nor MBCSET and the next character is eolbyte, the next state is -1, so the loop exits and we then check whether the position is at the end of the buffer. However, if a state has either ANYCHAR or MBCSET, the next state may not be -1 even if the next character is eolbyte. So we must check whether

bug#18580: [PATCH] dfa: check end of an input buffer after a transition in non-UTF8 multibyte locales

2014-09-28 Thread Paul Eggert
Thanks, can you provide a test case that illustrates the problem? We could add it to the test suite.

bug#18580: [PATCH] dfa: check end of an input buffer after a transition in non-UTF8 multibyte locales

2014-09-28 Thread Paul Eggert
Also, there are two calls to transit_state but that patch affects only one of them. Why shouldn't both calls be patched?

bug#18580: [PATCH] dfa: check end of an input buffer after a transition in non-UTF8 multibyte locales

2014-09-28 Thread Norihiro Tanaka
Thanks for the review.
> Thanks, can you provide a test case that illustrates the problem? We could add it to the test suite.
I checked the values of both `p' and `end' after the first transit_state call in the following operation with GDB, but I have not yet been able to produce a core dump from this bug.
$ echo | env