Re: [HACKERS] Better locale-specific-character-class handling for regexps

2016-09-05 Thread Tom Lane
Heikki Linnakangas writes:
> On 09/05/2016 07:10 PM, Tom Lane wrote:
>> In any case, this is getting very far afield from the current patch.
>> I'm willing to add a regexp.linux.ut8.sql test file if you think it's
>> important to have some canned tests that exercise this new code, but
>> otherwise

Re: [HACKERS] Better locale-specific-character-class handling for regexps

2016-09-05 Thread Heikki Linnakangas
On 09/05/2016 07:10 PM, Tom Lane wrote:
Heikki Linnakangas writes:
On 09/04/2016 08:44 PM, Tom Lane wrote:
I guess I could follow the lead of collate.linux.utf8.sql and produce a test that's only promised to pass on one platform with one encoding, but I'm not terribly excited by that. AFAIK t

Re: [HACKERS] Better locale-specific-character-class handling for regexps

2016-09-05 Thread Tom Lane
Heikki Linnakangas writes:
> On 09/04/2016 08:44 PM, Tom Lane wrote:
>> I guess I could follow the lead of collate.linux.utf8.sql and produce
>> a test that's only promised to pass on one platform with one encoding,
>> but I'm not terribly excited by that. AFAIK that test file does not
>> get run

Re: [HACKERS] Better locale-specific-character-class handling for regexps

2016-09-05 Thread Heikki Linnakangas
On 09/04/2016 08:44 PM, Tom Lane wrote:
Heikki Linnakangas writes:
On 08/23/2016 03:54 AM, Tom Lane wrote:
+1 for this patch in general. Some regression test cases would be nice.
I'm not sure how to write such tests without introducing insurmountable platform dependencies --- particularly on

Re: [HACKERS] Better locale-specific-character-class handling for regexps

2016-09-04 Thread Tom Lane
I wrote:
> I got tired of hearing complaints about the issue described in
> this thread:
> https://www.postgresql.org/message-id/flat/24241.1329347196%40sss.pgh.pa.us
> Here's a proposed fix. I've not done extensive performance testing,
> but it seems to be as fast or faster than the old code in c

Re: [HACKERS] Better locale-specific-character-class handling for regexps

2016-09-04 Thread Tom Lane
Heikki Linnakangas writes:
> On 08/23/2016 03:54 AM, Tom Lane wrote:
>> ! the color map for characters above MAX_SIMPLE_CHR is really a 2-D array,
>> ! whose rows correspond to character ranges that are explicitly mentioned
>> ! in the input, and whose columns correspond to sets of relevant locale
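For readers following along, here is a minimal sketch of how a lookup in such a 2-D color map could work. It is not the patch's code: the value chosen for MAX_SIMPLE_CHR, the `ranges` and `hi_colors` tables, and the `toy_class_bits` classifier (a stand-in for real locale class tests, with bit 0 playing "alpha" and bit 1 playing "digit") are all hypothetical. Characters at or below MAX_SIMPLE_CHR use a plain 1-D table; above it, the row comes from a binary search over the explicitly mentioned ranges (row 0 meaning "no range"), and the column is the bitmask of class memberships.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants, for illustration only. */
#define MAX_SIMPLE_CHR 0x7FF   /* chars <= this use a plain 1-D table */
#define NUM_ROWS 3             /* row 0 = chars outside every listed range */
#define NUM_COLS 4             /* 2 classes -> 2^2 membership combinations */

typedef unsigned int chr;
typedef int color;

/* One character range that was explicitly mentioned in the regex. */
typedef struct
{
    chr cmin;
    chr cmax;
    int row;
} range_row;

/* Sorted, non-overlapping ranges (hypothetical example data). */
static const range_row ranges[] = {
    {0x4E00, 0x4E0F, 1},
    {0x1F600, 0x1F600, 2},
};
#define NUM_RANGES (sizeof ranges / sizeof ranges[0])

/* Colors for chars <= MAX_SIMPLE_CHR (all zero in this toy example). */
static color simple_colors[MAX_SIMPLE_CHR + 1];

/* hi_colors[row][col]: rows = explicit ranges, cols = class bitmask. */
static const color hi_colors[NUM_ROWS][NUM_COLS] = {
    {0, 1, 2, 3},
    {4, 5, 6, 7},
    {8, 9, 10, 11},
};

/* Toy stand-in for locale class tests: bit 0 "alpha", bit 1 "digit". */
static int toy_class_bits(chr c)
{
    int bits = 0;
    if (c >= 0x4E00 && c <= 0x9FFF)
        bits |= 1;
    if (c >= 0xFF10 && c <= 0xFF19)
        bits |= 2;
    return bits;
}

/* Binary search for the row covering c; 0 if no explicit range does. */
static int find_row(chr c)
{
    size_t lo = 0, hi = NUM_RANGES;
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        if (c < ranges[mid].cmin)
            hi = mid;
        else if (c > ranges[mid].cmax)
            lo = mid + 1;
        else
            return ranges[mid].row;
    }
    return 0;
}

/* Color of c: direct lookup for simple chars, 2-D lookup otherwise. */
static color get_color(chr c)
{
    if (c <= MAX_SIMPLE_CHR)
        return simple_colors[c];
    int row = find_row(c);
    int col = toy_class_bits(c);
    assert(row >= 0 && row < NUM_ROWS && col >= 0 && col < NUM_COLS);
    return hi_colors[row][col];
}
```

With this toy data, U+4E00 (inside an explicit range and "alpha") gets color 5. U+5000 has the same class membership but is not in any mentioned range, so it gets color 1. This shows why both dimensions are needed: the row separates characters the regex names explicitly, and the column separates characters the locale classes treat differently.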

Re: [HACKERS] Better locale-specific-character-class handling for regexps

2016-09-04 Thread Heikki Linnakangas
On 08/23/2016 03:54 AM, Tom Lane wrote:
! That's still not quite enough, though, because of locale-dependent
! character classes such as [[:alpha:]]. In Unicode locales these classes
! may have thousands of entries that are above MAX_SIMPLE_CHR, and we
! certainly don't want to be searching larg
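One way to avoid enumerating thousands of class members is to ask the C library whether a given character belongs to each class only when that character is actually seen, and to pack the answers into a bitmask that can serve as a column index. The sketch below assumes the standard `<wctype.h>` functions `wctype()` and `iswctype()` are used for those tests. The `class_names` list and `class_column` function are hypothetical, and this is not necessarily how the patch computes its columns.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical: the locale classes mentioned in the regex, e.g. for
 * [[:alpha:][:digit:][:space:]]. Bit i of the column index is set
 * when the character belongs to class_names[i]. */
static const char *const class_names[] = {"alpha", "digit", "space"};
#define NUM_CLASSES (sizeof class_names / sizeof class_names[0])

/* Compute a column index by querying the locale's ctype tables for
 * this one character, instead of searching a precomputed list of all
 * class members. (A real implementation would look up each wctype_t
 * once rather than on every call.) */
static unsigned int class_column(wint_t c)
{
    unsigned int col = 0;
    for (size_t i = 0; i < NUM_CLASSES; i++)
    {
        wctype_t t = wctype(class_names[i]);
        assert(t != 0);   /* standard class names are always valid */
        if (iswctype(c, t))
            col |= 1u << i;
    }
    return col;
}
```

Without a `setlocale()` call the program runs in the "C" locale, where only basic ASCII classification is guaranteed, so `class_column(L'a')` is 1 and `class_column(L'7')` is 2. A program that wants the user's locale, where non-ASCII letters can count as `alpha`, would call `setlocale(LC_CTYPE, "")` first.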

[HACKERS] Better locale-specific-character-class handling for regexps

2016-08-22 Thread Tom Lane
I got tired of hearing complaints about the issue described in this thread: https://www.postgresql.org/message-id/flat/24241.1329347196%40sss.pgh.pa.us Here's a proposed fix. I've not done extensive performance testing, but it seems to be as fast or faster than the old code in cases where there a