Re: Support LIKE with nondeterministic collations

2024-11-27 Thread Peter Eisentraut
On 20.11.24 08:29, jian he wrote: in match_pattern_prefix maybe change if (expr_coll && !get_collation_isdeterministic(expr_coll)) return NIL; to if (OidIsValid(expr_coll) && !get_collation_isdeterministic(expr_coll)) return NIL; I left it like it was, because this w

Re: Support LIKE with nondeterministic collations

2024-11-19 Thread jian he
On Tue, Nov 19, 2024 at 9:51 PM Peter Eisentraut wrote: > > On 18.11.24 04:30, jian he wrote: > > we can optimize when the trailing (last) character is not a wildcard. > > > > SELECT 'Ha12foo' LIKE '%foo' COLLATE ignore_accents; > > within the for loop > > for(;;) > > { > > int cmp; > > CHE

Re: Support LIKE with nondeterministic collations

2024-11-19 Thread Peter Eisentraut
On 18.11.24 04:30, jian he wrote: we can optimize when the trailing (last) character is not a wildcard. SELECT 'Ha12foo' LIKE '%foo' COLLATE ignore_accents; within the for loop for(;;) { int cmp; CHECK_FOR_INTERRUPTS(); } pg_strncoll comparison will become Ha12foofoo a12foo

Re: Support LIKE with nondeterministic collations

2024-11-17 Thread jian he
On Fri, Nov 15, 2024 at 11:42 PM Peter Eisentraut wrote: > > On 15.11.24 05:26, jian he wrote: > > /* > > * Now build a substring of the text and try to match it against > > * the subpattern. t is the start of the text, t1 is one past the > > * last byte. We start with a zero-length string. > >

Re: Support LIKE with nondeterministic collations

2024-11-15 Thread Peter Eisentraut
On 15.11.24 05:26, jian he wrote: /* * Now build a substring of the text and try to match it against * the subpattern. t is the start of the text, t1 is one past the * last byte. We start with a zero-length string. */ t1 = t; t1len = tlen; for (;;) { int cmp; CHECK_FOR_INTERRUPTS(); cmp = pg_str
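
The quoted comment describes trying successively longer substrings of the text against a literal subpattern. A minimal sketch of that idea follows, not the patch's code: toy_coll_cmp is a hypothetical stand-in for a collation-aware comparison such as pg_strncoll (here it simply ignores ASCII punctuation and case), and shortest_equal_prefix is a hypothetical name. It shows why the substring length cannot be derived from the subpattern length, and why the loop costs O(n) comparisons of O(n) each.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Toy "nondeterministic" comparison (assumption, for illustration only):
 * ASCII punctuation is ignored and letters compare case-insensitively,
 * so strings of different byte lengths can compare equal.
 */
static int
toy_coll_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t i = 0, j = 0;

    for (;;)
    {
        /* skip characters the toy collation treats as ignorable */
        while (i < alen && ispunct((unsigned char) a[i]))
            i++;
        while (j < blen && ispunct((unsigned char) b[j]))
            j++;
        if (i == alen || j == blen)
            break;

        int ca = tolower((unsigned char) a[i]);
        int cb = tolower((unsigned char) b[j]);

        if (ca != cb)
            return ca < cb ? -1 : 1;
        i++;
        j++;
    }
    if (i == alen && j == blen)
        return 0;
    return i == alen ? -1 : 1;
}

/*
 * Try prefixes of text of length 0, 1, 2, ... and return the length of the
 * first one that compares equal to the subpattern, or -1 if none does.
 */
static long
shortest_equal_prefix(const char *text, size_t tlen,
                      const char *sub, size_t sublen)
{
    for (size_t t1len = 0; t1len <= tlen; t1len++)
    {
        if (toy_coll_cmp(text, t1len, sub, sublen) == 0)
            return (long) t1len;
    }
    return -1;
}
```

With this toy comparison, the 3-byte subpattern "foo" is matched by the 4-byte prefix ".foo" of ".foo.", so the candidate length has to be found by search rather than computed from the subpattern.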

Re: Support LIKE with nondeterministic collations

2024-11-14 Thread jian he
On Tue, Nov 12, 2024 at 3:45 PM Peter Eisentraut wrote: > > On 11.11.24 14:25, Heikki Linnakangas wrote: > > Sadly the algorithm is O(n^2) with non-deterministic collations. Is there > > any way this could be optimized? We make no claims on how expensive any > > functions or operators are, so I sup

Re: Support LIKE with nondeterministic collations

2024-11-12 Thread Peter Eisentraut
On 11.11.24 14:25, Heikki Linnakangas wrote: Sadly the algorithm is O(n^2) with non-deterministic collations. Is there any way this could be optimized? We make no claims on how expensive any functions or operators are, so I suppose a slow implementation is nevertheless better than throwing an er

Re: Support LIKE with nondeterministic collations

2024-11-11 Thread Heikki Linnakangas
On 04/11/2024 10:26, Peter Eisentraut wrote: On 29.10.24 18:15, Jacob Champion wrote: libfuzzer is unhappy about the following code in MatchText: +    while (p1len > 0) +    { +    if (*p1 == '\\') +    { +    found_escape = true; +  

Re: Support LIKE with nondeterministic collations

2024-11-04 Thread Peter Eisentraut
On 29.10.24 18:15, Jacob Champion wrote: libfuzzer is unhappy about the following code in MatchText: +while (p1len > 0) +{ +if (*p1 == '\\') +{ +found_escape = true; +NextByte(p1, p1len); +

Re: Support LIKE with nondeterministic collations

2024-10-29 Thread Jacob Champion
On Sun, Sep 15, 2024 at 11:26 PM Peter Eisentraut wrote: > > Here is an updated patch. It is rebased over the various recent changes > in the locale APIs. No other changes. libfuzzer is unhappy about the following code in MatchText: > +while (p1len > 0) > +{ > +

Re: Support LIKE with nondeterministic collations

2024-09-15 Thread Peter Eisentraut
Here is an updated patch. It is rebased over the various recent changes in the locale APIs. No other changes. On 30.07.24 21:46, Peter Eisentraut wrote: On 27.07.24 00:32, Paul A Jungwirth wrote: On Thu, Jun 27, 2024 at 11:31 PM Peter Eisentraut wrote: Here is an updated patch for this.

Re: Support LIKE with nondeterministic collations

2024-08-01 Thread Daniel Verite
Jeff Davis wrote: > > col LIKE 'smith%' collate "nd" > > > > is equivalent to: > > > > col >= 'smith' collate "nd" AND col < U&'smith\' collate "nd" > > That logic seems to assume something about the collation. If you have a > collation that orders strings by their sha256 hash,

Re: Support LIKE with nondeterministic collations

2024-07-31 Thread Jeff Davis
On Fri, 2024-05-03 at 16:58 +0200, Daniel Verite wrote: > * Generating bounds for a sort key (prefix matching) > > Having sort keys for strings allows for easy creation of bounds - > sort keys that are guaranteed to be smaller or larger than any > sort > key from a given range. For exam

Re: Support LIKE with nondeterministic collations

2024-07-30 Thread Peter Eisentraut
On 27.07.24 00:32, Paul A Jungwirth wrote: On Thu, Jun 27, 2024 at 11:31 PM Peter Eisentraut wrote: Here is an updated patch for this. I took a look at this. I added some tests and found a few that give the wrong result (I believe). The new tests are included in the attached patch, along with

Re: Support LIKE with nondeterministic collations

2024-07-26 Thread Paul A Jungwirth
On Thu, Jun 27, 2024 at 11:31 PM Peter Eisentraut wrote: > Here is an updated patch for this. I took a look at this. I added some tests and found a few that give the wrong result (I believe). The new tests are included in the attached patch, along with the results I expect. Here are the failures:

Re: Support LIKE with nondeterministic collations

2024-06-27 Thread Peter Eisentraut
Here is an updated patch for this. I have added some more documentation based on the discussions, including some examples taken directly from the emails here. One thing I have been struggling with a bit is the correct use of LIKE_FALSE versus LIKE_ABORT in the MatchText() code. I have made s

Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Peter Eisentraut
On 03.05.24 17:47, Daniel Verite wrote: Peter Eisentraut wrote: However, off the top of my head, this definition has three flaws: (1) It would make the single-character wildcard effectively an any-number-of-characters wildcard, but only in some circumstances, which could be confusing,

Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Peter Eisentraut
On 03.05.24 16:58, Daniel Verite wrote: * Generating bounds for a sort key (prefix matching) Having sort keys for strings allows for easy creation of bounds - sort keys that are guaranteed to be smaller or larger than any sort key from a given range. For example, if bounds are pro

Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Daniel Verite
Peter Eisentraut wrote: > However, off the top of my head, this definition has three flaws: (1) > It would make the single-character wildcard effectively an > any-number-of-characters wildcard, but only in some circumstances, which > could be confusing, (2) it would be difficult to com

Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Daniel Verite
Peter Eisentraut wrote: > Yes, certainly, and there is also no indexing support (other than for > exact matches). The ICU docs have this note about prefix matching: https://unicode-org.github.io/icu/userguide/collation/architecture.html#generating-bounds-for-a-sort-key-prefix-matching

Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Peter Eisentraut
On 03.05.24 15:20, Robert Haas wrote: On Fri, May 3, 2024 at 4:52 AM Peter Eisentraut wrote: What the implementation does is, it walks through the pattern. It sees '_', so it steps over one character in the input string, which is '.' here. Then we have 'foo.' left to match in the input string

Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Robert Haas
On Fri, May 3, 2024 at 4:52 AM Peter Eisentraut wrote: > What the implementation does is, it walks through the pattern. It sees > '_', so it steps over one character in the input string, which is '.' > here. Then we have 'foo.' left to match in the input string. Then it > takes from the pattern

Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Peter Eisentraut
On 03.05.24 02:11, Robert Haas wrote: On Thu, May 2, 2024 at 9:38 AM Peter Eisentraut wrote: On 30.04.24 14:39, Daniel Verite wrote: postgres=# SELECT '.foo.' like '_oo' COLLATE ign_punct; ?column? -- f (1 row) The first two results look fine, but the next one is

Re: Support LIKE with nondeterministic collations

2024-05-02 Thread Robert Haas
On Thu, May 2, 2024 at 9:38 AM Peter Eisentraut wrote: > On 30.04.24 14:39, Daniel Verite wrote: > >postgres=# SELECT '.foo.' like '_oo' COLLATE ign_punct; > > ?column? > >-- > > f > >(1 row) > > > > The first two results look fine, but the next one is inconsistent. > >

Re: Support LIKE with nondeterministic collations

2024-05-02 Thread Peter Eisentraut
On 30.04.24 14:39, Daniel Verite wrote: postgres=# SELECT '.foo.' like '_oo' COLLATE ign_punct; ?column? -- f (1 row) The first two results look fine, but the next one is inconsistent. This is correct, because '_' means "any single character". This is independent of

Re: Support LIKE with nondeterministic collations

2024-04-30 Thread Daniel Verite
Peter Eisentraut wrote: > This patch adds support for using LIKE with nondeterministic > collations. So you can do things such as > > col LIKE 'foo%' COLLATE case_insensitive Nice! > The pattern is partitioned into substrings at wildcard characters > (so 'foo%bar' is partitioned i
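
The quoted description says the pattern is partitioned into substrings at wildcard characters. A rough sketch of that partitioning step follows; it is an illustration under assumptions, not code from the patch. The helper name split_like_pattern, the fixed-size buffers, and the handling of backslash as the escape character are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARTS 16
#define MAX_PART_LEN 64

/*
 * Hypothetical helper: split a LIKE pattern into its literal substrings at
 * unescaped '%' and '_' wildcards. A backslash makes the next character
 * literal. Empty substrings are dropped; substrings longer than
 * MAX_PART_LEN - 1 bytes are silently truncated in this sketch.
 * Returns the number of substrings stored in parts.
 */
static int
split_like_pattern(const char *pattern, char parts[][MAX_PART_LEN],
                   int max_parts)
{
    int nparts = 0;
    size_t len = 0;

    assert(max_parts > 0);

    for (const char *p = pattern;; p++)
    {
        int is_literal = 1;

        if (*p == '\\' && p[1] != '\0')
            p++;                /* escaped character is taken literally */
        else if (*p == '%' || *p == '_' || *p == '\0')
            is_literal = 0;     /* wildcard or end of pattern */

        if (is_literal)
        {
            if (len < MAX_PART_LEN - 1)
                parts[nparts][len++] = *p;
            continue;
        }

        /* close the current substring, if any */
        if (len > 0)
        {
            parts[nparts][len] = '\0';
            nparts++;
            len = 0;
        }
        if (*p == '\0' || nparts >= max_parts)
            break;
    }
    return nparts;
}
```

Called on 'foo%bar', this yields the two substrings 'foo' and 'bar'; each substring would then be compared against part of the input as a whole string under the collation, rather than byte by byte.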