On 20.11.24 08:29, jian he wrote:
in match_pattern_prefix maybe change
if (expr_coll && !get_collation_isdeterministic(expr_coll))
return NIL;
to
if (OidIsValid(expr_coll) && !get_collation_isdeterministic(expr_coll))
return NIL;
I left it like it was, because this w
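For readers unfamiliar with the two spellings, here is a minimal sketch of why they test the same thing. The typedef and macros below are local stand-ins that mirror the PostgreSQL definitions rather than the real headers, and the three helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins mirroring the PostgreSQL definitions (assumption: not the real headers). */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)
#define OidIsValid(objectId) ((bool) ((objectId) != InvalidOid))

/* Hypothetical helper: the implicit truth test, as in "if (expr_coll && ...)". */
static bool
implicit_test(Oid coll)
{
    return coll ? true : false;
}

/* Hypothetical helper: the explicit spelling, as in "if (OidIsValid(expr_coll) && ...)". */
static bool
explicit_test(Oid coll)
{
    return OidIsValid(coll);
}

/* Hypothetical helper: asserts that both spellings agree for a given OID. */
static void
check_same(Oid coll)
{
    assert(implicit_test(coll) == explicit_test(coll));
}
```

Because InvalidOid is zero, the two forms differ only in readability, so the choice between them is stylistic.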
On Tue, Nov 19, 2024 at 9:51 PM Peter Eisentraut wrote:
>
> On 18.11.24 04:30, jian he wrote:
> > we can optimize when trailing (last character) is not wildcards.
> >
> > SELECT 'Ha12foo' LIKE '%foo' COLLATE ignore_accents;
> > within the for loop
> > for(;;)
> > {
> > int cmp;
> > CHE
pattern ..." in this patch. Please check if
this is what you had in mind.
From e9252c1c8ec60e7c4813490908d6ceb575840420 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Tue, 19 Nov 2024 14:47:39 +0100
Subject: [PATCH v8] Support LIKE with nondeterministic collations
MIME-Version: 1
On Fri, Nov 15, 2024 at 11:42 PM Peter Eisentraut wrote:
>
> On 15.11.24 05:26, jian he wrote:
> > /*
> > * Now build a substring of the text and try to match it against
> > * the subpattern. t is the start of the text, t1 is one past the
> > * last byte. We start with a zero-length string.
> >
On 15.11.24 05:26, jian he wrote:
/*
* Now build a substring of the text and try to match it against
* the subpattern. t is the start of the text, t1 is one past the
* last byte. We start with a zero-length string.
*/
t1 = t;
t1len = tlen;
for (;;)
{
int cmp;
CHECK_FOR_INTERRUPTS();
cmp = pg_str
On Tue, Nov 12, 2024 at 3:45 PM Peter Eisentraut wrote:
>
> On 11.11.24 14:25, Heikki Linnakangas wrote:
> > Sadly the algorithm is O(n^2) with non-deterministic collations. Is there
> > any way this could be optimized? We make no claims on how expensive any
> > functions or operators are, so I sup
new patch version with an interrupt check.
From bc962a3ad09f9ddaa6e9bc345376f26d16471612 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Tue, 12 Nov 2024 08:43:27 +0100
Subject: [PATCH v7] Support LIKE with nondeterministic collations
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
On 04/11/2024 10:26, Peter Eisentraut wrote:
On 29.10.24 18:15, Jacob Champion wrote:
libfuzzer is unhappy about the following code in MatchText:
+ while (p1len > 0)
+ {
+ if (*p1 == '\\')
+ {
+ found_escape = true;
+
walk off the end of the
buffer. (I fixed it locally by duplicating the ERROR case that's
directly above this.)
Thanks. Here is an updated patch with that fixed.
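The failure mode reported above, a pattern fragment that ends in a lone backslash, can be illustrated with a small self-contained sketch. This is not the patch's code: unescape_fragment is a hypothetical helper, and it returns -1 where the real code would raise an ERROR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the patch: copy the pattern
 * fragment p[0..plen) into out, dropping each backslash and keeping the
 * byte it escapes.  Returns the number of bytes written, or -1 if the
 * fragment ends with a lone backslash.  out must have room for at least
 * plen bytes.
 */
static int
unescape_fragment(const char *p, size_t plen, char *out, bool *found_escape)
{
    int         n = 0;

    assert(p != NULL && out != NULL && found_escape != NULL);
    *found_escape = false;

    while (plen > 0)
    {
        if (*p == '\\')
        {
            *found_escape = true;
            p++;
            plen--;
            /* Without this check, the read below would run past the buffer. */
            if (plen == 0)
                return -1;
        }
        out[n++] = *p;
        p++;
        plen--;
    }
    return n;
}
```

The point of the sketch is only the bounds check after skipping the backslash; how the error is reported is up to the caller.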
From 17f793293469bae8d5818edee62f0429fd02df67 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Mon, 4 Nov 2024 09:1
On Sun, Sep 15, 2024 at 11:26 PM Peter Eisentraut wrote:
>
> Here is an updated patch. It is rebased over the various recent changes
> in the locale APIs. No other changes.
libfuzzer is unhappy about the following code in MatchText:
> +while (p1len > 0)
> +{
> +
' COLLATE ignore_accents;
So the accent character will be ignored if it's adjacent to another
fixed substring in the pattern.
From d27d247ad824dab5e2bb3638cebef21da4cc925c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Mon, 16 Sep 2024 07:57:56 +0200
Subject: [PATCH v5] Support
Jeff Davis wrote:
> > col LIKE 'smith%' collate "nd"
> >
> > is equivalent to:
> >
> > col >= 'smith' collate "nd" AND col < U&'smith\' collate "nd"
>
> That logic seems to assume something about the collation. If you have a
> collation that orders strings by their sha256 hash,
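To make the assumption concrete, here is a minimal sketch of the prefix-to-range rewrite under plain byte-wise ordering (strcmp), which is the kind of ordering for which the equivalence holds. The helper names are hypothetical, and the upper bound used here (the prefix with its last byte incremented) is a simplification chosen for the sketch, not the bound quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: direct prefix test, the meaning of LIKE 'prefix%'. */
static bool
has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical helper: range test prefix <= s < upper under byte-wise
 * ordering, where upper is the prefix with its last byte incremented.
 * Assumes a non-empty prefix shorter than 64 bytes whose last byte is
 * not 0xFF.
 */
static bool
in_prefix_range(const char *s, const char *prefix)
{
    char        upper[64];
    size_t      len = strlen(prefix);

    assert(len > 0 && len < sizeof(upper));
    assert((unsigned char) prefix[len - 1] < 0xFF);

    memcpy(upper, prefix, len + 1);
    upper[len - 1] = (char) ((unsigned char) prefix[len - 1] + 1);

    return strcmp(s, prefix) >= 0 && strcmp(s, upper) < 0;
}
```

The two functions agree only because byte-wise ordering sorts every string with a given prefix into one contiguous range. An ordering without that property, such as the hash-based one mentioned above, would break the rewrite.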
On Fri, 2024-05-03 at 16:58 +0200, Daniel Verite wrote:
> * Generating bounds for a sort key (prefix matching)
>
> Having sort keys for strings allows for easy creation of bounds -
> sort keys that are guaranteed to be smaller or larger than any
> sort
> key from a give range. For exam
IKE U&'_bc' COLLATE ignore_accents; -- true
The second one matches because
SELECT U&'\0308bc' = 'bc' COLLATE ignore_accents;
So the accent character will be ignored if it's adjacent to another
fixed substring in the pattern.
From e9cc011946b714ce3d4f043
On Thu, Jun 27, 2024 at 11:31 PM Peter Eisentraut wrote:
> Here is an updated patch for this.
I took a look at this. I added some tests and found a few that give
the wrong result (I believe). The new tests are included in the
attached patch, along with the results I expect. Here are the
failures:
' like '_oo' COLLATE ign_punct;
?column?
--
f
(1 row)
The first two results look fine, but the next one is inconsistent.
From 34f5bb1e8f0ffbb39b1efc9777736f6b4d6c4caa Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 28 Jun 2024 06:55:45 +0200
Subject: [PATCH
On 03.05.24 17:47, Daniel Verite wrote:
Peter Eisentraut wrote:
However, off the top of my head, this definition has three flaws: (1)
It would make the single-character wildcard effectively an
any-number-of-characters wildcard, but only in some circumstances, which
could be confusing,
On 03.05.24 16:58, Daniel Verite wrote:
* Generating bounds for a sort key (prefix matching)
Having sort keys for strings allows for easy creation of bounds -
sort keys that are guaranteed to be smaller or larger than any sort
key from a give range. For example, if bounds are pro
Peter Eisentraut wrote:
> However, off the top of my head, this definition has three flaws: (1)
> It would make the single-character wildcard effectively an
> any-number-of-characters wildcard, but only in some circumstances, which
> could be confusing, (2) it would be difficult to com
Peter Eisentraut wrote:
> Yes, certainly, and there is also no indexing support (other than for
> exact matches).
The ICU docs have this note about prefix matching:
https://unicode-org.github.io/icu/userguide/collation/architecture.html#generating-bounds-for-a-sort-key-prefix-matching
On 03.05.24 15:20, Robert Haas wrote:
On Fri, May 3, 2024 at 4:52 AM Peter Eisentraut wrote:
What the implementation does is, it walks through the pattern. It sees
'_', so it steps over one character in the input string, which is '.'
here. Then we have 'foo.' left to match in the input string
On Fri, May 3, 2024 at 4:52 AM Peter Eisentraut wrote:
> What the implementation does is, it walks through the pattern. It sees
> '_', so it steps over one character in the input string, which is '.'
> here. Then we have 'foo.' left to match in the input string. Then it
> takes from the pattern
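The walk-through above can be approximated with a toy matcher. This is a sketch under stated assumptions, not the patch's MatchText: it handles single-byte characters only, has no escape support, and replaces the real collation with a hypothetical comparison, eq_ign_punct, that ignores '.' bytes. Literal runs of the pattern are compared against every candidate substring of the text, which is where the quadratic cost discussed elsewhere in the thread comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a nondeterministic collation: equality that ignores '.' bytes. */
static bool
eq_ign_punct(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      i = 0, j = 0;

    for (;;)
    {
        while (i < alen && a[i] == '.')
            i++;
        while (j < blen && b[j] == '.')
            j++;
        if (i == alen || j == blen)
            return i == alen && j == blen;
        if (a[i] != b[j])
            return false;
        i++;
        j++;
    }
}

/* Toy LIKE: '%' and '_' are wildcards, everything else is matched by substring. */
static bool
like_match(const char *t, size_t tlen, const char *p, size_t plen)
{
    size_t      litlen = 0;

    assert(t != NULL && p != NULL);

    if (plen == 0)
        return tlen == 0;       /* simplification chosen for this sketch */

    if (p[0] == '%')
    {
        /* '%' may absorb any number of leading text bytes. */
        for (size_t k = 0; k <= tlen; k++)
            if (like_match(t + k, tlen - k, p + 1, plen - 1))
                return true;
        return false;
    }

    if (p[0] == '_')
    {
        /* '_' steps over exactly one text byte, whatever it is. */
        return tlen > 0 && like_match(t + 1, tlen - 1, p + 1, plen - 1);
    }

    /* Literal run up to the next wildcard: try every prefix of the text. */
    while (litlen < plen && p[litlen] != '%' && p[litlen] != '_')
        litlen++;
    for (size_t k = 0; k <= tlen; k++)
        if (eq_ign_punct(t, k, p, litlen) &&
            like_match(t + k, tlen - k, p + litlen, plen - litlen))
            return true;
    return false;
}
```

With this toy, like_match(".foo.", 5, "_oo", 3) is false: '_' consumes the leading '.', and the remaining "foo." is not equal to "oo" even when punctuation is ignored. That is consistent with the result shown in the thread for '.foo.' like '_oo'.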
On 03.05.24 02:11, Robert Haas wrote:
On Thu, May 2, 2024 at 9:38 AM Peter Eisentraut wrote:
On 30.04.24 14:39, Daniel Verite wrote:
postgres=# SELECT '.foo.' like '_oo' COLLATE ign_punct;
?column?
--
f
(1 row)
The first two results look fine, but the next one is
On Thu, May 2, 2024 at 9:38 AM Peter Eisentraut wrote:
> On 30.04.24 14:39, Daniel Verite wrote:
> >postgres=# SELECT '.foo.' like '_oo' COLLATE ign_punct;
> > ?column?
> >--
> > f
> >(1 row)
> >
> > The first two results look fine, but the next one is inconsistent.
>
>
On 30.04.24 14:39, Daniel Verite wrote:
postgres=# SELECT '.foo.' like '_oo' COLLATE ign_punct;
?column?
--
f
(1 row)
The first two results look fine, but the next one is inconsistent.
This is correct, because '_' means "any single character". This is
independent of
Peter Eisentraut wrote:
> This patch adds support for using LIKE with nondeterministic
> collations. So you can do things such as
>
> col LIKE 'foo%' COLLATE case_insensitive
Nice!
> The pattern is partitioned into substrings at wildcard characters
> (so 'foo%bar' is partitioned i
this by matching by character,
but for nondeterministic collations we have to go by substring.
From 3f6b584a0f34cabecac69f3cfd663dadfd34f464 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Mon, 29 Apr 2024 07:58:20 +0200
Subject: [PATCH v1] Support LIKE with nondeterministic collations
T