On Thu, Jun 27, 2024 at 11:31 PM Peter Eisentraut <pe...@eisentraut.org> wrote:
> Here is an updated patch for this.

I took a look at this. I added some tests and found a few that give the
wrong result (I believe). The new tests are included in the attached
patch, along with the results I expect. Here are the failures:

 -- inner %% matches b then zero:
 SELECT U&'cb\0061\0308' LIKE U&'c%%\00E4' COLLATE ignore_accents;
  ?column?
 ----------
- t
+ f
 (1 row)

 -- trailing _ matches two codepoints that form one char:
 SELECT U&'cb\0061\0308' LIKE U&'cb_' COLLATE ignore_accents;
  ?column?
 ----------
- t
+ f
 (1 row)

 -- leading % matches zero:
 SELECT U&'\0061\0308bc' LIKE U&'%\00E4bc' COLLATE ignore_accents;
  ?column?
 ----------
- t
+ f
 (1 row)

 -- leading % matches zero (with later %):
 SELECT U&'\0061\0308bc' LIKE U&'%\00E4%c' COLLATE ignore_accents;
  ?column?
 ----------
- t
+ f
 (1 row)

I think the 1st, 3rd, and 4th failures are all from % not backtracking
to match zero chars. The 2nd failure I'm not sure about. Maybe my
expectation is wrong, but then why does the same test pass with __
leading not trailing? Surely they should be consistent.

> I have added some more documentation based on the discussions, including
> some examples taken directly from the emails here.

This looks good to me.

> One thing I have been struggling with a bit is the correct use of
> LIKE_FALSE versus LIKE_ABORT in the MatchText() code. I have made some
> small tweaks about this in this version that I think are more correct,
> but it could use another look. Maybe also some more tests to verify
> this one way or the other.

I haven't looked at this yet.

Yours,

-- 
Paul ~{:-)
p...@illuminatedcomputing.com
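
To show the zero-length case from the 1st, 3rd, and 4th failures in
isolation, here is a minimal Python sketch. It is not the patch's
MatchText() logic, only a toy model of the behavior I expect. The names
key() and like() are made up for illustration, and the accent stripping
in key() is just an assumed stand-in for the ignore_accents collation,
not what ICU actually does. It handles literals and % only (no _ and no
escapes).

```python
import unicodedata


def key(s: str) -> str:
    """Hypothetical stand-in for an accent-insensitive comparison key:
    decompose to NFD and drop combining marks."""
    return "".join(
        ch for ch in unicodedata.normalize("NFD", s)
        if not unicodedata.combining(ch)
    )


def like(text: str, pattern: str) -> bool:
    """Toy LIKE: supports literal runs and % only."""
    if not pattern:
        return text == ""
    if pattern[0] == "%":
        rest = pattern.lstrip("%")  # consecutive % behave like one %
        # Try every split point, starting with % matching zero characters.
        return any(like(text[i:], rest) for i in range(len(text) + 1))
    # Literal run up to the next % (or the end of the pattern).
    end = pattern.find("%")
    lit, rest = (pattern, "") if end == -1 else (pattern[:end], pattern[end:])
    # Under a nondeterministic comparison the literal may correspond to a
    # text prefix of a different length, so try every prefix.
    return any(
        key(text[:i]) == key(lit) and like(text[i:], rest)
        for i in range(len(text) + 1)
    )
```

In this model all three of those patterns match, because the loop over
split points includes i = 0, i.e. % consuming nothing. If that iteration
were skipped, they would fail the same way as in the diffs above.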
v3-0001-Support-LIKE-with-nondeterministic-collations.patch
Description: Binary data