Re: Support LIKE with nondeterministic collations

Peter Eisentraut Fri, 03 May 2024 11:54:07 -0700

On 03.05.24 16:58, Daniel Verite wrote:

>     * Generating bounds for a sort key (prefix matching)
>
>
>     Having sort keys for strings allows for easy creation of bounds -
>     sort keys that are guaranteed to be smaller or larger than any sort
>     key from a given range. For example, if bounds are produced for a
>     sortkey of string “smith”, strings between upper and lower bounds
>     with one level would include “Smith”, “SMITH”, “sMiTh”. Two kinds
>     of upper bounds can be generated - the first one will match only
>     strings of equal length, while the second one will match all the
>     strings with the same initial prefix.
>
>     CLDR 1.9/ICU 4.6 and later map U+FFFF to a collation element with
>     the maximum primary weight, so that for example the string
>     “smith\uFFFF” can be used as the upper bound rather than modifying
>     the sort key for “smith”.
>
> In other words it says that
>
>    col LIKE 'smith%' collate "nd"
>
> is equivalent to:
>
>    col >= 'smith' collate "nd" AND col < U&'smith\ffff' collate "nd"
>
> which could be obtained from an index scan, assuming a btree
> index on "col" collate "nd".
>
> U+FFFF is a valid code point but a "non-character" [1] so it's
> not supposed to be present in normal strings.


Thanks, this could be very useful!
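The following Python toy model makes the bound construction concrete. It is a sketch, not ICU or PostgreSQL code, and it rests on several hypothetical pieces:

- `primary_key()` stands in for a primary-strength ICU sort key. It compares case-folded code points and gives U+FFFF a weight above every other character, mirroring the CLDR 1.9 behavior quoted above.
- `like_prefix_bounds()` turns a pattern such as `smith%` into the pair (`smith`, `smith` + U+FFFF). It treats backslash as the escape character, as PostgreSQL's LIKE does by default.
- `in_range()` checks `col >= lower AND col < upper` using those keys.

```python
# Toy model of rewriting LIKE 'prefix%' into a range scan under a
# case-insensitive (primary-strength) collation.
# ASSUMPTION: primary_key() is a hypothetical stand-in for an ICU sort key;
# real ICU collation elements are far more complex.
from typing import List, Optional, Tuple

MAX_PRIMARY = 0x110000  # above every code point; models "maximum primary weight"


def primary_key(s: str) -> Tuple[int, ...]:
    """Hypothetical primary-strength key: case-folded code points,
    except U+FFFF, which gets MAX_PRIMARY (as CLDR 1.9+ does)."""
    weights: List[int] = []
    for ch in s:
        if ch == "\uffff":
            weights.append(MAX_PRIMARY)
        else:
            weights.extend(ord(c) for c in ch.casefold())
    return tuple(weights)


def like_prefix_bounds(pattern: str) -> Optional[Tuple[str, str]]:
    """Return (lower, upper) if `pattern` is a literal prefix followed only
    by '%' wildcards, else None. Backslash escapes the next character."""
    prefix: List[str] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            if i + 1 == len(pattern):
                return None  # dangling escape: malformed pattern
            prefix.append(pattern[i + 1])  # escaped char is a literal
            i += 2
            continue
        if ch == "_":
            return None  # single-char wildcard: not a pure prefix pattern
        if ch == "%":
            if all(c == "%" for c in pattern[i:]):
                break  # only trailing '%' remain: prefix match
            return None  # '%' followed by more literals
        prefix.append(ch)
        i += 1
    else:
        return None  # no '%' at all: equality test, not a prefix match
    p = "".join(prefix)
    return p, p + "\uffff"  # upper bound = prefix + U+FFFF


def in_range(s: str, lower: str, upper: str) -> bool:
    """col >= lower AND col < upper, compared by the toy primary key."""
    return primary_key(lower) <= primary_key(s) < primary_key(upper)


if __name__ == "__main__":
    lo, hi = like_prefix_bounds("smith%")
    for s in ["smith", "Smith", "sMiTh", "smithers", "smiti", "jones"]:
        print(repr(s), in_range(s, lo, hi))
```

The model also shows a corner case. A value consisting of `smith`, then U+FFFF, then more characters would match `LIKE 'smith%'` but falls outside the range. This is why it matters that U+FFFF is a non-character that should not occur in normal strings.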
