On Wed, Aug 7, 2019 at 4:11 PM Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> On Wed, Aug 7, 2019 at 2:25 PM Markus Winand <markus.win...@winand.at> wrote:
> > I was playing around with JSON path quite a bit and might have found one
> > case where the current implementation doesn’t follow the standard.
> >
> > The functionality in question are the comparison operators except ==. They
> > use the database default collation rather then the standard-mandated
> > "Unicode codepoint collation" (SQL-2:2016 9.39 General Rule 12 c iii 2 D,
> > last sentence in first paragraph).
>
> Thank you for pointing! Nikita is about to write a patch fixing that.

Please see the attached patch.  Our idea is not to sacrifice the
performance of the "==" operator for standard conformance, so "=="
remains a per-byte comparison.  For consistency, the other comparison
operators compare code points first and then fall back to a per-byte
comparison.  In some edge cases, when the same Unicode code points have
different binary representations in the database encoding, this
behavior diverges from the standard.  In the future we can implement
strict standard conformance by normalizing the input JSON strings.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

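For illustration only, here is a rough Python sketch of the comparison order described above. It is not the patch itself (that is C code inside PostgreSQL); the function names `jsonpath_equal` and `jsonpath_compare` and the `encoding` parameter are hypothetical and exist only for this example. Strings are passed as raw bytes in the assumed database encoding.

```python
def jsonpath_equal(a: bytes, b: bytes) -> bool:
    """Sketch of "==": a plain per-byte comparison, no decoding at all."""
    return a == b


def jsonpath_compare(a: bytes, b: bytes, encoding: str = "utf-8") -> int:
    """Sketch of the ordering operators (<, <=, >, >=).

    Returns -1, 0, or 1. `encoding` stands in for the database
    encoding and is an assumption of this example.
    """
    # Step 1: compare as sequences of Unicode code points.
    cps_a = [ord(ch) for ch in a.decode(encoding)]
    cps_b = [ord(ch) for ch in b.decode(encoding)]
    if cps_a != cps_b:
        return -1 if cps_a < cps_b else 1

    # Step 2: code points are identical, so fall back to the raw bytes.
    # This keeps the ordering consistent with the per-byte "==" above:
    # the result is 0 only when the byte strings are identical.
    if a != b:
        return -1 if a < b else 1
    return 0


if __name__ == "__main__":
    # Code point order puts "B" (U+0042) before "a" (U+0061),
    # unlike many locale-aware collations.
    print(jsonpath_compare("B".encode(), "a".encode()))
    print(jsonpath_compare("é".encode(), "z".encode()))
```

With UTF-8 the second step never changes the result, because a given code point sequence has exactly one valid encoding. It only matters for encodings in which the same code points can have more than one byte representation, which is the edge case mentioned above.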
0001-Use-Unicode-codepoint-collation-in-jsonpath-2.patch