On Thu, Aug 8, 2019 at 12:55 AM Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
> On Wed, Aug 7, 2019 at 4:11 PM Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > On Wed, Aug 7, 2019 at 2:25 PM Markus Winand <markus.win...@winand.at>
> > wrote:
> > > I was playing around with JSON path quite a bit and might have found one
> > > case where the current implementation doesn’t follow the standard.
> > >
> > > The functionality in question is the comparison operators other than ==.
> > > They use the database default collation rather than the standard-mandated
> > > “Unicode codepoint collation” (SQL-2:2016 9.39 General Rule 12 c iii 2 D,
> > > last sentence in first paragraph).
> >
> > Thank you for pointing this out! Nikita is about to write a patch fixing that.
>
> Please see the attached patch.
>
> Our idea is not to sacrifice "==" operator performance for standard
> conformance. So "==" remains a per-byte comparison. For consistency,
> the other operators compare code points first, then fall back to a
> per-byte comparison. In some edge cases, when the same Unicode code points
> have different binary representations in the database encoding, this
> behavior diverges from the standard. In the future we can implement strict
> standard conformance by normalizing input JSON strings.
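To illustrate the ordering described above, here is a minimal sketch in Python. It is not the patch's C code. The names compare_codepoints and compare_strings are hypothetical, and the sketch assumes the database encoding is UTF-8. It short-circuits on byte equality, then compares code points, and only then falls back to a per-byte tie-break.

```python
def compare_codepoints(a: str, b: str) -> int:
    """Return -1, 0 or 1, ordering a and b by Unicode code point."""
    for x, y in zip(a, b):
        cx, cy = ord(x), ord(y)
        if cx != cy:
            return -1 if cx < cy else 1
    # Common prefix is equal: the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_strings(a: bytes, b: bytes, encoding: str = "utf-8") -> int:
    """Hypothetical helper: code points first, then a per-byte tie-break."""
    if a == b:
        # Cheap per-byte equality, as kept for the "==" operator.
        return 0
    result = compare_codepoints(a.decode(encoding), b.decode(encoding))
    if result != 0:
        return result
    # Same code points but different bytes: fall back to byte order.
    # With Python's strict UTF-8 decoder this branch should not be reached.
    return (a > b) - (a < b)
```

For example, compare_strings(b"Z", b"a") is negative because U+005A precedes U+0061, whereas many linguistic collations sort "a" before "Z".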

The previous version of the patch had a buggy implementation of
compareStrings(). A revised version is attached.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-Use-Unicode-codepoint-collation-in-jsonpath-3.patch
Description: Binary data