Re: insensitive collations

Andreas Karlsson Mon, 14 Jan 2019 06:37:43 -0800

On 1/10/19 8:44 AM, Peter Eisentraut wrote:

On 09/01/2019 19:49, Andreas Karlsson wrote:

Maybe this is orthogonal and best handled elsewhere, but have you given
any thought to Unicode normalization forms[1] when working with string
equality?


Nondeterministic collations do address this by allowing canonically
equivalent code point sequences to compare as equal.  You still need a
collation implementation that actually does compare them as equal; ICU
does this, glibc does not AFAICT.
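
To make "canonically equivalent code point sequences" concrete, here is a minimal sketch in Python. It uses the standard unicodedata module purely as an illustration; it is not how ICU or PostgreSQL implement the comparison, and canonically_equal is a hypothetical helper name.

```python
import unicodedata

# Two canonically equivalent spellings of "é".
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# A plain byte comparison treats them as different strings.
bytewise_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare after normalizing both sides to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(bytewise_equal)                              # False
print(canonically_equal(precomposed, decomposed))  # True
```

A deterministic comparison behaves like the byte comparison here, while a nondeterministic collation backed by a suitable implementation would behave like the second check.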


Ah, right! You could use -ks-identic[1] for this.

Would there be any point in adding Unicode normalization support to
the collation system, or is this best handled, for example, with a
function run on INSERT, or with something else entirely?


I think there might be value in a feature that normalizes strings as
they enter the database, as a component of the encoding conversion
infrastructure.  But that would be a separate feature.

Agreed. And if we ever implement this we could theoretically optimize the equality of -ks-identic to do a strcmp() rather than having to collate anything.
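
A rough sketch of that idea, in Python rather than C, and only under the assumption that every string is normalized to one form (NFC here) before it is stored. normalize_on_input and stored_equal are hypothetical names, and the byte comparison merely stands in for strcmp():

```python
import unicodedata


def normalize_on_input(text: str) -> bytes:
    """Hypothetical ingest step: normalize to NFC, then encode as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def stored_equal(a: bytes, b: bytes) -> bool:
    """Plain byte equality, standing in for a strcmp()-style comparison."""
    return a == b


stored_a = normalize_on_input("caf\u00e9")    # precomposed é
stored_b = normalize_on_input("cafe\u0301")   # e + combining acute accent
stored_c = normalize_on_input("cafe")         # no accent at all

print(stored_equal(stored_a, stored_b))  # True
print(stored_equal(stored_a, stored_c))  # False
```

The shortcut only holds if nothing can get into storage without passing through the normalization step.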

I think it could also be useful to just add functions which can normalize strings, which was in a proposal to the SQL standard which was not accepted.[2]


Notes

1. http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options
2. https://dev.mysql.com/worklog/task/?id=2048

Andreas
