Hello Benson,

On 11/10/2021 at 19:56, Benson Muite wrote:
When comparing strings using C++, the default behavior is to order by
UTF8 codepoints which impacts comparing strings such as a < b < c
[1][2].  This may not be appropriate in all cases and like in the sort
function [3], it may be helpful to have an optional  field for
comparison keys.

It's certainly not appropriate except in the most rudimentary use cases (for example, if keys are ASCII-only). Ideally, we should implement the official Unicode Collation Algorithm; however, that is a non-trivial endeavour. See the existing issue at https://issues.apache.org/jira/browse/ARROW-12046
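
To make the problem concrete, here is a minimal sketch in plain C++ (not Arrow code; `SortByCodeUnits` is a hypothetical helper name used only for illustration). It sorts UTF-8 strings with the default `std::string` comparison, which compares bytes and therefore, for valid UTF-8, orders by code point:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts UTF-8 encoded keys using the default
// std::string ordering, i.e. a byte-wise (unsigned) comparison.
// For valid UTF-8 this is the same as ordering by code point.
std::vector<std::string> SortByCodeUnits(std::vector<std::string> keys) {
  std::sort(keys.begin(), keys.end());
  return keys;
}

// Example input; "\xC3\xA9" is the UTF-8 encoding of U+00E9 (é).
const std::vector<std::string> kExampleKeys = {
    "zebra", "\xC3\xA9t\xC3\xA9", "apple", "Zoo"};
```

With these keys, the result is "Zoo", "apple", "zebra", "été": uppercase sorts before all lowercase letters, and the accented word lands after "zebra", which is not what most human readers would expect. A proper collation would most likely have to go through a dedicated library rather than the default comparison.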

Regards

Antoine.