[
https://issues.apache.org/jira/browse/LUCENE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170994#comment-15170994
]
Michael McCandless commented on LUCENE-7052:
--------------------------------------------
bq. Mike, why did you add an implementation of codePoints() instead of using
the CharSequence version (returning IntStream) + toArray()?
Oh, because I didn't even know about {{CharSequence.codePoints}}!
+1 to your patch, thanks.
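For reference, a minimal sketch of the {{CharSequence.codePoints()}} + {{toArray()}} approach mentioned above. The class and method names ({{CodePointsDemo}}, {{toCodePoints}}) are made up for illustration and are not taken from the patch:

```java
import java.util.Arrays;

// Hypothetical demo class, not part of Lucene or the patch.
class CodePointsDemo {
    // Converts a string to its Unicode code points via the JDK's
    // CharSequence.codePoints(), which returns an IntStream (Java 8+).
    static int[] toCodePoints(CharSequence s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which UTF-16 encodes as a surrogate pair.
        String s = "a\uD83D\uDE00";
        System.out.println("UTF-16 code units: " + s.length());
        System.out.println("code points: " + Arrays.toString(toCodePoints(s)));
    }
}
```

The surrogate pair collapses into a single int, so the resulting array can be shorter than {{String.length()}}.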
bq. Neither is actually pretty as the treeset invokes a comparator multiple
times for the same string, causing multiple identical string-int[] conversions
along the way. This is test-method only though, so it doesn't matter much.
It's definitely inefficient (converting to a sortable key on every comparison),
but it keeps the code simple, which I think is usually the right tradeoff for a
test case?
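To illustrate the tradeoff, here is a rough sketch of a comparator that converts both strings to {{int[]}} on every call and is then used by a {{TreeSet}}. It is an assumption about the general shape of such test code, not the actual test from the patch, and all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class, not the actual Lucene test code.
class CodePointOrderDemo {
    // Compares two strings in Unicode code point order. Both strings are
    // converted to int[] on every call, which is the inefficiency discussed
    // above: a TreeSet calls the comparator several times per inserted element.
    static int compareCodePoints(String a, String b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        // One is a prefix of the other: the shorter string sorts first.
        return Integer.compare(x.length, y.length);
    }

    // Returns the distinct input strings sorted in code point order.
    static List<String> sortByCodePoint(Collection<String> in) {
        TreeSet<String> set = new TreeSet<>(CodePointOrderDemo::compareCodePoints);
        set.addAll(in);
        return new ArrayList<>(set);
    }
}
```

Caching the converted keys would avoid the repeated work, at the cost of more code in the test.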
bq. As this is now all gone, I'd suggest also removing the utf8AsUtf16
comparator. Maybe remove the comparators altogether and just implement
BytesRef.compareTo() and use that one for sorting?
+1, that sounds awesome!
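A single natural ordering is enough because comparing UTF-8 bytes as unsigned values gives the same order as comparing Unicode code points. The sketch below shows that property on plain {{byte[]}} arrays. It is not the {{BytesRef.compareTo()}} implementation, and the names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class using plain byte[] instead of Lucene's BytesRef.
class Utf8OrderDemo {
    // Lexicographic comparison treating each byte as unsigned (0..255).
    // For valid UTF-8 input this matches Unicode code point order.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        // One is a prefix of the other: the shorter array sorts first.
        return a.length - b.length;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ligature = "\uFB01";     // U+FB01, encoded as EF AC 81
        String emoji = "\uD83D\uDE00";  // U+1F600, encoded as F0 9F 98 80
        // UTF-16 order (String.compareTo) and UTF-8 byte order disagree here.
        System.out.println("UTF-16 order: " + ligature.compareTo(emoji));
        System.out.println("UTF-8 order:  " + compareUnsigned(utf8(ligature), utf8(emoji)));
    }
}
```

Masking with {{0xFF}} matters because Java bytes are signed. Without it, any byte at or above 0x80 would sort before ASCII.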
> BytesRefHash.sort should always sort in unicode code point order
> ----------------------------------------------------------------
>
> Key: LUCENE-7052
> URL: https://issues.apache.org/jira/browse/LUCENE-7052
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: master, 6.0
>
> Attachments: LUCENE-7052-cleanup1.patch, LUCENE-7052.patch
>
>
> Today {{BytesRefHash.sort}} takes a custom {{Comparator}} but we always pass
> it {{BytesRef.getUTF8SortedAsUnicodeComparator()}}.