[jira] [Commented] (LUCENE-7052) BytesRefHash.sort should always sort in unicode code point order

Michael McCandless (JIRA) Sun, 28 Feb 2016 02:40:48 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170994#comment-15170994
 ]


Michael McCandless commented on LUCENE-7052:
--------------------------------------------

bq. Mike, why did you add an implementation of codePoints() instead of using 
the CharSequence version (returning IntStream) + toArray()? 

Oh, because I didn't even know about {{CharSequence.codePoints}}!

+1 to your patch, thanks.
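For reference, the {{CharSequence.codePoints}} approach looks roughly like the sketch below. Each string is decoded into an {{int[]}} of code points with {{codePoints().toArray()}}, and the two arrays are compared element by element. This is a minimal illustration, not the code from either patch. The class name {{CodePointOrder}} is hypothetical.

```java
import java.util.Comparator;

// Hypothetical helper (not from the patch): orders strings by Unicode code
// point, using CharSequence.codePoints() (Java 8+) to decode surrogate pairs.
final class CodePointOrder {
  static final Comparator<String> COMPARATOR = (a, b) -> {
    // Decode each string into its code points; this runs on every call.
    int[] x = a.codePoints().toArray();
    int[] y = b.codePoints().toArray();
    int n = Math.min(x.length, y.length);
    for (int i = 0; i < n; i++) {
      if (x[i] != y[i]) {
        // Code points are non-negative, so a plain int comparison is correct.
        return Integer.compare(x[i], y[i]);
      }
    }
    // If one string is a prefix of the other, the shorter one sorts first.
    return Integer.compare(x.length, y.length);
  };

  private CodePointOrder() {}
}
```

Passed to {{new TreeSet<>(CodePointOrder.COMPARATOR)}}, this gives code point order, and it still pays the per-comparison decoding cost.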

bq. Neither is actually pretty, as the TreeSet invokes the comparator multiple 
times for the same string, causing repeated identical String-to-int[] conversions 
along the way. This is test-only code though, so it doesn't matter much.

It's definitely inefficient (converting to a sortable key on every comparison), 
but it keeps the code simple, which I think is usually the right tradeoff for a 
test case?
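For comparison, the more efficient alternative computes each string's code point key once up front and then sorts on the cached keys. A rough sketch follows. The names {{PrecomputedKeySort}} and {{sortByCodePoint}} are hypothetical. Also, unlike a {{TreeSet}}, sorting a list this way keeps duplicates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: sort strings in code point order while decoding each
// string only once, instead of once per comparison as a TreeSet comparator would.
final class PrecomputedKeySort {
  // Pairs a string with its code points, computed a single time.
  private static final class Keyed {
    final String value;
    final int[] key;

    Keyed(String value) {
      this.value = value;
      this.key = value.codePoints().toArray();
    }
  }

  // Lexicographic comparison of code point arrays.
  private static int compareKeys(int[] x, int[] y) {
    int n = Math.min(x.length, y.length);
    for (int i = 0; i < n; i++) {
      if (x[i] != y[i]) {
        return Integer.compare(x[i], y[i]);
      }
    }
    return Integer.compare(x.length, y.length);
  }

  static List<String> sortByCodePoint(List<String> input) {
    List<Keyed> keyed = new ArrayList<>(input.size());
    for (String s : input) {
      keyed.add(new Keyed(s)); // one decode per string
    }
    keyed.sort((a, b) -> compareKeys(a.key, b.key)); // comparisons reuse cached keys
    List<String> out = new ArrayList<>(keyed.size());
    for (Keyed k : keyed) {
      out.add(k.value);
    }
    return out;
  }

  private PrecomputedKeySort() {}
}
```

This version needs more code for a saving that only matters on large inputs. That is the tradeoff being weighed here for test code.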

bq. As this is now all gone, I'd suggest also removing the utf8AsUtf16 
comparator. Maybe remove the comparators altogether and just implement 
BytesRef.compareTo() and use that one for sorting?

+1, that sounds awesome!
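A single {{compareTo()}} is enough because comparing UTF-8 bytes lexicographically as unsigned values gives the same order as comparing the decoded code points. Below is a minimal sketch. It uses a hypothetical {{ByteSlice}} class rather than Lucene's real {{BytesRef}}, and its field names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a byte slice; NOT Lucene's actual BytesRef code.
final class ByteSlice implements Comparable<ByteSlice> {
  final byte[] bytes;
  final int offset;
  final int length;

  ByteSlice(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  static ByteSlice ofUtf8(String s) {
    byte[] b = s.getBytes(StandardCharsets.UTF_8);
    return new ByteSlice(b, 0, b.length);
  }

  // Lexicographic comparison of unsigned bytes. For valid UTF-8 this matches
  // Unicode code point order.
  @Override
  public int compareTo(ByteSlice other) {
    int n = Math.min(length, other.length);
    for (int i = 0; i < n; i++) {
      // Java bytes are signed; mask to 0..255 before comparing.
      int a = bytes[offset + i] & 0xFF;
      int b = other.bytes[other.offset + i] & 0xFF;
      if (a != b) {
        return Integer.compare(a, b);
      }
    }
    // A proper prefix sorts first.
    return Integer.compare(length, other.length);
  }
}
```

The {{& 0xFF}} mask matters. Without it, a lead byte such as 0xC3 would be negative and would sort before ASCII. UTF-16 code unit order ({{String.compareTo}}) disagrees with this order for supplementary characters. For example, U+FFFD sorts after U+1F600 in UTF-16 but before it here. A UTF-8-as-UTF-16 comparator would have to emulate that other order.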

> BytesRefHash.sort should always sort in unicode code point order
> ----------------------------------------------------------------
>
>                 Key: LUCENE-7052
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7052
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master, 6.0
>
>         Attachments: LUCENE-7052-cleanup1.patch, LUCENE-7052.patch
>
>
> Today {{BytesRefHash.sort}} takes a custom {{Comparator}} but we always pass 
> it {{BytesRef.getUTF8SortedAsUnicodeComparator()}}.
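Once the order is fixed, the sort no longer needs a {{Comparator}} argument at all. Below is a hypothetical sketch of such an API over plain {{byte[]}} terms. The names {{FixedOrderSort}}, {{sortTerms}} and {{sortUtf8}} are made up, and {{Arrays.compareUnsigned}} requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of a sort with no Comparator parameter: the order is
// fixed to unsigned byte order, which for UTF-8 terms is code point order.
// Arrays.compareUnsigned(byte[], byte[]) requires Java 9+.
final class FixedOrderSort {
  private static final Comparator<byte[]> UNSIGNED_BYTES = Arrays::compareUnsigned;

  static byte[][] sortTerms(byte[][] terms) {
    byte[][] copy = terms.clone(); // leave the caller's array untouched
    Arrays.sort(copy, UNSIGNED_BYTES);
    return copy;
  }

  // Convenience wrapper: encode as UTF-8, sort, decode back.
  static String[] sortUtf8(String... values) {
    byte[][] terms = new byte[values.length][];
    for (int i = 0; i < values.length; i++) {
      terms[i] = values[i].getBytes(StandardCharsets.UTF_8);
    }
    byte[][] sorted = sortTerms(terms);
    String[] out = new String[sorted.length];
    for (int i = 0; i < sorted.length; i++) {
      out[i] = new String(sorted[i], StandardCharsets.UTF_8);
    }
    return out;
  }

  private FixedOrderSort() {}
}
```

For example, {{sortUtf8("\uD83D\uDE00", "z", "\uFFFD", "\u00e9")}} orders its inputs as z, é, U+FFFD, U+1F600. That is ascending code point order.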



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
