[jira] [Commented] (LUCENE-7052) BytesRefHash.sort should always sort in unicode code point order

Steve Rowe (JIRA) Sat, 27 Feb 2016 18:18:32 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170801#comment-15170801 ]


Steve Rowe commented on LUCENE-7052:
------------------------------------

Mike, why did you add a hand-written implementation of codePoints() instead of 
calling CharSequence.codePoints() (which returns an IntStream) followed by 
toArray()?  The test passes for me with this patch:

{noformat}
diff --git a/lucene/core/src/test/org/apache/lucene/util/TestBytesRefHash.java b/lucene/core/src/test/org/apache/lucene/util/TestBytesRefHash.java
index 50d921b..c3a58ff 100644
--- a/lucene/core/src/test/org/apache/lucene/util/TestBytesRefHash.java
+++ b/lucene/core/src/test/org/apache/lucene/util/TestBytesRefHash.java
@@ -168,15 +168,6 @@ public class TestBytesRefHash extends LuceneTestCase {
     }
   }
 
-  private static int[] codePoints(String input) {
-    int length = Character.codePointCount(input, 0, input.length());
-    int word[] = new int[length];
-    for (int i = 0, j = 0, cp = 0; i < input.length(); i += Character.charCount(cp)) {
-      word[j++] = cp = input.codePointAt(i);
-    }
-    return word;
-  }
-
   /**
    * Test method for
    * {@link org.apache.lucene.util.BytesRefHash#sort()}.
@@ -191,8 +182,8 @@ public class TestBytesRefHash extends LuceneTestCase {
       SortedSet<String> strings = new TreeSet<>(new Comparator<String>() {
           @Override
           public int compare(String a, String b) {
-            int[] aCodePoints = codePoints(a);
-            int[] bCodePoints = codePoints(b);
+            int[] aCodePoints = a.codePoints().toArray();
+            int[] bCodePoints = b.codePoints().toArray();
             for(int i=0;i<Math.min(aCodePoints.length, bCodePoints.length);i++) {
               if (aCodePoints[i] < bCodePoints[i]) {
                 return -1;
{noformat}
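
For anyone reading along, here is a small standalone sketch of why the two approaches should be interchangeable, and why the test needs a code point comparator rather than String.compareTo. The class name CodePointOrder, the method name codePointsByHand, and the CODE_POINT_ORDER constant are made up for illustration and are not part of the patch. The tie-break on length at the end of the comparator is my assumption about how the test finishes the comparison, since the diff context cuts off before that point.

```java
import java.util.Arrays;
import java.util.Comparator;

final class CodePointOrder {

  // Same logic as the helper the patch removes, kept here only for comparison.
  static int[] codePointsByHand(String input) {
    int length = Character.codePointCount(input, 0, input.length());
    int[] word = new int[length];
    for (int i = 0, j = 0, cp = 0; i < input.length(); i += Character.charCount(cp)) {
      word[j++] = cp = input.codePointAt(i);
    }
    return word;
  }

  // Compares strings by Unicode code point using CharSequence.codePoints().
  // Assumption: when one string is a prefix of the other, the shorter sorts first.
  static final Comparator<String> CODE_POINT_ORDER = (a, b) -> {
    int[] aCodePoints = a.codePoints().toArray();
    int[] bCodePoints = b.codePoints().toArray();
    int limit = Math.min(aCodePoints.length, bCodePoints.length);
    for (int i = 0; i < limit; i++) {
      if (aCodePoints[i] != bCodePoints[i]) {
        return Integer.compare(aCodePoints[i], bCodePoints[i]);
      }
    }
    return Integer.compare(aCodePoints.length, bCodePoints.length);
  };

  public static void main(String[] args) {
    String bmp = "\uFF5E";         // U+FF5E, one UTF-16 code unit
    String supp = "\uD83D\uDE00";  // U+1F600, encoded as a surrogate pair

    // Both ways of extracting code points agree, including on surrogate pairs.
    System.out.println(Arrays.equals(codePointsByHand(supp), supp.codePoints().toArray()));

    // String.compareTo compares UTF-16 code units: 0xFF5E > 0xD83D.
    System.out.println(bmp.compareTo(supp) > 0);

    // Code point order puts U+FF5E before U+1F600.
    System.out.println(CODE_POINT_ORDER.compare(bmp, supp) < 0);
  }
}
```

The two orders only disagree when a supplementary character is compared against a BMP character at or above U+E000, which is why a plain TreeSet<String> would not be a reliable oracle for this test.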

> BytesRefHash.sort should always sort in unicode code point order
> ----------------------------------------------------------------
>
>                 Key: LUCENE-7052
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7052
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master, 6.0
>
>         Attachments: LUCENE-7052.patch
>
>
> Today {{BytesRefHash.sort}} takes a custom {{Comparator}} but we always pass 
> it {{BytesRef.getUTF8SortedAsUnicodeComparator()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
