[ https://issues.apache.org/jira/browse/LUCENE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170801#comment-15170801 ]
Steve Rowe commented on LUCENE-7052:
------------------------------------
Mike, why did you add your own implementation of codePoints() instead of using
CharSequence.codePoints() (which returns an IntStream) followed by toArray()?
The test passes for me with this patch:
{noformat}
diff --git a/lucene/core/src/test/org/apache/lucene/util/TestBytesRefHash.java b/lucene/core/src/test/org/apache/lucene/util/TestBytesRefHash.java
index 50d921b..c3a58ff 100644
--- a/lucene/core/src/test/org/apache/lucene/util/TestBytesRefHash.java
+++ b/lucene/core/src/test/org/apache/lucene/util/TestBytesRefHash.java
@@ -168,15 +168,6 @@ public class TestBytesRefHash extends LuceneTestCase {
}
}
- private static int[] codePoints(String input) {
- int length = Character.codePointCount(input, 0, input.length());
- int word[] = new int[length];
- for (int i = 0, j = 0, cp = 0; i < input.length(); i += Character.charCount(cp)) {
- word[j++] = cp = input.codePointAt(i);
- }
- return word;
- }
-
/**
* Test method for
* {@link org.apache.lucene.util.BytesRefHash#sort()}.
@@ -191,8 +182,8 @@ public class TestBytesRefHash extends LuceneTestCase {
SortedSet<String> strings = new TreeSet<>(new Comparator<String>() {
@Override
public int compare(String a, String b) {
- int[] aCodePoints = codePoints(a);
- int[] bCodePoints = codePoints(b);
+ int[] aCodePoints = a.codePoints().toArray();
+ int[] bCodePoints = b.codePoints().toArray();
for(int i=0;i<Math.min(aCodePoints.length, bCodePoints.length);i++) {
if (aCodePoints[i] < bCodePoints[i]) {
return -1;
{noformat}
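For reference, here is a minimal standalone sketch of why the two forms should be interchangeable in this test. The class name CodePointOrderSketch and the method names manualCodePoints and compareByCodePoint are made up for illustration and are not part of the patch. The comparator is only modeled on the one in the test, whose full body is not shown in the diff above, so treat its tail (the length tie-break) as an assumption.

```java
import java.util.Arrays;

class CodePointOrderSketch {

    // Hand-rolled conversion, modeled on the helper the patch removes.
    static int[] manualCodePoints(String input) {
        int length = Character.codePointCount(input, 0, input.length());
        int[] word = new int[length];
        for (int i = 0, j = 0, cp = 0; i < input.length(); i += Character.charCount(cp)) {
            word[j++] = cp = input.codePointAt(i);
        }
        return word;
    }

    // Hypothetical comparator in the spirit of the test: compare code point
    // by code point, then fall back to length (assumed tie-break).
    static int compareByCodePoint(String a, String b) {
        int[] aCodePoints = a.codePoints().toArray();
        int[] bCodePoints = b.codePoints().toArray();
        int n = Math.min(aCodePoints.length, bCodePoints.length);
        for (int i = 0; i < n; i++) {
            if (aCodePoints[i] != bCodePoints[i]) {
                return Integer.compare(aCodePoints[i], bCodePoints[i]);
            }
        }
        return Integer.compare(aCodePoints.length, bCodePoints.length);
    }

    public static void main(String[] args) {
        // "a", U+1F600 (a surrogate pair in UTF-16), "b"
        String s = "a\uD83D\uDE00b";
        System.out.println("same code points: "
            + Arrays.equals(manualCodePoints(s), s.codePoints().toArray()));

        // U+FF5E is a BMP character; U+1F600 is supplementary.
        String bmp = "\uFF5E";
        String supplementary = "\uD83D\uDE00";
        System.out.println("UTF-16 order (String.compareTo): "
            + Integer.signum(bmp.compareTo(supplementary)));
        System.out.println("code point order: "
            + Integer.signum(compareByCodePoint(bmp, supplementary)));
    }
}
```

String.compareTo compares UTF-16 code units, so a supplementary character (whose surrogates fall in the D800-DFFF range) sorts before a BMP character above U+DFFF such as U+FF5E. Comparing code points reverses that, which is presumably why the test comparator converts to int[] first.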
> BytesRefHash.sort should always sort in unicode code point order
> ----------------------------------------------------------------
>
> Key: LUCENE-7052
> URL: https://issues.apache.org/jira/browse/LUCENE-7052
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: master, 6.0
>
> Attachments: LUCENE-7052.patch
>
>
> Today {{BytesRefHash.sort}} takes a custom {{Comparator}} but we always pass
> it {{BytesRef.getUTF8SortedAsUnicodeComparator()}}.