ChrisHegarty opened a new issue, #13426:
URL: https://github.com/apache/lucene/issues/13426
A recent change, #13406, added an assertion that may be incorrect.
The assertion checks that the number of entries matches the number of inputs
processed. This may not hold when a duplicate entry is passed in.
For example, add a duplicate entry:
```
$ git diff
diff --git a/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt b/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt
index 045b64eaa07..4513885a36b 100644
--- a/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt
+++ b/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt
@@ -5,6 +5,7 @@ C샤프
세종시 세종 시
대한민국날씨
대한민국
+대한민국
날씨
21세기대한민국
세기
\ No newline at end of file
```
```
$ ./gradlew :lucene:analysis:nori:test --tests "org.apache.lucene.analysis.ko.TestKoreanTokenizer"
```
```
reproduce with: gradlew test --tests TestKoreanTokenizer.testPartOfSpeechsWithCompound -Dtests.seed=6445594235429961 -Dtests.locale=bs-BA -Dtests.timezone=Atlantic/Faeroe -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> java.lang.AssertionError
> at __randomizedtesting.SeedInfo.seed([6445594235429961:9CEA448B4AB5C1F2]:0)
> at org.apache.lucene.analysis.ko.dict.UserDictionary.<init>(UserDictionary.java:137)
> at org.apache.lucene.analysis.ko.dict.UserDictionary.open(UserDictionary.java:69)
> at org.apache.lucene.analysis.ko.TestKoreanTokenizer.readDict(TestKoreanTokenizer.java:51)
> at org.apache.lucene.analysis.ko.TestKoreanTokenizer.setUp(TestKoreanTokenizer.java:63)
> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
> at java.base/java.lang.reflect.Method.invoke(Method.java:580)
> at [email protected]/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
> at [email protected]/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
> at [email protected]/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
> at [email protected]/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
> at [email protected]/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at [email protected]/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at [email protected]/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at [email protected]/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at [email protected]/org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at [email protected]/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at [email protected]/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
> at [email protected]/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
> at [email protected]/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
> at [email protected]/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
> at [email protected]/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
> at [email protected]/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
> at [email protected]/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
> at [email protected]/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
...
```
I encountered this assertion firing when testing a snapshot of the Lucene
branch with Elasticsearch. The
testNoriAnalyzerDuplicateUserDictRuleWithLegacyVersion test fails (hits the
assertion), see
https://github.com/elastic/elasticsearch/blob/main/plugins/analysis-nori/src/test/java/org/elasticsearch/plugin/analysis/nori/NoriAnalysisTests.java#L132C5-L145C1
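
To illustrate why an "entries == inputs" check can fail, here is a minimal
sketch. It assumes a loader that sorts the input lines and skips repeated
surface forms; that is an assumption about the general approach, not a copy
of the UserDictionary code. The class and method names
(DuplicateEntrySketch, countDistinctEntries) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DuplicateEntrySketch {

  /**
   * Counts the entries that remain after sorting the lines and skipping
   * adjacent lines whose first column (the surface form) repeats.
   * Hypothetical helper, for illustration only.
   */
  static int countDistinctEntries(List<String> lines) {
    List<String> entries = new ArrayList<>(lines);
    Collections.sort(entries);
    String last = null;
    int kept = 0;
    for (String entry : entries) {
      // The first whitespace-separated column is treated as the surface form.
      String token = entry.split("\\s+")[0];
      if (token.equals(last)) {
        continue; // duplicate surface form: skipped, so it is never counted
      }
      last = token;
      kept++;
    }
    return kept;
  }

  public static void main(String[] args) {
    // Five input lines, one of which repeats an earlier surface form.
    List<String> lines =
        List.of("세종시 세종 시", "대한민국날씨", "대한민국", "대한민국", "날씨");
    int kept = countDistinctEntries(lines);
    // Prints "inputs=5 kept=4": an assertion that kept == lines.size() would fail.
    System.out.println("inputs=" + lines.size() + " kept=" + kept);
  }
}
```

Under these assumptions, any check that compares the number of kept entries
with the number of input lines only holds when the input has no duplicates.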