I tried reproducing on Windows 7 with the repro line, same JVM version and
flags, still no luck.

This failure is caused by HTMLStripCharFilter incorrectly converting the
HTML numeric character references for high surrogate U+D86C followed by
non-surrogate U+E28F into the corresponding characters.  But
HTMLStripCharFilter should never emit unpaired surrogates - the test is:

    assertAnalyzesTo(analyzer, " �", new String[] {
"\uFFFD\uE28F" } );

So even if MockTokenizer.readCodePoint() didn't assert properly paired
surrogates, assertAnalyzesTo() would have failed because U+D86C was passed
through unchanged instead of being converted to U+FFFD, the Unicode
replacement character.
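
To make the pairing rule concrete, here is a rough sketch of the kind of
check involved.  It is not MockTokenizer's actual code; the class and
method names are made up for illustration, and it only uses the standard
java.lang.Character helpers.

```java
final class SurrogateCheck {
    // Returns the index of the first unpaired surrogate in s,
    // or -1 if every surrogate is part of a valid high+low pair.
    static int firstUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // valid pair: skip the low half
                } else {
                    return i; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no high surrogate before it
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // What the failing run saw: U+D86C followed by U+E28F.
        System.out.println(firstUnpairedSurrogate("\uD86C\uE28F")); // prints 0
        // What the test expects: U+FFFD followed by U+E28F.
        System.out.println(firstUnpairedSurrogate("\uFFFD\uE28F")); // prints -1
        // A properly paired supplementary character.
        System.out.println(firstUnpairedSurrogate("\uD86C\uDC01")); // prints -1
    }
}
```

The expected token "\uFFFD\uE28F" passes such a check, while the observed
output does not, which is why either assertion would have tripped.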

I can't see a code path that would cause this behavior: AFAICT unpaired
surrogate numeric character references are always converted to U+FFFD.
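
For reference, the intended behavior can be sketched in a few lines of
plain Java.  This is only an illustration under my own assumptions, not
how HTMLStripCharFilter is implemented: the class name, the regex, and
the two-pass approach are all hypothetical, and the hex notation in the
sample input is chosen just for the example.  As a simplification, the
second pass also replaces unpaired surrogates that appear literally in
the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericCharRefSketch {
    // Matches hex (&#x...;) and decimal (&#...;) numeric character references.
    private static final Pattern REF =
        Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    static String decode(String input) {
        // Pass 1: replace each reference with the code unit(s) it names.
        StringBuilder raw = new StringBuilder();
        Matcher m = REF.matcher(input);
        int last = 0;
        while (m.find()) {
            raw.append(input, last, m.start());
            int cp = (m.group(1) != null)
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2));
            if (cp <= 0xFFFF) {
                raw.append((char) cp);       // may be a lone surrogate for now
            } else if (Character.isValidCodePoint(cp)) {
                raw.appendCodePoint(cp);     // supplementary character
            } else {
                raw.append('\uFFFD');        // out of Unicode range
            }
            last = m.end();
        }
        raw.append(input, last, input.length());

        // Pass 2: keep valid pairs, turn any unpaired surrogate into U+FFFD.
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < raw.length()
                    && Character.isLowSurrogate(raw.charAt(i + 1))) {
                out.append(c).append(raw.charAt(++i));
            } else if (Character.isSurrogate(c)) {
                out.append('\uFFFD');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String result = decode("&#xD86C;&#xE28F;");
        System.out.println(result.equals("\uFFFD\uE28F")); // prints true
    }
}
```

With this sketch the high surrogate U+D86C never reaches the output
unless a low surrogate immediately follows it, which is the behavior the
test asserts.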

Steve

On Tue, Nov 18, 2014 at 1:54 PM, Robert Muir <[email protected]> wrote:

> I can't reproduce this, but it's also not a random test. Just very
> simple asserts.
>
> I tried reproducing on linux with the master seed, same jvm version
> and flags, no luck.
>
> On Tue, Nov 18, 2014 at 1:32 PM, Policeman Jenkins Server
> <[email protected]> wrote:
> > Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4439/
> > Java: 64bit/jdk1.8.0_20 -XX:-UseCompressedOops -XX:+UseSerialGC
> (asserts: true)
> >
> > 1 tests failed.
> > REGRESSION:
> org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates
> >
> > Error Message:
> > unpaired high surrogate: d86c, followed by: e28f
> >
> > Stack Trace:
> > java.lang.AssertionError: unpaired high surrogate: d86c, followed by:
> e28f
> >         at
> __randomizedtesting.SeedInfo.seed([A2044F8C235991A:5660FE2D40DB7620]:0)
> >         at
> org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:191)
> >         at
> org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:136)
> >         at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkResetException(BaseTokenStreamTestCase.java:403)
> >         at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:352)
> >         at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:362)
> >         at
> org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates(HTMLStripCharFilterTest.java:600)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:483)
> >         at
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> >         at
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> >         at
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> >         at
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> >         at
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> >         at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> >         at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> >         at
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> >         at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> >         at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> >         at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> >         at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
> >         at
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
> >         at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
> >         at
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> >         at
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> >         at
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> >         at
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> >         at
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> >         at
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> >         at
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> >         at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> >         at
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> >         at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> >         at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> >         at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> >         at
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
> >         at
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> >         at
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> >         at
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
> >         at
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> >         at
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
> >         at java.lang.Thread.run(Thread.java:745)
> >
> >
> >
> >
> > Build Log:
> > [...truncated 5690 lines...]
> >    [junit4] Suite:
> org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest
> >    [junit4]   2> NOTE: reproduce with: ant test
> -Dtestcase=HTMLStripCharFilterTest -Dtests.method=testUTF16Surrogates
> -Dtests.seed=A2044F8C235991A -Dtests.slow=true -Dtests.locale=pl_PL
> -Dtests.timezone=Pacific/Midway -Dtests.asserts=true
> -Dtests.file.encoding=Cp1252
> >    [junit4] FAILURE 0.14s | HTMLStripCharFilterTest.testUTF16Surrogates
> <<<
> >    [junit4]    > Throwable #1: java.lang.AssertionError: unpaired high
> surrogate: d86c, followed by: e28f
> >    [junit4]    >        at
> __randomizedtesting.SeedInfo.seed([A2044F8C235991A:5660FE2D40DB7620]:0)
> >    [junit4]    >        at
> org.apache.lucene.analysis.MockTokenizer.readCodePoint(MockTokenizer.java:191)
> >    [junit4]    >        at
> org.apache.lucene.analysis.MockTokenizer.incrementToken(MockTokenizer.java:136)
> >    [junit4]    >        at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkResetException(BaseTokenStreamTestCase.java:403)
> >    [junit4]    >        at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:352)
> >    [junit4]    >        at
> org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:362)
> >    [junit4]    >        at
> org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates(HTMLStripCharFilterTest.java:600)
> >    [junit4]    >        at java.lang.Thread.run(Thread.java:745)
> >    [junit4]   2> NOTE: test params are: codec=Asserting(Lucene50):
> {dummy=PostingsFormat(name=Asserting)}, docValues:{},
> sim=RandomSimilarityProvider(queryNorm=true,coord=no): {}, locale=pl_PL,
> timezone=Pacific/Midway
> >    [junit4]   2> NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.8.0_20
> (64-bit)/cpus=2,threads=1,free=25032976,total=97173504
> >    [junit4]   2> NOTE: All tests run in this JVM: [TestCondition2,
> EdgeNGramTokenFilterTest, TestPathHierarchyTokenizer, TestOnlyInCompound,
> TestSoraniNormalizationFilter, TestIrishLowerCaseFilter,
> WikipediaTokenizerTest, TestSpanishLightStemFilterFactory,
> TestKStemFilterFactory, TestWordnetSynonymParser,
> TestPortugueseMinimalStemFilterFactory, TestHindiFilters,
> TestPatternCaptureGroupTokenFilter, TestIndonesianStemFilterFactory,
> TestTurkishLowerCaseFilter, TokenTypeSinkTokenizerTest,
> TestStandardAnalyzer, TestTeeSinkTokenFilter,
> TestKeywordMarkerFilterFactory, TestSolrSynonymParser,
> TestGermanStemFilterFactory, ShingleFilterTest,
> TestGreekLowerCaseFilterFactory, TestPatternReplaceCharFilterFactory,
> TestAnalysisSPILoader, TestRussianAnalyzer,
> TestEnglishMinimalStemFilterFactory, QueryAutoStopWordAnalyzerTest,
> TestReversePathHierarchyTokenizer, TestSoraniNormalizationFilterFactory,
> TestPersianCharFilter, TestPortugueseAnalyzer, TestItalianLightStemFilter,
> TestCharacterUtils, TestGermanAnalyzer,
> TestPrefixAndSuffixAwareTokenFilter, TestGermanNormalizationFilterFactory,
> DateRecognizerSinkTokenizerTest, TestBulgarianStemFilterFactory,
> TestIndonesianAnalyzer, TestSegmentingTokenizerBase, TestGalicianAnalyzer,
> TestWordlistLoader, TestPortugueseStemFilter, TestCJKAnalyzer,
> TestSynonymFilterFactory, TestNeedAffix,
> TestGalicianMinimalStemFilterFactory, TestCJKBigramFilterFactory,
> TestGermanLightStemFilter, TestTypeTokenFilterFactory,
> TestMappingCharFilter, TestDutchAnalyzer,
> TestDelimitedPayloadTokenFilterFactory, TestReverseStringFilterFactory,
> TestIrishLowerCaseFilterFactory, TestArabicNormalizationFilter,
> TestDoubleEscape, TestTurkishAnalyzer, TestZeroAffix,
> TestSerbianNormalizationFilter, TestCJKWidthFilterFactory,
> TestSoraniStemFilter, TestFilesystemResourceLoader, TestKeywordAnalyzer,
> TestLengthFilter, TestKeywordRepeatFilter, TestStopFilterFactory,
> TestElision, TestFrenchLightStemFilterFactory, TestCondition,
> TestTruncateTokenFilterFactory, TestNorwegianAnalyzer, TestSnowball,
> TestFrenchLightStemFilter, TestMorph, TestEmptyTokenStream,
> TestTypeTokenFilter, TestClassicAnalyzer, TestBrazilianAnalyzer,
> TestKeepCase, TestPatternReplaceCharFilter, TestFrenchMinimalStemFilter,
> TestItalianAnalyzer, TestCollationKeyAnalyzer, TestStopAnalyzer,
> TestScandinavianFoldingFilterFactory, TestPorterStemFilter,
> TestHungarianLightStemFilterFactory,
> TestHyphenationCompoundWordTokenFilterFactory, TestSoraniAnalyzer,
> TestRemoveDuplicatesTokenFilterFactory, TestReverseStringFilter,
> TestCaseSensitive, TestAllAnalyzersHaveFactories, TestRomanianAnalyzer,
> TestHunspellStemFilterFactory, TestCommonGramsQueryFilterFactory,
> TestHyphenatedWordsFilter, TestPortugueseLightStemFilter,
> TestFinnishLightStemFilter, TokenOffsetPayloadTokenFilterTest,
> TestComplexPrefix, TypeAsPayloadTokenFilterTest,
> TestUAX29URLEmailTokenizer, TestNGramFilters, TestFactories,
> TestHTMLStripCharFilterFactory, TestStopFilter, TestBulgarianStemmer,
> TestCircumfix, TestAlternateCasing, TestSingleTokenTokenFilter,
> TestStandardFactories, TestLimitTokenCountFilterFactory, TestCzechAnalyzer,
> TestFullStrip, NGramTokenFilterTest, TestThaiAnalyzer,
> TestNorwegianLightStemFilterFactory, TestSnowballPorterFilterFactory,
> TestSwedishLightStemFilterFactory, TestGermanLightStemFilterFactory,
> ShingleAnalyzerWrapperTest, TestTrimFilter, TestPatternTokenizer,
> TestBrazilianStemFilterFactory, TestDanishAnalyzer,
> TokenRangeSinkTokenizerTest, TestGreekStemFilterFactory,
> TestCharTokenizers, TestDependencies, TestRandomChains,
> HTMLStripCharFilterTest]
> >    [junit4] Completed in 1.88s, 31 tests, 1 failure <<< FAILURES!
> >
> > [...truncated 405 lines...]
> > BUILD FAILED
> > C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:525:
> The following error occurred while executing this line:
> > C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:473:
> The following error occurred while executing this line:
> > C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:61:
> The following error occurred while executing this line:
> >
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\extra-targets.xml:39:
> The following error occurred while executing this line:
> >
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\build.xml:452:
> The following error occurred while executing this line:
> >
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\common-build.xml:2140:
> The following error occurred while executing this line:
> >
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\analysis\build.xml:106:
> The following error occurred while executing this line:
> >
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\analysis\build.xml:38:
> The following error occurred while executing this line:
> >
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\module-build.xml:58:
> The following error occurred while executing this line:
> >
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\common-build.xml:1358:
> The following error occurred while executing this line:
> >
> C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\lucene\common-build.xml:965:
> There were test failures: 268 suites, 1353 tests, 1 failure, 1 ignored
> >
> > Total time: 27 minutes 45 seconds
> > Build step 'Invoke Ant' marked build as failure
> > [description-setter] Description set: Java: 64bit/jdk1.8.0_20
> -XX:-UseCompressedOops -XX:+UseSerialGC (asserts: true)
> > Archiving artifacts
> > Recording test results
> > Email was triggered for: Failure - Any
> > Sending email for trigger: Failure - Any
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
>
