That's lines 460-497 (not 491) in HTMLStripCharFilter.jflex

On Sun, Nov 23, 2014 at 1:24 PM, Steve Rowe <[email protected]> wrote:

> The matching rule is on lines 460-491 in HTMLStripCharFilter.jflex:
> <https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/charfilter/HTMLStripCharFilter.jflex?view=markup#l460>.
> It matches an overly broad set of char ref pairs, then checks that the
> surrogates are correctly paired, and backtracks if the pair is not valid.
>
> The JDK methods are Integer.parseInt(), Character.isHighSurrogate() and
> Character.isLowSurrogate().
>
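For anyone following along, the validate-then-fall-back idea described above can be sketched in plain Java with the same three JDK methods. This is only an illustration under assumptions: the class and method names are hypothetical, it handles decimal references only, and it is not the JFlex rule itself (the real rule backtracks in the scanner, while this sketch simply treats each reference on its own when the pair is invalid).

```java
final class SurrogatePairSketch {
    // Unicode replacement character, emitted for a lone surrogate.
    private static final char REPLACEMENT = (char) 0xFFFD;

    // Parses the digits of one decimal character reference, e.g. "57999" from "&#57999;".
    static char parseRef(String digits) {
        int value = Integer.parseInt(digits);
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException("not a single UTF-16 code unit: " + digits);
        }
        return (char) value;
    }

    // A surrogate that is not part of a valid pair becomes U+FFFD; anything else is kept.
    static char single(char c) {
        return Character.isSurrogate(c) ? REPLACEMENT : c;
    }

    // Decodes two adjacent references: keep them as a pair only if they form one.
    static String decodePair(String firstDigits, String secondDigits) {
        char first = parseRef(firstDigits);
        char second = parseRef(secondDigits);
        if (Character.isHighSurrogate(first) && Character.isLowSurrogate(second)) {
            return new String(new char[] {first, second});
        }
        // Invalid pair: handle each reference independently.
        return "" + single(first) + single(second);
    }

    public static void main(String[] args) {
        // The two references from the failing test input.
        for (char c : decodePair("55404", "57999").toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

With the test's input the second reference is not a low surrogate, so the sketch yields U+FFFD followed by U+E28F, which is the expected output quoted further down.
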
> On Sun, Nov 23, 2014 at 1:06 PM, Robert Muir <[email protected]> wrote:
>
>> Is the character processing here all done by the charfilter, or does
>> it use some encoding methods from the JDK?
>>
>> When I looked at it, it looked like a JVM bug.
>>
>> On Sun, Nov 23, 2014 at 1:04 PM, Steve Rowe <[email protected]> wrote:
>> > This is the same line in the same test that failed on Windows under a
>> > 1.8.0_20 JVM five days ago
>> > <http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/4439/>, but in a
>> > different way.
>> >
>> > This test's input is the string "&#55404;&#57999;" - HTML character
>> > references for U+D86D U+E28F - and the expected output is the char sequence
>> > U+FFFD U+E28F (the Unicode replacement character followed by the second
>> > input char).
>> >
>> > In the Windows failure, the output was U+D86D U+E28F (improperly paired high
>> > surrogate).
>> >
>> > In this Linux failure, the output is U+2B68F (properly paired UTF-16 U+D86D
>> > U+DE8F).
>> >
>> > Very weird.
>> >
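The code point arithmetic in the Linux report above can be checked with a few lines of Java. This is a minimal sketch, with a hypothetical class name, that compares the manual UTF-16 decoding formula against `Character.toCodePoint()` and confirms that U+E28F is outside the low-surrogate range, so it cannot legitimately pair with a high surrogate.

```java
final class SurrogateArithmeticSketch {
    // Manual UTF-16 decoding of a valid high/low surrogate pair into a code point.
    static int combine(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char high = (char) 0xD86D;
        char low = (char) 0xDE8F;

        // Both computations should agree on the supplementary code point.
        System.out.printf("manual=U+%X jdk=U+%X%n",
                combine(high, low), Character.toCodePoint(high, low));

        // U+E28F is a private-use character, not a low surrogate (U+DC00..U+DFFF).
        System.out.println(Character.isLowSurrogate((char) 0xE28F));
        System.out.println(Character.isSurrogatePair(high, (char) 0xE28F));
    }
}
```

So producing U+2B68F requires the second code unit to have become U+DE8F somewhere along the way, which the input reference &#57999; does not encode.
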
>> > I'm beasting this suite now on Windows under Oracle JVM 1.8.0_20 to see if I
>> > can get it to fail.  No dice so far after 140 trials.
>> >
>> >
>> > On Sun, Nov 23, 2014 at 6:19 AM, Policeman Jenkins Server
>> > <[email protected]> wrote:
>> >>
>> >> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11492/
>> >> Java: 32bit/jdk1.8.0_20 -server -XX:+UseParallelGC (asserts: false)
>> >>
>> >> 1 tests failed.
>> >> FAILED:  org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates
>> >>
>> >> Error Message:
>> >> term 0 expected:<[�]> but was:<[𫚏]>
>> >>
>> >> Stack Trace:
>> >> org.junit.ComparisonFailure: term 0 expected:<[�]> but was:<[𫚏]>
>> >>         at __randomizedtesting.SeedInfo.seed([CF8F65E969B602B9:93CFDF3CEB58ED83]:0)
>> >>         at org.junit.Assert.assertEquals(Assert.java:125)
>> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:180)
>> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:295)
>> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:299)
>> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:303)
>> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:353)
>> >>         at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:362)
>> >>         at org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates(HTMLStripCharFilterTest.java:600)
>> >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >>         at java.lang.reflect.Method.invoke(Method.java:483)
>> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
>> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
>> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
>> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
>> >>         at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>> >>         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>> >>         at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>> >>         at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
>> >>         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
>> >>         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>> >>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> >>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
>> >>         at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
>> >>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
>> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
>> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
>> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
>> >>         at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
>> >>         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>> >>         at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
>> >>         at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>> >>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>> >>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>> >>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> >>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> >>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> >>         at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
>> >>         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>> >>         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
>> >>         at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
>> >>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> >>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
>> >>         at java.lang.Thread.run(Thread.java:745)
>> >>
>> >>
>> >>
>> >>
>> >> Build Log:
>> >> [...truncated 5753 lines...]
>> >>    [junit4] Suite: org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest
>> >>    [junit4]   2> NOTE: reproduce with: ant test -Dtestcase=HTMLStripCharFilterTest -Dtests.method=testUTF16Surrogates -Dtests.seed=CF8F65E969B602B9 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=th_TH -Dtests.timezone=PLT -Dtests.asserts=false -Dtests.file.encoding=UTF-8
>> >>    [junit4] FAILURE 0.07s J0 | HTMLStripCharFilterTest.testUTF16Surrogates <<<
>> >>    [junit4]    > Throwable #1: org.junit.ComparisonFailure: term 0 expected:<[�]> but was:<[𫚏]>
>> >>    [junit4]    >        at __randomizedtesting.SeedInfo.seed([CF8F65E969B602B9:93CFDF3CEB58ED83]:0)
>> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:180)
>> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:295)
>> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:299)
>> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:303)
>> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:353)
>> >>    [junit4]    >        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertAnalyzesTo(BaseTokenStreamTestCase.java:362)
>> >>    [junit4]    >        at org.apache.lucene.analysis.charfilter.HTMLStripCharFilterTest.testUTF16Surrogates(HTMLStripCharFilterTest.java:600)
>> >>    [junit4]    >        at java.lang.Thread.run(Thread.java:745)
>> >>    [junit4]   2> NOTE: test params are: codec=Asserting(Lucene50): {dummy=BlockTreeOrds(blocksize=128)}, docValues:{}, sim=DefaultSimilarity, locale=th_TH, timezone=PLT
>> >>    [junit4]   2> NOTE: Linux 3.13.0-39-generic i386/Oracle Corporation 1.8.0_20 (32-bit)/cpus=8,threads=1,free=88329216,total=222035968
>> >>    [junit4]   2> NOTE: All tests run in this JVM: [TestPatternReplaceCharFilter, TestArabicNormalizationFilter, TestPatternReplaceCharFilterFactory, TestWikipediaTokenizerFactory, TestCondition2, TestIrishLowerCaseFilterFactory, TestGalicianStemFilter, TestWordlistLoader, TestElisionFilterFactory, TestLengthFilter, TestGermanLightStemFilterFactory, EdgeNGramTokenFilterTest, TestSerbianNormalizationFilterFactory, TestPortugueseLightStemFilter, TestSwedishLightStemFilterFactory, TestPatternReplaceFilterFactory, TestElision, TestCzechStemFilterFactory, TestSpanishLightStemFilter, TestSingleTokenTokenFilter, TestHindiStemmer, TestKeepWordFilter, TestLimitTokenCountFilter, TestShingleFilterFactory, TestTrimFilter, TestCapitalizationFilterFactory, TestFactories, TestGalicianMinimalStemFilterFactory, TestFlagLong, TestIgnore, TestGermanMinimalStemFilterFactory, TestUAX29URLEmailTokenizerFactory, TestPatternCaptureGroupTokenFilter, TestAlternateCasing, TestCzechAnalyzer, TestOnlyInCompound, TestPersianNormalizationFilter, TestGermanNormalizationFilterFactory, WikipediaTokenizerTest, TestMultiWordSynonyms, TestTruncateTokenFilter, TestPersianAnalyzer, TestArabicAnalyzer, TestRemoveDuplicatesTokenFilter, TestSoraniStemFilterFactory, TestPorterStemFilterFactory, TestCodepointCountFilterFactory, TokenTypeSinkTokenizerTest, TestSoraniAnalyzer, TestApostropheFilter, QueryAutoStopWordAnalyzerTest, TestTwoSuffixes, TestScandinavianFoldingFilterFactory, TestArmenianAnalyzer, TestFinnishAnalyzer, TestFlagNum, TestIndonesianStemmer, TestLimitTokenCountAnalyzer, TestScandinavianNormalizationFilterFactory, TestReversePathHierarchyTokenizer, TestGalicianMinimalStemFilter, TestPersianNormalizationFilterFactory, TestNeedAffix, TestGermanLightStemFilter, TestLimitTokenPositionFilterFactory, TestStopFilterFactory, TestMappingCharFilter, HTMLStripCharFilterTest]
>> >>    [junit4] Completed on J0 in 2.12s, 31 tests, 1 failure <<< FAILURES!
>> >>
>> >> [...truncated 403 lines...]
>> >> BUILD FAILED
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:525: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:473: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:61: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/extra-targets.xml:39: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/build.xml:452: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:2141: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/analysis/build.xml:106: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/analysis/build.xml:38: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/module-build.xml:58: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:1359: The following error occurred while executing this line:
>> >> /mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:966: There were test failures: 270 suites, 1408 tests, 1 failure, 1 ignored
>> >>
>> >> Total time: 30 minutes 5 seconds
>> >> Build step 'Invoke Ant' marked build as failure
>> >> [description-setter] Description set: Java: 32bit/jdk1.8.0_20 -server
>> >> -XX:+UseParallelGC (asserts: false)
>> >> Archiving artifacts
>> >> Recording test results
>> >> Email was triggered for: Failure - Any
>> >> Sending email for trigger: Failure - Any
>> >>
>> >>
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
