[ https://issues.apache.org/jira/browse/TIKA-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
PJ Fanning updated TIKA-4405:
-----------------------------
    Description:
* https://github.com/apache/poi/commit/80f89a3674aaf346d10b5aa1f2bdb7dea75ba831
* https://bz.apache.org/bugzilla/show_bug.cgi?id=63575

I am looking at copying XWPFEventBasedWordExtractor into POI code and ran a few of the tests that we have for the POI XWPFWordExtractor. The capitalized text test failed. The text in the XML is not capitalized but the OOXML has a marker element that says it should be capitalized.

There are quite a few other POI tests where XWPFEventBasedWordExtractor does not return the same text as XWPFWordExtractor.

https://github.com/apache/poi/pull/788

  was:
* https://github.com/apache/poi/commit/80f89a3674aaf346d10b5aa1f2bdb7dea75ba831
* https://bz.apache.org/bugzilla/show_bug.cgi?id=63575

I am looking at copying XWPFEventBasedWordExtractor into POI code and ran a few of the tests that we have for the POI XWPFWordExtractor. The capitalized text test failed. The text in the XML is not capitalized but the OOXML has a marker element that says it should be capitalized.

> XWPFEventBasedWordExtractor does not support run text that is marked as
> capitalized
> -----------------------------------------------------------------------------------
>
>                 Key: TIKA-4405
>                 URL: https://issues.apache.org/jira/browse/TIKA-4405
>             Project: Tika
>          Issue Type: Bug
>            Reporter: PJ Fanning
>            Priority: Major
>
> * https://github.com/apache/poi/commit/80f89a3674aaf346d10b5aa1f2bdb7dea75ba831
> * https://bz.apache.org/bugzilla/show_bug.cgi?id=63575
> I am looking at copying XWPFEventBasedWordExtractor into POI code and ran a
> few of the tests that we have for the POI XWPFWordExtractor. The capitalized
> text test failed. The text in the XML is not capitalized but the OOXML has a
> marker element that says it should be capitalized.
> There are quite a few other POI tests where XWPFEventBasedWordExtractor does
> not return the same text as XWPFWordExtractor.
> https://github.com/apache/poi/pull/788
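
For illustration only, here is a minimal sketch of how an event-based (SAX) extractor could honour the capitalization marker described in this issue. It is not the Tika or POI implementation. The class name CapsAwareExtractor and the method extractText are hypothetical, and the sketch assumes the marker is the WordprocessingML w:caps element inside a run's w:rPr. It only looks at direct run properties and ignores capitalization inherited from styles.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: upper-cases the text of runs whose w:rPr contains w:caps.
final class CapsAwareExtractor {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static final class Handler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();
        boolean inRunProps; // inside w:rPr
        boolean inText;     // inside w:t
        boolean caps;       // current run is marked as capitalized

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (!W_NS.equals(uri)) {
                return;
            }
            switch (local) {
                case "r":
                    caps = false; // each run starts without the marker
                    break;
                case "rPr":
                    inRunProps = true;
                    break;
                case "caps":
                    if (inRunProps) {
                        // w:caps is an on/off property: no w:val means "on"
                        String val = atts.getValue(W_NS, "val");
                        caps = val == null
                                || !(val.equals("0") || val.equals("false") || val.equals("off"));
                    }
                    break;
                case "t":
                    inText = true;
                    break;
                default:
                    break;
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (!W_NS.equals(uri)) {
                return;
            }
            if ("rPr".equals(local)) {
                inRunProps = false;
            } else if ("t".equals(local)) {
                inText = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!inText) {
                return;
            }
            String s = new String(ch, start, length);
            out.append(caps ? s.toUpperCase(Locale.ROOT) : s);
        }
    }

    // Parses a WordprocessingML string and returns its run text,
    // with runs marked by w:caps converted to upper case.
    static String extractText(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            Handler handler = new Handler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document XML", e);
        }
    }
}
```

The flag is reset at the start of every w:r, so the marker on one run does not leak into the next. A real fix would probably also need to resolve character and paragraph styles, which this sketch leaves out.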