[ 
https://issues.apache.org/jira/browse/TIKA-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated TIKA-4405:
-----------------------------
    Description: 
* https://github.com/apache/poi/commit/80f89a3674aaf346d10b5aa1f2bdb7dea75ba831
* https://bz.apache.org/bugzilla/show_bug.cgi?id=63575

I am looking at copying XWPFEventBasedWordExtractor into POI code and ran a few 
of the tests that we have for the POI XWPFWordExtractor. The capitalized text 
test failed. The text in the XML is not capitalized but the OOXML has a marker 
element that says it should be capitalized.
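
To illustrate the mismatch described above, here is a minimal sketch that is not 
the POI or Tika code. It assumes the marker element is the WordprocessingML 
w:caps run property (inside w:rPr) and uses only the JDK SAX parser. The class 
name CapsAwareRunText and the method extract are hypothetical names made up for 
this example. Styles, property inheritance and w:smallCaps are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of POI or Tika.
final class CapsAwareRunText {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of all w:t elements, upper-casing runs that carry an enabled w:caps. */
    static String extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        StringBuilder out = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            boolean inRunProps;
            boolean inText;
            boolean caps;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (!W_NS.equals(uri)) {
                    return;
                }
                switch (local) {
                    case "r":
                        caps = false; // every run starts without the marker
                        break;
                    case "rPr":
                        inRunProps = true;
                        break;
                    case "caps":
                        if (inRunProps) {
                            // A missing w:val means "on"; false/0/off switch it off.
                            String val = atts.getValue(W_NS, "val");
                            caps = val == null
                                    || !(val.equals("false") || val.equals("0") || val.equals("off"));
                        }
                        break;
                    case "t":
                        inText = true;
                        break;
                    default:
                        break;
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (!W_NS.equals(uri)) {
                    return;
                }
                if ("rPr".equals(local)) {
                    inRunProps = false;
                } else if ("t".equals(local)) {
                    inText = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (!inText) {
                    return;
                }
                String s = new String(ch, start, length);
                // The stored text is lower case; the marker only changes how it is rendered.
                out.append(caps ? s.toUpperCase(Locale.ROOT) : s);
            }
        };

        factory.newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return out.toString();
    }
}
```

An event-based extractor that only collects the w:t character data would return 
the stored lower-case text for such a run, which is consistent with the failing 
test described above.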

There are quite a few other POI tests where XWPFEventBasedWordExtractor does 
not return the same text as XWPFWordExtractor.

https://github.com/apache/poi/pull/788

  was:
* https://github.com/apache/poi/commit/80f89a3674aaf346d10b5aa1f2bdb7dea75ba831
* https://bz.apache.org/bugzilla/show_bug.cgi?id=63575

I am looking at copying XWPFEventBasedWordExtractor into POI code and ran a few 
of the tests that we have for the POI XWPFWordExtractor. The capitalized text 
test failed. The text in the XML is not capitalized but the OOXML has a marker 
element that says it should be capitalized.


> XWPFEventBasedWordExtractor does not support run text that is marked as 
> capitalized
> -----------------------------------------------------------------------------------
>
>                 Key: TIKA-4405
>                 URL: https://issues.apache.org/jira/browse/TIKA-4405
>             Project: Tika
>          Issue Type: Bug
>            Reporter: PJ Fanning
>            Priority: Major
>
> * https://github.com/apache/poi/commit/80f89a3674aaf346d10b5aa1f2bdb7dea75ba831
> * https://bz.apache.org/bugzilla/show_bug.cgi?id=63575
> I am looking at copying XWPFEventBasedWordExtractor into POI code and ran a 
> few of the tests that we have for the POI XWPFWordExtractor. The capitalized 
> text test failed. The text in the XML is not capitalized but the OOXML has a 
> marker element that says it should be capitalized.
> There are quite a few other POI tests where XWPFEventBasedWordExtractor does 
> not return the same text as XWPFWordExtractor.
> https://github.com/apache/poi/pull/788



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
