[ 
https://issues.apache.org/jira/browse/TIKA-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3627.
-------------------------------
    Fix Version/s: 2.2.1
       Resolution: Fixed

This was caused by my error when I downgraded POI from 5.x back to 4.x.  We 
stopped 2.2.1-rc2 because of this and respun 2.2.1-rc3.  Thank you for 
reporting this.

> OOXML parsing is not working as intended using multiple threads
> ---------------------------------------------------------------
>
>                 Key: TIKA-3627
>                 URL: https://issues.apache.org/jira/browse/TIKA-3627
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Bernhard Geisberger
>            Priority: Blocker
>             Fix For: 2.2.1
>
>
> In the latest version, the parsing of OOXML files is broken if multiple 
> threads are used. I investigated and compared the call stack between 2.1.0 
> and 2.2.0, and came to the conclusion that this is caused by [this 
> commit|https://github.com/apache/tika/commit/10d925439cd862f74679ec5fa9a9b5863f50ce2c]
>  in line 86 of OOXMLExtractorFactory.
> In version 2.1.0, the call 
> `ExtractorFactory.setThreadPrefersEventExtractors(true)` is used in every 
> `parse` call, resulting in setting the thread-local property for every 
> thread. In version 2.2.0, the call is made only once, in the static block. 
> As a result, the property keeps its default value (false) on every thread 
> other than the first one. Effectively, this breaks the parsing of macros in 
> OOXML files.
> An easy workaround in version 2.2.0 is to call 
> `ExtractorFactory.setAllThreadsPreferEventExtractors(true)` at some point 
> before Tika is first used.
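
The thread-local behaviour described in the report can be reproduced without 
Tika or POI. The following is only a minimal sketch: the class name, the 
PREFERS_EVENT field and both helper methods are hypothetical stand-ins, not 
the actual OOXMLExtractorFactory or ExtractorFactory code. It shows that a 
ThreadLocal set inside a static block is set only on the thread that happens 
to initialize the class.

```java
public class ThreadLocalStaticInitDemo {

    // Hypothetical stand-in for a per-thread preference flag.
    // Each thread sees the default (false) until it calls set() itself.
    private static final ThreadLocal<Boolean> PREFERS_EVENT =
            ThreadLocal.withInitial(() -> false);

    static {
        // A static block runs once, on whichever thread first
        // initializes the class, so only that thread gets "true".
        PREFERS_EVENT.set(true);
    }

    // Reads the flag as seen by the calling thread.
    static boolean flagOnCurrentThread() {
        return PREFERS_EVENT.get();
    }

    // Reads the flag as seen by a freshly started thread.
    static boolean flagOnNewThread() {
        boolean[] seen = new boolean[1];
        Thread worker = new Thread(() -> seen[0] = PREFERS_EVENT.get());
        worker.start();
        try {
            // join() makes the worker's write to seen[0] visible here.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        // prints "initializing thread: true"
        System.out.println("initializing thread: " + flagOnCurrentThread());
        // prints "worker thread: false"
        System.out.println("worker thread: " + flagOnNewThread());
    }
}
```

Setting the flag at the start of every per-thread entry point, as the report 
says 2.1.0 did in each `parse` call, avoids the problem. The workaround 
quoted above takes the other route and, going by its name, sets the 
preference for all threads rather than for one.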



