[ https://issues.apache.org/jira/browse/TIKA-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-3627.
-------------------------------
    Fix Version/s: 2.2.1
       Resolution: Fixed

This was caused by my error when I downgraded POI from 5.x back to 4.x. We stopped 2.2.1-rc2 because of this and respun 2.2.1-rc3.

Thank you for reporting this.

> OOXML parsing is not working as intended using multiple threads
> ---------------------------------------------------------------
>
>                 Key: TIKA-3627
>                 URL: https://issues.apache.org/jira/browse/TIKA-3627
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Bernhard Geisberger
>            Priority: Blocker
>             Fix For: 2.2.1
>
>
> In the latest version, the parsing of OOXML files is broken if multiple
> threads are used. I investigated and compared the call stack between 2.1.0
> and 2.2.0, and came to the conclusion that this is caused by [this
> commit|https://github.com/apache/tika/commit/10d925439cd862f74679ec5fa9a9b5863f50ce2c]
> in line 86 of OOXMLExtractorFactory.
> In version 2.1.0, the call
> `ExtractorFactory.setThreadPrefersEventExtractors(true)` is made in every
> `parse` call, which sets the thread-local property for every thread. In
> version 2.2.0, the call is made in a static block, which runs only once, on
> the thread that first loads the class. This leaves the property at its
> default value (false) for every thread other than the first one.
> Effectively, this breaks the parsing of macros in OOXML files.
> An easy workaround in version 2.2.0 is to call
> `ExtractorFactory.setAllThreadsPreferEventExtractors(true)` at some point
> before Tika is first used.
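
The thread-local behaviour described in the report can be reproduced without Tika or POI. What follows is only a minimal sketch using the JDK alone: `Flag`, `StaticInitParser`, `PerCallParser` and `ThreadLocalStaticInitDemo` are hypothetical names made up for illustration and are not the actual Tika or POI classes. It assumes the flag is stored in a `ThreadLocal` that defaults to false, as the report describes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalStaticInitDemo {
    // Runs the task on a freshly created worker thread and returns its result.
    static boolean onWorker(Callable<Boolean> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // The first use initializes StaticInitParser on the main thread.
        System.out.println("static block, first thread:  " + StaticInitParser.parse());               // true
        // A new thread never ran the static block, so it sees the default.
        System.out.println("static block, worker thread: " + onWorker(StaticInitParser::parse));      // false
        // Setting the flag inside parse() covers every calling thread.
        System.out.println("per call, worker thread:     " + onWorker(PerCallParser::parse));         // true
    }
}

// Hypothetical stand-in for a per-thread library flag (default: false).
final class Flag {
    private static final ThreadLocal<Boolean> PREFERS_EVENT =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    static void setThreadPrefers(boolean value) { PREFERS_EVENT.set(value); }
    static boolean threadPrefers() { return PREFERS_EVENT.get(); }
}

// Mirrors the 2.2.0 pattern: the flag is set once, in a static block.
final class StaticInitParser {
    static {
        // Executes only on the thread that triggers class initialization.
        Flag.setThreadPrefers(true);
    }

    static boolean parse() { return Flag.threadPrefers(); }
}

// Mirrors the 2.1.0 pattern: the flag is set on every parse call.
final class PerCallParser {
    static boolean parse() {
        Flag.setThreadPrefers(true);
        return Flag.threadPrefers();
    }
}
```

A static initializer runs once per class, on whichever thread first uses the class, so a thread-local value set there is visible to that thread only. A process-wide setter such as the one named in the workaround avoids the problem because it does not depend on which thread made the call.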