[
https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286272#comment-14286272
]
Uwe Schindler commented on SOLR-6991:
-------------------------------------
Hi,
I checked the code. The problem is that you cannot disable it by config,
because the parser always tries to execute the command that is part of the
default config file. If the config file is not there, it runs TESSERACT
without any path.
The only ways to work around this are:
- Disable the whole parser (f*ck, because then we need to maintain our own
parser list internally). There is no way to tell TIKA to exclude some parsers
(something like AutoDetectParser#disableParser(name/class/whatever)).
- Use a hack with reflection to make TesseractOCRParser#TESSERACT_PRESENT
return false for any path... Just replace the static map with one that returns
false for any key (LOL) and ignores any put().
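A rough sketch of the reflection hack, to make the idea concrete. It does not
touch TIKA itself: FakeOcrParser is a hypothetical stand-in, and I am assuming
the real field is a private, non-final static Map<String, Boolean> (a static
final field could not be replaced this way). AlwaysFalseMap and disable() are
made-up names as well.

```java
import java.lang.reflect.Field;
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class DisableOcrSketch {

    /** Hypothetical stand-in for the OCR parser; NOT the real Tika class. */
    static class FakeOcrParser {
        // Assumption: a private, non-final static cache keyed by tesseract path.
        private static Map<String, Boolean> TESSERACT_PRESENT = new HashMap<>();

        static boolean hasTesseract(String path) {
            Boolean cached = TESSERACT_PRESENT.get(path);
            if (cached != null) {
                return cached;
            }
            boolean present = true; // pretend the external probe succeeded
            TESSERACT_PRESENT.put(path, present);
            return present;
        }
    }

    /** Map that answers false for every key and silently drops writes. */
    static final class AlwaysFalseMap extends AbstractMap<String, Boolean> {
        @Override public Boolean get(Object key) { return Boolean.FALSE; }
        @Override public boolean containsKey(Object key) { return true; }
        @Override public Boolean put(String key, Boolean value) { return null; }
        @Override public Set<Map.Entry<String, Boolean>> entrySet() {
            return Collections.emptySet();
        }
    }

    /** Swaps the named static map field for an AlwaysFalseMap via reflection. */
    static void disable(Class<?> parserClass, String fieldName)
            throws ReflectiveOperationException {
        Field field = parserClass.getDeclaredField(fieldName);
        field.setAccessible(true);
        field.set(null, new AlwaysFalseMap()); // null target: static field
    }

    public static void main(String[] args) throws Exception {
        disable(FakeOcrParser.class, "TESSERACT_PRESENT");
        System.out.println(FakeOcrParser.hasTesseract("")); // prints false
    }
}
```

Because get() never returns null, the cached lookup always wins and the
external command is never probed. This only works as long as the parser
consults the map before running anything.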
> Update to Apache TIKA 1.7
> -------------------------
>
> Key: SOLR-6991
> URL: https://issues.apache.org/jira/browse/SOLR-6991
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 5.0, Trunk, 5.1
>
> Attachments: SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch
>
>
> Apache TIKA 1.7 was released:
> [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt]
> This is more or less a dependency update, so it is mostly replacements. Not
> sure if we should do this for 5.0. In 5.0 we currently have the previous
> version, which was not yet released with Solr. If we bring this into 5.0 now,
> we would not ship a new version twice. I can change the stuff this evening
> and let it bake in 5.x, so maybe we backport this.