[
https://issues.apache.org/jira/browse/SOLR-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286111#comment-14286111
]
Hoss Man commented on SOLR-6991:
--------------------------------
bq. In fact this is not TIKA's issue and not new, a lot of stuff around Hadoop
in Solr fails with Turkish!
...my point is: it's new to Solr.
in all other cases where POSIX_SPAWN impacts Solr, we either:
* deal with it in the solr code, so we give a meaningful error to the user
explaining the problem (ie: SystemInfoHandler)
* it's in an optional feature that *NEVER* worked with turkish -- ie: the
hadoop / morphlines contribs, which would not work with turkish locale from
the first version they were available in Solr
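For anyone following along, here is a minimal sketch of the kind of failure being discussed. It assumes the root cause is the usual Turkish case-mapping problem: upper-casing a lower-case name such as "posix_spawn" under a Turkish locale turns "i" into a dotted capital "İ" (U+0130), so the result no longer matches an ASCII constant like POSIX_SPAWN. The class and method names are hypothetical and are not taken from the JDK, Solr, or Tika.

```java
import java.util.Locale;

public class TurkishCaseSketch {

    // Upper-cases a string using an explicit locale instead of the JVM default.
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    // True if the upper-cased name equals the ASCII constant "POSIX_SPAWN".
    static boolean matchesConstant(String raw, Locale locale) {
        return "POSIX_SPAWN".equals(upper(raw, locale));
    }

    public static void main(String[] args) {
        Locale turkish = new Locale("tr", "TR");

        // Locale-neutral mapping: 'i' becomes plain ASCII 'I', so this is true.
        System.out.println(matchesConstant("posix_spawn", Locale.ROOT));

        // Turkish mapping: 'i' becomes U+0130, so this is false.
        System.out.println(matchesConstant("posix_spawn", turkish));
    }
}
```

Code that upper-cases with the default locale behaves like the second call whenever the JVM runs with a Turkish locale, which is why passing Locale.ROOT explicitly is the usual fix where we control the code.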
...in this case, we're talking about an _existing_ solr feature, that has
previously worked fine if you run older Solr with turkish, and now when
upgrading to 5.0 you're going to get a weird error message.
if there's nothing better we can do to keep the ExtractionRequestHandler working
for users who upgrade (even if they run with turkish) then i'm fine with assumes
in the tests and notes in the docs ... i was just hoping you'd have a better
idea.
in particular: I'm still wondering if we can leverage the classpath in a way to
override the "default" TesseractOCRConfig.properties file in the tika-parsers
jar with our own version that prevents tesseract from being used. (i agree
it's not worth switching to explicitly whitelisting the parsers in Solr code,
but is there an easy way to blacklist this parser and/or other parsers we know
are problematic?)
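To illustrate the classpath idea in general terms, here is a rough sketch showing that when two classpath entries contain a resource with the same name, the entry listed first wins. This only demonstrates standard URLClassLoader lookup order. Whether Tika resolves TesseractOCRConfig.properties in a way that lets us exploit this is exactly the open question, and the resource name, property key, and class below are all hypothetical stand-ins.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class ClasspathOverrideSketch {

    // Hypothetical resource name and key, NOT the real Tika file or its keys.
    static final String RESOURCE = "demo/Example.properties";
    static final String KEY = "demo.key";

    // Creates a temp directory acting as one classpath entry holding RESOURCE.
    static Path dirWith(String value) {
        try {
            Path root = Files.createTempDirectory("cp-entry");
            Path file = root.resolve(RESOURCE);
            Files.createDirectories(file.getParent());
            Files.write(file, (KEY + "=" + value + "\n").getBytes(StandardCharsets.ISO_8859_1));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up RESOURCE through a loader that searches 'first' before 'second'.
    static String resolve(Path first, Path second) {
        try {
            URL[] entries = { first.toUri().toURL(), second.toUri().toURL() };
            try (URLClassLoader loader = new URLClassLoader(entries, null);
                 InputStream in = loader.getResourceAsStream(RESOURCE)) {
                Properties props = new Properties();
                props.load(in);
                return props.getProperty(KEY);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path override = dirWith("override"); // stands in for our own copy
        Path bundled = dirWith("bundled");   // stands in for the copy in the jar
        System.out.println(resolve(override, bundled));
    }
}
```

The catch is that this depends on our copy being listed ahead of the jar on the classpath, so it would be fragile if the ordering is not under our control.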
> Update to Apache TIKA 1.7
> -------------------------
>
> Key: SOLR-6991
> URL: https://issues.apache.org/jira/browse/SOLR-6991
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 5.0, Trunk, 5.1
>
> Attachments: SOLR-6991-forkfix.patch, SOLR-6991.patch, SOLR-6991.patch
>
>
> Apache TIKA 1.7 was released:
> [https://dist.apache.org/repos/dist/release/tika/CHANGES-1.7.txt]
> This is more or less a dependency update, so mostly replacements. Not sure if
> we should do this for 5.0. In 5.0 we currently have the previous version,
> which was not yet released with Solr. If we bring this into 5.0 now, we
> would not have to ship a new TIKA version twice. I can change the stuff this
> evening and let it bake in 5.x, so maybe we backport this.