[ 
https://issues.apache.org/jira/browse/SOLR-18022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18044474#comment-18044474
 ] 

Jan Høydahl commented on SOLR-18022:
------------------------------------

My intuition is that this would be a Tika-level config. Solr only passes the 
byte stream on to TikaServer. Check if you can find suitable config options on 
the Tika side, or ask the Tika community.

Sounds like you could benefit from a bigger Tika box so it does not crash on 
large files. 

I’m pretty sure you can disable OCR on the Tika side.
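
As a rough sketch of what that could look like, assuming a Tika 2.x server 
started with a custom config file (the file name tika-config.xml is only an 
example, and the exact options should be checked against the Tika docs for 
your version), excluding the Tesseract OCR parser might be done like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed example: tika-config.xml passed to tika-server at startup -->
<properties>
  <parsers>
    <!-- Keep the default parser set, but leave out the OCR parser -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
```

The server would then be pointed at this file when it is launched (the 
tika-server -c / --config option). Whether this avoids the timeouts seen here 
is untested.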

> Solr don't index sent metadata when external Tika fails
> -------------------------------------------------------
>
>                 Key: SOLR-18022
>                 URL: https://issues.apache.org/jira/browse/SOLR-18022
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 9.10
>            Reporter: Álvaro Lechner
>            Priority: Major
>
> When I send a big PDF to Solr and Tika OCR causes a timeout, Solr doesn't 
> index the metadata sent.
> This occurs when the Solr HTTP client times out, or when Tika times out and 
> drops the connection.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
