[jira] [Commented] (SOLR-18030) Awkwardness in Tika Server + Extraction module + Solr Examples

ASF subversion and git services (Jira) Wed, 07 Jan 2026 14:33:43 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-18030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050479#comment-18050479
 ]


ASF subversion and git services commented on SOLR-18030:
--------------------------------------------------------

Commit 2eb47cf5f6b3b0dd0b53435a671860f9ca20c035 in solr's branch 
refs/heads/jira/SOLR-17975 from Eric Pugh
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=2eb47cf5f6b ]

SOLR-18030: Improve MLT docs (#3950)



> Awkwardness in Tika Server + Extraction module + Solr Examples
> --------------------------------------------------------------
>
>                 Key: SOLR-18030
>                 URL: https://issues.apache.org/jira/browse/SOLR-18030
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction), examples
>            Reporter: Eric Pugh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Now that extraction runs in its own Tika Server, we need to think a bit more 
> about when we cause actions that trigger it.
> Today, when you run bin/solr start -e cloud, you don't have the extraction 
> module.
> In Exercise 1: Index Techproducts Data, we load PDF docs that cause 
> extraction to run: 
> bin/solr post -c techproducts example/exampledocs/*
>  
> Leading to:
>  
> POSTing file sample.html (text/html) to [base]/extract
> PostTool: WARNING: Solr returned an error #500 (Server Error) for url: 
> http://localhost:8983/solr/gettingstarted/update/extract?resource.name=%2FUsers%2Fepugh%2FDocuments%2Fprojects%2Fsolr-epugh%2Fsolr%2Fpackaging%2Fbuild%2Fdev%2Fexample%2Fexampledocs%2Fsample.html&literal.id=%2FUsers%2Fepugh%2FDocuments%2Fprojects%2Fsolr-epugh%2Fsolr%2Fpackaging%2Fbuild%2Fdev%2Fexample%2Fexampledocs%2Fsample.html
> PostTool: WARNING: Response: {
>   "error":{
>     "metadata":{
>       "error-class":"org.apache.solr.common.SolrException",
>       "root-error-class":"java.lang.ClassNotFoundException"
>     },
>     "errorClass":"org.apache.solr.common.SolrException",
>     "msg":" Error loading class 'solr.extraction.ExtractingRequestHandler'",
>     "trace":{
>  
>  
> The easy fix would be to just have the techproducts-related documents in 
> ./example/techproducts, leaving the ones that need Tika parsing in 
> ./example/exampledocs/
>  
> That would be books.json, books.csv, more_books.jsonl, sample.html, solr.xml, 
> and solr-word.pdf
>  
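
The split proposed in the issue description can be sketched as a small planning function. This is only an illustration, assuming one reading of the description: the six files it names stay in example/exampledocs and every other file moves to example/techproducts. The function name `plan_moves` and the file `hd.xml` in the usage line are hypothetical and not taken from the Solr code base.

```python
from pathlib import PurePosixPath

# Files named in the issue description; assumed here to be the ones
# that stay behind in example/exampledocs.
STAY_IN_EXAMPLEDOCS = {
    "books.json",
    "books.csv",
    "more_books.jsonl",
    "sample.html",
    "solr.xml",
    "solr-word.pdf",
}


def plan_moves(filenames, src="example/exampledocs", dst="example/techproducts"):
    """Return (source, destination) path pairs for files that would move to dst."""
    moves = []
    for name in sorted(filenames):
        if name in STAY_IN_EXAMPLEDOCS:
            continue  # left in place under the assumed reading of the issue
        moves.append((str(PurePosixPath(src) / name), str(PurePosixPath(dst) / name)))
    return moves
```

Calling `plan_moves(["hd.xml", "books.json"])` would plan a move only for the hypothetical `hd.xml`, since `books.json` is on the stay list. The function only builds a plan and does not touch the file system.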



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]