[ https://issues.apache.org/jira/browse/SOLR-18030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Pugh updated SOLR-18030:
-----------------------------
    Description: 
Now that extraction runs in its own Tika Server, we need to think more 
carefully about which actions trigger it.

Today, when you run bin/solr start -e cloud, you don't get the extraction 
module.

In Exercise 1: Index Techproducts Data, we load PDF and HTML documents that 
cause extraction to run:

bin/solr post -c techproducts example/exampledocs/*

Leading to:

POSTing file sample.html (text/html) to [base]/extract
PostTool: WARNING: Solr returned an error #500 (Server Error) for url: 
http://localhost:8983/solr/gettingstarted/update/extract?resource.name=%2FUsers%2Fepugh%2FDocuments%2Fprojects%2Fsolr-epugh%2Fsolr%2Fpackaging%2Fbuild%2Fdev%2Fexample%2Fexampledocs%2Fsample.html&literal.id=%2FUsers%2Fepugh%2FDocuments%2Fprojects%2Fsolr-epugh%2Fsolr%2Fpackaging%2Fbuild%2Fdev%2Fexample%2Fexampledocs%2Fsample.html
PostTool: WARNING: Response: {
  "error":{
    "metadata":{
      "error-class":"org.apache.solr.common.SolrException",
      "root-error-class":"java.lang.ClassNotFoundException"
    },
    "errorClass":"org.apache.solr.common.SolrException",
    "msg":" Error loading class 'solr.extraction.ExtractingRequestHandler'",
    "trace":{

The easy fix would be to move the techproducts-related documents into 
./example/techproducts, leaving the ones that need Tika parsing in 
./example/exampledocs/.

That would leave books.json, books.csv, more_books.jsonl, sample.html, 
solr.xml, and solr-word.pdf in ./example/exampledocs/.
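To make the split concrete, here is a rough sketch (not PostTool's actual code) of how bin/solr post appears to choose a handler for each file, which is why posting example/exampledocs/* reaches /update/extract at all. It assumes, based on the "POSTing file sample.html (text/html) to [base]/extract" line in the log and on how the older SimplePostTool behaved, that .xml, .json, .jsonl and .csv files go to the regular update handler and every other type goes to /extract. The function name post_route is hypothetical. Under that assumption, only sample.html and solr-word.pdf from the list above would actually need Tika.

```shell
#!/usr/bin/env bash
# Rough sketch, NOT PostTool's real logic: guess which handler a file
# would be posted to, based only on its extension.
# ASSUMPTION: .xml/.json/.jsonl/.csv go to the plain update handler and
# everything else goes to /update/extract (i.e. needs Tika).
# It ignores PostTool's file-type filter, so unsupported types are
# reported as "extract" rather than skipped.
post_route() {
  local name="${1##*/}"   # strip any leading directory
  local ext="${name##*.}" # text after the last dot
  # Lower-case with tr so this also works on older bash (e.g. macOS 3.2).
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    xml|json|jsonl|csv) echo "update" ;;
    *)                  echo "extract" ;;
  esac
}

# Print the guessed route for every file passed on the command line.
for f in "$@"; do
  printf '%-20s %s\n' "${f##*/}" "$(post_route "$f")"
done
```

For example, saving this as post_route.sh (a hypothetical file name) and running bash post_route.sh example/exampledocs/* prints one line per file, which gives a quick view of which documents would break when the extraction module is not available.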

 

  was:
Now that extraction runs in its own Tika Server, we need to think more 
carefully about which actions trigger it.

Today, when you run bin/solr start -e cloud or bin/solr start -e techproducts, 
you get the extraction module. In Exercise 1: Index Techproducts Data, we 
load PDF documents that cause extraction to run:

> Awkwardness in Tika Server + Extraction module + Solr Examples
> --------------------------------------------------------------
>
>                 Key: SOLR-18030
>                 URL: https://issues.apache.org/jira/browse/SOLR-18030
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction), examples
>            Reporter: Eric Pugh
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
