[ https://issues.apache.org/jira/browse/SOLR-18030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050479#comment-18050479 ]
ASF subversion and git services commented on SOLR-18030:
--------------------------------------------------------
Commit 2eb47cf5f6b3b0dd0b53435a671860f9ca20c035 in solr's branch
refs/heads/jira/SOLR-17975 from Eric Pugh
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=2eb47cf5f6b ]
SOLR-18030: Improve MLT docs (#3950)
> Awkwardness in Tika Server + Extraction module + Solr Examples
> --------------------------------------------------------------
>
> Key: SOLR-18030
> URL: https://issues.apache.org/jira/browse/SOLR-18030
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction), examples
> Reporter: Eric Pugh
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Now that extraction runs in its own Tika Server, we need to think more
> carefully about which actions trigger it.
> Today, when you run bin/solr start -e cloud, the extraction module is not
> loaded.
> In Exercise 1: Index Techproducts data, we load PDF and HTML docs that cause
> extraction to run:
> bin/solr post -c techproducts example/exampledocs/*
>
> Leading to:
>
> POSTing file sample.html (text/html) to [base]/extract
> PostTool: WARNING: Solr returned an error #500 (Server Error) for url:
> http://localhost:8983/solr/gettingstarted/update/extract?resource.name=%2FUsers%2Fepugh%2FDocuments%2Fprojects%2Fsolr-epugh%2Fsolr%2Fpackaging%2Fbuild%2Fdev%2Fexample%2Fexampledocs%2Fsample.html&literal.id=%2FUsers%2Fepugh%2FDocuments%2Fprojects%2Fsolr-epugh%2Fsolr%2Fpackaging%2Fbuild%2Fdev%2Fexample%2Fexampledocs%2Fsample.html
> PostTool: WARNING: Response: {
> "error":{
> "metadata":{
> "error-class":"org.apache.solr.common.SolrException",
> "root-error-class":"java.lang.ClassNotFoundException"
> },
> "errorClass":"org.apache.solr.common.SolrException",
> "msg":" Error loading class 'solr.extraction.ExtractingRequestHandler'",
> "trace":{
>
>
> The easy fix would be to move the techproducts-related documents into
> ./example/techproducts, leaving the ones that need Tika parsing in
> ./example/exampledocs/.
>
> That would be books.json, books.csv, more_books.jsonl, sample.html, solr.xml,
> and solr-word.pdf
>
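To make the proposed split concrete, here is a minimal Python sketch that sorts the files listed in the issue into those that can be posted without the extraction module and those that would be routed to /update/extract. It assumes, as a simplification, that PostTool decides by file extension and sends .xml, .json, .jsonl and .csv files straight to the update handler, consistent with the log above where sample.html (text/html) went to [base]/extract. That extension set and the helper names needs_extraction and split_example_docs are hypothetical, not PostTool's actual code.

```python
from pathlib import Path

# Assumption: extensions treated as structured updates that do NOT need
# Tika. Illustrative only, not copied from the PostTool source.
STRUCTURED_EXTENSIONS = {".xml", ".json", ".jsonl", ".csv"}


def needs_extraction(filename: str) -> bool:
    """Hypothetical helper: True if the file would go to /update/extract."""
    return Path(filename).suffix.lower() not in STRUCTURED_EXTENSIONS


def split_example_docs(filenames):
    """Partition file names into (direct_update, tika_extraction) lists."""
    direct, tika = [], []
    for name in filenames:
        # Files with an unrecognized or binary/markup extension need Tika.
        (tika if needs_extraction(name) else direct).append(name)
    return direct, tika


if __name__ == "__main__":
    files = ["books.json", "books.csv", "more_books.jsonl",
             "sample.html", "solr.xml", "solr-word.pdf"]
    direct, tika = split_example_docs(files)
    print("post without Tika:", direct)
    print("requires Tika:", tika)
```

Under this assumption only sample.html and solr-word.pdf would need the Tika Server, so a bin/solr post over a directory without them should not hit the missing ExtractingRequestHandler.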
--
This message was sent by Atlassian Jira
(v8.20.10#820010)