Re: [PR] SOLR-17961 Remove deprecated Tika Extraction Backend [solr]

via GitHub Sat, 18 Oct 2025 09:40:53 -0700


janhoy commented on code in PR #3784:
URL: https://github.com/apache/solr/pull/3784#discussion_r2442216462



##########
solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc:
##########
@@ -20,7 +20,7 @@ If the documents you need to index are in a binary format, such as Word, Excel,
 
 Apache Tika incorporates many different file-format parsers such as http://pdfbox.apache.org/[Apache PDFBox] and http://poi.apache.org/index.html[Apache POI] to extract the text content and metadata from files.
 
-Solr's `ExtractingRequestHandler` uses Tika, either in-process or a remote Tika server, to support extracting text and metadata from binary files.
+Solr's `ExtractingRequestHandler` uses Apache Tika via an external Tika Server to extract text and metadata from binary files.

Review Comment:
   See above.
   
   If we keep adding backends, I think we should invest in making them truly 
pluggable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

