janhoy commented on code in PR #3784:
URL: https://github.com/apache/solr/pull/3784#discussion_r2442216462
##########
solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc:
##########
@@ -20,7 +20,7 @@ If the documents you need to index are in a binary format, such as Word, Excel,
 Apache Tika incorporates many different file-format parsers such as http://pdfbox.apache.org/[Apache PDFBox] and http://poi.apache.org/index.html[Apache POI] to extract the text content and metadata from files.
-Solr's `ExtractingRequestHandler` uses Tika, either in-process or a remote Tika server, to support extracting text and metadata from binary files.
+Solr's `ExtractingRequestHandler` uses Apache Tika via an external Tika Server to extract text and metadata from binary files.

Review Comment:
   See above. If we keep adding backends, I think we should invest in making them truly pluggable.

-- 
This is an automated message from the Apache Git Service.
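One way to read the suggestion to make extraction backends "truly pluggable" is sketched below. It is a minimal illustration under stated assumptions, not Solr's design: `ExtractionBackend`, `ExtractionBackendRegistry`, `PlainTextBackend` and `PluggableExtractionDemo` are hypothetical names invented here and are not part of Solr's API. A client for a remote Tika Server would be one implementation of the interface; the plain-text backend exists only so the sketch runs without a server.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// HYPOTHETICAL interface, not a Solr API: one implementation per extraction
// backend (for example, a client for a remote Tika Server).
interface ExtractionBackend {
  // Identifier used to select this backend from configuration.
  String name();

  // Returns the plain text extracted from the stream; metadata is omitted
  // to keep the sketch short. Failures surface as unchecked exceptions.
  String extractText(InputStream content, String contentType);
}

// HYPOTHETICAL registry: the request handler would depend only on
// ExtractionBackend and look implementations up by name.
final class ExtractionBackendRegistry {
  private final Map<String, ExtractionBackend> backends = new HashMap<>();

  void register(ExtractionBackend backend) {
    backends.put(backend.name(), backend);
  }

  Optional<ExtractionBackend> lookup(String name) {
    return Optional.ofNullable(backends.get(name));
  }
}

// HYPOTHETICAL stand-in backend that decodes the input as UTF-8 text,
// so the sketch runs without a Tika Server.
final class PlainTextBackend implements ExtractionBackend {
  @Override
  public String name() {
    return "plaintext";
  }

  @Override
  public String extractText(InputStream content, String contentType) {
    try {
      return new String(content.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

class PluggableExtractionDemo {
  public static void main(String[] args) {
    ExtractionBackendRegistry registry = new ExtractionBackendRegistry();
    registry.register(new PlainTextBackend());

    // The caller selects a backend by name instead of hard-coding a class.
    ExtractionBackend backend = registry.lookup("plaintext")
        .orElseThrow(() -> new IllegalStateException("no such backend"));
    InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(backend.extractText(in, "text/plain")); // prints "hello"
  }
}
```

Because the handler in this sketch depends only on the interface, adding another backend becomes a registration rather than a change to the handler. A real implementation could discover backends with `java.util.ServiceLoader` instead of registering them by hand.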
