Copilot commented on code in PR #3784:
URL: https://github.com/apache/solr/pull/3784#discussion_r2444069685
##########
solr/solr-ref-guide/modules/indexing-guide/pages/post-tool.adoc:
##########
@@ -134,6 +134,8 @@ The
xref:indexing-with-update-handlers.adoc#csv-formatted-index-updates[CSV hand
Index a PDF file into `gettingstarted`.
+NOTE: This requires a Tika Serer to be configured. See
xref:indexing-with-tika.adoc#tika-server[Indexing With Tika] for details.
Review Comment:
Typo: 'Tika Serer' should be 'Tika Server'.
```suggestion
NOTE: This requires a Tika Server to be configured. See
xref:indexing-with-tika.adoc#tika-server[Indexing With Tika] for details.
```
##########
solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc:
##########
@@ -54,29 +54,27 @@ This is provided via the `extraction`
xref:configuration-guide:solr-modules.adoc
The "techproducts" example included with Solr is pre-configured to have Solr
Cell configured.
If you are not using the example, you will want to pay attention to the
section <<solrconfig.xml Configuration>> below.
-== Tika Extraction Backends
+== Extraction Backends
-There are two backends for this module. The `local` backend embeds Tika inside
Solr's own process, while the `tikaserver` backend uses an external Tika server
process to do the extraction.
+The ExtractionRequestHandler supports multiple backends, selectable with the
`extraction.backend` parameter. The only backend currently supported is the
`tikaserver` backend, which uses an external Tika server process to do the
extraction.
Review Comment:
Class name is incorrect; it should be 'ExtractingRequestHandler' (with
'ing'). Please update the reference.
```suggestion
The ExtractingRequestHandler supports multiple backends, selectable with the
`extraction.backend` parameter. The only backend currently supported is the
`tikaserver` backend, which uses an external Tika server process to do the
extraction.
```
##########
solr/solr-ref-guide/modules/getting-started/pages/tutorial-diy.adoc:
##########
@@ -53,6 +53,9 @@ Local Files with `bin/solr post`::
If you have a local directory of files, the Post Tool (`bin/solr post`) can
index a directory of files.
We saw this in action in our first exercise.
+
+// NOCOMMIT: The user will need to add /update/extract handler?
+// TODO: The user will need to start a Tika server
Review Comment:
Leftover editorial markers ('NOCOMMIT'/'TODO') should be removed or resolved
before publishing the docs.
```suggestion
Note: To index rich document formats (such as PDF, Microsoft Office files,
etc.), you may need to enable the `/update/extract` handler in your Solr
configuration and ensure that a Tika server is available.
```
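For illustration only, a minimal sketch of what enabling the handler might look like in a user's `solrconfig.xml`. The handler class and the `extraction.backend` and `tikaserver.url` parameter names are taken from the test config touched elsewhere in this PR. The `http://localhost:9998` URL is an assumption (Tika Server's usual default port), not something the PR specifies.

```xml
<!-- Sketch: register the extracting handler and point it at an external Tika server. -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <str name="extraction.backend">tikaserver</str>
  <!-- Assumed URL; replace with the address of your own Tika server. -->
  <str name="tikaserver.url">http://localhost:9998</str>
</requestHandler>
```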
##########
solr/modules/extraction/src/test-files/extraction/solr/collection1/conf/solrconfig.xml:
##########
@@ -151,14 +151,13 @@
</requestHandler>
<requestHandler name="/update/extract"
class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
- <str name="parseContext.config">parseContext.xml</str>
- <str name="extraction.backend">${solr.test.extraction.backend:local}</str>
+ <str
name="extraction.backend">${solr.test.extraction.backend:tikaserver}</str>
<str name="tikaserver.url">${solr.test.tikaserver.url:}</str>
<str
name="tikaserver.metadata.compatibility">${solr.test.tikaserver.metadata.compatibility:false}</str>
</requestHandler>
<requestHandler name="/update/extract/lit-def"
class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
- <str name="extraction.backend">${solr.test.extraction.backend:local}</str>
+ <str
name="extraction.backend">${solr.test.extraction.backend:tikaserver}</str>
<str name="tikaserver.url">${solr.test.tikaserver.url:}</str>
<str
name="tikaserver.metadata.compatibility">${solr.test.tikaserver.metadata.compatibility:false}</str>
<lst name="defaults">
Review Comment:
With the handler now requiring a non-empty `tikaserver.url`, the default
empty property value will cause handler initialization to fail at core load.
Either add `startup="lazy"` to these handlers or provide a non-empty default
via the property (or ensure tests set `solr.test.tikaserver.url`).
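A rough sketch of the two options, reusing the handler names and properties from the hunk above. The `http://localhost:9998` default is an assumed placeholder (Tika Server's usual default port) and is not a value this PR defines. Whether lazy startup suits these tests is also an assumption.

```xml
<!-- Option 1: defer initialization until the handler is first requested. -->
<requestHandler name="/update/extract" startup="lazy"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <str name="extraction.backend">${solr.test.extraction.backend:tikaserver}</str>
  <str name="tikaserver.url">${solr.test.tikaserver.url:}</str>
</requestHandler>

<!-- Option 2: keep eager initialization, but give the property a non-empty default.
     The URL below is an assumed placeholder. -->
<requestHandler name="/update/extract/lit-def"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <str name="extraction.backend">${solr.test.extraction.backend:tikaserver}</str>
  <str name="tikaserver.url">${solr.test.tikaserver.url:http://localhost:9998}</str>
</requestHandler>
```

With option 1, a missing URL would presumably surface only when a test first hits the handler rather than at core load. With option 2, tests that do set `solr.test.tikaserver.url` still override the default.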
##########
solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc:
##########
@@ -589,26 +540,9 @@ So you can use the other URPs without worrying about
unexpected field additions.
=== Parser-Specific Properties
-NOTE: This setting currently applies to the `local` backend only. When using
`tikaserver` you can configure similar settings on the Tika Server side.
-
-Parsers used by Tika may have specific properties to govern how data is
extracted.
-These can be passed through Solr for special parsing situations.
-
-For instance, when using the Tika library from a Java program, the
`PDFParserConfig` class has a method `setSortByPosition(boolean)` that can
extract vertically oriented text.
-To access that method via configuration with the `ExtractingRequestHandler`,
one can add the `parseContext.config` property to `solrconfig.xml` and then set
properties in Tika's `PDFParserConfig` as in the example below.
-
-[source,xml]
-----
-<entries>
- <entry class="org.apache.tika.parser.pdf.PDFParserConfig"
impl="org.apache.tika.parser.pdf.PDFParserConfig">
- <property name="extractInlineImages" value="true"/>
- <property name="sortByPosition" value="true"/>
- </entry>
- <entry>...</entry>
-</entries>
-----
+Parser-specific properties for Tika must be configured directly on your Tika
Server instance. Consult the https://tika.apache.org/[Apache Tika
documentation] documentation of this.
Review Comment:
Redundant wording: 'documentation documentation'. Suggest: 'Consult the
Apache Tika documentation for details.'
```suggestion
Parser-specific properties for Tika must be configured directly on your Tika
Server instance. Consult the https://tika.apache.org/[Apache Tika
documentation] for details.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]