Copilot commented on code in PR #3670:
URL: https://github.com/apache/solr/pull/3670#discussion_r2412923002


##########
solr/solrj/src/java/org/apache/solr/common/SolrException.java:
##########
@@ -45,6 +45,7 @@ public enum ErrorCode {
     TOO_MANY_REQUESTS(429),
     SERVER_ERROR(500),
     SERVICE_UNAVAILABLE(503),

Review Comment:
   Add a Javadoc comment for the new GATEWAY_TIMEOUT error code to explain when it is used (e.g., timeouts when communicating with external services like TikaServer).
   ```suggestion
       SERVICE_UNAVAILABLE(503),
       /**
        * Indicates that Solr, acting as a gateway or proxy, did not receive a timely response from an
        * upstream server. Typically used for timeouts when communicating with external services such
        * as TikaServer.
        */
   ```
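
For illustration, here is a minimal sketch of how a documented gateway-timeout constant could sit next to its neighbours. `ErrorCodeSketch`, `UNKNOWN`, and `fromStatus` are hypothetical names, not Solr's actual `SolrException.ErrorCode`. The value 504 is the standard HTTP Gateway Timeout status and is an assumption here, because the hunk above is cut off before the new constant.

```java
/** Illustrative stand-in for an HTTP-status-backed error code enum (hypothetical, not Solr's class). */
enum ErrorCodeSketch {
  TOO_MANY_REQUESTS(429),
  SERVER_ERROR(500),
  SERVICE_UNAVAILABLE(503),
  /**
   * Indicates that the server, acting as a gateway or proxy, did not receive a timely response
   * from an upstream server, for example an external extraction service.
   */
  GATEWAY_TIMEOUT(504), // assumed value: standard HTTP 504
  UNKNOWN(0);

  final int code;

  ErrorCodeSketch(int code) {
    this.code = code;
  }

  /** Returns the constant matching the given status code, or UNKNOWN if none matches. */
  static ErrorCodeSketch fromStatus(int status) {
    for (ErrorCodeSketch e : values()) {
      if (e.code == status) {
        return e;
      }
    }
    return UNKNOWN;
  }
}
```

With the Javadoc attached directly to the constant, the explanation shows up in IDE hovers and generated API docs at the point where callers choose a code.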



##########
solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingParams.java:
##########
@@ -136,4 +136,19 @@ public interface ExtractingParams {
    * .*=<defaultmypassword> at the end
    */
   public static final String PASSWORD_MAP_FILE = "passwordsFile";
+
+  /** Backend selection parameter and */

Review Comment:
   Incomplete JavaDoc comment - missing description after 'and'.
   ```suggestion
     /**
      * Backend selection parameter. Specifies which extraction backend to use for processing
      * documents, such as Tika or other supported backends.
      */
   ```



##########
solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingDocumentLoader.java:
##########
@@ -75,40 +55,34 @@ public class ExtractingDocumentLoader extends ContentStreamLoader {
   /** Extract Only supported format. Default */
   public static final String XML_FORMAT = "xml";
 
-  /** XHTML XPath parser. */
-  private static final XPathParser PARSER = new XPathParser("xhtml", XHTMLContentHandler.XHTML);
-
   final SolrCore core;
   final SolrParams params;
   final UpdateRequestProcessor processor;
   final boolean ignoreTikaException;
-  protected AutoDetectParser autoDetectParser;
+  final boolean backCompat;
 
   private final AddUpdateCommand templateAdd;
 
-  protected TikaConfig config;
-  protected ParseContextConfig parseContextConfig;
   protected SolrContentHandlerFactory factory;
+  protected ExtractionBackend backend;
 
   public ExtractingDocumentLoader(
       SolrQueryRequest req,
       UpdateRequestProcessor processor,
-      TikaConfig config,
-      ParseContextConfig parseContextConfig,
-      SolrContentHandlerFactory factory) {
+      SolrContentHandlerFactory factory,
+      ExtractionBackend backend) {
     this.params = req.getParams();
     this.core = req.getCore();
-    this.config = config;
-    this.parseContextConfig = parseContextConfig;
     this.processor = processor;
+    this.backCompat = params.getBool(ExtractingParams.BACK_COMPATIBILITY, true);
 
     templateAdd = new AddUpdateCommand(req);
     templateAdd.overwrite = params.getBool(UpdateParams.OVERWRITE, true);
     templateAdd.commitWithin = params.getInt(UpdateParams.COMMIT_WITHIN, -1);
+    templateAdd.overwrite = params.getBool(UpdateParams.OVERWRITE, true);

Review Comment:
   Duplicate assignment: `templateAdd.overwrite` is set twice with the same value, on lines 80 and 82. Remove the second assignment.
   ```suggestion
   
   ```



##########
solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc:
##########
@@ -391,6 +463,33 @@ This is only required if you have customized your Tika implementation.
 +
 Example: `tika.config=/path/to/tika.config`
 
+`tikaserver.timeoutSeconds`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: `180` seconds
+|===
++
+Sets the HTTP timeout when communicating with Tika Server, in seconds. Can be set per request as a parameter or as a default in the request handler configuration. If set on the request it overrides the handler default for that call only.
++
+Examples:
++
+- Per request: `tikaserver.timeoutSeconds=60`
+- In `solrconfig.xml`: `<int name="tikaserver.timeoutSeconds">60</int>`
+
+
+`tikaserver.url`::
++
+[%autowidth,frame=none]
+|===
+|Optional |Default: none
+|===
++
+Specifies the URL of the Tika server to use when the `extraction.backend` parameter is set to `tikaserver`.
+This parameter is required when using the `tikaserver` backend. Defaults to `http://localhost:9998` if not specified. This parameter can only be configured in the configuration, not per request.

Review Comment:
   Contradictory statements: first says the parameter is 'required', then says it 'Defaults to `http://localhost:9998` if not specified'. Clarify whether it's required or has a default.
   ```suggestion
   This parameter is optional when using the `tikaserver` backend. If not specified, it defaults to `http://localhost:9998`. This parameter can only be set in the configuration, not per request.
   ```
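
The precedence described in the documentation under review can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's implementation: `TikaServerSettingsSketch`, `resolveTimeout`, and `resolveUrl` are hypothetical names, and plain maps stand in for Solr's request and handler parameters. Only the parameter names and the defaults (`180` seconds, `http://localhost:9998`) are taken from the diff above.

```java
import java.time.Duration;
import java.util.Map;

/** Hypothetical helper illustrating the documented precedence; not part of Solr. */
final class TikaServerSettingsSketch {
  static final String TIMEOUT_PARAM = "tikaserver.timeoutSeconds";
  static final String URL_PARAM = "tikaserver.url";
  static final int DEFAULT_TIMEOUT_SECONDS = 180;
  static final String DEFAULT_URL = "http://localhost:9998";

  private TikaServerSettingsSketch() {}

  /** A request value wins over the handler default; 180 seconds applies when neither is set. */
  static Duration resolveTimeout(
      Map<String, String> requestParams, Map<String, String> handlerDefaults) {
    String raw = requestParams.get(TIMEOUT_PARAM);
    if (raw == null) {
      raw = handlerDefaults.get(TIMEOUT_PARAM);
    }
    int seconds = (raw == null) ? DEFAULT_TIMEOUT_SECONDS : Integer.parseInt(raw.trim());
    return Duration.ofSeconds(seconds);
  }

  /** The URL is read from handler configuration only, so request parameters are never consulted. */
  static String resolveUrl(Map<String, String> handlerConfig) {
    return handlerConfig.getOrDefault(URL_PARAM, DEFAULT_URL);
  }
}
```

Under these assumptions, `resolveUrl(Map.of())` falls back to the default URL, which is why describing the parameter as optional matches the stated behaviour better than describing it as required.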



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.