janhoy commented on code in PR #3670:
URL: https://github.com/apache/solr/pull/3670#discussion_r2413075579


##########
solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingDocumentLoader.java:
##########
@@ -75,40 +55,33 @@ public class ExtractingDocumentLoader extends ContentStreamLoader {
   /** Extract Only supported format. Default */
   public static final String XML_FORMAT = "xml";
 
-  /** XHTML XPath parser. */
-  private static final XPathParser PARSER = new XPathParser("xhtml", XHTMLContentHandler.XHTML);
-
   final SolrCore core;
   final SolrParams params;
   final UpdateRequestProcessor processor;
   final boolean ignoreTikaException;
-  protected AutoDetectParser autoDetectParser;
+  final boolean backCompat;
 
   private final AddUpdateCommand templateAdd;
 
-  protected TikaConfig config;
-  protected ParseContextConfig parseContextConfig;
   protected SolrContentHandlerFactory factory;
+  protected ExtractionBackend backend;
 
   public ExtractingDocumentLoader(
       SolrQueryRequest req,
       UpdateRequestProcessor processor,
-      TikaConfig config,
-      ParseContextConfig parseContextConfig,
-      SolrContentHandlerFactory factory) {
+      SolrContentHandlerFactory factory,
+      ExtractionBackend backend) {
     this.params = req.getParams();
     this.core = req.getCore();
-    this.config = config;
-    this.parseContextConfig = parseContextConfig;
     this.processor = processor;
+    this.backCompat = params.getBool(ExtractingParams.BACK_COMPATIBILITY, true);

Review Comment:
   For some reason, this does not work as expected. It picks up the back-compat parameter when it is passed on the update request, but when the same parameter is configured in the handler definition in solrconfig.xml, it is not part of `req.getParams()`. We also do not have access to the `initParams` NamedList here, only in the handler.

   This means that back-compat is currently on by default (I believe the default should be off) and can only be changed per request.

   I tried reading this parameter in `load()` instead, but even there the SolrQueryRequest object does not contain the config from initParams in solrconfig.xml. I could hack around this by passing initParams to the ExtractingDocumentLoader constructor and falling back to reading from it, but I was sure that Solr would merge params from init and request automatically. Any insight?
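
   To make the fallback idea concrete, here is a minimal standalone sketch of the lookup order described above: request param first, then handler init config, then a hard-coded default. It is not Solr code. The class `BackCompatFallback`, the method `resolveBool`, the key string `"backCompat"`, and the use of plain maps in place of `SolrParams` and the init NamedList are all assumptions made for illustration.

```java
import java.util.Map;

/** Illustrative only: plain maps stand in for request params and handler init args. */
public class BackCompatFallback {

  // Hypothetical key; the real value of ExtractingParams.BACK_COMPATIBILITY is not shown in the diff.
  static final String BACK_COMPATIBILITY = "backCompat";

  /**
   * Resolves a boolean flag with this precedence:
   * 1. value on the request, 2. value from handler init config, 3. the supplied default.
   */
  static boolean resolveBool(
      Map<String, String> requestParams,
      Map<String, String> initParams,
      String name,
      boolean defaultValue) {
    String value = requestParams.get(name);
    if (value == null) {
      value = initParams.get(name); // fall back to handler-level config
    }
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    Map<String, String> init = Map.of(BACK_COMPATIBILITY, "true");

    // Nothing on the request: the handler-level setting applies.
    System.out.println(resolveBool(Map.of(), init, BACK_COMPATIBILITY, false));

    // The request overrides the handler-level setting.
    System.out.println(
        resolveBool(Map.of(BACK_COMPATIBILITY, "false"), init, BACK_COMPATIBILITY, false));

    // Neither is set: the default (off, as proposed above) applies.
    System.out.println(resolveBool(Map.of(), Map.of(), BACK_COMPATIBILITY, false));
  }
}
```

   In Solr itself, `SolrParams.wrapDefaults(params, defaults)` offers comparable layering once the init values are available as a `SolrParams` instance. Whether that is the right place to hook this in for the loader is exactly the open question.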



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

