Re: [PR] SOLR-17195: Add 'minPrefixLength' soft limit [solr]

via GitHub Wed, 03 Jul 2024 07:07:52 -0700


gerlowskija commented on code in PR #2499:
URL: https://github.com/apache/solr/pull/2499#discussion_r1664171169



##########
solr/core/src/java/org/apache/solr/core/SolrConfig.java:
##########
@@ -103,6 +103,8 @@ public class SolrConfig implements MapSerializable {
   private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
 
   public static final String DEFAULT_CONF_FILE = "solrconfig.xml";
+  public static final String MIN_PREFIX_LENGTH = "minPrefixLength";

Review Comment:
   Naming-wise, I was very much following the model of `maxBooleanClauses`.  
But in hindsight, I think you're right that this is too ambiguous.
   
   
   For now I'll change this to `minPrefixQueryTermLength`, but if there's 
another option there you like better, lmk.



##########
solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java:
##########
@@ -134,6 +135,19 @@ protected Query 
newWildcardQuery(org.apache.lucene.index.Term t) {
               }
             }
 
+            @Override
+            protected Query getPrefixQuery(String field, String termStr) 
throws ParseException {
+              // TODO check the field type and call QueryUtils.ensureBlah here

Review Comment:
   I noticed this in testing initially and was surprised as well.
   
   
   It looks like ComplexPhraseQParserPlugin assumes "text" and doesn't rely on 
FieldType (or its subclasses) to do query-construction in a schema-aware 
manner.  I'd be curious to see what other QParsers act similarly, but I'm not 
familiar enough with our query-parsing code generally to say whether it makes 
sense or might be problematic.  But definitely orthogonal to this PR either way.



##########
solr/core/src/java/org/apache/solr/schema/StrField.java:
##########
@@ -68,6 +71,20 @@ public List<IndexableField> createFields(SchemaField field, 
Object value) {
     return Collections.singletonList(fval);
   }
 
+  @Override
+  public Query getPrefixQuery(QParser parser, SchemaField sf, String termStr) {
+    final var query = super.getPrefixQuery(parser, sf, termStr);
+
+    // Some internal usage (e.g. faceting) creates PrefixQueries without a 
surrounding QParser, so
+    // check for null here before using QParser to access the limit value
+    if (query instanceof PrefixQuery && parser != null) {
+      final var minPrefixLength =
+          parser.getReq().getCore().getSolrConfig().prefixQueryMinPrefixLength;
+      QueryUtils.ensurePrefixQueryObeysMinimumPrefixLength(query, termStr, 
minPrefixLength);

Review Comment:
   Done!



##########
solr/core/src/test-files/solr/collection1/conf/solrconfig.xml:
##########
@@ -89,6 +89,18 @@
     -->
     <maxBooleanClauses>${solr.max.booleanClauses:1024}</maxBooleanClauses>
 
+    <!-- Minimum acceptable prefix-size for prefix-based queries on string 
fields.
+
+         Prefix-based queries consume memory in proportion to the number of 
terms in the index
+         that start with that prefix.  Short prefixes tend to match many many 
more indexed-terms
+         and consume more memory as a result, sometimes causing stability 
issues on the node.
+
+         This setting allows administrators to require that prefixes meet or 
exceed a specified
+         minimum length requirement.  Prefix queries that don't meet this 
requirement return an
+         error to users.
+    -->
+    <minPrefixLength>${solr.min.prefixLength:2}</minPrefixLength>

Review Comment:
   To avoid fragmentation, let's continue XML-tag naming discussion in this 
thread [here](https://github.com/apache/solr/pull/2499/files#r1629916574).
   
   For the governing sys prop, I've changed that to 
'solr.query.minPrefixLength' as suggested.



##########
solr/core/src/java/org/apache/solr/schema/TextField.java:
##########
@@ -165,6 +167,20 @@ public Query getFieldTermQuery(QParser parser, SchemaField 
field, String externa
     return new TermQuery(new Term(field.getName(), br));
   }
 
+  @Override
+  public Query getPrefixQuery(QParser parser, SchemaField sf, String termStr) {
+    final var query = super.getPrefixQuery(parser, sf, termStr);

Review Comment:
   > Why override these methods in TextField & StrField with duplicative code 
when you could modify FieldType.getPrefixQuery instead?
   
   Moving the logic into `FieldType.getPrefixQuery` would have it apply to all 
field types, which I'm not sure we want.  The limit will probably always be 
unhelpful for "boolean" and "enum" type fields for instance, since they're 
generally so low cardinality as to be immune to the "runaway prefix query" 
problem.
   
   For now I've acquiesced and moved the logic to `FieldType`, but curious to 
hear your thoughts on some of those exceptional cases.



##########
solr/core/src/java/org/apache/solr/search/QueryUtils.java:
##########
@@ -83,6 +85,25 @@ public static boolean isConstantScoreQuery(Query q) {
     }
   }
 
+  public static void ensurePrefixQueryObeysMinimumPrefixLength(
+      Query query, String prefix, int minPrefixLength) {
+    // TODO Should we provide a query-param to disable the limit on a 
request-by-request basis?  I
+    // can imagine scenarios where advanced users may want to enforce the 
limit on most fields,
+    // but ignore it for a few fields that they know to be low-cardinality and 
therefore "less

Review Comment:
   Done!



##########
solr/core/src/java/org/apache/solr/search/QueryUtils.java:
##########
@@ -83,6 +85,25 @@ public static boolean isConstantScoreQuery(Query q) {
     }
   }
 
+  public static void ensurePrefixQueryObeysMinimumPrefixLength(
+      Query query, String prefix, int minPrefixLength) {
+    // TODO Should we provide a query-param to disable the limit on a 
request-by-request basis?  I

Review Comment:
   Done!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Re: [PR] SOLR-17195: Add 'minPrefixLength' soft limit [solr]

Reply via email to