gerlowskija commented on code in PR #2499: URL: https://github.com/apache/solr/pull/2499#discussion_r1664171169
########## solr/core/src/java/org/apache/solr/core/SolrConfig.java: ########## @@ -103,6 +103,8 @@ public class SolrConfig implements MapSerializable { private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); public static final String DEFAULT_CONF_FILE = "solrconfig.xml"; + public static final String MIN_PREFIX_LENGTH = "minPrefixLength"; Review Comment: Naming-wise, I was very much following the model of `maxBooleanClauses`. But in hindsight, I think you're right that this is too ambiguous. For now I'll change this to `minPrefixQueryTermLength`, but if there's another option there you like better, lmk. ########## solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java: ########## @@ -134,6 +135,19 @@ protected Query newWildcardQuery(org.apache.lucene.index.Term t) { } } + @Override + protected Query getPrefixQuery(String field, String termStr) throws ParseException { + // TODO check the field type and call QueryUtils.ensureBlah here Review Comment: I noticed this in testing initially and was surprised as well. It looks like ComplexPhraseQParserPlugin assumes "text" and doesn't rely on FieldType (or its subclasses) to do query-construction in a schema-aware manner. I'd be curious to see what other QParsers act similarly, but I'm not familiar enough with our query-parsing code generally to say whether it makes sense or might be problematic. But definitely orthogonal to this PR either way. ########## solr/core/src/java/org/apache/solr/schema/StrField.java: ########## @@ -68,6 +71,20 @@ public List<IndexableField> createFields(SchemaField field, Object value) { return Collections.singletonList(fval); } + @Override + public Query getPrefixQuery(QParser parser, SchemaField sf, String termStr) { + final var query = super.getPrefixQuery(parser, sf, termStr); + + // Some internal usage (e.g. faceting) creates PrefixQueries without a surrounding QParser, so + // check for null here before using QParser to access the limit value + if (query instanceof PrefixQuery && parser != null) { + final var minPrefixLength = + parser.getReq().getCore().getSolrConfig().prefixQueryMinPrefixLength; + QueryUtils.ensurePrefixQueryObeysMinimumPrefixLength(query, termStr, minPrefixLength); Review Comment: Done! ########## solr/core/src/test-files/solr/collection1/conf/solrconfig.xml: ########## @@ -89,6 +89,18 @@ --> <maxBooleanClauses>${solr.max.booleanClauses:1024}</maxBooleanClauses> + <!-- Minimum acceptable prefix-size for prefix-based queries on string fields. + + Prefix-based queries consume memory in proportion to the number of terms in the index + that start with that prefix. Short prefixes tend to match many many more indexed-terms + and consume more memory as a result, sometimes causing stability issues on the node. + + This setting allows administrators to require that prefixes meet or exceed a specified + minimum length requirement. Prefix queries that don't meet this requirement return an + error to users. + --> + <minPrefixLength>${solr.min.prefixLength:2}</minPrefixLength> Review Comment: To avoid fragmentation, let's continue XML-tag naming discussion in this thread [here](https://github.com/apache/solr/pull/2499/files#r1629916574). For the governing sys prop, I've changed that to 'solr.query.minPrefixLength' as suggested. ########## solr/core/src/java/org/apache/solr/schema/TextField.java: ########## @@ -165,6 +167,20 @@ public Query getFieldTermQuery(QParser parser, SchemaField field, String externa return new TermQuery(new Term(field.getName(), br)); } + @Override + public Query getPrefixQuery(QParser parser, SchemaField sf, String termStr) { + final var query = super.getPrefixQuery(parser, sf, termStr); Review Comment: > Why override these methods in TextField & StrField with duplicative code when you could modify FieldType.getPrefixQuery instead? Moving the logic into `FieldType.getPrefixQuery` would have it apply to all field types, which I'm not sure we want. The limit will probably always be unhelpful for "boolean" and "enum" type fields for instance, since they're generally so low cardinality as to be immune to the "runaway prefix query" problem. For now I've acquiesced and moved the logic to `FieldType`, but curious to hear your thoughts on some of those exceptional cases. ########## solr/core/src/java/org/apache/solr/search/QueryUtils.java: ########## @@ -83,6 +85,25 @@ public static boolean isConstantScoreQuery(Query q) { } } + public static void ensurePrefixQueryObeysMinimumPrefixLength( + Query query, String prefix, int minPrefixLength) { + // TODO Should we provide a query-param to disable the limit on a request-by-request basis? I + // can imagine scenarios where advanced users may want to enforce the limit on most fields, + // but ignore it for a few fields that they know to be low-cardinality and therefore "less Review Comment: Done! ########## solr/core/src/java/org/apache/solr/search/QueryUtils.java: ########## @@ -83,6 +85,25 @@ public static boolean isConstantScoreQuery(Query q) { } } + public static void ensurePrefixQueryObeysMinimumPrefixLength( + Query query, String prefix, int minPrefixLength) { + // TODO Should we provide a query-param to disable the limit on a request-by-request basis? I Review Comment: Done! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org