[GitHub] [solr] cpoerschke commented on a change in pull request #123: [SOLR-12697] Add pure DocValues support to FieldValueFeature

GitBox Sat, 15 May 2021 11:20:24 -0700


cpoerschke commented on a change in pull request #123:
URL: https://github.com/apache/solr/pull/123#discussion_r632990383




##########
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/FieldValueFeature.java
##########
@@ -106,18 +106,29 @@ public FieldValueFeatureWeight(IndexSearcher searcher,
 
     /**
      * Return a FeatureScorer that uses docValues or storedFields if no 
docValues are present
+     *
      * @param context the segment this FeatureScorer is working with
      * @return FeatureScorer for the current segment and field
      * @throws IOException as defined by abstract class Feature
      */
     @Override
     public FeatureScorer scorer(LeafReaderContext context) throws IOException {
       if (schemaField != null && !schemaField.stored() && 
schemaField.hasDocValues()) {
-        return new DocValuesFieldValueFeatureScorer(this, context,
-            DocIdSetIterator.all(DocIdSetIterator.NO_MORE_DOCS), 
schemaField.getType());
+
+        FieldInfo fieldInfo = 
context.reader().getFieldInfos().fieldInfo(field);
+        DocValuesType docValuesType = fieldInfo != null ? 
fieldInfo.getDocValuesType() : DocValuesType.NONE;
+
+        if (DocValuesType.NUMERIC.equals(docValuesType) || 
DocValuesType.SORTED.equals(docValuesType)) {
+          return new DocValuesFieldValueFeatureScorer(this, context,
+                  DocIdSetIterator.all(DocIdSetIterator.NO_MORE_DOCS), 
schemaField.getType(), docValuesType);
+          // If type is NONE, this segment has no docs with this field. That's 
not a problem, because we won't call score() anyway

Review comment:
       Earlier today I had a thought on this code area here and wanted to note 
it down (for further pondering next week) but had to repair my bicycle's 
puncture first. Anyway ...
   
   The _"If type is NONE, this segment has no docs with this field. That's not 
a problem ..."_ comment here is very helpful. I wonder though if the _"because 
we won't call score() anyway"_ part remains accurate after the refactorings. In 
the branch's initial commit I think it was accurate i.e. via the `if (docValues 
!= null` at the start of the `score()` implementation the inside of the `if` 
block wasn't called. But now there's something more interesting going on here:
   
   * If there's a segment whose documents all don't have the field then instead 
of a `DocValuesFieldValueFeatureScorer` the `FieldValueFeatureScorer` is used 
via the fall-through.
   * Conceptually that jumps out as untidy or unintuitive i.e. if there is no 
field for all the documents in the segment then (say) a 
`NoFieldValueFeatureScorer` is more obvious.
   * In terms of performance use of the `FieldValueFeatureScorer` should be 
more costly i.e. reading of the field would be attempted and nothing would be 
found.
   * In terms of test coverage this would not be caught since the default value 
is returned, albeit inefficiently, by the `FieldValueFeatureScorer` 
implementation.
   * In terms of use cases this code path may or may not be an edge case e.g. 
if a `FieldValueFeature` is for a field that is sparsely populated then a good 
proportion of segments might encounter the code path and reasons for sparseness 
may include (say) historical documents not having the field.
   
   _If_ that reasoning makes sense then a next step here could be to create and 
use a `NoFieldValueFeatureScorer` class. What do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

[GitHub] [solr] cpoerschke commented on a change in pull request #123: [SOLR-12697] Add pure DocValues support to FieldValueFeature

Reply via email to