cpoerschke commented on a change in pull request #123: URL: https://github.com/apache/solr/pull/123#discussion_r632759790
########## File path: solr/contrib/ltr/src/test/org/apache/solr/ltr/TestLTROnSolrCloud.java ########## @@ -232,58 +249,113 @@ private void createCollection(String name, String config, int numShards, int num solrCluster.waitForActiveCollection(name, numShards, numShards * numReplicas); } - void indexDocument(String collection, String id, String title, String description, int popularity) - throws Exception{ + throws Exception { SolrInputDocument doc = new SolrInputDocument(); doc.setField("id", id); doc.setField("title", title); doc.setField("description", description); doc.setField("popularity", popularity); + if (popularity != 1) { + // check that empty values will be read as default + doc.setField("dvIntField", popularity); + doc.setField("dvLongField", popularity); + doc.setField("dvFloatField", ((float) popularity) / 10); + doc.setField("dvDoubleField", ((double) popularity) / 10); + doc.setField("dvStrNumField", popularity); + doc.setField("dvStrBoolField", popularity % 2 == 0 ? "T" : "F"); + } solrCluster.getSolrClient().add(collection, doc); } - private void indexDocuments(final String collection) - throws Exception { + private void indexDocuments(final String collection) throws Exception { final int collectionSize = 8; - for (int docId = 1; docId <= collectionSize; docId++) { + // put documents in random order to check that advanceExact is working correctly + List<Integer> docIds = IntStream.rangeClosed(1, collectionSize).boxed().collect(toList()); Review comment: Nice, I haven't across this `IntStream.rangeClosed` thing before! ########## File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/FieldValueFeature.java ########## @@ -146,5 +171,100 @@ public float getMaxScore(int upTo) throws IOException { return Float.POSITIVE_INFINITY; } } + + /** + * A FeatureScorer that reads the docValues for a field + */ + public class DocValuesFieldValueFeatureScorer extends FeatureWeight.FeatureScorer { + final LeafReaderContext context; + final DocIdSetIterator docValues; Review comment: > ... `advanceExact()` generally seems to be preferred over `advance()`, can you tell me why? Good question, I don't know! Intuitively something that returns `boolean` rather than `int` when the `int` is not needed seems neater but really we are after performance and not necessarily neatness. `NumericDocValues.longValue()` mentioned `advanceExact()` so that could be a clue but not an explanation. Slightly random code browsing to (say) https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java finds some delegation to https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/lucene/core/src/java/org/apache/lucene/codecs/lucene80/IndexedDISI.java and its advanceNext method and mentions of `IndexedDISI` always make me a bit _dizzy_ since I don't yet know much about it. The last commit on https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.8.2/lucene/core/src/java/org/apache/lucene/index/NumericDocValues.java reads "LUCENE-7462: Give doc values APIs an advanceExact method." and that commit and https://issues.apache.org/jira/browse/LUCENE-7462 should have more info then. ########## File path: solr/contrib/ltr/src/test-files/solr/collection1/conf/schema.xml ########## @@ -18,13 +18,22 @@ <schema name="example" version="1.5"> <fields> - <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> + <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/> + <field name="finalScore" type="string" indexed="true" stored="true" multiValued="false"/> + <field name="finalScoreFloat" type="float" indexed="true" stored="true" multiValued="false"/> + <field name="field" type="text_general" indexed="true" stored="false" multiValued="false"/> <field name="title" type="text_general" indexed="true" stored="true"/> <field name="description" type="text_general" indexed="true" stored="true"/> <field name="keywords" type="text_general" indexed="true" stored="true" multiValued="true"/> <field name="popularity" type="int" indexed="true" stored="true" /> Review comment: observation: via the `docValues="${solr.tests.numeric.dv}"` in the `<fieldType name="int" ...` and `popularity` being used in the `TestFieldValueFeature` we have some existing test coverage for doc values. ########## File path: solr/contrib/ltr/src/test/org/apache/solr/ltr/TestLTROnSolrCloud.java ########## @@ -232,58 +249,113 @@ private void createCollection(String name, String config, int numShards, int num solrCluster.waitForActiveCollection(name, numShards, numShards * numReplicas); } - void indexDocument(String collection, String id, String title, String description, int popularity) - throws Exception{ + throws Exception { SolrInputDocument doc = new SolrInputDocument(); doc.setField("id", id); doc.setField("title", title); doc.setField("description", description); doc.setField("popularity", popularity); + if (popularity != 1) { + // check that empty values will be read as default + doc.setField("dvIntField", popularity); + doc.setField("dvLongField", popularity); + doc.setField("dvFloatField", ((float) popularity) / 10); + doc.setField("dvDoubleField", ((double) popularity) / 10); + doc.setField("dvStrNumField", popularity); + doc.setField("dvStrBoolField", popularity % 2 == 0 ? "T" : "F"); + } Review comment: I looked at `TestLTROnSolrCloud` changes today, this approach here is very elegant, combined with the `resultN_features` and `schema.xml` change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org