tomglk opened a new pull request #123: URL: https://github.com/apache/solr/pull/123
# Description

This PR addresses the issue [SOLR-12697](https://issues.apache.org/jira/browse/SOLR-12697). The problem is that the current FieldValueFeature only works for stored fields. This is not optimal, because using DocValues is faster for this use case. It also increases the index size if fields have to be stored solely so that they can be used for LTR.

# Solution

**Note:** This PR is based on the work of Stanislav Livotov and Christine Poerschke that can be seen in the Jira ticket. It uses the latest patch (17th May 2019) from the Jira ticket as its base. I combined that with suggestions from Mrs. Poerschke and my own approach to the problem.

This PR adds the DocValuesFieldValueFeatureScorer as a new Scorer used by the FieldValueFeatureWeight. The new scorer is used whenever a field has docValues and is not stored. It therefore does not affect the current functionality and is only applied to fields that could not be used before.

The new scorer checks which type of docValues a field has and handles the NUMERIC and SORTED types. For NUMERIC fields it simply uses the value; a SORTED value is parsed as a number or as a boolean flag.

# Tests

New fields with docValues=true were added to the schema.xml in order to test in TestLTROnSolrCloud that feature requests also return values for these fields.
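
For illustration, fields of the kind described above could be declared roughly as follows. This is only a sketch: the field names are hypothetical and are not the ones added in this PR, and the field type definitions are assumed to resemble those of a typical Solr schema.

```xml
<!-- Sketch only: field names are hypothetical, not the fields added in this PR. -->
<fieldType name="pint" class="solr.IntPointField"/>
<fieldType name="string" class="solr.StrField"/>
<fieldType name="boolean" class="solr.BoolField"/>

<!-- docValues enabled, not stored: a single-valued point field uses NUMERIC docValues -->
<field name="dvIntExample" type="pint" indexed="false" stored="false" docValues="true" multiValued="false"/>

<!-- single-valued string and boolean fields use SORTED docValues -->
<field name="dvStrExample" type="string" indexed="false" stored="false" docValues="true" multiValued="false"/>
<field name="dvBoolExample" type="boolean" indexed="false" stored="false" docValues="true" multiValued="false"/>
```

With `stored="false"` and `docValues="true"`, such fields match the condition under which the new scorer is selected.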
The TestLTRReRankingPipeline was changed from a SolrTestCase to a SolrTestCaseJ4 in order to improve readability. I ran all tests in the package `org.apache.solr.ltr`.

# Please note

I am aware that the structure of the FieldValueFeature is now quite hard to read and the new Scorer is a bit hidden. I decided to add another nested class to the FieldValueFeatureWeight to avoid having to duplicate a lot of code just to change the inner functionality.

Unit tests for the handleBytesRef are still missing. I plan to add them, but wanted to create the PR already so that the general approach can be reviewed and discussed.

# Checklist

Please review the following and check all that apply:

- [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [X] I have created a Jira issue and added the issue ID to my pull request title. (**Issue was already present**)
- [X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [X] I have developed this patch against the `main` branch.
- [ ] I have run `./gradlew check`.
- [X] I have run `./gradlew check -x test`.
- [X] I have added tests for my changes.
- [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org