janhoy opened a new issue, #4: URL: https://github.com/apache/solr-orbit-workloads/issues/4
Port the OSB `pmc` workload, a CC-BY-2.0 dataset of scientific articles. It adds proper full-text relevance benchmarking (BM25, phrase search, high-cardinality text fields), which is the biggest gap in the current workload set.

## Tasks

- Convert OSB workload using `solr-orbit convert-workload`
- Adapt schema to Solr field types (text_general, string, date)
- Define operations: default search, phrase search, term faceting
- Add 1k sample corpus for test-mode
- Check whether any operations belong in `common_operations/` rather than this workload
- Document dataset licence in workload README

**Depends on:** apache/solr-orbit-workloads#3 (ASF dataset hosting must be resolved before corpus files can be finalised)

## References

- OSB workload: https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/pmc
- Creating workloads: https://github.com/apache/solr-orbit/blob/main/CREATE_WORKLOAD_GUIDE.md
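
For discussion, here is a rough sketch of how the three operations named under Tasks (default search, phrase search, term faceting) might map onto standard Solr request parameters (`q`, `rows`, `facet`, `facet.field`). The field names `body` and `journal` and all function names are assumptions made for illustration. They are not taken from the converted workload, and the operation format that `solr-orbit` actually expects may look different.

```python
from urllib.parse import urlencode

# Hypothetical field names; the real pmc schema may use different ones.
BODY_FIELD = "body"      # assumed text_general field holding the article text
FACET_FIELD = "journal"  # assumed string field suitable for term faceting


def default_search(term, rows=10):
    """Single-term query against the body field (term is not escaped here)."""
    return {"q": f"{BODY_FIELD}:{term}", "rows": rows}


def phrase_search(phrase, rows=10):
    """Phrase query: the double quotes ask Solr to match the terms in order."""
    # Escape backslashes and quotes so the phrase stays one quoted string.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return {"q": f'{BODY_FIELD}:"{escaped}"', "rows": rows}


def term_facet(rows=0):
    """Match all documents and count the values of a string field."""
    return {
        "q": "*:*",
        "rows": rows,
        "facet": "true",
        "facet.field": FACET_FIELD,
    }


def to_query_string(params):
    """URL-encode a parameter dict for a Solr /select request."""
    return urlencode(params)


if __name__ == "__main__":
    for params in (default_search("cancer"),
                   phrase_search("gene expression"),
                   term_facet()):
        print(to_query_string(params))
```

Keeping each operation as a plain parameter dict would make it easy to check which of them are generic enough to move into `common_operations/`.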
