Any thoughts on this? I am envisioning applications in machine learning systems, where the training set may be only a small sample of the full dataset and the user wants scoring to be done only on samples of that dataset.
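
To make the wrapping idea from the original message a bit more concrete, here is a rough sketch in plain Java. It is only an illustration under stated assumptions: none of the names (SampledScoringSketch, sampleAndScore, innerMatches, innerScore) are Lucene APIs, and the fixed-stride choice of offsets is just one possible sampling scheme that I picked for the example, not something the proposal has settled on. The inner query is stood in for by a match predicate and a scoring function.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch: none of these names are Lucene APIs.
final class SampledScoringSketch {

    /**
     * Visits every stride-th document id starting at offset, asks the
     * stand-in "inner query" whether the document matches, scores it if so,
     * and stops as soon as limit hits have been collected.
     * Returns docId -> score in visiting order.
     */
    static Map<Integer, Double> sampleAndScore(
            int maxDoc, int offset, int stride, int limit,
            IntPredicate innerMatches, IntToDoubleFunction innerScore) {
        if (offset < 0 || stride < 1 || limit < 0) {
            throw new IllegalArgumentException(
                    "offset >= 0, stride >= 1 and limit >= 0 are required");
        }
        Map<Integer, Double> hits = new LinkedHashMap<>();
        // A long counter avoids int overflow when maxDoc is close to Integer.MAX_VALUE.
        for (long next = offset; next < maxDoc && hits.size() < limit; next += stride) {
            int doc = (int) next;
            if (innerMatches.test(doc)) {
                hits.put(doc, innerScore.applyAsDouble(doc));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Toy "index" of 1,000 documents; the inner query matches even ids
        // and the score is an arbitrary placeholder.
        Map<Integer, Double> hits = sampleAndScore(
                1000, 3, 7, 5,
                doc -> doc % 2 == 0,
                doc -> 1.0 / (1 + doc));
        hits.forEach((doc, score) -> System.out.println(doc + " -> " + score));
    }
}
```

With the toy values in main, the visited ids are 3, 10, 17, 24, and so on, and the five collected hits are documents 10, 24, 38, 52 and 66. Documents outside the sample are never passed to the scoring function, which is where the performance gain would come from.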

On Fri, Jun 7, 2019 at 5:45 PM Atri Sharma <a...@apache.org> wrote:
>
> Hi All,
>
> While working on a new Query type, I was inclined to think of a couple
> of use cases where the documents being scored need not be all of the
> data set, but a sample of them. This can be useful for very large
> datasets, where a query is only interested in getting the "feel" of
> the data, and other queries where the data is being aggregated over
> time, so a wide enough sample of the data is good enough for the user
> at the tradeoff of improved performance. Faceting already has sampling
> mechanisms, so there are ideas to be borrowed from that part.
>
> I have some ideas on introducing a new query type and associated
> semantics to allow this functionality to be present from ground up.
> Specifically, a query type which wraps another query and "feeds"
> offsets to the inner query, along with a limit of collection of hits.
> I can go in more detail, but wanted to get some thoughts and feedback
> before delving deeper.
>
> Atri

--
Regards,

Atri
Apache Concerted