Any thoughts on this? I am envisioning applications to machine
learning systems, where the training set might be only a small sample
of the entire dataset, and the user wants scoring to be done on
samples of the data rather than on all of it.
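
To make the wrapper idea from the original message (quoted below) a bit
more concrete, here is a minimal sketch in plain Java. It is only an
illustration under my own assumptions, not a proposed implementation:
the class name SampledMatches, the method sample, and the offset,
stride, and limit parameters are all hypothetical, and nothing here
uses Lucene classes. The inner query is stood in for by an iterator
over matching doc IDs in order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a sampling wrapper; this is not a Lucene API. */
final class SampledMatches {
    private SampledMatches() {}

    /**
     * Walks the inner matches in order, skips the first {@code offset} of them,
     * then keeps every {@code stride}-th match until {@code limit} hits are
     * collected or the inner matches run out.
     */
    static List<Integer> sample(Iterator<Integer> innerMatches,
                                int offset, int stride, int limit) {
        if (offset < 0 || stride < 1 || limit < 0) {
            throw new IllegalArgumentException(
                "require offset >= 0, stride >= 1, limit >= 0");
        }
        List<Integer> hits = new ArrayList<>();
        long position = 0; // index of the current match in the inner stream
        while (hits.size() < limit && innerMatches.hasNext()) {
            int docId = innerMatches.next();
            // Keep the match only if it falls on a sampled position.
            if (position >= offset && (position - offset) % stride == 0) {
                hits.add(docId);
            }
            position++;
        }
        return hits;
    }
}
```

For example, with 20 inner matches, offset 1, stride 5, and limit 3,
only the matches at positions 1, 6, and 11 would be collected, and the
loop stops as soon as the limit is reached instead of visiting the
remaining matches. A fixed stride is just the simplest choice for a
sketch; a randomized choice of positions would fit the same shape.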

On Fri, Jun 7, 2019 at 5:45 PM Atri Sharma <a...@apache.org> wrote:
>
> Hi All,
>
> While working on a new Query type, I came across a couple of use
> cases where the documents being scored need not be the whole
> dataset, only a sample of it. This can be useful for very large
> datasets, where a query only needs to get a "feel" for the data, and
> for queries over data that is aggregated over time, where a wide
> enough sample is good enough for the user in exchange for improved
> performance. Faceting already has sampling mechanisms, so there are
> ideas to be borrowed from that part.
>
> I have some ideas on introducing a new query type and associated
> semantics to support this functionality from the ground up.
> Specifically, a query type which wraps another query and "feeds"
> offsets to the inner query, along with a limit on the number of hits
> collected. I can go into more detail, but wanted to get some thoughts
> and feedback before delving deeper.
>
> Atri



-- 
Regards,

Atri
Apache Concerted
