Joel Bernstein created SOLR-12178:
-------------------------------------
Summary: Improve efficiency of random sampling
Key: SOLR-12178
URL: https://issues.apache.org/jira/browse/SOLR-12178
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Joel Bernstein
Currently the *random* Streaming Expression performs distributed random
sampling using *CloudSolrClient*. This means that a random sample of *N* docs
from each shard is read into memory on the aggregator node, and then a page of
*N* docs is created from the per-shard samples. Because all the samples from
the shards are held in memory on the aggregator node, the memory
consumption for random sampling grows as a function of N*numshards. In
practice this limits both N and numshards.
This ticket will change random sampling to use an approach similar to
the one used in CloudSolrStream, where a stream is generated from the shards
without reading all the documents into memory.
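
A minimal sketch of the streaming idea, not the implementation this ticket will ship. It assumes each shard can return its documents ordered by a random key, so the aggregator only needs to hold one pending document per shard while it merges the streams. The names StreamingSampleSketch, Doc, Head, and sample are hypothetical and are not part of the Solr API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class StreamingSampleSketch {

    // Hypothetical stand-in for a document: an id plus the random
    // sort key assigned to it on its shard.
    static final class Doc {
        final String id;
        final long randomKey;
        Doc(String id, long randomKey) { this.id = id; this.randomKey = randomKey; }
    }

    // One entry per shard: the shard's current head doc and the rest of its stream.
    static final class Head {
        final Doc doc;
        final Iterator<Doc> rest;
        Head(Doc doc, Iterator<Doc> rest) { this.doc = doc; this.rest = rest; }
    }

    // Merges per-shard streams (each assumed sorted by randomKey) and returns
    // the first n docs. The merge itself keeps only one doc per shard in the
    // queue, rather than a full sample of n docs per shard.
    static List<Doc> sample(List<Iterator<Doc>> shardStreams, int n) {
        PriorityQueue<Head> heads =
            new PriorityQueue<>((a, b) -> Long.compare(a.doc.randomKey, b.doc.randomKey));
        for (Iterator<Doc> stream : shardStreams) {
            if (stream.hasNext()) {
                heads.add(new Head(stream.next(), stream));
            }
        }
        List<Doc> page = new ArrayList<>();
        while (page.size() < n && !heads.isEmpty()) {
            Head smallest = heads.poll();
            page.add(smallest.doc);
            // Pull the next doc from the same shard only when it is needed.
            if (smallest.rest.hasNext()) {
                heads.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return page;
    }

    public static void main(String[] args) {
        // Two illustrative shards with made-up random keys.
        List<Doc> shard1 = Arrays.asList(new Doc("a1", 1L), new Doc("a2", 5L));
        List<Doc> shard2 = Arrays.asList(new Doc("b1", 2L), new Doc("b2", 3L));
        for (Doc d : sample(Arrays.asList(shard1.iterator(), shard2.iterator()), 3)) {
            System.out.println(d.id);
        }
    }
}
```

In this sketch the aggregator's working set during the merge is numshards pending documents plus the page being built, instead of N*numshards documents.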
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)