It would also depend on the query. For example, collapse keeps a Map of group heads gathered during the query, so a large result set combined with a high-cardinality group field would result in more memory usage.
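
To illustrate the point about collapse, here is a minimal Python sketch. It is not Solr's actual code: the function name and the (doc_id, group_value, score) tuple layout are made up for illustration. It keeps one head per group, so the map grows with the number of distinct group values among the matching documents, not just with the number of hits.

```python
def collect_group_heads(hits):
    """Keep the best-scoring doc per group (hypothetical model of collapse).

    hits: iterable of (doc_id, group_value, score) tuples for matching docs.
    Returns a dict mapping group_value -> (doc_id, score).
    """
    heads = {}
    for doc_id, group, score in hits:
        current = heads.get(group)
        # Replace the head only when the new doc scores strictly higher.
        if current is None or score > current[1]:
            heads[group] = (doc_id, score)
    return heads


# Same number of hits, different group cardinality.
low_cardinality = [(i, i % 10, 1.0) for i in range(100_000)]
high_cardinality = [(i, i, 1.0) for i in range(100_000)]

print(len(collect_group_heads(low_cardinality)))   # 10 map entries
print(len(collect_group_heads(high_cardinality)))  # 100000 map entries
```

Both inputs have 100,000 hits, but the second one holds 10,000 times as many map entries because every hit is its own group.
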

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, May 3, 2023 at 3:11 PM Kevin Risden <kris...@apache.org> wrote:

> Here is an example calculation of bytes -> number of entries held from the
> bitset.
>
> (2864256-12-12)/24 = 119343 long objects = 22913856 entries
>
> The above is from a cluster where each query is generating a bitset of size
> 2864256 bytes - ~2.8 MB on heap. This is for 22 million results in the
> resultset. There is some algorithmic stuff to say whether this is a sparse
> bitset or a fixed bitset - over a certain size result this is always a
> fixed bitset [1]. It grows based on number of documents in the resultset
> for the shard.
>
> This is easily viewable with a profiler like async-profiler where bitsets
> are created for each query. I recently looked at this in
> https://issues.apache.org/jira/browse/SOLR-16555 where filtercache bitsets
> were being recreated over and over if there were multiple fq clauses.
> SOLR-16555 drastically reduced heap usage on the cluster I was working on
> (you can see some of the metrics on the PR from before/after)
>
> If you have a shard with 200M documents - I think that bitset could be
> ~20MB per bitset per query.
>
> [1]
> https://github.com/apache/solr/blame/main/solr/core/src/java/org/apache/solr/search/DocSetUtil.java#L46
>
> PS - for G1 GC almost all of these big bitsets are humongous allocations
> (due to G1 region size) which idk is a problem or not. It's something I'd
> like to look at further, but haven't had time to benchmark or look at other
> approaches.
>
> Kevin Risden
>
>
> On Wed, May 3, 2023 at 1:14 PM Vincenzo D'Amore <v.dam...@gmail.com>
> wrote:
>
> > Hi Markus,
> >
> > thanks for your explanation.
> > What if I submit a query q=*:*&rows=0 and there are 200M documents in
> > the solr core? Will I allocate an array of ScoreDoc objects so big?
> >
> > On Wed, May 3, 2023 at 5:32 PM Markus Jelsma <markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello Vincenzo,
> > >
> > > Yes. Last time I checked, an array of ScoreDoc objects is created for
> > > each query with the size of the numFound for the local core/replica.
> > > This should be clearly visible in VisualVM. This happens in
> > > SolrIndexSearcher.
> > >
> > > Regards,
> > > Markus
> > >
> > > On Wed, May 3, 2023 at 5:20 PM Vincenzo D'Amore <v.dam...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Just asking if there could be some correlation between the amount of
> > > > memory allocated by a Solr query and the number of *hits* selected in
> > > > solr logs.
> > > > I haven't found anything in the Solr documentation.
> > > >
> > > > Do you know if there is some advice for the hits value?
> > > >
> > > > Thanks,
> > > > Vincenzo
> > > >
> > > > --
> > > > Vincenzo D'Amore
> > >
> >
> > --
> > Vincenzo D'Amore
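
The bitset arithmetic quoted above can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions: a fixed bitset holds one bit per document, packed into 64-bit words, and JVM object and array headers are ignored. The function name is made up for illustration.

```python
def fixed_bitset_bytes(num_docs):
    """Approximate payload of a fixed bitset in bytes.

    Assumes one bit per document stored in 64-bit words; object and
    array headers are not counted.
    """
    words = (num_docs + 63) // 64  # round up to whole 64-bit words
    return words * 8


print(fixed_bitset_bytes(22_913_856))   # 2864232 bytes, about 2.8 MB
print(fixed_bitset_bytes(200_000_000))  # 25000000 bytes, about 25 MB
```

The first result equals the 2864256 bytes from the quoted message minus the 24 bytes subtracted there. Under the same assumptions a 200M-document shard comes out at roughly 25 MB per bitset, the same order of magnitude as the ~20MB estimate in the quoted message.
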