Here is an example calculation of bytes -> number of entries held by the bitset.
(2864256-12-12)/24 = 119343 long objects = 22913856 entries

The above is from a cluster where each query generates a bitset of 2864256 bytes, ~2.8 MB on heap. This is for 22 million results in the resultset. Some logic decides whether this is a sparse bitset or a fixed bitset; above a certain result size it is always a fixed bitset [1]. It grows with the number of documents in the resultset for the shard.

This is easily viewable with a profiler like async-profiler, where you can see bitsets being created for each query. I recently looked at this in https://issues.apache.org/jira/browse/SOLR-16555, where filtercache bitsets were being recreated over and over if there were multiple fq clauses. SOLR-16555 drastically reduced heap usage on the cluster I was working on (you can see some of the before/after metrics on the PR).

If you have a shard with 200M documents, I think that bitset could be ~25MB per bitset per query (200M bits / 8).

[1] https://github.com/apache/solr/blame/main/solr/core/src/java/org/apache/solr/search/DocSetUtil.java#L46

PS - for G1 GC almost all of these big bitsets are humongous allocations (due to G1 region size); I don't know yet whether that is a problem. It's something I'd like to look at further, but I haven't had time to benchmark or look at other approaches.

Kevin Risden

On Wed, May 3, 2023 at 1:14 PM Vincenzo D'Amore <v.dam...@gmail.com> wrote:

> Hi Markus,
>
> thanks for your explanation.
> What if I submit a query q=*:*&rows=0 and there are 200M documents in
> the solr core? Will I allocate an array of ScoreDoc objects so big?
>
> On Wed, May 3, 2023 at 5:32 PM Markus Jelsma <markus.jel...@openindex.io>
> wrote:
>
> > Hello Vincenzo,
> >
> > Yes. Last time I checked, an array of ScoreDoc objects is created for each
> > query with the size of the numFound for the local core/replica. This should
> > be clearly visible in VisualVM. This happens in SolrIndexSearcher.
> >
> > Regards,
> > Markus
> >
> > On Wed, May 3, 2023 at 17:20 Vincenzo D'Amore <v.dam...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Just asking if there could be some correlation between the amount of memory
> > > allocated by a Solr query and the number of *hits* selected in solr logs.
> > > I haven't found anything in the Solr documentation.
> > >
> > > Do you know if there is some advice for the hits value?
> > >
> > > Thanks,
> > > Vincenzo
> > >
> > > --
> > > Vincenzo D'Amore
> > >
> >
>
> --
> Vincenzo D'Amore
>
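
For reference, the bytes -> entries arithmetic at the top of this message can be sketched in a few lines of Python. This is only an illustration: it assumes the two 12-byte terms are fixed overhead and that each remaining bit marks one document, which reproduces the 22913856 figure. The function names are made up for this example and are not Solr or Lucene APIs.

```python
# Assumption: the two 12-byte terms subtracted in the calculation are fixed overhead.
OVERHEAD_BYTES = 12 + 12
BITS_PER_BYTE = 8
BITS_PER_WORD = 64  # a fixed bitset is backed by 64-bit words


def bitset_entries(heap_bytes: int, overhead_bytes: int = OVERHEAD_BYTES) -> int:
    """Number of document bits a bitset of the given heap size can hold."""
    return (heap_bytes - overhead_bytes) * BITS_PER_BYTE


def bitset_payload_bytes(num_docs: int) -> int:
    """Approximate payload of a one-bit-per-document bitset, rounded up to whole words."""
    words = (num_docs + BITS_PER_WORD - 1) // BITS_PER_WORD
    return words * (BITS_PER_WORD // BITS_PER_BYTE)


if __name__ == "__main__":
    print(bitset_entries(2864256))            # bitset size observed on the cluster
    print(bitset_payload_bytes(200_000_000))  # hypothetical 200M-document shard
```

For the 200M-document case this comes out at about 25 MB of payload, in the same ballpark as the per-shard estimate given in the thread.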