Here is an example calculation of bytes -> number of entries held by the
bitset.

(2864256-12-12)/8 = 358029 longs x 64 bits per long = 22913856 entries

The above is from a cluster where each query generates a bitset of
2864256 bytes - ~2.8 MB on heap. This is for 22 million results in the
resultset. Solr decides by result size whether this is a sparse bitset or
a fixed bitset - over a certain result size it is always a fixed
bitset [1]. It grows based on the number of documents in the resultset
for the shard.
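
A minimal sketch of that arithmetic in Java, assuming the bitset is backed
by a long[] (8 bytes and 64 bits per long) and reusing the 12 + 12 bytes of
header overhead from the calculation above. The class and method names are
made up for illustration and are not Solr or Lucene APIs.

```java
// Sketch only: reproduces the arithmetic from the message, not Solr's code.
final class BitsetEntries {
    // Assumption taken from the calculation above: 12 + 12 bytes of headers.
    static final long OVERHEAD_BYTES = 12 + 12;

    /** Number of doc-id bits held by a bitset that occupies heapBytes on heap. */
    static long entriesFor(long heapBytes) {
        long payloadBytes = heapBytes - OVERHEAD_BYTES;
        long words = payloadBytes / Long.BYTES; // 8 bytes per long
        return words * Long.SIZE;               // 64 bits per long
    }

    public static void main(String[] args) {
        // Bitset size observed on the cluster described above.
        System.out.println(entriesFor(2_864_256L)); // prints 22913856
    }
}
```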

A profiler like async-profiler makes it easy to see where bitsets are
created for each query. I recently looked at this in
https://issues.apache.org/jira/browse/SOLR-16555 where filterCache bitsets
were being recreated over and over if there were multiple fq clauses.
SOLR-16555 drastically reduced heap usage on the cluster I was working on
(you can see some of the before/after metrics on the PR).

If you have a shard with 200M documents - I think that bitset could be
~25MB per bitset per query (200M bits at 8 bits per byte).
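
As a rough check on that estimate, here is a small Java sketch going the
other way, from document count to bytes. It assumes one bit per document
stored in a long[] plus the same 12 + 12 bytes of header overhead used
earlier. The names are hypothetical and this is not how Solr itself
reports memory.

```java
// Sketch only: estimates heap bytes for a fixed bitset, one bit per document.
final class BitsetHeapEstimate {
    // Assumption: same 12 + 12 bytes of header overhead as the earlier calculation.
    static final long OVERHEAD_BYTES = 12 + 12;

    /** Approximate heap bytes for a fixed bitset covering numDocs documents. */
    static long heapBytesFor(long numDocs) {
        long words = (numDocs + Long.SIZE - 1) / Long.SIZE; // round up to whole longs
        return words * Long.BYTES + OVERHEAD_BYTES;
    }

    public static void main(String[] args) {
        // 200M documents in the shard.
        System.out.println(heapBytesFor(200_000_000L)); // prints 25000024
    }
}
```

That comes out to about 25 MB for each such bitset.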

[1]
https://github.com/apache/solr/blame/main/solr/core/src/java/org/apache/solr/search/DocSetUtil.java#L46

PS - for G1 GC almost all of these big bitsets are humongous allocations
(due to G1 region size), and I don't know whether that is a problem or
not. It's something I'd like to look at further, but I haven't had time to
benchmark or look at other approaches.
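
To make the humongous point concrete, here is a hedged Java sketch. It
assumes the usual G1 rule that an allocation larger than about half a
region is treated as humongous, and that region sizes are powers of two
from 1 MB to 32 MB (settable with -XX:G1HeapRegionSize). Exact boundary
handling may differ between JVM versions. The helper name is made up for
illustration.

```java
// Sketch only: approximates G1's humongous threshold, not JVM source code.
final class G1HumongousCheck {
    /** Assumption: an allocation is humongous when larger than half a region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long bitsetBytes = 2_864_256L; // bitset size quoted in the message
        for (long regionMb = 1; regionMb <= 32; regionMb *= 2) {
            long regionBytes = regionMb * 1024 * 1024;
            System.out.printf("region=%dMB humongous=%b%n",
                    regionMb, isHumongous(bitsetBytes, regionBytes));
        }
    }
}
```

Under these assumptions the ~2.8 MB bitset is humongous for regions of 4 MB
or smaller, and a ~25 MB bitset would be humongous even with 32 MB regions.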

Kevin Risden


On Wed, May 3, 2023 at 1:14 PM Vincenzo D'Amore <v.dam...@gmail.com> wrote:

> Hi Markus,
>
> thanks for your explanation.
> What if I submit a query q=*:*&rows=0 and there are 200M documents in
> the solr core? Will I allocate an array of ScoreDoc objects that big?
>
>
>
> On Wed, May 3, 2023 at 5:32 PM Markus Jelsma <markus.jel...@openindex.io>
> wrote:
>
> > Hello Vincenzo,
> >
> > Yes. Last time I checked, an array of ScoreDoc objects is created for
> > each query with the size of the numFound for the local core/replica.
> > This should be clearly visible in VisualVM. This happens in
> > SolrIndexSearcher.
> >
> > Regards,
> > Markus
> >
> > Op wo 3 mei 2023 om 17:20 schreef Vincenzo D'Amore <v.dam...@gmail.com>:
> >
> > > Hi all,
> > >
> > > Just asking if there could be some correlation between the amount of
> > > memory allocated by a Solr query and the number of *hits* reported in
> > > the solr logs.
> > > I haven't found anything in the Solr documentation.
> > >
> > > Do you know if there is some advice for the hits value?
> > >
> > > Thanks,
> > > Vincenzo
> > >
> > > --
> > > Vincenzo D'Amore
> > >
> >
>
>
> --
> Vincenzo D'Amore
>
