On 8/11/2021 6:04 AM, Satya Nand wrote:
> *Filter cache stats:*
> https://drive.google.com/file/d/19MHEzi9m3KS4s-M86BKFiwmnGkMh3DGx/view?usp=sharing
That screenshot shows a current size of 3912, so the cache is almost full.
There is an alternate format for filterCache entries that just lists
the IDs of the matching documents. It only gets used when the hit
count for the filter is low. I do not know what threshold Solr uses to
decide that the hit count is low enough for the alternate format, and
I do not know where in the code to look for the answer. This is
probably why you can have 3912 entries in the cache without blowing the
heap.
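To make the size difference concrete, here is a rough back-of-the-envelope
sketch in plain Java. These are not Solr's actual classes, and the numbers
(index size, hit count) are assumptions I picked so the bitmap comes out at
the 12.7 million bytes mentioned below:

    // Rough comparison of the two filterCache entry representations.
    // Assumed numbers: ~101.6 million docs gives a 12.7 million byte bitmap
    // (one bit per document); the id-list form costs 4 bytes per hit.
    public class FilterEntrySize {
        public static void main(String[] args) {
            long maxDoc = 101_600_000L;      // assumed number of docs in the index
            long hitCount = 50_000L;         // assumed "low" hit count for a filter

            long bitmapBytes = maxDoc / 8;   // full bitmap: one bit per doc
            long idListBytes = hitCount * 4; // alternate format: one int per hit

            System.out.println("bitmap entry:  " + bitmapBytes + " bytes");
            System.out.println("id-list entry: " + idListBytes + " bytes");
            // The id list is smaller whenever hitCount < maxDoc / 32.
            System.out.println("break-even at ~" + (maxDoc / 32) + " hits");
        }
    }

With those assumed numbers, a filter matching 50 thousand docs caches in
about 200KB, while any filter that has to use the bitmap costs the full
12.7MB no matter how many docs it matches.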
I bet that when the heap gets blown, the filter queries Solr receives
are ones that cannot use the alternate format, and thus each require the
full 12.7 million bytes. Get enough of those and you're going to need
more heap than 30GB. I bet that if you set the heap to 31GB, the OOMEs
would occur a little less frequently. Note that if you set the heap to
32GB, you actually have less usable memory than with 31GB: at 32GB, Java
must switch from compressed 32-bit object pointers to full 64-bit pointers.
Solr creates a LOT of objects on the heap, so that difference adds up.
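If you want to confirm the pointer-size behavior on your own JVM, a quick
check like this should work (it uses the HotSpot-specific
HotSpotDiagnosticMXBean, so it assumes an Oracle/OpenJDK HotSpot JVM):

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    // Prints whether the running JVM is using compressed 32-bit object
    // pointers.  Run it once with -Xmx31g and once with -Xmx32g to see
    // the switch to 64-bit pointers.
    public class CheckCompressedOops {
        public static void main(String[] args) {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            System.out.println("UseCompressedOops = "
                + bean.getVMOption("UseCompressedOops").getValue());
        }
    }

You can get the same answer without writing code by running
"java -Xmx32g -XX:+PrintFlagsFinal -version" and looking for
UseCompressedOops in the output.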
Discussion item for those with an interest in the low-level code: what
kind of performance impact would it have to compress the filter bitmap
with run-length encoding, and would that change belong at the Lucene
level rather than the Solr level?
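To sketch what I have in mind, here is a toy run-length encoder for a
filter bitmap. This is purely illustrative and nothing like what Lucene or
Solr actually ship (Lucene does have a roaring-style DocIdSet, but I have
not checked whether it applies to cached filters):

    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.List;

    // Toy RLE of a filter bitmap: store alternating run lengths of
    // clear/set bits instead of one bit per document.  Blocky filters
    // compress very well, while a filter that flips every few docs can
    // end up larger, and decoding adds CPU cost; that is the trade-off.
    public class RleFilterSketch {
        static List<Integer> encode(BitSet bits, int maxDoc) {
            List<Integer> runs = new ArrayList<>();
            int pos = 0;
            boolean set = false;              // first run counts clear bits
            while (pos < maxDoc) {
                int next = set ? bits.nextClearBit(pos) : bits.nextSetBit(pos);
                if (next < 0 || next > maxDoc) {
                    next = maxDoc;            // run extends to end of index
                }
                runs.add(next - pos);
                pos = next;
                set = !set;
            }
            return runs;
        }

        public static void main(String[] args) {
            BitSet filter = new BitSet();
            filter.set(1_000, 50_000);        // one dense block of matching docs
            // Prints [1000, 49000, 50000]: three runs instead of 100000 bits.
            System.out.println(encode(filter, 100_000));
        }
    }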
To fully solve this issue, you may need to re-engineer your queries so
that fq values are highly reusable, with filters that cannot be reused
folded into the main query. Then you would not need a very large cache
to obtain a good hit ratio.
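As a sketch of what I mean, using SolrJ (the field names and values here
are invented for illustration):

    import org.apache.solr.client.solrj.SolrQuery;

    // Keep filters that many requests share in fq, where a small filterCache
    // can serve them all; fold per-request restrictions into the main query
    // so each one does not burn a cache entry.
    public class ReusableFilters {
        public static void main(String[] args) {
            // "user_id" is unique per request, so it goes in the main query.
            SolrQuery q = new SolrQuery("shoes AND user_id:12345");
            q.addFilterQuery("category:footwear");  // reused by many requests
            q.addFilterQuery("in_stock:true");      // reused by many requests
            System.out.println(q);
        }
    }

Another option for filters you know will never be reused is the
{!cache=false} local parameter on the fq, which keeps them out of the
filterCache entirely.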
Thanks,
Shawn