On 3/27/2022 5:30 AM, Modassar Ather wrote:
> The wildcard queries are executed against the text data and yes there are a huge number of possible expansions of the wildcard query. All the 12 shards are on a single machine with 521 GB memory and each shard is started with SOLR_JAVA_MEM="-Xmx30g". So the 521 GB memory is shared by all the 12 shards.
I believe my initial thought is correct -- you need more memory to handle 4 TB of index data, and I'm talking about memory available to the OS for caching, not memory allocated to Solr. This would most likely have been a problem on 6.x as well, but I've seen cases where upgrading Solr makes an existing memory shortage more noticeable than it was in the older version.
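To put rough numbers on it, using the figures you gave (12 shards, 30 GB heap each, 521 GB of RAM, about 4 TB of index):

    12 shards x 30 GB heap       = 360 GB taken by Java
    521 GB RAM - 360 GB heap     = ~161 GB left for the OS disk cache
    161 GB cache / ~4 TB index   = roughly 4 percent of the index cacheable

With only a few percent of the index able to stay in the OS disk cache, most of those wildcard expansions end up going to disk, and that is usually where the slowdown comes from.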
Something you could try is increasing the heap size to 31GB. I wouldn't suggest going any higher unless you see evidence that you actually need more ... Java switches from compressed 32-bit object pointers to full 64-bit pointers once the heap reaches 32GB, and you probably need to go to something like 48GB before that trade-off breaks even. I don't actually expect a 31GB heap to make things better ... but if it does, then you might also be running into the other main problem mentioned on the wiki page -- a heap that is too small, which makes Java spend more time collecting garbage than running the application.
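If you decide to try the 31GB heap, it would look something like this in solr.in.sh (the exact location of that file depends on how Solr was installed; it's the same variable you're already setting):

    # solr.in.sh -- keep the heap just under the 32GB compressed-oops cutoff
    SOLR_JAVA_MEM="-Xms31g -Xmx31g"

You can confirm that the JVM still uses compressed object pointers at that size with:

    java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops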
I didn't know about what Michael mentioned -- that older Solr versions weren't using the full capability of WordDelimiterFilter and WordDelimiterGraphFilter. Those filters tend to greatly increase cardinality, and apparently they also increase heap memory use in recent Solr versions.
Thanks,
Shawn