Thanks for your replies. Yes, adding more physical memory will help, but in the current situation even the GC settings we are using may not be optimal. Can you please suggest some GC settings? We are also planning to add more shards, creating smaller per-shard indexes on separate, smaller EC2 instances, and we will then evaluate the memory requirements and the search performance.
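For context, this is roughly what I was considering trying (a sketch only; the pause target and region size are guesses for our 30g heaps, loosely based on the G1 settings Solr ships commented out in solr.in.sh):

    # in solr.in.sh, per shard JVM
    GC_TUNE=" \
      -XX:+UseG1GC \
      -XX:+ParallelRefProcEnabled \
      -XX:MaxGCPauseMillis=250 \
      -XX:G1HeapRegionSize=16m \
      -XX:+PerfDisableSharedMem \
      -XX:+AlwaysPreTouch \
      -XX:+ExplicitGCInvokesConcurrent"

If anyone sees a problem with settings like these for 12 JVMs on one box, please point it out.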
One point to add: even queries without wildcards, e.g. a boolean query or a query with 10,000 ids ORed together, have also become slow; they take more CPU and ultimately more time. I understand this is due to frequent GC pauses, so if we fine-tune the GC settings the CPU utilisation should go down. I will try setting pf to an empty string and check the performance, and will also look at the "enableGraphQueries" setting (see the sketch after the quoted thread below). Would a custom word delimiter filter that creates fewer tokens, tailored to our needs, help search performance?

Thanks,
Modassar

On Mon, Mar 28, 2022 at 12:10 AM Michael Gibney <mich...@michaelgibney.net> wrote:
> I agree with Shawn about ideally wanting more memory for the OS.
>
> That said, the WordDelimiterFilter config you sent aligns with my suspicion
> that "graph phrase" issues likely explain the difference between 6.5 and
> 8.11. At query time, WordDelimiterFilter (and equally
> WordDelimiterGraphFilter) triggers "graph phrase" behavior on `pf`
> (phrase fields), and in 6.5 these were, I'm fairly certain, completely
> ignored.
>
> So 6.5 as a point of comparison is unlikely to be helpful going forward,
> since the "better performance" of 6.5 was a consequence of a bug that
> caused `pf` "graph phrase" queries not to be executed at all.
>
> This mailing list exchange from June 2021 [1] should be helpful/relevant.
> (Also note that with respect to the issue you're encountering, there's no
> real difference between WordDelimiterFilter and WordDelimiterGraphFilter.)
>
> [1] https://lists.apache.org/thread/kbjgztckqdody9859knq05swvx5xj20f
>
> On Sun, Mar 27, 2022 at 11:51 AM Shawn Heisey <apa...@elyograg.org> wrote:
> > On 3/27/2022 5:30 AM, Modassar Ather wrote:
> > > The wildcard queries are executed against the text data and yes there
> > > are a huge number of possible expansions of the wildcard query.
> > > All the 12 shards are on a single machine with 521 GB memory and each
> > > shard is started with SOLR_JAVA_MEM="-Xmx30g". So the 521 GB memory is
> > > shared by all the 12 shards.
> >
> > I believe that my initial thought is correct -- you need more memory to
> > handle 4TB of index data. I'm talking about more memory available to
> > the OS, not Solr. This would most likely have been a problem in 6.x
> > too, but I've seen situations where upgrading Solr makes insufficient
> > memory an even more noticeable problem than it was in an older version.
> >
> > Something you could try is increasing the heap size to 31g. I wouldn't
> > suggest going any higher unless you see evidence that you actually need
> > more: Java switches to 64-bit pointers at a heap size of 32GB, and you
> > probably need to go to something like 48GB before things break even. I
> > don't actually expect going to a 31GB heap to make things better, but
> > if it does, then you might also be running into the other main problem
> > mentioned on the wiki page -- a heap size that's too small. That makes
> > Java spend more time collecting garbage than running the application.
> >
> > I didn't know about the things Michael mentioned regarding Solr not
> > utilizing the full capability of WordDelimiterFilter and
> > WordDelimiterGraphFilter in older versions. Those filters tend to
> > greatly increase cardinality, and apparently also increase heap memory
> > utilization in recent Solr versions.
> >
> > Thanks,
> > Shawn
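P.S. Regarding the pf and enableGraphQueries changes mentioned above, this is what I plan to try (a sketch only; the handler and field type names and the analyzer chain are placeholders, not our actual config). First, emptying pf in the handler defaults so no phrase-field queries are built:

    <!-- solrconfig.xml -->
    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="pf"></str>  <!-- no phrase fields -->
      </lst>
    </requestHandler>

and second, turning off graph queries on the field type that uses the word delimiter filter:

    <!-- schema -->
    <fieldType name="text_wdgf" class="solr.TextField"
               positionIncrementGap="100" enableGraphQueries="false">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterGraphFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

If that is the wrong place to disable graph queries for this case, corrections welcome.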