On 11/29/22 13:58, Matias Laino wrote:
> Thank you Shawn, I'm definitely checking out those recommendations, but what I cannot explain is how this worked fine for the last 3 months and then suddenly this issue started happening.
I'd say you got REALLY lucky that there weren't problems sooner. How did the index size compare 3 months ago to today? How much total index data is there on each Solr node? What is the total document count? From the original message, I can conclude it's probably in the ballpark of 60 million, but it would be great to have an on-disk size and document count (max docs, not num docs) for each collection.
> On our application, customers expect that when a record is created, that record should be available on search immediately (that's why the auto Soft commit of 1 second), what can you recommend for a situation like this?
I think what I would start with is lowering autoCommit to 15000 and raising autoSoftCommit to 60000. As I said, it is completely unrealistic to expect 1 second latency unless the index is VERY small. With a total document count north of 60 million, I would not call it small, even though there are users with much bigger indexes.
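For reference, here is a rough sketch of how those two values might look in the updateHandler section of solrconfig.xml. The maxTime values are in milliseconds. The openSearcher=false setting is my assumption about your config, not something taken from your message; it is the usual choice when soft commits are handling visibility. Your existing section may have other settings that should stay as they are.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes to disk and rolls over the transaction log every 15 seconds -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <!-- Assumption: do not open a new searcher on hard commit -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: makes new documents visible to searches every 60 seconds -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```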
By chance can you gather the GC logs from your install and make them available? That can answer a LOT of questions.
The wiki article I sent last time has a section about getting a screenshot of a process list. Can you get that and make it available?
Depending on what I learn from that info, I may have more questions.

Thanks,
Shawn