Hi Team,

We have built an index queue mechanism that stores the IDs of documents that need reindexing because the user has recently changed some of their data. A cron job runs in the background, checks the queue every 5 seconds, and picks up any new IDs that have been added. It then tries to reindex those documents in Solr. To reindex, it first deletes the existing documents from Solr, then fetches the latest details from the database, and finally indexes them back into Solr.

For deleting, we use the deleteByQuery method. We could not use deleteById because fetching the IDs of the documents is hard, as they are uniquely generated by Solr itself. We commit the changes manually by calling solrClient.commit(collectionName, true, true).

This was working fine until a few days ago. Recently it has started failing on the prod server.
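For reference, the reindex cycle described above can be sketched roughly as follows. This is a simplified illustration and not our actual code: the IndexBackend interface, the entityId field name, and the choice to commit once per cycle are all assumptions made for the example. In the real application the three backend calls would map to SolrJ's deleteByQuery, add, and commit(collectionName, true, true), and a scheduler would call processOnce every 5 seconds.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReindexQueueSketch {

    /** Hypothetical stand-in for the SolrJ client and the database lookup. */
    public interface IndexBackend {
        void deleteByQuery(String query);   // e.g. solrClient.deleteByQuery(collection, query)
        void indexLatest(String entityId);  // fetch latest data from the database, then add to Solr
        void commit();                      // e.g. solrClient.commit(collection, true, true)
    }

    // LinkedHashSet keeps insertion order and drops duplicate IDs.
    private final Set<String> pending = new LinkedHashSet<>();

    /** Called whenever a user change makes a document stale. */
    public synchronized void enqueue(String entityId) {
        pending.add(entityId);
    }

    /** Takes everything currently queued and empties the queue. */
    private synchronized List<String> drain() {
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }

    /** One polling cycle; returns the number of IDs processed. */
    public int processOnce(IndexBackend backend) {
        List<String> batch = drain();
        if (batch.isEmpty()) {
            return 0; // nothing queued, so no delete and no commit
        }
        for (String entityId : batch) {
            // "entityId" is an assumed field name holding the application-side ID.
            backend.deleteByQuery("entityId:\"" + entityId + "\"");
            backend.indexLatest(entityId);
        }
        backend.commit(); // assumption: one manual commit per cycle
        return batch.size();
    }

    public static void main(String[] args) {
        ReindexQueueSketch queue = new ReindexQueueSketch();
        queue.enqueue("42");
        queue.enqueue("42"); // duplicate, queued only once
        queue.enqueue("7");

        // Backend that only logs the calls it receives.
        IndexBackend logging = new IndexBackend() {
            public void deleteByQuery(String query) { System.out.println("delete " + query); }
            public void indexLatest(String entityId) { System.out.println("index " + entityId); }
            public void commit() { System.out.println("commit"); }
        };
        System.out.println("processed " + queue.processOnce(logging));
    }
}
```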
I have recently been facing an issue with one of my prod instances where I constantly get an error like "Task queue processing has stalled for 20121 ms with 0 remaining elements to process". After this, my application is not able to perform any indexing, and even the search results are now inconsistent: the same query returns different results every time we run it. I do not see this issue in my other test environment, which has a similar setup with the same amount of data.

Below are the configuration details of our Solr setup:

Solr Version : 8.11.2
Solrj Version : 8.11.2
Solr is running in cloud mode with 3 shards and 2 replicas.

Some things that we noticed through logs and other forums:

1. We are using the deleteByQuery method to delete the existing Solr documents.
2. We have not configured the autoCommit, autoSoftCommit, idleTimeout, socketTimeout, or stallTimeout settings.
3. We do everything using a manual hard commit through SolrJ from the application. This is done so that we can track how many documents have been indexed and how many remain.

I saw that a similar issue was reported around the Solr 8.4 versions and was said to be resolved in Solr 8.4, but I can still see it happening.

I understand that I should not do manual commits, but as this was our first release and we are only improving the setup from here, I wanted to know whether there is any configuration that can be added to fix this error. Making a code change is not possible right now, as it can take around a month for a code change to reach the customer in a release. For now, is there anything we can do to fix this issue so that indexing can start again in Solr?

Thanks in advance for the help.

Rishabh Yadav
Software Engineer 1
Esko Graphics Pvt Ind Ltd