I forgot to mention that I also changed the deployment strategy from the default, "RollingUpdate", to "Recreate". With Recreate, the Kubernetes cluster shuts down the old pods before creating the new ones, which avoids the lock file problem. It might not suit you, though, if you need very high availability.
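The sketch below shows how the pieces of this approach might fit together. It is not ufuk's actual configuration. It combines three things: the Recreate strategy, the pre_stop hook running "/opt/solr/bin/solr stop -k solrrocks -p 8983", and the 180-second grace period described in the earlier message quoted below.

The sketch builds a Deployment manifest in Python and prints it as JSON, which `kubectl apply -f` accepts. Several details are hypothetical placeholders:
- the function name `solr_deployment`
- the `app: solr` label
- the single replica
- the `solr` image name

The raw Kubernetes API spells the fields `preStop` and `terminationGracePeriodSeconds`. The snake_case names in the quoted message appear to be the Terraform Kubernetes provider's spelling.

```python
import json


def solr_deployment(name: str = "solr", image: str = "solr", grace_seconds: int = 180) -> dict:
    """Build a hypothetical single-replica Solr Deployment manifest as a dict."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,  # assumption: one standalone Solr pod owning its own index
            "selector": {"matchLabels": {"app": name}},
            # Recreate: kill all old pods before creating new ones during a rollout,
            # so two Solr processes never open the same index directory at once.
            "strategy": {"type": "Recreate"},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Time Kubernetes waits (preStop hook included) before force-killing.
                    "terminationGracePeriodSeconds": grace_seconds,
                    "containers": [
                        {
                            "name": name,
                            "image": image,  # placeholder image name
                            "ports": [{"containerPort": 8983}],
                            "lifecycle": {
                                # Stop Solr cleanly so it releases write.lock before exit.
                                "preStop": {
                                    "exec": {
                                        "command": [
                                            "/opt/solr/bin/solr", "stop",
                                            "-k", "solrrocks", "-p", "8983",
                                        ]
                                    }
                                }
                            },
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be piped to a file for kubectl.
    print(json.dumps(solr_deployment(), indent=2))
```

To use it, save the output with `python solr_deployment.py > solr.json` and apply it with `kubectl apply -f solr.json`. Two caveats apply:
- The grace-period countdown starts before the preStop hook runs, so the stop command has to finish within those 180 seconds.
- The `strategy` field only governs how a Deployment rolls out a new pod template. The preStop hook runs whenever Kubernetes terminates the container, for example during a rollout or a pod deletion.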
-ufuk yilmaz

________________________________
From: uyil...@vivaldi.net.INVALID <uyil...@vivaldi.net.INVALID>
Sent: Thursday, January 18, 2024 7:08 PM
To: users@solr.apache.org <users@solr.apache.org>
Subject: Re: SOLR data on ECS problem with write lock files

Hello Darren,

I had a very similar problem when running Solr on an EKS Kubernetes cluster. The solution I found was to add a pre_stop shutdown hook to the Kubernetes deployment. The hook runs the command "/opt/solr/bin/solr stop -k solrrocks -p 8983" to gracefully stop Solr before the pod is killed. I also added a 180-second grace period via "termination_grace_period_seconds". That way the lock file gets cleared before Solr is restarted. The downside is that it now takes at least 3 minutes to shut down the pod.

I don't know if the same approach can be used in ECS, though.

-ufuk yilmaz

________________________________
From: Darren Kukulka <darren.kuku...@oneserve.co.uk.INVALID>
Sent: Thursday, January 18, 2024 6:59 PM
To: users@solr.apache.org <users@solr.apache.org>
Subject: SOLR data on ECS problem with write lock files

Hi Everybody!

Has anybody had issues with write.lock files on AWS ECS SOLR instances where data is stored on EFS? The problem is that when the SOLR ECS task restarts, SOLR thinks another process is using the write.lock file to make updates. In fact, the stopped ECS task simply has not been terminated before the new one starts up automatically.

Our SOLR ECS tasks use individual solr-data locations on EFS, so they are not sharing data, which makes this problem even more frustrating!

So far I have looked at solrconfig.xml changes such as the lockType, unlockOnStartup and writeLockTimeout directives, but I'm not sure any of these would help in our scenario.

Unfortunately, we are stuck on SOLR 4.10.2. The dataimport handlers changed in version 5, so moving to 5 or above would mean code changes to our product that uses SOLR.

I'm also wondering if this could be an ECS issue rather than a SOLR one.
Perhaps the Docker Engine version we use to build the SOLR images that run in ECS (20.10.13) does not play well in ECS land when a task restarts? I have not yet found out whether it is possible to modify the ECS lifecycle behaviour for a specific ECS cluster.

Any suggestions or pointers would be greatly appreciated!

Cheers,
Daz