I forgot to add that I also changed the deployment strategy from the default
"RollingUpdate" to "Recreate", so the Kubernetes cluster shuts down the old
pods before creating new ones. That avoids the lock file problem, but it might
not suit you if you need very high availability, since there is a short window
during each rollout when no Solr pod is running.
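A minimal sketch of the strategy change, written as a Python dict that mirrors a Kubernetes apps/v1 Deployment manifest and printed as JSON (kubectl accepts JSON manifests as well as YAML). The deployment name "solr", the image tag "solr:8.11" and the single replica are hypothetical placeholders, not details from the setup described above. The only part that matters here is spec.strategy.type.

```python
import json


def solr_deployment(name: str, image: str) -> dict:
    """Return a minimal apps/v1 Deployment manifest as a plain dict.

    `name` and `image` are hypothetical placeholders.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            # "Recreate" kills all existing pods before new ones are created,
            # so an old and a new Solr process never open the same index at once.
            # The default, "RollingUpdate", can start the new pod before the old
            # one has stopped. A "rollingUpdate" block must not be set with Recreate.
            "strategy": {"type": "Recreate"},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "solr",
                            "image": image,
                            "ports": [{"containerPort": 8983}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Save the output to a file and pass it to `kubectl apply -f <file>`.
    print(json.dumps(solr_deployment("solr", "solr:8.11"), indent=2))
```

With a single replica this means a brief outage on every rollout, which is the availability trade-off mentioned above.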

-ufuk yilmaz
________________________________
From: uyil...@vivaldi.net.INVALID <uyil...@vivaldi.net.INVALID>
Sent: Thursday, January 18, 2024 7:08 PM
To: users@solr.apache.org <users@solr.apache.org>
Subject: Re: SOLR data on ECS problem with write lock files

Hello Darren,

I had a very similar problem when running Solr on an EKS Kubernetes cluster.
The solution I found was to add a pre_stop shutdown hook to the Kubernetes
deployment, which runs the command "/opt/solr/bin/solr stop -k solrrocks -p 8983"
to gracefully stop Solr before the pod is killed. I also added a 180-second
grace period via "termination_grace_period_seconds". The downside is that it
now takes at least 3 minutes to shut down the pod.

That way the lock file gets cleared before Solr is restarted. I don't know
whether the same approach can be used in ECS, though.
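A sketch of the pod-template part of that setup, again as a Python dict mirroring the Kubernetes manifest. In the Kubernetes API itself the fields are spelled lifecycle.preStop and terminationGracePeriodSeconds; the snake_case names above come from whichever deployment tool was used, which the message doesn't say. The container name and image are hypothetical placeholders; the stop command and the 180-second value are the ones quoted above. In a full Deployment this dict would sit under spec.template.spec.

```python
import json

# Command quoted in the message: ask Solr to shut down cleanly (port 8983,
# stop key "solrrocks") so it releases write.lock before the pod is killed.
STOP_CMD = ["/opt/solr/bin/solr", "stop", "-k", "solrrocks", "-p", "8983"]


def solr_pod_spec(image: str, grace_seconds: int = 180) -> dict:
    """Return a pod spec (a Deployment's spec.template.spec) as a dict.

    `image` is a hypothetical placeholder.
    """
    return {
        # Upper bound on how long Kubernetes waits before sending SIGKILL.
        # The countdown starts before the preStop hook runs, so the hook's
        # run time is included in it.
        "terminationGracePeriodSeconds": grace_seconds,
        "containers": [
            {
                "name": "solr",  # hypothetical container name
                "image": image,
                "lifecycle": {
                    # Runs inside the container; SIGTERM is sent only after
                    # the hook finishes (or the grace period runs out).
                    "preStop": {"exec": {"command": STOP_CMD}},
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(solr_pod_spec("solr:8.11"), indent=2))
```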

-ufuk yilmaz
________________________________
From: Darren Kukulka <darren.kuku...@oneserve.co.uk.INVALID>
Sent: Thursday, January 18, 2024 6:59 PM
To: users@solr.apache.org <users@solr.apache.org>
Subject: SOLR data on ECS problem with write lock files

Hi Everybody!

Has anybody had issues with write.lock files on AWS ECS SOLR instances
where the data is stored on EFS? When the SOLR ECS task restarts, SOLR
thinks another process is using the write.lock file to make updates.

What actually happens is that the stopped ECS task has not been terminated
before the new one starts up automatically.

Our SOLR ECS tasks use individual solr-data locations on EFS, so they are
not sharing data, which makes this problem even more frustrating!

So far I have looked at solrconfig.xml changes such as the lockType,
unlockOnStartup and writeLockTimeout settings, but I'm not sure any of
these would help in our scenario.

Unfortunately, we are stuck on SOLR 4.10.2, because going to 5 or above
would mean making code changes to our product that uses SOLR, as the
dataimport handlers change in version 5.

I'm also wondering if this could be an ECS issue rather than a SOLR one.
Perhaps the version of Docker Engine we use to build the SOLR images that run
in ECS (20.10.13) does not play well with ECS when a task restarts? I have
not yet found out whether it is possible to modify the ECS lifecycle
behaviour for a specific ECS cluster.

Any suggestions or pointers would be greatly appreciated!

Cheers,
Daz
