[jira] [Commented] (SOLR-15371) Backups randomly fail sometimes

Roy Perkins (Jira) Mon, 14 Jun 2021 12:26:13 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-15371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363167#comment-17363167
 ]


Roy Perkins commented on SOLR-15371:
------------------------------------

Checked back through our ELK logs for Solr and found a similar error from when 
the backup failed:
{code:java}
2021-06-14 05:04:38.950 ERROR 
(OverseerThreadFactory-55-thread-5-processing-n:solrmulti03.DOM.DOMAIN.com:8983_solr)
 [c:search   ] o.a.s.c.a.c.OverseerCollectionMessageHandler Error from shard 
solrmulti01.DOM.DOMAIN.com:8983_solr: 
{responseHeader={status=0,QTime=0},STATUS=failed,Response=Failed to backup 
core=search_shard1_replica_n25 because org.apache.solr.common.SolrException: 
Directory to contain snapshots doesn't exist: 
file:///mnt/solr_backups/search/search-06-14-2021. Note that Backup/Restore of 
a SolrCloud collection requires a shared file system mounted at the same path 
on all nodes!}
{code}
Again, this is an NFS mount that is mounted at the same location on all the 
servers.  Restarting Solr on the node hosting the leader of the shard named in 
the error fixes the issue and allows the backup to run again.  But if that 
node later becomes the leader for that shard again, the next backup cycle 
fails.
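
Since a restart clears the problem, it may help to record, on the affected node at the time of the failure, whether the directory from the error message is visible at the OS level. The following is only a minimal sketch of such a check: the function names `to_local_path` and `check_backup_dir` are hypothetical, and it checks the local filesystem only, not what the Solr JVM itself sees.

```shell
#!/usr/bin/env bash
# Sketch: report whether a backup location is visible to the OS on this node.
# Function names are hypothetical; this does not inspect Solr's own view.

# Strip a leading file:// scheme so the path can be tested directly.
to_local_path() {
  printf '%s\n' "${1#file://}"
}

# Print "visible: <path>" or "missing: <path>" for the given location.
# Returns non-zero when the directory is not there.
check_backup_dir() {
  local path
  path="$(to_local_path "$1")"
  if [ -d "$path" ]; then
    echo "visible: $path"
  else
    echo "missing: $path"
    return 1
  fi
}
```

For example, `check_backup_dir "file:///mnt/solr_backups/search/search-06-14-2021"` could be run on the node named in the error right after a failed backup. If the directory is reported as visible there while Solr still says it does not exist, that would point toward something inside the Solr process rather than the mount itself.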

> Backups randomly fail sometimes
> -------------------------------
>
>                 Key: SOLR-15371
>                 URL: https://issues.apache.org/jira/browse/SOLR-15371
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Backup/Restore
>    Affects Versions: 8.5.2, 8.8.2
>            Reporter: Roy Perkins
>            Priority: Major
>
> Hi, we have an issue where sometimes one shard fails to back up due to what 
> might be a race condition in creating the folder/starting the backup.  When 
> this happens, we have to restart the first server in a shard to get the 
> backup to succeed again.  The cluster backs up to a shared NFS mount.  4/5 
> times the backup goes fine without issues (there is even another collection 
> that the backup runs for later in the morning that succeeds even though 
> it's all the same servers).  Below is the error I get.
> {code:java}
> "Response":"Failed to backup core=slprod_shard4_replica_n6 because 
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
> exist: file:///mnt/solr_backups/slprod/slprod-04-25-2021. Note that 
> Backup/Restore of a SolrCloud collection requires a shared file system 
> mounted at the same path on all nodes!"},
> {code}
> And below is the line I use to run the backup (with bash variables set 
> earlier in the script):
> {code:java}
> curl -s "http://localhost:8983/solr/admin/collections?action=BACKUP&name=${COLLECTION}-${DATE}&collection=${COLLECTION}&location=${BACKUP_PATH}&async=1000"
> {code}
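
Because the curl call quoted above is submitted with `async`, the script only learns the outcome if it asks for it afterwards. Below is a rough sketch of submitting the same BACKUP request and then polling the Collections API `REQUESTSTATUS` action until it reports `completed` or `failed`. The per-run request id, the `SOLR_URL`, `MAX_POLLS` and `POLL_SECONDS` variables, and the helper names are all assumptions made for illustration, not part of the original script, and this does not address the failure itself.

```shell
#!/usr/bin/env bash
# Sketch: submit an async BACKUP, then poll REQUESTSTATUS for the result.
# SOLR_URL, MAX_POLLS, POLL_SECONDS and the function names are assumptions.

SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Read a REQUESTSTATUS JSON response on stdin and print its "state" value.
extract_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage: run_backup COLLECTION BACKUP_PATH DATE
run_backup() {
  local collection="$1" backup_path="$2" date_tag="$3"
  # Assumption: a per-run id keeps status lookups unambiguous between runs.
  local request_id="${collection}-${date_tag}-$$"
  local attempt state

  curl -s "${SOLR_URL}/admin/collections?action=BACKUP&name=${collection}-${date_tag}&collection=${collection}&location=${backup_path}&async=${request_id}" > /dev/null

  for attempt in $(seq 1 "${MAX_POLLS:-60}"); do
    state="$(curl -s "${SOLR_URL}/admin/collections?action=REQUESTSTATUS&requestid=${request_id}&wt=json" | extract_state)"
    case "$state" in
      completed) echo "backup ${request_id} completed"; return 0 ;;
      failed)    echo "backup ${request_id} failed" >&2; return 1 ;;
    esac
    sleep "${POLL_SECONDS:-10}"
  done

  echo "backup ${request_id} did not finish in time" >&2
  return 1
}
```

It would be called as `run_backup "${COLLECTION}" "${BACKUP_PATH}" "${DATE}"`, and a non-zero return could be used to alert on the failed shard instead of finding it in the logs later.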



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
