[ 
https://issues.apache.org/jira/browse/SOLR-15371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363163#comment-17363163
 ] 

Roy Perkins edited comment on SOLR-15371 at 6/14/21, 7:20 PM:
--------------------------------------------------------------

As I said, this can be hard to reproduce.  It seems to happen when the host I 
run the backup from is the leader for the shard of the collection that fails.  
Below is some output from my backup script:
{code:java}
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "success":{
    "solrmulti03.DOM.DOMAIN.com:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":0}},
    "solrmulti08.DOM.DOMAIN.com:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":0}},
    "solrmulti01.DOM.DOMAIN.com:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":4}},
    "solrmulti04.DOM.DOMAIN.com:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":14}},
    "solrmulti04.DOM.DOMAIN.com:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":0},
      "STATUS":"completed",
      "Response":"TaskId: 100034112630053395656 webapp=null path=/admin/cores 
params={core=search_shard2_replica_n4&async=100034112630053395656&qt=/admin/cores&name=shard2&action=BACKUPCORE&location=file:///mnt/solr_backups/search/search-06-14-2021&wt=javabin&version=2}
 status=0 QTime=14"},
    "solrmulti03.DOM.DOMAIN.com:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":0},
      "STATUS":"completed",
      "Response":"TaskId: 100034112630053446666 webapp=null path=/admin/cores 
params={core=search_shard3_replica_n29&async=100034112630053446666&qt=/admin/cores&name=shard3&action=BACKUPCORE&location=file:///mnt/solr_backups/search/search-06-14-2021&wt=javabin&version=2}
 status=0 QTime=0"},
    "solrmulti08.DOM.DOMAIN.com:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":0},
      "STATUS":"completed",
      "Response":"TaskId: 100034112630053465731 webapp=null path=/admin/cores 
params={core=search_shard4_replica_n23&async=100034112630053465731&qt=/admin/cores&name=shard4&action=BACKUPCORE&location=file:///mnt/solr_backups/search/search-06-14-2021&wt=javabin&version=2}
 status=0 QTime=0"}},
  "100034112630053395656":{
    "responseHeader":{
      "status":0,
      "QTime":0},
    "STATUS":"completed",
    "Response":"TaskId: 100034112630053395656 webapp=null path=/admin/cores 
params={core=search_shard2_replica_n4&async=100034112630053395656&qt=/admin/cores&name=shard2&action=BACKUPCORE&location=file:///mnt/solr_backups/search/search-06-14-2021&wt=javabin&version=2}
 status=0 QTime=14"},
  "100034112630053446666":{
    "responseHeader":{
      "status":0,
      "QTime":0},
    "STATUS":"completed",
    "Response":"TaskId: 100034112630053446666 webapp=null path=/admin/cores 
params={core=search_shard3_replica_n29&async=100034112630053446666&qt=/admin/cores&name=shard3&action=BACKUPCORE&location=file:///mnt/solr_backups/search/search-06-14-2021&wt=javabin&version=2}
 status=0 QTime=0"},
  "100034112630053465731":{
    "responseHeader":{
      "status":0,
      "QTime":0},
    "STATUS":"completed",
    "Response":"TaskId: 100034112630053465731 webapp=null path=/admin/cores 
params={core=search_shard4_replica_n23&async=100034112630053465731&qt=/admin/cores&name=shard4&action=BACKUPCORE&location=file:///mnt/solr_backups/search/search-06-14-2021&wt=javabin&version=2}
 status=0 QTime=0"},
  "100034112630053492379":{
    "responseHeader":{
      "status":0,
      "QTime":0},
    "STATUS":"failed",
    "Response":"Failed to backup core=search_shard1_replica_n25 because 
org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
exist: file:///mnt/solr_backups/search/search-06-14-2021. Note that 
Backup/Restore of a SolrCloud collection requires a shared file system mounted 
at the same path on all nodes!"},
  "failure":{
    "solrmulti01.DOM.DOMAIN.com:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":0},
      "STATUS":"failed",
      "Response":"Failed to backup core=search_shard1_replica_n25 because 
org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
exist: file:///mnt/solr_backups/search/search-06-14-2021. Note that 
Backup/Restore of a SolrCloud collection requires a shared file system mounted 
at the same path on all nodes!"}},
  "status":{
    "state":"failed",
    "msg":"found [1000] in failed tasks"}}
{code}
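
The failed task reports that the snapshot directory does not exist on the node hosting the core. Below is a minimal sketch, assuming a POSIX shell is available on each Solr node, of a check that could be run on a node to see whether the directory named in the error is visible there. The function name `backup_dir_visible` is hypothetical and not part of Solr; it only strips the `file://` scheme and tests for a directory, so it says nothing about timing or NFS caching.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): report whether the backup
# location from the error message is visible as a directory on this node.
backup_dir_visible() {
  location="$1"
  # Strip a leading file:// scheme so file:///mnt/x becomes /mnt/x.
  path="${location#file://}"
  if [ -d "$path" ]; then
    echo "visible: $path"
    return 0
  fi
  echo "missing: $path"
  return 1
}
```

For example, running `backup_dir_visible "file:///mnt/solr_backups/search/search-06-14-2021"` on solrmulti01 right after a failure would show whether that node can see the directory at that moment.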



> Backups randomly fail sometimes
> -------------------------------
>
>                 Key: SOLR-15371
>                 URL: https://issues.apache.org/jira/browse/SOLR-15371
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Backup/Restore
>    Affects Versions: 8.5.2, 8.8.2
>            Reporter: Roy Perkins
>            Priority: Major
>
> Hi, we have an issue where one shard sometimes fails to back up, due to what 
> might be a race condition between creating the folder and starting the 
> backup.  When this happens, we have to restart the first server in a shard to 
> get the backup to succeed again.  The cluster backs up to a shared NFS mount.  
> 4 out of 5 times the backup completes without issues (there is even another 
> collection, on the same servers, whose backup runs later in the morning and 
> succeeds).  Below is the error I get.
> {code:java}
> "Response":"Failed to backup core=slprod_shard4_replica_n6 because 
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
> exist: file:///mnt/solr_backups/slprod/slprod-04-25-2021. Note that 
> Backup/Restore of a SolrCloud collection requires a shared file system 
> mounted at the same path on all nodes!"},
> {code}
> And below is the line I use to run the backup (with bash variables set 
> earlier in the script):
> {code:java}
> curl -s 
> "http://localhost:8983/solr/admin/collections?action=BACKUP&name=${COLLECTION}-${DATE}&collection=${COLLECTION}&location=${BACKUP_PATH}&async=1000";
> {code}
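
The command above passes the same `async=1000` on every run, and the status output reports against that ID ("found [1000] in failed tasks"). Below is a minimal sketch, assuming the same `COLLECTION`, `DATE`, and `BACKUP_PATH` variables as the script, that derives a per-run request ID and builds the matching Collections API `REQUESTSTATUS` URL so each run can be polled separately. The helper names and the ID format are hypothetical, and nothing in this issue establishes that the fixed ID is related to the failures.

```shell
#!/bin/sh
# Assumed base URL, taken from the curl command in the issue.
SOLR_URL="http://localhost:8983/solr"

# Hypothetical helper: build a request ID unique to one collection and run.
make_request_id() {
  # $1 = collection name, $2 = date stamp, $3 = disambiguator such as a PID
  echo "backup-$1-$2-$3"
}

# Build the Collections API BACKUP URL with the given async request ID.
backup_url() {
  # $1 = collection, $2 = date, $3 = backup location, $4 = request ID
  echo "${SOLR_URL}/admin/collections?action=BACKUP&name=$1-$2&collection=$1&location=$3&async=$4"
}

# Build the REQUESTSTATUS URL used to poll that request ID.
status_url() {
  # $1 = request ID
  echo "${SOLR_URL}/admin/collections?action=REQUESTSTATUS&requestid=$1"
}
```

In the script this could be used as `id="$(make_request_id "$COLLECTION" "$DATE" "$$")"`, followed by `curl -s "$(backup_url "$COLLECTION" "$DATE" "$BACKUP_PATH" "$id")"` and later `curl -s "$(status_url "$id")"`.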


