Hi all,

My apologies if this message is a duplicate. I sent it yesterday, but then 
realized I wasn’t fully subscribed to the list so I don’t think it went 
through. I haven’t seen it appear in the archives.

I’m running a cluster consisting of Solr 8.11.1 (4 nodes) and ZooKeeper 3.7.0
(3 nodes). A cron job calls
$SOLR_BASE_URL/admin/collections?action=BACKUP&name=${collection}&collection=${collection}&maxNumBackupPoints=14&location=/data/backup
for each collection every day at 6 PM. The backup usually works just fine, but
about every 10 days I end up with a busted backup. Each backup directory has
the right number of zk_backup_* directories and backup_*.properties files, but
the index directory is empty: the current backup failed and all previous
backups were wiped out.
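
For anyone who wants to detect this state automatically, here is a minimal
Python sketch of the check I am describing. It assumes the layout mentioned
above (zk_backup_* directories, backup_*.properties files, and an index
directory side by side in one backup directory). The function names
inspect_backup and looks_busted are made up for illustration and are not part
of Solr.

```python
from pathlib import Path


def inspect_backup(backup_dir):
    """Count the pieces of one collection's backup directory.

    Assumed layout (taken from the description above, not from Solr docs):
    zk_backup_* directories, backup_*.properties files, and index/.
    """
    root = Path(backup_dir)
    index_dir = root / "index"
    index_files = 0
    if index_dir.is_dir():
        index_files = sum(1 for p in index_dir.iterdir() if p.is_file())
    return {
        "zk_backups": sum(1 for p in root.glob("zk_backup_*") if p.is_dir()),
        "properties": sum(1 for p in root.glob("backup_*.properties") if p.is_file()),
        "index_files": index_files,
    }


def looks_busted(summary):
    """True when backup points are recorded but no index files remain."""
    return summary["properties"] > 0 and summary["index_files"] == 0
```

Running looks_busted(inspect_backup(path)) right after the cron job would at
least flag the bad night immediately, where path is whatever directory holds
the files listed above for one collection.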

I’ve taken to creating a dated tarball of the backup directory every night so 
that I can restore the last known good backup in case of this kind of 
catastrophic backup failure, but backing up my backup really feels like 
something that shouldn’t be necessary. I’ve increased my log retention time,
so hopefully I can capture the log output the next time this failure
happens. But in the meantime, is there anything that might explain why Solr
would behave like this? Anything I can look for in my configuration, or some
way to try to reproduce the issue? (All I’ve tried so far is calling BACKUP
repeatedly, hoping that one of the calls would fail in this way, but no luck yet.)
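
In case it helps anyone picture the workaround, the nightly dated tarball
amounts to roughly the following Python sketch. The function name
archive_backup and the archive naming scheme are assumptions for illustration,
not my exact script; the archive directory should live outside the backup
directory so the tarball does not try to include itself.

```python
import tarfile
from datetime import date
from pathlib import Path


def archive_backup(backup_dir, archive_dir, day=None):
    """Write <name>-YYYY-MM-DD.tar.gz for backup_dir and return its path."""
    source = Path(backup_dir)
    day = day or date.today()
    target = Path(archive_dir) / f"{source.name}-{day.isoformat()}.tar.gz"
    # Store paths relative to the backup directory's own name so the
    # tarball can be unpacked anywhere without absolute paths.
    with tarfile.open(target, "w:gz") as tar:
        tar.add(source, arcname=source.name)
    return target
```

Restoring the last known good backup then means unpacking the most recent
tarball that predates the failure.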

Thanks,
Michael Klein

------
Michael B. Klein (he/him)
Software Development Tech Lead
Repository & Digital Curation
Northwestern University Library
michael.kl...@northwestern.edu
