Hi all,

My apologies if this message is a duplicate. I sent it yesterday, but then realized I wasn’t fully subscribed to the list, so I don’t think it went through. I haven’t seen it appear in the archives.

I’m running a cluster consisting of Solr 8.11.1 (4 nodes) and ZooKeeper 3.7.0 (3 nodes). I have a cron job that calls $SOLR_BASE_URL/admin/collections?action=BACKUP&name=${collection}&collection=${collection}&maxNumBackupPoints=14&location=/data/backup for each collection every day at 6 pm.

The backup usually works just fine. But every 10 days or so, I end up with a busted backup. Each backup directory has the right number of zk_backup_* directories and backup_*.properties files, but the index directory is empty: the current backup failed and all previous backups are wiped out.

I’ve taken to creating a dated tarball of the backup directory every night so that I can restore the last known good backup in case of this kind of catastrophic backup failure, but backing up my backup really feels like something that shouldn’t be necessary.

I’ve increased my log preservation time, so hopefully I can catch the log output the next time this failure happens. But in the meantime, is there anything that might explain why Solr would behave like this? Anything I can look for in my configuration, or some way to try to reproduce the issue? (All I’ve tried so far is calling backup repeatedly, hoping that one of the calls would fail in this way, but so far, no luck.)

Thanks,
Michael Klein

------
Michael B. Klein (he/him)
Software Development Tech Lead
Repository & Digital Curation
Northwestern University Library
michael.kl...@northwestern.edu
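
P.S. In case it helps anyone trying to reproduce this, below is a rough Python sketch of the two pieces described above: building the BACKUP request URL, and checking a backup location for empty index directories after a run. The function names (build_backup_url, find_empty_index_dirs) are made up for illustration, and the check only assumes that each backup contains a directory literally named "index" somewhere under the backup location; it is not taken from Solr itself.

```python
import os
from urllib.parse import urlencode


def build_backup_url(solr_base_url, collection, location="/data/backup", max_points=14):
    """Build the Collections API BACKUP URL with the parameters from the cron job."""
    params = {
        "action": "BACKUP",
        "name": collection,
        "collection": collection,
        "maxNumBackupPoints": max_points,
        "location": location,
    }
    # urlencode percent-encodes the slashes in the location value.
    return f"{solr_base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def find_empty_index_dirs(backup_root):
    """Return every directory named 'index' under backup_root that has no entries.

    Assumption: a healthy backup keeps its index files in a directory named
    'index'; an empty one is treated as the failure signature described above.
    """
    empty = []
    for dirpath, dirnames, filenames in os.walk(backup_root):
        if os.path.basename(dirpath) == "index" and not dirnames and not filenames:
            empty.append(dirpath)
    return sorted(empty)
```

Running find_empty_index_dirs("/data/backup") right after each nightly call, and only rotating the dated tarballs when it returns an empty list, would at least flag the failure on the night it happens.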