While playing around with the backup/restore APIs, I was caught off guard by something that I now realize is documented (but that may also explain some weird anecdotal questions/complaints I've heard in the past about restoring from backups leaving collections "unusable", which didn't make sense at the time).


Specifically I'm referring to this note in the ref-guide...

While restoring, if a configset with the same name exists in ZooKeeper then Solr will reuse that, or else it will upload the backed up configset in ZooKeeper and use that.


AFAICT this means that if you run daily backups, and then "mess up" your collection by making an incompatible schema change that you only notice after (re-)indexing some documents, a "RESTORE" will revert your "new" index changes back to your "old" index, but it will still continue to use your "new", incompatible schema. That leads to even more fun and confusing errors that make it seem like the restore didn't work.

(This seems like it should be particularly problematic/common for folks who use "schemaless" mode?)


IIUC, the only time the RESTORE api will actually "restore" a configset is when the collection you are restoring does not currently exist *AND* there is no configset in ZK with the same name as the configset included in the backup.
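To make that rule concrete, here is a tiny Python model of it. It is written from my reading of the ref-guide note quoted above, not from Solr's source, and the function name is purely hypothetical:

```python
def restore_uploads_backup_configset(collection_exists, configset_in_zk):
    """Hypothetical model of RESTORE's configset handling (not Solr code).

    Returns True only when RESTORE would upload the configset stored in
    the backup; in every other case the configset already in place is kept.
    """
    return (not collection_exists) and (not configset_in_zk)


# Walk all four combinations of (collection exists, configset already in ZK).
for coll in (False, True):
    for conf in (False, True):
        print(coll, conf, restore_uploads_backup_configset(coll, conf))
```

Only the (False, False) row comes out True, which is exactly the case the "in place" restore options can never hit.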

So effectively: in order to reliably & completely restore everything from a backup, you have to first delete the collection and the configset and ignore all of the "in place" restore options?
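For what it's worth, that delete-then-restore workaround can be sketched as three plain HTTP calls against the Collections API (DELETE, RESTORE) and the Configsets API (DELETE). This is a minimal sketch under some assumptions: the helper name, base URL, collection/configset/backup names and the backup location are all made up for illustration, and the code only *builds* the URLs rather than sending anything to Solr:

```python
from typing import List
from urllib.parse import urlencode


def full_restore_plan(solr_base: str, collection: str, configset: str,
                      backup_name: str, location: str) -> List[str]:
    """Hypothetical helper: ordered API calls for a delete-then-restore.

    Nothing is sent to Solr here; the URLs are only constructed so the
    sequence can be inspected or handed to whatever HTTP client you use.
    """
    steps = [
        # 1) Drop the collection first; as far as I know, a configset that is
        #    still referenced by a collection cannot be deleted.
        ("/admin/collections", {"action": "DELETE", "name": collection}),
        # 2) Drop the configset so RESTORE cannot silently reuse it.
        ("/admin/configs", {"action": "DELETE", "name": configset}),
        # 3) With both gone, RESTORE has to upload the configset from the backup.
        ("/admin/collections", {"action": "RESTORE", "name": backup_name,
                                "collection": collection, "location": location}),
    ]
    return [f"{solr_base}{path}?{urlencode(params)}" for path, params in steps]


for url in full_restore_plan("http://localhost:8983/solr", "techproducts",
                             "techproducts_conf", "nightly", "/backups"):
    print(url)
```

The obvious cost of doing it this way is that the collection simply doesn't exist between steps 1 and 3.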


Which leads me to wonder:

1) If the BACKUP api includes configsets by default, why doesn't the RESTORE api at least have an *option* to restore the configset at the same time?

2) Does anyone actually use the "in place" RESTORE options? ... if so, how do you ensure that your configset is also "correct" relative to the backup?



-Hoss
http://www.lucidworks.com/
