In playing around with the backup/restore APIs, I was caught off guard by
something that I realize now is documented (but also may explain some
weird anecdotal questions/complaints I've heard in the past about
restoring from backups leaving collections "unusable" that didn't make
sense at the time).
Specifically I'm referring to this note in the ref-guide...
    While restoring, if a configset with the same name exists in ZooKeeper
    then Solr will reuse that, or else it will upload the backed up
    configset in ZooKeeper and use that.
AFAICT this means that if you run daily backups and then "mess up" your
collection by making an incompatible schema change which you only notice
after (re-)indexing some documents, the "RESTORE" will revert your "new"
index changes back to your "old" index, but still continue to use your
"new" schema that is incompatible -- leading to even more fun and
confusing errors that make it seem like the restore didn't work.
(This seems like it should be particularly problematic/common for folks
who use "schemaless" mode?)
IIUC, the only time the RESTORE api will actually "restore" a configset,
is in the event that the collection you are restoring does not currently
exist *AND* there is no configset in ZK that has the same name as the
configset included in the backup.
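To make that condition concrete, here is a rough Python sketch of the
rule as I currently understand it. It only encodes my reading above, not
anything confirmed from the Solr source. The function names are made up
for illustration, and I'm assuming the Configsets API LIST call
(/solr/admin/configs?action=LIST) returns JSON with a "configSets" array.

```python
import json


def configsets_from_list_response(body):
    """Pull configset names out of a Configsets API LIST response body.

    Assumption: the JSON response carries the names under "configSets".
    """
    return json.loads(body).get("configSets", [])


def restore_uploads_configset(collection_exists, configsets_in_zk, backup_configset):
    """Return True only when RESTORE would upload the backed-up configset.

    Encodes the rule as understood in this mail: the collection must not
    exist yet AND no configset with the same name may already be in ZK.
    """
    return (not collection_exists) and backup_configset not in configsets_in_zk
```

With a ZK that already holds "films_conf", the second function returns
False for a backup whose configset is also named "films_conf", which is
exactly the "restore reuses the new schema" case.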
So effectively: in order to reliably & completely restore everything
from a backup, you have to first delete the collection and the configset
and ignore all of the "in place" restore options?
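For the record, the "delete everything first" sequence would look
something like the sketch below. It only builds the request URLs (in the
order they would need to be issued) rather than sending them. The helper
name and the example values are hypothetical; the actions and parameters
are the Collections API DELETE / RESTORE and Configsets API DELETE calls
as I understand them from the ref-guide, so double check against your
Solr version before running anything destructive.

```python
from urllib.parse import urlencode


def clean_restore_urls(solr_base, collection, configset, backup_name, location):
    """Build the three admin URLs for a 'full' restore, in execution order.

    1. delete the existing collection
    2. delete the existing configset (so RESTORE uploads the backed-up one)
    3. restore the collection from the named backup
    """
    admin = solr_base.rstrip("/") + "/admin"
    steps = [
        ("collections", {"action": "DELETE", "name": collection}),
        ("configs", {"action": "DELETE", "name": configset}),
        ("collections", {
            "action": "RESTORE",
            "name": backup_name,       # name given to the backup at BACKUP time
            "collection": collection,  # collection to restore into
            "location": location,      # backup location as seen by Solr
        }),
    ]
    return [f"{admin}/{handler}?{urlencode(params)}" for handler, params in steps]
```

Each URL can then be issued with curl or urllib, waiting for one call to
succeed before sending the next, since the configset can't be deleted
while a collection is still using it.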
Which leads me to wonder:
1) If the BACKUP api includes configsets by default, why doesn't the
RESTORE api at least have an *option* to restore the configset at the same
time?
2) Does anyone actually use the "in place" RESTORE options? ... if so,
how do you ensure that your configset is also "correct" relative to the
backup?
-Hoss
http://www.lucidworks.com/