[
https://issues.apache.org/jira/browse/SOLR-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014824#comment-15014824
]
Gregory Chanan edited comment on SOLR-5750 at 11/20/15 12:12 AM:
-----------------------------------------------------------------
Took a cursory look at this. Some questions/comments:
1) I know we already have "location" via
https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups+of+SolrCores
but it seems needlessly risky and error prone. What if a user purposefully
or accidentally overwrites important data? You are giving anyone who can make a
snapshot call Solr's permissions. Beyond that, making "location" a required
param is not the greatest interface. Most of the time when I'm taking a
snapshot I don't even _care_ where it is; I expect the system to just do
something sensible and let me interact with the API with some id (i.e. a name).
HDFS and HBase snapshots work this way, for example. Why not just have a
backup location specified in solr.xml with some sensible default?
2) On the above point: "I expect the system to just do something sensible and
let me interact with the API with some id (i.e. a name)" -- why do I pass in a
location for RESTORE? Can't the system just remember that from the backup call?
(See the first sketch after this list for how 1) and 2) could look.)
3) There's no API for deleting a snapshot?
4) There's no API for listing snapshots? (I don't think this needs to be in an
initial version necessarily)
5) From {quote}So the idea is the location that you give should be a shared
file system so that all the replica backup along with the ZK information stay
in one place. Then during restore the same location can be used.
We can then support storing to other locations such as s3, hdfs etc as separate
Jiras then{quote}
I'm not sure the shard-at-a-time approach makes sense for shared file systems.
For example, it's much more efficient to take an HDFS snapshot of the entire
collection directory than of each individual shard (see the second sketch after
this list). I haven't fully thought through how we support both; e.g. we could
do something different based on the underlying storage of the collection
(though that wouldn't let you back up a local FS collection to a shared FS), or
allow a "snapshotType" parameter or something. I _think_ we can just make
whatever you have the default here, so I don't think we strictly need to do
anything in this patch. Just pointing that out.
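
To make 1) and 2) concrete, here's a minimal sketch of how the calls could look
if "location" were optional (falling back to a default configured once, e.g. in
solr.xml) and RESTORE looked the location up by snapshot name instead of
requiring it again. The parameter names are illustrative only, not necessarily
what the current patch exposes:

{code:java}
// Minimal sketch only: the BACKUP/RESTORE params below are illustrative,
// not necessarily what the current patch exposes.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class BackupRestoreSketch {

  // Fire a Collections API request against a local node and return the HTTP status.
  static int call(String query) throws IOException {
    URL url = new URL("http://localhost:8983/solr/admin/collections?" + query);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    int status = conn.getResponseCode();
    conn.disconnect();
    return status;
  }

  public static void main(String[] args) throws IOException {
    // Backup: no "location" param; the cluster falls back to a default
    // backup location configured once (e.g. in solr.xml).
    call("action=BACKUP&name=weekly1&collection=collection1");

    // Restore: just the snapshot name; the location is remembered from the
    // backup call rather than passed in again.
    call("action=RESTORE&name=weekly1&collection=collection1_restored");
  }
}
{code}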
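And to illustrate the shared-FS point in 5): with HDFS, snapshotting the whole
collection directory is a single metadata-only operation, versus copying (or
snapshotting) each shard's index separately. A rough sketch, assuming a
hypothetical /solr/collection1 layout and that an admin has already made the
directory snapshottable:

{code:java}
// Rough sketch: the paths and namenode address are hypothetical.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCollectionSnapshotSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration());

    // One snapshot of the entire collection directory: a cheap, metadata-only
    // operation in HDFS (requires 'hdfs dfsadmin -allowSnapshot /solr/collection1' first)...
    fs.createSnapshot(new Path("/solr/collection1"), "weekly1");

    // ...instead of looping over shard directories and copying each index out,
    // which is what a shard-at-a-time backup amounts to on a shared file system.
  }
}
{code}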
> Backup/Restore API for SolrCloud
> --------------------------------
>
> Key: SOLR-5750
> URL: https://issues.apache.org/jira/browse/SOLR-5750
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Varun Thacker
> Fix For: 5.2, Trunk
>
> Attachments: SOLR-5750.patch, SOLR-5750.patch, SOLR-5750.patch,
> SOLR-5750.patch
>
>
> We should have an easy way to do backups and restores in SolrCloud. The
> ReplicationHandler supports a backup command which can create snapshots of
> the index but that is too little.
> The command should be able to backup:
> # Snapshots of all indexes or indexes from the leader or the shards
> # Config set
> # Cluster state
> # Cluster properties
> # Aliases
> # Overseer work queue?
> A restore should be able to completely restore the cloud i.e. no manual steps
> required other than bringing nodes back up or setting up a new cloud cluster.
> SOLR-5340 will be a part of this issue.