Thanks Wido,
When I describe a Ceph cluster as "down", I mean that something is wrong with the Ceph software itself: for example, someone mistakenly changed the configuration file, leaving the conf inconsistent across many nodes (wrong fsid, inconsistent OSD/host mapping, etc.). I'm not talking about OSD failures, because I know Ceph can recover from those, nor about data errors, since Ceph does data scrubbing. I mean failures caused by wrong manual configuration or by bugs in the Ceph software. I'm not sure how often this happens, but it has happened several times. This is why I use a replication cluster to back up the master cluster's data for production use. Is there a solution or a better way to handle this?
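
One way to catch this kind of misconfiguration early is to compare the fsid setting across the ceph.conf copies collected from each node. The following is only a minimal sketch: it assumes the conf files have already been gathered as text (how they are collected is left open), that fsid lives in the [global] section, and the helper names are hypothetical.

```python
import configparser


def read_fsid(conf_text):
    """Return the fsid from the [global] section of a ceph.conf text, or None."""
    # strict=False tolerates duplicate keys; interpolation is disabled because
    # ceph.conf values may contain '%' or '$' characters.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.get("global", "fsid", fallback=None)


def group_hosts_by_fsid(confs):
    """Map each fsid value to the sorted list of hosts that use it.

    confs is a dict of host name -> ceph.conf text (assumed to be collected
    beforehand, e.g. copied from each node).
    """
    groups = {}
    for host in sorted(confs):
        groups.setdefault(read_fsid(confs[host]), []).append(host)
    return groups


def fsid_is_consistent(confs):
    """True only if every host reports the same, non-missing fsid."""
    groups = group_hosts_by_fsid(confs)
    return len(groups) == 1 and None not in groups
```

If fsid_is_consistent() returns False, group_hosts_by_fsid() shows which nodes disagree, so the odd ones out can be inspected before any daemon is restarted.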

==========
 Aegeaner

On 2014-10-08 17:55, Wido den Hollander wrote:
On 10/08/2014 11:00 AM, Aegeaner wrote:
Hi all!

For production use, I want to run two Ceph clusters at the same time.
One is the master cluster, and the other is the replication cluster,
which syncs RBD snapshots from the master cluster at a fixed interval
(every day, for example), in the way this article describes:
http://ceph.com/dev-notes/incremental-snapshots-with-rbd/ . If the
master cluster goes down, meaning there is some problem with Ceph so
that the whole cluster is unavailable, I can switch from the master
cluster to the slave cluster.
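
The daily sync described above could be sketched roughly as follows, using the rbd snap create, export-diff and import-diff subcommands. This is a sketch under assumptions, not the article's exact procedure: the conf file paths, image name and snapshot names are hypothetical, both clusters are assumed to be reachable from one host via separate conf files passed with -c, and the previous snapshot is assumed to already exist on both clusters.

```python
import shlex


def build_sync_commands(image, prev_snap, new_snap,
                        master_conf="/etc/ceph/master.conf",
                        backup_conf="/etc/ceph/backup.conf"):
    """Return the shell commands for one incremental RBD sync step.

    image is a pool/image spec; the conf paths are hypothetical defaults.
    """
    q = shlex.quote
    target = q(f"{image}@{new_snap}")
    # 1. Take a new snapshot on the master cluster.
    snap_create = f"rbd -c {q(master_conf)} snap create {target}"
    # 2. Stream only the changes since prev_snap to stdout ...
    export = (f"rbd -c {q(master_conf)} export-diff "
              f"--from-snap {q(prev_snap)} {target} -")
    # 3. ... and apply them to the same image on the backup cluster.
    import_ = f"rbd -c {q(backup_conf)} import-diff - {q(image)}"
    return [snap_create, f"{export} | {import_}"]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for cmd in build_sync_commands("rbd/vm1", "2014-10-07", "2014-10-08"):
        print(cmd)
```

The commands are only printed here; feeding them to a shell from cron would be one way to run the sync at a fixed time.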

Ok, but there will be a sync gap between the master and slave cluster
since the RBD replication is not happening in real time, so you will
lose some data if the master cluster 'burns down'.

Now the question is: if the master cluster is down, and I have
previously backed up all the metadata (the monitor map, the OSD map,
the PG map and the CRUSH map), how can I restore the master Ceph
cluster from these cluster maps? Is there a tool or a known way to do it?
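
For reference, backing up the maps mentioned above might look roughly like the sketch below. It is only an illustration: the output directory and file names are hypothetical, and the four dumps are taken one after another, so they are not guaranteed to describe the same point in time.

```python
import subprocess


def build_map_backup_commands(out_dir):
    """Return the ceph CLI invocations that dump each cluster map to out_dir."""
    return [
        ["ceph", "mon", "getmap", "-o", f"{out_dir}/monmap.bin"],
        ["ceph", "osd", "getmap", "-o", f"{out_dir}/osdmap.bin"],
        ["ceph", "osd", "getcrushmap", "-o", f"{out_dir}/crushmap.bin"],
        # The PG map is dumped as JSON rather than as a binary map.
        ["ceph", "pg", "dump", "--format", "json", "-o", f"{out_dir}/pgdump.json"],
    ]


def backup_maps(out_dir):
    """Run each dump in turn; raises CalledProcessError if any command fails."""
    for cmd in build_map_backup_commands(out_dir):
        subprocess.run(cmd, check=True)
```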

So explain 'down'? Due to what?

In theory it is probably possible to bring a cluster back to life if it
has become corrupted, but on a large deployment there will be a lot of
PGmap and OSDmap changes in a very short period of time.

You will *never* get a consistent snapshot of the whole cluster at a
specific point in time.

But the question still stands, explain 'down'. What does it mean in your
case?

You could lose all your monitors at the same time. They can probably be
fixed with a backup of those maps, but I think it comes down to calling
Sage and pulling out your credit card.

Thanks!

===============
Aegeaner





_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

