Hi,
I need some help fixing a broken cluster. I think we broke it ourselves,
but I would like your opinion on whether you see any way to recover it.
Let me explain what happened.
We have a cluster (version 0.94.9) spanning two datacenters (A and B), each
with 12 nodes of 60 OSDs. There are 3 monitor nodes in A and 2 in B. The
CRUSH rule and replication factor force two replicas in each datacenter.
We write objects into the cluster via librados. The objects are immutable,
so they are either present or absent.
In this cluster we tested what happens if datacenter A fails and we
need to bring up the cluster in B by creating a monitor quorum in B. We
did this by cutting the network connection between the two datacenters.
The OSDs in DC B went down as expected. We then removed the mon nodes
from the monmap in B (by extracting it offline and editing it). Our clients
then wrote data to both independent parts of the cluster before we stopped
the mons in A. (Yes, I know, this is a really bad thing.)
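
To illustrate why B could not form a monitor quorum on its own until we
edited the monmap, here is a minimal sketch of the majority arithmetic. The
function name has_quorum is hypothetical and not part of Ceph. The sketch only
assumes that a quorum needs a strict majority of the monitors listed in the
monmap, and that the edited monmap in B lists only the 2 mons in B.

```python
def has_quorum(reachable_mons: int, monmap_size: int) -> bool:
    """Return True if the reachable monitors are a strict majority of the monmap."""
    return reachable_mons > monmap_size // 2


# Original monmap: 3 mons in A + 2 mons in B = 5 monitors in total.
a_alone = has_quorum(3, 5)         # A after the network split
b_alone = has_quorum(2, 5)         # B after the network split
# Assumption: the edited monmap in B lists only the 2 mons in B.
b_after_edit = has_quorum(2, 2)

print(a_alone, b_alone, b_after_edit)  # True False True
```

With the original monmap only A keeps a quorum. With the edited monmap B
forms a second, independent quorum, which is how both sides could accept
writes at the same time.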
Now we are trying to join the two sides again, but so far without success.
Only the OSDs in B are running. The OSDs in A have been started, but they
stay down. In the mon log we see a lot of "...(leader).pg v3513957 ignoring
stats from non-active osd" messages.
We see that the current osdmap epoch in the running cluster is 28873.
On the OSDs in A the epoch is 29003. We assume that this is the reason
why the OSDs will not join.
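
For reference, the gap between the two epochs works out as follows. This is
only a trivial sketch: the helper name epoch_gap is hypothetical, and the
interpretation that OSDs holding a newer osdmap than the monitors is what
blocks them is our assumption, not something we have confirmed.

```python
def epoch_gap(osd_epoch: int, cluster_epoch: int) -> int:
    """Return how many osdmap epochs the OSD is ahead of the running cluster."""
    return osd_epoch - cluster_epoch


running_cluster_epoch = 28873  # osdmap epoch in the running cluster (B side)
osds_in_a_epoch = 29003        # osdmap epoch on the OSDs in A

gap = epoch_gap(osds_in_a_epoch, running_cluster_epoch)
print(gap)  # 130
```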
BTW: This is only a test cluster, so no important data is at risk.
Regards
Manuel
--
Manuel Lausch
Systemadministrator
Cloud Services
1&1 Mail & Media Development & Technology GmbH | Brauerstraße 48 | 76135
Karlsruhe | Germany
Phone: +49 721 91374-1847
E-Mail: manuel.lau...@1und1.de | Web: www.1und1.de
Amtsgericht Montabaur, HRB 5452
Geschäftsführer: Frank Einhellinger, Thomas Ludwig, Jan Oetjen
Member of United Internet