Hi all,

I have a 5-node cluster. I've set up the cluster to stop the resources on any
subcluster that does not have quorum. However, 2 of my 5 nodes are at a
geographically separate site. I need a way to distinguish between a network
failure between the two sites and the loss of a whole site in a disaster.
Currently, if the network fails, the subcluster at the first site continues to
serve because it has quorum (3 nodes). The subcluster at the other site (2 nodes)
will not start any resources, since it has no quorum (2/5).
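For illustration, the quorum arithmetic above can be modelled in a few lines of Python. This is a simplified sketch assuming a strict-majority rule over all 5 configured nodes; `has_quorum` is a hypothetical helper, not Heartbeat's CCM code. It shows why majority alone cannot tell a link failure from a site failure: in both cases the second site sees only 2 of 5 nodes.

```python
def has_quorum(partition_size: int, total_nodes: int) -> bool:
    """Assumed strict-majority rule: a partition needs more than half of all nodes."""
    return 2 * partition_size > total_nodes


TOTAL = 5
# Inter-site network failure: site 1 keeps 3 nodes, site 2 keeps 2.
# (If site 1 is destroyed instead, site 2 still sees the same 2/5.)
for site, size in (("site 1", 3), ("site 2", 2)):
    status = "quorum" if has_quorum(size, TOTAL) else "no quorum"
    print(site, f"{size}/{TOTAL}", status)
```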

However, in case of a disaster at the first site, the machines at the second site
will not take over either (no quorum). Therefore I've set up a quorum server
located at a third site. In case of a disaster at the first site, the quorum
server will grant quorum to the second-site subcluster, because it can still
connect to the 2 remaining nodes.
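A simplified sketch of the intended behaviour with the quorum server acting as a tie-breaker. The function `grant_quorum` and its policy (a majority partition keeps quorum on its own; otherwise the quorum server grants it to at most one partition that can reach it, preferring the larger one) are assumptions for illustration only; the actual Heartbeat quorumd policy may differ.

```python
from typing import Dict, Set


def grant_quorum(partitions: Dict[str, int], total: int, can_reach_qs: Set[str]) -> Set[str]:
    """Return the partitions allowed to run resources (hypothetical model)."""
    # A partition holding a strict majority of all configured nodes has quorum on its own.
    majority = {name for name, size in partitions.items() if 2 * size > total}
    if majority:
        return majority
    # Otherwise only partitions that can reach the quorum server are candidates.
    candidates = [name for name in partitions if name in can_reach_qs]
    if not candidates:
        return set()
    # Assumed policy: grant quorum to exactly one candidate, preferring the
    # largest partition and breaking ties deterministically by name.
    winner = max(candidates, key=lambda name: (partitions[name], name))
    return {winner}


TOTAL = 5
# Inter-site link down, both sites still reach the quorum server.
print(grant_quorum({"site1": 3, "site2": 2}, TOTAL, {"site1", "site2"}))  # {'site1'}
# Site 1 destroyed; the 2 nodes of site 2 reach the quorum server.
print(grant_quorum({"site2": 2}, TOTAL, {"site2"}))  # {'site2'}
# Site 1 destroyed and site 2 also cut off from the quorum server.
print(grant_quorum({"site2": 2}, TOTAL, set()))  # set()
```

Granting quorum to at most one non-majority partition is what keeps both sites from running the same resources after a link failure.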

Nice, isn't it?

I configured the quorum server as described in
http://www.linux-ha.org/QuorumServerGuide. The certificates are valid and the
nodes connect to the quorum server.
The problem is that since I configured the nodes to use the quorum server, my
resources have stopped everywhere. None of the nodes can run resources anymore.
I see these messages in the logs:

crmd: [9691]: info: crmd_ccm_msg_callback: Quorum lost after event=INVALID (id=4)
crmd: [9691]: ERROR: do_ccm_update_cache: Plurality w/o Quorum (5/5 nodes)
crmd: [9691]: info: ccm_event_detail: INVALID: trans=4, nodes=5, new=1, lost=0 n_idx=0, new_idx=5, old_idx=10
...
WARN: cluster_status: We do not have quorum - fencing and resource management disabled


I can only guess that the quorum server does not return any quorum
notifications, or that they are invalid. What could be the problem in my case?

Any help is highly appreciated!

Regards,
Atanas 
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
