Hello,

We are having a problem with a 3-node Pacemaker/Corosync cluster consisting of two primary DRBD+OCFS2 nodes (dual-primary DRBD) and a quorum node.

All nodes run Debian Squeeze; every package is from the stable branch except Corosync, which comes from backports because we need its udpu (unicast) transport. Each node has a single network card.
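
For reference, the totem section of our corosync.conf looks roughly like this (a sketch; the bind network and member addresses below are placeholders, not our real ones):

    totem {
        version: 2
        # unicast UDP transport -- the reason Corosync comes from backports
        transport: udpu
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.1.0     # placeholder network
            mcastport: 5405
            member {
                memberaddr: 192.168.1.1  # OCFS2 node 1 (placeholder)
            }
            member {
                memberaddr: 192.168.1.2  # OCFS2 node 2 (placeholder)
            }
            member {
                memberaddr: 192.168.1.3  # quorum node (placeholder)
            }
        }
    }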

When the network is up, everything works without any problems: a graceful shutdown of the resources on any node works as intended and doesn't affect the remaining cluster partition.

When the network goes down on one OCFS2 node, Pacemaker (no-quorum-policy="stop") tries to shut the resources down on that node, but fails to stop the OCFS2 filesystem resource, reporting that it is "in use".
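
For clarity, the quorum policy is set the usual way via the crm shell:

    # stop all resources in a partition that has lost quorum
    crm configure property no-quorum-policy="stop"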

*Both* OCFS2 nodes (i.e. the one with the network down and the one still up in the partition with quorum) then hang, with dmesg reporting that events, ocfs2rec and ocfs2_wq are "blocked for more than 120 seconds".
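
In case the full stack traces are useful to anyone, they can be dumped on top of the hung-task warnings with sysrq:

    # dump stack traces of all uninterruptible (blocked) tasks to the kernel log
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 100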

When the network is operational, umounting by hand works without any problems, since in our test scenario no services are running that would keep the mountpoint busy.
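
A quick check along these lines (the mountpoint path is a placeholder) confirms that nothing is holding the filesystem:

    # show any processes with open files on the mounted filesystem
    fuser -vm /srv/ocfs2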

The configuration we used is pretty much the one from the "ClusterStack/LucidTesting" document [1], with clone-max="2" added where needed, because we have one more node (the quorum node) than that document assumes; see the sketch below.
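
Concretely, the clones end up looking along these lines (a sketch with made-up resource names, not our exact configuration):

    # run the clone on the two OCFS2 nodes only, not on the quorum node
    crm configure clone cl-ocfs2-fs p-ocfs2-fs \
        meta clone-max="2" clone-node-max="1" interleave="true"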

NB: we have successfully reproduced this problem on three Ubuntu 11.10 Server nodes as well.


Any ideas?


PS: I am aware that this is not an OCFS2 mailing list, but I hope someone here has encountered this problem as well and can give us some insight into it...


Thanks,

Ivan


[1] https://wiki.ubuntu.com/ClusterStack/LucidTesting#Pacemaker.2C_drbd8_and_OCFS2_or_GFS2
