-----Original Message-----
From: Vladislav Bogdanov [mailto:bub...@hoster-ok.com]
Sent: 10 February 2014 13:27
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] node1 fencing itself after node2 being fenced
> I cannot really recall if it hangs or returns an error for that (I moved
> to corosync2 long ago).

Are you running corosync2 on the RHEL7 beta? Is it possible to run corosync2 on CentOS 6/RHEL 6?

> Anyways, you probably want to run clvmd with debugging enabled. IIRC you
> have two choices here: either stop the running instance first and then
> run it in the console with -f -d1, or run clvmd -C -d2 to ask all running
> instances to start debug logging to syslog. I prefer the first one,
> because modern syslogs do rate-limiting. And you'd need to run the lvm
> commands with debugging enabled too.

Thanks for this tip. I have modified clvmd to run in debug mode ("clvmd -T60 -d 2 -I cman"), and I notice that when node2 reboots, I don't see any logs of clvmd actually attempting to start, so it appears something is wrong there. However, I did try to manually stop/start clvmd on node2 after a reboot, and these were the errors reported (see the command sketches at the end of this mail for how I've been running and checking things):

Feb 10 12:37:08 test02 kernel: dlm: connecting to 1 sctp association 2
Feb 10 12:38:00 test02 kernel: dlm: Using SCTP for communications
Feb 10 12:38:00 test02 clvmd[2118]: Unable to create DLM lockspace for CLVM: Address already in use
Feb 10 12:38:00 test02 kernel: dlm: Can't bind to port 21064 addr number 1
Feb 10 12:38:00 test02 kernel: dlm: cannot start dlm lowcomms -98
Feb 10 12:39:37 test02 kernel: dlm: Using SCTP for communications
Feb 10 12:39:37 test02 clvmd[2137]: Unable to create DLM lockspace for CLVM: Address already in use
Feb 10 12:39:37 test02 kernel: dlm: Can't bind to port 21064 addr number 1
Feb 10 12:39:37 test02 kernel: dlm: cannot start dlm lowcomms -98
Feb 10 12:47:21 test02 clvmd[2159]: Unable to create DLM lockspace for CLVM: Address already in use
Feb 10 12:47:21 test02 kernel: dlm: Using SCTP for communications
Feb 10 12:47:21 test02 kernel: dlm: Can't bind to port 21064 addr number 1
Feb 10 12:47:21 test02 kernel: dlm: cannot start dlm lowcomms -98
Feb 10 12:48:14 test02 kernel: dlm: closing connection to node 2
Feb 10 12:48:14 test02 kernel: dlm: closing connection to node 1

So it appears the issue is with clvmd attempting to communicate with, I presume, dlm. I did some searching on this error and found a bug report, from around 2004 if I recall correctly, which was fixed, so I cannot see why this error is cropping up. Another oddity: if I reboot the node a couple of times, clvmd may start up properly on the 2nd node, and then things appear to work. However, while node2 is "down", clvmd on node1 is still in a "hung" state, even though dlm appears to think everything is fine. Have you come across this issue before?

Thanks for your assistance thus far, I appreciate it.
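For completeness, this is roughly the debug sequence I'm running (a sketch only; the -T60 timeout and -I cman flags match my cman-based setup, and the -d levels are the ones Vladislav suggested):

  # Stop the running instance, then run clvmd in the foreground with debugging.
  service clvmd stop
  clvmd -f -d1 -T60 -I cman

  # Alternatively, ask the already-running clvmd instances to start debug
  # logging to syslog (no restart needed).
  clvmd -C -d2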
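And in case it helps anyone else hitting the "Address already in use" failure, these are the checks I've been using to see whether a stale DLM/SCTP endpoint is still holding port 21064 (this assumes the sctp kernel module is loaded and that dlm_tool is available from the dlm_controld/cman packages):

  # List the DLM lockspaces that dlm_controld still knows about.
  dlm_tool ls

  # Look for an SCTP endpoint still bound to the DLM port (21064).
  cat /proc/net/sctp/eps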