10.02.2014 18:54, Asgaroth wrote:
>
> -----Original Message-----
> From: Vladislav Bogdanov [mailto:bub...@hoster-ok.com]
> Sent: 10 February 2014 13:27
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] node1 fencing itself after node2 being fenced
>
> I cannot really recall if it hangs or returns error for that (I moved to
> corosync2 long ago).
>
> Are you running corosync2 on RHEL7 beta? Are we able to run corosync2 on
> CentOS 6/RHEL 6?

Nope, it's CentOS 6. In short, it is probably safer for you to stay with
cman, especially if you need GFS2. gfs_controld is not officially ported to
corosync2 and is obsolete in EL7, because communication between gfs2 and dlm
has moved to kernel space there.

> Anyways you probably want to run clvmd with debugging enabled.
> iirc you have two choices here, either you'd need to stop running instance
> first and then run it in the console with -f -d1, or run clvmd -C -d2 to ask
> all running instances to start debug logging to syslog.
> I prefer first one, because modern syslogs do rate-limiting.
> And, you'd need to run lvm commands with debugging enabled too.
>
> Thanks for this tip, I have modified clvmd to run in debug mode ("clvmd -T60
> -d 2 -I cman") and I notice that on node2 reboot, I don't see any logs for
> clvmd actually attempting to start, so it appears there is something wrong
> here with clvmd. However, I did try to manually stop/start clvmd on node2

You need to fix that for sure.

> after a reboot and these were the error logs reported:
>
> Feb 10 12:37:08 test02 kernel: dlm: connecting to 1 sctp association 2
> Feb 10 12:38:00 test02 kernel: dlm: Using SCTP for communications
> Feb 10 12:38:00 test02 clvmd[2118]: Unable to create DLM lockspace for CLVM:
> Address already in use
> Feb 10 12:38:00 test02 kernel: dlm: Can't bind to port 21064 addr number 1
> Feb 10 12:38:00 test02 kernel: dlm: cannot start dlm lowcomms -98
> Feb 10 12:39:37 test02 kernel: dlm: Using SCTP for communications

Strange message, looks like something is bound to that port already.
You may want to try dlm in tcp mode btw.
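
For what it's worth, the "-98" in the lowcomms line is a negated errno value, and on Linux 98 is EADDRINUSE, which is the same "Address already in use" that clvmd reports. Here is a minimal Python sketch for pulling those codes out of a syslog excerpt and naming them. The regular expression, the function name decode_lowcomms_errors and the SAMPLE text are only illustrative assumptions and not part of any cluster tool, and the errno numbering is platform specific, so it should be run on the Linux node itself.

```python
import errno
import os
import re

# Matches kernel DLM lines such as "dlm: cannot start dlm lowcomms -98".
# The pattern is an assumption based on the log lines quoted in this thread.
LOWCOMMS_RE = re.compile(r"dlm: cannot start dlm lowcomms -(\d+)")


def decode_lowcomms_errors(log_text):
    """Return (errno_number, symbolic_name, message) for each lowcomms failure."""
    results = []
    for match in LOWCOMMS_RE.finditer(log_text):
        code = int(match.group(1))
        # errno numbering is platform specific; unknown codes map to "UNKNOWN".
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((code, name, os.strerror(code)))
    return results


# Two lines copied from the log excerpt quoted above.
SAMPLE = """\
Feb 10 12:38:00 test02 kernel: dlm: Can't bind to port 21064 addr number 1
Feb 10 12:38:00 test02 kernel: dlm: cannot start dlm lowcomms -98
"""

for code, name, message in decode_lowcomms_errors(SAMPLE):
    print(code, name, message)
```

This only decodes what the kernel already logged. It does not tell you which socket is holding port 21064.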

> Feb 10 12:39:37 test02 clvmd[2137]: Unable to create DLM lockspace for CLVM:
> Address already in use
> Feb 10 12:39:37 test02 kernel: dlm: Can't bind to port 21064 addr number 1
> Feb 10 12:39:37 test02 kernel: dlm: cannot start dlm lowcomms -98
> Feb 10 12:47:21 test02 clvmd[2159]: Unable to create DLM lockspace for CLVM:
> Address already in use
> Feb 10 12:47:21 test02 kernel: dlm: Using SCTP for communications
> Feb 10 12:47:21 test02 kernel: dlm: Can't bind to port 21064 addr number 1
> Feb 10 12:47:21 test02 kernel: dlm: cannot start dlm lowcomms -98
> Feb 10 12:48:14 test02 kernel: dlm: closing connection to node 2
> Feb 10 12:48:14 test02 kernel: dlm: closing connection to node 1
>
> So it appears that the issue is with clvmd attempting to communicate with,
> I presume, dlm. I tried to do some searching on this error and it appears
> there is a bug report, if I recall correctly, around 2004, which was fixed,
> so I cannot see why this error is cropping up. Some other strangeness is,
> that if I reboot the node a couple times, it may start up properly on 2nd
> node and then things appear to work properly, however, while node 2 is
> "down" the clvmd on node1 is still in a "hung" state even though dlm appears
> to think everything is good. Have you come across this issue before?
>
> Thanks for your assistance thus far, I appreciate it.
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org