On 8/12/2012 4:44 AM, Mauro wrote:
> On 11 August 2012 19:23, Stan Hoeppner <s...@hardwarefreak.com> wrote:
>> On 8/11/2012 8:59 AM, Mauro wrote:
>>> Hello, I'm experiencing continuous reboots of my two nodes in a
>>> heartbeat+pacemaker cluster.
>>> Reboots are random, one day they happen one other day not, sometimes
>>> for 7 days they don't happen, sometimes they happen at night.
>>> They happen at random days and random time.
>>> Nodes are connected to a Cisco 3570 switch and a SAN storage system.
>>> Perhaps there is a misconfiguration in the interfaces?
>>> Here is my interfaces file:
>> ....
>>
>>
>>> Do you think there are some errors?
>>
>> To determine that you need to look at your log files, not your config
>> files. If the nodes are rebooting due to fencing it will be logged
>> somewhere, as should the underlying network errors that cause the fence
>> to close.
>
> Yes, I look at my logs but the only thing I see is that node 1 fences
> node 2 or node 2 fences node 1 because one node doesn't see the other
> node, but I don't understand what the problem is, if it is a problem
> of my NIC or other.

Is there more than one set of these in any dmesg files on either host:

Jul 26 00:38:26 [host] kernel: e100 0000:00:0d.0: eth0: NIC Link is Down
Jul 26 00:38:28 [host] kernel: e100 0000:00:0d.0: eth0: NIC Link is Up 100 Mbps Full Duplex

If so it may indicate a flaky NIC or switch port, possibly a bad patch
cable. Is there a switch between the hosts or a crossover cable?

But look at the time interval between the down/up states. If it's
always less than the cluster action threshold then this shouldn't be an
issue. If it's greater than the threshold it is likely the cause of the
software fence activating.

There are other possible causes. This is simply the first that comes to
mind.

--
Stan
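
If there are a lot of these link messages, pairing up the down/up
states by hand gets tedious. Here is a rough Python sketch of that
interval check, nothing more. The function name link_outages, the
year argument (syslog timestamps carry no year) and the 30 second
THRESHOLD_SECONDS are all my own assumptions for illustration. Use
whatever timeout your cluster is actually configured with, and adjust
the pattern if your driver words its link messages differently.

```python
import re
from datetime import datetime

# Matches kernel lines such as:
#   Jul 26 00:38:26 [host] kernel: e100 0000:00:0d.0: eth0: NIC Link is Down
LINK_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*?"
    r"(?P<iface>\w+): NIC Link is (?P<state>Down|Up)"
)


def link_outages(lines, year=2012):
    """Return (iface, down_time, seconds_down) for each Down/Up pair.

    `year` is an assumption: syslog timestamps do not include one.
    """
    down_at = {}   # interface name -> time of the last unmatched "Down"
    outages = []
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        iface = m["iface"]
        if m["state"] == "Down":
            down_at[iface] = when
        elif iface in down_at:
            start = down_at.pop(iface)
            outages.append((iface, start, (when - start).total_seconds()))
    return outages


if __name__ == "__main__":
    # Hypothetical threshold; substitute your cluster's real timeout.
    THRESHOLD_SECONDS = 30
    sample = [
        "Jul 26 00:38:26 [host] kernel: e100 0000:00:0d.0: eth0: NIC Link is Down",
        "Jul 26 00:38:28 [host] kernel: e100 0000:00:0d.0: eth0: NIC Link is Up 100 Mbps Full Duplex",
    ]
    for iface, start, seconds in link_outages(sample):
        verdict = "exceeds" if seconds > THRESHOLD_SECONDS else "is within"
        print(f"{iface} down at {start} for {seconds:.0f}s, {verdict} the threshold")
```

To run it against real logs, pass the open log file in place of
sample. Any outage flagged as exceeding the threshold is worth lining
up against the times of the fence events.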