Try this: http://clusterlabs.org/wiki/FAQ#I_Killed_a_Node_but_the_Cluster_Didn.27t_Recover
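
As far as I recall, that FAQ entry is about quorum: with two nodes, the survivor alone does not hold more than half of the votes, so it will not take over resources unless told to ignore the loss of quorum. A node shown as UNCLEAN also has to be fenced before its resources are recovered, which cannot happen if no STONITH device is configured. A rough sketch of the settings usually tried on a two-node test cluster follows. The function name relax_two_node_policies is hypothetical, and turning fencing off is assumed to be acceptable only on a throwaway setup like this one:

```shell
# Hypothetical helper: relax quorum and fencing on a two-node TEST cluster.
# Assumption: run as root on one cluster node with the crm shell installed.
relax_two_node_policies() {
  # Let the surviving node keep running resources without quorum.
  crm configure property no-quorum-policy=ignore
  # Assumption: no STONITH devices exist; do not do this in production.
  crm configure property stonith-enabled=false
  # Print the resulting configuration for review.
  crm configure show
}
```

Calling relax_two_node_policies once is enough; the properties are cluster-wide and get replicated to the other node.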
On Thu, May 28, 2009 at 10:35 PM, Ryan Steele <ry...@aweber.com> wrote:
> After following the wiki example for sharing an IP address
> (http://clusterlabs.org/wiki/Example_configurations), I'm able to manually
> fail over the resource with crm using the following statement (my nodes are
> ha1 and ha2):
>
> crm resource migrate failover-ip ha2
>
> However, if I halt the box which currently owns the floating IP, or
> otherwise abruptly kill networking on it, the failover never automatically
> happens.
>
> I did follow the example explicitly, and the resource was
> initially created with:
>
> primitive failover-ip ocf:heartbeat:IPaddr params ip=192.168.7.250 op monitor interval=10
>
> ...so I'm not quite sure what the issue is. The messaging layer seems to
> work since crm status shows the node as being down, but the resource
> allocation layer seems to be failing, probably somewhere in the CRM...?
>
> I have no firewall between these nodes, so I haven't run tcpdump either to
> see if the messages are making it, but I can't imagine that that's the issue
> here. This is what things look like after the simulated problem:
>
> r...@ha1:~# crm status
>
> ============
> Last updated: Thu May 28 16:31:20 2009
> Current DC: ha1 (ha1)
> Version: 1.0.2-c02b459053bfa44d509a2a0e0247b291d93662b7
> 2 Nodes configured.
> 1 Resources configured.
> ============
>
> Node: ha1 (ha1): online
> Node: ha2 (ha2): UNCLEAN (offline)
>
> r...@ha1:~# ifconfig
> eth0      Link encap:Ethernet  HWaddr 00:0c:29:cd:78:4e
>           inet addr:192.168.7.134  Bcast:192.168.7.255  Mask:255.255.255.0
>           inet6 addr: fe80::20c:29ff:fecd:784e/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:7212 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:12373 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:919781 (898.2 KB)  TX bytes:1489819 (1.4 MB)
>           Base address:0x2000 Memory:d8920000-d8940000
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:624 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:624 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:61572 (60.1 KB)  TX bytes:61572 (60.1 KB)
>
> r...@ha1:~# crm_resource -L
> failover-ip (ocf::heartbeat:IPaddr) Started
>
> As you can see, nothing has happened. Hopefully someone else can identify
> my mistake before I do after having read this. Thanks in advance for any
> help.
>
> -Ryan
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
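
The telling detail in the quoted status output is that ha2 is listed as UNCLEAN (offline) rather than plain offline. A small sketch for spotting that state from a script follows. The function name list_unclean_nodes is hypothetical, and it assumes the "Node: <name> (<id>): <state>" line format shown in the Pacemaker 1.0.2 output above; other versions may print node status differently:

```shell
# Hypothetical helper: print the names of nodes that `crm status` marks UNCLEAN.
# Reads the status text on stdin and assumes lines of the form
#   Node: <name> (<id>): <state>
list_unclean_nodes() {
  awk '$1 == "Node:" && /UNCLEAN/ { print $2 }'
}

# Intended use on a cluster node (not executed here):
#   crm status | list_unclean_nodes
```

Any name it prints is a node the cluster still considers unsafe, which would be consistent with failover-ip not being moved.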