Thanks for replying, Mr. Andrew Beekhof. With reference to Message 1: I'm using CentOS 5.3 Linux, and the heartbeat version I'm using is 2.1.3-3.
Kindly let me know if there is a bug in this version and, if so, please mention how to fix it. On Fri, Jul 24, 2009 at 6:53 PM, <[email protected]> wrote: > Send Linux-HA mailing list submissions to > [email protected] > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.linux-ha.org/mailman/listinfo/linux-ha > or, via email, send a message with subject or body 'help' to > [email protected] > > You can reach the person managing the list at > [email protected] > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Linux-HA digest..." > > > Today's Topics: > > 1. Re: Node ha2 is not sync with node ha1 (Andrew Beekhof) > 2. Re: stand_alone_ping: Node xx.yy.zz.ww is unreachable (read) > ([email protected]) > 3. Re: Failed takeover of drbddisk/xen stack > ( Robert Zöhrer | pronet.at ) > 4. Re: Secondary Interfaces, and Host Names (Andrew Beekhof) > 5. Re: DRBD not becoming primary when master node fails > (Andrew Beekhof) > 6. Re: heartbeat, drbd and ipfail, no takeover if link is down > (alex handle) > 7. Re: Adding a node to HA-Cluster without service interruption > (Jiayin Mao) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 24 Jul 2009 14:38:55 +0200 > From: Andrew Beekhof <[email protected]> > Subject: Re: [Linux-HA] Node ha2 is not sync with node ha1 > To: General Linux-HA mailing list <[email protected]> > Message-ID: > <[email protected]> > Content-Type: text/plain; charset=ISO-8859-1 > > What version are you using? > There was a bug like this, but it was fixed a long time ago > > On Wed, Jul 22, 2009 at 10:02 AM, Ahmed Munir<[email protected]> > wrote: > > Hi all, > > Hoping you are all fine. I've got 2 machines and I've installed Linux HA and > > OpenSIPs on them and configured them in an active-active scenario. 
> Machine 1 > > named ha1, is assigned virtual IP 192.168.0.184, and machine 2, named > > ha2, is assigned virtual IP 192.168.0.185. > > > > The integration between HA and OpenSIPs is working fine. For example, if I stop > the > > service of HA, machine ha1 comes down, its resources are taken by > machine > > ha2, and when ha1 comes online, ha1 takes its resources back from machine > ha2 > > and vice versa. > > > > If I turn off the ha1 machine, its resources are taken by machine ha2, and > > when ha1 comes online, ha1 takes its resources back from machine ha2, which > is > > working fine. But when I turn off the ha2 machine, its resources are taken by > > machine ha1, and when ha2 comes online and I check the status of ha2 > using > > the crm_mon command, > > it shows me a weird status, as I'm listing below; > > > > On ha1 machine; > > > > Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): online > > Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): offline > > > > IPaddr_1           (heartbeat::ocf:IPaddr):        Started ha1 > > IPaddr_2           (heartbeat::ocf:IPaddr):        Started ha1 > > OpenSips_1      (heartbeat::ocf:OpenSips):      Started ha1 > > OpenSips_2      (heartbeat::ocf:OpenSips):      Started ha1 > > > > On ha2 machine; > > > > Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): offline > > Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): online > > > > IPaddr_1           (heartbeat::ocf:IPaddr):        Started ha2 > > IPaddr_2           (heartbeat::ocf:IPaddr):        Started ha2 > > OpenSips_1      (heartbeat::ocf:OpenSips):      Started ha2 > > OpenSips_2      (heartbeat::ocf:OpenSips):      Started ha2 > > > > Or sometimes on ha2 machine; > > > > Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): online > > Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): offline > > > > IPaddr_1           (heartbeat::ocf:IPaddr):        Started ha1 > > IPaddr_2           (heartbeat::ocf:IPaddr):        Started ha1 > > OpenSips_1      (heartbeat::ocf:OpenSips):      
Started ha1 > > OpenSips_2      (heartbeat::ocf:OpenSips):      Started ha1 > > > > After that I checked the logs and I'm getting the errors listed > below; > > > > Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3a9) from ha2: not in our membership > > Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3aa) from ha2: not in our membership > > Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3ab) from ha2: not in our membership > > Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3ac) from ha2: not in our membership > > Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3ad) from ha2: not in our membership > > Jul 22 14:12:07 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3b0) from ha2: not in our membership > > Jul 22 14:12:07 ha1 ccm: [9977]: ERROR: llm_set_uptime: Negative uptime > > -1778384896 for node 0 [ha1] > > Jul 22 14:12:07 ha1 ccm: [9977]: ERROR: llm_set_uptime: Negative uptime > > -1879048192 for node 1 [ha2] > > > > Even though I've configured the same settings on both machines, I don't know why > > I'm getting these errors. > > > > Further, I'm attaching cib.xml, OpenSips (the resource > file > > I created for OpenSIPs), ha.cf and the log files. Kindly do have a look and update > > me ASAP. 
> > > > > > -- > > Regards, > > > > Ahmed Munir > > > > _______________________________________________ > > Linux-HA mailing list > > [email protected] > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > See also: http://linux-ha.org/ReportingProblems > > > > > ------------------------------ > > Message: 2 > Date: Fri, 24 Jul 2009 15:03:33 +0200 > From: [email protected] > Subject: Re: [Linux-HA] stand_alone_ping: Node xx.yy.zz.ww is > unreachable (read) > To: General Linux-HA mailing list <[email protected]> > Message-ID: > < > q272154738-f124736a97cc0f83fbe99f3cc72a9...@pmq4.mod5.onet.test.onet.pl> > > Content-Type: text/plain; charset=iso-8859-2 > > Below is part of the output with error message produced by command: > /usr/lib64/heartbeat/pingd -VVV -a pingd -d 10 -m 1000 -h 3.27.60.1 > > The machine has three network interfaces and is connected to three > different subnets (3.27.x.x, 192.168.x.x - cluster subnet, 172.22.x.x - > dedicated for heartbeat). > > pingd[6890]: 2009/07/24_14:44:36 debug: debug2: ping_close: Closed > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:36 debug: send_update: Sent update: > pingd=1000 (1 active ping nodes) > pingd[6890]: 2009/07/24_14:44:37 debug: debug2: stand_alone_ping: Checking > connectivity > pingd[6890]: 2009/07/24_14:44:37 debug: debug2: ping_open: Got address > 3.27.60.1 for 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:37 debug: debug2: ping_open: Opened > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:37 debug: debug2: ping_write: Sent 39 bytes > to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: ping_read: Got 59 bytes > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: dump_v4_echo: Echo from > 3.27.60.1 (exp=1080, seq=1080, id=6890, dest=3.27.60.1, data=pingd-v4): Echo > Reply > pingd[6890]: 2009/07/24_14:44:38 debug: stand_alone_ping: Node 3.27.60.1 is > alive > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: ping_close: Closed > connection to 3.27.60.1 > pingd[6890]: 
2009/07/24_14:44:38 debug: send_update: Sent update: > pingd=1000 (1 active ping nodes) > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: stand_alone_ping: Checking > connectivity > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: ping_open: Got address > 3.27.60.1 for 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: ping_open: Opened > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: ping_write: Sent 39 bytes > to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:39 debug: debug2: ping_read: Got 262 bytes > No error message: -1: Resource temporarily unavailable (11) > pingd[6890]: 2009/07/24_14:44:39 debug: process_icmp_error: No error > message: -1: Resource temporarily unavailable (11) > pingd[6890]: 2009/07/24_14:44:39 debug: debug2: dump_v4_echo: Echo from > 172.22.10.2 (exp=1081, seq=0, id=0, dest=3.27.60.1, data=E?): Unreachable > Port > pingd[6890]: 2009/07/24_14:44:39 info: stand_alone_ping: Node 3.27.60.1 is > unreachable (read) > pingd[6890]: 2009/07/24_14:44:40 debug: debug2: ping_write: Sent 39 bytes > to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:40 debug: debug2: ping_read: Got 262 bytes > No error message: -1: Resource temporarily unavailable (11) > pingd[6890]: 2009/07/24_14:44:40 debug: process_icmp_error: No error > message: -1: Resource temporarily unavailable (11) > pingd[6890]: 2009/07/24_14:44:40 debug: debug2: dump_v4_echo: Echo from > 192.168.0.5 (exp=1082, seq=0, id=0, dest=3.27.60.1, data=E?): Unreachable > Port > pingd[6890]: 2009/07/24_14:44:40 info: stand_alone_ping: Node 3.27.60.1 is > unreachable (read) > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: ping_close: Closed > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:41 debug: send_update: Sent update: pingd=0 > (0 active ping nodes) > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: stand_alone_ping: Checking > connectivity > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: ping_open: Got address > 3.27.60.1 for 3.27.60.1 > pingd[6890]: 
2009/07/24_14:44:41 debug: debug2: ping_open: Opened > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: ping_write: Sent 39 bytes > to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: ping_read: Got 59 bytes > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: dump_v4_echo: Echo from > 3.27.60.1 (exp=1083, seq=1083, id=6890, dest=3.27.60.1, data=pingd-v4): Echo > Reply > pingd[6890]: 2009/07/24_14:44:41 debug: stand_alone_ping: Node 3.27.60.1 is > alive > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: ping_close: Closed > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:41 debug: send_update: Sent update: > pingd=1000 (1 active ping nodes) > > Thanks > Jarek > > "General Linux-HA mailing list" <[email protected]> wrote: > > 2009/7/24 <[email protected]>: > > > > > > Rpm built for RHEL5: > > > heartbeat-common-2.99.2-8.1 > > > libheartbeat2-2.99.2-8.1 > > > heartbeat-2.99.2-8.1 > > > heartbeat-resources-2.99.2-8.1 > > > pacemaker-1.0.3-2.2 > > > pacemaker-mgmt-client-1.99.1-2.1 > > > libpacemaker3-1.0.3-2.2 > > > pacemaker-mgmt-1.99.1-2.1 > > > > > > If I start pingd manually (alongside the running heartbeat+pacemaker), it > gives me the following when "stand_alone_ping: Node > xx.yy.zz.ww is unreachable (read)" appears in /var/log/ha-debug: > > > > > > [r...@gate2]# date ;/usr/lib64/heartbeat/pingd -a pingd -d 10 -m 1000 > -h xx.yy.zz.ww; date > > > Thu Jul 23 19:25:24 CEST 2009 > > > No error message: -1: Resource temporarily unavailable (11) > > > No error message: -1: Resource temporarily unavailable (11) > > > No error message: -1: Resource temporarily unavailable (11) > > > No error message: -1: Resource temporarily unavailable (11) > > > No error message: -1: Resource temporarily unavailable (11) > > > No error message: -1: Resource temporarily unavailable (11) > > > ... > > > > > > System ping reports no errors. 
> > > > If you repeat that test with some extra -V arguments, you should see > > more information (which would be helpful). > > But it's pretty clear there must be a bug, so it's probably worth > > creating an entry in Bugzilla. > > _______________________________________________ > > Linux-HA mailing list > > [email protected] > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > See also: http://linux-ha.org/ReportingProblems > > > > ------------------------------ > > Message: 3 > Date: Fri, 24 Jul 2009 15:15:03 +0200 > From: "Robert Zöhrer | pronet.at" <[email protected]> > Subject: Re: [Linux-HA] Failed takeover of drbddisk/xen stack > To: General Linux-HA mailing list <[email protected]> > Message-ID: <[email protected]> > Content-Type: text/plain; charset=ISO-8859-15; format=flowed > > Michael Schwartzkopff wrote: > > On Thursday, 23 July 2009 23:59:48, Robert Zöhrer | pronet.at wrote: > > >> Now when performing a manual takeover (standby command for the resource- > >> holding node) I get errors while heartbeat wants to stop the drbd > >> resource on the losing node. The drbd resource becomes unmanaged and the > >> takeover process hangs. > > > 1) Your problem is: > > Jul 23 22:52:28 greatmama-n2 lrmd: [3382]: info: RA output: > > (drbd_infra_win:stop:stderr) /dev/drbd0: State change failed: (-12) > Device is > > held open by someone Command '/sbin/drbdsetup /dev/drbd0 secondary' terminated with exit code 11 /sbin/drbdadm > secondary > > domU-infra: exit code 11, mapping to 1 > > > > Something still uses the DRBD, so it cannot become secondary. > > Thanks for your help. Yes, the xen hvm domU is still blocking the vbd/drbd > device after being shut down successfully. > > xend.log shows a delay of about (exactly?) 
5 minutes between the success of the xen RA > in stopping the domU and finally releasing the vbd :/ > > I'll have to investigate this, hopefully with some help from the xen community > > Robert > > ------- xend.log -------------- > [2009-07-24 13:53:35 3143] DEBUG (XendDomainInfo:467) > XendDomainInfo.shutdown(poweroff) > [2009-07-24 13:53:35 3143] DEBUG (XendDomainInfo:1092) > XendDomainInfo.handleShutdownWatch > [2009-07-24 13:58:35 3143] INFO (XendDomainInfo:1283) Domain has > shutdown: name=infra-win2003sbs id=17 reason=poweroff. > [2009-07-24 13:58:35 3143] DEBUG (XendDomainInfo:1897) > XendDomainInfo.destroy: domid=17 > [2009-07-24 13:58:35 3143] DEBUG (XendDomainInfo:1914) > XendDomainInfo.destroyDomain(17) > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:1529) Destroying device > model > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:1536) Releasing devices > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:1542) Removing vif/0 > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:590) > XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0 > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:1542) Removing vbd/768 > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:590) > XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768 > > > ------------------------------ > > Message: 4 > Date: Fri, 24 Jul 2009 15:21:14 +0200 > From: Andrew Beekhof <[email protected]> > Subject: Re: [Linux-HA] Secondary Interfaces, and Host Names > To: General Linux-HA mailing list <[email protected]> > Message-ID: > <[email protected]> > Content-Type: text/plain; charset=ISO-8859-1 > > On Tue, Jul 21, 2009 at 2:55 PM, Karl W. Lewis<[email protected]> > wrote: > > So, I'm setting up a web server on an HA-Linux Cluster using heartbeat > v2.99 > > and Pacemaker. > > > > I'm setting it up on a set of three Egenera blades. These blades have > no > > actual network interfaces; that is handled by the frame. 
So, there are > the > interfaces over which the web server will talk to the world, and a > virtual > IP Address managed by the cluster, and I have set up a "private" network > > that the cluster can use to chat that is not connected to any other > network. > I've assigned IP addresses for that virtual interface on each blade, but > now > I have a question that I fear will reveal my ignorance... > > > > How do I relate the private network IP Addresses to each host? > Currently, > > the public addresses for each host are associated with the hosts in the > > /etc/hosts file. And, those names are the ones that you get if you do a > > `uname -n` on any of the blades. Should I use the private network > addresses > > instead? > > If you want to be able to have those IPs move to another node if the > normal owner dies, then yes. > Otherwise, it probably doesn't matter too much. > > > My ha.cf file contains the line: bcast eth1 eth0. eth1 is for the > > private network, but it occurs to me that heartbeat can't know how to get > to > > the other nodes over eth1 because I've not told it what those addresses > > are. > > hint: bcast > > > Should I re-write my hosts file such that the private addresses are > > the ones associated with the proper host names? > > > > Since these are Egenera blades, I can't use serial/null modem cables to > > connect the cluster as the blades don't have any of those, either. > > > > My google skilz, such as they are, have failed me... I have been unable > to > > find any details of a setup like this. I'd be most appreciative if > anyone > > could provide any tips, pointers, or practical examples. 
> > > > Thanks, > > > > Karl > > _______________________________________________ > > Linux-HA mailing list > > [email protected] > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > See also: http://linux-ha.org/ReportingProblems > > > > > ------------------------------ > > Message: 5 > Date: Fri, 24 Jul 2009 15:22:21 +0200 > From: Andrew Beekhof <[email protected]> > Subject: Re: [Linux-HA] DRBD not becoming primary when master node > fails > To: General Linux-HA mailing list <[email protected]> > Message-ID: > <[email protected]> > Content-Type: text/plain; charset=ISO-8859-1 > > How are you configuring the cluster for drbd? > drbddisk or the drbd OCF script? > > On Mon, Jul 20, 2009 at 11:49 AM, Husemann, > Harald<[email protected]> wrote: > > Hi all, > > > > I have a 2-node cluster, running Pacemaker (crm_resource version 1.04) > > with the heartbeat stack and DRBD 8.3.0 (compiled myself) on two CentOS 5.3 > > boxes. > > I've added several resources, DRBDs and others, and I can migrate them > > between the cluster nodes without any problems. > > But, when I switch off the master node of one of the DRBD resources, > > DRBD fails to start on the remaining node. > > I've figured out that the problem is that DRBD is unable to set the > > resource on the remaining node to primary state, since the connection to > > the old primary is lost, and this causes the disk to be in "Consistent" > > and not in "UpToDate" state, which is necessary for it to become primary. > > > > Googling a bit showed that it's possible to force primary mode by > > using the option "overwrite-data-of-peer" with drbdadm, i.e.: > > > > drbdadm -- --overwrite-data-of-peer primary <rsc>. > > > > (It could be necessary to set the resource to "outdated" first). > > > > I've tried this manually, and it worked - but I'd like Pacemaker to do > > this for me... 
> > > > I took a look into the drbd resource script, and it seems it does not > > handle this situation - is there a newer version of it which takes care > > of this? > > > > Thanks, > > > > Harald > > -- > > Harald Husemann > > Netzwerk- und Systemadministrator > > Operation Management Center (OMC) > > MATERNA GmbH > > Information & Communications > > > > Westfalendamm 98 > > 44141 Dortmund > > > > Geschäftsführer: Dr. Winfried Materna, Helmut an de Meulen, Ralph Hartwig > > Amtsgericht Dortmund HRB 5839 > > > > Tel: +49 231 9505 222 > > Fax: +49 231 9505 100 > > www.annyway.com <http://www.annyway.com/> > > www.materna.com <http://www.materna.com/> > > _______________________________________________ > > Linux-HA mailing list > > [email protected] > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > See also: http://linux-ha.org/ReportingProblems > > > > > ------------------------------ > > Message: 6 > Date: Fri, 24 Jul 2009 15:54:28 +0200 > From: alex handle <[email protected]> > Subject: Re: [Linux-HA] heartbeat, drbd and ipfail, no takeover if link > is down > To: [email protected] > Message-ID: > <[email protected]> > Content-Type: text/plain; charset=ISO-8859-1 > > Hi all! > > I'm sorry, it was a configuration mistake on my part: > > ucast eth0 10.9.9.1 > ucast eth0 10.9.9.10 > ucast eth0 10.10.10.1 > ucast eth0 10.10.10.2 > > it should be > > ucast eth0 10.9.9.1 > ucast eth0 10.9.9.10 > ucast eth1 10.10.10.1 > ucast eth1 10.10.10.2 > > sorry for that :) > > Alex > > > On Thu, Jul 23, 2009 at 7:11 PM, alex handle<[email protected]> wrote: > > Hello, > > > > I'm using DRBD with heartbeat (R1 config) and ipfail. > > > > Here are my versions: > > CentOS 5.3 > > heartbeat-2.1.3-3.el5.centos > > drbd-km-2.6.18_128.1.14.el5-8.3.1-3 > > drbd-8.3.1-3 (compiled from source and installed with rpm) > > > > Each node (mail1 and mail2) has two interfaces. 
> > > > eth0 -> heartbeat link, vip and uplink > > eth1 -> heartbeat link and DRBD replication link > > > > To ensure a failover if one nic fails, I set ipfail to ping the gateway. > > > > First I tested my configuration with a firewall rule to trigger ipfail. > > > > # iptables -A OUTPUT -p icmp --icmp-type 8 -j DROP > > > > That worked perfectly well. > > You can see it in the attached files messages.mail1.iptables and > > messages.mail2.iptables. > > It takes over the vip and drbddisk starts without an error. > > > > But if I try to pull the cable (or ip link set eth0 down) on mail1 > > eth0, the drbddisk resource doesn't get stopped on mail1 > > and so it fails to start on mail2. > > You can see that in messages.mail1.linkdown and messages.mail2.linkdown. > > > > Jul 23 15:51:12 mail2 ResourceManager[11462]: [11696]: ERROR: Return > > code 1 from /etc/ha.d/resource.d/drbddisk > > Jul 23 15:51:12 mail2 ResourceManager[11462]: [11697]: CRIT: Giving up > > resources due to failure of drbddisk::home > > > > You can also see these files attached: > > ha.cf > > haresources > > drbd.conf > > > > I don't really know if it's a problem with heartbeat or drbd, but I > > hope you can give me a hint. > > > > Thank you for your help! > > > > > ------------------------------ > > Message: 7 > Date: Fri, 24 Jul 2009 22:34:14 +0800 > From: Jiayin Mao <[email protected]> > Subject: Re: [Linux-HA] Adding a node to HA-Cluster without service > interruption > To: General Linux-HA mailing list <[email protected]> > Message-ID: > <[email protected]> > Content-Type: text/plain; charset=ISO-8859-1 > > Do you mean "autojoin any"? I had it there, but heartbeat still complained that > the new node is not in its membership. > > On Fri, Jul 24, 2009 at 8:16 PM, Andrew Beekhof <[email protected]> > wrote: > > > On Thu, Jul 23, 2009 at 11:07 AM, Jiayin Mao<[email protected]> wrote: > > > Can I leave heartbeat running and add a new node at the same time? 
> > > > only if the existing nodes already have "autojoin on" in ha.cf (i.e. > > when they were last started) > > > > > I've tried > > > thousands of times, but never made it work. The heartbeat on the > running > > > node always complains that the new node is not in its membership. I've set > > > autojoin to any and propagated the UUID of the crashed node to the > > > replacement using crm_uuid -w. > > > > > > On Thu, Jul 23, 2009 at 4:47 PM, Andrew Beekhof <[email protected]> > > wrote: > > > > > >> On Thu, Jul 23, 2009 at 6:22 AM, Michael Schwartzkopff< > > [email protected]> > > >> wrote: > > >> > On Wednesday, 22 July 2009 23:38:26, Alexander Födisch wrote: > > >> >> Hi, > > >> >> > > >> >> I have a samba cluster w/ three nodes (heartbeat 2.1.3 / > > crm-enabled). > > >> Now > > >> >> I need to add a fourth one. What will be the best way to do this > w/o > > any > > >> >> service interruption? > > >> >> > > >> >> First step - Node4 has to become a member of the cluster. I would > do > > it > > >> >> like this: > > >> >> > > >> >> > > >> >> Node1: /etc/init.d/heartbeat stop > > >> >> Node1: add Node4 to /etc/ha.d/ha.cf > > >> >> Node1: /etc/init.d/heartbeat start > > >> >> > > >> >> Node2: /etc/init.d/heartbeat stop > > >> >> Node2: add Node4 to /etc/ha.d/ha.cf > > >> >> Node2: /etc/init.d/heartbeat start > > >> >> > > >> >> Node3: /etc/init.d/heartbeat stop > > >> >> Node3: add Node4 to /etc/ha.d/ha.cf > > >> >> Node3: /etc/init.d/heartbeat start > > >> >> > > >> >> On Node4 heartbeat is already configured, so "/etc/init.d/heartbeat > > >> start" > > >> >> on Node4 will be enough. > > >> > > > >> > Process looks OK. 
> > >> > > >> There will still be service interruption though - every time you > > >> restart heartbeat > > >> _______________________________________________ > > >> Linux-HA mailing list > > >> [email protected] > > >> http://lists.linux-ha.org/mailman/listinfo/linux-ha > > >> See also: http://linux-ha.org/ReportingProblems > > >> > > > > > > > > > > > > -- > > > Max Mao (Mao Jia Yin) > > > Abao Scrum Team, Engineering Department > > > ----------------------------------------------------------- > > > I am located at Beijing office, and usually work during 9:00PM and > 6:00AM > > > ET. > > > _______________________________________________ > > > Linux-HA mailing list > > > [email protected] > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > > See also: http://linux-ha.org/ReportingProblems > > > > > _______________________________________________ > > Linux-HA mailing list > > [email protected] > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > See also: http://linux-ha.org/ReportingProblems > > > > > > -- > Max Mao (Mao Jia Yin) > Abao Scrum Team, Engineering Department > ----------------------------------------------------------- > I am located at Beijing office, and usually work during 9:00PM and 6:00AM > ET. > > > ------------------------------ > > _______________________________________________ > Linux-HA mailing list > [email protected] > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems > > End of Linux-HA Digest, Vol 68, Issue 56 > **************************************** > -- Regards, Ahmed Munir _______________________________________________ Linux-HA mailing list [email protected] http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
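As a footnote to Message 7 above: the membership behaviour under discussion is governed by the autojoin directive in ha.cf. Below is a minimal sketch of the relevant fragment (heartbeat 2.x syntax; the value semantics are from the heartbeat documentation, so verify them against the version you actually have installed):

```
# /etc/ha.d/ha.cf fragment (sketch, heartbeat 2.x)
#
# autojoin none  -> only nodes listed in "node" directives may join (default)
# autojoin other -> this node must be listed, but unlisted nodes may join
# autojoin any   -> any node presenting a matching authkeys secret may join
autojoin any

# With autojoin enabled, explicit node entries become optional:
# node ha1
# node ha2
```

Note that heartbeat only rereads this file on restart, which is why nodes that were last started without autojoin still reject newcomers as "not in our membership".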
