Thanks for replying, Mr. Andrew Beekhof. With reference to Message 1: I'm using CentOS 5.3 Linux, and the heartbeat version I'm using is 2.1.3-3.
Kindly let me know if there is a bug in this version and, if so, please mention how to fix it. On Fri, Jul 24, 2009 at 6:53 PM, <[email protected]> wrote: > Send Linux-HA mailing list submissions to > [email protected] > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.linux-ha.org/mailman/listinfo/linux-ha > or, via email, send a message with subject or body 'help' to > [email protected] > > You can reach the person managing the list at > [email protected] > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Linux-HA digest..." > > > Today's Topics: > > 1. Re: Node ha2 is not sync with node ha1 (Andrew Beekhof) > 2. Re: stand_alone_ping: Node xx.yy.zz.ww is unreachable (read) > ([email protected]) > 3. Re: Failed takeover of drbddisk/xen stack > ( Robert Zöhrer | pronet.at ) > 4. Re: Secondary Interfaces, and Host Names (Andrew Beekhof) > 5. Re: DRBD not becoming primary when master node fails > (Andrew Beekhof) > 6. Re: heartbeat, drbd and ipfail, no takeover if link is down > (alex handle) > 7. Re: Adding a node to HA-Cluster without service interruption > (Jiayin Mao) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 24 Jul 2009 14:38:55 +0200 > From: Andrew Beekhof <[email protected]> > Subject: Re: [Linux-HA] Node ha2 is not sync with node ha1 > To: General Linux-HA mailing list <[email protected]> > Message-ID: > <[email protected]> > Content-Type: text/plain; charset=ISO-8859-1 > > What version are you using? > There was a bug like this, but it was fixed a long time ago > > On Wed, Jul 22, 2009 at 10:02 AM, Ahmed Munir<[email protected]> > wrote: > > Hi all, > > Hoping you are all fine. I've got 2 machines and I've installed Linux HA and > > OpenSIPs on them and configured them in an active-active scenario. 
> Machine 1 > > named ha1, is assigned virtual IP 192.168.0.184, and machine 2, named > > ha2, is assigned virtual IP 192.168.0.185. > > > > The integration between HA and OpenSIPs is working fine. For example, if I stop > the > > service of HA, machine ha1 comes down, its resources are taken by > machine > > ha2, and when ha1 comes online, ha1 takes its resources back from machine > ha2 > > and vice versa. > > > > If I turn off the ha1 machine, its resources are taken by machine ha2, and > > when ha1 comes online, ha1 takes its resources back from machine ha2, which > is > > working fine. But when I turn off the ha2 machine, its resources are taken by > > machine ha1, and when ha2 comes online and I check the status of ha2 > using > > the crm_mon command, > > it shows me a weird status, as I'm listing below; > > > > On ha1 machine; > > > > Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): online > > Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): offline > > > > IPaddr_1           (heartbeat::ocf:IPaddr):        Started ha1 > > IPaddr_2           (heartbeat::ocf:IPaddr):        Started ha1 > > OpenSips_1      (heartbeat::ocf:OpenSips):      Started ha1 > > OpenSips_2      (heartbeat::ocf:OpenSips):      Started ha1 > > > > On ha2 machine; > > > > Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): offline > > Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): online > > > > IPaddr_1           (heartbeat::ocf:IPaddr):        Started ha2 > > IPaddr_2           (heartbeat::ocf:IPaddr):        Started ha2 > > OpenSips_1      (heartbeat::ocf:OpenSips):      Started ha2 > > OpenSips_2      (heartbeat::ocf:OpenSips):      Started ha2 > > > > Or sometimes on ha2 machine; > > > > Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): online > > Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): offline > > > > IPaddr_1           (heartbeat::ocf:IPaddr):        Started ha1 > > IPaddr_2           (heartbeat::ocf:IPaddr):        Started ha1 > > OpenSips_1      (heartbeat::ocf:OpenSips):      
Started ha1 > > OpenSips_2      (heartbeat::ocf:OpenSips):      Started ha1 > > > > After that I checked the logs and I'm getting the errors listed > below; > > > > Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3a9) from ha2: not in our membership > > Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3aa) from ha2: not in our membership > > Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3ab) from ha2: not in our membership > > Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3ac) from ha2: not in our membership > > Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3ad) from ha2: not in our membership > > Jul 22 14:12:07 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding > > cib_apply_diff message (3b0) from ha2: not in our membership > > Jul 22 14:12:07 ha1 ccm: [9977]: ERROR: llm_set_uptime: Negative uptime > > -1778384896 for node 0 [ha1] > > Jul 22 14:12:07 ha1 ccm: [9977]: ERROR: llm_set_uptime: Negative uptime > > -1879048192 for node 1 [ha2] > > > > Even though I've configured the same settings on both machines, I don't know why > > I'm getting these errors. > > > > Further, I'm attaching cib.xml, OpenSips (the resource > file > > I created for OpenSIPs), ha.cf and the log files. Kindly do have a look and update > > me ASAP. 
> > > > > > -- > > Regards, > > > > Ahmed Munir > > > > _______________________________________________ > > Linux-HA mailing list > > [email protected] > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > See also: http://linux-ha.org/ReportingProblems > > > > > ------------------------------ > > Message: 2 > Date: Fri, 24 Jul 2009 15:03:33 +0200 > From: [email protected] > Subject: Re: [Linux-HA] stand_alone_ping: Node xx.yy.zz.ww is > unreachable (read) > To: General Linux-HA mailing list <[email protected]> > Message-ID: > < > q272154738-f124736a97cc0f83fbe99f3cc72a9...@pmq4.mod5.onet.test.onet.pl> > > Content-Type: text/plain; charset=iso-8859-2 > > Below is part of the output with error message produced by command: > /usr/lib64/heartbeat/pingd -VVV -a pingd -d 10 -m 1000 -h 3.27.60.1 > > The machine has three network interfaces and is connected to three > different subnets (3.27.x.x, 192.168.x.x - cluster subnet, 172.22.x.x - > dedicated for heartbeat). > > pingd[6890]: 2009/07/24_14:44:36 debug: debug2: ping_close: Closed > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:36 debug: send_update: Sent update: > pingd=1000 (1 active ping nodes) > pingd[6890]: 2009/07/24_14:44:37 debug: debug2: stand_alone_ping: Checking > connectivity > pingd[6890]: 2009/07/24_14:44:37 debug: debug2: ping_open: Got address > 3.27.60.1 for 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:37 debug: debug2: ping_open: Opened > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:37 debug: debug2: ping_write: Sent 39 bytes > to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: ping_read: Got 59 bytes > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: dump_v4_echo: Echo from > 3.27.60.1 (exp=1080, seq=1080, id=6890, dest=3.27.60.1, data=pingd-v4): Echo > Reply > pingd[6890]: 2009/07/24_14:44:38 debug: stand_alone_ping: Node 3.27.60.1 is > alive > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: ping_close: Closed > connection to 3.27.60.1 > pingd[6890]: 
2009/07/24_14:44:38 debug: send_update: Sent update: > pingd=1000 (1 active ping nodes) > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: stand_alone_ping: Checking > connectivity > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: ping_open: Got address > 3.27.60.1 for 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: ping_open: Opened > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:38 debug: debug2: ping_write: Sent 39 bytes > to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:39 debug: debug2: ping_read: Got 262 bytes > No error message: -1: Resource temporarily unavailable (11) > pingd[6890]: 2009/07/24_14:44:39 debug: process_icmp_error: No error > message: -1: Resource temporarily unavailable (11) > pingd[6890]: 2009/07/24_14:44:39 debug: debug2: dump_v4_echo: Echo from > 172.22.10.2 (exp=1081, seq=0, id=0, dest=3.27.60.1, data=E?): Unreachable > Port > pingd[6890]: 2009/07/24_14:44:39 info: stand_alone_ping: Node 3.27.60.1 is > unreachable (read) > pingd[6890]: 2009/07/24_14:44:40 debug: debug2: ping_write: Sent 39 bytes > to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:40 debug: debug2: ping_read: Got 262 bytes > No error message: -1: Resource temporarily unavailable (11) > pingd[6890]: 2009/07/24_14:44:40 debug: process_icmp_error: No error > message: -1: Resource temporarily unavailable (11) > pingd[6890]: 2009/07/24_14:44:40 debug: debug2: dump_v4_echo: Echo from > 192.168.0.5 (exp=1082, seq=0, id=0, dest=3.27.60.1, data=E?): Unreachable > Port > pingd[6890]: 2009/07/24_14:44:40 info: stand_alone_ping: Node 3.27.60.1 is > unreachable (read) > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: ping_close: Closed > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:41 debug: send_update: Sent update: pingd=0 > (0 active ping nodes) > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: stand_alone_ping: Checking > connectivity > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: ping_open: Got address > 3.27.60.1 for 3.27.60.1 > pingd[6890]: 
2009/07/24_14:44:41 debug: debug2: ping_open: Opened > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: ping_write: Sent 39 bytes > to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: ping_read: Got 59 bytes > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: dump_v4_echo: Echo from > 3.27.60.1 (exp=1083, seq=1083, id=6890, dest=3.27.60.1, data=pingd-v4): Echo > Reply > pingd[6890]: 2009/07/24_14:44:41 debug: stand_alone_ping: Node 3.27.60.1 is > alive > pingd[6890]: 2009/07/24_14:44:41 debug: debug2: ping_close: Closed > connection to 3.27.60.1 > pingd[6890]: 2009/07/24_14:44:41 debug: send_update: Sent update: > pingd=1000 (1 active ping nodes) > > Thanks > Jarek > > "General Linux-HA mailing list" <[email protected]> wrote: > > 2009/7/24 <[email protected]>: > > > > > > Rpm built for RHEL5: > > > heartbeat-common-2.99.2-8.1 > > > libheartbeat2-2.99.2-8.1 > > > heartbeat-2.99.2-8.1 > > > heartbeat-resources-2.99.2-8.1 > > > pacemaker-1.0.3-2.2 > > > pacemaker-mgmt-client-1.99.1-2.1 > > > libpacemaker3-1.0.3-2.2 > > > pacemaker-mgmt-1.99.1-2.1 > > > > > > If I start pingd manually (alongside the running heartbeat+pacemaker), it > gives me the following when "stand_alone_ping: Node > xx.yy.zz.ww is unreachable (read)" appears in /var/log/ha-debug: > > > > > > [r...@gate2]# date ;/usr/lib64/heartbeat/pingd -a pingd -d 10 -m 1000 > -h xx.yy.zz.ww; date > > > Thu Jul 23 19:25:24 CEST 2009 > > > No error message: -1: Resource temporarily unavailable (11) > > > No error message: -1: Resource temporarily unavailable (11) > > > No error message: -1: Resource temporarily unavailable (11) > > > No error message: -1: Resource temporarily unavailable (11) > > > No error message: -1: Resource temporarily unavailable (11) > > > No error message: -1: Resource temporarily unavailable (11) > > > ... > > > > > > System ping reports no errors. 
> > > > If you repeat that test with some extra -V arguments, you should see > > more information (which would be helpful). > > But it's pretty clear there must be a bug, so it's probably worth > > creating an entry in Bugzilla. > > _______________________________________________ > > Linux-HA mailing list > > [email protected] > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > See also: http://linux-ha.org/ReportingProblems > > > > ------------------------------ > > Message: 3 > Date: Fri, 24 Jul 2009 15:15:03 +0200 > From: "Robert Zöhrer | pronet.at" <[email protected]> > Subject: Re: [Linux-HA] Failed takeover of drbddisk/xen stack > To: General Linux-HA mailing list <[email protected]> > Message-ID: <[email protected]> > Content-Type: text/plain; charset=ISO-8859-15; format=flowed > > Michael Schwartzkopff wrote: > > On Thursday, 23 July 2009 23:59:48, Robert Zöhrer | pronet.at wrote: > > >> Now when performing a manual takeover (standby command for the resource- > >> holding node) I get errors while heartbeat wants to stop the drbd > >> resource on the losing node. The drbd resource becomes unmanaged and the > >> takeover process hangs. > > > 1) Your problem is: > > Jul 23 22:52:28 greatmama-n2 lrmd: [3382]: info: RA output: > > (drbd_infra_win:stop:stderr) /dev/drbd0: State change failed: (-12) > Device is > > held open by someone Command '/sbin/drbdsetup /dev/drbd0 secondary' terminated with exit code 11 /sbin/drbdadm > secondary > > domU-infra: exit code 11, mapping to 1 > > > > Something still uses the DRBD, so it cannot become secondary. > > Thanks for your help. Yes, the xen hvm domU is still blocking the vbd/drbd > device after being shut down successfully. > > xend.log shows a delay of about (exactly?) 
5 minutes between the success of the xen RA > in stopping the domU and finally releasing the vbd :/ > > I'll have to investigate this, hopefully with some help from the xen community > > Robert > > ------- xend.log -------------- > [2009-07-24 13:53:35 3143] DEBUG (XendDomainInfo:467) > XendDomainInfo.shutdown(poweroff) > [2009-07-24 13:53:35 3143] DEBUG (XendDomainInfo:1092) > XendDomainInfo.handleShutdownWatch > [2009-07-24 13:58:35 3143] INFO (XendDomainInfo:1283) Domain has > shutdown: name=infra-win2003sbs id=17 reason=poweroff. > [2009-07-24 13:58:35 3143] DEBUG (XendDomainInfo:1897) > XendDomainInfo.destroy: domid=17 > [2009-07-24 13:58:35 3143] DEBUG (XendDomainInfo:1914) > XendDomainInfo.destroyDomain(17) > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:1529) Destroying device > model > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:1536) Releasing devices > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:1542) Removing vif/0 > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:590) > XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0 > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:1542) Removing vbd/768 > [2009-07-24 13:58:36 3143] DEBUG (XendDomainInfo:590) > XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768 > > > ------------------------------ > > Message: 4 > Date: Fri, 24 Jul 2009 15:21:14 +0200 > From: Andrew Beekhof <[email protected]> > Subject: Re: [Linux-HA] Secondary Interfaces, and Host Names > To: General Linux-HA mailing list <[email protected]> > Message-ID: > <[email protected]> > Content-Type: text/plain; charset=ISO-8859-1 > > On Tue, Jul 21, 2009 at 2:55 PM, Karl W. Lewis<[email protected]> > wrote: > > So, I'm setting up a web server on an HA-Linux Cluster using heartbeat > v2.99 > > and Pacemaker. > > > > I'm setting it up on a set of three Egenera blades. These blades have > no > > actual network interfaces; that is handled by the frame. 
So, there are > the > interfaces over which the web server will talk to the world, and a > virtual > IP Address managed by the cluster, and I have set up a "private" network > > that the cluster can use to chat that is not connected to any other > network. > I've assigned IP addresses for that virtual interface on each blade, but > now > I have a question that I fear will reveal my ignorance... > > > > How do I relate the private network IP Addresses to each host? > Currently, > > the public addresses for each host are associated with the hosts in the > > /etc/hosts file. And, those names are the ones that you get if you do a > > `uname -n` on any of the blades. Should I use the private network > addresses > > instead? > > If you want to be able to have those IPs move to another node if the > normal owner dies, then yes. > Otherwise, it probably doesn't matter too much. > > > My ha.cf file contains the line: bcast eth1 eth0. eth1 is for the > > private network, but it occurs to me that heartbeat can't know how to get > to > > the other nodes over eth1 because I've not told it what those addresses > > are. > > hint: bcast > > > Should I re-write my hosts file such that the private addresses are > > the ones associated with the proper host names? > > > > Since these are Egenera blades, I can't use serial/null modem cables to > > connect the cluster as the blades don't have any of those, either. > > > > My google skilz, such as they are, have failed me... I have been unable > to > > find any details of a setup like this. I'd be most appreciative if > anyone > > could provide any tips, pointers, or practical examples. 
> > > > Thanks, > > > > Karl > > _______________________________________________ > > Linux-HA mailing list > > [email protected] > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > See also: http://linux-ha.org/ReportingProblems > > > > > ------------------------------ > > Message: 5 > Date: Fri, 24 Jul 2009 15:22:21 +0200 > From: Andrew Beekhof <[email protected]> > Subject: Re: [Linux-HA] DRBD not becoming primary when master node > fails > To: General Linux-HA mailing list <[email protected]> > Message-ID: > <[email protected]> > Content-Type: text/plain; charset=ISO-8859-1 > > How are you configuring the cluster for drbd? > drbddisk or the drbd OCF script? > > On Mon, Jul 20, 2009 at 11:49 AM, Husemann, > Harald<[email protected]> wrote: > > Hi all, > > > > I have a 2-node cluster, running Pacemaker (crm_resource version 1.04) > > with the heartbeat stack and DRBD 8.3.0 (compiled myself) on two CentOS 5.3 > > boxes. > > I've added several resources, DRBDs and others, and I can migrate them > > between the cluster nodes without any problems. > > But, when I switch off the master node of one of the DRBD resources, > > DRBD fails to start on the remaining node. > > I've figured out that the problem is that DRBD is unable to set the > > resource on the remaining node to primary state, since the connection to > > the old primary is lost, and this causes the disk to be in "Consistent" > > and not in "UpToDate" state, which is necessary for it to become primary. > > > > Googling a bit showed that it's possible to force primary mode by > > using the option "overwrite-data-of-peer" with drbdadm, i.e.: > > > > drbdadm -- --overwrite-data-of-peer primary <rsc>. > > > > (It could be necessary to set the resource to "outdated" first). > > > > I've tried this manually, and it worked - but I'd like Pacemaker to do > > this for me... 
> > > > I took a look into the drbd resource script, and it seems it does not > > handle this situation - is there a newer version of it which takes care > > of this? > > > > Thanks, > > > > Harald > > -- > > Harald Husemann > > Netzwerk- und Systemadministrator > > Operation Management Center (OMC) > > MATERNA GmbH > > Information & Communications > > > > Westfalendamm 98 > > 44141 Dortmund > > > > Geschäftsführer: Dr. Winfried Materna, Helmut an de Meulen, Ralph Hartwig > > Amtsgericht Dortmund HRB 5839 > > > > Tel: +49 231 9505 222 > > Fax: +49 231 9505 100 > > www.annyway.com <http://www.annyway.com/> > > www.materna.com <http://www.materna.com/> > > _______________________________________________ > > Linux-HA mailing list > > [email protected] > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > See also: http://linux-ha.org/ReportingProblems > > > > > ------------------------------ > > Message: 6 > Date: Fri, 24 Jul 2009 15:54:28 +0200 > From: alex handle <[email protected]> > Subject: Re: [Linux-HA] heartbeat, drbd and ipfail, no takeover if link > is down > To: [email protected] > Message-ID: > <[email protected]> > Content-Type: text/plain; charset=ISO-8859-1 > > Hi all! > > I'm sorry, it was a configuration mistake on my part: > > ucast eth0 10.9.9.1 > ucast eth0 10.9.9.10 > ucast eth0 10.10.10.1 > ucast eth0 10.10.10.2 > > it should be > > ucast eth0 10.9.9.1 > ucast eth0 10.9.9.10 > ucast eth1 10.10.10.1 > ucast eth1 10.10.10.2 > > sorry for that :) > > Alex > > > On Thu, Jul 23, 2009 at 7:11 PM, alex handle<[email protected]> wrote: > > Hello, > > > > I'm using DRBD with heartbeat (R1 config) and ipfail. > > > > Here are my versions: > > CentOS 5.3 > > heartbeat-2.1.3-3.el5.centos > > drbd-km-2.6.18_128.1.14.el5-8.3.1-3 > > drbd-8.3.1-3 (compiled from source and installed with rpm) > > > > Each node (mail1 and mail2) has two interfaces. 
> > > > eth0 -> heartbeat link, vip and uplink > > eth1 -> heartbeat link and DRBD replication link > > > > To ensure a failover if one nic fails, I set ipfail to ping the gateway. > > > > First I tested my configuration with a firewall rule to trigger ipfail. > > > > # iptables -A OUTPUT -p icmp --icmp-type 8 -j DROP > > > > That worked perfectly well. > > You can see it in the attached files messages.mail1.iptables and > > messages.mail2.iptables. > > It takes over the vip and drbddisk starts without an error. > > > > But if I try to pull the cable (or ip link set eth0 down) on mail1 > > eth0, the drbddisk resource doesn't get stopped on mail1 > > and so it fails to start on mail2. > > You can see that in messages.mail1.linkdown and messages.mail2.linkdown. > > > > Jul 23 15:51:12 mail2 ResourceManager[11462]: [11696]: ERROR: Return > > code 1 from /etc/ha.d/resource.d/drbddisk > > Jul 23 15:51:12 mail2 ResourceManager[11462]: [11697]: CRIT: Giving up > > resources due to failure of drbddisk::home > > > > You can also see these files attached: > > ha.cf > > haresources > > drbd.conf > > > > I don't really know if it's a problem with heartbeat or drbd, but I > > hope you can give me a hint. > > > > Thank you for your help! > > > > > ------------------------------ > > Message: 7 > Date: Fri, 24 Jul 2009 22:34:14 +0800 > From: Jiayin Mao <[email protected]> > Subject: Re: [Linux-HA] Adding a node to HA-Cluster without service > interruption > To: General Linux-HA mailing list <[email protected]> > Message-ID: > <[email protected]> > Content-Type: text/plain; charset=ISO-8859-1 > > Do you mean "autojoin any"? I had it there, but heartbeat still complained that > the new node is not in its membership. > > On Fri, Jul 24, 2009 at 8:16 PM, Andrew Beekhof <[email protected]> > wrote: > > > On Thu, Jul 23, 2009 at 11:07 AM, Jiayin Mao<[email protected]> wrote: > > > Can I leave heartbeat running and add a new node at the same time? 
> > > > only if the existing nodes already have "autojoin on" in ha.cf (i.e. > > when they were last started) > > > > > I've tried > > > thousands of times, but never made it work. The heartbeat on the > running > > > node always complains that the new node is not in its membership. I've set > > > autojoin to any and propagated the UUID of the crashed node to the > > > replacement using crm_uuid -w. > > > > > > On Thu, Jul 23, 2009 at 4:47 PM, Andrew Beekhof <[email protected]> > > wrote: > > > > > >> On Thu, Jul 23, 2009 at 6:22 AM, Michael Schwartzkopff< > > [email protected]> > > >> wrote: > > >> > On Wednesday, 22 July 2009 23:38:26, Alexander Födisch wrote: > > >> >> Hi, > > >> >> > > >> >> I have a samba cluster w/ three nodes (heartbeat 2.1.3 / > > crm-enabled). > > >> Now > > >> >> I need to add a fourth one. What will be the best way to do this > w/o > > any > > >> >> service interruption? > > >> >> > > >> >> First step - Node4 has to become a member of the cluster. I would > do > > it > > >> >> like this: > > >> >> > > >> >> > > >> >> Node1: /etc/init.d/heartbeat stop > > >> >> Node1: add Node4 to /etc/ha.d/ha.cf > > >> >> Node1: /etc/init.d/heartbeat start > > >> >> > > >> >> Node2: /etc/init.d/heartbeat stop > > >> >> Node2: add Node4 to /etc/ha.d/ha.cf > > >> >> Node2: /etc/init.d/heartbeat start > > >> >> > > >> >> Node3: /etc/init.d/heartbeat stop > > >> >> Node3: add Node4 to /etc/ha.d/ha.cf > > >> >> Node3: /etc/init.d/heartbeat start > > >> >> > > >> >> On Node4 heartbeat is already configured, so "/etc/init.d/heartbeat > > >> start" > > >> >> on Node4 will be enough. > > >> > > > >> > Process looks OK. 
> > >> > > >> There will still be service interruption though - every time you > > >> restart heartbeat > > >> _______________________________________________ > > >> Linux-HA mailing list > > >> [email protected] > > >> http://lists.linux-ha.org/mailman/listinfo/linux-ha > > >> See also: http://linux-ha.org/ReportingProblems > > >> > > > > > > > > > > > > -- > > > Max Mao (Mao Jia Yin) > > > Abao Scrum Team, Engineering Department > > > ----------------------------------------------------------- > > > I am located at Beijing office, and usually work during 9:00PM and > 6:00AM > > > ET. > > > _______________________________________________ > > > Linux-HA mailing list > > > [email protected] > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > > See also: http://linux-ha.org/ReportingProblems > > > > > _______________________________________________ > > Linux-HA mailing list > > [email protected] > > http://lists.linux-ha.org/mailman/listinfo/linux-ha > > See also: http://linux-ha.org/ReportingProblems > > > > > > -- > Max Mao (Mao Jia Yin) > Abao Scrum Team, Engineering Department > ----------------------------------------------------------- > I am located at Beijing office, and usually work during 9:00PM and 6:00AM > ET. > > > ------------------------------ > > _______________________________________________ > Linux-HA mailing list > [email protected] > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems > > End of Linux-HA Digest, Vol 68, Issue 56 > **************************************** > -- Regards, Ahmed Munir _______________________________________________ Linux-HA mailing list [email protected] http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
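As a footnote to Message 7 above: the membership behaviour under discussion is governed by the autojoin directive in ha.cf. Below is a minimal sketch of the relevant fragment (heartbeat 2.x syntax; the value semantics are from the heartbeat documentation, so verify them against the version you actually have installed):

```
# /etc/ha.d/ha.cf fragment (sketch, heartbeat 2.x)
#
# autojoin none  -> only nodes listed in "node" directives may join (default)
# autojoin other -> this node must be listed, but unlisted nodes may join
# autojoin any   -> any node presenting a matching authkeys secret may join
autojoin any

# With autojoin enabled, explicit node entries become optional:
# node ha1
# node ha2
```

Note that heartbeat only rereads this file on restart, which is why nodes that were last started without autojoin still reject newcomers as "not in our membership".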
