On 18/05/2013 20:23, christopher barry wrote:
On Fri, 2013-05-17 at 10:41 +0200, Florian Crouzat wrote:
On 16/05/2013 21:45, christopher barry wrote:
Greetings,

I've set up a new 2-node MySQL cluster using
* drbd 8.3.13
* corosync 1.4.2
* pacemaker 1.1.7
on Debian Wheezy nodes.

Failover seems to be working fine for everything except the IPs manually
configured on the interfaces.

This sentence makes no sense to me.
The cluster will not fail over something that it does not manage (a
'manually' configured IP...).

What are you trying to achieve exactly?
Also, could you pastebin the output of "crm_mon -Arf1"? I find it
easier to read.



See config here:
http://pastebin.aquilenet.fr/?9eb51f6fb7d65fda#/YvSiYFocOzogAmPU9g+g09RcJvhHbgrY1JuN7D+gA4=

If I bring down an interface, the cluster brings it back up, but only
with the VIP: the original IP and route have been removed.

That makes sense if you added the 'original' IP manually...
You should have the non-VIP addresses in /etc/sysconfig/network/ifcfg-*
But then again, please specify what you are trying to achieve.
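On Debian, the equivalent of the ifcfg-* files is /etc/network/interfaces. The thread later confirms that these nodes use that file. What follows is a minimal sketch of a permanent, non-cluster address with a static route. Every address in it is a hypothetical placeholder and is not taken from this cluster's configuration:

```
# /etc/network/interfaces (fragment; all addresses are placeholders)
auto eth0
iface eth0 inet static
        address 192.0.2.2
        netmask 255.255.255.0
        gateway 192.0.2.1
        up ip route add 198.51.100.0/24 via 192.0.2.254 dev eth0
```

The "up" line is a hypothetical extra route. ifupdown re-adds it only when it brings eth0 up itself (ifup). The cluster does not do that for you.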


Not sure what to do to make sure the permanent IP and the routes get
restored. I'm not all that versed in the cluster command line yet, and
I'm using LCMC for most of my work.



(@howard2.rjmetrics.com)-(14:00 / Sat May 18)
[-][~]# crm_mon -Arf1
============
Last updated: Sat May 18 14:00:27 2013
Last change: Thu May 16 17:33:07 2013 via crm_attribute on howard3.rjmetrics.com
Stack: openais
Current DC: howard3.rjmetrics.com - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
6 Resources configured.
============

Online: [ howard3.rjmetrics.com howard2.rjmetrics.com ]

Full list of resources:

  Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
      Masters: [ howard2.rjmetrics.com ]
      Slaves: [ howard3.rjmetrics.com ]
  Resource Group: g_mysql
      p_fs_mysql        (ocf::heartbeat:Filesystem):    Started howard2.rjmetrics.com
      ClusterPrivateIP  (ocf::heartbeat:IPaddr2):       Started howard2.rjmetrics.com
      ClusterPublicIP   (ocf::heartbeat:IPaddr2):       Started howard2.rjmetrics.com
      p_mysql   (ocf::heartbeat:mysql): Started howard2.rjmetrics.com

Node Attributes:
* Node howard3.rjmetrics.com:
     + master-p_drbd_mysql:0            : 1000
* Node howard2.rjmetrics.com:
     + master-p_drbd_mysql:1            : 10000

Migration summary:
* Node howard3.rjmetrics.com:
    p_drbd_mysql:1: migration-threshold=1000000 fail-count=1
* Node howard2.rjmetrics.com:
    ClusterPublicIP: migration-threshold=1000000 fail-count=1

Failed actions:
     p_drbd_mysql:1_promote_0 (node=howard3.rjmetrics.com, call=29, rc=-2, status=Timed Out): unknown exec error
     ClusterPublicIP_monitor_30000 (node=howard2.rjmetrics.com, call=122, rc=7, status=complete): not running


howard2 and howard3 are the two clustered servers.

During testing, when I ifdown either eth0 or eth1, the cluster brings
the VIP back up, but the other non-VIP IPs and routes are not
restored. I'm running Debian, so these are configured
in /etc/network/interfaces. Saying 'manually' configured was misleading
on my part, sorry about that.

Mhh, I cannot reproduce right now, but I was pretty sure that IPaddr2 used "ip addr add X.X.X.X/YY dev ZZ", so I expected that ifdowning device ZZ would prevent Pacemaker from bringing the VIP back up, as the underlying device wouldn't be available anymore. Your symptom at least confirms that the non-VIP doesn't come up again: IPaddr2 doesn't run ifup, it adds an alias to an existing device.
See "sudo crm ra meta IPaddr2" and search for "nic="
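You can make the interface explicit by setting the nic parameter on the VIP primitive. The sketch below is in crm syntax. It reuses the resource name ClusterPublicIP from the crm_mon output above and the 30-second monitor implied by ClusterPublicIP_monitor_30000. The address, the netmask and the choice of eth0 are assumptions:

```
primitive ClusterPublicIP ocf:heartbeat:IPaddr2 \
        params ip="192.0.2.10" cidr_netmask="24" nic="eth0" \
        op monitor interval="30s"
```

With this in place, IPaddr2 manages only the address it is given on eth0. The permanent address and routes still have to come from /etc/network/interfaces.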

Anyway, "ifdown" is not a valid way to test your cluster; it doesn't represent any realistic production failure.


eth0 is the public interface, and eth1 is the private interface. eth2
and eth3 are bonded as bond0, use jumbo frames, and are crossover cabled
between the nodes.

The test I was doing was to pull the cables from eth0 and eth1, which hung
the cluster. My assumption is that I need to add more configuration
elements to manage the other IPs, and also set up some ping hosts that
will initiate failover when they become unreachable. What would help me,
I think, is an example config or pointers on how to add these elements.

Well, without digging much into your configuration: yes, you need ping nodes so that the best-connected node "wins". You also need fencing, which is mandatory on any cluster.

Here's a sample configuration for ping nodes, with a location constraint so that the best-connected node hosts the resource "foo":


primitive ping-gw-sw1-sw2 ocf:pacemaker:ping \
        params host_list="192.168.10.1 192.168.2.11 192.168.2.12" dampen="35s" attempts="2" timeout="2" multiplier="100" \
        op monitor interval="15s"

clone ping-nq-sw-swsec-clone ping-gw-sw1-sw2 \
        meta target-role="Started"

location IPHA-on-connected-node foo \
        rule $id="IPHA-on-connected-node-rule" pingd: defined pingd
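A common variant, described in the Pacemaker documentation, also keeps the resource off any node that cannot reach any ping host at all. This is only a sketch. It substitutes g_mysql (the group from the crm_mon output above) for the placeholder "foo", and the constraint and rule IDs are hypothetical:

```
location g_mysql-not-on-isolated-node g_mysql \
        rule $id="g_mysql-not-on-isolated-node-rule" -inf: not_defined pingd or pingd lte 0
```

The "pingd" attribute name matches the default name written by ocf:pacemaker:ping, as in the constraint above.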

See http://www.hastexo.com/resources/hints-and-kinks/network-connectivity-check-pacemaker
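The following is a minimal fencing sketch. It assumes each node has an IPMI-capable management controller, which nothing in this thread confirms, and it uses the external/ipmi plugin from cluster-glue. The BMC addresses and credentials are placeholders. The location constraints stop each fencing device from running on the node it is meant to fence:

```
primitive st-howard2 stonith:external/ipmi \
        params hostname="howard2.rjmetrics.com" ipaddr="192.0.2.102" \
               userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
primitive st-howard3 stonith:external/ipmi \
        params hostname="howard3.rjmetrics.com" ipaddr="192.0.2.103" \
               userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
location l-st-howard2 st-howard2 -inf: howard2.rjmetrics.com
location l-st-howard3 st-howard3 -inf: howard3.rjmetrics.com
property stonith-enabled="true"
```

The hostname values must match the node names as Pacemaker reports them, which here are the FQDNs shown by crm_mon.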


On another note, the test made the DRBD link disconnect, with both disks
now marked as StandAlone in the LCMC GUI. Right-clicking the disks or
the connection does not allow any action other than viewing logs, which
say:

May 16 17:33:08 howard3 kernel: [781360.146362] block drbd0: Split-Brain detected but unresolved, dropping connection!
May 16 17:33:08 howard3 kernel: [781360.146451] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
May 16 17:33:08 howard3 kernel: [781360.149042] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
May 16 17:33:08 howard3 kernel: [781360.149051] block drbd0: conn( WFReportParams -> Disconnecting )
May 16 17:33:08 howard3 kernel: [781360.149060] block drbd0: error receiving ReportState, l: 4!
May 16 17:33:08 howard3 kernel: [781360.149154] block drbd0: asender terminated
May 16 17:33:08 howard3 kernel: [781360.149159] block drbd0: Terminating drbd0_asender
May 16 17:33:08 howard3 kernel: [781360.149609] block drbd0: Connection closed
May 16 17:33:08 howard3 kernel: [781360.149619] block drbd0: conn( Disconnecting -> StandAlone )
May 16 17:33:08 howard3 kernel: [781360.149811] block drbd0: receiver terminated
May 16 17:33:08 howard3 kernel: [781360.149815] block drbd0: Terminating drbd0_receiver

I'm really not sure how to proceed. Please let me know any additional
information you may need.

I know nothing about shared storage.


Thanks for your time Florian, it's much appreciated.


You're welcome.


--
Cheers,
Florian Crouzat

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
