Hi there!

I'm working with a Pacemaker cluster that acts as the gateway for several VLANs (subnets) of my network. All the resources I'm managing are network infrastructure services: virtual IPs (several), firewall, xorp, radvd, dhcp, etc. There are 26 resources, 21 of them virtual IPs.

We have been using our current configuration for 3 or 4 months, and after some tests we are not very happy with the implementation. The cluster is active-passive, with 2 nodes. The servers are Sun Fire amd64 machines with 16 cores.

I see mainly two issues, which may be related:

1) The IPaddr2 and IPv6addr RAs are slow, or our configuration makes them slow. Moving resources from one node to the other takes very long (we don't have an exact figure, but I think ~2-3 minutes). I suspect some kind of bad interaction between the firewall and those RAs, although we have checked this pretty thoroughly.

2) The constraints we added for colocation and grouping do not allow the cluster to react quickly to failover situations. We put all VIPs in one group and colocate the other resources with the VIP group.

Here is our current config in the production cluster:

node node1
node node2
primitive p_dhcp lsb:/etc/init.d/isc-dhcp-server \
        op monitor interval="5"
primitive p_firewall lsb:/etc/init.d/firewall \
        op monitor interval="20" timeout="5" onfail="restart" \
        op start interval="0" timeout="50" \
        op stop interval="0" timeout="20"
primitive p_ipv ocf:heartbeat:IPaddr2 \
        params ip="10.0.3.50" nic="bond0"
primitive p_ipv_nat ocf:heartbeat:IPaddr2 \
        params ip="10.0.3.57" nic="bond1.27"
primitive p_ipv_openvpn ocf:heartbeat:IPaddr2 \
        params ip="10.0.3.51" nic="bond0"
primitive p_ipv_v6 ocf:heartbeat:IPv6addr \
        params ipv6addr="fc00:18::9" nic="bond0"
primitive p_ipv_v6_vlan27 ocf:heartbeat:IPv6addr \
        params ipv6addr="fc00:27::1" cidr_netmask="64" nic="bond1.27"
primitive p_ipv_v6_vlan31 ocf:heartbeat:IPv6addr \
        params ipv6addr="fc00:31::1" cidr_netmask="64" nic="bond1.31"
primitive p_ipv_v6_vlan34 ocf:heartbeat:IPv6addr \
        params ipv6addr="fc00:34::1" cidr_netmask="64" nic="bond1.34"
primitive p_ipv_v6_vlan51 ocf:heartbeat:IPv6addr \
        params ipv6addr="fc00:51::1" cidr_netmask="64" nic="bond1.51"
primitive p_ipv_v6_vlan54 ocf:heartbeat:IPv6addr \
        params ipv6addr="fc00:54::1" cidr_netmask="64" nic="bond1.54"
primitive p_ipv_v6_vlan6 ocf:heartbeat:IPv6addr \
        params ipv6addr="fc00:6::1" cidr_netmask="64" nic="bond1.6"
primitive p_ipv_v6_vlan7 ocf:heartbeat:IPv6addr \
        params ipv6addr="fc00:7::1" cidr_netmask="64" nic="bond1.7"
primitive p_ipv_vlan10 ocf:heartbeat:IPaddr2 \
        params ip="10.0.3.65" nic="bond1.10"
primitive p_ipv_vlan23 ocf:heartbeat:IPaddr2 \
        params ip="10.0.2.193" nic="bond1.23"
primitive p_ipv_vlan27 ocf:heartbeat:IPaddr2 \
        params ip="10.0.4.129" nic="bond1.27"
primitive p_ipv_vlan28 ocf:heartbeat:IPaddr2 \
        params ip="10.0.3.9" nic="bond1.28"
primitive p_ipv_vlan31 ocf:heartbeat:IPaddr2 \
        params ip="10.0.8.129" nic="bond1.31"
primitive p_ipv_vlan34 ocf:heartbeat:IPaddr2 \
        params ip="10.0.5.1" nic="bond1.34"
primitive p_ipv_vlan51 ocf:heartbeat:IPaddr2 \
        params ip="10.0.4.1" nic="bond1.51"
primitive p_ipv_vlan54 ocf:heartbeat:IPaddr2 \
        params ip="10.0.3.193" nic="bond1.54"
primitive p_ipv_vlan6 ocf:heartbeat:IPaddr2 \
        params ip="10.0.5.1" nic="bond1.6"
primitive p_ipv_vlan7 ocf:heartbeat:IPaddr2 \
        params ip="10.0.2.1" nic="bond1.7"
primitive p_openvpn lsb:/etc/init.d/openvpn \
        op monitor interval="5"
primitive p_radvd lsb:/etc/init.d/radvd \
        op monitor interval="5"
primitive p_xorp lsb:/etc/init.d/xorp \
        op monitor interval="5"
group g_ipv p_ipv_vlan27 p_ipv p_ipv_vlan7 p_ipv_vlan6 p_ipv_vlan54 p_ipv_vlan51 p_ipv_vlan31 p_ipv_vlan34 p_ipv_nat p_ipv_vlan23 p_ipv_vlan10 p_ipv_vlan28 p_ipv_openvpn p_ipv_v6 p_ipv_v6_vlan51 p_ipv_v6_vlan31 p_ipv_v6_vlan6 p_ipv_v6_vlan7 p_ipv_v6_vlan27 p_ipv_v6_vlan54 p_ipv_v6_vlan34
colocation dhcp-ipv inf: p_dhcp g_ipv
colocation firewall-ipv inf: p_firewall g_ipv
colocation openvpn-ipv inf: p_openvpn p_ipv
colocation radvd-ipv inf: p_radvd g_ipv
colocation xorp-ipv inf: p_xorp g_ipv
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1352111523"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1000000"

This generates a status like this:

root@node2:~# crm status
============
Last updated: Mon Nov 5 13:31:09 2012
Last change: Mon Nov 5 11:32:03 2012 via crmd on node1
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
26 Resources configured.
============

Online: [ node1 node2 ]

 p_dhcp (lsb:/etc/init.d/isc-dhcp-server): Started node2
 p_firewall (lsb:/etc/init.d/firewall): Started node2
 p_openvpn (lsb:/etc/init.d/openvpn): Started node2
 p_radvd (lsb:/etc/init.d/radvd): Started node2
 p_xorp (lsb:/etc/init.d/xorp): Started node2
 Resource Group: g_ipv
     p_ipv_vlan27 (ocf::heartbeat:IPaddr2): Started node2
     p_ipv (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_vlan7 (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_vlan6 (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_vlan54 (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_vlan51 (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_vlan31 (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_vlan34 (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_nat (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_vlan23 (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_vlan10 (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_vlan28 (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_openvpn (ocf::heartbeat:IPaddr2): Started node2
     p_ipv_v6 (ocf::heartbeat:IPv6addr): Started node2
     p_ipv_v6_vlan51 (ocf::heartbeat:IPv6addr): Started node2
     p_ipv_v6_vlan31 (ocf::heartbeat:IPv6addr): Started node2
     p_ipv_v6_vlan6 (ocf::heartbeat:IPv6addr): Started node2
     p_ipv_v6_vlan7 (ocf::heartbeat:IPv6addr): Started node2
     p_ipv_v6_vlan27 (ocf::heartbeat:IPv6addr): Started node2
     p_ipv_v6_vlan54 (ocf::heartbeat:IPv6addr): Started node2
     p_ipv_v6_vlan34 (ocf::heartbeat:IPv6addr): Started node2

We have a test environment where I set up the same cluster, but with 80 VIPs (mixed IPv4 and IPv6).
There, we changed the configuration so that nothing is grouped; instead, each resource has a location constraint that forces all resources onto the same active node. In this configuration we cannot move resources from one node to the other unless we put the active node in standby.

[...]
location p_dhcp_prefer_rasca p_dhcp inf: rasca
location p_firewall_prefer_rasca p_firewall inf: rasca
location p_ipv_205_prefer_rasca p_ipv_205 inf: rasca
location p_ipv_206_prefer_rasca p_ipv_206 inf: rasca
location p_ipv_207_prefer_rasca p_ipv_207 inf: rasca
location p_ipv_208_prefer_rasca p_ipv_208 inf: rasca
location p_ipv_209_prefer_rasca p_ipv_209 inf: rasca
location p_ipv_210_prefer_rasca p_ipv_210 inf: rasca
location p_ipv_211_prefer_rasca p_ipv_211 inf: rasca
location p_ipv_212_prefer_rasca p_ipv_212 inf: rasca
location p_ipv_213_prefer_rasca p_ipv_213 inf: rasca
location p_ipv_214_prefer_rasca p_ipv_214 inf: rasca
location p_ipv_215_prefer_rasca p_ipv_215 inf: rasca
location p_ipv_216_prefer_rasca p_ipv_216 inf: rasca
location p_ipv_217_prefer_rasca p_ipv_217 inf: rasca
location p_ipv_218_prefer_rasca p_ipv_218 inf: rasca
[...]

Despite those tests, we cannot draw any conclusion. I don't know which approach is better, since there isn't any really substantial difference in performance. I see on GitHub that the RA code is under active development.

So, here are the questions:

· What would you recommend for approaching this implementation?
· Do you know some method for measuring response time in an accurate way?
· Do you recommend using newer RA code from GitHub?

The software: Debian Wheezy packages, corosync 1.4.2-3, pacemaker 1.1.7-1, kernel 3.2.0-3-amd64

Best regards
--
#
# Arturo Borrero Gonzalez || cer.i...@linuxmail.org
# Use debian gnu/linux!
#
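
On the question of measuring response time: one rough way is to time the outage of a VIP from a third host instead of on the cluster nodes. What follows is only a minimal sketch under some assumptions, not a tested tool. The helper names `seconds_while` and `seconds_until` are made up for illustration, the address 10.0.3.50 is the `p_ipv` VIP from the config above, and a resolution of a second or two is assumed to be good enough for failovers that take minutes.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, resolution is a second or two.

# Print how many seconds the given command keeps succeeding.
seconds_while() {
    start=$(date +%s)
    while "$@" >/dev/null 2>&1; do
        sleep 1
    done
    echo $(( $(date +%s) - start ))
}

# Print how many seconds pass until the given command succeeds.
seconds_until() {
    start=$(date +%s)
    until "$@" >/dev/null 2>&1; do
        sleep 1
    done
    echo $(( $(date +%s) - start ))
}

# Example (run on a third host, then trigger the failover on the cluster,
# for instance by putting the active node in standby):
#   seconds_while ping -c 1 -W 1 10.0.3.50   # returns once the VIP disappears
#   seconds_until ping -c 1 -W 1 10.0.3.50   # prints the outage in seconds
```

Because the helpers take an arbitrary command, the same loop could be pointed at another VIP or at a service check instead of ping. Running it against one IPv4 and one IPv6 address might also show whether one RA type accounts for most of the delay.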