On 2013.11.09. 1:00, Dennis Jacobfeuerborn wrote:
On 09.11.2013 00:15, Dennis Jacobfeuerborn wrote:
Hi,
I'm finally moving forward with creating a redundant gateway system for
a network, but I'm running into trouble. This is the configuration
I'm using:
node gw01 \
    attributes standby="off"
node gw02 \
    attributes standby="off"
primitive p_ip_gw_ext ocf:heartbeat:IPaddr2 \
    params ip="192.168.100.132" cidr_netmask="29" nic="eth0" \
    op monitor interval="10s"
primitive p_ip_gw_int ocf:heartbeat:IPaddr2 \
    params ip="192.168.214.4" cidr_netmask="24" nic="eth1" \
    op monitor interval="10s"
primitive p_route_ext ocf:heartbeat:Route \
    params destination="default" device="eth0" gateway="192.168.100.129" \
    op monitor interval="10" timeout="20" depth="0"
primitive p_route_int ocf:heartbeat:Route \
    params destination="default" device="eth1" gateway="192.168.214.1" \
    op monitor interval="10" timeout="20" depth="0"
group g_gateway p_ip_gw_ext p_ip_gw_int
colocation c_route_ext -inf: p_route_ext p_ip_gw_ext
colocation c_routes -inf: p_route_ext p_route_int
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-1.el6_4.4-368c726" \
    cluster-infrastructure="cman" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1383949342"
The setup is fairly simple: one IP on the public interface, one IP on
the private interface. On the active system the default route goes
through the public interface; on the secondary system it goes through
the private interface.
The problem is that when I put the active node on standby the
p_route_int resource stays on the secondary system and the p_route_ext
resource gets stopped.
My interpretation: since no explicit priority is defined for either
route, and with only one node online only one of the two routes can be
placed, Pacemaker arbitrarily decides to keep p_route_int online and
take p_route_ext offline.
What I really want to express is that p_route_ext should always be
placed first, and p_route_int should only be placed if possible (e.g.
after a forced migration) but otherwise be taken offline (e.g. when a
node is down).
Any ideas on how to accomplish this?
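One option that might express this kind of preference is Pacemaker's
per-resource priority meta attribute: when not all resources can be
active, the cluster stops lower-priority resources in order to keep
higher-priority ones running. An untested sketch (the values 10 and 1
are arbitrary; only their relative order matters):

primitive p_route_ext ocf:heartbeat:Route \
    params destination="default" device="eth0" gateway="192.168.100.129" \
    op monitor interval="10" timeout="20" depth="0" \
    meta priority="10"
primitive p_route_int ocf:heartbeat:Route \
    params destination="default" device="eth1" gateway="192.168.214.1" \
    op monitor interval="10" timeout="20" depth="0" \
    meta priority="1"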
After sending the mail it occurred to me that "place one route not on
the same node as the other" is the wrong way to look at this. Instead I
did this:
colocation c_route_ext inf: p_route_ext p_ip_gw_ext
colocation c_route_int -inf: p_route_int p_ip_gw_ext
i.e. place the ext route with the ext IP, and the int route on a node
where the ext IP is *not* running. That way the routes no longer
depend on each other and their relative priority no longer matters.
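Put together, the constraint section of the configuration now reads as
follows (nodes, primitives and group unchanged; only the original
c_route_ext and c_routes lines are replaced):

colocation c_route_ext inf: p_route_ext p_ip_gw_ext
colocation c_route_int -inf: p_route_int p_ip_gw_ext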
Sorry for the noise.
Regards,
Dennis
Hi Dennis & list,
I am having a very similar problem with my cluster, but I am not sure if
your solution is truly good, so let me tell you about what I just
experienced:
I have two resource groups: one is responsible for a VPN tunnel, an
external IP, and an internal interface IP, for routing traffic from
other machines through the VPN tunnel; the other is just a single
routing resource that routes the other node's traffic through the VPN
tunnel.
If everything is fine then node1 has the VPN tunnel up and node2 has the
routing resource, meaning it sends all traffic through VPN on node1.
This is achieved by saying:
colocation routes -inf: vpn_resource_group route_resource_group
If both nodes are up and running then everything is fine, but today I
stopped node2 (which was running the route_resource_group) and to my
surprise (well, for the first 30 seconds, before I realized what had
probably happened) the VPN service group stopped too. According to my
theory this is what happened: Pacemaker saw that the
route_resource_group wasn't running anywhere, so it tried to start it
on node1, but the colocation rule told it not to, so the
route_resource_group went to 'failed' state; and since it is in a
colocation with the vpn_resource_group, the vpn_resource_group failed
too, bringing the whole cluster and all my services down.
So I am wondering how one could tell Pacemaker that if there is no way
to run resource2 (route_resource_group), then let it stop, but keep
resource1 (vpn_resource_group) running.
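For what it's worth, in the crm shell syntax "colocation <id> <score>:
<rsc> <with-rsc>" the *first* resource is the dependent one, i.e. its
placement is decided relative to the second. With the constraint above,
vpn_resource_group is therefore the dependent side, which would explain
why it was the group that went down. If that is indeed the cause,
swapping the operands should make the routing group the expendable side
(an untested sketch, reusing the resource names from above):

colocation routes -inf: route_resource_group vpn_resource_group

This reads as "keep route_resource_group off the node where
vpn_resource_group runs": when no other node is available,
route_resource_group simply stops, while vpn_resource_group keeps
running.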
Thank you a lot!
Domonkos
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org