Question concerning Virtual Routers and problems during failover

2017-09-12 Thread Tim Gipson
Hey all,

I’ve found what I think is an issue with the redundant VPC router 
pairs in CloudStack.  The issue was first noticed when routers were failing 
over from master to backup.  When the backup router became master, everything 
continued to work properly and traffic flowed as normal.  However, when it 
failed from the new master back to the original master, the virtual router 
stopped passing traffic on any network interface, and every failover after 
that resulted in virtual routers that were not passing traffic.

I can reproduce this behavior by doing a manual failover (logging in and 
issuing a reboot command on the router) from master to backup and then back to 
the original master.  From what I can tell, the iptables rules on the router 
are somehow modified during the failover (or a manual reboot) in such a way as 
to make them completely nonfunctional.  I did a side-by-side comparison of the 
iptables rules before and after a failover (or a manual reboot) and there are 
definite differences.  Sometimes rules are changed, sometimes they are 
duplicated, and I’ve even found that some rules are missing from iptables 
entirely.
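
For anyone who wants to repeat the side-by-side comparison, here is a minimal sketch in Python. It assumes both dumps were taken with iptables-save and are not line-wrapped. The helper names parse_rules and compare_dumps are made up for illustration and are not part of CloudStack. Rules are compared as plain text per table, so a rule that was "changed" shows up as one missing rule plus one added rule.

```python
from collections import Counter


def parse_rules(dump_text):
    """Map each table name to a Counter of its '-A ...' rule lines."""
    tables = {}
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("*"):  # table header, e.g. "*filter"
            current = line[1:]
            tables[current] = Counter()
        elif line.startswith("-A ") and current is not None:
            tables[current][line] += 1
    return tables


def compare_dumps(before_text, after_text):
    """Report, per table, rules that are missing, added, or repeated a different number of times."""
    before, after = parse_rules(before_text), parse_rules(after_text)
    report = {}
    for table in sorted(set(before) | set(after)):
        b = before.get(table, Counter())
        a = after.get(table, Counter())
        report[table] = {
            "missing": sorted(r for r in b if r not in a),
            "added": sorted(r for r in a if r not in b),
            "count_changed": sorted(r for r in b if r in a and a[r] != b[r]),
        }
    return report
```

Passing the text of the working router's dump as before_text and the broken router's dump as after_text gives one entry per table (mangle, filter, nat), which narrows down where the rule set diverged.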

We are running in a CentOS 7 environment and using KVM as our hypervisor.  Our 
CS version is 4.8 with standard images for the VRs.  As mentioned previously, 
our VRs are in redundant pairs for VPCs.

I’ve attached two iptables outputs, one from a working router and one from a 
broken router after failover.

Any help or direction you could provide to help me further identify why this is 
happening would be appreciated.

Thanks!

Tim Gipson
<https://www.ena.com/>

 

# Generated by iptables-save v1.4.14 on Tue Aug 29 21:08:17 2017
*mangle
:PREROUTING ACCEPT [445:57066]
:INPUT ACCEPT [547:62882]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [537:50055]
:POSTROUTING ACCEPT [537:50055]
:ACL_OUTBOUND_eth2 - [0:0]
:VPN_STATS_eth1 - [0:0]
-A PREROUTING -m state --state RELATED,ESTABLISHED -j CONNMARK --restore-mark --nfmask 0x --ctmask 0x
-A PREROUTING -m state --state RELATED,ESTABLISHED -j CONNMARK --restore-mark --nfmask 0x --ctmask 0x
-A PREROUTING -i eth2 -m state --state NEW -j CONNMARK --set-xmark 0x2/0x
-A PREROUTING -s 172.16.64.0/24 ! -d 172.16.64.1/32 -i eth2 -m state --state NEW -j ACL_OUTBOUND_eth2
-A PREROUTING -i eth1 -m state --state NEW -j CONNMARK --set-xmark 0x1/0x
-A FORWARD -j VPN_STATS_eth1
-A ACL_OUTBOUND_eth2 -d 224.0.0.18/32 -j ACCEPT
-A ACL_OUTBOUND_eth2 -j ACCEPT
-A ACL_OUTBOUND_eth2 -d 225.0.0.50/32 -j ACCEPT
-A VPN_STATS_eth1 -o eth1 -m mark --mark 0x525
-A VPN_STATS_eth1 -i eth1 -m mark --mark 0x524
COMMIT
# Completed on Tue Aug 29 21:08:17 2017
# Generated by iptables-save v1.4.14 on Tue Aug 29 21:08:17 2017
*filter
:INPUT DROP [36:4240]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [537:50055]
:ACL_INBOUND_eth2 - [0:0]
:NETWORK_STATS - [0:0]
:NETWORK_STATS_eth1 - [0:0]
-A INPUT -i eth0 -p tcp -m tcp --dport 10086 -j ACCEPT
-A INPUT -j NETWORK_STATS
-A INPUT -d 172.16.64.3/32 -i eth2 -p tcp -m tcp --dport 80 -m state --state NEW -j ACCEPT
-A INPUT -d 172.16.64.3/32 -i eth2 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -d 172.16.64.3/32 -i eth2 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i eth2 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -j NETWORK_STATS
-A INPUT -i eth2 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i eth2 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i eth2 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i eth2 -p tcp -m tcp --dport 80 -m state --state NEW -j ACCEPT
-A INPUT -i eth2 -p tcp -m tcp --dport 8080 -m state --state NEW -j ACCEPT
-A INPUT -d 224.0.0.18/32 -j ACCEPT
-A INPUT -d 225.0.0.50/32 -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i eth0 -p tcp -m tcp --dport 3922 -m state --state NEW,ESTABLISHED -j ACCEPT
-A INPUT -d 224.0.0.18/32 -j ACCEPT
-A INPUT -d 225.0.0.50/32 -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i eth0 -p tcp -m tcp --dport 3922 -m state --state NEW,ESTABLISHED -j ACCEPT
-A FORWARD -j NETWORK_STATS
-A FORWARD -j NETWORK_STATS_eth1
-A FORWARD -j NETWORK_STATS
-A FORWARD -d 172.16.64.0/24 -o eth2 -j ACL_INBOUND_eth2
-A FORWARD -s 172.16.64.0/22 ! -d 172.16.64.0/22 -j ACCEPT
-A OUTPUT -j NETWORK_STATS
-A OUTPUT -j NETWORK_STATS
-A ACL_INBOUND_eth2 -d 225.0.0.50/32 -j ACCEPT
-A ACL_INBOUND_eth2 -d 224.0.0.18/32 -j ACCEPT
-A NETWORK_STATS -i eth0 -o eth2 -p tcp
-A NETWORK_STATS -i eth2 -o eth0 -p tcp
-A NETWORK_STATS ! -i eth0 -o eth2 -p tcp
-A NETWORK_STATS -i eth2 ! -o eth0 -p tcp
-A NETWORK_STATS -i eth0 -o eth2 -p tcp
-A NETWORK_STATS -i eth2 -o eth0 -p tcp
-A NETWORK_STATS ! -i eth0 -o eth2 -p tcp
-A NETWORK_STATS -i eth2 ! -o eth0 -p tcp
-A NETWORK_STATS_eth1 -d 172.16.64.0/24 -o eth1
-A NETWORK_STATS_eth1 -s 172.16.64.0/24 -o eth1
COMMIT
# Completed on Tue Aug 29 21:08:17 2017
# Generated by iptables-save v1.4.14 on Tue Aug 29 21:08:17 2017
*nat
:PREROUTING ACCEPT [70:3660]
:INPUT ACCEPT [16:1104]
:OUTPUT ACCEPT [10:641]
:POSTROUTING ACCEP
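
Dumps pasted into mail are often hard-wrapped, which splits a rule across two lines and hides exact duplicates from a plain text comparison. The sketch below, in Python, glues continuation lines back together and then counts rules that appear more than once within a table. The excerpt is copied from the filter table above. The function names are illustrative only, and the rejoin step assumes that a continuation line never begins with "-A ", ":", "*", "#" or "COMMIT".

```python
from collections import Counter

# Prefixes that start a new logical line in iptables-save output.
STARTERS = ("-A ", ":", "*", "#", "COMMIT")


def rejoin_wrapped(dump_text):
    """Glue mail-wrapped continuation lines back onto the line they belong to."""
    lines = []
    for raw in dump_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(STARTERS) or not lines:
            lines.append(line)
        else:
            lines[-1] = lines[-1] + " " + line
    return lines


def duplicated_rules(dump_text):
    """Return {(table, rule): count} for rules that occur more than once in one table."""
    counts = Counter()
    table = None
    for line in rejoin_wrapped(dump_text):
        if line.startswith("*"):
            table = line[1:]
        elif line.startswith("-A "):
            counts[(table, line)] += 1
    return {key: n for key, n in counts.items() if n > 1}


EXCERPT = """\
*filter
-A INPUT -j NETWORK_STATS
-A INPUT -i eth2 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -j NETWORK_STATS
-A INPUT -i eth0 -p tcp -m tcp --dport 3922 -m state --state NEW,ESTABLISHED -j
ACCEPT
-A INPUT -i eth0 -p tcp -m tcp --dport 3922 -m state --state NEW,ESTABLISHED -j
ACCEPT
COMMIT
"""

for (table, rule), n in sorted(duplicated_rules(EXCERPT).items()):
    print(f"{table}: x{n}  {rule}")
```

A duplicated ACCEPT rule does not change what is accepted. The count is still a quick sign that the rule set was applied more than once.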

Re: Question concerning Virtual Routers and problems during failover

2017-09-13 Thread Tim Gipson
Sure, I’ll need to recreate a failure scenario so I can capture all that data 
for you.  I’ll post it here as soon as I’ve got it.

Thanks!

Tim Gipson
Systems Engineer
Direct: 615-312-6157
Mobile: 615-585-3652

 <https://www.ena.com/>
 

On 9/12/17, 10:53 PM, "Nitin Kumar Maharana" wrote:

Hi Tim,

Can you please attach both VRs’ cloud.log (located at /var/log/cloud.log on 
each VR) as well as the management server log for the failure case?
That will help us find the exact cause of the failure.


Thanks,
Nitin

DISCLAIMER
==
This e-mail may contain privileged and confidential information which is 
the property of Accelerite, a Persistent Systems business. It is intended only 
for the use of the individual or entity to which it is addressed. If you are 
not the intended recipient, you are not authorized to read, retain, copy, 
print, distribute or use this message. If you have received this communication 
in error, please notify the sender and delete all copies of this message. 
Accelerite, a Persistent Systems business does not accept any liability for 
virus infected mails.




Re: Question concerning Virtual Routers and problems during failover

2017-09-13 Thread Tim Gipson
I just opened a JIRA issue, 
https://issues.apache.org/jira/browse/CLOUDSTACK-10074, and attached my iptables 
files as well as the management server logs and the logs from the routers.  I 
started the manual failover at around 14:40, so that should help anyone wanting 
to look at the logs.
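
To cut the attached logs down to the minutes around the failover, a small Python filter along these lines may help. It is only a sketch: it assumes each log entry starts with a "YYYY-MM-DD HH:MM:SS" timestamp, which should be checked against the actual files, and lines_near is a made-up helper name.

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix: "YYYY-MM-DD HH:MM:SS" (anything after the seconds is ignored).
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def lines_near(log_lines, center, minutes=10):
    """Yield lines whose leading timestamp is within +/- `minutes` of `center`.

    Lines without a leading timestamp (for example stack-trace continuations)
    follow the decision made for the most recent timestamped line.
    """
    window = timedelta(minutes=minutes)
    keep = False
    for line in log_lines:
        match = TIMESTAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            keep = abs(stamp - center) <= window
        if keep:
            yield line
```

For the failover mentioned above, the center would be 14:40 on the day of the test, with the date taken from the log itself. If the routers and the management server log in different time zones, the center has to be shifted per file.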

Thanks!

Tim Gipson
Systems Engineer
Direct: 615-312-6157
Mobile: 615-585-3652

 <https://www.ena.com/>
 
