The ideal solution would be to push the IP of the CloudStack core(s) into 
/var/cache/cloud/cmdline, so that the ping is initiated only to that host.

This would avoid the gateway guessing altogether.

However, at the moment no CS GW IP is passed, so we take another harmless 
route.

-----Original Message-----
From: Musayev, Ilya [mailto:imusa...@webmd.net] 
Sent: Wednesday, December 05, 2012 2:38 PM
To: cloudstack-dev@incubator.apache.org
Subject: RouterVM bug fix patch - PLS DISCUSS

I've previously started a few threads on the Router VM issue. After a reboot 
of the Router VM, the link local IP cannot be reached until we initiate a ping 
request from within the Router VM. This in turn creates an ARP entry, and the 
network is functional from then on.


You can see the details of my setup in this tutorial:

https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Advanced+Network+Tutorial+-+Step+by+Step


Example setup - I have 2 networks: Link Local is untagged and the Guest 
Network uses VLAN tagging:

Guest Network: 10.18.24.0/23
Guest Network GW: 10.18.24.1
Guest Router IP: 10.18.24.20

Link Local Network: 10.12.34.0/24
Link Local GW: 10.12.34.1
Link Local Router IP: 10.12.34.30


CloudStack Net: 10.15.32.0/23
CloudStack Net GW: 10.15.32.1
CloudStack IP: 10.15.32.141


When the Router VM powers up, it receives variables via 
/var/cache/cloud/cmdline. The key variables for this patch are:
*Localgw=10.12.34.1
*Gateway=10.18.24.1
*Mgmtcidr=10.15.32.0/23
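
As a rough sketch of how such a variable could be read, the helper below 
pulls one value out of a line of key=value tokens. It assumes the cmdline 
file is a single line of whitespace-separated key=value pairs, which is not 
stated above, and the function name get_cmdline_var is hypothetical:

```shell
# Hypothetical helper, not part of any existing script.
# Assumption: the input is whitespace-separated key=value tokens.
# Usage: get_cmdline_var KEY "LINE"
get_cmdline_var() {
    key=$1
    line=$2
    for token in $line; do
        case $token in
            "$key"=*)
                # Strip everything up to and including the first "=".
                printf '%s\n' "${token#*=}"
                return 0
                ;;
        esac
    done
    # Key not present.
    return 1
}
```

On the Router VM this might be called as 
get_cmdline_var Mgmtcidr "$(cat /var/cache/cloud/cmdline)".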

With one of the somewhat recent patches, you will notice that the Router VM 
pings the Localgw and Gateway variables. This may fix some issues with local 
routing. However, my CS host and gateway are 2 hops away, so the ARP table is 
not populated completely enough for inbound communication to function 
properly. Outbound communication from the Router VM is fine.

My proposed fix is to do a 2 second ping to a new variable, MGMTGW, derived 
via:
MGMTGW=$(echo $Mgmtcidr | awk -F "." '{print $1"."$2"."$3"."1}')

This yields the CloudStack Net GW. It assumes, however, that your gateway 
always ends with ".1", which is probably the case in about 97% of setups.
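
To make the derivation concrete, here is a minimal sketch wrapped in a 
function. The name derive_mgmt_gw is hypothetical, and the ".1" convention is 
the same assumption described above:

```shell
# Hypothetical helper: guess the management gateway from a CIDR such as
# 10.15.32.0/23 by keeping the first three octets and appending ".1".
# Assumption: the gateway is the ".1" address, which may not hold everywhere.
derive_mgmt_gw() {
    # Drop the "/prefix" part, then rebuild the address with a final octet of 1.
    printf '%s\n' "${1%/*}" | awk -F "." '{print $1"."$2"."$3".1"}'
}

derive_mgmt_gw 10.15.32.0/23   # prints 10.15.32.1
```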

Even if the ping to the gateway fails because it's not set to ".1", in theory 
the ARP table should still be populated with the proper MAC addresses, and 
INBOUND communication should be functional after a 2 hop ping.

The worst case for the user is a 2 second delay if the ping fails. There is 
no harm, so it's a very low risk solution.
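
A bounded, non-fatal ping along these lines could look like the sketch below. 
The function name prime_arp is hypothetical, and the exact flags are an 
assumption (iputils-style ping, where -c is the packet count and -w is an 
overall deadline in seconds); the actual patch may differ:

```shell
# Hypothetical helper: send a short ping to trigger ARP resolution.
# Assumption: ping supports -c (count) and -w (deadline in seconds).
prime_arp() {
    # Two echo requests, at most two seconds in total. Output is discarded
    # and failure is ignored, so the caller never aborts on an unreachable
    # or wrongly guessed gateway.
    ping -c 2 -w 2 "$1" >/dev/null 2>&1 || true
}
```

It would be called with the derived address, for example 
prime_arp "$MGMTGW".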

Please let me know if the proposed solution is acceptable, if so - I will push 
a patch into JIRA.

Thanks
ilya
