Anthony,

At first I too suspected that the ARP reply was sent on eth1 and then discarded by the gateway, because the source MAC address of the frame did not match the MAC address in the ARP payload. However, I have not seen any ARP reply on eth1. I'm attaching the dumps so you can have a look at them (I don't know whether they'll be mangled, though).
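The hypothesis above can be tested against the dumps by looking for ARP replies whose Ethernet source MAC differs from the sender MAC in the ARP payload. Below is a minimal sketch that flags such frames. It assumes untagged Ethernet II frames that have already been pulled out of the capture as raw bytes (reading the pcap file is not shown). The function names and the gateway MAC in the usage example are hypothetical. The eth1 and eth2 MACs are taken from the `ip addr show` output quoted further down.

```python
import socket
import struct
from typing import Optional, Tuple

ETH_P_ARP = 0x0806   # EtherType for ARP
ETH_P_IP = 0x0800    # protocol type for IPv4 inside the ARP header
ARP_REPLY = 2        # ARP opcode for "reply"


def fmt_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def arp_reply_mac_mismatch(frame: bytes) -> Optional[Tuple[str, str]]:
    """Return (ethernet_src, arp_sender_mac) when `frame` is an IPv4-over-Ethernet
    ARP reply whose Ethernet source MAC differs from the sender hardware address
    in the ARP payload; return None otherwise."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    eth_src = frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_ARP:  # VLAN-tagged frames are not handled in this sketch
        return None
    _htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if oper != ARP_REPLY or ptype != ETH_P_IP or hlen != 6 or plen != 4:
        return None
    sender_mac = frame[22:28]
    if sender_mac != eth_src:
        return fmt_mac(eth_src), fmt_mac(sender_mac)
    return None


def build_arp_reply(eth_src: bytes, sender_mac: bytes, sender_ip: str,
                    target_mac: bytes, target_ip: str) -> bytes:
    """Assemble an untagged Ethernet II ARP reply (hypothetical test helper)."""
    eth_header = target_mac + eth_src + struct.pack("!H", ETH_P_ARP)
    arp = struct.pack("!HHBBH", 1, ETH_P_IP, 6, 4, ARP_REPLY)
    arp += sender_mac + socket.inet_aton(sender_ip)
    arp += target_mac + socket.inet_aton(target_ip)
    return eth_header + arp


if __name__ == "__main__":
    eth1 = bytes.fromhex("06de1e000003")     # eth1 MAC from `ip addr show`
    eth2 = bytes.fromhex("065df600000b")     # eth2 MAC from `ip addr show`
    gateway = bytes.fromhex("001122334455")  # hypothetical gateway MAC
    # A reply for eth2's address (192.168.3.110) that leaves through eth1.
    frame = build_arp_reply(eth1, eth2, "192.168.3.110", gateway, "192.168.0.1")
    print(arp_reply_mac_mismatch(frame))  # ('06:de:1e:00:00:03', '06:5d:f6:00:00:0b')
```

An empty result across all replies in the dumps would match what I saw: no such mismatched reply was ever sent.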
I agree this might be a bug. In my view, there is no reason to have several network interfaces unless distinct labels are used for the management, public and storage networks. Is it OK with you if I report an issue on bugs.cloudstack.org and assign it to you?

Regards,
Salvatore

> -----Original Message-----
> From: Anthony Xu [mailto:xuefei...@citrix.com]
> Sent: 15 June 2012 19:59
> To: cloudstack-dev@incubator.apache.org
> Subject: RE: Advice on SSVM network interfaces
>
> Hi Salvatore,
>
> From your description, the ARP response is sent out through eth1. Some
> switches may drop this kind of packet, as they expect to receive the ARP
> response on the same port through which the ARP request was sent out.
>
> I think it is a bug. In this case, CloudStack should not configure eth2 and
> eth3 for the SSVM; if the SSVM does need several IPs, they should all be
> configured on eth1 when they are in the same subnet.
>
> Regards,
> Anthony
>
> > -----Original Message-----
> > From: Salvatore Orlando [mailto:salvatore.orla...@eu.citrix.com]
> > Sent: Friday, June 15, 2012 11:00 AM
> > To: cloudstack-dev@incubator.apache.org
> > Subject: Advice on SSVM network interfaces
> >
> > Hi,
> >
> > We have a test environment where a basic zone is deployed.
> > Both system VMs and guest addresses are in the 192.168.0.0/16 subnet,
> > albeit with distinct IP ranges.
> >
> > We noticed that the SSVM is unable to download templates, as the
> > connection over the public interface (eth2) is suddenly dropped (see
> > attached dump).
> > As can be seen from the dump, the connection drops because the SSVM
> > fails to answer ARP requests from the gateway on eth2.
> > ARP requests for eth2's address also fail when sent from other
> > machines in the same network.
> >
> > Here is the relevant configuration info from the SSVM:
> >
> > root@s-3-VM:~# ip addr show
> > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
> >     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >     inet 127.0.0.1/8 scope host lo
> >     inet6 ::1/128 scope host
> >        valid_lft forever preferred_lft forever
> > 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
> >     link/ether 0e:00:a9:fe:02:5c brd ff:ff:ff:ff:ff:ff
> >     inet 169.254.2.92/16 brd 169.254.255.255 scope global eth0
> >     inet6 fe80::c00:a9ff:fefe:25c/64 scope link
> >        valid_lft forever preferred_lft forever
> > 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
> >     link/ether 06:de:1e:00:00:03 brd ff:ff:ff:ff:ff:ff
> >     inet 192.168.3.102/16 brd 192.168.255.255 scope global eth1
> >     inet6 fe80::4de:1eff:fe00:3/64 scope link
> >        valid_lft forever preferred_lft forever
> > 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
> >     link/ether 06:5d:f6:00:00:0b brd ff:ff:ff:ff:ff:ff
> >     inet 192.168.3.110/16 brd 192.168.255.255 scope global eth2
> >     inet6 fe80::45d:f6ff:fe00:b/64 scope link
> >        valid_lft forever preferred_lft forever
> > 5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
> >     link/ether 06:93:c6:00:00:04 brd ff:ff:ff:ff:ff:ff
> >     inet 192.168.3.103/16 brd 192.168.255.255 scope global eth3
> >     inet6 fe80::493:c6ff:fe00:4/64 scope link
> >        valid_lft forever preferred_lft forever
> >
> > root@s-3-VM:~# ip route
> > 169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.2.92
> > 192.168.0.0/16 dev eth1 proto kernel scope link src 192.168.3.102
> > 192.168.0.0/16 dev eth2 proto kernel scope link src 192.168.3.110
> > 192.168.0.0/16 dev eth3 proto kernel scope link src 192.168.3.103
> > default via 192.168.0.1 dev eth2
> >
> > root@s-3-VM:~# sysctl -a | grep ipv4.conf.*.arp
> > error: permission denied on key 'net.ipv4.route.flush'
> > net.ipv4.conf.all.proxy_arp = 0
> > net.ipv4.conf.all.arp_filter = 0
> > net.ipv4.conf.all.arp_announce = 2
> > net.ipv4.conf.all.arp_ignore = 2
> > net.ipv4.conf.all.arp_accept = 0
> > net.ipv4.conf.all.arp_notify = 0
> > net.ipv4.conf.default.proxy_arp = 0
> > net.ipv4.conf.default.arp_filter = 0
> > net.ipv4.conf.default.arp_announce = 2
> > net.ipv4.conf.default.arp_ignore = 2
> > net.ipv4.conf.default.arp_accept = 0
> > net.ipv4.conf.default.arp_notify = 0
> > net.ipv4.conf.lo.proxy_arp = 0
> > net.ipv4.conf.lo.arp_filter = 0
> > net.ipv4.conf.lo.arp_announce = 2
> > net.ipv4.conf.lo.arp_ignore = 2
> > net.ipv4.conf.lo.arp_accept = 0
> > net.ipv4.conf.lo.arp_notify = 0
> > net.ipv4.conf.eth0.proxy_arp = 0
> > net.ipv4.conf.eth0.arp_filter = 0
> > net.ipv4.conf.eth0.arp_announce = 2
> > net.ipv4.conf.eth0.arp_ignore = 2
> > net.ipv4.conf.eth0.arp_accept = 0
> > net.ipv4.conf.eth0.arp_notify = 0
> > net.ipv4.conf.eth1.proxy_arp = 0
> > net.ipv4.conf.eth1.arp_filter = 0
> > net.ipv4.conf.eth1.arp_announce = 2
> > net.ipv4.conf.eth1.arp_ignore = 2
> > net.ipv4.conf.eth1.arp_accept = 0
> > net.ipv4.conf.eth1.arp_notify = 0
> > net.ipv4.conf.eth2.proxy_arp = 0
> > net.ipv4.conf.eth2.arp_filter = 0
> > net.ipv4.conf.eth2.arp_announce = 2
> > net.ipv4.conf.eth2.arp_ignore = 2
> > net.ipv4.conf.eth2.arp_accept = 0
> > net.ipv4.conf.eth2.arp_notify = 0
> > net.ipv4.conf.eth3.proxy_arp = 0
> > net.ipv4.conf.eth3.arp_filter = 0
> > net.ipv4.conf.eth3.arp_announce = 2
> > net.ipv4.conf.eth3.arp_ignore = 2
> > net.ipv4.conf.eth3.arp_accept = 0
> > net.ipv4.conf.eth3.arp_notify = 0
> >
> > The behaviour is exactly what one would expect if arp_filter were
> > enabled on the interfaces, but the flag is clearly set to 0. Also,
> > setting arp_ignore to 0 does not cause the expected ARP flux problem,
> > as replies are sent only from the first virtual interface (eth1).
> > In a way, it looks as if policies were being enforced through
> > arptables, but the module does not seem to be loaded, nor is the
> > userspace utility available on the SSVM.
> >
> > Of course, changing the order in the route table as follows, i.e.
> > putting eth2 before eth1 for 192.168.0.0/16, solves the issue:
> >
> > 169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.2.92
> > 192.168.0.0/16 dev eth2 proto kernel scope link src 192.168.3.110
> > 192.168.0.0/16 dev eth1 proto kernel scope link src 192.168.3.102
> > 192.168.0.0/16 dev eth3 proto kernel scope link src 192.168.3.103
> > default via 192.168.0.1 dev eth2
> >
> > Quite interestingly, after this change ARP requests to eth2 are
> > honoured by the SSVM even after it is rebooted, and even if the
> > relevant ARP cache entry in the gateway is removed. Of course, this is
> > not the case when the SSVM is destroyed, as the new SSVM will have a
> > different MAC address for every interface.
> >
> > It is also worth noting that this problem does not occur in another
> > setup, where we configured an advanced zone, even though the network
> > configuration of the SSVM deployed there is very similar.
> >
> > It would be great if you could provide some advice for debugging this
> > issue, or share similar experiences.
> >
> > Regards,
> > Salvatore
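The effect of the route reordering can be illustrated with a small, simplified model of route selection. It uses longest-prefix match, and when several routes share a prefix, the one listed first wins. The model assumes equal metrics and TOS and no policy-routing rules. It also lists prefixes for which more than one interface has a connected route, which is the situation Anthony suggests avoiding by putting all same-subnet addresses on eth1. This is a sketch under those assumptions, not the kernel's actual FIB code, and it does not model ARP handling at all. The function names are hypothetical. The two route tables are copied from the thread.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Route:
    prefix: ipaddress.IPv4Network
    dev: str
    src: Optional[str] = None  # present only on connected ("scope link ... src") routes


def parse_routes(text: str) -> List[Route]:
    """Parse the simple `ip route` lines shown in the thread (not a general parser)."""
    routes = []
    for line in text.strip().splitlines():
        tokens = line.split()
        prefix = "0.0.0.0/0" if tokens[0] == "default" else tokens[0]
        dev = tokens[tokens.index("dev") + 1]
        src = tokens[tokens.index("src") + 1] if "src" in tokens else None
        routes.append(Route(ipaddress.ip_network(prefix), dev, src))
    return routes


def lookup(routes: List[Route], dst: str) -> Optional[Route]:
    """Longest-prefix match; on a tie the route listed first wins
    (simplified: equal metrics/TOS, no policy-routing rules)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for route in routes:
        if addr in route.prefix and (best is None or route.prefix.prefixlen > best.prefix.prefixlen):
            best = route
    return best


def shared_prefixes(routes: List[Route]) -> Dict[ipaddress.IPv4Network, List[str]]:
    """Prefixes for which more than one interface has a connected route."""
    by_prefix: Dict[ipaddress.IPv4Network, List[str]] = {}
    for route in routes:
        if route.src is not None:
            by_prefix.setdefault(route.prefix, []).append(route.dev)
    return {p: devs for p, devs in by_prefix.items() if len(devs) > 1}


ORIGINAL = """
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.2.92
192.168.0.0/16 dev eth1 proto kernel scope link src 192.168.3.102
192.168.0.0/16 dev eth2 proto kernel scope link src 192.168.3.110
192.168.0.0/16 dev eth3 proto kernel scope link src 192.168.3.103
default via 192.168.0.1 dev eth2
"""

REORDERED = """
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.2.92
192.168.0.0/16 dev eth2 proto kernel scope link src 192.168.3.110
192.168.0.0/16 dev eth1 proto kernel scope link src 192.168.3.102
192.168.0.0/16 dev eth3 proto kernel scope link src 192.168.3.103
default via 192.168.0.1 dev eth2
"""

if __name__ == "__main__":
    for name, table in (("original", ORIGINAL), ("reordered", REORDERED)):
        routes = parse_routes(table)
        # prints "original -> 192.168.0.1 via eth1", then "reordered -> 192.168.0.1 via eth2"
        print(name, "-> 192.168.0.1 via", lookup(routes, "192.168.0.1").dev)
    print(shared_prefixes(parse_routes(ORIGINAL)))
```

In this model the only thing the reordering changes is which of the three 192.168.0.0/16 routes is selected for on-link destinations such as the gateway. Whether that choice is what actually drives the SSVM's ARP behaviour is beyond what the sketch can show.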