Hi,
We have a test environment where a basic zone is deployed.
Both system VMs and guest addresses are in the 192.168.0.0/16 subnet, although
with distinct IP ranges.
We noticed that the SSVM is unable to download templates, as the connection
over the public interface (eth2) is suddenly dropped (see attached dump).
As can be seen from the dump, the connection drops because the SSVM fails to
answer ARP requests from the gateway on eth2.
ARP requests sent to eth2's address from other machines in the same network
also go unanswered.
Here is the relevant configuration information from the SSVM:
root@s-3-VM:~# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 0e:00:a9:fe:02:5c brd ff:ff:ff:ff:ff:ff
inet 169.254.2.92/16 brd 169.254.255.255 scope global eth0
inet6 fe80::c00:a9ff:fefe:25c/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 06:de:1e:00:00:03 brd ff:ff:ff:ff:ff:ff
inet 192.168.3.102/16 brd 192.168.255.255 scope global eth1
inet6 fe80::4de:1eff:fe00:3/64 scope link
valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 06:5d:f6:00:00:0b brd ff:ff:ff:ff:ff:ff
inet 192.168.3.110/16 brd 192.168.255.255 scope global eth2
inet6 fe80::45d:f6ff:fe00:b/64 scope link
valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 06:93:c6:00:00:04 brd ff:ff:ff:ff:ff:ff
inet 192.168.3.103/16 brd 192.168.255.255 scope global eth3
inet6 fe80::493:c6ff:fe00:4/64 scope link
valid_lft forever preferred_lft forever
root@s-3-VM:~# ip route
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.2.92
192.168.0.0/16 dev eth1 proto kernel scope link src 192.168.3.102
192.168.0.0/16 dev eth2 proto kernel scope link src 192.168.3.110
192.168.0.0/16 dev eth3 proto kernel scope link src 192.168.3.103
default via 192.168.0.1 dev eth2
root@s-3-VM:~# sysctl -a | grep ipv4.conf.*.arp
error: permission denied on key 'net.ipv4.route.flush'
net.ipv4.conf.all.proxy_arp = 0
net.ipv4.conf.all.arp_filter = 0
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 2
net.ipv4.conf.all.arp_accept = 0
net.ipv4.conf.all.arp_notify = 0
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.default.arp_ignore = 2
net.ipv4.conf.default.arp_accept = 0
net.ipv4.conf.default.arp_notify = 0
net.ipv4.conf.lo.proxy_arp = 0
net.ipv4.conf.lo.arp_filter = 0
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.lo.arp_ignore = 2
net.ipv4.conf.lo.arp_accept = 0
net.ipv4.conf.lo.arp_notify = 0
net.ipv4.conf.eth0.proxy_arp = 0
net.ipv4.conf.eth0.arp_filter = 0
net.ipv4.conf.eth0.arp_announce = 2
net.ipv4.conf.eth0.arp_ignore = 2
net.ipv4.conf.eth0.arp_accept = 0
net.ipv4.conf.eth0.arp_notify = 0
net.ipv4.conf.eth1.proxy_arp = 0
net.ipv4.conf.eth1.arp_filter = 0
net.ipv4.conf.eth1.arp_announce = 2
net.ipv4.conf.eth1.arp_ignore = 2
net.ipv4.conf.eth1.arp_accept = 0
net.ipv4.conf.eth1.arp_notify = 0
net.ipv4.conf.eth2.proxy_arp = 0
net.ipv4.conf.eth2.arp_filter = 0
net.ipv4.conf.eth2.arp_announce = 2
net.ipv4.conf.eth2.arp_ignore = 2
net.ipv4.conf.eth2.arp_accept = 0
net.ipv4.conf.eth2.arp_notify = 0
net.ipv4.conf.eth3.proxy_arp = 0
net.ipv4.conf.eth3.arp_filter = 0
net.ipv4.conf.eth3.arp_announce = 2
net.ipv4.conf.eth3.arp_ignore = 2
net.ipv4.conf.eth3.arp_accept = 0
net.ipv4.conf.eth3.arp_notify = 0
The behaviour is exactly what one would expect if arp_filter were enabled on
the interfaces, but the flag is clearly set to 0. Also, setting arp_ignore to 0
does not cause the expected ARP flux problem, as replies are sent only from the
first virtual interface (eth1). In a way, it looks as if policies were being
enforced through arptables, but the module does not seem to be loaded, nor is
the userspace utility available on the SSVM.
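For anyone wanting to cross-check the flags above without going through
sysctl -a (which also avoids the permission error on net.ipv4.route.flush), a
minimal sketch that reads the values straight from /proc could look like the
following. The function name show_arp_flags and its optional directory argument
are made up for illustration; they are not part of any tool shipped on the SSVM.

```shell
# Print "<iface> <flag> <value>" for every ARP-related flag found under a
# per-interface conf directory.
# $1 (optional, hypothetical parameter): directory to scan; defaults to the
# live kernel settings in /proc/sys/net/ipv4/conf.
show_arp_flags() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for f in "$conf_dir"/*/arp_* "$conf_dir"/*/proxy_arp; do
        # Skip unmatched glob patterns and unreadable entries.
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        printf '%s %s %s\n' "$iface" "$(basename "$f")" "$(cat "$f")"
    done
}
```

Calling show_arp_flags with no argument reads the live settings, and piping the
result through grep eth2 narrows it to the public interface. Whether any
arptables kernel module is present can be checked separately with
lsmod | grep arp.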
Of course, changing the order in the route table as follows, i.e. putting eth2
before eth1 for 192.168.0.0/16, solves the issue.
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.2.92
192.168.0.0/16 dev eth2 proto kernel scope link src 192.168.3.110
192.168.0.0/16 dev eth1 proto kernel scope link src 192.168.3.102
192.168.0.0/16 dev eth3 proto kernel scope link src 192.168.3.103
default via 192.168.0.1 dev eth2
Quite interestingly, after this change ARP requests to eth2 are honoured by the
SSVM even after it is rebooted, and even if the relevant ARP cache entry in the
gateway is removed. Of course, this is not the case when the SSVM is destroyed,
as the new SSVM will have a different MAC address for every interface.
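Since the order of the 192.168.0.0/16 entries seems to be what matters here, a
small helper to see which device is listed first for a prefix may be handy when
comparing SSVMs. This is only a sketch: the name first_dev_for_prefix is
invented for this mail, and it merely parses the text printed by ip route. It
assumes, in line with what we observe but without us having verified it in the
kernel, that the first listed entry among otherwise identical routes is the one
that wins.

```shell
# Print the device of the first route entry whose destination equals the
# given prefix. Reads `ip route` style output on stdin.
# $1: destination exactly as printed by ip route, e.g. 192.168.0.0/16
first_dev_for_prefix() {
    awk -v prefix="$1" '
        $1 == prefix && !found {
            # The device name is the token that follows "dev".
            for (i = 2; i < NF; i++)
                if ($i == "dev") { print $(i + 1); found = 1; break }
        }'
}
```

On the SSVM this would be used as ip route | first_dev_for_prefix 192.168.0.0/16.
The route the kernel actually selects for the gateway can be queried directly
with ip route get 192.168.0.1, which is the more authoritative check.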
It is also interesting to note that in another setup, where we configured an
advanced zone, this problem does not occur. Even though that SSVM is deployed
in an advanced zone, its network configuration is very similar.
It would be great if you could provide some advice for debugging this issue, or
share similar experiences.
Regards,
Salvatore