Hi all,

My apologies if this is the wrong list for such questions, but we're running out of ideas with a weird and sporadic network/routing issue on several OpenVZ hosts.

Quick rundown: We have used OpenVZ since around 2006 with great success, and over the years we have deployed hundreds of OpenVZ servers for clients, typically small to medium sized ISPs.

However, there is a particular set of three OpenVZ nodes (in the same data center) that gives us network related grief without end. This client has nodes and VPS's on private IPs and does NAT. There is a switch and a pfSense firewall in front of them.

All nodes are pretty powerful (24 cores, Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz, ~200GB of RAM) and run fully yum-updated Scientific Linux 6.6 (64-bit) with the latest OpenVZ kernels and VZ RPMs from OpenVZ. Small correction: one node was rolled back to 2.6.32-042stab092.2, but it showed the same problems again afterwards.

The nodes use a bridged network setup (br0) and all VPS's exclusively use venet-style network devices. The OS's inside the VPS's are diverse and range from EL5, EL6 and Fedora (various) to OpenSuSE.

Symptoms:
=========

All nodes and VPS's sporadically become unreachable from the outside. From within the private network (from another box, for example) one can still SSH in. Pings to public IPs then still work from the nodes, but no longer work from inside the VPS's.

This used to happen maybe once a year. Now it happens once or twice a day, and usually once one node acts up like this the others start to act the same way within an hour or two. Today all three failed within 2-3 hours, and yesterday two of them failed at roughly the same time.

We did a lot of troubleshooting, doc reading and googling, but we are pretty much at our wits' end and are looking for help.

So far the only remedy in case of a failure seems to be a reboot, which is (naturally) by now rocking the boat a lot more than is tolerable. Restarting the network, dropping and re-adding routes manually, restarting the "vz" service, restarting individual VPS's, or any combination of these simply doesn't restore connectivity.
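
For anyone who wants to reproduce the check behind the symptom description, here is a minimal sketch of how the "node can ping out, VPS cannot" state could be recorded when it next occurs. It is not something we currently run. The target address (192.0.2.1) and the container ID (101) are placeholders, the function names are made up for illustration, and it assumes Python 3 plus `ping` and `vzctl exec` being available on the node.

```python
import subprocess


def ping_cmd(target, ctid=None):
    """Build a one-packet ping command; wrap it in `vzctl exec` for a container."""
    cmd = ["ping", "-c", "1", "-W", "2", target]
    if ctid is not None:
        # Run the same ping from inside container `ctid` (hypothetical ID).
        cmd = ["vzctl", "exec", str(ctid)] + cmd
    return cmd


def classify(node_ok, ct_ok):
    """Name the state given outbound reachability from the node and a container."""
    if node_ok and ct_ok:
        return "healthy"
    if node_ok:
        # Matches the reported symptom: node reaches public IPs, VPS does not.
        return "container-path-broken"
    if ct_ok:
        return "node-only-broken"
    return "all-outbound-broken"


def probe_once(target, ctids, run=None):
    """Ping `target` from the node and from each container; return state per CTID."""
    if run is None:
        # Default runner: exit status 0 means the ping got a reply.
        def run(cmd):
            return subprocess.call(
                cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )
    node_ok = run(ping_cmd(target)) == 0
    return {c: classify(node_ok, run(ping_cmd(target, c)) == 0) for c in ctids}
```

Calling `probe_once("192.0.2.1", [101])` from cron on each node and logging the result with a timestamp would at least show which node enters the broken state first and how quickly the others follow.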

When a failure occurs, neither the routing table nor the ARP table on the nodes seems to change, and we're running no iptables rules on the nodes.

It could be that the problem is related to external factors, too. Or maybe we made a mistake in our configuration, either on the nodes or in the network architecture in general.

We would *really* appreciate some feedback and assistance to solve this issue. I compiled a snapshot of the configuration and some diagnostic output at this URL:

http://d2.smd.net/.210/host210-OpenVZ-nw-issues.txt

Any ideas or pointers? Many thanks in advance.

-- 
With best regards

Michael Stauber
mstau...@blueonyx.it

_______________________________________________
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users