Hey neutron devs!

I'm having a serious problem with my neutron router getting spin-locked in nf_conntrack_tuple_taken. Has anybody else experienced this?

"perf top" shows nf_conntrack_tuple_taken at 75%. As the incoming request rate goes up, nf_conntrack_tuple_taken runs very hot on CPU0, causing ksoftirqd/0 to run at 100%. At that point internal pings on the GRE network go sky high and it's game over. Pinging from a VM to the subnet default gateway on the neutron router goes from 0.2ms to 11s! Pinging from the same VM to another VM in the same subnet stays constant at 0.2ms.
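
For anyone who wants to compare notes on the CPU0 hot spot, here is a minimal sketch of how the per-CPU split of network softirqs could be checked. It assumes the usual /proc/softirqs layout (a header row of CPU names, then one "NAME: counts..." row per softirq type). The function names parse_softirqs and cpu_share are made up for illustration and are not part of any neutron or kernel tooling.

```python
def parse_softirqs(text):
    """Parse the contents of /proc/softirqs into {name: [count per CPU]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        name, _, rest = ln.partition(":")
        counts = [int(x) for x in rest.split()]
        table[name.strip()] = counts[:len(cpus)]
    return table


def cpu_share(counts):
    """Return each CPU's fraction of the total (all zeros if the total is 0)."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    with open("/proc/softirqs") as fh:
        table = parse_softirqs(fh.read())
    for name in ("NET_RX", "NET_TX"):
        shares = cpu_share(table.get(name, []))
        print(name, " ".join(f"{s:.0%}" for s in shares))
```

The counters are cumulative since boot, so to see the split under load one would take two samples a few seconds apart and look at the difference. If nearly all NET_RX work lands on CPU0, that would be consistent with ksoftirqd/0 being the only thread pegged at 100%.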

The ping results very much indicate to me that the neutron router is having serious problems. No other part of the system seems to be under pressure.

IPv6 is disabled, and nf_conntrack_max/nf_conntrack_hash are set to 256k. We've tried the default 3.13 kernel and the utopic 3.16 kernel (3.16 has lots of work on removing spinlocks around nf_conntrack). 3.16 survives a little longer but still ends up in the same state.

Neutron router
1 x Ubuntu 14.04/Icehouse 2014.1.1 on an IBM x3550 with 4 10G Intel NICs
eth0 - Mgt
eth1 - GRE
eth2 - Public
eth3 - unused

Compute/controller nodes
43 x Ubuntu 14.04/Icehouse 2014.1.1 IBM x240 Flex blades with 4 Emulex NICs
eth0 - Mgt
eth2 - GRE

Any help very much appreciated! Replacing the L2/L3 functions with hardware is very much an option if that's a better solution. I'm running out of time before my client decides to stay on AWS.

BR,
Stuart

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev