So I did some further ping tests and explored the differences between my working
compute nodes and my non-working compute node. Firstly, it seems that VXLAN is
working between the non-working compute node and the controller nodes. After
manually setting IP addresses, I can ping from an instance on the non-working
node to 172.16.1.1 (the neutron gateway); when running tcpdump (commands
sketched below), I can see the ICMP traffic on:
-compute's bridge interface
-compute's vxlan interface
-controller's vxlan interface
-controller's bridge interface
-controller's qrouter namespace
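For reference, the captures were along these lines. The names below are
placeholders and assume the linuxbridge agent; the actual bridge/vxlan
interface names and the router UUID depend on the network ID, the VNI, and
the router:

    # on the compute node
    tcpdump -ni brq<network-id> icmp
    tcpdump -ni vxlan-<vni> icmp

    # on the controller
    tcpdump -ni vxlan-<vni> icmp
    tcpdump -ni brq<network-id> icmp
    ip netns exec qrouter-<router-uuid> tcpdump -ni qr-<port-id> icmp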

This behavior is expected and is the same for instances on the working compute
nodes. However, if I try to ping 172.16.1.2 (neutron DHCP) from an instance on
the non-working compute node, pings do not flow. If I use tcpdump to listen
for them I cannot hear any, even on the compute node itself; this includes
listening on the vxlan interface, the bridge, and the tap device directly. But
once I ping in the reverse direction, from the dhcp netns on the controller to
the instance on the non-working compute node, pings begin to flow. The same is
true for pings between the instance on the non-working compute and an instance
on a working compute: pings do not flow until the working instance pings
first. Once pings are flowing between the non-working instance and neutron
DHCP, I run dhclient on the instance and start listening for DHCP requests
with tcpdump (see the sketch below), and I hear them on:
-compute's bridge interface
-compute's vxlan interface
They never make it to the controller node.
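The unblocking ping and the DHCP capture were roughly as follows (names are
again placeholders, assuming the linuxbridge agent):

    # on the controller: ping the instance from the dhcp namespace
    ip netns exec qdhcp-<network-uuid> ping <instance-ip>

    # on the instance, once pings are flowing
    dhclient -v eth0

    # on the compute node: listen for the DHCP traffic
    tcpdump -ni brq<network-id> port 67 or port 68
    tcpdump -ni vxlan-<vni> port 67 or port 68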

I've re-enabled l2-population on the controllers and rebooted them just in
case, but the problem persists. A diff of /etc/ across all compute nodes shows
that all OpenStack- and networking-related configuration is effectively
identical. The last difference between the non-working compute node and the
working compute nodes, as far as I can tell, is that the new node has a
different network card. The working nodes use a "Broadcom Limited NetXtreme II
BCM57712 10 Gigabit Ethernet" and the non-working node uses a "NetXen
Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter".
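In case it points at anything: since l2-population programs the VXLAN
forwarding table directly, comparing the fdb of a working node against the
non-working one should show whether the remote VTEP entries are being
installed (the VNI is a placeholder):

    # with l2-population, each remote VTEP should appear as a
    # "00:00:00:00:00:00 dst <remote-vtep-ip> ... permanent" entry
    bridge fdb show dev vxlan-<vni>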

Are there any known issues with neutron and this brand of network adapter? I
looked at the capabilities of both adapters, and here are the differences:

Feature                         Broadcom      NetXen
tx-tcp-ecn-segmentation         on            off [fixed]
rx-vlan-offload                 on [fixed]    off [fixed]
receive-hashing                 on            off [fixed]
rx-vlan-filter                  on            off [fixed]
tx-gre-segmentation             on            off [fixed]
tx-gre-csum-segmentation        on            off [fixed]
tx-ipxip4-segmentation          on            off [fixed]
tx-udp_tnl-segmentation         on            off [fixed]
tx-udp_tnl-csum-segmentation    on            off [fixed]
tx-gso-partial                  on            off [fixed]
loopback                        off           off [fixed]
rx-udp_tunnel-port-offload      on            off [fixed]
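For completeness, the feature lists above come from ethtool; something like
the following reproduces and compares them (eth0 and the file names are
placeholders), and any feature not marked [fixed] can be toggled for testing:

    # dump offload features on each node, then diff the files
    ethtool -k eth0 > ethtool-k.$(hostname)
    diff ethtool-k.working-node ethtool-k.nonworking-node

    # features not marked [fixed] can be flipped, e.g. on a working node:
    ethtool -K eth0 tx-udp_tnl-segmentation off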

