Hi,

I have the following setup:

1) Infrastructure node, with its IP on a bonded interface, hosting the following KVM guests:
1.1) Postgres KVM guest
1.2) MQ KVM guest
1.3) DNS KVM guest
1.4) Control node with Nova API, Cinder API, Quantum Server, etc.
...
1.8) Quantum network node with quantum agents

The agents on this network node keep dying and coming back up:

# quantum agent-list
+--------------------------------------+--------------------+-----------------------------+-------+----------------+
| id                                   | agent_type         | host                        | alive | admin_state_up |
+--------------------------------------+--------------------+-----------------------------+-------+----------------+
| 5656392b-b6fe-4570-802f-97d2154acf31 | L3 agent           | net01-001.int.net.net       | xxx   | True           |
| 1093fb73-6622-448e-8dad-558a36cca306 | DHCP agent         | net01-001.int.net.net       | xxx   | True           |
| 4518830d-e112-439f-a629-7defa7bd29e9 | Open vSwitch agent | net01-001.int.net.net       | xxx   | True           |
| 86ee6d24-2e6a-4f58-addb-290fefc26401 | Open vSwitch agent | nova05                      | :-)   | True           |
| b67697bb-3ec1-49fc-8f3c-7e4e7892e83a | Open vSwitch agent | nova04                      | :-)   | True           |
+--------------------------------------+--------------------+-----------------------------+-------+----------------+

A few minutes later those agents come back up, and then another one may die while the others stay alive.
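
As I understand it, the "alive" column is based on the periodic state reports the agents send over the message queue; when a report arrives later than agent_down_time, the agent is shown as xxx. The relevant knobs in quantum.conf look roughly like this (values below are only an illustration, not necessarily what I'm running):

[DEFAULT]
# seconds the server waits for a state report before marking an agent dead
agent_down_time = 15

[AGENT]
# seconds between state reports from each agent
report_interval = 5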

ping net01-001
PING net01-001.int.net.net (10.10.146.34) 56(84) bytes of data.
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=1 ttl=64 time=0.912 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=2 ttl=64 time=0.273 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=2 ttl=64 time=0.319 ms (DUP!)
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=3 ttl=64 time=0.190 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=4 ttl=64 time=0.230 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=4 ttl=64 time=0.305 ms (DUP!)
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=5 ttl=64 time=0.199 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=7 ttl=64 time=0.211 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=8 ttl=64 time=0.322 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=8 ttl=64 time=0.409 ms (DUP!)
^C
--- net01-001.int.net.net ping statistics ---
8 packets transmitted, 7 received, +3 duplicates, 12% packet loss, time 7017ms

SSH'ing to the network node is also difficult - the session constantly freezes. Nothing suspicious in the logs.
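
Given the DUP! replies and the fact that the host IP sits on a bond, the bond/switch side probably deserves a look too; something along these lines (assuming the bond interface is called bond0 - the name is just an example):

# cat /proc/net/bonding/bond0    # bonding mode, slave/link state, failure counters
# tcpdump -n -i bond0 icmp       # watch whether the duplicates also show up on the wire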




Since the DHCP agent may be down, spawning a VM can get stuck in a "waiting for network 
device" state. Eventually it may get its internal IP and then a floating IP, but 
reaching it is still very troublesome - I believe because of the L3 agent flapping.
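
In case it helps with reproducing: whether the DHCP/L3 side is actually serving a given network can be checked like this (the UUIDs are placeholders, and the namespace/dnsmasq checks are meant to be run on the network node):

# quantum dhcp-agent-list-hosting-net <net-uuid>       # which DHCP agent should serve the network
# quantum l3-agent-list-hosting-router <router-uuid>   # which L3 agent hosts the router
# ip netns list                                        # are the qdhcp-*/qrouter-* namespaces present?
# ps -ef | grep dnsmasq                                # is dnsmasq running for the tenant network?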

My OpenStack was set up following this guide:
https://github.com/mseknibilel/OpenStack-Grizzly-Install-Guide/blob/OVS_MultiNode/OpenStack_Grizzly_Install_Guide.rst

The only thing I added on top of it is HAProxy/keepalived, load balancing API requests 
across the control nodes. But that shouldn't impact networking...
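
For completeness, the API balancing is roughly of this shape (server names and addresses below are made up for illustration, not my actual config):

listen quantum-api
    bind 10.10.146.10:9696                  # keepalived VIP (placeholder address)
    mode tcp
    balance roundrobin
    server control01 10.10.146.11:9696 check
    server control02 10.10.146.12:9696 check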


Anyone have any thoughts about this?

Cheers,
NM
