Vish, Jay, OK, this looks promising. A couple of questions…
I'm seeing this RPC timeout on the Essex 2012.1 packages released with Ubuntu 12.04. I'm assuming these packages are affected by this bug? Why would something this fundamental not show up during Essex RC.X testing? How best to 'fix' this for our production environment (thank god it's only the research organization!) My previous testing of Essex (running on Diablo) didn't exhibit this problem. However during testing, I was configured using FlatDHCP networking versus our production cloud using VLAN networking. This was what lead me to believe that it might be a network configuration issue. Apparently not. As it stands right now, we're dead in the water, so I hope some easy fix for the Ubuntu Essex release is possible. Thanks for everyone looking this. Hope to hear a resolution soon. Regards, Ross On Jun 19, 2012, at 2:13 PM, Vishvananda Ishaya wrote: Sorry, paste fail on the last message. This seems like a likely culprit: https://review.openstack.org/#/c/8339/ I'm guessing it only happens on concurrent builds? We probably need a synchronized somewhere. Vish On Jun 19, 2012, at 12:03 PM, Jay Pipes wrote: cc'ing Vish on this, as this is now occurring on every single devstack + Tempest run, for multiple servers. Vish, I am seeing the exact same issue as shown below. Instances end up in ERROR state and looking into the nova-network log, I find *no* errors at all, and yet looking at the nova-compute log, I see multiple timeout errors -- all of them trying to RPC while in the allocate_network method. Always the same method, always the same error, and no errors in nova-network or nova-api (other than just reporting a failed build) Any idea on something that may have crept in recently? This wasn't happening a week or so ago, AFAICT. Best, -jay On 06/18/2012 06:03 PM, Lillie Ross-CDSR11 wrote: I'm receiving RPC timeouts when trying to launch an instance. My installation is the Essex release running on Ubuntu 12.04. When I launch a test image, the launch fails. In my setup, Nova network runs on a controller node, and all compute instances run on separate, dedicated server nodes. The failure is repeatable. Upon examining the various logs, I see the following (see below). Any insight would be welcome. Regards, Ross >From 'nova show <instance name>' I read the following: root@cirrus1:~# nova show test +-------------------------------------+-----------------------------------------------------------------------------+ | Property | Value | +-------------------------------------+-----------------------------------------------------------------------------+ | OS-DCF:diskConfig | MANUAL | | OS-EXT-SRV-ATTR:host | nova8 | | OS-EXT-SRV-ATTR:hypervisor_hostname | None | | OS-EXT-SRV-ATTR:instance_name | instance-00000005 | | OS-EXT-STS:power_state | 0 | | OS-EXT-STS:task_state | networking | | OS-EXT-STS:vm_state | error | | accessIPv4 | | | accessIPv6 | | | config_drive | | | created | 2012-06-18T20:42:56Z | | fault | {u'message': u'Timeout', u'code': 500, u'created': u'2012-06-18T20:43:58Z'} | | flavor | m1.tiny | | hostId | 50272989300483e2b5e5236cd572fef3f9149ae60faa5f5660f8da54 | | id | d569b16f-10a8-4cb8-90a3-d5b664c2322d | | image | tty-linux | | key_name | admin | | metadata | {} | | name | test | | private_0 network | | | status | ERROR | | tenant_id | 1 | | updated | 2012-06-18T20:43:57Z | | user_id | 1 | +-------------------------------------+-----------------------------------------------------------------------------+ >From the nova-network.log I see the following: 2012-06-18 15:43:36 DEBUG nova.manager [-] Running periodic task VlanManager._disassociate_stale_fixed_ips from (pid=1381) periodic_tasks /usr/lib/python2.7/dist-packages /nova/manager.py:152 2012-06-18 15:43:57 ERROR nova.rpc.common [-] Timed out waiting for RPC response: timed out 2012-06-18 15:43:57 TRACE nova.rpc.common Traceback (most recent call last): 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 490, in ensure 2012-06-18 15:43:57 TRACE nova.rpc.common return method(*args, **kwargs) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 567, in _consume 2012-06-18 15:43:57 TRACE nova.rpc.common return self.connection.drain_events(timeout=timeout) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 175, in drain_events 2012-06-18 15:43:57 TRACE nova.rpc.common return self.transport.drain_events(self.connection, **kwargs) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 238, in drain_events 2012-06-18 15:43:57 TRACE nova.rpc.common return connection.drain_events(**kwargs) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 57, in drain_events 2012-06-18 15:43:57 TRACE nova.rpc.common return self.wait_multi(self.channels.values(), timeout=timeout) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 63, in wait_multi 2012-06-18 15:43:57 TRACE nova.rpc.common chanmap.keys(), allowed_methods, timeout=timeout) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 120, in _wait_multiple 2012-06-18 15:43:57 TRACE nova.rpc.common channel, method_sig, args, content = read_timeout(timeout) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 94, in read_timeout 2012-06-18 15:43:57 TRACE nova.rpc.common return self.method_reader.read_method() 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method 2012-06-18 15:43:57 TRACE nova.rpc.common raise m 2012-06-18 15:43:57 TRACE nova.rpc.common timeout: timed out 2012-06-18 15:43:57 TRACE nova.rpc.common 2012-06-18 15:43:58 DEBUG nova.utils [req-16158a6b-f3d6-49f3-977e-3ccfecc791ca 1 1] Attempting to grab semaphore "get_dhcp" for method "_get_dhcp_ip"... from (pid=1381) i nner /usr/lib/python2.7/dist-packages/nova/utils.py:927 2012-06-18 15:43:58 DEBUG nova.utils [req-16158a6b-f3d6-49f3-977e-3ccfecc791ca 1 1] Got semaphore "get_dhcp" for method "_get_dhcp_ip"... from (pid=1381) inner /usr/lib/p ython2.7/dist-packages/nova/utils.py:931 2012-06-18 15:43:58 DEBUG nova.utils [req-16158a6b-f3d6-49f3-977e-3ccfecc791ca 1 1] Attempting to grab semaphore "get_dhcp" for method "_get_dhcp_ip"... from (pid=1381) i and from nova-compute.log I read: 2012-06-18 15:43:57 ERROR nova.rpc.common [req-16158a6b-f3d6-49f3-977e-3ccfecc791ca 1 1] Timed out waiting for RPC response: timed out 2012-06-18 15:43:57 TRACE nova.rpc.common Traceback (most recent call last): 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 490, in ensure 2012-06-18 15:43:57 TRACE nova.rpc.common return method(*args, **kwargs) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 567, in _consume 2012-06-18 15:43:57 TRACE nova.rpc.common return self.connection.drain_events(timeout=timeout) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 175, in drain_events 2012-06-18 15:43:57 TRACE nova.rpc.common return self.transport.drain_events(self.connection, **kwargs) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 238, in drain_events 2012-06-18 15:43:57 TRACE nova.rpc.common return connection.drain_events(**kwargs) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 57, in drain_events 2012-06-18 15:43:57 TRACE nova.rpc.common return self.wait_multi(self.channels.values(), timeout=timeout) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 63, in wait_multi 2012-06-18 15:43:57 TRACE nova.rpc.common chanmap.keys(), allowed_methods, timeout=timeout) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 120, in _wait_multiple 2012-06-18 15:43:57 TRACE nova.rpc.common channel, method_sig, args, content = read_timeout(timeout) 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 94, in read_timeout 2012-06-18 15:43:57 TRACE nova.rpc.common return self.method_reader.read_method() 2012-06-18 15:43:57 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method 2012-06-18 15:43:57 TRACE nova.rpc.common raise m 2012-06-18 15:43:57 TRACE nova.rpc.common timeout: timed out 2012-06-18 15:43:57 TRACE nova.rpc.common 2012-06-18 15:43:57 ERROR nova.compute.manager [req-16158a6b-f3d6-49f3-977e-3ccfecc791ca 1 1] [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] Instance failed network setup 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] Traceback (most recent call last): 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 559, in _allocate_network 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] requested_networks=requested_networks) 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] File "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 170, in allocate_for_instance 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] 'args': args}) 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] File "/usr/lib/python2.7/dist-packages/nova/rpc/__init__.py", line 68, in call 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] return _get_impl().call(context, topic, msg, timeout) 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 674, in call 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] return rpc_amqp.call(context, topic, msg, timeout, Connection.pool) 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 338, in call 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] rv = list(rv) 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 299, in __iter__ 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] self._iterator.next() 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 572, in iterconsume 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] yield self.ensure(_error_callback, _consume) 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 503, in ensure 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] error_callback(e) 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 553, in _error_callback 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] raise rpc_common.Timeout() 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] Timeout: Timeout while waiting on RPC response. 2012-06-18 15:43:57 TRACE nova.compute.manager [instance: d569b16f-10a8-4cb8-90a3-d5b664c2322d] 2012-06-18 15:43:58 ERROR nova.rpc.amqp [req-16158a6b-f3d6-49f3-977e-3ccfecc791ca 1 1] Exception during message handling 2012-06-18 15:43:58 TRACE nova.rpc.amqp Traceback (most recent call last): 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data 2012-06-18 15:43:58 TRACE nova.rpc.amqp rval = node_func(context=ctxt, **node_args) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped 2012-06-18 15:43:58 TRACE nova.rpc.amqp return f(*args, **kw) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 177, in decorated_function 2012-06-18 15:43:58 TRACE nova.rpc.amqp sys.exc_info()) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__ 2012-06-18 15:43:58 TRACE nova.rpc.amqp self.gen.next() 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 171, in decorated_function 2012-06-18 15:43:58 TRACE nova.rpc.amqp return function(self, context, instance_uuid, *args, **kwargs) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 651, in run_instance 2012-06-18 15:43:58 TRACE nova.rpc.amqp do_run_instance() 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 945, in inner 2012-06-18 15:43:58 TRACE nova.rpc.amqp retval = f(*args, **kwargs) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 650, in do_run_instance 2012-06-18 15:43:58 TRACE nova.rpc.amqp self._run_instance(context, instance_uuid, **kwargs) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 451, in _run_instance 2012-06-18 15:43:58 TRACE nova.rpc.amqp self._set_instance_error_state(context, instance_uuid) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__ 2012-06-18 15:43:58 TRACE nova.rpc.amqp self.gen.next() 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 424, in _run_instance 2012-06-18 15:43:58 TRACE nova.rpc.amqp requested_networks) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 559, in _allocate_network 2012-06-18 15:43:58 TRACE nova.rpc.amqp requested_networks=requested_networks) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 170, in allocate_for_instance 2012-06-18 15:43:58 TRACE nova.rpc.amqp 'args': args}) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/__init__.py", line 68, in call 2012-06-18 15:43:58 TRACE nova.rpc.amqp return _get_impl().call(context, topic, msg, timeout) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 674, in call 2012-06-18 15:43:58 TRACE nova.rpc.amqp return rpc_amqp.call(context, topic, msg, timeout, Connection.pool) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 338, in call 2012-06-18 15:43:58 TRACE nova.rpc.amqp rv = list(rv) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 299, in __iter__ 2012-06-18 15:43:58 TRACE nova.rpc.amqp self._iterator.next() 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 572, in iterconsume 2012-06-18 15:43:58 TRACE nova.rpc.amqp yield self.ensure(_error_callback, _consume) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 503, in ensure 2012-06-18 15:43:58 TRACE nova.rpc.amqp error_callback(e) 2012-06-18 15:43:58 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 553, in _error_callback 2012-06-18 15:43:58 TRACE nova.rpc.amqp raise rpc_common.Timeout() 2012-06-18 15:43:58 TRACE nova.rpc.amqp Timeout: Timeout while waiting on RPC response. 2012-06-18 15:43:58 TRACE nova.rpc.amqp 2012-06-18 15:44:05 DEBUG nova.manager [-] Running periodic task ComputeManager._publish_service_capabilities from (pid=13262) periodic_tasks /usr/lib/python2.7/dist-packages/nova/manager.py:152 2012-06-18 15:44:05 DEBUG nova.manager [-] Notifying Schedulers of capabilities ... from (pid=13262) _publish_service_capabilities /usr/lib/python2.7/dist-packages/nova/manager.py:203 2012-06-18 15:44:05 DEBUG nova.rpc.amqp [-] Making asynchronous fanout cast... from (pid=13262) fanout_cast /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py:354 2012-06-18 15:44:05 DEBUG nova.manager [-] Running periodic task ComputeManager._poll_rescued_instances from (pid=13262) periodic_tasks /usr/lib/python2.7/dist-packages/nova/manager.py:152 2012-06-18 15:44:05 DEBUG nova.manager [-] Skipping ComputeManager._sync_power_states, 1 ticks left until next run from _______________________________________________ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net<mailto:openstack@lists.launchpad.net> Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp _______________________________________________ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net<mailto:openstack@lists.launchpad.net> Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
_______________________________________________ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp