I'll close this because of inactivity. Please reopen this if you still see the problem.
** Changed in: oslo.messaging Status: Incomplete => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1404241 Title: nova-compute state not updated Status in OpenStack Compute (nova): Expired Status in oslo.messaging: Invalid Bug description: I'm running 2014.2.1 on CentOS7. 1 controller and 5 compute nodes are deployed using packstack. Whenever I reboot the controller node, some of nova-compute services report state=XXX even after 60 minutes after reboot completed and controller node is up and running again: [root@juno1 ~(keystone_admin)]# nova-manage service list Binary Host Zone Status State Updated_At nova-consoleauth juno1 internal enabled :-) 2014-12-19 13:17:48 nova-scheduler juno1 internal enabled :-) 2014-12-19 13:17:47 nova-conductor juno1 internal enabled :-) 2014-12-19 13:17:47 nova-cert juno1 internal enabled :-) 2014-12-19 13:17:48 nova-compute juno4 nova enabled XXX 2014-12-19 12:26:56 nova-compute juno5 nova enabled :-) 2014-12-19 13:17:47 nova-compute juno6 nova enabled :-) 2014-12-19 13:17:46 nova-compute juno3 nova enabled :-) 2014-12-19 13:17:46 nova-compute juno2 nova enabled XXX 2014-12-19 12:21:52 Here is the chunk of nova-compute log from juno4: 2014-12-19 15:46:02.082 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds... 2014-12-19 15:46:02.083 5193 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to consume message from queue: Socket closed 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last): 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 655, in ensure 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit return method() 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 735, in _consume 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit return self.connection.drain_events(timeout=timeout) 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit return self.transport.drain_events(self.connection, **kwargs) 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit return connection.drain_events(**kwargs) 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 299, in drain_events 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit chanmap, None, timeout=timeout, 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 362, in _wait_multiple 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit channel, method_sig, args, content = read_timeout(timeout) 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 333, in read_timeout 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit return self.method_reader.read_method() 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit raise m 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed 2014-12-19 15:46:02.083 5193 TRACE oslo.messaging._drivers.impl_rabbit 2014-12-19 15:46:02.084 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds... 2014-12-19 15:46:03.084 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 9.167.185.90:5672 2014-12-19 15:46:03.096 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 9.167.185.90:5672 2014-12-19 15:46:03.105 5193 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 9.167.185.90:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds. 2014-12-19 15:46:03.105 5193 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 9.167.185.90:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds. 2014-12-19 15:46:04.106 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds... 2014-12-19 15:46:04.106 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds... 2014-12-19 15:46:05.106 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 9.167.185.90:5672 2014-12-19 15:46:05.116 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 9.167.185.90:5672 2014-12-19 15:46:05.157 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 9.167.185.90:5672 2014-12-19 15:46:05.159 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 9.167.185.90:5672 2014-12-19 15:46:33.229 5193 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources 2014-12-19 15:46:33.781 5193 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'conductor': [Errno 104] Connection reset by peer 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last): 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 655, in ensure 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit return method() 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 752, in _publish 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, self.channel, topic=topic, **kwargs) 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 378, in __init__ 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit **options) 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 326, in __init__ 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit self.reconnect(channel) 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_rabbit.py", line 334, in reconnect 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit routing_key=self.routing_key) 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 83, in __init__ 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit self.revive(self._channel) 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 214, in revive 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit self.declare() 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/messaging.py", line 100, in declare 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit self.exchange.declare() 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 163, in declare 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit nowait=nowait, passive=passive, 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 613, in exchange_declare 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit self._send_method((40, 10), args) 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 56, in _send_method 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit self.channel_id, method_sig, args, content, 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 221, in write_method 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit write_frame(1, channel, payload) 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 177, in write_frame 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit frame_type, channel, size, payload, 0xce, 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 359, in sendall 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit tail = self.send(data, flags) 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 342, in send 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit total_sent += fd.send(data[total_sent:], flags) 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer 2014-12-19 15:46:33.781 5193 TRACE oslo.messaging._drivers.impl_rabbit 2014-12-19 15:46:33.782 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds... 2014-12-19 15:46:34.782 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 9.167.185.90:5672 2014-12-19 15:46:34.796 5193 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 9.167.185.90:5672 I see, that nova-compute lost AMQ connection with controller node because of reboot, but then connection was established again after reboot completed. I also can manage running instances on that compute node and see correct instance states. The problem can be solved if I do: systemctl restart openstack-nova- compute on nodes with XXX state. After that I see expected :-) To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1404241/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp