Hi,

"nova-manage service list" does not convey the loss of dependent services

Description: Even when the RabbitMQ service is down, the nova-compute service 
status still shows as alive, so VMs continue to be scheduled to this node and 
get stuck.

Proposed Solution:

We need to detect compute managers that have failed to contact a dependent 
service such as RabbitMQ or libvirt. Once the problem has been identified 
(after a configurable number of retries), the compute service should disable 
itself, logging the reason in the database, and should re-enable itself once 
the detected problem is resolved.
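
As a rough illustration, the detect / disable / re-enable cycle described 
above could look like the Python sketch below. This is not Nova code: 
ServiceRecord, DependencyMonitor, the probe callables and max_retries are all 
hypothetical names chosen for the example, and the "database" is just an 
in-memory object.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ServiceRecord:
    """Stand-in for the service's database row (hypothetical, not Nova's model)."""
    disabled: bool = False
    reason: Optional[str] = None


class DependencyMonitor:
    """Disables a service after repeated dependency failures and
    re-enables it once every dependency is reachable again."""

    def __init__(self, record: ServiceRecord,
                 checks: Dict[str, Callable[[], bool]],
                 max_retries: int = 3):
        self.record = record
        self.checks = checks            # name -> probe returning True when reachable
        self.max_retries = max_retries  # the "configurable number of retries"
        self.failures = {name: 0 for name in checks}
        self.auto_disabled = False      # True only if *we* disabled the service

    def poll(self) -> None:
        """Run every probe once; intended to be called from a periodic task."""
        down = []
        for name, probe in self.checks.items():
            try:
                ok = bool(probe())
            except Exception:
                ok = False              # a probe that raises counts as a failure
            self.failures[name] = 0 if ok else self.failures[name] + 1
            if self.failures[name] >= self.max_retries:
                down.append(name)

        if down:
            self.record.disabled = True
            self.record.reason = "unable to contact " + ", ".join(down)
            self.auto_disabled = True
        elif self.auto_disabled:
            # Only undo our own disable, never one set by an administrator.
            self.record.disabled = False
            self.record.reason = None
            self.auto_disabled = False
```

The auto_disabled flag is a design choice of this sketch: it keeps the 
monitor from re-enabling a service that an operator disabled by hand.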

With the proposed solution, when the RabbitMQ service is down, the service 
list would show the following:

nova-network nv-aw1st21-compute0001 nova disabled (unable to contact RabbitMQ) enabled 2013-01-02 05:33:00

Implementing this approach requires introducing a new column, "reason".
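
To make the schema change concrete, here is a minimal sketch using Python's 
built-in sqlite3 module. The table name "services", its other columns, and 
the host name are assumptions made for the example; the real Nova schema and 
migration tooling will differ.

```python
import sqlite3

# In-memory database with a simplified, hypothetical services table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE services ("
    " id INTEGER PRIMARY KEY,"
    " host TEXT,"
    " binary_name TEXT,"
    " disabled INTEGER DEFAULT 0)"
)
conn.execute(
    "INSERT INTO services (host, binary_name) VALUES (?, ?)",
    ("compute0001", "nova-compute"),
)

# The proposed change: a nullable free-text column holding the disable reason.
conn.execute("ALTER TABLE services ADD COLUMN reason TEXT")

# What the compute service would record when it disables itself.
conn.execute(
    "UPDATE services SET disabled = 1, reason = ? WHERE host = ?",
    ("unable to contact RabbitMQ", "compute0001"),
)

row = conn.execute(
    "SELECT disabled, reason FROM services WHERE host = ?",
    ("compute0001",),
).fetchone()
```

Existing rows simply get NULL in the new column, so services that were never 
disabled are unaffected.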

Please comment on the proposed approach.

Thanks,
Kobagana Kumar

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp