Hi,

Some people in the field brought some interesting feedback:

"As a TripleO user, I would like the deployment to stop immediately after a resource creation failure during a step of the deployment, and to be able to easily understand which service or resource failed to be installed."

Example: if during step 4 Puppet tries to deploy Neutron and OVS, but OVS fails to start for some reason, the deployment should stop at the end of the step.

So there are 2 things in this user story:

1) Be able to run some service validation within a deployment step.
   Note about the implementation: make the validation composable per service (OVS, nova, etc.) and not per role (compute, controller, etc.).

2) Make this information readable and easy to access and understand for our users.

I have a proof-of-concept for 1) and partially 2), with the example of OVS:
https://review.openstack.org/#/c/342202/

This patch makes sure OVS is actually usable at step 4 by running 'ovs-vsctl show' during the Puppet catalog run; if it works, it creates a Puppet anchor. This anchor is currently not useful, but could be in the future if we want to rely on it for orchestration.

I wrote the service validation code in Puppet two years ago, when doing Spinal Stack with eNovance:
https://github.com/openstack/puppet-openstacklib/blob/master/manifests/service_validation.pp

I think we could re-use it very easily; it has been proven to work. Also, the code is within our Puppet profiles, so it's composable by design and we don't need any magic to connect it to our current services. Validation will reside within Puppet manifests. If you look at my PoC, this code could even live in puppet-vswitch itself (we already have this code for puppet-nova, and some others).

OK, now what if validation fails?
I'm testing it here:
https://review.openstack.org/#/c/342205/

If you look at /var/log/messages, you'll see:

  Error: /Stage[main]/Tripleo::Profile::Base::Neutron::Ovs/Openstacklib::Service_validation[openvswitch]/Exec[execute openvswitch validation]/returns: change from notrun to 0 failed

So it's pretty clear from the logs that the openvswitch service validation failed and that something is wrong. You'll also notice in the logs that the deployment stopped at step 4, since OVS is not considered to be running.

This only partially addresses 2), because we need to make it more explicit and readable. Dan Prince had the idea to use https://github.com/ripienaar/puppet-reportprint to print a nice report of the Puppet catalog result (we haven't tried it yet). We could also use Operational Tools later to monitor Puppet logs and find service validation failures.

So this email is a bootstrap for discussion, and it's open for feedback. Don't take my PoC as something we'll implement; it's an idea, and I think it's worth looking at.

I like it for 2 reasons:
- the validation code resides within our profiles, so it's composable by design.
- it's flexible and allows us to test everything. It can be a bash script, a shell command, a Puppet resource (provider, service, etc.).

Thanks for reading so far,
-- 
Emilien Macchi

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev