On 07/03/2013 01:08 PM, David Kranz wrote:
On 07/03/2013 12:30 PM, Day, Phil wrote:
Hi Folks,
I have a change submitted which adds the same clean shutdown logic to
stop and delete that exists for soft reboot – the rationale being that
it's always better to give a VM a chance to shut down cleanly if
possible even if you’re about to delete it, as sometimes other parts of
the application expect this, and if it's booted from a volume you want
to leave the guest file system in a tidy state.
https://review.openstack.org/#/c/35303/
However setting the default value to 120 seconds (as per soft reboot)
causes the Jenkins gate jobs to blow the 3 hour limit. This seems to
be just a gradual accumulation of extra time rather than any one test
running much longer.
So options would seem to be:
i) Make the default wait time much shorter so that Jenkins runs OK
(tried this with 10 seconds and it works fine), and assume that users
will configure it to a more realistic value.
ii) Keep the default at 120 seconds, but make the Jenkins jobs use a
specific configuration setting (is this possible, and if so can
someone point me at where to make the change)?
iii) Increase the time allowed for Jenkins
iv) The ever popular something else …
Thoughts please.
Cheers,
Phil
The fact that changing the timeout changes gate time means the code is
actually hitting the timeout. Is that expected?
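To make that point concrete, here is a rough sketch (not the code in the review) of a wait-then-kill loop. FakeGuest, clean_shutdown and the simulated clock are all hypothetical names invented for illustration. It only shows that a guest which ignores the shutdown request costs the full timeout on every stop or delete, while a cooperative guest costs only as long as it takes to shut down.

```python
class FakeGuest:
    """Hypothetical stand-in for a VM; not a real Nova or libvirt object."""

    def __init__(self, responds_to_acpi, shutdown_takes=5):
        self.responds_to_acpi = responds_to_acpi
        self.shutdown_takes = shutdown_takes  # seconds a cooperative guest needs
        self.running = True
        self._requested_at = None

    def request_shutdown(self, now):
        # Record when the (simulated) ACPI shutdown request was sent.
        self._requested_at = now

    def is_running(self, now):
        # A cooperative guest powers off once enough simulated time has passed.
        if (self.running and self.responds_to_acpi
                and self._requested_at is not None
                and now - self._requested_at >= self.shutdown_takes):
            self.running = False
        return self.running

    def destroy(self):
        # Hard power-off: always succeeds immediately.
        self.running = False


def clean_shutdown(guest, timeout, poll_interval=1):
    """Request shutdown, wait up to `timeout` simulated seconds, then hard-kill.

    Returns the number of simulated seconds spent waiting.
    """
    now = 0
    guest.request_shutdown(now)
    while now < timeout and guest.is_running(now):
        now += poll_interval  # simulated clock; real code would sleep here
    if guest.is_running(now):
        guest.destroy()
    return now
```

With these assumptions, clean_shutdown(FakeGuest(True), 120) waits 5 simulated seconds, whereas clean_shutdown(FakeGuest(False), 120) waits the full 120 before falling back to the hard kill.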
Shutdown is now relying on the guest responding to acpi. Is that what we
want? Tempest uses a specialized image and I'm not sure how it is set up
in this regard. In any event I don't think we want to add any more time
to server delete when running in the gate.
I'm also a little concerned that this seems to be a significant behavior
change when using vms that behave like the ones in the gate. In reboot
this is handled by having soft/hard options of course.
I think that's a good question: do we know that cirros actually responds
to acpi shutdown?
I'm also a bit more ok with this on the soft_reboot path (which makes
total sense to me) than the power_off path (which today is a hard kill),
and putting this in destroy just seems wrong to me. It does seem to
change the semantics quite a bit for a stable API.
For HA fencing it's really important to have a way that we can still
immediately kill a guest, dead, right now, so that if it has access to
shared resources it can't damage them when we want to give them to a
different guest.
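One way to keep both behaviours, sketched below purely as an illustration, is to make the graceful wait opt-in so that the default path stays an immediate kill. RecordingGuest, stop_guest and clean_shutdown_timeout are hypothetical names, not Nova's actual API, and the guest here is assumed to be the worst case that never stops on its own.

```python
class RecordingGuest:
    """Hypothetical guest that records which calls were made on it."""

    def __init__(self):
        self.calls = []

    def request_shutdown(self):
        self.calls.append("request_shutdown")

    def is_running(self):
        # Worst case for this sketch: the guest never stops by itself.
        return "destroy" not in self.calls

    def destroy(self):
        self.calls.append("destroy")


def stop_guest(guest, clean_shutdown_timeout=0, wait=lambda seconds: None):
    """Stop a guest; a timeout of 0 means kill it immediately (fencing case).

    `wait` is injected so the sketch runs instantly; real code would pass
    something like time.sleep.
    """
    if clean_shutdown_timeout > 0:
        guest.request_shutdown()
        waited = 0
        while waited < clean_shutdown_timeout and guest.is_running():
            wait(1)
            waited += 1
    if guest.is_running():
        guest.destroy()  # hard kill, with no dependence on the guest
```

Called as stop_guest(guest), nothing is asked of the guest and it is destroyed straight away; called with clean_shutdown_timeout=3, the shutdown request is sent first and the hard kill only follows once the wait expires.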
-Sean
--
Sean Dague
http://dague.net