Tim,

Right now it won't, and that is the problem we are trying to solve: combining HARestarter with an additional script/service so that one does not interfere with the other. This exposes a design gap, as we would effectively have two services executing the same task. That gap is something we want to address eventually.
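To make the non-interference part concrete: one possible convention is to tag HARestarter-managed instances with a metadata key and have the external service skip them. This is purely illustrative -- nothing in Heat or Nova sets such a key today, and 'ha_managed_by' is a hypothetical name. A rough sketch, assuming novaclient's v1_1 API:

    # Sketch only: 'ha_managed_by' is a hypothetical deployer-set
    # metadata key, not something Heat or Nova provides.
    from novaclient.v1_1 import client

    nova = client.Client('admin', 'secret', 'admin',
                         'http://keystone:5000/v2.0')  # placeholders

    def instances_needing_external_ha(failed_host):
        servers = nova.servers.list(
            search_opts={'host': failed_host, 'all_tenants': 1})
        # Leave HARestarter-managed instances to Heat to avoid a double
        # restart; the external service handles only the rest.
        return [s for s in servers
                if s.metadata.get('ha_managed_by') != 'heat']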
--
Best regards,
Oleg Gelbukh
Mirantis Inc.

On Wed, Oct 9, 2013 at 3:28 PM, Tim Bell <tim.b...@cern.ch> wrote:
> Would the HARestarter approach work for VMs which were not launched by
> Heat?
>
> We expect to have some applications driven by Heat, but lots of others
> would not be (especially the more 'pet'-like traditional workloads).
>
> Tim
>
> From: Oleg Gelbukh [mailto:ogelb...@mirantis.com]
> Sent: 09 October 2013 13:01
> To: OpenStack Development Mailing List
> Subject: Re: [openstack-dev] [nova] automatically evacuate instances on
> compute failure
>
> Hello,
>
> We have much interest in this discussion (with focus on the second
> scenario outlined by Tim) and are working on its design at the moment.
> Thanks to everyone for the valuable insights in this thread.
>
> It looks like the external orchestration daemon problem is partially
> solved already by Heat with the HARestarter resource [1].
>
> Hypervisor failure detection is also a more or less solved problem in
> Nova [2]. There are other candidates for that task as well, like
> Ceilometer's hardware agent [3] (still WIP to my knowledge).
>
> [1] https://github.com/openstack/heat/blob/stable/grizzly/heat/engine/resources/instance.py#L35
> [2] http://docs.openstack.org/developer/nova/api/nova.api.openstack.compute.contrib.hypervisors.html#module-nova.api.openstack.compute.contrib.hypervisors
> [3] https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
>
> --
> Best regards,
> Oleg Gelbukh
> Mirantis Labs
>
> On Wed, Oct 9, 2013 at 9:26 AM, Tim Bell <tim.b...@cern.ch> wrote:
> I have proposed the summit design session for Hong Kong
> (http://summit.openstack.org/cfp/details/103) to discuss exactly these
> sorts of points. We have the low-level Nova commands but need a service
> to automate the process.
>
> I see two scenarios:
>
> - A hardware intervention needs to be scheduled: please rebalance this
>   workload elsewhere before it fails completely.
> - A hypervisor has failed: please recover what you can using shared
>   storage and give me a policy on what to do with the other VMs
>   (restart, leave down till repair, etc.).
>
> Most OpenStack production sites have some sort of script doing this
> sort of thing now. However, each one implements the logic for migration
> differently, so there is no agreed best-practice approach.
>
> Tim
>
> > -----Original Message-----
> > From: Chris Friesen [mailto:chris.frie...@windriver.com]
> > Sent: 09 October 2013 00:48
> > To: openstack-dev@lists.openstack.org
> > Subject: Re: [openstack-dev] [nova] automatically evacuate instances
> > on compute failure
> >
> > On 10/08/2013 03:20 PM, Alex Glikson wrote:
> > > Seems that this can be broken into 3 incremental pieces. First, it
> > > would be great if the ability to schedule a single 'evacuate' would
> > > be finally merged
> > > (https://blueprints.launchpad.net/nova/+spec/find-host-and-evacuate-instance).
> >
> > Agreed.
> >
> > > Then, it would make sense to have the logic that evacuates an
> > > entire host
> > > (https://blueprints.launchpad.net/python-novaclient/+spec/find-and-evacuate-host).
> > > The reasoning behind suggesting that this should not necessarily be
> > > in Nova is, perhaps, that it *can* be implemented outside Nova
> > > using the individual 'evacuate' API.
> >
> > This actually more or less exists already in the existing "nova
> > host-evacuate" command.
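For anyone following along: the per-instance building block that command wraps is available through novaclient, which is what makes an external service feasible. A rough sketch of evacuating a host instance by instance (assuming novaclient's v1_1 API; host names and credentials are placeholders):

    # Sketch only: walk the instances on a failed host and evacuate each
    # one via the individual 'evacuate' call that 'nova host-evacuate'
    # wraps. Host names and credentials are placeholders.
    from novaclient.v1_1 import client

    nova = client.Client('admin', 'secret', 'admin',
                         'http://keystone:5000/v2.0')
    failed_host, target_host = 'compute-01', 'compute-02'

    for server in nova.servers.list(
            search_opts={'host': failed_host, 'all_tenants': 1}):
        # The caller still has to assert shared vs. local storage --
        # exactly the mixed-storage limitation Chris raises next.
        nova.servers.evacuate(server, host=target_host,
                              on_shared_storage=True)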
> > One major issue with this, however, is that it requires the caller
> > to specify whether all the instances are on shared or local storage,
> > so it can't handle a mix of local and shared storage for the
> > instances. If any of them boot off block storage, for instance, you
> > need to move them first and then do the remaining ones as a group.
> >
> > It would be nice to embed the knowledge of whether or not an
> > instance is on shared storage in the instance itself at creation
> > time. I envision specifying this in the config file for the compute
> > manager along with the instance storage location, and the compute
> > manager could set the field in the instance at creation time.
> >
> > > Finally, it should be possible to close the loop and invoke the
> > > evacuation automatically as a result of a failure detection (not
> > > clear how exactly this would work, though). Hopefully we will have
> > > at least the first part merged soon (not sure if anyone is
> > > actively working on a rebase).
> >
> > My interpretation of the discussion so far is that the nova
> > maintainers would prefer this to be driven by an outside
> > orchestration daemon.
> >
> > Currently the only way a service is recognized to be "down" is if
> > someone calls is_up() and it notices that the service hasn't sent an
> > update in the last minute. There's nothing in nova actively scanning
> > for compute node failures, which is where the outside daemon comes
> > in.
> >
> > Also, there is some complexity involved in dealing with
> > auto-evacuate: what do you do if an evacuate fails? How do you
> > recover intelligently if there is no admin involved?
> >
> > Chris
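To make the "outside orchestration daemon" idea concrete, the detection half can lean on the same heartbeat data is_up() reads, exposed through the services API. A minimal sketch of the polling loop (fencing, retries, and the failure-handling policy Chris raises are deliberately left as a stub):

    # Minimal watchdog sketch: poll nova-compute service state and hand
    # failed hosts to a policy hook. Fencing and error recovery are out
    # of scope here; handle_failed_host is a stub.
    import time
    from novaclient.v1_1 import client

    nova = client.Client('admin', 'secret', 'admin',
                         'http://keystone:5000/v2.0')  # placeholders

    def handle_failed_host(host):
        pass  # fence the node first, then evacuate instance by instance

    while True:
        for svc in nova.services.list(binary='nova-compute'):
            # 'down' mirrors is_up(): no heartbeat within the
            # service_down_time window (60 seconds by default).
            if svc.state == 'down' and svc.status == 'enabled':
                handle_failed_host(svc.host)
        time.sleep(30)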
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev