Hi Tomi,

Thanks for the summary.
I am a bit confused about the difference between "2." and "1./b.". Could you give an example of how it would work?

Suppose we have:
- tenant-a: vm-a on host-a
- tenant-b: vm-b on host-a

When a raw failure occurs on host-a, the existing sequence[1] would be:
1. Monitor sends a "host-a failure" event to the Inspector
2. Inspector finds the affected VMs (all VMs on host-a)
3. Inspector resets the affected VMs (vm-a and vm-b) to error state
4. Controller requests the Notifier to notify all ...

I think this is how "1./a." works.

"1./b." seems close to the alternative sequence in the fault management scenario[2]. Instead of waiting for the Controller to send the notification, the Inspector directly informs the Notifier. Apparently, 5a is mandatory before 5b and 5c, but 5b and 5c (alt) can be triggered simultaneously with async calls. If we deploy Vitrage as the Inspector, the VMs' error state could be deduced and notified independently of the "5b. Update State" action. Then the time required for updating all VM states would no longer matter.

"1./b." looks good to me, but I'd like to hear more about "2."

[1] http://artifacts.opnfv.org/doctor/docs/index.html#figure-p1
[2] http://artifacts.opnfv.org/doctor/docs/index.html#figure8

On Wed, Sep 28, 2016 at 1:27 PM Juvonen, Tomi (Nokia - FI/Espoo) <[email protected]> wrote:

> Hi,
>
> As discussed yesterday in the Doctor meeting, there are several ways to
> approach the problem and many different aspects. If we try to make a
> blueprint for OpenStack Nova, there is a window open now for a couple of
> weeks to make it into the next Ocata release (or Danube in OPNFV). Not
> sure if there is time for that, but here is a summary:
>
> 1. The way we use "reset server state" is not the way it is used in
> OpenStack. Forcing down a host doesn't require resetting server states.
> Do we want to state that we still want to use it anyway, because we want
> the notification to have an alarm?
>
> a. Yes:
>
>    1. Do we want to enhance the functionality to reset server state for
>       all servers on a host?
>
>    2. Do we want the force down API to be able to optionally reset server
>       state for all VMs on the host?
>
> Note! "Get valid server state" was done because there is no
> server-specific state change when there is a host-specific fault (as
> reset server state is not called). This is why a host_status field was
> added, so a user querying his server knows there is nothing wrong with
> the VM itself, but it is currently down because the host is in that
> state.
>
> b. No:
>
> We could try to make a change so that calling force down on a host would
> send a notification about the affected VMs (as many notifications as
> there are tenants with VMs).
>
> 2. Only the Inspector knows everything that is needed for the different
> alarms, and it is just overhead to push that information through, for
> example, Nova to get a notification that can be translated to an alarm.
> Also, we do not get the right content for the alarms anyhow. This leads
> to the fact that the only way to get things right is to send the
> notification from the Inspector to the Notifier, producing the right
> kinds of alarms: tenant-specific alarms with their VMs, and a separate
> physical fault alarm (with respect to ETSI GS NFV-IFA 005).
>
> IMHO the only right choice is "2." Next best would be "1./b.". The least
> feasible would be "1./a.".
>
> Br,
> Tomi
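To make the per-tenant notification idea concrete (one notification per tenant with VMs on the failed host, plus a separate physical fault alarm), here is a minimal Python sketch. All names and the alarm format are hypothetical illustrations, not actual Doctor, Nova, or Vitrage code:

```python
from collections import defaultdict

# Hypothetical inventory matching the example: vm -> (tenant, host).
VMS = {
    "vm-a": ("tenant-a", "host-a"),
    "vm-b": ("tenant-b", "host-a"),
    "vm-c": ("tenant-b", "host-b"),  # unaffected, on another host
}

def build_notifications(failed_host):
    """One tenant-specific alarm per tenant with affected VMs,
    plus a separate physical fault alarm for the host itself
    (cf. ETSI GS NFV-IFA 005)."""
    by_tenant = defaultdict(list)
    for vm, (tenant, host) in VMS.items():
        if host == failed_host:
            by_tenant[tenant].append(vm)
    alarms = [
        {"type": "vm_fault", "tenant": tenant, "vms": sorted(vms)}
        for tenant, vms in sorted(by_tenant.items())
    ]
    alarms.append({"type": "host_fault", "host": failed_host})
    return alarms

print(build_notifications("host-a"))
```

With the two-tenant example above, this yields two tenant alarms (tenant-a with vm-a, tenant-b with vm-b) and one host fault alarm, i.e. "as many notifications as there are tenants with VMs" plus the physical fault alarm.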
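The point that 5b ("Update State") and 5c (notification) can be triggered simultaneously with async calls, so notification latency does not depend on how long the per-VM state updates take, can be sketched like this. The coroutines are hypothetical stand-ins, not Doctor code; only the concurrency pattern is the point:

```python
import asyncio

events = []  # records the completion order of the two concurrent steps

async def update_vm_states(vms):
    # 5b: per-VM state updates in the Controller, simulated as slow.
    await asyncio.sleep(0.05)
    events.append("states_updated")
    return {vm: "error" for vm in vms}

async def notify_tenants(vms):
    # 5c: the Inspector informs the Notifier directly, without
    # waiting for the state updates to finish.
    events.append("notified")
    return ["alarm for %s" % vm for vm in vms]

async def on_host_failure(vms):
    # 5a (failure detected and VMs identified) has already happened;
    # 5b and 5c are then fired concurrently.
    return await asyncio.gather(update_vm_states(vms), notify_tenants(vms))

states, alarms = asyncio.run(on_host_failure(["vm-a", "vm-b"]))
print(events)  # the notification completes before the slow state updates
```

Because notify_tenants never waits on update_vm_states, the tenants are alarmed first even though the state updates were started at the same time, which is exactly why the state-update time "would no longer matter".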
_______________________________________________
opnfv-tech-discuss mailing list
[email protected]
https://lists.opnfv.org/mailman/listinfo/opnfv-tech-discuss
