Hi Tomi,

Thanks for the summary.

I am a bit confused about the difference between 2. and 1./b. Could you
please give an example to explain how each would work?

Suppose we have

- tenant-a
  - vm-a on host-a
- tenant-b
  - vm-b on host-a

When a raw failure occurs on host-a, the existing sequence[1] would be

1. Monitor sends a "host-a failure" event to the Inspector
2. Inspector finds the affected VMs (all VMs on host-a)
3. Inspector resets the affected VMs (vm-a and vm-b) to error state
4. Controller requests the Notifier to notify all
...
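To make the flow concrete, here is a minimal sketch of that sequence. All class and method names below are illustrative only, not the actual Doctor or Nova APIs:

```python
# Sketch of the existing Doctor sequence (1./a.): Monitor -> Inspector ->
# Controller -> Notifier. Names and structure are hypothetical.

class Notifier:
    def __init__(self):
        self.alarms = []

    def notify(self, host, vm_states):
        # 4. raise an alarm covering the host and every affected VM
        self.alarms.append((host, sorted(vm_states)))

class Controller:
    def __init__(self, notifier):
        self.notifier = notifier

    def handle(self, host, vm_states):
        # Controller asks the Notifier to notify all
        self.notifier.notify(host, vm_states)

class Inspector:
    def __init__(self, vms_by_host, controller):
        self.vms_by_host = vms_by_host  # e.g. {"host-a": ["vm-a", "vm-b"]}
        self.controller = controller

    def on_host_failure(self, host):
        # 2. find the affected VMs (all VMs on the failed host)
        affected = self.vms_by_host.get(host, [])
        # 3. reset each affected VM to error state
        states = {vm: "error" for vm in affected}
        self.controller.handle(host, states)
        return states
```

Note that in this path the per-VM state reset happens before any tenant is notified, which is exactly the latency concern raised below.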

I think this is how "*1./a.*" works.

For "*1./b.*" it seems close to the alternative sequence in the fault
management scenario[2]. Instead of waiting for the Controller to send the
notification, the Inspector informs the Notifier about it directly.

Apparently, 5a is mandatory before 5b and 5c, but 5b and 5c *(alt)* can
be triggered simultaneously with async calls.

If we deploy Vitrage as the Inspector, the VMs' error state could be
*deduced* and notified *independently* of the "5b. Update State" action.
Then the time required to update all the VMs' states would no longer
matter.
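As a rough sketch of what I mean by async calls (function names are made up for illustration, not real Doctor or Vitrage interfaces):

```python
from concurrent.futures import ThreadPoolExecutor

def mark_host_down(host):
    # 5a. mandatory first step: mark the host as down
    return f"{host} marked down"

def update_vm_states(host):
    # 5b. reset the state of all VMs on the failed host
    return f"states updated for VMs on {host}"

def notify_consumers(host):
    # 5c. (alt) Inspector informs the Notifier directly
    return f"alarm sent for {host}"

def handle_failure(host):
    # 5a must complete before 5b and 5c
    mark_host_down(host)
    # 5b and 5c are independent, so they can run as parallel async calls;
    # the notification no longer waits on the per-VM state updates
    with ThreadPoolExecutor() as pool:
        fb = pool.submit(update_vm_states, host)
        fc = pool.submit(notify_consumers, host)
        return fb.result(), fc.result()
```

Here the alarm can go out even if updating many VM states is slow, which is the point of decoupling 5c from 5b.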

"*1./b.*" looks good to me, but I'd like to hear more about "2."

[1] http://artifacts.opnfv.org/doctor/docs/index.html#figure-p1
[2] http://artifacts.opnfv.org/doctor/docs/index.html#figure8

On Wed, Sep 28, 2016 at 1:27 PM Juvonen, Tomi (Nokia - FI/Espoo) <
[email protected]> wrote:

> Hi,
>
>
>
> As discussed yesterday in the Doctor meeting, there are several ways to
> approach the problem, with many different aspects. If we try to make a
> blueprint for OpenStack Nova, there is a window open now for a couple of
> weeks to make it into the next Ocata release (or Danube in OPNFV). I am
> not sure there is time for that, but here is a summary:
>
>
>
> 1.      The way we use “reset server state” is not the way it is used in
> OpenStack. Forcing a host down does not require resetting server state.
>
> Do we want to state that we still want to use it anyway because we want
> the notification in order to have an alarm?
>
> a.      Yes:
>
> 1.      Do we want to enhance the functionality to reset the server
> state of all servers on a host?
>
> 2.      Do we want the force down API to be able to optionally reset the
> server state of all VMs on the host?
>
> Note! “Get valid server state” was done because there is no
> server-specific state change when there is a host-specific fault (as
> reset server state is not called). This is why a host_status field was
> added: a user querying his server can see that there is nothing wrong
> with the VM itself, but that it is currently down because the host is.
>
>
>
> b.      No:
>
> We could try to make a change so that calling force down on a host would
> send a notification about the affected VMs (as many notifications as
> there are tenants with VMs on that host).
>
>
>
> 2.      Only the Inspector knows everything that is needed for the
> different alarms, and it is just overhead to push that information
> through, for example, Nova to get a notification that can be translated
> into an alarm. We also do not get the right content into the alarms that
> way. This leads to the fact that the only way to get things right is to
> send the notification from the Inspector to the Notifier, so as to have
> the right kind of alarms: tenant-specific alarms with their VMs, and a
> separate physical fault alarm (with respect to ETSI GS NFV-IFA 005).
>
>
>
> IMHO the only right choice is “2.” The next best would be “1./b.” The
> least feasible would be “1./a.”
>
>
>
> Br,
>
> Tomi
>
_______________________________________________
opnfv-tech-discuss mailing list
[email protected]
https://lists.opnfv.org/mailman/listinfo/opnfv-tech-discuss
