Hi,

As discussed yesterday in the Doctor meeting, there are several ways to
approach the problem and many different aspects to it. If we want to propose a
blueprint to OpenStack Nova, there is a window of a couple of weeks open now to
make it into the next Ocata release (or Danube in OPNFV). I am not sure there
is time to make that, but here is a summary:


1.      The way we use "reset server state" is not the way it is used in
OpenStack. Forcing down a host does not require resetting the state of its
servers.
Do we want to state that we still want to use it anyway, because we want the
notification in order to have an alarm?

a.      Yes:

1.      Do we want to enhance the functionality to reset the server state for
all servers on a host?

2.      Do we want the force down API to be able to optionally reset the server
state for all VMs on the host?
Note! "Get valid server state" was done because there is no server-specific
state change when there is a host-specific fault (as reset server state is not
called). This is why a host_status field was added, so that a user querying his
server can know there is nothing wrong with his VM, but it is currently down
because the host is in that state.


b.      No:
We could try to get a change in so that calling force down host would send a
notification about the affected VMs (as many notifications as there are tenants
with VMs).


2.      Only the inspector knows everything that is needed for the different
alarms, and it is just overhead to push that information through, for example,
Nova to get a notification that can be translated into an alarm. We also do not
get the right content into the alarms that way anyhow. This means the only way
to get things right is to send the notification from the inspector to the
notifier to have the right kind of alarms: tenant-specific alarms with their
VMs and a separate physical fault alarm (with respect to ETSI GS NFV-IFA 005).
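
To make the idea of tenant-specific alarms plus a separate physical fault alarm
a bit more concrete, here is a minimal sketch in Python. It is not Congress,
Vitrage or Doctor code. The Vm class, the function name and all the dictionary
keys are made up for illustration and are not the fields defined in ETSI GS
NFV-IFA 005. It only assumes the inspector already knows which VM runs on which
host and which tenant owns it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Vm:
    # Hypothetical minimal view of a VM as an inspector might hold it.
    vm_id: str
    tenant_id: str
    host: str


def build_host_fault_notifications(host: str, cause: str,
                                   vms: List[Vm]) -> List[dict]:
    """Return one physical fault notification plus one per affected tenant."""
    # Group the VMs running on the faulty host by their owning tenant.
    affected: Dict[str, List[str]] = defaultdict(list)
    for vm in vms:
        if vm.host == host:
            affected[vm.tenant_id].append(vm.vm_id)

    # The physical fault alarm is separate and is not meant for tenants.
    notifications = [{"type": "physical_fault", "host": host, "cause": cause}]

    # One notification per tenant, listing only that tenant's own VMs.
    for tenant_id in sorted(affected):
        notifications.append({
            "type": "tenant_alarm",
            "tenant_id": tenant_id,
            "vm_ids": sorted(affected[tenant_id]),
            "cause": cause,
        })
    return notifications
```

With VMs of two tenants on the faulty host this gives three notifications: one
physical and two tenant-specific, and no Nova state change is needed to produce
them.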

IMHO the only right choice is "2.". The next one would be "1. / b.". The least
feasible thing would be to do "1. / a.".

Br,
Tomi

From: [email protected] 
[mailto:[email protected]] On Behalf Of Juvonen, Tomi 
(Nokia - FI/Espoo)
Sent: Wednesday, September 21, 2016 9:52 AM
To: [email protected]
Subject: Suspected SPAM - [opnfv-tech-discuss] [Doctor] Reset Server State and 
alarms in general

Hi,

I had a lively discussion yesterday with the OpenStack Nova cores about reset
server state. At first it was about how to do that with one API call for all
VMs on a host (hypervisor), as discussed in DOCTOR-78. But then it came to the
question of why we actually want reset server state in the first place. It is
not something that needs to be done when forcing down a host. If we want a
notification about the affected VMs, and further an alarm, then that is another
thing. So if we want that kind of notification, it is something we should write
a spec for, instead of resetting the state to error for each VM on a host,
which we should not be doing in the first place if the error was not at VM
level but at host level. (Yes, before you ask, Nova can leave the state of a
working VM unchanged if the host is down. You do not touch the VM state unless
you want to do something to the VM or it was actually the one having the error.
And yes, you do not want to do anything to the VM itself in every scenario;
sometimes you are just happy it comes up again on the same host when the host
comes back.)
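
As a toy illustration of that point (this is not Nova code; the classes and
functions below are made up, and real Nova reports more host status values than
the two used here), forcing the host down only changes host-level information,
while each VM keeps its own state:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    # Hypothetical stand-in for a VM record.
    name: str
    vm_state: str = "active"


@dataclass
class Host:
    # Hypothetical stand-in for a compute host and the servers on it.
    name: str
    forced_down: bool = False
    servers: List[Server] = field(default_factory=list)


def force_down(host: Host) -> None:
    # Only the host-level flag changes; per-server vm_state is left alone.
    host.forced_down = True


def host_status(host: Host) -> str:
    # Simplified to two values for this sketch.
    return "DOWN" if host.forced_down else "UP"


host = Host("compute-1", servers=[Server("vm-1"), Server("vm-2")])
force_down(host)
```

After force_down(host), host_status(host) is "DOWN" while every server on the
host still has vm_state "active", which is what a user would combine to see
that the VM itself is not in error.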

Again I realize here what I said long ago, before we had anything: it will not
be possible to make alarms correctly by changing state in Nova and other
controllers and then triggering the alarm from the notification about those
state changes. That will never give us what we want for the alarms, although we
surely need to correct the states otherwise. Even for things where we get a
notification triggered by a state change, we will not have the information
needed in the alarm, and surely we do not want to call APIs in vain just to
have an alarm (like reset server state).

We want tenant/VNFM-specific alarms that tell which of his VMs (virtual
resources) are affected by a fault, and the cause (and surely alarms about
physical faults that will not be consumed by the tenant/VNFM, and other fields
needed by the ETSI spec). The only way of having this correct for each kind of
fault that can appear is to form all the alarms (the notification that forms
the alarm) in the Inspector (Congress or Vitrage). It is the only place that
has all the information needed in the different scenarios, can make this right,
and has the minimum delay that is crucial in Telco fault management. Also, if
we are looking to have OPNFV used in production, and one would need to be OPNFV
compliant, it means we need to make things right. I strongly suggest that,
while the way we make the alarm now is a great step we have achieved so far as
a proof of concept (changing states and having the alarm in under 1 second), we
take the next steps towards a conceptually correct way to achieve this and have
correct alarms.

Br,
Tomi



