Hi,

As discussed yesterday in the Doctor meeting, there are several ways to approach the problem and many different aspects to it. If we want to propose a blueprint to OpenStack Nova, the window for doing so is open now for a couple of weeks, to make it into the next Ocata release (Danube in OPNFV). I am not sure there is time to make that, but here is a summary:

1. The way we use "reset server state" is not the way it is used in OpenStack. Forcing down a host does not require resetting server state. Do we want to state that we still want to use it anyway, because we want the notification in order to have an alarm?
   a. Yes:
      1. Do we want to enhance the functionality to reset server state for all servers on a host?
      2. Do we want the force down API to be able to optionally reset server state for all VMs on the host?
      Note! "Get valid server state" was done precisely because there is no server-specific state change when there is a host-specific fault (as reset server state is not called). This is why a host_status field was added, so that a user querying his server knows there is nothing wrong with his VM, but that it is currently down because the host is in that state.
   b. No: We could try to get a change so that calling force down host sends a notification about the affected VMs (as many notifications as there are tenants with VMs).
2. Only the inspector knows everything that is needed for the different alarms, and it is just overhead to push that information through, for example, Nova to get a notification that can be translated into an alarm. We also do not get the right content into the alarms anyhow. This leads to the fact that the only way to get things right is to send a notification from the inspector to the notifier, to have the right kind of alarms: tenant-specific alarms with their VMs, and a separate physical fault alarm (with respect to ETSI GS NFV-IFA 005).

IMHO the only right choice is "2.". The next one would be "1. / b.". The least feasible would be "1. / a.".

Br,
Tomi

From: [email protected] [mailto:[email protected]] On Behalf Of Juvonen, Tomi (Nokia - FI/Espoo)
Sent: Wednesday, September 21, 2016 9:52 AM
To: [email protected]
Subject: Suspected SPAM - [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

Hi,

I had a lively discussion yesterday with OpenStack Nova cores about the reset server state.
At first it was about how to do that with one API call for all VMs on a host (hypervisor), as discussed in DOCTOR-78. But then it came to the question of why we actually want the reset server state in the first place. It is not something that needs to be done when forcing down a host. If we want a notification about the affected VMs, and further an alarm, then that is another thing. So if we want that kind of notification, it is something we should write a spec for, rather than resetting the state to error for each VM on a host, which we should not be doing in the first place if the error was not at VM level but at host level. (Yes, before you ask: Nova can keep the working VM state unchanged if the host is down. You do not touch the VM state unless you want to do something to the VM, or it was actually the one having the error. And you do not want to do anything to the VM itself in all scenarios; you may just be happy that it comes up again on the same host when the host comes back.)

Again I realize here what I said long ago, before we had anything: it will not be possible to make alarms correctly by changing state in Nova and other controllers and then triggering an alarm from the notification about those state changes. That will never give us what we want for the alarms, although we certainly need to correct the states otherwise. Even where we get a notification triggered by a state change, we will not have the information needed in the alarm, and surely we should not call APIs in vain just to have an alarm (like reset server state). We want tenant/VNFM-specific alarms that tell which of his VMs (virtual resources) are affected by a fault, and the cause (and surely alarms about physical faults that will not be consumed by the tenant/VNFM, and other fields needed by the ETSI spec). The only way of having this correct for each kind of fault that can appear is to form all the alarms (the notification to form the alarm) in the Inspector (Congress or Vitrage).
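
As a rough illustration of forming the alarms in the Inspector, here is a minimal Python sketch. It is not Congress, Vitrage, or Nova code: the Server class, the form_alarm_notifications function, and every dictionary key are hypothetical names chosen for this example, and the fields are not taken from ETSI GS NFV-IFA 005. It only shows the grouping idea: one physical fault notification for the host, plus one notification per tenant listing that tenant's affected VMs, without touching any VM state.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """A VM as the inspector might know it (hypothetical shape)."""
    server_id: str
    tenant_id: str
    host: str


def form_alarm_notifications(failed_host, cause, servers):
    """Return one physical fault notification plus one per affected tenant.

    All keys below are illustrative assumptions, not a real alarm schema.
    """
    # Collect the VMs on the failed host, grouped by owning tenant.
    vms_by_tenant = defaultdict(list)
    for server in servers:
        if server.host == failed_host:
            vms_by_tenant[server.tenant_id].append(server.server_id)

    # The physical fault is reported once, separately from tenant alarms.
    notifications = [{
        "type": "physical_fault",
        "host": failed_host,
        "cause": cause,
    }]

    # Each tenant only sees its own affected VMs and the cause.
    for tenant_id in sorted(vms_by_tenant):
        notifications.append({
            "type": "tenant_fault",
            "tenant_id": tenant_id,
            "affected_servers": sorted(vms_by_tenant[tenant_id]),
            "cause": cause,
        })
    return notifications


if __name__ == "__main__":
    inventory = [
        Server("vm-1", "tenant-a", "compute-1"),
        Server("vm-2", "tenant-b", "compute-1"),
        Server("vm-3", "tenant-a", "compute-2"),
    ]
    for note in form_alarm_notifications("compute-1", "host down", inventory):
        print(note)
```

With this shape, a host fault with VMs from N tenants yields N tenant notifications and one physical fault notification, which matches the "as many notifications as there are tenants with VMs" idea.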
The Inspector is the only place that has all the information needed in the different scenarios, can get this right, and has the minimum delay that is crucial in Telco fault management. Also, if we are looking to have OPNFV used in production, and one needs to be OPNFV compliant, it means we need to make things right. The way we make alarms today is a great step that we have achieved so far as a proof of concept (changing states and having the alarm in under 1 second), but I strongly suggest that we take the next steps towards a conceptually correct way of achieving this, with correct alarms.

Br,
Tomi
