Afek, Ifat (Nokia - IL/Kfar Sava) <[email protected]> wrote:
On 16/05/2017, 4:36, "Sam P" <[email protected]> wrote:

Hi Greg,

In Masakari [0] for VMHA, we have already implemented a somewhat similar function in masakari-monitors. Masakari-monitors runs on the nova-compute node and monitors for host, process, or instance failures. The Masakari instance monitor has functionality similar to what you have described. Please see [1] for more details on instance monitoring.

[0] https://wiki.openstack.org/wiki/Masakari
[1] https://github.com/openstack/masakari-monitors/tree/master/masakarimonitors/instancemonitor

Once masakari-monitors detects a failure, it sends a notification to masakari-api, which takes the appropriate recovery actions to recover that VM from the failure.

You can also find out more about our architectural plans by watching this talk which Sampath and I gave in Boston:

https://www.openstack.org/videos/boston-2017/high-availability-for-instances-moving-to-a-converged-upstream-solution

The slides are here:

https://aspiers.github.io/openstack-summit-2017-boston-compute-ha/

We didn't go into much depth on monitoring and recovery of individual VMs, but as Sampath explained, Masakari already handles both of these.

Hi Greg, Sam,

As Vitrage is about correlating alarms that come from different sources, and is not a monitor by itself, I think that it can benefit from information retrieved by both Masakari and Zabbix monitors.

Zabbix is already integrated into Vitrage. I don't know if there are specific tests for VM heartbeat, but I think it is very likely that there are.

Regarding Masakari: looking at your documents, I believe that integrating your monitoring information into Vitrage could be quite straightforward.

Yes, this makes sense. Masakari already cleanly decouples monitoring/alerting from automated recovery, so it could support this quite nicely. And the modular converged architecture we explained in the presentation will maintain that clean separation of responsibilities whilst integrating Masakari together with other components such as Pacemaker, Mistral, and maybe Vitrage too.

For example, whilst so far this thread has been about VM instance monitoring, another area where Vitrage could integrate with Masakari is compute host monitoring. If you watch this part of our presentation where we explained the next generation architecture, you'll see that we propose a new "nova-host-alerter" component which has a driver-based mechanism for alerting different services when a compute host experiences a failure:

https://youtu.be/YPKE1guti8E?t=32m43s

So one obvious possibility would be to add a driver for Vitrage, so that Vitrage can be alerted when Pacemaker spots a host failure. Similarly, we could extend Pacemaker configurations to alert Vitrage when individual processes such as nova-compute or libvirtd fail.

If you would like to discuss any of this further or have any more questions, in addition to this mailing list we are also available to talk on the #openstack-ha IRC channel!

Cheers,
Adam

P.S. I've added the [HA] badge to this thread since this discussion is definitely related to high availability.
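
To make the driver-based alerting idea for "nova-host-alerter" a little more concrete, here is a rough Python sketch. It is only an illustration of the general pattern, not the proposed component's actual design: every name in it (HostFailure, AlertDriver, RecordingDriver, HostAlerter, host_failed) is hypothetical, and the drivers only record events in memory rather than calling any real Masakari or Vitrage API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class HostFailure:
    """Hypothetical event describing a failed compute host."""
    hostname: str
    reason: str


class AlertDriver(ABC):
    """Hypothetical driver interface: one driver per service to be alerted."""

    service = "unknown"

    @abstractmethod
    def alert(self, failure):
        """Deliver the failure event to the target service."""


class RecordingDriver(AlertDriver):
    """Stand-in driver that stores events instead of calling a real API."""

    def __init__(self, service):
        self.service = service
        self.received = []

    def alert(self, failure):
        self.received.append(failure)


class HostAlerter:
    """Fans a host failure out to every configured driver."""

    def __init__(self, drivers):
        self._drivers = list(drivers)

    def host_failed(self, hostname, reason):
        failure = HostFailure(hostname, reason)
        notified = []
        for driver in self._drivers:
            try:
                driver.alert(failure)
            except Exception:
                # One unreachable service should not block the others.
                continue
            notified.append(driver.service)
        return notified


if __name__ == "__main__":
    # Assumed setup: one driver for recovery, one for alarm correlation.
    alerter = HostAlerter([RecordingDriver("masakari"), RecordingDriver("vitrage")])
    print(alerter.host_failed("compute-1", "host fenced by Pacemaker"))
```

In this sketch, adding a new consumer such as Vitrage only means registering another driver; the code that detects the failure does not change, which is the separation of responsibilities described above.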
