( I have been having a discussion with Adam Spiers on [openstack-dev][vitrage][nova] on this topic ... thought I would switch over to [masakari] )

I am interested in contributing an implementation of Intrusive Instance Monitoring, initially specifically VM Heartbeat / Health-check Monitoring through the QEMU Guest Agent (https://wiki.libvirt.org/page/Qemu_guest_agent).

I’d like to know whether the Masakari project leaders would consider a blueprint on “VM Heartbeat / Health-check Monitoring”.

See below for some more details,
Greg.

-------------------------------------

VM Heartbeating / Health-check Monitoring would introduce intrusive / white-box type monitoring of VMs / Instances to Masakari.

Briefly, “VM Heartbeat / Health-check Monitoring”
· is optionally enabled through a Nova flavor extra-spec,
· is a service that runs on an OpenStack Compute Node,
· sends periodic Heartbeat / Health-check Challenge Requests to a VM over a virtio-serial device set up between the Compute Node and the VM through QEMU ( https://wiki.libvirt.org/page/Qemu_guest_agent ),
· on loss of heartbeat or a failed health-check status, reports a fault event against the VM to Masakari and any other registered reporting backends, such as Mistral or Vitrage.

I realize this is somewhat in the gray zone of what a cloud should or should not be monitoring, but I believe it provides an alternative for Applications deployed in VMs that do not have an external monitoring/management entity like a VNF Manager in the MANO architecture. And even for VMs with VNF Managers, it provides a highly reliable alternate monitoring path that does not rely on Tenant Networking.

VM HB/HC Monitoring would leverage the QEMU Guest Agent ( https://wiki.libvirt.org/page/Qemu_guest_agent ), which would require the agent to be installed in the images in order to talk back to the compute host. ( There are other examples of similar approaches in OpenStack ... the murano-agent for installation, the swift-agent for object store management. ) Although here, in the case of VM HB/HC Monitoring via the QEMU Guest Agent, the messaging path is internal, through a QEMU virtual serial device, i.e. a very simple interface with very few dependencies ... it’s up and available very early in the VM lifecycle and virtually always up.

Wrt failure modes / use-cases
· A VM’s response to a Heartbeat Challenge Request can be as simple as just ACK-ing. This alone allows for detection of:
  o a failed or hung QEMU/KVM instance, or
  o a failed or hung VM OS, or
  o a failure of the VM’s OS to schedule the QEMU Guest Agent daemon, or
  o a failure of the VM to route basic IO via Linux sockets.
· I have had feedback that this is similar to the virtual hardware watchdog of QEMU/KVM ( https://libvirt.org/formatdomain.html#elementsWatchdog ).
· However, VM Heartbeat / Health-check Monitoring
  o provides higher-level (i.e. application-level) heartbeating,
    • i.e. if the Heartbeat requests are being answered by the Application running within the VM,
  o provides more than just heartbeating, as the Application can use it to trigger a variety of audits,
  o provides a mechanism for the Application within the VM to report Health Status / Info back to the Host / Cloud,
  o provides notification of the Heartbeat / Health-check status to higher-level cloud entities through Masakari, Mistral and/or Vitrage,
    • e.g. VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm) - Aodh - ... - VNF-Manager - (StateChange) - Nova - ... - VNF Manager

NOTE: perhaps the reporting to Vitrage would be a separate blueprint within Masakari.
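
To make the heartbeat challenge / fault-reporting flow described above a bit more concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not a proposed Masakari design: the names HeartbeatMonitor, send_command, report_fault and max_missed are all hypothetical. The only piece taken from the real QEMU Guest Agent protocol is the "guest-ping" command. In a real compute-node service, send_command might wrap something like libvirt-python's libvirt_qemu.qemuAgentCommand(), and report_fault might post a notification to Masakari; both are left as injected callables here.

```python
import json

# "guest-ping" is a real QEMU Guest Agent command; everything else in this
# sketch (class, parameter and callback names) is hypothetical.
GUEST_PING = json.dumps({"execute": "guest-ping"})


class HeartbeatMonitor:
    """Counts consecutive missed heartbeats and reports one fault per outage."""

    def __init__(self, send_command, report_fault, max_missed=3):
        # send_command(instance_id, json_cmd) -> JSON reply string; may raise.
        self.send_command = send_command
        # report_fault(instance_id, reason) -> None; e.g. notify Masakari.
        self.report_fault = report_fault
        self.max_missed = max_missed
        self.missed = 0
        self.faulted = False

    def poll_once(self, instance_id):
        """Send one challenge; return True if the guest answered."""
        try:
            reply = json.loads(self.send_command(instance_id, GUEST_PING))
            ok = isinstance(reply, dict) and "return" in reply
        except Exception:
            # Timeout, agent not running, malformed reply: all count as a miss.
            ok = False

        if ok:
            self.missed = 0
            self.faulted = False
            return True

        self.missed += 1
        if self.missed >= self.max_missed and not self.faulted:
            self.faulted = True
            self.report_fault(
                instance_id, f"{self.missed} consecutive heartbeats missed"
            )
        return False


if __name__ == "__main__":
    def hung_guest(instance_id, cmd):
        # Stand-in for a guest whose agent never answers.
        raise TimeoutError("guest agent did not respond")

    monitor = HeartbeatMonitor(hung_guest, lambda vm, why: print(vm, why))
    for _ in range(3):
        monitor.poll_once("instance-0001")
```

A real service would call poll_once() on a timer for each monitored instance. The consecutive-miss threshold is one simple way to avoid raising a fault on a single delayed reply; the appropriate interval and threshold would presumably come from the flavor extra-spec.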