( I have been having a discussion with Adam Spiers on 
[openstack-dev][vitrage][nova] on this topic ... thought I would switch over to 
[masakari] )

I am interested in contributing an implementation of Intrusive Instance 
Monitoring, initially specifically VM Heartbeat / Health-check Monitoring 
thru the QEMU Guest Agent (https://wiki.libvirt.org/page/Qemu_guest_agent).

I’d like to know whether Masakari project leaders would consider a blueprint on 
“VM Heartbeat / Health-check Monitoring”.
See below for some more details,
Greg.

-------------------------------------


VM Heartbeating / Health-check Monitoring would introduce intrusive / white-box 
type monitoring of VMs / Instances to Masakari.

Briefly, “VM Heartbeat / Health-check Monitoring”
·         is optionally enabled thru a Nova flavor extra-spec,
·         is a service that runs on an OpenStack Compute Node,
·         sends periodic Heartbeat / Health-check Challenge Requests to a VM
over a virtio-serial device set up between the Compute Node and the VM thru QEMU,
( https://wiki.libvirt.org/page/Qemu_guest_agent )
·         on loss of heartbeat or a failed health-check status, reports a 
fault event against the VM to Masakari and to any other registered reporting 
backends, such as Mistral or Vitrage.
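
As a rough sketch of the flow in the list above (all names here are 
hypothetical; nothing below is existing Masakari or Nova code, and the 
"3 missed challenges" threshold is an assumption), the per-VM logic on the 
Compute Node could look something like this:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only.
Challenge = Callable[[str], bool]       # True if the VM ACKed the challenge in time
Reporter = Callable[[str, str], None]   # (instance_uuid, reason) -> None


@dataclass
class HeartbeatMonitor:
    send_challenge: Challenge
    reporters: List[Reporter] = field(default_factory=list)
    max_missed: int = 3  # assumed threshold, not taken from any spec
    _missed: Dict[str, int] = field(default_factory=dict)

    def poll(self, instance_uuid: str) -> bool:
        """Send one challenge; return True if a fault was reported."""
        try:
            ok = self.send_challenge(instance_uuid)
        except Exception:
            ok = False  # a transport error counts as a missed heartbeat
        if ok:
            self._missed[instance_uuid] = 0
            return False
        missed = self._missed.get(instance_uuid, 0) + 1
        self._missed[instance_uuid] = missed
        if missed != self.max_missed:
            # Below the threshold, or already reported and not yet recovered.
            return False
        for report in self.reporters:
            report(instance_uuid, "heartbeat lost")
        return True
```

On a real Compute Node, send_challenge could presumably wrap libvirt-python's 
libvirt_qemu.qemuAgentCommand() issuing the agent's guest-ping command; that 
wiring is an assumption and is left out so the sketch runs standalone. The 
fault is reported once when the threshold is crossed, and not again until 
the VM has answered a challenge.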

I realize this is somewhat in the gray-zone of what a cloud should be 
monitoring or not,
but I believe it provides an alternative for Applications deployed in VMs that 
do not have an external monitoring/management entity like a VNF Manager in the 
MANO architecture.
And even for VMs with VNF Managers, it provides a highly reliable alternate 
monitoring path that does not rely on Tenant Networking.

VM HB/HC Monitoring would leverage the QEMU Guest Agent 
( https://wiki.libvirt.org/page/Qemu_guest_agent ), 
which would require the agent to be installed in the images so that the VM 
can talk back to the compute host.
( there are other examples of similar approaches in OpenStack ... the 
murano-agent for installation, the swift-agent for object store management )
In the case of VM HB/HC Monitoring via the QEMU Guest Agent, though, 
the messaging path is internal, thru a QEMU virtual serial device, i.e. a very 
simple interface with very few dependencies ... it’s up and available very 
early in the VM lifecycle and virtually always up.
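
To give a feel for how small that interface is: the guest agent speaks a 
line-oriented JSON protocol, and its existing guest-ping command is answered 
with an empty "return" object (errors come back under an "error" key). A 
minimal sketch of the host-side encode / decode follows; the function names 
are hypothetical, and opening the channel itself is deliberately left out.

```python
import json
from typing import Optional


def build_ping() -> bytes:
    """Encode a guest-ping request as one newline-terminated JSON line."""
    return (json.dumps({"execute": "guest-ping"}) + "\n").encode("utf-8")


def is_ack(raw: Optional[bytes]) -> bool:
    """True only for a well-formed success reply, i.e. one with a "return" key."""
    if not raw:
        return False  # timeout or closed channel: treat as no heartbeat
    try:
        reply = json.loads(raw.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON.
        return False
    return isinstance(reply, dict) and "return" in reply and "error" not in reply
```

Anything other than a clean success reply (no data, garbage on the channel, 
or an error object) is treated as a missed heartbeat.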

Wrt failure modes / use-cases
·         a VM’s response to a Heartbeat Challenge Request can be as simple as 
just ACK-ing;
this alone allows for detection of:
o    a failed or hung QEMU/KVM instance, or
o    a failed or hung OS in the VM, or
o    a failure of the VM’s OS to schedule the QEMU Guest Agent daemon, or
o    a failure of the VM to route basic IO via Linux sockets.
·         I have had feedback that this is similar to the virtual hardware 
watchdog of QEMU/KVM (https://libvirt.org/formatdomain.html#elementsWatchdog )
·         However, the VM Heartbeat / Health-check Monitoring
o   provides a higher-level (i.e. application-level) heartbeating
•  i.e. if the Heartbeat requests are being answered by the Application running 
within the VM
o   provides more than just heartbeating, as the Application can use it to 
trigger a variety of audits,
o   provides a mechanism for the Application within the VM to report a Health 
Status / Info back to the Host / Cloud,
o   provides notification of the Heartbeat / Health-check status to 
higher-level cloud entities thru Masakari, Mistral and/or Vitrage
•  e.g.   VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm)  - Aodh - ... - VNF-Manager
                                              - (StateChange) - Nova - ... - VNF-Manager
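
To make the application-level part more concrete, here is a minimal sketch 
of turning a health-check reply into a fault event and fanning it out to 
whichever backends are registered. Everything in it is an assumption for 
discussion: the reply fields ("health", "info"), the event kinds, and the 
registry API are hypothetical and not part of Masakari, Mistral or Vitrage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class FaultEvent:
    instance_uuid: str
    kind: str    # "heartbeat-lost" or "health-check-failed" (hypothetical)
    detail: str


def evaluate(instance_uuid: str, reply: Optional[dict]) -> Optional[FaultEvent]:
    """Map a challenge reply (None means no reply) to a fault, or None if healthy."""
    if reply is None:
        return FaultEvent(instance_uuid, "heartbeat-lost", "no reply to challenge")
    # A bare ACK with no "health" field is treated as healthy.
    if reply.get("health", "ok") != "ok":
        return FaultEvent(instance_uuid, "health-check-failed",
                          str(reply.get("info", "")))
    return None


class BackendRegistry:
    """Holds the reporting backends (e.g. Masakari, Mistral, Vitrage)."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[FaultEvent], None]] = {}

    def register(self, name: str, notify: Callable[[FaultEvent], None]) -> None:
        self._backends[name] = notify

    def publish(self, event: FaultEvent) -> List[str]:
        """Notify every backend; return the names that accepted the event."""
        delivered = []
        for name, notify in self._backends.items():
            try:
                notify(event)
            except Exception:
                continue  # one failing backend must not block the others
            delivered.append(name)
        return delivered
```

Keeping the backends behind a registry is one way the Vitrage reporting could 
be added later without touching the monitoring logic itself.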

NOTE: perhaps the reporting to Vitrage would be a separate blueprint within 
Masakari.


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
