On Mon, 08 Feb 2010 09:13:37 -0600 Anthony Liguori <anth...@codemonkey.ws> wrote:
> On 02/08/2010 08:56 AM, Daniel P. Berrange wrote:
> > On Mon, Feb 08, 2010 at 08:49:20AM -0600, Anthony Liguori wrote:
> >
> >> On 02/08/2010 08:12 AM, Daniel P. Berrange wrote:
> >>
> >>> For further background, the key end goal here is that in a QMP client,
> >>> upon receipt of the 'RESET' event, we need to reliably & immediately
> >>> determine why it occurred, e.g. triggered by watchdog, or by guest OS
> >>> request. There are actually 3 possible sequences
> >>>
> >>>  - WATCHDOG + action=reset, followed by RESET. Assuming no intervening
> >>>    event can occur, the client can merely record 'WATCHDOG' and
> >>>    interpret it when it gets the immediately following 'RESET' event
> >>>
> >>>  - RESET, followed by WATCHDOG + action=reset. The client doesn't know
> >>>    the reason for the RESET and can't wait arbitrarily for WATCHDOG since
> >>>    there might never be one arriving.
> >>>
> >>>  - RESET + source=watchdog. Client directly sees the reason
> >>>
> >>> The second scenario is the one I'd like us to avoid at all costs, since it
> >>> will require the client to introduce arbitrary delays in processing events
> >>> to determine cause. The first is slightly inconvenient, but doable if we
> >>> can assume no intervening events will occur between the WATCHDOG and
> >>> RESET events. The last is obviously simplest for the clients.
> >>>
> >>>
> >> I really prefer the third option but I'm a little concerned that we're
> >> throwing events around somewhat haphazardly.
> >>
> >> So let me ask, why does a client need to determine when a guest reset
> >> and why it reset?
> >>
> > If a guest OS is repeatedly hanging/crashing resulting in the watchdog
> > device firing, management software for the host really wants to know about
> > that (so that appropriate alerts/action can be taken) and thus needs to
> > be able to distinguish this from a "normal" guest OS initiated reboot.
> >
>
> I think that's an argument for having the watchdog events independent of
> the reset events.
>
> The watchdog condition happening is not directly related to the action
> the watchdog takes. The watchdog event really belongs in a class of events
> that are closely associated with a particular device emulation.
>
> In fact, I think what we're really missing in events today is a notion
> of a context. A RESET event is really a CPU event. A watchdog
> expiration event is a watchdog event. A connect event is a VNC event
> (Spice and chardevs will also generate connect events).

 This could be done by adding a 'context' member to all the events; an
event would then have to be identified by the pair event_name:context.

 This way we can have the same event_name for events in different
contexts. For example:

{ 'event': DISCONNECT, 'context': 'spice', [...] }

{ 'event': DISCONNECT, 'context': 'vnc', [...] }

 Note that today we have VNC_DISCONNECT and will probably have
SPICE_DISCONNECT too.
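
 To illustrate how a client could key its handlers on the event_name:context
pair, here is a rough sketch in Python. It is only an illustration: the
'context' member is the proposal above and not something QMP emits today,
and EventDispatcher, register and dispatch are hypothetical names, not part
of any existing client library.

```python
from collections import defaultdict


class EventDispatcher:
    """Routes events by the (event_name, context) pair (hypothetical helper)."""

    def __init__(self):
        # Maps (event_name, context) -> list of handler callables.
        self._handlers = defaultdict(list)

    def register(self, event, context, handler):
        """Register a handler for one specific event/context pair."""
        self._handlers[(event, context)].append(handler)

    def dispatch(self, message):
        """Call every handler registered for the message's pair.

        'context' is the proposed member; a message without it is
        looked up with context None. Returns the number of handlers run.
        """
        key = (message["event"], message.get("context"))
        handlers = self._handlers.get(key, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


seen = []
dispatcher = EventDispatcher()
dispatcher.register("DISCONNECT", "vnc", lambda m: seen.append("vnc"))
dispatcher.register("DISCONNECT", "spice", lambda m: seen.append("spice"))

# Same event name, different contexts, different handlers.
dispatcher.dispatch({"event": "DISCONNECT", "context": "spice"})
dispatcher.dispatch({"event": "DISCONNECT", "context": "vnc"})
```

 With this shape the client needs no per-subsystem event names such as
VNC_DISCONNECT; the context selects the handler instead.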