thaw

Michael Roth Fri, 04 Feb 2011 08:51:43 -0800

On 02/04/2011 05:03 AM, Jes Sorensen wrote:

On 02/03/11 18:41, Michael Roth wrote:

On 02/02/2011 02:48 AM, Jes Sorensen wrote:

Yes very much so[1] - one reason why it would be nice to have virtagent
use threads to execute the actual commands. We should probably add a
flag to agent commands indicating whether they issue disk I/O or not, so
we can block attempts to execute commands that do so, while the guest is
frozen.


**Warning, epic response**

For things like logging and i/o on a frozen system...I agree we'd need
some flag for these kinds of situations. Maybe a disable_logging()
flag....i really don't like this though... I'd imagine even syslogd()
could block virtagent in this type of situation, so that would need to
be disabled as well.


One way to resolve this would be to have the logging handled in it's own
thread, which uses non blocking calls to do the actual logging.
Obviously we'd have to use a non file system based communication method
between the main thread and the logging thread :)

I suspect syslogd() already buffers to some extent. But the problemthere, as well the problem with having our own buffered loggingimplementation, is that we can't rely on being able to log an arbitrarynumber of messages. As some point that interface would need to eitherblock, or stop dropping log messages.

If it blocks, we're deadlocked again. If it drops messages, it's trivialfor someone to flood it with messages after the fsfreeze() to get backinto a state where they can execute RPC without any accounting.

So a seperate logging thread doesn't buy us much, and what we come upwith is rely gonna come down to syslogd:

1) if syslogd blocks, we must disable logging after fsfreeze().Buffering, on syslogd's side or our own, only buys us time. If wedisable logging, we must only allow absolutely required/benign RPCs.fsstatus/fsthaw are required, things like va.ping are benign, andpotentially useful, in this particular situation.copyfile/shutdown/exec/etc should be considered "high-risk"...we want tomake sure these are logged.

2) if syslogd doesnt block, it either drops messages at some point withno indication to us, or it drops them and provides some indication. Ifthere's no indication, we must treat this the same way we treat 1),since we must assume that logging is effectively disabled. So onlyrequired or benign RPCs.

if we get some indication when a call to syslog() fails to log/buffer,we can allow all RPCs, but treat failures to log as a cue to immediatelycleanup and exit the RPC. fsfreeze() under this circumstance will needto make sure it only logs after it does the unfreeze, else we may neverbe able to unfreeze at that point forward.

So the solution is dependent on syslogd's behavior. Ill have to do sometesting to confirm...but I think the above covers the possibilities.

But doing so completely subverts our attempts and providing proper
accounting of what the agent is doing to the user. A user can freeze the
filesystem, knowing that logging would be disabled, then prod at
whatever he wants. So the handling should be something specific to
fsfreeze, with stricter requirements:

If a user calls fsfreeze(), we disable logging, but also disable the
ability to do anything other than fsthaw() or fsstatus(). This actually
solves the potential deadlocking problem for other RPCs as well...since
they cant be executed in the first place.


I disagree that we should block all calls, except for fsfreeze/fsstatus/
fsthaw in this case. There are other calls that could be valid in this
situations, so I think it needs to be evaluated on a case by case basis.

So a solution for these situations is still needed, and I'm starting to
agree that threads are needed, but I don't think we should do RPCs
concurrently (not sure if that's what is being suggested or not). At
least, there's no pressing reason for it as things currently stand
(there aren't currently any RPCs where fast response times are all that
important, so it's okay to serialize them behind previous RPCs, and
HMP/QMP are command at a time), and it's something that Im fairly
confident can be added if the need arises in the future.


Eventually I think we will need to be able to support concurrent RPC
calls. There can be situations where an operation takes a long time
while it is valid to be able to ping the guest agent to verify that it
is still alive etc.

Currently we're limited to 1 RPC at a time by the monitor/hmp/qmp(except for some niche cases where virtagent has multiple requests inflight). Ideally this will remain the interface we expose to users, butthere could be situations where this isn't the case...for instance,other bits of qemu making direct calls into virtagent. I think we canadd support for concurrent RPCs incrementally though...I've thought overthe implementation details a bit and it seems to be fairlystraightforward. I think we need to get the basic use cases down firstthough.


Cheers,
Jes

Re: [Qemu-devel] [PATCH 1/2] Add virtagent file system freeze/thaw

Reply via email to