On Mon, 20 Jun 2011 18:40:38 -0500 Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
> On 06/20/2011 09:16 AM, Luiz Capitulino wrote: > > On Sun, 19 Jun 2011 14:00:30 -0500 > > Michael Roth<mdr...@linux.vnet.ibm.com> wrote: > > > >> On 06/17/2011 10:25 PM, Luiz Capitulino wrote: > >>> On Fri, 17 Jun 2011 16:25:32 -0500 > >>> Michael Roth<mdr...@linux.vnet.ibm.com> wrote: > >>> > >>>> On 06/17/2011 03:13 PM, Luiz Capitulino wrote: > >>>>> On Fri, 17 Jun 2011 14:21:31 -0500 > >>>>> Michael Roth<mdr...@linux.vnet.ibm.com> wrote: > >>>>> > >>>>>> On 06/16/2011 01:42 PM, Luiz Capitulino wrote: > >>>>>>> On Tue, 14 Jun 2011 15:06:22 -0500 > >>>>>>> Michael Roth<mdr...@linux.vnet.ibm.com> wrote: > >>>>>>> > >>>>>>>> This is the actual guest daemon, it listens for requests over a > >>>>>>>> virtio-serial/isa-serial/unix socket channel and routes them through > >>>>>>>> to dispatch routines, and writes the results back to the channel in > >>>>>>>> a manner similar to QMP. > >>>>>>>> > >>>>>>>> A shorthand invocation: > >>>>>>>> > >>>>>>>> qemu-ga -d > >>>>>>>> > >>>>>>>> Is equivalent to: > >>>>>>>> > >>>>>>>> qemu-ga -c virtio-serial -p > >>>>>>>> /dev/virtio-ports/org.qemu.guest_agent \ > >>>>>>>> -p /var/run/qemu-guest-agent.pid -d > >>>>>>>> > >>>>>>>> Signed-off-by: Michael Roth<mdr...@linux.vnet.ibm.com> > >>>>>>> > >>>>>>> Would be nice to have a more complete description, like explaining > >>>>>>> how to > >>>>>>> do a simple test. > >>>>>>> > >>>>>>> And this can't be built... > >>>>>>> > >>>>>>>> --- > >>>>>>>> qemu-ga.c | 631 > >>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++ > >>>>>>>> qga/guest-agent-core.h | 4 + > >>>>>>>> 2 files changed, 635 insertions(+), 0 deletions(-) > >>>>>>>> create mode 100644 qemu-ga.c > >>>>>>>> > >>>>>>>> diff --git a/qemu-ga.c b/qemu-ga.c > >>>>>>>> new file mode 100644 > >>>>>>>> index 0000000..df08d8c > >>>>>>>> --- /dev/null > >>>>>>>> +++ b/qemu-ga.c > >>>>>>>> @@ -0,0 +1,631 @@ > >>>>>>>> +/* > >>>>>>>> + * QEMU Guest Agent > >>>>>>>> + * > >>>>>>>> + * Copyright IBM Corp. 2011 > >>>>>>>> + * > >>>>>>>> + * Authors: > >>>>>>>> + * Adam Litke<agli...@linux.vnet.ibm.com> > >>>>>>>> + * Michael Roth<mdr...@linux.vnet.ibm.com> > >>>>>>>> + * > >>>>>>>> + * This work is licensed under the terms of the GNU GPL, version 2 > >>>>>>>> or later. > >>>>>>>> + * See the COPYING file in the top-level directory. > >>>>>>>> + */ > >>>>>>>> +#include<stdlib.h> > >>>>>>>> +#include<stdio.h> > >>>>>>>> +#include<stdbool.h> > >>>>>>>> +#include<glib.h> > >>>>>>>> +#include<gio/gio.h> > >>>>>>>> +#include<getopt.h> > >>>>>>>> +#include<termios.h> > >>>>>>>> +#include<syslog.h> > >>>>>>>> +#include "qemu_socket.h" > >>>>>>>> +#include "json-streamer.h" > >>>>>>>> +#include "json-parser.h" > >>>>>>>> +#include "qint.h" > >>>>>>>> +#include "qjson.h" > >>>>>>>> +#include "qga/guest-agent-core.h" > >>>>>>>> +#include "qga-qmp-commands.h" > >>>>>>>> +#include "module.h" > >>>>>>>> + > >>>>>>>> +#define QGA_VIRTIO_PATH_DEFAULT > >>>>>>>> "/dev/virtio-ports/org.qemu.guest_agent" > >>>>>>>> +#define QGA_PIDFILE_DEFAULT "/var/run/qemu-va.pid" > >>>>>>>> +#define QGA_BAUDRATE_DEFAULT B38400 /* for isa-serial channels */ > >>>>>>>> +#define QGA_TIMEOUT_DEFAULT 30*1000 /* ms */ > >>>>>>>> + > >>>>>>>> +struct GAState { > >>>>>>>> + const char *proxy_path; > >>>>>>> > >>>>>>> Where is this used? > >>>>>>> > >>>>>> > >>>>>> Nowhere actually. Will remove. > >>>>>> > >>>>>>>> + JSONMessageParser parser; > >>>>>>>> + GMainLoop *main_loop; > >>>>>>>> + guint conn_id; > >>>>>>>> + GSocket *conn_sock; > >>>>>>>> + GIOChannel *conn_channel; > >>>>>>>> + guint listen_id; > >>>>>>>> + GSocket *listen_sock; > >>>>>>>> + GIOChannel *listen_channel; > >>>>>>>> + const char *path; > >>>>>>>> + const char *method; > >>>>>>>> + bool virtio; /* fastpath to check for virtio to deal with > >>>>>>>> poll() quirks */ > >>>>>>>> + GACommandState *command_state; > >>>>>>>> + GLogLevelFlags log_level; > >>>>>>>> + FILE *log_file; > >>>>>>>> + bool logging_enabled; > >>>>>>>> +}; > >>>>>>>> + > >>>>>>>> +static void usage(const char *cmd) > >>>>>>>> +{ > >>>>>>>> + printf( > >>>>>>>> +"Usage: %s -c<channel_opts>\n" > >>>>>>>> +"QEMU Guest Agent %s\n" > >>>>>>>> +"\n" > >>>>>>>> +" -c, --channel channel method: one of unix-connect, > >>>>>>>> virtio-serial, or\n" > >>>>>>>> +" isa-serial (virtio-serial is the default)\n" > >>>>>>>> +" -p, --path channel path (%s is the default for > >>>>>>>> virtio-serial)\n" > >>>>>>>> +" -l, --logfile set logfile path, logs to stderr by default\n" > >>>>>>>> +" -f, --pidfile specify pidfile (default is %s)\n" > >>>>>>>> +" -v, --verbose log extra debugging information\n" > >>>>>>>> +" -V, --version print version information and exit\n" > >>>>>>>> +" -d, --daemonize become a daemon\n" > >>>>>>>> +" -h, --help display this help and exit\n" > >>>>>>>> +"\n" > >>>>>>>> +"Report bugs to<mdr...@linux.vnet.ibm.com>\n" > >>>>>>>> + , cmd, QGA_VERSION, QGA_VIRTIO_PATH_DEFAULT, > >>>>>>>> QGA_PIDFILE_DEFAULT); > >>>>>>>> +} > >>>>>>>> + > >>>>>>>> +static void conn_channel_close(GAState *s); > >>>>>>>> + > >>>>>>>> +static const char *ga_log_level_str(GLogLevelFlags level) > >>>>>>>> +{ > >>>>>>>> + switch (level& G_LOG_LEVEL_MASK) { > >>>>>>>> + case G_LOG_LEVEL_ERROR: > >>>>>>>> + return "error"; > >>>>>>>> + case G_LOG_LEVEL_CRITICAL: > >>>>>>>> + return "critical"; > >>>>>>>> + case G_LOG_LEVEL_WARNING: > >>>>>>>> + return "warning"; > >>>>>>>> + case G_LOG_LEVEL_MESSAGE: > >>>>>>>> + return "message"; > >>>>>>>> + case G_LOG_LEVEL_INFO: > >>>>>>>> + return "info"; > >>>>>>>> + case G_LOG_LEVEL_DEBUG: > >>>>>>>> + return "debug"; > >>>>>>>> + default: > >>>>>>>> + return "user"; > >>>>>>>> + } > >>>>>>>> +} > >>>>>>>> + > >>>>>>>> +bool ga_logging_enabled(GAState *s) > >>>>>>>> +{ > >>>>>>>> + return s->logging_enabled; > >>>>>>>> +} > >>>>>>>> + > >>>>>>>> +void ga_disable_logging(GAState *s) > >>>>>>>> +{ > >>>>>>>> + s->logging_enabled = false; > >>>>>>>> +} > >>>>>>>> + > >>>>>>>> +void ga_enable_logging(GAState *s) > >>>>>>>> +{ > >>>>>>>> + s->logging_enabled = true; > >>>>>>>> +} > >>>>>>> > >>>>>>> Just to check I got this right, this is needed because of the fsfreeze > >>>>>>> command, correct? Isn't it better to have a more descriptive name, > >>>>>>> like > >>>>>>> fsfrozen? > >>>>>>> > >>>>>>> First I thought this was about a log file. Then I realized this was > >>>>>>> probably about letting the user log in, but we don't seem to have this > >>>>>>> functionality so I got confused. > >>>>>>> > >>>>>> > >>>>>> Yup, this is currently due to fsfreeze support. From the perspective of > >>>>>> the fsfreeze command the explicit "is_frozen" check makes more sense, > >>>>>> but the reason it affects other RPCs is because because we can't log > >>>>>> stuff in that state. If an RPC shoots itself in the foot by writing to > >>>>>> a > >>>>>> frozen filesystem we don't really care so much, and up until recently > >>>>>> that case was handled with a pthread_cancel timeout mechanism (was > >>>>>> removed for the time being, will re-implement using a child process > >>>>>> most > >>>>>> likely). > >>>>>> > >>>>>> What we don't want to do is give a host a way to bypass the expectation > >>>>>> we set for guest owners by allowing RPCs to be logged. So that's what > >>>>>> the check is based on, rather than lower level details like *why* > >>>>>> logging is disabled. Also, I'd really like to avoid a precedence where > >>>>>> a > >>>>>> single RPC can place restrictions on all the others, so the logging > >>>>>> aspect seemed general enough that it doesn't necessarily provide that > >>>>>> precedence. > >>>>>> > >>>>>> This has come up a few times without any real consensus. I can probably > >>>>>> go either way, but I think the check for logging is easier to set > >>>>>> expectations with: "if logging is important from an auditing > >>>>>> perspective, don't execute this if logging is disabled". beyond that, > >>>>>> same behavior/symantics you'd get with anything that causes a write() > >>>>>> on > >>>>>> a filesystem in a frozen state: block. > >>>>> > >>>>> While I understand your arguments, I still find the current behavior > >>>>> confusing: you issue a guest-open-file command and get a > >>>>> QgaLoggingFailed > >>>>> error. The first question is: what does a log write fail have to do > >>>>> with me > >>>>> opening a file? The second question is: what should I do? Try again? > >>>>> Give up? > >>>>> > >>>> > >>>> I agree it's a better solution for the client there, but at the same > >>>> time: > >>>> > >>>> guest_privileged_ping(): > >>>> if fsfreeze.status == FROZEN: > >>>> syslog("privileged ping, thought you should know") > >>>> else: > >>>> return "error, filesystem frozen" > >>>> > >>>> Seems silly to the client as well. "Why does a ping command care about > >>>> the filesystem state?" > >>>> > >>>> Inability to log is the true error. That's also the case for the > >>>> guest-file-open command. > >>> > >>> There's a difference. In the guest ping the inability to log is an > >>> internal agent error, it's not interesting to the client. We could just > >>> ignore the error and reply back (unless we define that > >>> guest-privileged-ping > >>> requires writing to disk). > >>> > >> > >> To clarify, these's 2 different types of errors here and their relation > >> to fsfreeze is causing the separation to get munged a bit: > >> > >> 1) Logging/accounting errors: even though we try to make it clear > >> wherever possible this solution is based on a "trusted" hypervisor that > >> already has substantial privileges over guests (direct access to images, > >> etc), one of our internal use cases is the ability for customers to > >> properly account for actions initiated by the guest agent. Shutdowns, > >> whats files were read/written, etc. > > > > This is fine. > > > >> For some commands, this accounting is not important. guest-ping, or > >> guest-info, for instance, is uninteresting from a security accounting > >> perspective. So logging is in fact optional and logging failures are > >> silently ignored. > >> > >> guest-shutdown, guest-file-open, guest-fsfreeze, however, are important. > >> So the QgaLoggingError we currently report really means "this command > >> requires logging, but logging has been disabled". You're right that this > >> is because of fsfreeze, but it's not because of the filesystem state. > >> guest-file-open on a network mount that wasn't frozen would *still*, by > >> design, report an error due to inability to log. > > > > Completely ignoring the file-system state and considering only what you > > say about security, this doesn't make sense to me. Actually, I think it's > > a flaw, because a client could open the daemon's log file and write garbage > > on it. > > > > But, how does the ability to log protects you from anything? I would > > understand > > it's importance if the user had to provide credentials to use the agent, > > but s/he doesn't. It's like you were unable to use a linux box because > > syslogd > > is not running. > > > > This looks like a misfeature to me. If you really want to provide it, then > > you could provide a --enable-log-error which, when enabled, would make the > > daemon not to execute *any* command if the it's able to log its actions. > > Otherwise log errors are just ignored. > > > > Hmm, you're right. I'd thought that if you clobber the log file at least > the RPC that clobbered the log file would get logged, but since we only > log the guest-file-open, and we do that before the guest-file-write, > they could overwrite and fabricate the log file history and nothing > would be obviously suspicious. > > Since we can't build any accounting around something like that I'm fine > with having the logging silently fail in a freeze state for now. > > A better thought out accounting mechanism can be added as an optional > feature later if it comes to that. Yes. > >> For this case I do see your point about the error not being very helpful > >> to users, which is why I think something like: > >> > >> "Guest agent failed to log RPC due to frozen filesystems" > >> > >> Is the best way to report it. If we need to fix up the error id to > >> something like QgaLoggingToFrozenFilesystem I'd be fine with that. > >> > >> 2) EWOULDBLOCK or I/O errors due to writing to frozen filesystems: this > >> is currently treated as a PEBKAC and not enforced/reported at all. I > >> wouldn't be against disabling guest-file-write as a side-effect of > >> freezing, but that's not totally correct. Writes to non-frozen > >> filesystems like networked or sysfs mounts is technically still okay, > >> for instance. We store path information in guest-file-open and use it to > >> scan against fsfreeze's state info about frozen mounts to handle this a > >> little better. > > > > Yes, I agree with you here. > > > > Treating I/O errors/deadlocks due to fsfreeze as a user issue (as things > are currently), or working around it? If the latter, a), b), or c)? Option b) makes sense to me too. > > >> > >> Even then you still have RPCs like guest-shutdown that may do a syslog() > >> that would cause the command to freeze, or in the future we may add the > >> ability for the guest agent to execute user/distro-installed scripts > >> that may/may not need to write to the filesystem. Some these might even > >> be intended to run when the filesystem is frozen...cleanup scripts for > >> databases and whatnot for snapshotting is something I've seen come up. > >> > >> We can > >> > >> a) be very restrictive, which I'm not totally against...my initial > >> inclination was to disable everything except > >> guest-fsfreeze-thaw/guest-fsfreeze-status while frozen, but this has > >> some limitations that aren't as obscure/unlikely as one would hope. > >> > >> b) we can be completely unrestrictive about it and give users the same > >> experiences they'd get at a command-line if they, or another user on the > >> system, was messing with fsfreeze. > >> > >> c) try to capture some common cases, but make it clear that you can > >> still shoot yourself in the foot using the guest agent on a frozen > >> filesystem, evaluating when to enforce this in code on a case-by-case > >> basis. > >> > >> Personally I like b), mostly because it's the least work, but also > >> because it's actually the easiest way to set expectations about fsfreeze. > >> > >>> The guest-file-write command, on the other hand, clearly requires to write > >>> to disk, so a client would expect a EWOULDBLOCK error. > >>> > >>> EWOULDBLOCK looks good to me. We could define as general rule that > >>> commands don't block, so clients should always expect a EWOULDBLOCK. > >>> > >> > >> This falls under 2) > >> > >>>> There's nothing wrong with opening a read-only > >>>> file on a frozen filesystem, it's the fact that we can't log the open > >>>> that causes us to bail out.. > >>> > >>> But why do we bail out? Why can't we just ignore the fact we can't log? > >>> > >> > >> This falls under 1) > >> > >>>> So, what if we just munge the 2 to give the user the proper clues to fix > >>>> things, and instead return an error like: > >>>> > >>>> "Guest agent failed to log RPC (is the filesystem frozen?)"? > >>> > >>> The 'desc' part of an error is a human error descriptor, clients shouldn't > >>> parse it. The real error is the error class. > >>> > >> > >> As mentioned above, we could change the error id itself to capture both > >> the immediate error and the underlying error. > >> > > >