On Wed, 20 Mar 2013 12:26:30 -0500 mdroth <mdr...@linux.vnet.ibm.com> wrote:
> On Wed, Mar 20, 2013 at 12:58:51PM -0400, Luiz Capitulino wrote: > > On Wed, 20 Mar 2013 11:03:08 -0500 > > mdroth <mdr...@linux.vnet.ibm.com> wrote: > > > > > On Wed, Mar 20, 2013 at 10:54:51AM -0400, Luiz Capitulino wrote: > > > > > > > > On Fri, 1 Mar 2013 11:40:27 -0600 > > > > Michael Roth <mdr...@linux.vnet.ibm.com> wrote: > > > > > > > > > Hosts hold on to handles provided by guest-file-open for periods that > > > > > can > > > > > span beyond the life of the qemu-ga process that issued them. Since > > > > > these > > > > > are issued starting from 0 on every restart, we run the risk of > > > > > issuing > > > > > duplicate handles after restarts/reboots. > > > > > > > > > > As a result, users with a stale copy of these handles may end up > > > > > reading/writing corrupted data due to their existing handles > > > > > effectively > > > > > being re-assigned to an unexpected file or offset. > > > > > > > > > > We unfortunately do not issue handles as strings, but as integers, so > > > > > a > > > > > solution such as using UUIDs can't be implemented without introducing > > > > > a > > > > > new interface. > > > > > > > > > > As a workaround, we fix this by implementing a persistent key-value > > > > > store > > > > > that will be used to track the value of the last handle that was > > > > > issued > > > > > across restarts/reboots to avoid issuing duplicates. > > > > > > > > > > The store is automatically written to the same directory we currently > > > > > set via --statedir to track fsfreeze state, and so should be > > > > > applicable > > > > > for stable releases where this flag is supported. > > > > > > > > Did you consider that statedir is normally set to /var/run which is > > > > cleared by some distros on system reboot? Is this OK? > > > > > > Yes, though it's not ideal. The statedir should be a directory that > > > persists across reboots to completely fix the problem. But using a > > > directory that at least persists until reboot will avoid duplicate handles > > > being issued for qemu-ga restarts. So it's an improvement at least, > > > which is why I think it's worth considering for stable as well. > > > > Agreed. > > > > > > Also, I find it unfortunate that fd_counter never goes down... > > > > > > The required book-keeping makes it just not worth it, imo, but it could > > > be done in the future without breaking compatibility (see below) > > > > > > Not ideal certainly, but I think it's about the best we can do with > > > integer handles unfortunately. > > > > > > > > > > > One more comment below. > > > > > > > > > > > > > > A follow-up can use this same store for handling fsfreeze state, but > > > > > that change is cosmetic and left out for now. > > > > > > > > > > Signed-off-by: Michael Roth <mdr...@linux.vnet.ibm.com> > > > > > Cc: qemu-sta...@nongnu.org > > > > > --- > > > > > qga/commands-posix.c | 25 +++++-- > > > > > qga/guest-agent-core.h | 1 + > > > > > qga/main.c | 184 > > > > > ++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > 3 files changed, 204 insertions(+), 6 deletions(-) > > > > > > > > > > diff --git a/qga/commands-posix.c b/qga/commands-posix.c > > > > > index 7a0202e..5d12716 100644 > > > > > --- a/qga/commands-posix.c > > > > > +++ b/qga/commands-posix.c > > > > > @@ -129,14 +129,22 @@ static struct { > > > > > QTAILQ_HEAD(, GuestFileHandle) filehandles; > > > > > } guest_file_state; > > > > > > > > > > -static void guest_file_handle_add(FILE *fh) > > > > > +static uint64_t guest_file_handle_add(FILE *fh, Error **errp) > > > > > { > > > > > GuestFileHandle *gfh; > > > > > + uint64_t handle; > > > > > + > > > > > + handle = ga_get_fd_handle(ga_state, errp); > > > > > + if (error_is_set(errp)) { > > > > > + return 0; > > > > > + } > > > > > > > > > > gfh = g_malloc0(sizeof(GuestFileHandle)); > > > > > - gfh->id = fileno(fh); > > > > > + gfh->id = handle; > > > > > gfh->fh = fh; > > > > > QTAILQ_INSERT_TAIL(&guest_file_state.filehandles, gfh, next); > > > > > + > > > > > + return handle; > > > > > } > > > > > > > > > > static GuestFileHandle *guest_file_handle_find(int64_t id, Error > > > > > **err) > > > > > @@ -158,7 +166,7 @@ int64_t qmp_guest_file_open(const char *path, > > > > > bool has_mode, const char *mode, E > > > > > { > > > > > FILE *fh; > > > > > int fd; > > > > > - int64_t ret = -1; > > > > > + int64_t ret = -1, handle; > > > > > > > > > > if (!has_mode) { > > > > > mode = "r"; > > > > > @@ -184,9 +192,14 @@ int64_t qmp_guest_file_open(const char *path, > > > > > bool has_mode, const char *mode, E > > > > > return -1; > > > > > } > > > > > > > > > > - guest_file_handle_add(fh); > > > > > - slog("guest-file-open, handle: %d", fd); > > > > > - return fd; > > > > > + handle = guest_file_handle_add(fh, err); > > > > > + if (error_is_set(err)) { > > > > > + fclose(fh); > > > > > + return -1; > > > > > + } > > > > > + > > > > > + slog("guest-file-open, handle: %d", handle); > > > > > + return handle; > > > > > } > > > > > > > > > > void qmp_guest_file_close(int64_t handle, Error **err) > > > > > diff --git a/qga/guest-agent-core.h b/qga/guest-agent-core.h > > > > > index 3354598..624a559 100644 > > > > > --- a/qga/guest-agent-core.h > > > > > +++ b/qga/guest-agent-core.h > > > > > @@ -35,6 +35,7 @@ bool ga_is_frozen(GAState *s); > > > > > void ga_set_frozen(GAState *s); > > > > > void ga_unset_frozen(GAState *s); > > > > > const char *ga_fsfreeze_hook(GAState *s); > > > > > +int64_t ga_get_fd_handle(GAState *s, Error **errp); > > > > > > > > > > #ifndef _WIN32 > > > > > void reopen_fd_to_null(int fd); > > > > > diff --git a/qga/main.c b/qga/main.c > > > > > index db281a5..3635430 100644 > > > > > --- a/qga/main.c > > > > > +++ b/qga/main.c > > > > > @@ -15,6 +15,7 @@ > > > > > #include <stdbool.h> > > > > > #include <glib.h> > > > > > #include <getopt.h> > > > > > +#include <glib/gstdio.h> > > > > > #ifndef _WIN32 > > > > > #include <syslog.h> > > > > > #include <sys/wait.h> > > > > > @@ -30,6 +31,7 @@ > > > > > #include "qapi/qmp/qerror.h" > > > > > #include "qapi/qmp/dispatch.h" > > > > > #include "qga/channel.h" > > > > > +#include "qemu/bswap.h" > > > > > #ifdef _WIN32 > > > > > #include "qga/service-win32.h" > > > > > #include <windows.h> > > > > > @@ -53,6 +55,11 @@ > > > > > #endif > > > > > #define QGA_SENTINEL_BYTE 0xFF > > > > > > > > > > +typedef struct GAPersistentState { > > > > > +#define QGA_PSTATE_DEFAULT_FD_COUNTER 1000 > > > > > + int64_t fd_counter; > > > > > +} GAPersistentState; > > > > > + > > > > > struct GAState { > > > > > JSONMessageParser parser; > > > > > GMainLoop *main_loop; > > > > > @@ -76,6 +83,8 @@ struct GAState { > > > > > #ifdef CONFIG_FSFREEZE > > > > > const char *fsfreeze_hook; > > > > > #endif > > > > > + const gchar *pstate_filepath; > > > > > + GAPersistentState pstate; > > > > > }; > > > > > > > > > > struct GAState *ga_state; > > > > > @@ -724,6 +733,171 @@ VOID WINAPI service_main(DWORD argc, TCHAR > > > > > *argv[]) > > > > > } > > > > > #endif > > > > > > > > > > +static void set_persistent_state_defaults(GAPersistentState *pstate) > > > > > +{ > > > > > + g_assert(pstate); > > > > > + pstate->fd_counter = QGA_PSTATE_DEFAULT_FD_COUNTER; > > > > > +} > > > > > + > > > > > +static void persistent_state_from_keyfile(GAPersistentState *pstate, > > > > > + GKeyFile *keyfile) > > > > > +{ > > > > > + g_assert(pstate); > > > > > + g_assert(keyfile); > > > > > + /* if any fields are missing, either because the file was > > > > > tampered with > > > > > + * by agents of chaos, or because the field wasn't present at > > > > > the time the > > > > > + * file was created, the best we can ever do is start over with > > > > > the default > > > > > + * values. so load them now, and ignore any errors in accessing > > > > > key-value > > > > > + * pairs > > > > > + */ > > > > > + set_persistent_state_defaults(pstate); > > > > > + > > > > > + if (g_key_file_has_key(keyfile, "global", "fd_counter", NULL)) { > > > > > + pstate->fd_counter = > > > > > + g_key_file_get_int64(keyfile, "global", "fd_counter", > > > > > NULL); > > > > > + } > > > > > +} > > > > > + > > > > > +static void persistent_state_to_keyfile(const GAPersistentState > > > > > *pstate, > > > > > + GKeyFile *keyfile) > > > > > +{ > > > > > + g_assert(pstate); > > > > > + g_assert(keyfile); > > > > > + > > > > > + g_key_file_set_int64(keyfile, "global", "fd_counter", > > > > > pstate->fd_counter); > > > > > +} > > > > > + > > > > > +static gboolean write_persistent_state(const GAPersistentState > > > > > *pstate, > > > > > + const gchar *path) > > > > > +{ > > > > > + GKeyFile *keyfile = g_key_file_new(); > > > > > + GError *gerr = NULL; > > > > > + gboolean ret = true; > > > > > + gchar *data = NULL; > > > > > + gsize data_len; > > > > > + > > > > > + g_assert(pstate); > > > > > + > > > > > + persistent_state_to_keyfile(pstate, keyfile); > > > > > + data = g_key_file_to_data(keyfile, &data_len, &gerr); > > > > > + if (gerr) { > > > > > + g_critical("failed to convert persistent state to string: > > > > > %s", > > > > > + gerr->message); > > > > > + ret = false; > > > > > + goto out; > > > > > + } > > > > > + > > > > > + g_file_set_contents(path, data, data_len, &gerr); > > > > > + if (gerr) { > > > > > + g_critical("failed to write persistent state to %s: %s", > > > > > + path, gerr->message); > > > > > + ret = false; > > > > > + goto out; > > > > > + } > > > > > + > > > > > +out: > > > > > + if (gerr) { > > > > > + g_error_free(gerr); > > > > > + } > > > > > + if (keyfile) { > > > > > + g_key_file_free(keyfile); > > > > > + } > > > > > + g_free(data); > > > > > + return ret; > > > > > +} > > > > > + > > > > > +static gboolean read_persistent_state(GAPersistentState *pstate, > > > > > + const gchar *path, gboolean > > > > > frozen) > > > > > +{ > > > > > + GKeyFile *keyfile = NULL; > > > > > + GError *gerr = NULL; > > > > > + struct stat st; > > > > > + gboolean ret = true; > > > > > + > > > > > + g_assert(pstate); > > > > > + > > > > > + if (stat(path, &st) == -1) { > > > > > + /* it's okay if state file doesn't exist, but any other error > > > > > + * indicates a permissions issue or some other > > > > > misconfiguration > > > > > + * that we likely won't be able to recover from. > > > > > + */ > > > > > + if (errno != ENOENT) { > > > > > + g_critical("unable to access state file at path %s: %s", > > > > > + path, strerror(errno)); > > > > > + ret = false; > > > > > + goto out; > > > > > + } > > > > > + > > > > > + /* file doesn't exist. initialize state to default values and > > > > > + * attempt to save now. (we could wait till later when we > > > > > have > > > > > + * modified state we need to commit, but if there's a > > > > > problem, > > > > > + * such as a missing parent directory, we want to catch it > > > > > now) > > > > > + * > > > > > + * there is a potential scenario where someone either > > > > > managed to > > > > > + * update the agent from a version that didn't use a key > > > > > store > > > > > + * while qemu-ga thought the filesystem was frozen, or > > > > > + * deleted the key store prior to issuing a fsfreeze, prior > > > > > + * to restarting the agent. in this case we go ahead and > > > > > defer > > > > > + * initial creation till we actually have modified state to > > > > > + * write, otherwise fail to recover from freeze. > > > > > + */ > > > > > + set_persistent_state_defaults(pstate); > > > > > + if (!frozen) { > > > > > + ret = write_persistent_state(pstate, path); > > > > > + if (!ret) { > > > > > + g_critical("unable to create state file at path %s", > > > > > path); > > > > > + ret = false; > > > > > + goto out; > > > > > + } > > > > > + } > > > > > + ret = true; > > > > > + goto out; > > > > > + } > > > > > + > > > > > + keyfile = g_key_file_new(); > > > > > + g_key_file_load_from_file(keyfile, path, 0, &gerr); > > > > > + if (gerr) { > > > > > + g_critical("error loading persistent state from path: %s, > > > > > %s", > > > > > + path, gerr->message); > > > > > + ret = false; > > > > > + goto out; > > > > > + } > > > > > + > > > > > + persistent_state_from_keyfile(pstate, keyfile); > > > > > + > > > > > +out: > > > > > + if (keyfile) { > > > > > + g_key_file_free(keyfile); > > > > > + } > > > > > + if (gerr) { > > > > > + g_error_free(gerr); > > > > > + } > > > > > + > > > > > + return ret; > > > > > +} > > > > > + > > > > > +int64_t ga_get_fd_handle(GAState *s, Error **errp) > > > > > +{ > > > > > + int64_t handle; > > > > > + > > > > > + g_assert(s->pstate_filepath); > > > > > + /* we blacklist commands and avoid operations that potentially > > > > > require > > > > > + * writing to disk when we're in a frozen state. this includes > > > > > opening > > > > > + * new files, so we should never get here in that situation > > > > > + */ > > > > > + g_assert(!ga_is_frozen(s)); > > > > > + > > > > > + handle = s->pstate.fd_counter++; > > > > > + if (s->pstate.fd_counter < 0) { > > > > > + s->pstate.fd_counter = 0; > > > > > + } > > > > > > > > Is this handling the overflow case? Can't fd 0 be in use already? > > > > > > It could, but it's very unlikely that an overflow/counter reset would > > > result in issuing still-in-use handle, since guest-file-open would need > > > to be called 2^63 times in the meantime. > > > > Agreed, but as you do check for that case and as the right fix is simple > > and I think it's worth it. I can send a patch myself. > > > > A patch to switch to tracking a list of issued handles in the keystore, > or to return an error on overflow? To return an error on overflow. Minor, but if we do handle it let's do it right. Or, we could just add an assert like: assert(s->pstate.fd_counter >= 0); > If the former, sounds good, but keep in mind that we need to update on > every guest-file-open/guest-file-close, so we'll want to impose a > reasonable max on the number of outstanding FDs and document it. I would > lean toward something conservation like 1024, can consider bumping it in > the future if that proves insufficient. Maybe in the near future :)