Re: [Qemu-devel] [PATCH v2 1/2] QEMUSizedBuffer based QEMUFile
* Eric Blake (ebl...@redhat.com) wrote: > On 08/07/2014 04:24 AM, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" > > > > This is based on Stefan and Joel's patch that creates a QEMUFile that goes > > to a memory buffer; from: Hi Eric, I think I agree with most of your points here; and believe I've nailed them in the v3 I've just posted. Dave > > > > http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html > > > > Using the QEMUFile interface, this patch adds support functions for > > operating on in-memory sized buffers that can be written to or read from. > > > > Signed-off-by: Stefan Berger > > Signed-off-by: Joel Schopp > > > > For minor tweeks/rebase I've done to it: > > Signed-off-by: Dr. David Alan Gilbert > > --- > > include/migration/qemu-file.h | 28 +++ > > include/qemu/typedefs.h | 1 + > > qemu-file.c | 410 > > ++ > > 3 files changed, 439 insertions(+) > > > > > > > +QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len); > > +QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *); > > +void qsb_free(QEMUSizedBuffer *); > > +size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length); > > Just on the prototype, I wonder if this should return ssize_t, to allow > for the possibility of failure... > > > +/** > > + * Create a QEMUSizedBuffer > > + * This type of buffer uses scatter-gather lists internally and > > + * can grow to any size. Any data array in the scatter-gather list > > + * can hold different amount of bytes. > > + * > > + * @buffer: Optional buffer to copy into the QSB > > + * @len: size of initial buffer; if @buffer is given, buffer must > > + * hold at least len bytes > > + * > > + * Returns a pointer to a QEMUSizedBuffer > > + */ > > +QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len) > > +{ > > +QEMUSizedBuffer *qsb; > > +size_t alloc_len, num_chunks, i, to_copy; > > +size_t chunk_size = (len > QSB_MAX_CHUNK_SIZE) > > +? QSB_MAX_CHUNK_SIZE > > +: QSB_CHUNK_SIZE; > > + > > +if (len == 0) { > > +/* we want to allocate at least one chunk */ > > +len = QSB_CHUNK_SIZE; > > +} > > So if I call qsb_create("", 0), len is now QSB_CHUNK_SIZE... > > > + > > +num_chunks = DIV_ROUND_UP(len, chunk_size); > > ...num_chunks is now 1... > > > +alloc_len = num_chunks * chunk_size; > > + > > +qsb = g_new0(QEMUSizedBuffer, 1); > > +qsb->iov = g_new0(struct iovec, num_chunks); > > +qsb->n_iov = num_chunks; > > + > > +for (i = 0; i < num_chunks; i++) { > > +qsb->iov[i].iov_base = g_malloc0(chunk_size); > > +qsb->iov[i].iov_len = chunk_size; > > +if (buffer) { > > ...we enter the loop, and have a buffer to copy... > > > +to_copy = (len - qsb->used) > chunk_size > > + ? chunk_size : (len - qsb->used); > > +memcpy(qsb->iov[i].iov_base, &buffer[qsb->used], to_copy); > > ...and proceed to dereference QSB_CHUNK_SIZE bytes beyond the end of the > empty string that we are converting to a buffer. Oops. > > Might be as simple as adding if (buffer) { assert(len); } before > modifying len. > > > +/** > > + * Set the length of the buffer; the primary usage of this > > + * function is to truncate the number of used bytes in the buffer. > > + * The size will not be extended beyond the current number of > > + * allocated bytes in the QEMUSizedBuffer. > > + * > > + * @qsb: A QEMUSizedBuffer > > + * @new_len: The new length of bytes in the buffer > > + * > > + * Returns the number of bytes the buffer was truncated or extended > > + * to. > > Confusing; I'd suggest: > > Returns the resulting (possibly new) size of the buffer. Oh, and my > earlier question appears to be answered - this function can't fail. > > > +ssize_t qsb_get_buffer(const QEMUSizedBuffer *qsb, off_t start, > > + size_t count, uint8_t **buf) > > +{ > > +uint8_t *buffer; > > +const struct iovec *iov; > > +size_t to_copy, all_copy; > > +ssize_t index; > > +off_t s_off; > > +off_t d_off = 0; > > +char *s; > > + > > +if (start > qsb->used) { > > +r
Re: [Qemu-devel] [PATCH v3 1/2] QEMUSizedBuffer based QEMUFile
* Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote: > From: "Dr. David Alan Gilbert" > > This is based on Stefan and Joel's patch that creates a QEMUFile that goes > to a memory buffer; from: Actually, just spotted a bug in this; v4 coming shortly. Dave > > http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html > > Using the QEMUFile interface, this patch adds support functions for > operating on in-memory sized buffers that can be written to or read from. > > Signed-off-by: Stefan Berger > Signed-off-by: Joel Schopp > > For fixes/tweeks I've done: > Signed-off-by: Dr. David Alan Gilbert > --- > include/migration/qemu-file.h | 28 +++ > include/qemu/typedefs.h | 1 + > qemu-file.c | 457 > ++ > 3 files changed, 486 insertions(+) > > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h > index c90f529..6ef8ebc 100644 > --- a/include/migration/qemu-file.h > +++ b/include/migration/qemu-file.h > @@ -25,6 +25,8 @@ > #define QEMU_FILE_H 1 > #include "exec/cpu-common.h" > > +#include > + > /* This function writes a chunk of data to a file at the given position. > * The pos argument can be ignored if the file is only being used for > * streaming. The handler should try to write all of the data it can. > @@ -94,11 +96,21 @@ typedef struct QEMUFileOps { > QEMURamSaveFunc *save_page; > } QEMUFileOps; > > +struct QEMUSizedBuffer { > +struct iovec *iov; > +size_t n_iov; > +size_t size; /* total allocated size in all iov's */ > +size_t used; /* number of used bytes */ > +}; > + > +typedef struct QEMUSizedBuffer QEMUSizedBuffer; > + > QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops); > QEMUFile *qemu_fopen(const char *filename, const char *mode); > QEMUFile *qemu_fdopen(int fd, const char *mode); > QEMUFile *qemu_fopen_socket(int fd, const char *mode); > QEMUFile *qemu_popen_cmd(const char *command, const char *mode); > +QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input); > int qemu_get_fd(QEMUFile *f); > int qemu_fclose(QEMUFile *f); > int64_t qemu_ftell(QEMUFile *f); > @@ -111,6 +123,22 @@ void qemu_put_byte(QEMUFile *f, int v); > void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, int size); > bool qemu_file_mode_is_not_valid(const char *mode); > > +QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len); > +QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *); > +void qsb_free(QEMUSizedBuffer *); > +size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length); > +size_t qsb_get_length(const QEMUSizedBuffer *qsb); > +ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count, > + uint8_t *buf); > +ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf, > + off_t pos, size_t count); > + > + > +/* > + * For use on files opened with qemu_bufopen > + */ > +const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f); > + > static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v) > { > qemu_put_byte(f, (int)v); > diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h > index 5f20b0e..db1153a 100644 > --- a/include/qemu/typedefs.h > +++ b/include/qemu/typedefs.h > @@ -60,6 +60,7 @@ typedef struct PCIEAERLog PCIEAERLog; > typedef struct PCIEAERErr PCIEAERErr; > typedef struct PCIEPort PCIEPort; > typedef struct PCIESlot PCIESlot; > +typedef struct QEMUSizedBuffer QEMUSizedBuffer; > typedef struct MSIMessage MSIMessage; > typedef struct SerialState SerialState; > typedef struct PCMCIACardState PCMCIACardState; > diff --git a/qemu-file.c b/qemu-file.c > index a8e3912..9b83991 100644 > --- a/qemu-file.c > +++ b/qemu-file.c > @@ -878,3 +878,460 @@ uint64_t qemu_get_be64(QEMUFile *f) > v |= qemu_get_be32(f); > return v; > } > + > +#define QSB_CHUNK_SIZE (1 << 10) > +#define QSB_MAX_CHUNK_SIZE (16 * QSB_CHUNK_SIZE) > + > +/** > + * Create a QEMUSizedBuffer > + * This type of buffer uses scatter-gather lists internally and > + * can grow to any size. Any data array in the scatter-gather list > + * can hold different amount of bytes. > + * > + * @buffer: Optional buffer to copy into the QSB > + * @len: size of initial buffer; if @buffer is given, buffer must > + * hold at least len bytes > + * > + * Returns a pointer to a QEMUSizedBuffer or NULL on allocation failure > + */ > +QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len) > +{ > +QEMUSizedBuffer *qsb; > +size_t alloc_len, num_chunks, i, to_copy; > +
Re: [Qemu-devel] [PATCH v4] Add machine parameter qemu-kvm-migration for live migrate compatibility with qemu-kvm
* Michael Tokarev (m...@tls.msk.ru) wrote: > 22.09.2014 23:34, Alex Bligh wrote: > > This patch series adds inbound migrate capability from qemu-kvm version > > 1.0. [...] > > Isn't it quite a bit too late already? That's an old version by > now, and supporting migration from it is interesting for long-term > support distributions - like redhat for example, with several > years of release cycle. Is it really necessary at this time to > add this ability to upstream qemu? > > It'd be very nice to have this capability right when qemu-kvm > tree has been merged (or even before that), but it's been some > years ago already. It's amazing what different combinations of QEMU people have out there; and it seems reasonable to let Alex fix this problem if it helps him; there's no reason to deny others the same fix. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] -machine vmport=off: Allow disabling of VMWare ioport emulation
* Slutz, Donald Christopher (dsl...@verizon.com) wrote: > What is happening with this patch? I would like to use this code. I need to rework it for the new machine types code; but it was pretty low down my list of priorities; but I can try and get a minute for it again. Dave > >-Don Slutz > > > From: qemu-devel-bounces+don=cloudswitch@nongnu.org > [qemu-devel-bounces+don=cloudswitch@nongnu.org] on behalf of Gerd > Hoffmann [kra...@redhat.com] > Sent: Tuesday, May 20, 2014 6:10 AM > To: Richard W.M. Jones > Cc: m...@redhat.com; qemu-devel@nongnu.org; arm...@redhat.com; Dr. David Alan > Gilbert; aligu...@amazon.com; Anthony PERARD > Subject: Re: [Qemu-devel] [PATCH] -machine vmport=off: Allow disabling of > VMWare ioport emulation > > Hi, > > > It was disabled in this patch. The commit message is saying that > > vmport cannot work in Xen, but I'm not exactly clear why. > > > > commit 1611977c3d8fdbdac6090cbd1fcee4aed6d9 > > Author: Anthony PERARD > > Date: Tue May 3 17:06:54 2011 +0100 > > > > pc, Disable vmport initialisation with Xen. > > > > This is because there is not synchronisation of the vcpu register > > between Xen and QEMU, so vmport can't work properly. > > Ah, ok. The backdoor has side effects (writing the port does modify > vcpu registers). That is the bit which is problematic for xen. Scratch > the idea then. > > Original patch is fine. > > Reviewed-by: Gerd Hoffmann > > cheers, > Gerd > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] -machine vmport=off: Allow disabling of VMWare ioport emulation
* Slutz, Donald Christopher (dsl...@verizon.com) wrote: > On 09/25/14 11:07, Dr. David Alan Gilbert wrote: > > * Slutz, Donald Christopher (dsl...@verizon.com) wrote: > >> What is happening with this patch? I would like to use this code. > > I need to rework it for the new machine types code; but it was pretty > > low down my list of priorities; but I can try and get a minute for it > > again. > > Ok, I did not see any mail about this. If I am reading this right you mean > like I did in > > commit c87b1520726f7ae1e698a41f07043d1b539ac88c > > Do you want me to attempt to "port" this patch this way? To be honest I hadn't got that far in figuring the new stuff out, but yes I think so please check with m...@redhat.com (cc'd); he was pointing me in the direction of those changes I think. But if you want to do the update, please feel free. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 1/2] QEMUSizedBuffer based QEMUFile
* Eric Blake (ebl...@redhat.com) wrote: > On 09/17/2014 05:26 AM, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" > > > > This is based on Stefan and Joel's patch that creates a QEMUFile that goes > > to a memory buffer; from: > > > > http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg05036.html > > Sheesh - a year and a half ago, and still not ready to merge. Yes; I'm dragging it screaming, hopefully over the line. > > Using the QEMUFile interface, this patch adds support functions for > > operating on in-memory sized buffers that can be written to or read from. > > > > Signed-off-by: Stefan Berger > > Signed-off-by: Joel Schopp > > > > For fixes/tweeks I've done: > > Signed-off-by: Dr. David Alan Gilbert > > --- > > include/migration/qemu-file.h | 28 +++ > > include/qemu/typedefs.h | 1 + > > qemu-file.c | 457 > > ++ > > 3 files changed, 486 insertions(+) > > > > > +QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len) > > > +for (i = 0; i < num_chunks; i++) { > > +qsb->iov[i].iov_base = g_try_malloc0(chunk_size); > > +if (!qsb->iov[i].iov_base) { > > +size_t j; > > + > > +for (j = 0; j < i; j++) { > > +g_free(qsb->iov[j].iov_base); > > +} > > +g_free(qsb->iov); > > +g_free(qsb); > > +return NULL; > > Rather than inlining all this cleanup, you could just call > qsb_free(qsb). But I can live with this version too. Done; I was wary at first, but I checked and since qsb_free uses g_free, and g_free is defined to ignore NULL pointers, it's OK to be used at this point. > > +/** > > + * Grow the QEMUSizedBuffer to the given size and allocated > > + * memory for it. > > s/allocated/allocate/ ? Fixed. > > > + * > > + * @qsb: A QEMUSizedBuffer > > + * @new_size: The new size of the buffer > > + * > > + * Returns an error code in case of memory allocation failure > > s/an error code/a negative error code/ ? Done; reformatted that text to lay the 'or' out more clearly. > > + * or the new size of the buffer otherwise. The returned size > > + * may be greater or equal to @new_size. > > + */ > > +static ssize_t qsb_grow(QEMUSizedBuffer *qsb, size_t new_size) > > +{ > > +size_t needed_chunks, i; > > + > > +if (qsb->size < new_size) { > > +struct iovec *new_iov; > > +size_t size_diff = new_size - qsb->size; > > +size_t chunk_size = (size_diff > QSB_MAX_CHUNK_SIZE) > > + ? QSB_MAX_CHUNK_SIZE : QSB_CHUNK_SIZE; > > + > > +needed_chunks = DIV_ROUND_UP(size_diff, chunk_size); > > + > > +new_iov = g_try_malloc_n(qsb->n_iov + needed_chunks, > > + sizeof(struct iovec)); > > Indentation is off. Done. > > + > > +const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f) > > +{ > > +QEMUBuffer *p; > > + > > +qemu_fflush(f); > > + > > +p = (QEMUBuffer *)f->opaque; > > Cast is not necessary (this is C, after all, not C++). Done. > > + > > +return p->qsb; > > +} > > + > > +static const QEMUFileOps buf_read_ops = { > > +.get_buffer = buf_get_buffer, > > +.close = buf_close > > +}; > > I think we prefer trailing commas, if only so that future additions > don't have to modify existing lines. > > > + > > +static const QEMUFileOps buf_write_ops = { > > +.put_buffer = buf_put_buffer, > > +.close = buf_close > > +}; > > and again. Done. > > > + > > +QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input) > > +{ > > +QEMUBuffer *s; > > + > > +if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') || mode[1] != > > 0) { > > I prefer '\0' over 0 when comparing characters. Or shorthand such as > '|| *mode[1]'. But it works as is. Done as '\0' > > +error_report("qemu_bufopen: Argument validity check failed"); > > +return NULL; > > +} > > + > > +s = g_malloc0(sizeof(QEMUBuffer)); > > +if (mode[0] == 'r') { > > +s->qsb = input; > > +} > > + > > +if (s->qsb == NULL) { > > +
Re: [Qemu-devel] [PATCHv2 1/2] util: introduce bitmap_try_new
* Peter Lieven (p...@kamp.de) wrote: > regular bitmap_new simply aborts if the memory allocation fails. > bitmap_try_new returns NULL on failure and allows for proper > error handling. > > Signed-off-by: Peter Lieven > --- > include/qemu/bitmap.h | 13 +++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h > index 1babd5d..edf4f17 100644 > --- a/include/qemu/bitmap.h > +++ b/include/qemu/bitmap.h > @@ -88,10 +88,19 @@ int slow_bitmap_andnot(unsigned long *dst, const unsigned > long *bitmap1, > int slow_bitmap_intersects(const unsigned long *bitmap1, > const unsigned long *bitmap2, long bits); > > -static inline unsigned long *bitmap_new(long nbits) > +static inline unsigned long *bitmap_try_new(long nbits) > { > long len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); > -return g_malloc0(len); > +return g_try_malloc0(len); > +} > + > +static inline unsigned long *bitmap_new(long nbits) > +{ > +unsigned long *ptr = bitmap_try_new(nbits); > +if (ptr == NULL) { > +abort(); > +} > +return ptr; > } This seems like a good idea; it's probably a good idea to deprecate use of bitmap_new and get rid of the other uses in the longterm (there aren't that many). size_t would be a nice type for the nbits; could you make your bitmap_try_new use that and then fix up those users as we convert them to use the try? Dave > > static inline void bitmap_zero(unsigned long *dst, long nbits) > -- > 1.7.9.5 > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] qemu-file: Add copyright header to qemu-file.c
* Eduardo Habkost (ehabk...@redhat.com) wrote: > The person who created qemu-file.c (me, on commit > 093c455a8c6d8f715eabd8c8d346f08f17d686ec) didn't add a copyright/license > header to the file, even though the whole code was copied from savevm.c > (which had a copyright/license header). > > To correct this, copy the copyright information and license from > savevm.c, that's where the original code came from. > > Luckily, very few changes were made on qemu-file.c after it was created. > All the authors who touched the code are being CCed, so they can confirm > if they are OK with the copyright/license information being added. Since it's come from vl.c that seems to make sense. It should probably go to the stable branch as well, since reading the ~/LICENSE file would make you think it's GPLd. Reviewed-by: Dr. David Alan Gilbert > Cc: Dr. David Alan Gilbert > Cc: Alexey Kardashevskiy > Cc: Markus Armbruster > Cc: Juan Quintela > Signed-off-by: Eduardo Habkost > --- > qemu-file.c | 23 +++ > 1 file changed, 23 insertions(+) > > diff --git a/qemu-file.c b/qemu-file.c > index a8e3912..bd5d4af 100644 > --- a/qemu-file.c > +++ b/qemu-file.c > @@ -1,3 +1,26 @@ > +/* > + * QEMU System Emulator > + * > + * Copyright (c) 2003-2008 Fabrice Bellard > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > copy > + * of this software and associated documentation files (the "Software"), to > deal > + * in the Software without restriction, including without limitation the > rights > + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell > + * copies of the Software, and to permit persons to whom the Software is > + * furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice shall be included in > + * all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > FROM, > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > + * THE SOFTWARE. > + */ > #include "qemu-common.h" > #include "qemu/iov.h" > #include "qemu/sockets.h" > -- > 1.9.3 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
I've updated our github at: https://github.com/orbitfp7/qemu/tree/wp3-postcopy to have this version. and it corresponds to the tag: https://github.com/orbitfp7/qemu/releases/tag/wp3-postcopy-v4 Dave * Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote: > From: "Dr. David Alan Gilbert" > > Hi, > This is the 4th cut of my version of postcopy; it is designed for use with > the Linux kernel additions just posted by Andrea Arcangeli here: > > http://marc.info/?l=linux-kernel&m=141235633015100&w=2 > > (Note: This is a new version compared to my previous postcopy patchset; you'll > need to update the kernel to the new version.) > > Other than the new kernel ABI (which is only a small change to the userspace > side); > the major changes are; > > a) Code for host page size != target page size > b) Support for migration over fd > From Cristian Klein; this is for libvirt support which Cristian recently > posted to the libvirt list. > c) It's now build bisectable and builds on 32bit > > Testing wise; I've now done many thousand of postcopy migrations without > failure (both of idle and busy guests); so it seems pretty solid. > > Must-TODO's: > 1) A partially repeatable migration_cancel failure > 2) virt_test's migrate.with_reboot test is failing > 3) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than > the source feels like it needs looking at for postcopy. > 4) Paolo's comments with respect to the wakeup_request/is_running code > in the migration thread > 5) xbzrle needs disabling once in postcopy > > Later-TODO's: > 1) Control the rate of background page transfers during postcopy to > reduce their impact on the latency of postcopy requests. > 2) Work with RDMA > 3) Could destination RP be made blocking (as per discussion with Paolo; > I'm still worried that that changes too many assumptions) > > > > V4: > Initial support for host page size != target page size > - tested heavily on hps==tps > - only partially tested on hps!=tps systems > - This involved quite a bit of rework around the discard code > Updated to new kernel userfault ABI > - It won't work with the previous version > Fix mis-optimisation of postcopy request for wrong RAMBlock > request for block A offset n > un-needed fault for block B/m (already received - no req sent) > request for block B/l - wrongly sent as request for A/l > Fix thinko in discard bitmap processing (missed last word of bitmap) > Symptom: remap failures near the top of RAM if postcopy started late > Fix bug that caused kernel page acknowledgments to be misaligned > May have meant the guest was paused for longer than required > Fix potential for crashing cleaning up failed RP > Fixes in docs (from Yang) > Handle migration by fd as sockets if they are sockets > Build tested on 32bit > Fully build bisectable (x86-64) > > > Dave > > Cristian Klein (1): > Handle bi-directional communication for fd migration > > Dr. David Alan Gilbert (46): > QEMUSizedBuffer based QEMUFile > Tests: QEMUSizedBuffer/QEMUBuffer > Start documenting how postcopy works. > qemu_ram_foreach_block: pass up error value, and down the ramblock > name > improve DPRINTF macros, add to savevm > Add qemu_get_counted_string to read a string prefixed by a count byte > Create MigrationIncomingState > socket shutdown > Provide runtime Target page information > Return path: Open a return path on QEMUFile for sockets > Return path: socket_writev_buffer: Block even on non-blocking fd's > Migration commands > Return path: Control commands > Return path: Send responses from destination to source > Return path: Source handling of return path > qemu_loadvm errors and debug > ram_debug_dump_bitmap: Dump a migration bitmap as text > Rework loadvm path for subloops > Add migration-capability boolean for postcopy-ram. > Add wrappers and handlers for sending/receiving the postcopy-ram > migration messages. > QEMU_VM_CMD_PACKAGED: Send a packaged chunk of migration stream > migrate_init: Call from savevm > Allow savevm handlers to state whether they could go into postcopy > postcopy: OS support test > migrate_start_postcopy: Command to trigger transition to postcopy > MIG_STATE_POSTCOPY_ACTIVE: Add new migration state > qemu_savevm_state_complete: Postcopy changes > Postcopy page-map-incoming (PMI) structure > Postcopy: Maintain sentmap and calculate discard > postcopy: Incoming initialisation > postcopy: ram_enable_notify to switc
Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages
* Linus Torvalds (torva...@linux-foundation.org) wrote: > On Fri, Oct 3, 2014 at 10:08 AM, Andrea Arcangeli wrote: > > > > Overall this looks a fairly small change to the rmap code, notably > > less intrusive than the nonlinear vmas created by remap_file_pages. > > Considering that remap_file_pages() was an unmitigated disaster, and > -mm has a patch to remove it entirely, I'm not at all convinced this > is a good argument. > > We thought remap_file_pages() was a good idea, and it really really > really wasn't. Almost nobody used it, why would the anonymous page > case be any different? I've posted code that uses this interface to qemu-devel and it works nicely; so chalk up at least one user. For the postcopy case I'm using it for, we need to place a page, atomically some thread might try and access it, and must either 1) get caught by userfault etc or 2) must succeed in it's access and we'll have that happening somewhere between thousands and millions of times to pages in no particular order, so we need to avoid creating millions of mappings. Dave > > Linus -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Fullscreen Bug with GTK
* Gerd Hoffmann (kra...@redhat.com) wrote: > Hi, > > > There is a bug where the GTK (SDL appears to be broken due to some sdl2 > > incompatibility, as I understood) menu bar won't hide in fullscreen: > > SDL-1 support is still there. Are you recommending building against SDL-1 even when SDL-2 is available? (Does that lose you anything?) (I've cc'd Cole in since Fedora's virt-preview is building it with SDL-2). > > https://bugs.launchpad.net/qemu/+bug/1294898 > > > > A patch was provided by the initial reporter a long time ago. > > Patch trades one bug for another. Hiding the menu bar also kills the > accelerator keys, which is especially problematic for the fullscreen > hotkey as there is no way out of fullscreen mode then. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v2 0/1] -machine vmport=off: Allow disabling of VMWare ioport emulation
* Don Slutz (dsl...@verizon.com) wrote: > Changes v1 to v2 (Don Slutz): > make vmport a pc & q35 only machine opt I.E. a machine property. > > Dr. David Alan Gilbert (1): > -machine vmport=off: Allow disabling of VMWare ioport emulation Thanks for updating this! Dave > > hw/i386/pc.c | 19 +++ > hw/i386/pc_piix.c| 4 ++-- > hw/i386/pc_q35.c | 3 ++- > include/hw/i386/pc.h | 2 ++ > qemu-options.hx | 3 +++ > vl.c | 4 > 6 files changed, 32 insertions(+), 3 deletions(-) > > -- > 1.8.4 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 38/47] Add assertion to check migration_dirty_pages
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto: > > > > I've seen it go negative once during dev, it shouldn't > > happen. > > You can move it earlier, perhaps even as patch 1, since it does not have > any dependency on postcopy and can go in at any time. OK, I moved it to the 2nd patch - just after the docs (Eric previously said he liked those at the start of a patch set). Dave > > Paolo -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 32/47] postcopy: ram_enable_notify to switch on userfault
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto: > > +static int postcopy_ram_sensitise_area(const char *block_name, void > > *host_addr, > > + ram_addr_t offset, ram_addr_t > > length, > > + void *opaque) > > Weird name, and I'm not referring to the British -ise. :) > > Perhaps ram_block_enable_userfault or ram_block_enable_notify? It helps > clarity to limit the use of the "postcopy_ram_" prefix for static function. Yep, that's fair enough; I'll make it ram_block_enable_notify. Dave > Paolo -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 20/47] Add migration-capability boolean for postcopy-ram.
* Eric Blake (ebl...@redhat.com) wrote: > On 10/03/2014 11:47 AM, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" > > > > Signed-off-by: Dr. David Alan Gilbert > > Reviewed-by: Eric Blake > > --- > > include/migration/migration.h | 1 + > > migration.c | 9 + > > qapi-schema.json | 6 +- > > 3 files changed, 15 insertions(+), 1 deletion(-) > > > > > # > > +# @x-postcopy-ram: Start executing on the migration target before all of > > RAM has been > > +# migrated, pulling the remaining pages along as needed. NOTE: If > > the > > +# migration fails during postcopy the VM will fail. (since 2.2) > > +# > > # Since: 1.2 > > ## > > { 'enum': 'MigrationCapability', > > - 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks'] } > > + 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', > > 'x-postcopy-ram'] } > > Can we wrap this to keep things in 80 columns? Done. > Also, the question was > raised on the libvirt list on whether the interface is stable enough to > name this 'postcopy-ram' from the get-go (rather than marking the > interface experimental), so that libvirt can start using it sooner. I'm still nervous about that, what I intend to do is add one patch at the end of the series that removes the x- so that can get discussed separately. While I'm confident that the interface to libvirt is stable, removing the x- declares that the whole thing is stable and I then have to maintain migration compatibility; and it seemed sensible to let people try it for a release; however if libvirt have no way to support QEMUs ability to have experimental features, I guess no one is actually going to try it, which is very disappointing. Dave > -- > Eric Blake eblake redhat com+1-919-301-3266 > Libvirt virtualization library http://libvirt.org > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 00/47] Postcopy implementation
* Cristian Klein (cristian.kl...@cs.umu.se) wrote: > On 04 Oct 2014, at 4:21 , Dr. David Alan Gilbert wrote: > > > > > I've updated our github at: > > https://github.com/orbitfp7/qemu/tree/wp3-postcopy > > > > to have this version. > > > > and it corresponds to the tag: > > https://github.com/orbitfp7/qemu/releases/tag/wp3-postcopy-v4 > > Hi Dave, > > I just tested this version of post-copy using the libvirt patches I recently > posted and it works a lot better. The video streaming VM migrates with a > downtime of less than 1 second. Before post-copy finishes, the VM is a bit > slow but otherwise running well. > > I also tested the patches with a VM doing ?ping? and the downtime was around > 0.6 seconds. I suspect that this delay could be caused by libvirt and not by > qemu. Notice that, libvirt is a bit special, in the sense that the VM is > migrated in suspended state and resumed only after the network was set up on > the destination. I will investigate and let you know. That's great news - although I'm not quite sure what caused the improvement, there were quite a few minor bug fixes and things but nothing that I can think of that would directly contribute (except the patches I'd sent you which you'd already tried). Dave > > Cristian > > > * Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote: > >> From: "Dr. David Alan Gilbert" > >> > >> Hi, > >> This is the 4th cut of my version of postcopy; it is designed for use with > >> the Linux kernel additions just posted by Andrea Arcangeli here: > >> > >> http://marc.info/?l=linux-kernel&m=141235633015100&w=2 > >> > >> (Note: This is a new version compared to my previous postcopy patchset; > >> you'll > >> need to update the kernel to the new version.) > >> > >> Other than the new kernel ABI (which is only a small change to the > >> userspace side); > >> the major changes are; > >> > >> a) Code for host page size != target page size > >> b) Support for migration over fd > >> From Cristian Klein; this is for libvirt support which Cristian > >> recently > >> posted to the libvirt list. > >> c) It's now build bisectable and builds on 32bit > >> > >> Testing wise; I've now done many thousand of postcopy migrations without > >> failure (both of idle and busy guests); so it seems pretty solid. > >> > >> Must-TODO's: > >> 1) A partially repeatable migration_cancel failure > >> 2) virt_test's migrate.with_reboot test is failing > >> 3) The ACPI fix in 2.1 that allowed migrating RAMBlocks to be larger than > >>the source feels like it needs looking at for postcopy. > >> 4) Paolo's comments with respect to the wakeup_request/is_running code > >> in the migration thread > >> 5) xbzrle needs disabling once in postcopy > >> > >> Later-TODO's: > >> 1) Control the rate of background page transfers during postcopy to > >> reduce their impact on the latency of postcopy requests. > >> 2) Work with RDMA > >> 3) Could destination RP be made blocking (as per discussion with Paolo; > >> I'm still worried that that changes too many assumptions) > >> > >> > >> > >> V4: > >> Initial support for host page size != target page size > >>- tested heavily on hps==tps > >>- only partially tested on hps!=tps systems > >>- This involved quite a bit of rework around the discard code > >> Updated to new kernel userfault ABI > >>- It won't work with the previous version > >> Fix mis-optimisation of postcopy request for wrong RAMBlock > >> request for block A offset n > >> un-needed fault for block B/m (already received - no req sent) > >> request for block B/l - wrongly sent as request for A/l > >> Fix thinko in discard bitmap processing (missed last word of bitmap) > >> Symptom: remap failures near the top of RAM if postcopy started late > >> Fix bug that caused kernel page acknowledgments to be misaligned > >> May have meant the guest was paused for longer than required > >> Fix potential for crashing cleaning up failed RP > >> Fixes in docs (from Yang) > >> Handle migration by fd as sockets if they are sockets > >> Build tested on 32bit > >> Fully build bisectable (x86-64) > >> > >>
Re: [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto: > > > > +/* These are ORable flags */ > > ... make them an "enum". OK, will do - I'd generally tended to avoid using enum for things that were ORable where the combinations weren't themselves members of the enum; but I can do that. > > +const int LOADVM_EXITCODE_QUITLOOP = 1; > > +const int LOADVM_EXITCODE_QUITPARENT = 2; > > LOADVM_QUIT_ALL, LOADVM_QUIT respectively? > > +const int LOADVM_EXITCODE_KEEPHANDLERS = 4; > > + > > Is it more common to drop or keep handlers? I'ts more common to drop them. > In either case, please add a comment to the three constants that details > how to use them. In particular, please document why you should drop > (resp. keep) handlers... Does this make it clearer: /* ORable flags that control the (potentially nested) loadvm_state loops */ enum LoadVMExitCodes { /* Quit the loop level that received this command */ LOADVM_QUIT_LOOP = 1, /* Quit this loop and our parent */ LOADVM_QUIT_PARENT = 2, /* * Keep the LoadStateEntry handler list after the loop exits, * because they're being used in another thread. */ LOADVM_KEEP_HANDLERS = 4, }; > Is it by chance that they are only used in savevm.c? Should they be > moved to a header file? They're local. > > +if (exitcode & LOADVM_EXITCODE_QUITPARENT) { > > +DPRINTF("loadvm_handlers_state_main: End of loop with QUITPARENT"); > > +exitcode &= ~LOADVM_EXITCODE_QUITPARENT; > > +exitcode &= LOADVM_EXITCODE_QUITLOOP; > > Either you want |=, or the first &= is useless. Ooh nicely spotted; yes that should be |= - now I need to figure out why this didn't break things. The idea is we have: 1 outer loadvm_state loop 2 receives packaged command 3inner_loadvm_state loop 4 receives handle_listen 5 < QUITPARENT 6< QUITLOOP 7 < QUITLOOP 8 exits so QUITPARENT causes it's parent to exit, and to do that the inner loop transforms QUITPARENT into QUITLOOP as it's exit. Dave > > Paolo -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 08/47] socket shutdown
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto: > > +#ifndef WIN32 > > +if (rd) { > > +how = SHUT_RD; > > +} > > + > > +if (wr) { > > +how = rd ? SHUT_RDWR : SHUT_WR; > > +} > > + > > +#else > > +/* Untested */ > > +if (rd) { > > +how = SD_RECEIVE; > > +} > > + > > +if (wr) { > > +how = rd ? SD_BOTH : SD_SEND; > > +} > > + > > +#endif > > + > > > These are the same on Windows and non-Windows actually. Just #define > SHUT_* to 0/1/2 and avoid the wrapper. OK, something like this? (the qemu-file.c abstraction is still needed to cover QEMUFile's that aren't simple sockets, but I've removed the second layer in util/qemu-sockets.c). --- a/include/qemu/sockets.h +++ b/include/qemu/sockets.h @@ -44,6 +44,13 @@ int socket_set_fast_reuse(int fd); int send_all(int fd, const void *buf, int len1); int recv_all(int fd, void *buf, int len1, bool single_read); +#ifdef WIN32 +/* Windows has different names for the same constants with the same values */ +#define SHUT_RD 0 +#define SHUT_WR 1 +#define SHUT_RDWR 2 +#endif + /* callback function for nonblocking connect * valid fd on success, negative error code on failure */ --- a/qemu-file.c +++ b/qemu-file.c @@ -90,6 +90,13 @@ static int socket_close(void *opaque) return 0; } +static int socket_shutdown(void *opaque, bool rd, bool wr) +{ +QEMUFileSocket *s = opaque; + + return shutdown(s->fd, rd ? (wr ? SHUT_RDWR : SHUT_RD) : SHUT_WR); +} + static int stdio_get_fd(void *opaque) { QEMUFileStdio *s = opaque; -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 19/47] Rework loadvm path for subloops
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 07/10/2014 10:58, Dr. David Alan Gilbert ha scritto: > > > >>> > > +if (exitcode & LOADVM_EXITCODE_QUITPARENT) { > >>> > > +DPRINTF("loadvm_handlers_state_main: End of loop with > >>> > > QUITPARENT"); > >>> > > +exitcode &= ~LOADVM_EXITCODE_QUITPARENT; > >>> > > +exitcode &= LOADVM_EXITCODE_QUITLOOP; > >> > > >> > Either you want |=, or the first &= is useless. > > Ooh nicely spotted; yes that should be |= - now I need to figure out why > > this > > didn't break things. > > > > The idea is we have: > > 1 outer loadvm_state loop > > 2 receives packaged command > > 3inner_loadvm_state loop > > 4 receives handle_listen > > 5 < QUITPARENT > > 6< QUITLOOP > > 7 < QUITLOOP > > 8 exits > > > > so QUITPARENT causes it's parent to exit, and to do that > > the inner loop transforms QUITPARENT into QUITLOOP as it's > > exit. > > Yes, that was my understanding as well. > > We have only two nested loops, but if we had three, should it be > QUIT_PARENT or QUIT_ALL? The answer probably depends on why you've got 3 nested loops; either way is a bit of guesswork about what some potential future user wants to do. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto: > > +mis->postcopy_ram_state); > > +if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) { > > +/* > > + * Where a migration had postcopy enabled (and thus went to advise) > > + * but managed to complete within the precopy period > > + */ > > +postcopy_ram_incoming_cleanup(mis); > > +} else { > > +if ((ret >= 0) && > > +(mis->postcopy_ram_state > POSTCOPY_RAM_INCOMING_ADVISE)) { > > Instead of the >, it is perhaps nicer to use an outer if that checks for > state != NONE? Because in fact this check is for state != NONE, having > ADVISE been handled above. You mean something like this (untested) ? if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_NONE) { if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) { /* * Where a migration had postcopy enabled (and thus went to advise) * but managed to complete within the precopy period */ postcopy_ram_incoming_cleanup(mis); } else if (ret >= 0) { /* * Postcopy was started, cleanup should happen at the end of the * postcopy thread. */ DPRINTF("process_incoming_migration_co: exiting main branch"); return; } } Dave > Paolo > > > +/* > > + * Postcopy was started, cleanup should happen at the end of > > the > > + * postcopy thread. > > + */ > > +DPRINTF("process_incoming_migration_co: exiting main branch"); > > +return; > > +} > > +} > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 08/17] mm: madvise MADV_USERFAULT
* Kirill A. Shutemov (kir...@shutemov.name) wrote: > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote: > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the > > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if > > userland touches a still unmapped virtual address, a sigbus signal is > > sent instead of allocating a new page. The sigbus signal handler will > > then resolve the page fault in userland by calling the > > remap_anon_pages syscall. > > Hm. I wounder if this functionality really fits madvise(2) interface: as > far as I understand it, it provides a way to give a *hint* to kernel which > may or may not trigger an action from kernel side. I don't think an > application will behaive reasonably if kernel ignore the *advise* and will > not send SIGBUS, but allocate memory. Aren't DONTNEED and DONTDUMP similar cases of madvise operations that are expected to do what they say ? > I would suggest to consider to use some other interface for the > functionality: a new syscall or, perhaps, mprotect(). Dave > -- > Kirill A. Shutemov -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 08/17] mm: madvise MADV_USERFAULT
* Kirill A. Shutemov (kir...@shutemov.name) wrote: > On Tue, Oct 07, 2014 at 11:46:04AM +0100, Dr. David Alan Gilbert wrote: > > * Kirill A. Shutemov (kir...@shutemov.name) wrote: > > > On Fri, Oct 03, 2014 at 07:07:58PM +0200, Andrea Arcangeli wrote: > > > > MADV_USERFAULT is a new madvise flag that will set VM_USERFAULT in the > > > > vma flags. Whenever VM_USERFAULT is set in an anonymous vma, if > > > > userland touches a still unmapped virtual address, a sigbus signal is > > > > sent instead of allocating a new page. The sigbus signal handler will > > > > then resolve the page fault in userland by calling the > > > > remap_anon_pages syscall. > > > > > > Hm. I wounder if this functionality really fits madvise(2) interface: as > > > far as I understand it, it provides a way to give a *hint* to kernel which > > > may or may not trigger an action from kernel side. I don't think an > > > application will behaive reasonably if kernel ignore the *advise* and will > > > not send SIGBUS, but allocate memory. > > > > Aren't DONTNEED and DONTDUMP similar cases of madvise operations that are > > expected to do what they say ? > > No. If kernel would ignore MADV_DONTNEED or MADV_DONTDUMP it will not > affect correctness, just behaviour will be suboptimal: more than needed > memory used or wasted space in coredump. That's not how the manpage reads for DONTNEED; it calls it out as a special case near the top, and explicitly says what will happen if you read the area marked as DONTNEED. It looks like there are openssl patches that use DONTDUMP to explicitly make sure keys etc don't land in cores. Dave > > -- > Kirill A. Shutemov -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 37/47] Page request: Consume pages off the post-copy queue
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto: > > +/* > > + * Don't break host-page chunks up with queue items > > + * so only unqueue if, > > + * a) The last item came from the queue anyway > > + * b) The last sent item was the last target-page in a host page > > + */ > > +if (last_was_from_queue || (!last_sent_block) || > > Extra parentheses. Fixed. > Is the last_was_from_queue check necessary? Or > would one of the other checks be true anyway if last_was_from_queue is true? if (last_was_from_queue || !last_sent_block || ((last_offset & (hps - 1)) == (hps - TARGET_PAGE_SIZE))) { tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &bitoffset); } The last_was_from_queue is needed. We're going around this loop in Target-page chunks (that correspond to one bit in the migration bitmap) and want to make sure we don't break into the middle of an existing host-page with an entry off the queue. So (going backwards in that if): We can take something from the queue if: a) We just sent the last TP in a HP b) We didn't send anything yet (unlikely) c) The last thing came from the queue anyway (c) is needed to override (a) when we've just sent a TP from the queue but it's not the last TP in the HP that came from the queue; otherwise we'd send the first TP from the queue and resume taking pages from the background scan. > > +/* We're sending this page, and since it's postcopy nothing > > else > > + * will dirty it, and we must make sure it doesn't get sent > > again. > > + */ > > +if (!migration_bitmap_clear_dirty(bitoffset << > > TARGET_PAGE_BITS)) { > > +DPRINTF("%s: Not dirty for postcopy %s/%zx bito=%zx > > (sent=%d)", > > +__func__, tmpblock->idstr, tmpoffset, bitoffset, > > +test_bit(bitoffset, ms->sentmap)); > > If a DPRINTF occurs in a loop, please change it to a tracepoint. OK, I'll look at that - most of arch_init (and migration) still use DPRINTF. > This function looks like a candidate for cleaning its logic up and/or > splitting it. But it can be done later by the poor soul who will touch > it next. :) Yep, I've already done it once (see 14bcfdc7f - Split ram_save_block). Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 07/10/2014 19:07, Dr. David Alan Gilbert ha scritto: > >> > > >> > So I'd *much* rather have a "write()" style interface (ie _copying_ > >> > bytes from user space into a newly allocated page that gets mapped) > >> > than a "remap page" style interface > > Something like that might work for the postcopy case; it doesn't work > > for some of the other uses that need to stop a page being changed by the > > guest, but then need to somehow get a copy of that page internally to QEMU, > > and perhaps provide it back later. > > I cannot parse this. Which uses do you have in mind? Is it for > QEMU-specific or is it for other applications of userfaults? > As long as the page is atomically mapped, I'm not sure what the > difference from remap_anon_pages are (as far as the destination page is > concerned). Are you thinking of having userfaults enabled on the source > as well? What I'm talking about here is when I want to stop a page being accessed by the guest, do something with the data in qemu, and give it back to the guest sometime later. The main example is: Memory pools for guests where you swap RAM between a series of VM hosts. You have to take the page out, send it over the wire, sometime later if the guest tries to access it you userfault and pull it back. (There's at least one or two companies selling something like this, and at least one Linux based implementations with their own much more involved kernel hacks) Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for remap_anon_pages
* Linus Torvalds (torva...@linux-foundation.org) wrote: > On Mon, Oct 6, 2014 at 12:41 PM, Andrea Arcangeli wrote: > > > > Of course if somebody has better ideas on how to resolve an anonymous > > userfault they're welcome. > > So I'd *much* rather have a "write()" style interface (ie _copying_ > bytes from user space into a newly allocated page that gets mapped) > than a "remap page" style interface Something like that might work for the postcopy case; it doesn't work for some of the other uses that need to stop a page being changed by the guest, but then need to somehow get a copy of that page internally to QEMU, and perhaps provide it back later. remap_anon_pages worked for those cases as well; I can't think of another current way of doing it in userspace. I'm thinking here of systems for making VMs with memory larger than a single host; that's something that's not as well thought out. I've also seen people writing emulation that want to trap and emulate some page accesses while still having the original data available to the emulator itself. So yes, OK for now, but the result is less general. Dave > remapping anonymous pages involves page table games that really aren't > necessarily a good idea, and tlb invalidates for the old page etc. > Just don't do it. > >Linus -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > > typedef struct Visitor Visitor; > >@@ -80,6 +81,6 @@ typedef struct FWCfgState FWCfgState; > > typedef struct PcGuestInfo PcGuestInfo; > > typedef struct PostcopyPMI PostcopyPMI; > > typedef struct Range Range; > >-typedef struct AdapterInfo AdapterInfo; > >+typedef struct RAMBlock RAMBlock; > > > > :(, another redefinition, 'RAMBlock' also defined in > 'include/exec/cpu-all.h:314', > Am i miss something when compile qemu? Interesting; I'm not seeing that problem at all (gcc 4.8.3-7) What compiler and flags are you using? Dave > > > #endif /* QEMU_TYPEDEFS_H */ > >diff --git a/migration.c b/migration.c > >index cfdaa52..63d7699 100644 > >--- a/migration.c > >+++ b/migration.c > >@@ -26,6 +26,8 @@ > > #include "qemu/thread.h" > > #include "qmp-commands.h" > > #include "trace.h" > >+#include "exec/memory.h" > >+#include "exec/address-spaces.h" > > > > //#define DEBUG_MIGRATION > > > >@@ -504,6 +506,15 @@ static void migrate_fd_cleanup(void *opaque) > > > > migrate_fd_cleanup_src_rp(s); > > > >+/* This queue generally should be empty - but in the case of a failed > >+ * migration might have some droppings in. > >+ */ > >+struct MigrationSrcPageRequest *mspr, *next_mspr; > >+QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) > >{ > >+QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req); > >+g_free(mspr); > >+} > >+ > > if (s->file) { > > trace_migrate_fd_cleanup(); > > qemu_mutex_unlock_iothread(); > >@@ -610,6 +621,9 @@ MigrationState *migrate_init(const MigrationParams > >*params) > > s->state = MIG_STATE_SETUP; > > trace_migrate_set_state(MIG_STATE_SETUP); > > > >+qemu_mutex_init(&s->src_page_req_mutex); > >+QSIMPLEQ_INIT(&s->src_page_requests); > >+ > > s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); > > return s; > > } > >@@ -823,7 +837,25 @@ static void source_return_path_bad(MigrationState *s) > > static void migrate_handle_rp_reqpages(MigrationState *ms, const char* > > rbname, > > ram_addr_t start, ram_addr_t len) > > { > >-DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len); > >+DPRINTF("migrate_handle_rp_reqpages: in %s start %zx len %zx", > >+rbname, start, len); > >+ > >+/* Round everything up to our host page size */ > >+long our_host_ps = sysconf(_SC_PAGESIZE); > >+if (start & (our_host_ps-1)) { > >+long roundings = start & (our_host_ps-1); > >+start -= roundings; > >+len += roundings; > >+} > >+if (len & (our_host_ps-1)) { > >+long roundings = len & (our_host_ps-1); > >+len -= roundings; > >+len += our_host_ps; > >+} > >+ > >+if (ram_save_queue_pages(ms, rbname, start, len)) { > >+source_return_path_bad(ms); > >+} > > } > > > > /* > > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 36/47] Page request: Process incoming page request
* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > On 2014/10/8 15:49, Dr. David Alan Gilbert wrote: > >* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > > > >>> typedef struct Visitor Visitor; > >>>@@ -80,6 +81,6 @@ typedef struct FWCfgState FWCfgState; > >>> typedef struct PcGuestInfo PcGuestInfo; > >>> typedef struct PostcopyPMI PostcopyPMI; > >>> typedef struct Range Range; > >>>-typedef struct AdapterInfo AdapterInfo; > >>>+typedef struct RAMBlock RAMBlock; > >>> > >> > >>:(, another redefinition, 'RAMBlock' also defined in > >>'include/exec/cpu-all.h:314', > >>Am i miss something when compile qemu? > > > >Interesting; I'm not seeing that problem at all (gcc 4.8.3-7) > > > >What compiler and flags are you using? > > > >Dave > > > > Hi Dave, > > My compiler info: > gcc (SUSE Linux) 4.3.4 > > The configure info is: > #./configure --target-list=x86_64-softmmu --enable-debug --disable-gtk > ... > CFLAGS-pthread -I/usr/include/glib-2.0 > -I/usr/lib64/glib-2.0/include -g > QEMU_CFLAGS -fPIE -DPIE -m64 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 > -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef > -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common > -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs > -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers > -Wold-style-declaration -Wold-style-definition -Wtype-limits > -fstack-protector-all-I/usr/include/libpng12 -I/usr/include/pixman-1 > LDFLAGS -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g > ... > > Maybe its gcc's limitation, but why this redefinition need? After i remove > one, > it compiles successfully;) OK, thanks. I'll clean them up. Dave > > Thanks, > zhanghailiang > > > > >> > >>> #endif /* QEMU_TYPEDEFS_H */ > >>>diff --git a/migration.c b/migration.c > >>>index cfdaa52..63d7699 100644 > >>>--- a/migration.c > >>>+++ b/migration.c > >>>@@ -26,6 +26,8 @@ > >>> #include "qemu/thread.h" > >>> #include "qmp-commands.h" > >>> #include "trace.h" > >>>+#include "exec/memory.h" > >>>+#include "exec/address-spaces.h" > >>> > >>> //#define DEBUG_MIGRATION > >>> > >>>@@ -504,6 +506,15 @@ static void migrate_fd_cleanup(void *opaque) > >>> > >>> migrate_fd_cleanup_src_rp(s); > >>> > >>>+/* This queue generally should be empty - but in the case of a failed > >>>+ * migration might have some droppings in. > >>>+ */ > >>>+struct MigrationSrcPageRequest *mspr, *next_mspr; > >>>+QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, > >>>next_mspr) { > >>>+QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req); > >>>+g_free(mspr); > >>>+} > >>>+ > >>> if (s->file) { > >>> trace_migrate_fd_cleanup(); > >>> qemu_mutex_unlock_iothread(); > >>>@@ -610,6 +621,9 @@ MigrationState *migrate_init(const MigrationParams > >>>*params) > >>> s->state = MIG_STATE_SETUP; > >>> trace_migrate_set_state(MIG_STATE_SETUP); > >>> > >>>+qemu_mutex_init(&s->src_page_req_mutex); > >>>+QSIMPLEQ_INIT(&s->src_page_requests); > >>>+ > >>> s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); > >>> return s; > >>> } > >>>@@ -823,7 +837,25 @@ static void source_return_path_bad(MigrationState *s) > >>> static void migrate_handle_rp_reqpages(MigrationState *ms, const char* > >>> rbname, > >>> ram_addr_t start, ram_addr_t len) > >>> { > >>>-DPRINTF("migrate_handle_rp_reqpages: at %zx for len %zx", start, len); > >>>+DPRINTF("migrate_handle_rp_reqpages: in %s start %zx len %zx", > >>>+rbname, start, len); > >>>+ > >>>+/* Round everything up to our host page size */ > >>>+long our_host_ps = sysconf(_SC_PAGESIZE); > >>>+if (start & (our_host_ps-1)) { > >>>+long roundings = start & (our_host_ps-1); > >>>+start -= roundings; > >>>+len += roundings; > >>>+} > >>>+if (len & (our_host_ps-1)) { > >>>+long roundings = len & (our_host_ps-1); > >>>+len -= roundings; > >>>+len += our_host_ps; > >>>+} > >>>+ > >>>+if (ram_save_queue_pages(ms, rbname, start, len)) { > >>>+source_return_path_bad(ms); > >>>+} > >>> } > >>> > >>> /* > >>> > >> > >> > >-- > >Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > > > >. > > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v5 1/2] QEMUSizedBuffer based QEMUFile
* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > On 2014/9/29 17:41, Dr. David Alan Gilbert (git) wrote: > >+static ssize_t qsb_grow(QEMUSizedBuffer *qsb, size_t new_size) > >+{ > >+size_t needed_chunks, i; > >+ > >+if (qsb->size < new_size) { > >+struct iovec *new_iov; > >+size_t size_diff = new_size - qsb->size; > >+size_t chunk_size = (size_diff > QSB_MAX_CHUNK_SIZE) > >+ ? QSB_MAX_CHUNK_SIZE : QSB_CHUNK_SIZE; > >+ > >+needed_chunks = DIV_ROUND_UP(size_diff, chunk_size); > >+ > >+new_iov = g_try_malloc_n(qsb->n_iov + needed_chunks, > >+ sizeof(struct iovec)); > > It seems that *g_try_malloc_n* was supported since glib2-2.24 version, > But it don't check this when do *configure* before compile...;) OK, that's a shame - it was a nice easy function to use :-) I'll fix it. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v5 1/2] QEMUSizedBuffer based QEMUFile
;+return NULL; > >+} > >+ > >+for (i = 0; i < qsb->n_iov; i++) { > >+res = qsb_write_at(out, qsb->iov[i].iov_base, > >+pos, qsb->iov[i].iov_len); > >+if (res < 0) { > >+qsb_free(out); > >+return NULL; > >+} > >+pos += res; > >+} > >+ > >+return out; > >+} > >+ > >+typedef struct QEMUBuffer { > >+QEMUSizedBuffer *qsb; > >+QEMUFile *file; > >+} QEMUBuffer; > >+ > >+static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size) > >+{ > >+QEMUBuffer *s = opaque; > >+ssize_t len = qsb_get_length(s->qsb) - pos; > >+ > >+if (len <= 0) { > >+return 0; > >+} > >+ > >+if (len > size) { > >+len = size; > >+} > >+return qsb_get_buffer(s->qsb, pos, len, buf); > >+} > >+ > >+static int buf_put_buffer(void *opaque, const uint8_t *buf, > >+ int64_t pos, int size) > >+{ > >+QEMUBuffer *s = opaque; > >+ > >+return qsb_write_at(s->qsb, buf, pos, size); > >+} > >+ > >+static int buf_close(void *opaque) > >+{ > >+QEMUBuffer *s = opaque; > >+ > >+qsb_free(s->qsb); > >+ > >+g_free(s); > >+ > >+return 0; > >+} > >+ > >+const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f) > >+{ > >+QEMUBuffer *p; > >+ > >+qemu_fflush(f); > >+ > >+p = f->opaque; > >+ > >+return p->qsb; > >+} > >+ > >+static const QEMUFileOps buf_read_ops = { > >+.get_buffer = buf_get_buffer, > >+.close = buf_close, > >+}; > >+ > >+static const QEMUFileOps buf_write_ops = { > >+.put_buffer = buf_put_buffer, > >+.close = buf_close, > >+}; > >+ > >+QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input) > >+{ > >+QEMUBuffer *s; > >+ > >+if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') || > >+mode[1] != '\0') { > >+error_report("qemu_bufopen: Argument validity check failed"); > >+return NULL; > >+} > >+ > >+s = g_malloc0(sizeof(QEMUBuffer)); > >+if (mode[0] == 'r') { > >+s->qsb = input; > >+} > >+ > >+if (s->qsb == NULL) { > >+s->qsb = qsb_create(NULL, 0); > >+} > >+if (!s->qsb) { > >+g_free(s); > >+error_report("qemu_bufopen: qsb_create failed"); > >+return NULL; > >+} > >+ > >+ > >+if (mode[0] == 'r') { > >+s->file = qemu_fopen_ops(s, &buf_read_ops); > >+} else { > >+s->file = qemu_fopen_ops(s, &buf_write_ops); > >+} > >+return s->file; > >+} > > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCHv4] migration: catch unknown flags in ram_load
* Peter Lieven (p...@kamp.de) wrote: > On 10.06.2014 15:00, Eric Blake wrote: > >On 06/10/2014 06:55 AM, Eric Blake wrote: > >>On 06/10/2014 03:29 AM, Peter Lieven wrote: > >>>if a saved vm has unknown flags in the memory data qemu > >>>currently simply ignores this flag and continues which > >>>yields in an unpredictable result. > >>> > >>>This patch catches all unknown flags and aborts the > >>>loading of the vm. Additionally error reports are thrown > >>>if the migration aborts abnormally. > >>> > >>> } else if (flags & RAM_SAVE_FLAG_HOOK) { > >>> ram_control_load_hook(f, flags); > >>>+} else if (flags & RAM_SAVE_FLAG_EOS) { > >>Umm, is the migration format specifically documented as having at most > >>one flag per operation, or is it valid to send two flags at once? That > >>is, can I send RAM_SAVE_FLAG_XBZRLE | RAM_SAVE_FLAG_HOOK on a single > >>packet? Should we be flagging streams that send unexpected flag > >>combinations as invalid, even when each flag is in isolation okay, > >>rather than the current behavior of silently prioritizing one flag and > >>ignoring the other? > >For that matter, would it be better to change the if-tree into a switch, > >so that the default case catches unsupported combinations? > > > >switch (flags) { > > ... > > case RAM_SAVE_FLAG_HOOK: ... > > case RAM_SAVE_FLAG_EOS: ... > > default: report unsupported flags value > >} > > > The RAM_SAVE_FLAG_HOOK is the only real flag. It seems that the > flag value is used at least somewhere in the code of RDMA. There's also RAM_SAVE_FLAG_CONTINUE that's used as a tweak to make for smaller headers. Dave > For that matter, we could handle the hook separately and everything > else in the switch statement. This would immediately solve the issue > of the very restricted space for the flags as we could use everything > below RAM_SAVE_FLAG_HOOK as counter immediately. > > Looking at the code I further see that the hook function is made to return > an error code which is not checked at the moment. > > Peter -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 02/27] vmstate: Return error in case of error
* Juan Quintela (quint...@redhat.com) wrote: > If there is an error while loading a field, we should stop reading and > not continue with the rest of fields. And we should also set an error > in qemu_file. > > Signed-off-by: Juan Quintela > --- > vmstate.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/vmstate.c b/vmstate.c > index b5882fa..c996520 100644 > --- a/vmstate.c > +++ b/vmstate.c > @@ -98,7 +98,11 @@ int vmstate_load_state(QEMUFile *f, const > VMStateDescription *vmsd, > ret = field->info->get(f, addr, size); > > } > +if (ret >= 0) { > +ret = qemu_file_get_error(f); > +} > if (ret < 0) { > +qemu_file_set_error(f, ret); > trace_vmstate_load_field_error(field->name, ret); > return ret; > } Reviewed-by: Dr. David Alan Gilbert I wonder about turning that trace into an error_report; the process is doomed anyway at this point and it would be nice to know why. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 03/27] vmstate: Refactor & increase tests for primitive types
* Juan Quintela (quint...@redhat.com) wrote: > This commit refactor the simple tests to test all integer types. We > move to hex because it is easier to read values of different types. > > Signed-off-by: Juan Quintela Reviewed-by: Dr. David Alan Gilbert > --- > tests/test-vmstate.c | 273 > +++ > 1 file changed, 212 insertions(+), 61 deletions(-) > > diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c > index 8b242c4..a462335 100644 > --- a/tests/test-vmstate.c > +++ b/tests/test-vmstate.c > @@ -54,80 +54,232 @@ static QEMUFile *open_test_file(bool write) > return qemu_fdopen(fd, write ? "wb" : "rb"); > } > > -typedef struct TestSruct { > -uint32_t a, b, c, e; > -uint64_t d, f; > -bool skip_c_e; > -} TestStruct; > +#define SUCCESS(val) \ > +g_assert_cmpint((val), ==, 0) > > +#define FAILURE(val) \ > +g_assert_cmpint((val), !=, 0) > > -static const VMStateDescription vmstate_simple = { > -.name = "test", > -.version_id = 1, > -.minimum_version_id = 1, > -.fields = (VMStateField[]) { > -VMSTATE_UINT32(a, TestStruct), > -VMSTATE_UINT32(b, TestStruct), > -VMSTATE_UINT32(c, TestStruct), > -VMSTATE_UINT64(d, TestStruct), > -VMSTATE_END_OF_LIST() > -} > -}; > +static void save_vmstate(const VMStateDescription *desc, void *obj) > +{ > +QEMUFile *f = open_test_file(true); > + > +/* Save file with vmstate */ > +vmstate_save_state(f, desc, obj); > +qemu_put_byte(f, QEMU_VM_EOF); > +g_assert(!qemu_file_get_error(f)); > +qemu_fclose(f); > +} > > -static void test_simple_save(void) > +static void compare_vmstate(uint8_t *wire, size_t size) > { > -QEMUFile *fsave = open_test_file(true); > -TestStruct obj = { .a = 1, .b = 2, .c = 3, .d = 4 }; > -vmstate_save_state(fsave, &vmstate_simple, &obj); > -g_assert(!qemu_file_get_error(fsave)); > -qemu_fclose(fsave); > +QEMUFile *f = open_test_file(false); > +uint8_t result[size]; > > -QEMUFile *loading = open_test_file(false); > -uint8_t expected[] = { > -0, 0, 0, 1, /* a */ > -0, 0, 0, 2, /* b */ > -0, 0, 0, 3, /* c */ > -0, 0, 0, 0, 0, 0, 0, 4, /* d */ > -}; > -uint8_t result[sizeof(expected)]; > -g_assert_cmpint(qemu_get_buffer(loading, result, sizeof(result)), ==, > +/* read back as binary */ > + > +g_assert_cmpint(qemu_get_buffer(f, result, sizeof(result)), ==, > sizeof(result)); > -g_assert(!qemu_file_get_error(loading)); > -g_assert_cmpint(memcmp(result, expected, sizeof(result)), ==, 0); > +g_assert(!qemu_file_get_error(f)); > + > +/* Compare that what is on the file is the same that what we > + expected to be there */ > +SUCCESS(memcmp(result, wire, sizeof(result))); > > /* Must reach EOF */ > -qemu_get_byte(loading); > -g_assert_cmpint(qemu_file_get_error(loading), ==, -EIO); > +qemu_get_byte(f); > +g_assert_cmpint(qemu_file_get_error(f), ==, -EIO); > > -qemu_fclose(loading); > +qemu_fclose(f); > } > > -static void test_simple_load(void) > +static int load_vmstate_one(const VMStateDescription *desc, void *obj, > +int version, uint8_t *wire, size_t size) > { > -QEMUFile *fsave = open_test_file(true); > -uint8_t buf[] = { > -0, 0, 0, 10, /* a */ > -0, 0, 0, 20, /* b */ > -0, 0, 0, 30, /* c */ > -0, 0, 0, 0, 0, 0, 0, 40, /* d */ > -QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely > */ > -}; > -qemu_put_buffer(fsave, buf, sizeof(buf)); > -qemu_fclose(fsave); > +QEMUFile *f; > +int ret; > + > +f = open_test_file(true); > +qemu_put_buffer(f, wire, size); > +qemu_fclose(f); > + > +f = open_test_file(false); > +ret = vmstate_load_state(f, desc, obj, version); > +if (ret) { > +g_assert(qemu_file_get_error(f)); > +} else{ > +g_assert(!qemu_file_get_error(f)); > +} > +qemu_fclose(f); > +return ret; > +} > > -QEMUFile *loading = open_test_file(false); > -TestStruct obj; > -vmstate_load_state(loading, &vmstate_simple, &obj, 1); > -g_assert(!qemu_file_get_error(loading)); > -g_assert_cmpint(obj.a, ==, 10); > -g_assert_cmpint(obj.b, ==, 20); > -g_assert_cmpint(obj.c, ==, 30); > -g_assert_cmpint(obj.d, ==, 40); > -qemu_fclose(loading); > + > +static
Re: [Qemu-devel] [PATCH 01/27] migration: Remove unneeded minimum_version_id_old
* Juan Quintela (quint...@redhat.com) wrote: > Once there, make checkpatch happy. > > Signed-off-by: Juan Quintela Reviewed-by: Dr. David Alan Gilbert > --- > hw/usb/hcd-ohci.c | 7 ++- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c > index cd87074..cace945 100644 > --- a/hw/usb/hcd-ohci.c > +++ b/hw/usb/hcd-ohci.c > @@ -1988,8 +1988,7 @@ static const VMStateDescription vmstate_ohci_state_port > = { > .name = "ohci-core/port", > .version_id = 1, > .minimum_version_id = 1, > -.minimum_version_id_old = 1, > -.fields = (VMStateField []) { > +.fields = (VMStateField[]) { > VMSTATE_UINT32(ctrl, OHCIPort), > VMSTATE_END_OF_LIST() > }, > @@ -2015,9 +2014,8 @@ static const VMStateDescription vmstate_ohci_eof_timer > = { > .name = "ohci-core/eof-timer", > .version_id = 1, > .minimum_version_id = 1, > -.minimum_version_id_old = 1, > .pre_load = ohci_eof_timer_pre_load, > -.fields = (VMStateField []) { > +.fields = (VMStateField[]) { > VMSTATE_TIMER(eof_timer, OHCIState), > VMSTATE_END_OF_LIST() > }, > @@ -2078,7 +2076,6 @@ static const VMStateDescription vmstate_ohci = { > .name = "ohci", > .version_id = 1, > .minimum_version_id = 1, > -.minimum_version_id_old = 1, > .fields = (VMStateField[]) { > VMSTATE_PCI_DEVICE(parent_obj, OHCIPCIState), > VMSTATE_STRUCT(state, OHCIPCIState, 1, vmstate_ohci_state, > OHCIState), > -- > 1.9.3 > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 04/27] vmstate: Port versioned tests to new format
ersioned.skip_c_e = true; > +save_vmstate(&vmstate_simple_skipping, &obj_versioned); > + > +compare_vmstate(wire_simple_skip, sizeof(wire_simple_skip)); > + > +/* We abuse the fact that f has a 0x00 value in the right position */ > +SUCCESS(load_vmstate(&vmstate_simple_skipping, &obj, &obj_clone, > + obj_versioned_copy, 1, wire_simple_skip, > + sizeof(wire_simple_skip) - 8)); > + > +FIELD_EQUAL(skip_c_e); > +FIELD_EQUAL(a); > +FIELD_EQUAL(b); > +FIELD_NOT_EQUAL(c); > +FIELD_EQUAL(d); > +FIELD_NOT_EQUAL(e); > +FIELD_NOT_EQUAL(f); > + > +SUCCESS(load_vmstate(&vmstate_simple_skipping, &obj, &obj_clone, > + obj_versioned_copy, 2, wire_simple_skip, > + sizeof(wire_simple_skip))); > + > +FIELD_EQUAL(skip_c_e); > +FIELD_EQUAL(a); > +FIELD_EQUAL(b); > +FIELD_NOT_EQUAL(c); > +FIELD_EQUAL(d); > +FIELD_NOT_EQUAL(e); > +FIELD_EQUAL(f); > } Couldn't those functions just be merged and take a flag? > +#undef FIELD_EQUAL > +#undef FIELD_NOT_EQUAL > > int main(int argc, char **argv) > { > @@ -490,12 +499,10 @@ int main(int argc, char **argv) > > g_test_init(&argc, &argv, NULL); > g_test_add_func("/vmstate/simple/primitive", test_simple_primitive); > -g_test_add_func("/vmstate/versioned/load/v1", test_load_v1); > -g_test_add_func("/vmstate/versioned/load/v2", test_load_v2); > -g_test_add_func("/vmstate/field_exists/load/noskip", test_load_noskip); > -g_test_add_func("/vmstate/field_exists/load/skip", test_load_skip); > -g_test_add_func("/vmstate/field_exists/save/noskip", test_save_noskip); > -g_test_add_func("/vmstate/field_exists/save/skip", test_save_skip); > +g_test_add_func("/vmstate/simple/v1", test_simple_v1); > +g_test_add_func("/vmstate/simple/v2", test_simple_v2); > +g_test_add_func("/vmstate/simple/skip", test_simple_skip); > +g_test_add_func("/vmstate/simple/no_skip", test_simple_no_skip); > g_test_run(); > > close(temp_fd); > -- > 1.9.3 > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation
* Eric Blake (ebl...@redhat.com) wrote: > On 07/10/2014 05:29 AM, Dr. David Alan Gilbert wrote: > > * Paolo Bonzini (pbonz...@redhat.com) wrote: > >> Il 07/07/2014 16:02, Dr. David Alan Gilbert ha scritto: > >>>>> Could you have instead a "migrate_start_postcopy" command, and leave the > >>>>> policy to management instead? > >>> Hmm; yes that is probably possible - although with the > >>> migration_set_parameter > >>> configuration you get the best of both worlds: > >>> 1) You can set the parameter to say a few seconds and let QEMU handle it > >>> 2) You can set the parameter really large, but (I need to check) you > >>> could > >>> drop the parameter later and then cause it to kick in. > >>> > >>> I also did it this way because it was similar to the way the > >>> auto-throttling > >>> mechanism. > >> > >> Auto-throttling doesn't let you configure when it kicks in (it doesn't even > >> need support from the destination side). For postcopy you would still have > >> a capability, like auto-throttling, just not the argument. > >> > >> The reason why I prefer a manual step from management, is because postcopy > >> is a one-way street. Suppose a newer version of management software has > >> started migration with postcopy configured, and then an older version is > >> started. It is probably an invalid thing to do, but the confusion in the > >> older version could be fatal and it's nice if there's an easy way to > >> prevent > >> it. > > > > Actually the 'migrate_start_postcopy' idea is growing on me; Eric is that > > also your preferred way of doing it? > > > > If we did this I'd: > >1) Remove the migration_set_parameter code I added > >2) and the x-postcopy_ram_start_time parameter > >3) Add a new command migrate_start_postcopy that just sets a flag > > which is tested in the same place as I currently check the timeout. > > If it's issued after a migration has finished it doesn't fail because > > that would be racy. If issued before a migration starts that's OK > > as long as postcopy is enabled and means to start postcopy mode > > immediately. > > So to make sure I understand, the idea is that the management starts > migration as normal, then after enough time has elapsed, issues a > migrate_start_postcopy to tell qemu that it is okay to switch to > postcopy at the next convenient opportunity? Is there any need for an > event telling libvirt that enough pre-copy has occurred to make a > postcopy worthwhile? And of course, I _still_ want an event for when > normal precopy migration is ready (instead of the current solution of > libvirt having to poll to track progress). > > But in answer to your question, yes, it sounds like adding a new command > (actually, per QMP conventions it should probably be > migrate-start-postcopy with dashes instead of underscore) for management > to determine if/when to allow postcopy to kick in seems okay. That's implemented in the v2 I just posted. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v2 43/43] Start documenting how postcopy works.
* Eric Blake (ebl...@redhat.com) wrote: > On 08/11/2014 08:29 AM, Dr. David Alan Gilbert (git) wrote: > > From: "Dr. David Alan Gilbert" > > > > Signed-off-by: Dr. David Alan Gilbert > > --- > > docs/migration.txt | 150 > > + > > 1 file changed, 150 insertions(+) > > I personally like docs early in the series (see if they make sense in > isolation, then whether the implementation matched the docs; rather than > being tainted by implementation and overlooking design flaws). No problem, I can flip that around. > > diff --git a/docs/migration.txt b/docs/migration.txt > > index 0492a45..fec2d46 100644 > > --- a/docs/migration.txt > > +++ b/docs/migration.txt > > @@ -294,3 +294,153 @@ save/send this state when we are in the middle of a > > pio operation > > (that is what ide_drive_pio_state_needed() checks). If DRQ_STAT is > > not enabled, the values on that fields are garbage and don't need to > > be sent. > > + > > += Return path = > > + > > +In most migration scenarios there is only a single data path that runs > > +from the source VM to the destination, typically along a single fd > > (although > > +possibly with another fd or similar for some fast way of throwing pages > > across). > > + > > +However, some uses need two way comms; in particular the Postcopy > > destination > > s/comms/communication/ Thanks, all typos/etc applied. > > +needs to be able to request pages on demand from the source. > > + > > +For these scenarios there is a 'return path' from the destination to the > > source; > > +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the > > return > > +path. > > + > > + Source side > > + Forward path - written by migration thread > > + Return path - opened by main thread, read by fd_handler on main > > thread > > + > > + Destination side > > + Forward path - read by main thread > > + Return path - opened by main thread, written by main thread AND > > postcopy > > +thread (protected by rp_mutex) > > + > > +Opening the return path generally sets the fd to be non-blocking so that a > > +failed destination can't block the source; and since the non-blockingness > > seems > > +to follow both directions it does alter the semantics of the forward path. > > Once you've opened a return path, haven't you already committed to the > migration? That is, since we already document that postcopy loses the > VM if something goes wrong, what point is there in trying to protect > from a failed destination? Once the source knows that the destination > has opened the return path, the source should never run the VM again. The point at which the source can't run again is quite late; plenty of things can fail before that point and be recovered from; as long as the destination hasn't started its CPU (and hence started chatting on the network or changing shared storage) recovery is possible in this implementation. Opening the return path is done early on in migration if the postcopy capability is on, before the migrate_start_postcopy is issued. The entire migration might be completed in precopy with the return path opened but unused. (The return-path blocking-ness is up for discussion anyway with the changes that Paolo suggested in the 1st series that I've not done yet). > > + > > += Postcopy = > > +'Postcopy' migration is a way to deal with migrations that refuse to > > converge; > > +it's plus side is that there is an upper bound on the amount of migration > > traffic > > s/it's/its/ > > > +and time it takes, the down side is that during the postcopy phase, a > > failure of > > +*either* side or the network connection causes the guest to be lost. > > + > ... > > > +=== Postcopy states === > > +Postcopy moves through a series of states (see postcopy_ram_state) > > +from ADVISE->LISTEN->RUNNING->END > > + > > + Advise: Set at the start of migration if postcopy is enabled, even > > + if it hasn't passed the start-time threshold; here the > > destination > > + checks it's OS has the support needed for postcopy, and performs > > s/it's/that its/ > > > + setup to ensure the RAM mappings are suitable for later postcopy. > > + (Triggered by reception of POSTCOPY_RAM_ADVISE command) > > + > > +Normal precopy now carries on as normal, until the point that t
Re: [Qemu-devel] [RFC PATCH v3 1/6] QEMUSizedBuffer/QEMUFile
* Gonglei (Arei) (arei.gong...@huawei.com) wrote: > > +/** > > + * Grow the QEMUSizedBuffer to the given size and allocated > > + * memory for it. > > + * > > + * @qsb: A QEMUSizedBuffer > > + * @new_size: The new size of the buffer > > + * > > + * Returns an error code in case of memory allocation failure > > + * or the new size of the buffer otherwise. The returned size > > + * may be greater or equal to @new_size. > > + */ > > +static ssize_t qsb_grow(QEMUSizedBuffer *qsb, size_t new_size) > > +{ > > +size_t needed_chunks, i; > > +size_t chunk_size = QSB_CHUNK_SIZE; > > + > > +if (qsb->size < new_size) { > > +needed_chunks = DIV_ROUND_UP(new_size - qsb->size, > > + chunk_size); > > + > > +qsb->iov = g_realloc_n(qsb->iov, qsb->n_iov + needed_chunks, > > + sizeof(struct iovec)); > > +if (qsb->iov == NULL) { > > +return -ENOMEM; > > +} > > OK... Is it? https://developer.gnome.org/glib/stable/glib-Memory-Allocation.html says that g_realloc_n is 'similar to g_realloc()' except for overflow protection, g_realloc() doesn't say what it's behaviour on OOM is, but g_try_realloc says that 'Contrast with g_realloc(), which aborts the program on failure' So the only case iov will return NULL there is if the size is 0 which it can't be. So should that be a g_try_realloc_n ? > > + > > +for (i = qsb->n_iov; i < qsb->n_iov + needed_chunks; i++) { > > +qsb->iov[i].iov_base = g_malloc0(chunk_size); > > +qsb->iov[i].iov_len = chunk_size; > > +} > > + > > +qsb->n_iov += needed_chunks; > > +qsb->size += (needed_chunks * chunk_size); > > +} > > + > > +return qsb->size; > > +} > > + > > +/** > > + * Write into the QEMUSizedBuffer at a given position and a given > > + * number of bytes. This function will automatically grow the > > + * QEMUSizedBuffer. > > + * > > + * @qsb: A QEMUSizedBuffer > > + * @source: A byte array to copy data from > > + * @pos: The position withing the @qsb to write data to > > + * @size: The number of bytes to copy into the @qsb > > + * > > + * Returns an error code in case of memory allocation failure, > > + * @size otherwise. > > + */ > > +ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source, > > + off_t pos, size_t count) > > +{ > > +ssize_t rc = qsb_grow(qsb, pos + count); > > +size_t to_copy; > > +size_t all_copy = count; > > +const struct iovec *iov; > > +ssize_t index; > > +char *dest; > > +off_t d_off, s_off = 0; > > + > > +if (rc < 0) { > > +return rc; > > +} > > + > > OK... > > > +if (pos + count > qsb->used) { > > +qsb->used = pos + count; > > +} > > + > > +index = qsb_get_iovec(qsb, pos, &d_off); > > +if (index < 0) { > > +return 0; > > +} > > + > > +while (all_copy > 0) { > > +iov = &qsb->iov[index]; > > + > > +dest = iov->iov_base; > > + > > +to_copy = iov->iov_len - d_off; > > +if (to_copy > all_copy) { > > +to_copy = all_copy; > > +} > > + > > +memcpy(&dest[d_off], &source[s_off], to_copy); > > + > > +s_off += to_copy; > > +all_copy -= to_copy; > > + > > +d_off = 0; > > +index++; > > +} > > + > > +return count; > > +} > > + > > +/** > > + * Create an exact copy of the given QEMUSizedBuffer. > > + * > > + * @qsb : A QEMUSizedBuffer > > + * > > + * Returns a clone of @qsb > > + */ > > +QEMUSizedBuffer *qsb_clone(const QEMUSizedBuffer *qsb) > > +{ > > +QEMUSizedBuffer *out = qsb_create(NULL, qsb_get_length(qsb)); > > +size_t i; > > +off_t pos = 0; > > + > > +for (i = 0; i < qsb->n_iov; i++) { > > +pos += qsb_write_at(out, qsb->iov[i].iov_base, > > +pos, qsb->iov[i].iov_len); > > If qsb_write_at() return -ENOMEM, just simply add it to pos ? qsb_create uses g_new0 so it will abort on out of memory; what should qsb_clone do if qsb_write_at returns -ENOMEM? (Admittedly anything is better than getting the position wrong). I guess the choice is to allow it to return NULL, tidying up and offering the chance sometime in the future of tidying up the other allocators. Dave > > Best regards, > -Gonglei -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v2 19/43] postcopy: OS support test
* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > On 2014/8/11 22:29, Dr. David Alan Gilbert (git) wrote: > >From: "Dr. David Alan Gilbert" > > > >+testarea2 = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE | > >+ MAP_ANONYMOUS, -1, 0); > >+if (!testarea2) { > >+perror("postcopy_ram_hosttest: Failed to map second test area"); > > Before return, should we munmap testarea? > BTW, i think it is better to use goto statement, > which we can handle the error cases together! Yes, thank you for spotting this; I'll fix it in my next version. Dave > > Best regards, > zhanghailiang > >+return -1; > >+} > >+g_assert(((size_t)testarea2& (pagesize-1)) == 0); > >+*(char *)testarea = 0; /* Force the map of the new page */ > >+if (syscall(__NR_remap_anon_pages, testarea2, testarea, pagesize, 0) != > >+pagesize) { > >+perror("postcopy_ram_hosttest: remap_anon_pages not available"); > >+munmap(testarea, pagesize); > >+munmap(testarea2, pagesize); > >+return -1; > >+} > >+ > >+munmap(testarea, pagesize); > >+munmap(testarea2, pagesize); > >+return 0; > >+} > >+ > >+#else > >+/* No target OS support, stubs just fail */ > >+ > >+int postcopy_ram_hosttest(void) > >+{ > >+error_report("postcopy_ram_hosttest: No OS support"); > >+return -1; > >+} > >+ > >+#endif > >+ > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v2 00/43] Postcopy implementation
* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > On 2014/8/11 22:29, Dr. David Alan Gilbert (git) wrote: > >From: "Dr. David Alan Gilbert" > Hi Dave, > > I want to test your patches, but i failed to 'git am' them to the new > qemu-2.1 source. I want to know if you has a git-branch which i can > git clone directly? Thanks. I should have one setup in a few days, I'll let you know. (The code is based on HEAD as of yesterday, rather than 2.1) Dave > > Best regards, > zhanghailiang > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] ????: [PATCH v2 06/43] Return path: socket_writev_buffer:?Block even on non-blocking fd's
* chenliang (T) (chenlian...@huawei.com) wrote: > > From: "Dr. David Alan Gilbert" > > The return path uses a non-blocking fd so as not to block waiting for the > (possibly broken) destination to finish returning a message, however we still > want outbound data to behave in the same way and block. > > Signed-off-by: Dr. David Alan Gilbert > > Hi David > It is confusing why don't use blocking fd? I want to make sure that if the destination guest hung during sending a return path command, that the source wouldn't block the main thread (in which I'm currently processing the RP commands). Paolo made a suggestion to move the RP processing to a separate thread, which would also avoid the problem, and I'll be trying that. Dave > Best regards -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v5 1/6] generic function between migration and bitmap dump
* Sanidhya Kashyap (sanidhya.ii...@gmail.com) wrote: > I have modified the functions to be more generic i.e. I have used the > counter variable which stores the required value. If the value of counter > is 0, it means bitmap dump process whereas the value of 1 means migration. OK, but not quite what I meant: > Signed-off-by: Sanidhya Kashyap > --- > arch_init.c | 21 + > include/exec/ram_addr.h | 4 > 2 files changed, 17 insertions(+), 8 deletions(-) > > diff --git a/arch_init.c b/arch_init.c > index 8ddaf35..2aacb1c 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -434,20 +434,22 @@ ram_addr_t > migration_bitmap_find_and_reset_dirty(MemoryRegion *mr, > return (next - base) << TARGET_PAGE_BITS; > } > > -static inline bool migration_bitmap_set_dirty(ram_addr_t addr) > +static inline bool qemu_bitmap_set_dirty(ram_addr_t addr, unsigned long > *bitmap, > + uint64_t *counter) > { > bool ret; > int nr = addr >> TARGET_PAGE_BITS; > > -ret = test_and_set_bit(nr, migration_bitmap); > +ret = test_and_set_bit(nr, bitmap); > > -if (!ret) { > +if (!ret && *counter) { > migration_dirty_pages++; > } > return ret; > } If you made this: if (!ret && *counter) { *counter++; } > -static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length) > +void qemu_bitmap_sync_range(ram_addr_t start, ram_addr_t length, > +unsigned long *bitmap, uint64_t *counter) > { > ram_addr_t addr; > unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS); > @@ -461,8 +463,8 @@ static void migration_bitmap_sync_range(ram_addr_t start, > ram_addr_t length) > for (k = page; k < page + nr; k++) { > if (src[k]) { > unsigned long new_dirty; > -new_dirty = ~migration_bitmap[k]; > -migration_bitmap[k] |= src[k]; > +new_dirty = ~bitmap[k]; > +bitmap[k] |= src[k]; > new_dirty &= src[k]; > migration_dirty_pages += ctpopl(new_dirty); > src[k] = 0; > @@ -476,7 +478,7 @@ static void migration_bitmap_sync_range(ram_addr_t start, > ram_addr_t length) > cpu_physical_memory_reset_dirty(start + addr, > TARGET_PAGE_SIZE, > DIRTY_MEMORY_MIGRATION); > -migration_bitmap_set_dirty(start + addr); > +qemu_bitmap_set_dirty(start + addr, bitmap, counter); > } > } > } > @@ -497,6 +499,8 @@ static void migration_bitmap_sync(void) > int64_t bytes_xfer_now; > static uint64_t xbzrle_cache_miss_prev; > static uint64_t iterations_prev; > +/* counter = 1 indicates, this will be used for migration */ > +uint64_t counter = 1; > > bitmap_sync_count++; > > @@ -512,7 +516,8 @@ static void migration_bitmap_sync(void) > address_space_sync_dirty_bitmap(&address_space_memory); > > QTAILQ_FOREACH(block, &ram_list.blocks, next) { > -migration_bitmap_sync_range(block->mr->ram_addr, block->length); > +qemu_bitmap_sync_range(block->mr->ram_addr, block->length, > + migration_bitmap, &counter); Then this could become: qemu_bitmap_sync_range(block->mr->ram_addr, block->length, migration_bitmap, &migration_dirty_pages); so you don't have the dummy 'counter' varaible. Dave > } > trace_migration_bitmap_sync_end(migration_dirty_pages > - num_dirty_pages_init); > diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h > index 6593be1..fcc3501 100644 > --- a/include/exec/ram_addr.h > +++ b/include/exec/ram_addr.h > @@ -162,5 +162,9 @@ static inline void > cpu_physical_memory_clear_dirty_range(ram_addr_t start, > void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t length, > unsigned client); > > + > +void qemu_bitmap_sync_range(ram_addr_t start, ram_addr_t length, > +unsigned long *bitmap, uint64_t *counter); > + > #endif > #endif > -- > 1.9.3 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v5 2/6] BitmapLog: bitmap dump code
goto log_thread_end; > +} > +log_bitmap_dirty_bitmap_sync(temp_log_bitmap_array); > +} else { > +log_bitmap_dirty_bitmap_sync(b->log_bitmap_array); > +} > +log_bitmap_unlock(); > + > +if (current_ram_bitmap_pages != prev_ram_bitmap_pages) { > +prev_ram_bitmap_pages = current_ram_bitmap_pages; > +bitmap_size = BITS_TO_LONGS(current_ram_bitmap_pages) * > +sizeof(unsigned long); > +if (b->log_bitmap_array) { > +g_free(b->log_bitmap_array); > +} > +b->log_bitmap_array = temp_log_bitmap_array; > +temp_log_bitmap_array = NULL; > +if (log_bitmap_ram_block_info_dump(fd, current_ram_bitmap_pages, > + true)) { > +b->state = LOG_BITMAP_STATE_ERROR; > +goto log_thread_end; > +} > +} else { > +if (log_bitmap_ram_block_info_dump(fd, current_ram_bitmap_pages, > + false)) { > +b->state = LOG_BITMAP_STATE_ERROR; > +goto log_thread_end; > +} > +} > + > +ret = qemu_write_full(fd, b->log_bitmap_array, bitmap_size); > +if (ret < bitmap_size) { > +b->state = LOG_BITMAP_STATE_ERROR; > +goto log_thread_end; > +} > + > +ret = qemu_write_full(fd, &marker, sizeof(char)); > +if (ret < sizeof(char)) { > +b->state = LOG_BITMAP_STATE_ERROR; > +goto log_thread_end; > +} > +g_usleep(b->current_period * 1000); > +b->current_iteration++; > +bitmap_zero(b->log_bitmap_array, current_ram_bitmap_pages); > +} > + > +/* > + * stop the logging period. > + */ > + log_thread_end: > +log_bitmap_close(b); > +log_bitmap_update_status(b); > +qemu_process_set(QEMU_PROCESS_NONE); > +return NULL; > +} > + > +void qmp_log_dirty_bitmap(const char *filename, bool has_iterations, > + int64_t iterations, bool has_period, > + int64_t period, Error **errp) > +{ > +int fd = -1; > +BitmapLogState *b = log_bitmap_get_current_state(); > +Error *local_err = NULL; > + > +if (!runstate_is_running()) { > +error_setg(errp, "Guest is not in a running state"); > +return; > +} > + > +if (!qemu_process_check(QEMU_PROCESS_NONE) && > +!qemu_process_check(QEMU_PROCESS_BITMAP_DUMP)) { > +error_setg(errp, "Dirty bitmap dumping not possible, since %s " > + "is in progress.\n", get_qemu_process_as_string()); > +return; > +} > + > +qemu_process_set(QEMU_PROCESS_BITMAP_DUMP); > + > +if (b->state == LOG_BITMAP_STATE_ACTIVE || > +b->state == LOG_BITMAP_STATE_CANCELING) { > + error_setg(errp, "dirty bitmap dump in progress"); > +return; > +} > + > +b->state = LOG_BITMAP_STATE_NONE; > + > +/* > + * checking the iteration range > + */ > +if (!has_iterations) { > +b->iterations = MIN_ITERATION_VALUE; > +} else if (!value_in_range(iterations, MIN_ITERATION_VALUE, > + MAX_ITERATION_VALUE, "iterations", > &local_err)) { > +if (local_err) { > +error_propagate(errp, local_err); > +} > +return; > +} else { > +b->iterations = iterations; > +} > + > +/* > + * checking the period range > + */ > +if (!has_period) { > +b->current_period = MIN_PERIOD_VALUE; > +} else if (!value_in_range(period, MIN_PERIOD_VALUE, > + MAX_PERIOD_VALUE, "period", &local_err)) { > +if (local_err) { > +error_propagate(errp, local_err); > +} > +return; > +} else { > +b->current_period = period; > +} > + > +fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, > S_IRUSR); > +if (fd < 0) { > +error_setg_file_open(errp, errno, filename); > +return; > +} > + > +b->fd = fd; > +qemu_thread_create(&b->thread, "dirty-bitmap-dump", > + bitmap_logging_thread, b, > + QEMU_THREAD_JOINABLE); > + > +return; > +} > + > void qmp_xen_save_devices_state(const char *filename, Error **errp) > { > QEMUFile *f; > diff --git a/vl.c b/vl.c > index fe451aa..2fa97b3 100644 > --- a/vl.c > +++ b/vl.c > @@ -205,6 +205,8 @@ bool qemu_uuid_set; > static QEMUBootSetHandler *boot_set_handler; > static void *boot_set_opaque; > > +int dirty_bitmap_user; > + > static NotifierList exit_notifiers = > NOTIFIER_LIST_INITIALIZER(exit_notifiers); > > @@ -751,6 +753,27 @@ void vm_start(void) > qapi_event_send_resume(&error_abort); > } > > +/* > + * A global variable to decide which process will only > + * execute migration or bitmap dump > + */ > + > +static QemuProcess dbu = QEMU_PROCESS_NONE; > + > +bool qemu_process_check(QemuProcess user) > +{ > +return user == dbu; > +} > + > +void qemu_process_set(QemuProcess user) > +{ > +dbu = user; > +} > + > +const char *get_qemu_process_as_string(void) > +{ > +return QemuProcess_lookup[dbu]; > +} > > /***/ > /* real time host monotonic timer */ > @@ -4518,6 +4541,7 @@ int main(int argc, char **argv, char **envp) > } > > if (incoming) { > +qemu_process_set(QEMU_PROCESS_MIGRATION); > Error *local_err = NULL; > qemu_start_incoming_migration(incoming, &local_err); > if (local_err) { > -- > 1.9.3 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v5 3/6] BitmapLog: get the information about the parameters
_new = qmp_marshal_input_query_log_dirty_bitmap, > +}, > + > +SQMP > +query-log-dirty-bitmap > +-- > + > +Get the parameters information > + > +- "current-iteration": stores current iteration value > +- "iterations": total iterations value > +- "period": the time difference in milliseconds between each iteration > + > +Example: > + > +-> { "execute": "query-log-dirty-bitmap" } > +<- { "return": { > +"current-iteration": 3 > +"iterations": 10 > + "period": 100 } } > +EQMP > diff --git a/savevm.c b/savevm.c > index 125e5ed..22e84fe 100644 > --- a/savevm.c > +++ b/savevm.c > @@ -1515,6 +1515,23 @@ void qmp_log_dirty_bitmap(const char *filename, bool > has_iterations, > return; > } > > +BitmapLogStateInfo *qmp_query_log_dirty_bitmap(Error **errp) > +{ > +BitmapLogState *b = log_bitmap_get_current_state(); > +BitmapLogStateInfo *info = NULL; > + > +if (b->state != LOG_BITMAP_STATE_ACTIVE) { > +return info; > +} > + > +info = g_malloc0(sizeof(BitmapLogStateInfo)); > +info->current_iteration = b->current_iteration; > +info->iterations = b->iterations; > +info->period = b->current_period; > + > +return info; > +} > + > void qmp_xen_save_devices_state(const char *filename, Error **errp) > { > QEMUFile *f; > -- > 1.9.3 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v5 6/6] BitmapLog: python script for extracting bitmap from a binary file
for i in range(sqrtvalue+1): > +v1 = [] > +for j in range(sqrtvalue): > +v1.append(int(all_digits[i*sqrtvalue+j])) > +v.append(v1) > + > +im_array = array(v) > +figure(r) > +imshow(im_array, cmap=gray()) > +r += 1 > +show() > + > +def dump_bitmap(infile, draw): > +marker = 'M' > +count = 0 > +value = ' ' > +current_ram_bitmap_pages = 0 > +prev_ram_bitmap_pages = 0 > +while True: > +if len(value) == 0 or marker != 'M': > +print "issue with the dump" > +return > +bitmap_page_raw_value = infile.read(long_bytes) > +if not bitmap_page_raw_value: > +break > +current_ram_bitmap_pages = get_long_integer(bitmap_page_raw_value) > +if current_ram_bitmap_pages != prev_ram_bitmap_pages: > +prev_ram_bitmap_pages = current_ram_bitmap_pages > +dump_ram_block_info(infile) > + > +bitmap_length = get_bitmap_length(current_ram_bitmap_pages) > +bitmap_list = [] > +bitmap_raw_value = infile.read(long_bytes * bitmap_length) > +if not bitmap_raw_value: > +break > +count+=1 > +for i in range(bitmap_length): > +mark = i * long_bytes > + > bitmap_list.append((get_unsigned_long_integer(bitmap_raw_value[mark:mark+long_bytes]))) > +complete_bitmap_list.append(bitmap_list) > +value = infile.read(1) > +marker = get_char(value) > +if draw is True: > +generate_images() > +else: > +print complete_bitmap_list > + > +def main(): > + extracter = argparse.ArgumentParser(description='Extract dirty bitmap > from binary file.') > +extracter.add_argument('infile', help='Input file to extract the bitmap') > +extracter.add_argument('-d', action='store_true', dest='draw', > default=False, > +help='Draw a black and white image of the processed dirty > bitmap') > +args = extracter.parse_args() > +print 'The filename is {}'.format(args.infile) > + > +infile = open(format(args.infile), 'rb') > + > +dump_bitmap(infile, args.draw); > + > +infile.close() > + > +if __name__ == '__main__': > +main() > -- > 1.9.3 > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v5 6/6] BitmapLog: python script for extracting bitmap from a binary file
* Dr. David Alan Gilbert (dgilb...@redhat.com) wrote: > * Sanidhya Kashyap (sanidhya.ii...@gmail.com) wrote: > > The file not only extracts the bitmap from the file but also draws the > > figure > > if required. Currently, figure is drawn for all the bitmaps. Later, I'll > > make > > the change to draw for different blocks. > > > > The picture is drawn by generating a matrix of 0s and 1s from the bitmap. > > The > > dimensions are calculated on the basis of total bitmap pages which is > > represented > > as sqrt(total pages) X (sqrt(total pages) + 1). The white parts indicate > > non dirtied > > region while the black - dirtied region. > > > > The python code requires some libraries such as numpy, pylab and math to > > generate > > the images. > > This is interesting; I've tried this out with a Fedora boot and recorded > about 240 > frames; the script has problems when trying to convert that big a recording > to images; > 1) It loads it all into memory - so it used about 2GB of RAM > 2) It opens each image as a window - so I then had to close about 240 > windows > > I fiddled with it a bit and created .png files, and then stiched these > together > with imagemagick to create an animation. It's quite interesting watch the OS > boot > and then see me login and start google's stress apptest which suddenly changes > the whole of memory. > > I suggest making it dump each image to a file in a similar way and making it > free the > python image data after each frame, then there shouldn't be a size limit. > (I did try and upload the image, but QEMU's wiki didn't like the size) and here's the (scaled down for size) looping gif: http://wiki.qemu-project.org/images/9/9a/Fedboot50.gif Dave > > Dave > > > > > > Signed-off-by: Sanidhya Kashyap > > --- > > scripts/extract-bitmap.py | 144 > > ++ > > 1 file changed, 144 insertions(+) > > create mode 100755 scripts/extract-bitmap.py > > > > diff --git a/scripts/extract-bitmap.py b/scripts/extract-bitmap.py > > new file mode 100755 > > index 000..942deca > > --- /dev/null > > +++ b/scripts/extract-bitmap.py > > @@ -0,0 +1,144 @@ > > +#!/usr/bin/python > > +# This python script helps in extracting the dirty bitmap present > > +# in the file after executing the log-dirty-bitmap command either > > +# from the qmp or hmp interface. This file only processes binary > > +# file obtained via command. > > +# > > +# Copyright (C) 2014 Sanidhya Kashyap > > +# > > +# Authors: > > +# Sanidhya Kashyap > > +# > > +# > > +# This work is licensed under the terms of the GNU GPL, version 2 or later. > > + > > +import struct > > +import argparse > > +from functools import partial > > +from math import sqrt > > +from numpy import array > > +from pylab import figure,imshow,show,gray > > + > > +long_bytes = 8 > > +byte_size = 8 > > +int_bytes = 4 > > +complete_bitmap_list = [] > > +block_list = [] > > + > > +def get_unsigned_long_integer(value): > > + return struct.unpack(' > + > > +def get_long_integer(value): > > + return struct.unpack(' > + > > +def get_integer(value): > > + return struct.unpack(' > + > > +def get_char(value): > > + return struct.unpack(' > + > > +def get_string(value, length): > > + name = struct.unpack('<'+str(length)+'s', value)[0] > > + for i in range(len(name)): > > + if name[i] == '\x00': > > + return name[:i] > > + > > +def dec2bin(decimal): > > +bin_value = bin(decimal)[2:] > > +if len(bin_value) < long_bytes * byte_size: > > +add_zeroes = long_bytes * byte_size - len(bin_value) > > +for i in range(add_zeroes): > > +bin_value += "0" > > +return str(bin_value) > > + > > +def get_bitmap_length(ram_bitmap_pages): > > +bitmap_length = ram_bitmap_pages / (long_bytes * byte_size) > > +if ram_bitmap_pages % (long_bytes * byte_size) != 0: > > +bitmap_length += 1 > > +return bitmap_length > > + > > +def dump_ram_block_info(infile): > > +total_blocks = get_integer(infile.read(int_bytes)) > > +for i in range(total_blocks): > > +block_name_length = get_integer(infile.read(int_bytes)) > > +block_name = get_string(infile.read(block_name_leng
Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
cc'ing in a couple of the COLOers. * Michael R. Hines (mrhi...@linux.vnet.ibm.com) wrote: > On 08/13/2014 10:03 PM, Walid Nouri wrote: > > > >While looking to find some ideas for approaches to replicating block > >devices I have read the paper about the Remus implementation. I think MC > >can take a similar approach for local disk. > > > > I agree. > > >Here are the main facts that I have understood: > > > >Local disk contents is viewed as internal state the primary and secondary. > >In the explanation they describe that for keeping disc semantics of the > >primary and to allow the primary to run speculatively all disc state > >changes are directly written to the disk. In parrallel and asynchronously > >send to the secondary. The secondary keeps the pending writing requests in > >two disk buffers. A speculation-disk-buffer and a write-out-buffer. > > > >After the reception of the next checkpoint the secondary copies the > >speculation buffer to the write out buffer, commits the checkpoint and > >applies the write out buffer to its local disk. > > > >When the primary fails the secondary must wait until write-out-buffer has > >been completely written to disk before before changing the execution mode > >to run as primary. In this case (failure of primary) the secondary > >discards pending disk writes in its speculation buffer. This protocol > >keeps the disc state consistent with the last checkpoint. > > > >Remus uses the XEN specific blktap driver. As far as I know this can?t be > >used with QEMU (KVM). > > > >I must see how drive-mirror can be used for this kind of protocol. > > > > That's all correct. Theoretically, we would do exactly the same thing: > drive-mirror on the source would write immediately to disk but follow the > same commit semantics on the destination as Xen. > > > > >I have taken a look at COLO. > > > > >IMHO there are two points. Custom changes of the TCP-Stack are a no-go for > >proprietary operating systems like Windows. It makes COLO application > >agnostic but not operating system agnostic. The other point is that with > >I/O intensive workloads COLO will tend to behave like MC. This is my point > >of view but i didn?t invest much time to understand everything in detail. > > > > Actually, if I remember correctly, the TCP stack is only modified at the > hypervisor level - they are intercepting and translating TCP sequence > numbers "in-flight" to detect divergence of the source and destination - > which is not a big problem if the implementation is well-done. The 2013 paper says: 'COLO modifies the guest OSâs TCP/IP stack in order to make the behavior more deterministic. ' but does say that an alternative might be to have a ' comparison function that operates transparently over re-assembled TCP streams' > My hope in the future was that the two approaches could be used in a > "Hybrid" manner - actually MC has much more of a performance hit for I/O > than COLO does because of its buffering requirements. > > On the other hand, MC would perform better in a memory-intensive or > CPU-intensive situation - so maybe QEMU could "switch" between the two > mechanisms at different points in time when the resource bottleneck changes. If the primary were to rate-limit the number of resynchronisations (and send the secondary a message as soon as it knew a resync was needed) that would get some of the way, but then the only difference from microcheckpointing at that point is the secondary doing a wasteful copy and sending the packets across; it seems it should be easy to disable those if it knew that a resync was going to happen. Dave > - Michael > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] spapr: Fix stale HTAB during live migration
* Samuel Mendoza-Jonas (sam...@au1.ibm.com) wrote: > If a guest reboots during a running migration, changes to the > hash page table are not necessarily updated on the destination. > Opening a new file descriptor to the HTAB forces the migration > handler to resend the entire table. > > Signed-off-by: Samuel Mendoza-Jonas > --- > hw/ppc/spapr.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 3a6d26d..7733b37 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -997,6 +997,12 @@ static void spapr_reset_htab(sPAPREnvironment *spapr) > /* Kernel handles htab, we don't need to allocate one */ > spapr->htab_shift = shift; > kvmppc_kern_htab = true; > + > +/* Make sure readers are aware of the reset */ > +if (spapr->htab_fd > 0) { > +close(spapr->htab_fd); > +spapr->htab_fd = kvmppc_get_htab_fd(false); > +} If this function can be called during migration, then if we're unlucky can't that happen during a call to htab_save_iterate from the migration thread? If so what would happen if that htab_save_iterate call was made just between the close() and the reopen? Dave > } else { > if (!spapr->htab) { > /* Allocate an htab if we don't yet have one */ > -- > 1.9.3 > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] net: Forbid dealing with packets when VM is not running
* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > For all NICs(except virtio-net) emulated by qemu, > Such as e1000, rtl8139, pcnet and ne2k_pci, > Qemu can still receive packets when VM is not running. > If this happened in *migration's* last PAUSE VM stage, > The new dirty RAM related to the packets will be missed, > And this will lead serious network fault in VM. I'd like to understand more about when exactly this happens. migration.c:migration_thread in the last step, takes the iothread, puts the runstate into RUN_STATE_FINISH_MIGRATE and only then calls qemu_savevm_state_complete before unlocking the iothread. If that's true, how does this problem happen - can the net code still receive packets with the iothread lock taken? qemu_savevm_state_complete does a call to the RAM code, so I think this should get any RAM changes that happened before the lock was taken. I'm gently worried about threading stuff as well - is all this net code running off fd handlers on the main thread or are there separate threads? Dave > To avoid this, we forbid receiving packets in generic net code when > VM is not running. Also, when the runstate changes back to running, > we definitely need to flush queues to get packets flowing again. > > Here we implement this in the net layer: > (1) Judge the vm runstate in qemu_can_send_packet > (2) Add a member 'VMChangeStateEntry *vmstate' to struct NICState, > Which will listen for VM runstate changes. > (3) Register a handler function for VMstate change. > When vm changes back to running, we flush all queues in the callback function. > (4) Remove checking vm state in virtio_net_can_receive > > Signed-off-by: zhanghailiang > --- > hw/net/virtio-net.c | 4 > include/net/net.h | 2 ++ > net/net.c | 32 > 3 files changed, 34 insertions(+), 4 deletions(-) > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index 268eff9..287d762 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -839,10 +839,6 @@ static int virtio_net_can_receive(NetClientState *nc) > VirtIODevice *vdev = VIRTIO_DEVICE(n); > VirtIONetQueue *q = virtio_net_get_subqueue(nc); > > -if (!vdev->vm_running) { > -return 0; > -} > - > if (nc->queue_index >= n->curr_queues) { > return 0; > } > diff --git a/include/net/net.h b/include/net/net.h > index ed594f9..a294277 100644 > --- a/include/net/net.h > +++ b/include/net/net.h > @@ -8,6 +8,7 @@ > #include "net/queue.h" > #include "migration/vmstate.h" > #include "qapi-types.h" > +#include "sysemu/sysemu.h" > > #define MAX_QUEUE_NUM 1024 > > @@ -96,6 +97,7 @@ typedef struct NICState { > NICConf *conf; > void *opaque; > bool peer_deleted; > +VMChangeStateEntry *vmstate; > } NICState; > > NetClientState *qemu_find_netdev(const char *id); > diff --git a/net/net.c b/net/net.c > index 6d930ea..21f0d48 100644 > --- a/net/net.c > +++ b/net/net.c > @@ -242,6 +242,29 @@ NetClientState *qemu_new_net_client(NetClientInfo *info, > return nc; > } > > +static void nic_vmstate_change_handler(void *opaque, > + int running, > + RunState state) > +{ > +NICState *nic = opaque; > +NetClientState *nc; > +int i, queues; > + > +if (!running) { > +return; > +} > + > +queues = MAX(1, nic->conf->peers.queues); > +for (i = 0; i < queues; i++) { > +nc = &nic->ncs[i]; > +if (nc->receive_disabled > +|| (nc->info->can_receive && !nc->info->can_receive(nc))) { > +continue; > +} > +qemu_flush_queued_packets(nc); > +} > +} > + > NICState *qemu_new_nic(NetClientInfo *info, > NICConf *conf, > const char *model, > @@ -259,6 +282,8 @@ NICState *qemu_new_nic(NetClientInfo *info, > nic->ncs = (void *)nic + info->size; > nic->conf = conf; > nic->opaque = opaque; > +nic->vmstate = > qemu_add_vm_change_state_handler(nic_vmstate_change_handler, > +nic); > > for (i = 0; i < queues; i++) { > qemu_net_client_setup(&nic->ncs[i], info, peers[i], model, name, > @@ -379,6 +404,7 @@ void qemu_del_nic(NICState *nic) > qemu_free_net_client(nc); > } > > +qemu_del_vm_change_state_handler(nic->vmstate); > g_free(nic); > } > > @@ -452,6 +478,12 @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len) > > int qemu_can_send_packet(NetClientState *sender) > { > +int vmstat = runstate_is_running(); > + > +if (!vmstat) { > +return 0; > +} > + > if (!sender->peer) { > return 1; > } > -- > 1.7.12.4 > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH V3] spapr: Fix stale HTAB during live migration
* Samuel Mendoza-Jonas (sam...@au1.ibm.com) wrote: > If a guest reboots during a running migration, changes to the > hash page table are not necessarily updated on the destination. > Opening a new file descriptor to the HTAB forces the migration > handler to resend the entire table. Yes I think that's safe. > Signed-off-by: Samuel Mendoza-Jonas > --- > Changes in v3: Pointed out by David, htab_save_iterate could > potentially try to read before htab_fd is open again. > Leave opening the fd to the functions trying to read. > Changes in v2: Forgot check on kvmppc_get_htab_fd return value > hw/ppc/spapr.c | 25 + > include/hw/ppc/spapr.h | 1 + > 2 files changed, 26 insertions(+) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 3a6d26d..5b41318 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -997,6 +997,10 @@ static void spapr_reset_htab(sPAPREnvironment *spapr) > /* Kernel handles htab, we don't need to allocate one */ > spapr->htab_shift = shift; > kvmppc_kern_htab = true; > + > +/* Check if we are overlapping a migration */ > +if (spapr->htab_fd > 0) > +spapr->need_reset = true; > } else { > if (!spapr->htab) { > /* Allocate an htab if we don't yet have one */ > @@ -1156,6 +1160,7 @@ static int htab_save_setup(QEMUFile *f, void *opaque) > } else { > assert(kvm_enabled()); > > +spapr->need_reset = false; > spapr->htab_fd = kvmppc_get_htab_fd(false); > if (spapr->htab_fd < 0) { > fprintf(stderr, "Unable to open fd for reading hash table from > KVM: %s\n", > @@ -1309,6 +1314,16 @@ static int htab_save_iterate(QEMUFile *f, void *opaque) > if (!spapr->htab) { > assert(kvm_enabled()); > > +if (atomic_cmpxchg(&spapr->need_reset, true, false) == true) { > +close(spapr->htab_fd); > +spapr->htab_fd = kvmppc_get_htab_fd(false); > +if (spapr->htab_fd < 0) { > +fprintf(stderr, "Unable to open fd for reading hash table > from KVM: %s\n", > +strerror(errno)); Either perror or error_report() with the strerror would seem better. > +return -1; > +} > +} > + Why not make a little function for this; it seems a bad idea to have two copies of it. Also, add a comment saying why you're reopening it. Dave > rc = kvmppc_save_htab(f, spapr->htab_fd, >MAX_KVM_BUF_SIZE, MAX_ITERATION_NS); > if (rc < 0) { > @@ -1340,6 +1355,16 @@ static int htab_save_complete(QEMUFile *f, void > *opaque) > > assert(kvm_enabled()); > > +if (atomic_cmpxchg(&spapr->need_reset, true, false) == true) { > +close(spapr->htab_fd); > +spapr->htab_fd = kvmppc_get_htab_fd(false); > +if (spapr->htab_fd < 0) { > +fprintf(stderr, "Unable to open fd for reading hash table > from KVM: %s\n", > +strerror(errno)); > +return -1; > +} > +} > + > rc = kvmppc_save_htab(f, spapr->htab_fd, MAX_KVM_BUF_SIZE, -1); > if (rc < 0) { > return rc; > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > index 0c2e3c5..9ab9827 100644 > --- a/include/hw/ppc/spapr.h > +++ b/include/hw/ppc/spapr.h > @@ -71,6 +71,7 @@ typedef struct sPAPREnvironment { > int htab_save_index; > bool htab_first_pass; > int htab_fd; > +bool need_reset; > > /* state for Dynamic Reconfiguration Connectors */ > sPAPRDrcEntry drc_table[SPAPR_DRC_TABLE_SIZE]; > -- > 1.9.3 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] net: Forbid dealing with packets when VM is not running
* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > On 2014/8/18 20:27, Dr. David Alan Gilbert wrote: > >* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > >>For all NICs(except virtio-net) emulated by qemu, > >>Such as e1000, rtl8139, pcnet and ne2k_pci, > >>Qemu can still receive packets when VM is not running. > >>If this happened in *migration's* last PAUSE VM stage, > >>The new dirty RAM related to the packets will be missed, > >>And this will lead serious network fault in VM. > > > >I'd like to understand more about when exactly this happens. > >migration.c:migration_thread in the last step, takes the iothread, puts > >the runstate into RUN_STATE_FINISH_MIGRATE and only then calls > >qemu_savevm_state_complete > >before unlocking the iothread. > > > > Hmm, Sorry, the description above may not be exact. > > Actually, the action of receiving packets action happens after the > migration thread unlock the iothread-lock(when VM is stopped), but > before the end of the migration(to be exact, before close the socket > fd, which is used to send data to destination). > > I think before close the socket fd, we can not assure all the RAM of the > VM has been copied to the memory of network card or has been send to > the destination, we still have the chance to modify its content. (If i > am wrong, please let me know, Thanks. ;) ) > > If the above situation happens, it will dirty parts of the RAM which > will be send and parts of new dirty RAM page may be missed. As a result, > The contents of memory in the destination is not integral, And this > will lead network fault in VM. Interesting; the only reason I can think that would happen, is because arch_init.c:ram_save_page uses qemu_put_buffer_async to send most of the RAM pages and that won't make sure the write happens until later. This fix will probably help other migration code as well; postcopy runs the migration for a lot longer after stopping the CPUs, and I am seeing something that might be due to this very occasionally. Dave > Here i have made a test: > (1) Download the new qemu. > (2) Extend the time window between qemu_savevm_state_complete and > migrate_fd_cleanup(where qemu_fclose(s->file) will be called), patch > like(this will extend the chances of receiving packets between the time > window): > diff --git a/migration.c b/migration.c > index 8d675b3..597abf9 100644 > --- a/migration.c > +++ b/migration.c > @@ -614,7 +614,7 @@ static void *migration_thread(void *opaque) > qemu_savevm_state_complete(s->file); > } > qemu_mutex_unlock_iothread(); > - > +sleep(2);/*extend the time window between stop vm and end > migration*/ > if (ret < 0) { > migrate_set_state(s, MIG_STATE_ACTIVE, > MIG_STATE_ERROR); > break; > (3) Start VM (suse11sp1) and in VM ping xxx -i 0.1 > (4) Migrate the VM to another host. > > And the *PING* command in VM will very likely fail with message: > 'Destination HOST Unreachable', the NIC in VM will stay unavailable > unless you run 'service network restart'.(without step 2, you may have > to migration lots of times between two hosts before that happens). > > Thanks, > zhanghailiang > > >If that's true, how does this problem happen - can the net code > >still receive packets with the iothread lock taken? > >qemu_savevm_state_complete does a call to the RAM code, so I think this > >should get > >any RAM changes that happened before the lock was taken. > > > >I'm gently worried about threading stuff as well - is all this net code > >running off fd handlers on the main thread or are there separate threads? > > > >Dave > > > >>To avoid this, we forbid receiving packets in generic net code when > >>VM is not running. Also, when the runstate changes back to running, > >>we definitely need to flush queues to get packets flowing again. > >> > >>Here we implement this in the net layer: > >>(1) Judge the vm runstate in qemu_can_send_packet > >>(2) Add a member 'VMChangeStateEntry *vmstate' to struct NICState, > >>Which will listen for VM runstate changes. > >>(3) Register a handler function for VMstate change. > >>When vm changes back to running, we flush all queues in the callback > >>function. > >>(4) Remove checking vm state in virtio_net_can_receive > >> > >>Signed-off-by: zhanghailiang > >>--- > >> hw/net/virtio-net.c | 4 > >> include/net/net.h
[Qemu-devel] [Bug 1358619] Re: keep savevm/loadvm and migration cause snapshot crash
Hi, Can I just check, when you do the incoming migrate, do you wait for the incoming migrate to finish before you do the loadvm, or do you do the loadvm during the incoming migrate? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1358619 Title: keep savevm/loadvm and migration cause snapshot crash Status in QEMU: New Bug description: --Version: qemu 2.1.0 public release --OS: CentOS release 6.4 --gcc: 4.4.7 Hi: I found problems when doing some tests on qemu migration and savevm/loadvm. On my experiment, a quest is migrated between two same host back and forth. Source host would savevm after migration completed and incoming host loadvm before migration coming. image=./image/centos-1.qcow2 Migration Part qemu-system-x86_64 \ -qmp tcp:$host_ip:4451,server,nowait \ -drive file=$image \ --enable-kvm \ -monitor stdio \ -m 8192 \ -device virtio-net-pci,netdev=net0,mac=$mac \ -netdev tap,id=net0,script=./qemu-ifup Incoming Part qemu-system-x86_64 \ -qmp tcp:$host_ip:4451,server,nowait \ -incoming tcp:0:4449 \ --enable-kvm \ -monitor stdio \ -drive file=$image \ -m 8192 \ -device virtio-net-pci,netdev=net0,mac=$mac \ -netdev tap,id=net0,script=./qemu-ifup Command lines: host1 $: qemu-system-x86_64 //migration part host2 $: qemu-system-x86_64 ...incoming... //incoming part After finishing boot host1 (qemu) : migrate tcp:host2:4449 host1 (qemu) : savevm s1 host1 (qemu) : quit host1 $: qemu-system-x86_64 ...incoming... //incoming part host1 (qemu) : loadvm s1 host2 (qemu) : migrate tcp:host1:4449 host2 (qemu) : savevm s2 host2 (qemu) : quit host2 $: qemu-system-x86_64 ...incoming... //incoming part host2 (qemu) : loadvm s2 host1 (qemu) : migrate tcp:host2:4449 host1 (qemu) : savevm s3 host1 (qemu) : quit host1 $: qemu-system-x86_64 ...incoming... //incoming part host1 (qemu) : loadvm s3 I wish those operation can be success every time. However problem happened irregularly when loadvm and error messages are not the same. host1 (qemu) : loadvm s3 qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt. Error -5 while activating snapshot 's3' on 'ide0-hd0' or host2 (qemu) : loadvm s2 Error -22 while loading VM state I have done some sample test on savevm/loadvm On same host (qemu) savevm test1 (qemu) loadvm test1 (qemu) savevm test2 (qemu) loadvm test2 (qemu) savevm test3 (qemu) loadvm test3 (qemu) savevm test4 (qemu) loadvm test4 (qemu) savevm test5 (qemu) loadvm test5 (qemu) savevm test6 (qemu) loadvm test6 This is OK. No any problem. Any idea? I think it is related to migration. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1358619/+subscriptions
Re: [Qemu-devel] [RFC 0/4] Adding -cdrom, -hd[abcd] and -drive file=... to Q35
* John Snow (js...@redhat.com) wrote: > The changes appear to work well, but where I'd like some feedback > is what should happen if people do something like: > > qemu -M q35 -drive if=ide,file=fedora.qcow2 > > The code as presented here is not going to look for or attempt to > connect IDE devices, because it is now looking for /AHCI/ devices. What happens if you try it - does it just silently ignore that -drive? The ideal would be for something to moan about unused drives so people realise why it's broken; but I seem to remember form a previous discussion it's hard to do on all platforms. Dave > > At worst, this may break a few existing scripts, but I actually think > that since the if=ide,file=... shorthand never worked to begin with, > the impact might actually be minimal. > > But since the legacy IDE interface of the ICH9 is as of yet unemulated, > the if=ide drives don't have a reasonable place to go yet. I am also > not sure what a reasonable way to handle people specifying BOTH > if=ide and if=ahci drives would be. > > John Snow (4): > blockdev: add if_get_max_devs > blockdev: add IF_AHCI to support -cdrom and -hd[a-d] > ide: update ide_drive_get to work with both PCI-IDE and AHCI > interfaces > ahci: implement -cdrom and -hd[a-d] > > blockdev.c| 18 +++--- > hw/i386/pc_piix.c | 2 +- > hw/i386/pc_q35.c | 4 > hw/ide/ahci.c | 17 + > hw/ide/ahci.h | 2 ++ > hw/ide/core.c | 11 +++ > include/hw/ide.h | 3 ++- > include/sysemu/blockdev.h | 3 ++- > 8 files changed, 50 insertions(+), 10 deletions(-) > > -- > 1.9.3 > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 31/46] Postcopy: Rework migration thread for postcopy mode
* Paolo Bonzini (pbonz...@redhat.com) wrote: Hi Paolo, Apologies, I realised I hadn't dug into this comment. > Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto: > >From: "Dr. David Alan Gilbert" > > > >Switch to postcopy if: > > 1) There's still a significant amount to transfer > > 2) Postcopy is enabled > > 3) It's taken longer than the time set by the parameter. > > > >and change the cleanup at the end of migration to match. > > > >Signed-off-by: Dr. David Alan Gilbert > >--- > > migration.c | 92 > > - > > 1 file changed, 73 insertions(+), 19 deletions(-) > > > >diff --git a/migration.c b/migration.c > >index 0d567ef..c73fcfa 100644 > >--- a/migration.c > >+++ b/migration.c > >@@ -982,16 +982,40 @@ static int postcopy_start(MigrationState *ms) > > static void *migration_thread(void *opaque) > > { > > MigrationState *s = opaque; > >+/* Used by the bandwidth calcs, updated later */ > > int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); > >+/* Really, the time we started */ > >+const int64_t initial_time_fixed = initial_time; > > int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST); > > int64_t initial_bytes = 0; > > int64_t max_size = 0; > > int64_t start_time = initial_time; > >+int64_t pc_start_time; > >+ > > bool old_vm_running = false; > >+pc_start_time = > >s->tunables[MIGRATION_PARAMETER_NAME_X_POSTCOPY_START_TIME]; > >+ > >+/* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */ > >+enum MigrationPhase current_active_type = MIG_STATE_ACTIVE; > > > > qemu_savevm_state_begin(s->file, &s->params); > > > >+if (migrate_postcopy_ram()) { > >+/* Now tell the dest that it should open it's end so it can reply */ > >+qemu_savevm_send_openrp(s->file); > >+ > >+/* And ask it to send an ack that will make stuff easier to debug */ > >+qemu_savevm_send_reqack(s->file, 1); > >+ > >+/* Tell the destination that we *might* want to do postcopy later; > >+ * if the other end can't do postcopy it should fail now, nice and > >+ * early. > >+ */ > >+qemu_savevm_send_postcopy_ram_advise(s->file); > >+} > >+ > > s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; > >+current_active_type = MIG_STATE_ACTIVE; > > migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE); > > > > DPRINTF("setup complete\n"); > >@@ -1012,37 +1036,66 @@ static void *migration_thread(void *opaque) > > " nonpost=%" PRIu64 ")\n", > > pending_size, max_size, pend_post, pend_nonpost); > > if (pending_size && pending_size >= max_size) { > >-qemu_savevm_state_iterate(s->file); > >+/* Still a significant amount to transfer */ > >+ > >+current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); > >+if (migrate_postcopy_ram() && > >+s->state != MIG_STATE_POSTCOPY_ACTIVE && > >+pend_nonpost == 0 && > >+(current_time >= initial_time_fixed + pc_start_time)) { > >+ > >+if (!postcopy_start(s)) { > >+current_active_type = MIG_STATE_POSTCOPY_ACTIVE; > >+} > >+ > >+continue; > >+} else { > > You don't really need the "else" if you have a continue. However, do you > need _any_ of the "else" and "continue"? Would the next iteration of the > "while" loop do anything else but invoking qemu_savevm_state_iterate. Yes, I've dropped that 'else'; however, I've kept the continue - we're about 3 if's deep here inside the loop and there's a bunch of stuff at the end of the if's but still inside the loop that I'm not 100% sure I want to run again at this point (although it's probably OK). > >+/* Just another iteration step */ > >+qemu_savevm_state_iterate(s->file); > >+} > > } else { > > int ret; > > > >-DPRINTF("done iterating\n"); > >-
Re: [Qemu-devel] [PATCH v2] qxl-render: add more sanity checks
* Gerd Hoffmann (kra...@redhat.com) wrote: > Damn, the dirty rectangle values are signed integers. So the checks > added by commit 788fbf042fc6d5aaeab56757e6dad622ac5f0c21 are not good > enouth, we also have to make sure they are not negative. > > [ Note: There must be something broken in spice-server so we get > negative values in the first place. Bug opened: > https://bugzilla.redhat.com/show_bug.cgi?id=1135372 ] > > Signed-off-by: Gerd Hoffmann > --- > hw/display/qxl-render.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/hw/display/qxl-render.c b/hw/display/qxl-render.c > index cc2c2b1..bcc5c37 100644 > --- a/hw/display/qxl-render.c > +++ b/hw/display/qxl-render.c > @@ -138,7 +138,9 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice > *qxl) > if (qemu_spice_rect_is_empty(qxl->dirty+i)) { > break; > } > -if (qxl->dirty[i].left > qxl->dirty[i].right || > +if (qxl->dirty[i].left < 0 || > +qxl->dirty[i].top < 0 || > +qxl->dirty[i].left > qxl->dirty[i].right || > qxl->dirty[i].top > qxl->dirty[i].bottom || > qxl->dirty[i].right > qxl->guest_primary.surface.width || > qxl->dirty[i].bottom > qxl->guest_primary.surface.height) { Reviewed-by: Dr. David Alan Gilbert Should this go for stable as well? (I was worried for a sec about what happens if right=width or bottom=height; but looking at the code below it I think it's a dirty from left..(right-1) so we're OK?) Dave > -- > 1.8.3.1 > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
* Stefan Hajnoczi (stefa...@redhat.com) wrote: > How to synchronize with an IOThread > --- > AioContext is not thread-safe so some rules must be followed when using file > descriptors, event notifiers, timers, or BHs across threads: > > 1. AioContext functions can be called safely from file descriptor, event > notifier, timer, or BH callbacks invoked by the AioContext. No locking is > necessary. > > 2. Other threads wishing to access the AioContext must use > aio_context_acquire()/aio_context_release() for mutual exclusion. Once the > context is acquired no other thread can access it or run event loop iterations > in this AioContext. > > aio_context_acquire()/aio_context_release() calls may be nested. This > means you can call them if you're not sure whether #1 applies. > > Side note: the best way to schedule a function call across threads is to > create > a BH in the target AioContext beforehand and then call qemu_bh_schedule(). No > acquire/release or locking is needed for the qemu_bh_schedule() call. But be > sure to acquire the AioContext for aio_bh_new() if necessary. How do these IOThreads pause during migration? Are they paused by the 'qemu_mutex_lock_iothread' that the migration thread calls? Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 7/7] hw/net/stellaris_enet: Convert to vmstate
* Peter Maydell (peter.mayd...@linaro.org) wrote: > Convert this device to use vmstate for its save/load, including > providing a post_load function that sanitizes inbound data to > avoid possible buffer overflows if it is malicious. > > The sanitizing fixes CVE-2013-4532 (though nobody should be > relying on the security properties of most of the unmaintained > ARM board models anyway, and migration doesn't actually > work on this board due to issues in other device models). > > Signed-off-by: Peter Maydell Reviewed-by: Dr. David Alan Gilbert > --- > hw/net/stellaris_enet.c | 148 > ++-- > 1 file changed, 80 insertions(+), 68 deletions(-) > > diff --git a/hw/net/stellaris_enet.c b/hw/net/stellaris_enet.c > index 9e8f143..c9ee5d3 100644 > --- a/hw/net/stellaris_enet.c > +++ b/hw/net/stellaris_enet.c > @@ -47,6 +47,11 @@ do { fprintf(stderr, "stellaris_enet: error: " fmt , ## > __VA_ARGS__);} while (0) > OBJECT_CHECK(stellaris_enet_state, (obj), TYPE_STELLARIS_ENET) > > typedef struct { > +uint8_t data[2048]; > +uint32_t len; > +} StellarisEnetRxFrame; > + > +typedef struct { > SysBusDevice parent_obj; > > uint32_t ris; > @@ -59,22 +64,89 @@ typedef struct { > uint32_t mtxd; > uint32_t mrxd; > uint32_t np; > -int tx_fifo_len; > +uint32_t tx_fifo_len; > uint8_t tx_fifo[2048]; > /* Real hardware has a 2k fifo, which works out to be at most 31 packets. > We implement a full 31 packet fifo. */ > -struct { > -uint8_t data[2048]; > -int len; > -} rx[31]; > -int rx_fifo_offset; > -int next_packet; > +StellarisEnetRxFrame rx[31]; > +uint32_t rx_fifo_offset; > +uint32_t next_packet; > NICState *nic; > NICConf conf; > qemu_irq irq; > MemoryRegion mmio; > } stellaris_enet_state; > > +static const VMStateDescription vmstate_rx_frame = { > +.name = "stellaris_enet/rx_frame", > +.version_id = 1, > +.minimum_version_id = 1, > +.fields = (VMStateField[]) { > +VMSTATE_UINT8_ARRAY(data, StellarisEnetRxFrame, 2048), > +VMSTATE_UINT32(len, StellarisEnetRxFrame), > +VMSTATE_END_OF_LIST() > +} > +}; > + > +static int stellaris_enet_post_load(void *opaque, int version_id) > +{ > +stellaris_enet_state *s = opaque; > +int i; > + > +/* Sanitize inbound state. Note that next_packet is an index but > + * np is a size; hence their valid upper bounds differ. > + */ > +if (s->next_packet >= ARRAY_SIZE(s->rx)) { > +return -1; > +} > + > +if (s->np > ARRAY_SIZE(s->rx)) { > +return -1; > +} > + > +for (i = 0; i < ARRAY_SIZE(s->rx); i++) { > +if (s->rx[i].len > ARRAY_SIZE(s->rx[i].data)) { > +return -1; > +} > +} > + > +if (s->rx_fifo_offset > ARRAY_SIZE(s->rx[0].data) - 4) { > +return -1; > +} > + > +if (s->tx_fifo_len > ARRAY_SIZE(s->tx_fifo)) { > +return -1; > +} > + > +return 0; > +} > + > +static const VMStateDescription vmstate_stellaris_enet = { > +.name = "stellaris_enet", > +.version_id = 2, > +.minimum_version_id = 2, > +.post_load = stellaris_enet_post_load, > +.fields = (VMStateField[]) { > +VMSTATE_UINT32(ris, stellaris_enet_state), > +VMSTATE_UINT32(im, stellaris_enet_state), > +VMSTATE_UINT32(rctl, stellaris_enet_state), > +VMSTATE_UINT32(tctl, stellaris_enet_state), > +VMSTATE_UINT32(thr, stellaris_enet_state), > +VMSTATE_UINT32(mctl, stellaris_enet_state), > +VMSTATE_UINT32(mdv, stellaris_enet_state), > +VMSTATE_UINT32(mtxd, stellaris_enet_state), > +VMSTATE_UINT32(mrxd, stellaris_enet_state), > +VMSTATE_UINT32(np, stellaris_enet_state), > +VMSTATE_UINT32(tx_fifo_len, stellaris_enet_state), > +VMSTATE_UINT8_ARRAY(tx_fifo, stellaris_enet_state, 2048), > +VMSTATE_STRUCT_ARRAY(rx, stellaris_enet_state, 31, 1, > + vmstate_rx_frame, StellarisEnetRxFrame), > +VMSTATE_UINT32(rx_fifo_offset, stellaris_enet_state), > +VMSTATE_UINT32(next_packet, stellaris_enet_state), > +VMSTATE_END_OF_LIST() > +} > +}; > + > static void stellaris_enet_update(stellaris_enet_state *s) > { > qemu_set_irq(s->irq, (s->ris & s->im) != 0); > @@ -379,63 +451,6 @@ static void stellaris_enet_reset(stellaris_enet_state *s) > s->tx_f
Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
* Stefan Hajnoczi (stefa...@gmail.com) wrote: > On Thu, May 8, 2014 at 3:44 PM, Dr. David Alan Gilbert > wrote: > > * Stefan Hajnoczi (stefa...@redhat.com) wrote: > > > > > > > >> How to synchronize with an IOThread > >> --- > >> AioContext is not thread-safe so some rules must be followed when using > >> file > >> descriptors, event notifiers, timers, or BHs across threads: > >> > >> 1. AioContext functions can be called safely from file descriptor, event > >> notifier, timer, or BH callbacks invoked by the AioContext. No locking is > >> necessary. > >> > >> 2. Other threads wishing to access the AioContext must use > >> aio_context_acquire()/aio_context_release() for mutual exclusion. Once the > >> context is acquired no other thread can access it or run event loop > >> iterations > >> in this AioContext. > >> > >> aio_context_acquire()/aio_context_release() calls may be nested. This > >> means you can call them if you're not sure whether #1 applies. > >> > >> Side note: the best way to schedule a function call across threads is to > >> create > >> a BH in the target AioContext beforehand and then call qemu_bh_schedule(). > >> No > >> acquire/release or locking is needed for the qemu_bh_schedule() call. But > >> be > >> sure to acquire the AioContext for aio_bh_new() if necessary. > > > > How do these IOThreads pause during migration? > > Are they paused by the 'qemu_mutex_lock_iothread' that the migration thread > > calls? > > Currently the only IOThread user is virtio-blk data-plane. It has a > VM state change listener registered that will stop using the IOThread > during migration. > > In the future we'll have to do more than that: > It is possible to suspend all IOThreads simply by looping over > IOThread objects and calling aio_context_acquire() on their > AioContext. You can release the AioContexts when you are done. This > would be suitable for a "stop the world" operation for migration > hand-over. That worries me for two reasons: 1) I'm assuming there is some subtlety so that it doesn't deadlock when another thread is trying to get a couple of contexts. 2) The migration code that has to pause everything is reasonably time critical (OK not super critical - but it worries if it gains more than a few ms). Doing something to each thread in series where that thread might have to finish up a transaction sounds like it could add together to be quite large. > For smaller one-off operations like block-migration.c it may also make > sense to acquire/release the AioContext. But that's not necessary > today since dataplane is disabled during migration. I guess it's probably right to hide this behind some interface on the Aio stuff that migration can call and it can worry about speed, and locking order etc. I also would we end up wanting some IOThreads to continue - e.g. could we be using them for transport of the migration stream or are they strictly for the guests use? Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Question about RAM migration flags
* Peter Lieven (p...@kamp.de) wrote: > Hi, > > while working on ram migration and reading through the code I realized that > qemu does not stop loading a saved VM or rejecting an incoming migration > if there is a flag in the stream that it does not understand. An unknown flag > is simply ignored. > > In the block migration code there is a catch at the end complaining about > unknown flags, but in RAM migration there isn't. > > Is this on purpose or an error? I think it's in error; the code doesn't have much checking. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] cirrus_vga: adding sanity check for vram size
* arei.gong...@huawei.com (arei.gong...@huawei.com) wrote: > From: Gonglei > > when configure a invalid vram size for cirrus card, such as less > 2 MB, which will crash qemu. Follow the real hardware, the cirrus > card has 4 MB video memory. Also for backward compatibility, accept > 8 MB and 16 MB vram size. virt-manager/libvirt seems to default to 9 MByte of Vram for cirrus, so this would break a lot of setups. Looking at datasheets on the web seems to say the chips actually went down to 1 MB or less. I think before doing this change, it would be good to understand where the weird 9MB in libvirt/virt-manager came from, and what the limits of the emulator/drivers are. Also, is there something broken at the moment - why make the change? Dave > Signed-off-by: Gonglei > --- > For isa cirrus vga device, its' init function has been droped at > commit db895a1e6a97e919f9b86d60c969377357b05066. I have no idea how adding > check on isa_cirrus_vga device. Any ideas? Thanks. > > hw/display/cirrus_vga.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/hw/display/cirrus_vga.c b/hw/display/cirrus_vga.c > index d1afc76..5fec068 100644 > --- a/hw/display/cirrus_vga.c > +++ b/hw/display/cirrus_vga.c > @@ -2959,6 +2959,13 @@ static int pci_cirrus_vga_initfn(PCIDevice *dev) > PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev); > int16_t device_id = pc->device_id; > > + /* follow real hardware, cirrus card emulated has 4 MB video memory. > + Also accept 8 MB/16 MB for backward compatibility. */ > + if (s->vga.vram_size_mb != 4 || s->vga.vram_size_mb != 8 || > + s->vga.vram_size_mb != 16) { > + error_report("Invalid cirrus_vga ram size '%u'\n", > s->vga.vram_size_mb); > + return -1; > + } > /* setup VGA */ > vga_common_init(&s->vga, OBJECT(dev), true); > cirrus_init_common(s, OBJECT(dev), device_id, 1, pci_address_space(dev), > -- > 1.7.12.4 > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] cirrus_vga: adding sanity check for vram size
* Gerd Hoffmann (kra...@redhat.com) wrote: > Hi, > > > virt-manager/libvirt seems to default to 9 MByte of Vram for cirrus, > > so this would break a lot of setups. > > It wouldn't. libvirt sticks that into the xml, but it doesn't set any > qemu parameters. The libvirt parameter actually predates the qemu > property for setting the size. Yeuch messy. > > Looking at datasheets on the web seems to say the chips actually went > > down to 1 MB or less. > > I have my doubts we emulate that correctly (register telling the guest > how much memory is actually there etc.). Also it is pretty much useless > these days, even the 4MB imply serious constrains when FullHD displays > are commonplace. Newer cirrus drivers such as the kernel's drm driver > are specifically written to qemu's cirrus cards, I have my doubs that > they are prepared to handle 1MB cirrus cards correctly. > > Bottom line: Allowing less than 4MB is asking for trouble for no good > reason ;) OK, so checking for 4MB/8MB/16MB is probably safe, and it also would have the benefit of shouting if someone fixed libvirt and it started trying to configure a 9MB version. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Migration from older Qemu to Qemu 2.0.0 does not work
* Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote: > Hello list, > > i was trying to migrate older Qemu (1.5 and 1.7.2) to a machine running > Qemu 2.0. > > I started the target machine with: > > -machine type=pc-i440fx-1.5 / -machine type=pc-i440fx-1.7 I'd expect you to have to run with the same machine type on both sides. I ran some simple virt-test migrate tests (just the basic one) and got v1.5.3->v1.6.2 v1.5.3->v1.7.1 v1.5.3->v2.0.0-rc1 working for most machine types, pc-i440fx-1.5 passed with all of those. Note that's only the simplest test. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Migration from older Qemu to Qemu 2.0.0 does not work
* Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote: > > > Am 09.05.2014 um 15:41 schrieb "Dr. David Alan Gilbert" > > : > > > > * Stefan Priebe - Profihost AG (s.pri...@profihost.ag) wrote: > >> Hello list, > >> > >> i was trying to migrate older Qemu (1.5 and 1.7.2) to a machine running > >> Qemu 2.0. > >> > >> I started the target machine with: > >> > >> -machine type=pc-i440fx-1.5 / -machine type=pc-i440fx-1.7 > > > > I'd expect you to have to run with the same machine type on both sides. > > I ran some simple virt-test migrate tests (just the basic one) and got > > v1.5.3->v1.6.2 > > v1.5.3->v1.7.1 > > v1.5.3->v2.0.0-rc1 > > working for most machine types, pc-i440fx-1.5 passed with all of those. > > Note that's only the simplest test. > > My test involved USB Controller virtio Network cards and rbd virtio-scsi > drives. That probably makes things more interesting :-) I'd start with the simplest config you can find and add each one of those until it breaks. > Can you give me your simple start line so I could test if this works for me > while adding some more arguments. I've got a slightly hacked up libvirt, but I'm doing ./run -t qemu --machine-type=$MACTYPE -g Linux.JeOS.19.x86_64.i440fx --qemu-bin=$SRC --qemu-dst-bin=$DST --qemu_sandbox=off --tests migrate.default.tcp --disk-bus=ide and looking at the logs we have: /opt/qemu/v1.5.3/bin/qemu-system-x86_64 \ -S \ -name 'virt-tests-vm1' \ -sandbox off \ -M pc-i440fx-1.5 \ -nodefaults \ -vga std \ -chardev socket,id=hmp_id_hmp1,path=/tmp/monitor-hmp1-20140415-104517-rAfzfDef,server,nowait \ -mon chardev=hmp_id_hmp1,mode=readline \ -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140415-104517-rAfzfDef,server,nowait \ -device isa-serial,chardev=serial_id_serial0 \ -chardev socket,id=seabioslog_id_20140415-104517-rAfzfDef,path=/tmp/seabios-20140415-104517-rAfzfDef,server,nowait \ -device isa-debugcon,chardev=seabioslog_id_20140415-104517-rAfzfDef,iobase=0x402 \ -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \ -drive id=drive_image1,if=none,file=/home/dgilbert/virt-test/shared/data/images/jeos-19-64.qcow2 \ -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \ -device virtio-net-pci,mac=9a:18:19:1a:1b:1c,id=idz0uWFP,vectors=4,netdev=idMwmdMc,bus=pci.0,addr=05 \ -netdev user,id=idMwmdMc,hostfwd=tcp::5000-:22 \ -m 1024 \ -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \ -cpu 'SandyBridge' \ -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ -vnc :1 \ -rtc base=utc,clock=host,driftfix=none \ -boot order=cdn,once=c,menu=off \ -enable-kvm Dave > Thanks! > > Stefan > > > > > Dave > > -- > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Qemu live migration code
* Bechir Bani (bechir.b...@gmail.com) wrote: > Hi , > > There is someone who can explain to me the role of two trace points in the > file* Migration.c * > > The trace points are : > > * migrate_pending* : what are the attributes *pending size *and *max* ? 'pending_size' is an estimate of the amount of data left to be transferred in the iterative part. 'max_size' is an estimate of the amount of data that can be transferred in the 'max_downtime' period Thus if pending_size < max_size it can stop iterating and transfer the last part. > * migrate_transferred*: what are the attributes *transferred* ,* > time_spent* , *bandwidth* and *max_size* ? Every so often it recalculates bandwidths etc: transferred_bytes - actual bytes transferred during migration since the last time it did that set of calculations time_spent - time since it last did that set of calculations bandwidth - transferred_bytes / time_spent max_size = as in the previous trace point. Dave > > > Thank you ! > > > 2014-03-25 6:03 GMT-04:00 Sanidhya Kashyap : > > > > > > > > > On Tue, Mar 25, 2014 at 2:10 AM, Bechir Bani wrote: > > > >> Hi Sanidhya, > >> > >> > >> Which function in savevm.c can tell me about the stop time ? > >> > >> > > the migration thread function in migration.c file which has function name > > - qemu_savevm_state_complete will get executed in the stop and copy phase. > > > > > >> > >> 2014-03-24 13:46 GMT-04:00 Sanidhya Kashyap : > >> > >> savevm.c will tell you about the stop time. > >>> > >>> arch_init.c (ram_save_block) will tell about the number of pages > >>> transferred. > >>> > >>> > >>> On Mon, Mar 24, 2014 at 10:51 PM, Bechir Bani > >>> wrote: > >>> > >>>> I have a task to add trace points in the source code of Qemu. The goal > >>>> is to know the number of pages transferred at each iteration and stop > >>>> time > >>>> of the machine as well. > >>>> > >>>> > >>>> 2014-03-24 12:50 GMT-04:00 Dr. David Alan Gilbert > >>>> : > >>>> > >>>> * Bechir Bani (bechir.b...@gmail.com) wrote: > >>>>> > Hi, > >>>>> > > >>>>> > I want to know the source code of qemu which is responsible for the > >>>>> > migration of virtual machines, more precisely where the part of the > >>>>> code > >>>>> > that describes the stages of memory transfer. is that you can help > >>>>> me? > >>>>> > >>>>> It's split around a few files; memory is mostly in arch_init.c; > >>>>> It's something like: > >>>>> > >>>>>migration.c Overall management > >>>>> savevm.c > >>>>> qemu-file.cFile buffering/bytes on the wire > >>>>> vmstate.c Structured saving of individual devices > >>>>> arch_init.cRAM special code, and a few other things > >>>>> > >>>>> What are you trying to do/change? > >>>>> > >>>>> Dave > >>>>> > >>>>> -- > >>>>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > >>>>> > >>>> > >>>> > >>>> > >>>> -- > >>>> *Béchir Bani * > >>>> > >>>> *Ecole Polytechnique de Montréal * > >>>> > >>>> *Laboratoire DORSAL* > >>>> * > >>>> *Montréal - Canada* > >>>> > >>> > >>> > >> > >> > >> -- > >> *Béchir Bani * > >> > >> *Ecole Polytechnique de Montréal * > >> > >> *Laboratoire DORSAL* > >> * > >> *Montréal - Canada* > >> > > > > > > > -- > *Béchir Bani * > > *Ecole Polytechnique de Montréal * > > *Laboratoire DORSAL* > * > *Montréal - Canada* -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] virtio: allow mapping up to max queue size
* Michael S. Tsirkin (m...@redhat.com) wrote: > It's a loop from i < num_sg and the array is VIRTQUEUE_MAX_SIZE - so > it's OK if the value read is VIRTQUEUE_MAX_SIZE. > > Not a big problem in practice as people don't use > such big queues, but it's inelegant. > > Reported-by: "Dr. David Alan Gilbert" > Cc: qemu-sta...@nongnu.org > Signed-off-by: Michael S. Tsirkin Reviewed-by: Dr. David Alan Gilbert > --- > hw/virtio/virtio.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > index 7f4e7ec..3557c17 100644 > --- a/hw/virtio/virtio.c > +++ b/hw/virtio/virtio.c > @@ -430,7 +430,7 @@ void virtqueue_map_sg(struct iovec *sg, hwaddr *addr, > unsigned int i; > hwaddr len; > > -if (num_sg >= VIRTQUEUE_MAX_SIZE) { > +if (num_sg > VIRTQUEUE_MAX_SIZE) { > error_report("virtio: map attempt out of bounds: %zd > %d", > num_sg, VIRTQUEUE_MAX_SIZE); > exit(1); > -- > MST > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] Fix 'name' option to work with -readconfig
* Markus Armbruster (arm...@redhat.com) wrote: > "Dr. David Alan Gilbert (git)" writes: > > > From: "Dr. David Alan Gilbert" > > > > The 'name' option silently failed when used in config files > > ( http://lists.gnu.org/archive/html/qemu-devel/2014-04/msg00378.html ) > > > > Signed-off-by: Dr. David Alan Gilbert > > Reported-by: William Dauchy > > --- > > vl.c | 9 +++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/vl.c b/vl.c > > index 8411a4a..47c199a 100644 > > --- a/vl.c > > +++ b/vl.c > > @@ -965,7 +965,7 @@ static int parse_sandbox(QemuOpts *opts, void *opaque) > > return 0; > > } > > > > -static void parse_name(QemuOpts *opts) > > +static int parse_name(QemuOpts *opts, void *opaque) > > { > > const char *proc_name; > > > > @@ -978,6 +978,8 @@ static void parse_name(QemuOpts *opts) > > if (proc_name) { > > os_set_proc_name(proc_name); > > } > > + > > +return 0; > > } > > > > bool usb_enabled(bool default_usb) > > @@ -3780,7 +3782,6 @@ int main(int argc, char **argv, char **envp) > > if (!opts) { > > exit(1); > > } > > -parse_name(opts); > > break; > > case QEMU_OPTION_prom_env: > > if (nb_prom_envs >= MAX_PROM_ENVS) { > > @@ -3955,6 +3956,10 @@ int main(int argc, char **argv, char **envp) > > exit(1); > > } > > > > +if (qemu_opts_foreach(qemu_find_opts("name"), parse_name, NULL, 1)) { > > +exit(1); > > +} > > + > > This will never exit, but that's okay. Ah, because my parse_name currently never fails? > > #ifndef _WIN32 > > if (qemu_opts_foreach(qemu_find_opts("add-fd"), parse_add_fd, NULL, > > 1)) { > > exit(1); > > -readconfig stores the configuration read in QemuOpts. Command line > option parsing should do the same, and no more. In particular it should > not act upon the option. That needs to be done separately, where both > command line and -readconfig settings are visible in QemuOpts. > > Your patch does exactly that. I think amending the commit message with > the previous paragraph would improve it. > > Have you checked command line option parsing (the big switch) for > similar problems? I hadn't, other than fixing up -name; tbh It's taken me a while to understand how QemuOpts is supposed to work (and hence why I didn't get this in the 1st patch); I'd seen the qemu_opts_foreach uses at the bottom of the switch, but since they looked like a loop, I'd assumed they were only for options with multiple sets of values and not looked any further. The big switch seems to be a mix of a lot of different ways of doing things; A quick scan shows rtc, option_rom, usb_device, and others all use qemu_opts_parse in the big switch but also taking an action in the switch. > Reviewed-by: Markus Armbruster Thanks. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH RFC 8/8] virtio: add endian-ambivalent support to VirtIODevice
* Andreas F?rber (afaer...@suse.de) wrote: > Am 19.05.2014 15:06, schrieb Greg Kurz: > > On Mon, 19 May 2014 10:39:09 +0200 > > Greg Kurz wrote: > >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c > >> index 7fbad29..6578854 100644 > >> --- a/hw/virtio/virtio.c > >> +++ b/hw/virtio/virtio.c > [...] > >> @@ -839,10 +849,39 @@ typedef struct VirtIOSubsection { > >> int version_id; > >> void (*save)(VirtIODevice *vdev, QEMUFile *f); > >> int (*load)(VirtIODevice *vdev, QEMUFile *f); > >> -int (*needed)(VirtIODevice *vdev); > >> +bool (*needed)(VirtIODevice *vdev); > >> } VirtIOSubsection; > >> > >> +static void virtio_save_device_endian(VirtIODevice *vdev, QEMUFile *f) > >> +{ > >> +qemu_put_byte(f, vdev->device_endian); > >> +} > >> + > >> +static int virtio_load_device_endian(VirtIODevice *vdev, QEMUFile *f) > >> +{ > >> +vdev->device_endian = qemu_get_byte(f); > >> +return 0; > >> +} > >> + > >> +static bool virtio_device_endian_needed(VirtIODevice *vdev) > >> +{ > >> +/* No migration is supposed to occur while we are loading state. > >> + */ > >> +assert(vdev->device_endian != VIRTIO_DEVICE_ENDIAN_UNKNOWN); > >> +if (target_words_bigendian()) { > >> +return vdev->device_endian == VIRTIO_DEVICE_ENDIAN_LITTLE; > >> +} else { > >> +return vdev->device_endian == VIRTIO_DEVICE_ENDIAN_BIG; > >> +} > >> +} > >> + > >> static const VirtIOSubsection virtio_subsection[] = { > >> +{ .name = "virtio/device_endian", > > > > Can anyone comment the subsection name ? Is there a chance the > > VMState port would come up with the same ? > > > >> + .version_id = 1, > >> + .save = virtio_save_device_endian, > >> + .load = virtio_load_device_endian, > >> + .needed = virtio_device_endian_needed, > >> +}, > >> { .name = NULL } > >> }; > >> > > Different question: With converting VirtIO to VMState in mind, why are > you not using a regular VMStateSubsection and loading/saving that as > part of the old-style load/save functions? Is an API for that missing? There are a handful of places that call into vmstate from a non-vmstate routine but I don't think they're using plain subsections. hw/pci/pci.c: pci_device_save/load hw/scsi/spapr_vscsi.c: vscsi_save_request hw/acpi/piix4.c: acpi_load_old Dave > > Regards, > Andreas > > -- > SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] -machine vmport=off: Allow disabling of VMWare ioport emulation
* Gerd Hoffmann (kra...@redhat.com) wrote: > > /* init basic PC hardware */ > > -pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy, xen_enabled(), > > -0x4); > > +pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy, > > + !qemu_opt_get_bool(qemu_get_machine_opts(), > > "vmport", > > +true) || xen_enabled(), 0x4); > > pc_basic_device_init > (isa_bus, gsi, &rtc_state, &floppy, >!qemu_opt_get_bool(qemu_get_machine_opts(),"vmport",!xen_enabled()), >0x4); > > ? > > This makes vmport switchable on xen too, with traditional behavior being > the default (off on xen, on otherwise). Yes I guess that would work (although documenting it would be a little hairy); however, does anyone understand the reasons it's disabled in the Xen world? Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 01/18] migration: dump vmstate info as a json file for static analysis
* Amit Shah (amit.s...@redhat.com) wrote: > This commit adds a new command, '-dump-vmstate', that takes a filename > as a parameter. When executed, QEMU will dump the vmstate information > for the machine type it's invoked with to the file, and quit. > > The JSON-format output can then be used to compare the vmstate info for > different QEMU versions, specifically to test whether live migration > would break due to changes in the vmstate data. > > This is based on a version from Andreas Färber posted here: > https://lists.gnu.org/archive/html/qemu-devel/2013-10/msg03095.html > > A Python script that compares the output of such JSON dumps is included > in the following commit. > > Signed-off-by: Amit Shah > --- > include/migration/vmstate.h | 2 + > qemu-options.hx | 9 +++ > savevm.c| 134 > > vl.c| 14 + > 4 files changed, 159 insertions(+) > > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h > index 7e45048..9829c0e 100644 > --- a/include/migration/vmstate.h > +++ b/include/migration/vmstate.h > @@ -778,4 +778,6 @@ void vmstate_register_ram(struct MemoryRegion *memory, > DeviceState *dev); > void vmstate_unregister_ram(struct MemoryRegion *memory, DeviceState *dev); > void vmstate_register_ram_global(struct MemoryRegion *memory); > > +void dump_vmstate_json_to_file(FILE *out_fp); > + > #endif > diff --git a/qemu-options.hx b/qemu-options.hx > index 781af14..d376227 100644 > --- a/qemu-options.hx > +++ b/qemu-options.hx > @@ -3146,6 +3146,15 @@ STEXI > prepend a timestamp to each log message.(default:on) > ETEXI > > +DEF("dump-vmstate", HAS_ARG, QEMU_OPTION_dump_vmstate, > +"-dump-vmstate \n" "", QEMU_ARCH_ALL) > +STEXI > +@item -dump-vmstate @var{file} > +@findex -dump-vmstate > +Dump json-encoded vmstate information for current machine type to file > +in @var{file} > +ETEXI > + > HXCOMM This is the last statement. Insert new options before this line! > STEXI > @end table > diff --git a/savevm.c b/savevm.c > index da8aa24..a4ce279 100644 > --- a/savevm.c > +++ b/savevm.c > @@ -24,6 +24,7 @@ > > #include "config-host.h" > #include "qemu-common.h" > +#include "hw/boards.h" > #include "hw/hw.h" > #include "hw/qdev.h" > #include "net/net.h" > @@ -241,6 +242,139 @@ static QTAILQ_HEAD(savevm_handlers, SaveStateEntry) > savevm_handlers = > QTAILQ_HEAD_INITIALIZER(savevm_handlers); > static int global_section_id; > > +static void dump_vmstate_vmsd(FILE *out_file, > + const VMStateDescription *vmsd, int indent, > + bool is_subsection); > + > +static void dump_vmstate_vmsf(FILE *out_file, const VMStateField *field, > + int indent) checkpatch points out that some tabs managed to get into that indent line. Generally I think this patch is OK and quite useful; two thoughts: 1) I was surprised it dumped every object type, rather than just those that are instantiated; I think the latter would be more use in some circumstances, since there's a load of weird and wonderful objects that exist and are very rarely used. 2) 'fields_exists' is a weird naming to put in the json file - it's a function pointer for determining if the field is going to be present; maybe renaming as 'conditional' would make sense. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 01/18] migration: dump vmstate info as a json file for static analysis
* Amit Shah (amit.s...@redhat.com) wrote: > On (Wed) 21 May 2014 [10:44:07], Dr. David Alan Gilbert wrote: > > * Amit Shah (amit.s...@redhat.com) wrote: > > > This commit adds a new command, '-dump-vmstate', that takes a filename > > > as a parameter. When executed, QEMU will dump the vmstate information > > > for the machine type it's invoked with to the file, and quit. > > > > > > The JSON-format output can then be used to compare the vmstate info for > > > different QEMU versions, specifically to test whether live migration > > > would break due to changes in the vmstate data. > > > > > > This is based on a version from Andreas Färber posted here: > > > https://lists.gnu.org/archive/html/qemu-devel/2013-10/msg03095.html > > > > > > A Python script that compares the output of such JSON dumps is included > > > in the following commit. > > > > > > Signed-off-by: Amit Shah > > > --- > > > include/migration/vmstate.h | 2 + > > > qemu-options.hx | 9 +++ > > > savevm.c| 134 > > > > > > vl.c| 14 + > > > 4 files changed, 159 insertions(+) > > > > > > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h > > > index 7e45048..9829c0e 100644 > > > --- a/include/migration/vmstate.h > > > +++ b/include/migration/vmstate.h > > > @@ -778,4 +778,6 @@ void vmstate_register_ram(struct MemoryRegion > > > *memory, DeviceState *dev); > > > void vmstate_unregister_ram(struct MemoryRegion *memory, DeviceState > > > *dev); > > > void vmstate_register_ram_global(struct MemoryRegion *memory); > > > > > > +void dump_vmstate_json_to_file(FILE *out_fp); > > > + > > > #endif > > > diff --git a/qemu-options.hx b/qemu-options.hx > > > index 781af14..d376227 100644 > > > --- a/qemu-options.hx > > > +++ b/qemu-options.hx > > > @@ -3146,6 +3146,15 @@ STEXI > > > prepend a timestamp to each log message.(default:on) > > > ETEXI > > > > > > +DEF("dump-vmstate", HAS_ARG, QEMU_OPTION_dump_vmstate, > > > +"-dump-vmstate \n" "", QEMU_ARCH_ALL) > > > +STEXI > > > +@item -dump-vmstate @var{file} > > > +@findex -dump-vmstate > > > +Dump json-encoded vmstate information for current machine type to file > > > +in @var{file} > > > +ETEXI > > > + > > > HXCOMM This is the last statement. Insert new options before this line! > > > STEXI > > > @end table > > > diff --git a/savevm.c b/savevm.c > > > index da8aa24..a4ce279 100644 > > > --- a/savevm.c > > > +++ b/savevm.c > > > @@ -24,6 +24,7 @@ > > > > > > #include "config-host.h" > > > #include "qemu-common.h" > > > +#include "hw/boards.h" > > > #include "hw/hw.h" > > > #include "hw/qdev.h" > > > #include "net/net.h" > > > @@ -241,6 +242,139 @@ static QTAILQ_HEAD(savevm_handlers, SaveStateEntry) > > > savevm_handlers = > > > QTAILQ_HEAD_INITIALIZER(savevm_handlers); > > > static int global_section_id; > > > > > > +static void dump_vmstate_vmsd(FILE *out_file, > > > + const VMStateDescription *vmsd, int indent, > > > + bool is_subsection); > > > + > > > +static void dump_vmstate_vmsf(FILE *out_file, const VMStateField *field, > > > + int indent) > > > > checkpatch points out that some tabs managed to get into that indent line. > > Will fix. > > > Generally I think this patch is OK and quite useful; two thoughts: > >1) I was surprised it dumped every object type, rather than just those > > that are instantiated; I think the latter would be more use in some > > circumstances, since there's a load of weird and wonderful objects > > that exist and are very rarely used. > > The idea is to be able to take a qemu binary and compare with another > binary; if only fields that are instantiated are used, various > invocations will have to be tried to find devices that may have > broken. > > An alternative way of checking only devices which have been added to > the running machine can be done via a monitor command (or a parameter > to the existing cmdl
Re: [Qemu-devel] [PATCH 3/7] monitor: Add migrate_set_capability completion.
* Hani Benhabiles (kroo...@gmail.com) wrote: > Signed-off-by: Hani Benhabiles > --- > hmp-commands.hx | 1 + > hmp.h | 2 ++ > monitor.c | 21 + > 3 files changed, 24 insertions(+) > > diff --git a/hmp-commands.hx b/hmp-commands.hx > index 45e1763..919af6e 100644 > --- a/hmp-commands.hx > +++ b/hmp-commands.hx > @@ -975,6 +975,7 @@ ETEXI > .params = "capability state", > .help = "Enable/Disable the usage of a capability for > migration", > .mhandler.cmd = hmp_migrate_set_capability, > +.command_completion = migrate_set_capability_completion, > }, > > STEXI > diff --git a/hmp.h b/hmp.h > index a70804d..0c814d0 100644 > --- a/hmp.h > +++ b/hmp.h > @@ -107,5 +107,7 @@ void ringbuf_write_completion(ReadLineState *rs, int > nb_args, const char *str); > void ringbuf_read_completion(ReadLineState *rs, int nb_args, const char > *str); > void watchdog_action_completion(ReadLineState *rs, int nb_args, > const char *str); > +void migrate_set_capability_completion(ReadLineState *rs, int nb_args, > + const char *str); Thank you for doing this; I spend way too much time typing these commands. > #endif > diff --git a/monitor.c b/monitor.c > index fb300c2..6a3a5c9 100644 > --- a/monitor.c > +++ b/monitor.c > @@ -4572,6 +4572,27 @@ void watchdog_action_completion(ReadLineState *rs, int > nb_args, const char *str) > add_completion_option(rs, str, "none"); > } > > +void migrate_set_capability_completion(ReadLineState *rs, int nb_args, > + const char *str) > +{ > +size_t len; > + > +len = strlen(str); > +readline_set_completion_index(rs, len); > +if (nb_args == 2) { > +int i; > +for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) { > +const char *name = MigrationCapability_lookup[i]; > +if (!strncmp(str, name, len)) { > +readline_add_completion(rs, name); > +} > +} > +} else if (nb_args == 3) { > +add_completion_option(rs, str, "on"); > +add_completion_option(rs, str, "off"); > +} It's a shame you have to do all of these manually; if we could tell something that we had an enum of 'MigrationCapability' then it could remove the command specific glue. Dave > +} > + > static void monitor_find_completion_by_table(Monitor *mon, > const mon_cmd_t *cmd_table, > char **args, > -- > 1.8.3.2 > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 1/3] Start moving migration code into a migration directory
* Juan Quintela (quint...@redhat.com) wrote: > "Dr. David Alan Gilbert (git)" wrote: > G> From: "Dr. David Alan Gilbert" > > > > The migration code now occupies a fair chunk of the top level .c > > files, it seems time to give it it's own directory. > > > > I've not touched: > >arch_init.c - that's mostly RAM migration but has a few random other > > bits > > Will split the memory bits, and can go to migration. > > >savevm.c- because it's built target specific > > Damn, would have to look at it. Yes; I suspect it's the vmstate_register_ram that uses TARGET_PAGE_MASK, one of the patches in my postcopy world adds a function in exec.c that returns TARGET_PAGE_BITS so that stuff that needs to know target page size can make a call to find it out; that might solve that. Dave > > >block-migration.c - should that go in block/ or migration/ ? > > It is really on migration. we have basically: > > - ram-migration > - block-migration > > We can call other names if people preffer. > > > > > This is purely a code move; no code has changed. > > > > Signed-off-by: Dr. David Alan Gilbert > > Thanks, Juan. > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote: > >+case MIG_RPCOMM_ACK: > >+tmp32 = be32_to_cpup((uint32_t *)buf); > >+DPRINTF("RP: Received ACK 0x%x", tmp32); > >+atomic_xchg(&ms->rp_state.latest_ack, tmp32); > > I didn't see *ms->rp_state.latest_ack* been used elsewhere, what's it used > for?;) Nothing currently; I've used the REQ/ACK as debug at the moment; I was thinking that someone might want to wait on an ack being received before carrying on; but hadn't actually needed it in postcopy. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 2/7] runstate: Add runstate store
* Juan Quintela (quint...@redhat.com) wrote: > This allows us to store the current state to send it through migration. Why store the runstate as a string? The later code then ends up doing string compares and things - why not just use the enum value? Dave > Signed-off-by: Juan Quintela > --- > include/sysemu/sysemu.h | 1 + > vl.c| 10 ++ > 2 files changed, 11 insertions(+) > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h > index d8539fd..ae217da 100644 > --- a/include/sysemu/sysemu.h > +++ b/include/sysemu/sysemu.h > @@ -28,6 +28,7 @@ bool runstate_check(RunState state); > void runstate_set(RunState new_state); > int runstate_is_running(void); > bool runstate_needs_reset(void); > +int runstate_store(char *str, int size); > typedef struct vm_change_state_entry VMChangeStateEntry; > typedef void VMChangeStateHandler(void *opaque, int running, RunState state); > > diff --git a/vl.c b/vl.c > index 964d634..ce8e28b 100644 > --- a/vl.c > +++ b/vl.c > @@ -677,6 +677,16 @@ bool runstate_check(RunState state) > return current_run_state == state; > } > > +int runstate_store(char *str, int size) > +{ > +const char *state = RunState_lookup[current_run_state]; > + > +if (strlen(state)+1 > size) > +return -1; > +strncpy(str, state, strlen(state)+1); > +return 0; > +} > + > static void runstate_init(void) > { > const RunStateTransition *p; > -- > 2.1.0 > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 5/7] migration: create now section to store global state
; +RunState r = runstate_index(runstate); > + > +if (r == -1) { > +printf("Unknown received state %s\n", runstate); > +return -1; > +} > +ret = vm_stop_force_state(r); > +} > + > + return ret; > +} > + > +static void global_state_pre_save(void *opaque) > +{ > +GlobalState *s = opaque; > + > +s->size = strlen((char*)s->runstate) + 1; > +printf("saved state: %s\n", s->runstate); > +} > + > +static const VMStateDescription vmstate_globalstate = { > +.name = "globalstate", > +.version_id = 1, > +.minimum_version_id = 1, > +.post_load = global_state_post_load, > +.pre_save = global_state_pre_save, > +.fields = (VMStateField[]) { > +VMSTATE_INT32(size, GlobalState), > +VMSTATE_BUFFER(runstate, GlobalState), > +VMSTATE_END_OF_LIST() > +}, > +}; > + > +void register_global_state(void) > +{ > +/* We would use it independently that we receive it */ > +strcpy((char*)&global_state.runstate, ""); > +vmstate_register(NULL, 0, &vmstate_globalstate, &global_state); > +} > diff --git a/vl.c b/vl.c > index 1788b6a..75e855e 100644 > --- a/vl.c > +++ b/vl.c > @@ -4511,6 +4511,7 @@ int main(int argc, char **argv, char **envp) > return 0; > } > > +register_global_state(); > if (incoming) { > Error *local_err = NULL; > qemu_start_incoming_migration(incoming, &local_err); > -- > 2.1.0 > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [RFC 0/7] Optional toplevel sections
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 15/10/2014 17:59, Juan Quintela ha scritto: > > My idea here is that, if you don't use libvirt, you just start without > > -S. > > If you don't use libvirt or any other QEMU management layer, you're not > going to do migration except for debugging purposes. There's just too > much state going on to be able to do it reliably. I'm not sure that's entirely true - while I agree that most users will use libvirt, migration with shared disk is pretty easy; the only thing that you need to do is bring up the tap on the destination, and I'm not sure libvirt gets the timing ideal for it. Dave > > > If you use libvirt, and you *don't* need to do anything special to run > > after migration, you shouldn't use -S. > > Is this a real requirement, or just "it sounds nicer that way"? How > much time really passes between the end of migration and the issuing of > the "-cont" command? > > And the $1,000,000 questionL.aAre you _absolutely_ sure that an > automatic restart is entirely robust against a failure of the connection > between the two libvirtd instances? Could you end up with the VM > running on two hosts? Using -S gets QEMU completely out of the > equation, which is nice. > > By the way, some of the states (I can think of io-error, guest-panicked, > watchdog) can be detected on the destination and restored. Migrating a > machine with io-error state is definitely something that you want to do > no matter what versions of QEMU you have. It may be the only way to > recover for a network partition like this: > >DISK > /\ > | \ > X | > | | > SRC --- DEST > > (not impossible: e.g. the SRC->DISK is fibre channel, but the SRC->DEST > link is Ethernet. Or you have a replicated disk setup, some daemon > fails in SRC's replica but not DEST's). > > > And I would emit an event saying > > "migration was finished". > > The event should be emitted nevertheless. :) > > Paolo > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 2/7] runstate: Add runstate store
* Eric Blake (ebl...@redhat.com) wrote: > On 10/20/2014 04:52 AM, Juan Quintela wrote: > > "Dr. David Alan Gilbert" wrote: > >> * Juan Quintela (quint...@redhat.com) wrote: > >>> This allows us to store the current state to send it through migration. > >> > >> Why store the runstate as a string? The later code then ends up doing > >> string compares and things - why not just use the enum value? > > > > How do you know that it has the same values both sides? As far as I can > > see, all interaction with the outside is done with strings (i.e. QMP). > > If it's part of the migration stream, then it is not something visible > in QMP, and it is your own fault if you ever change the enum values in > such a way that the migration stream is incompatible between versions. > I think using an enum in the migration stream is just fine, and more > efficient. I think the question here really comes from RunState being an enum defined in qapi-schema.json; so we could use that directly in the migration stream if we were guaranteed that the encoding of that enum wasn't going to change. Does qapi make any guarantees about the enum encoding? Dave > > > > > But it is easier for me if I can sent the numeric value. > > > > Libvirt folks? > > As far as I can tell, libvirt is unimpacted by HOW it is represented in > the migration stream, only that the destination is able to inform > libvirt what state was received as part of migration, with libvirt > having an easy way to then get back into that state (of course, libvirt > should also still have the option to choose a different state than what > just got migrated, as in the case where the user pauses the source in > order to avoid convergence problems but wants the destination to start > running again). > > -- > Eric Blake eblake redhat com+1-919-301-3266 > Libvirt virtualization library http://libvirt.org > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] xhci: add property to turn on/off streams support
* Gerd Hoffmann (kra...@redhat.com) wrote: > Signed-off-by: Gerd Hoffmann > --- > hw/usb/hcd-xhci.c | 15 --- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c > index a27c9d3..2930b72 100644 > --- a/hw/usb/hcd-xhci.c > +++ b/hw/usb/hcd-xhci.c > @@ -459,6 +459,7 @@ struct XHCIState { > uint32_t numintrs; > uint32_t numslots; > uint32_t flags; > +uint32_t max_pstreams_mask; > > /* Operational Registers */ > uint32_t usbcmd; > @@ -500,6 +501,7 @@ enum xhci_flags { > XHCI_FLAG_USE_MSI_X, > XHCI_FLAG_SS_FIRST, > XHCI_FLAG_FORCE_PCIE_ENDCAP, > +XHCI_FLAG_ENABLE_STREAMS, > }; > > static void xhci_kick_ep(XHCIState *xhci, unsigned int slotid, > @@ -1384,7 +1386,7 @@ static void xhci_init_epctx(XHCIEPContext *epctx, > epctx->pctx = pctx; > epctx->max_psize = ctx[1]>>16; > epctx->max_psize *= 1+((ctx[1]>>8)&0xff); > -epctx->max_pstreams = (ctx[0] >> 10) & 0xf; > +epctx->max_pstreams = (ctx[0] >> 10) & epctx->xhci->max_pstreams_mask; > epctx->lsa = (ctx[0] >> 15) & 1; > if (epctx->max_pstreams) { > xhci_alloc_streams(epctx, dequeue); > @@ -2956,9 +2958,9 @@ static uint64_t xhci_cap_read(void *ptr, hwaddr reg, > unsigned size) > break; > case 0x10: /* HCCPARAMS */ > if (sizeof(dma_addr_t) == 4) { > -ret = 0x00087000; > +ret = 0x0008 | (xhci->max_pstreams_mask << 12); > } else { > -ret = 0x00087001; > +ret = 0x00080001 | (xhci->max_pstreams_mask << 12); > } > break; > case 0x14: /* DBOFF */ > @@ -3590,6 +3592,11 @@ static int usb_xhci_initfn(struct PCIDevice *dev) > if (xhci->numslots < 1) { > xhci->numslots = 1; > } > +if (xhci_get_flag(xhci, XHCI_FLAG_ENABLE_STREAMS)) { > +xhci->max_pstreams_mask = 7; /* == 256 primary streams */ > +} else { > +xhci->max_pstreams_mask = 0; > +} > > xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, > xhci); > > @@ -3853,6 +3860,8 @@ static Property xhci_properties[] = { > XHCIState, flags, XHCI_FLAG_SS_FIRST, true), > DEFINE_PROP_BIT("force-pcie-endcap", XHCIState, flags, > XHCI_FLAG_FORCE_PCIE_ENDCAP, false), > +DEFINE_PROP_BIT("streams", XHCIState, flags, > +XHCI_FLAG_ENABLE_STREAMS, true), That's enabling by default; so do you plan to add a patch to disable that with older machine types? Dave > DEFINE_PROP_UINT32("intrs", XHCIState, numintrs, MAXINTRS), > DEFINE_PROP_UINT32("slots", XHCIState, numslots, MAXSLOTS), > DEFINE_PROP_UINT32("p2",XHCIState, numports_2, 4), > -- > 1.8.3.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 46/47] postcopy: Wire up loadvm_postcopy_ram_handle_{run, end} commands
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto: > > +bool one_message = false; > > +/* This looks good, but it's possible that the device loading in > > the > > + * main thread hasn't finished yet, and so we might not be in 'RUN' > > + * state yet. > > + * TODO: Using an atomic_xchg or something for this > > This looks like a good match for QemuEvent. Or just mutex & condvar. Done, QemuEvent seems to work nicely. > > > + */ > > +while (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_LISTENING) > > { > > What if we had postcopy of something else than RAM? Can you remove the > "ram" part from the symbols that do not directly deal with RAM but just > with the protocol? Done; that's 'postcopy_state' and 'POSTCOPY_INCOMING_LISTENING' a lot of the internal command enums have also lost the 'RAM'; but not all of them (hopefully just the ones where it makes sense). Similarly the loadvm_postcopy_ram_handle's are now loadvm_postcopy_handle_... I've kept the hmp/qmp command with the 'ram'. Dave > Paolo > > > + if (!one_message) { > > +DPRINTF("%s: Waiting for RUN", __func__); > > +one_message = true; > > +} > > +} > > +} > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 47/47] End of migration for postcopy
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto: > > +mis->postcopy_ram_state); > > +if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) { > > +/* > > + * Where a migration had postcopy enabled (and thus went to advise) > > + * but managed to complete within the precopy period > > + */ > > +postcopy_ram_incoming_cleanup(mis); > > +} else { > > +if ((ret >= 0) && > > +(mis->postcopy_ram_state > POSTCOPY_RAM_INCOMING_ADVISE)) { > > +/* > > + * Postcopy was started, cleanup should happen at the end of > > the > > + * postcopy thread. > > + */ > > +DPRINTF("process_incoming_migration_co: exiting main branch"); > > +return; > > +} > > Extra parentheses and extra nesting. Done. > > Paolo -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 14/47] Return path: Control commands
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto: > > QEMU_VM_CMD_INVALID = 0, /* Must be 0 */ > > +QEMU_VM_CMD_OPENRP,/* Tell the dest to open the Return path */ > > OPEN_RETURN_PATH? > > > +QEMU_VM_CMD_REQACK,/* Request an ACK on the RP */ > > SEND_ACK or ACK_REQUESTED? > > > QEMU_VM_CMD_AFTERLASTVALID > > Pleaseseparatewords. Is this enum actually used at all? > > Please avoid the difference between QEMU_VM_CMD and MIG_RPCOMM_. > > Perhaps MIG_CMD and MIG_RPCMD_? Almost, I went with: MIG_CMD_INVALID = 0, /* Must be 0 */ MIG_CMD_OPEN_RETURN_PATH, /* Tell the dest to open the Return path */ MIG_CMD_SEND_ACK, /* Request an ACK on the RP */ MIG_CMD_PACKAGED, /* Send a wrapped stream within this stream */ MIG_CMD_POSTCOPY_ADVISE = 20, /* Prior to any page transfers, just warn we might want to do PC */ MIG_CMD_POSTCOPY_LISTEN, /* Start listening for incoming pages as it's running. */ MIG_CMD_POSTCOPY_RUN, /* Start execution */ MIG_CMD_POSTCOPY_END, /* Postcopy is finished. */ MIG_CMD_POSTCOPY_RAM_DISCARD, /* A list of pages to discard that were previously sent during precopy but are dirty. */ and MIG_RP_CMD_INVALID = 0, /* Must be 0 */ MIG_RP_CMD_SHUT, /* sibling will not send any more RP messages */ MIG_RP_CMD_ACK, /* data (seq: be32 ) */ MIG_RP_CMD_REQ_PAGES,/* data (start: be64, len: be64) */ the only oddity I get from that is from the 'SEND_ACK' you suggested; since all my functions to send commands are send_ I currently have 'qemu_savevm_send_send_ack' which while consistent looks a bit odd. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 16/47] Return path: Source handling of return path
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Il 03/10/2014 19:47, Dr. David Alan Gilbert (git) ha scritto: > > +/* Source side RP state */ > > +struct MigrationRetPathState { > > +uint32_t latest_ack; > > +QemuThreadrp_thread; > > +bool error; > > Should the QemuFile be in here? Yes, done. > > +}; > > + > > Also please do not abbrev words, and add a typedef that matches the > struct if it is useful. If it is not, just embed the struct without > giving the type a name (struct { } rp_state). Done. > > > +static bool migration_already_active(MigrationState *ms) > > +{ > > +switch (ms->state) { > > +case MIG_STATE_ACTIVE: > > +case MIG_STATE_SETUP: > > +return true; > > + > > +default: > > +return false; > > + > > +} > > +} > > Should CANCELLING also be considered active? It is on the source->dest > path. Hmm, possibly - but my intention here was just to round up all of the places that already checked for ACTIVE+SETUP so that I could add POSTCOPY_ACTIVE; only one of those places also checked for CANCELLING, so I left it out. > > +static void await_outgoing_return_path_close(MigrationState *ms) > > +{ > > +/* > > + * If this is a normal exit then the destination will send a SHUT and > > the > > + * rp_thread will exit, however if there's an error we need to cause > > + * it to exit, which we can do by a shutdown. > > + * (canceling must also shutdown to stop us getting stuck here if > > + * the destination died at just the wrong place) > > + */ > > +if (qemu_file_get_error(ms->file) && ms->return_path) { > > +qemu_file_shutdown(ms->return_path); > > +} > > As mentioned early, I think it's simpler to let these function handle > themselves the case where there is no return path, and call them > unconditionally. I still need to think about that. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Postcopy failures
* Gary Hook (gary.h...@nimboxx.com) wrote: > I see this went by: > > Il 07/10/2014 12:29, Dr. David Alan Gilbert ha scritto: > > You mean something like this (untested) ? > > > > if (mis->postcopy_ram_state != POSTCOPY_RAM_INCOMING_NONE) { > > if (mis->postcopy_ram_state == POSTCOPY_RAM_INCOMING_ADVISE) { > > /* > >* Where a migration had postcopy enabled (and thus went to > > advise) > >* but managed to complete within the precopy period > >*/ > > postcopy_ram_incoming_cleanup(mis); > > } else if (ret >= 0) { > >/* > > * Postcopy was started, cleanup should happen at the end of the > > * postcopy thread. > > */ > >DPRINTF("process_incoming_migration_co: exiting main branch"); > >return; > > } > > } > > And I wonder if this will solve the problem of a peer-to-peer migration, > using non-shared storage, failing because it appears to take a bit too lon? I > see in other threads Dr. Gilbert is making changes related to post copy and I > am very interested in getting resolution to what appears to be a timeout > problem. > > Any comments would be appreciated by this newbie to Qemu. It should be possible to postcopy block storage as well, if that's the question (it might take some work to make sure that they play nicely together; e.g. wanting to making the page transfer higher priority than block transfer). However, I thought there were other ways to do block storage migration (using blockcopy I think? but I'm not a block guy). Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [RFC PATCH v2 21/23] COLO nic: implement colo nic device interface configure()
$qlen > + > +select_ifb > +index2=$? > +if [[ $index2 -eq 100 ]]; then > +echo "index1 $index1 overflow" >>/root/network-colo.log > +exit 1 > +fi > +ip link set ifb$index2 up > +ip link set ifb$index2 qlen $qlen > +colo_tc qdisc add dev ifb$index1 root handle 1: colo dev ifb$index2 > master > +colo_tc qdisc add dev ifb$index2 root handle 1: colo dev ifb$index1 > slaver > + > +ifconfig $pif promisc > +ip link set $pif qlen $qlen > + > +# forward packets from $pif to ifb$index2 > +tc qdisc add dev $pif ingress > +tc filter add dev $pif parent : protocol ip prio 10 u32 match u32 0 > 0 flowid 1:2 action mirred egress redirect dev ifb$index2 > +tc filter add dev $pif parent : protocol arp prio 11 u32 match u32 0 > 0 flowid 1:2 action mirred egress redirect dev ifb$index2 > + > +start_master ifb$index1 > +} > + > +function uninstall_master() { > +stop_master > + > +# shutdown $ifb1 > +tc qdisc del dev $ifb1 root handle 1: colo > +ip link set $ifb1 down > + > +# don't forward packets from $pif to $ifb2 > +tc filter del dev $pif parent : protocol ip prio 10 u32 > +tc qdisc del dev $pif ingress > + > +# shutdown $ifb2 > +tc qdisc del dev $ifb2 root handle 1: colo > +ip link set $ifb2 down > + > +ifconfig $pif -promisc > +} > + > +function install_slaver() > +{ > +ifconfig $pif promisc > +ip link set $pif qlen $qlen > + > +# forward packets from $pif to $vif > +tc qdisc add dev $pif ingress > +tc filter add dev $pif parent : protocol ip prio 10 u32 match u32 0 > 0 flowid 1:2 action mirred egress redirect dev $vif > +tc filter add dev $pif parent : protocol arp prio 11 u32 match u32 0 > 0 flowid 1:2 action mirred egress redirect dev $vif > + > +ip link set $vif qlen $qlen > +# forward packets from $vif to $pif > +tc qdisc add dev $vif ingress > +tc filter add dev $vif parent : protocol ip prio 10 u32 match u32 0 > 0 flowid 1:2 action mirred egress redirect dev $pif > +tc filter add dev $vif parent : protocol arp prio 11 u32 match u32 0 > 0 flowid 1:2 action mirred egress redirect dev $pif > + > +brctl delif $BR $vif > +} > + > +function uninstall_slaver() > +{ > +# don't forward packets from $pif to $vif > +tc filter del dev $pif parent : protocol ip prio 10 u32 > +tc filter del dev $pif parent : protocol arp prio 11 u32 > +tc qdisc del dev $pif ingress > + > +# don't forward packets from $vif to $pif > +tc filter del dev $vif parent : protocol ip prio 10 u32 > +tc filter del dev $vif parent : protocol arp prio 11 u32 > +tc qdisc del dev $vif ingress > + > +ifconfig $pif -promisc > + > +brctl addif $BR $vif > +} > + > +echo "$@" >/root/network-colo.log > +if [[ $1 != "master" && $1 != "slaver" ]]; then > +echo "$1 != master/slaver" >>/root/network-colo.log > +exit 1 > +fi > + > +if [[ $2 != "install" && $2 != "uninstall" ]]; then > +echo "$2 != install/uninstall" >>/root/network-colo.log > +exit 1 > +fi > + > +${op}_$sides 1>>/root/network-colo.log 2>&1 > +if [[ $1 == "master" && $2 == "install" ]]; then > +echo ifb0=ifb$index1 > +echo ifb1=ifb$index2 > +fi > -- > 1.9.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 2/2] vga: flip qemu 2.2 pc machine types from cirrus to stdvga
* Gerd Hoffmann (kra...@redhat.com) wrote: > This patch switches the default display from cirrus to vga > for the new (qemu 2.2+) machine types. Old machines types > stay as-is for compatibility reasons. > > Signed-off-by: Gerd Hoffmann > --- > hw/i386/pc_piix.c | 7 +-- > hw/i386/pc_q35.c | 7 +-- > 2 files changed, 10 insertions(+), 4 deletions(-) > > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c > index 91a20cb..8637246 100644 > --- a/hw/i386/pc_piix.c > +++ b/hw/i386/pc_piix.c > @@ -456,7 +456,8 @@ static void pc_xen_hvm_init(MachineState *machine) > > #define PC_I440FX_2_2_MACHINE_OPTIONS \ > PC_I440FX_MACHINE_OPTIONS, \ > -.default_machine_opts = "firmware=bios-256k.bin" > +.default_machine_opts = "firmware=bios-256k.bin", \ > +.default_display = "std" > > static QEMUMachine pc_i440fx_machine_v2_2 = { > PC_I440FX_2_2_MACHINE_OPTIONS, > @@ -466,7 +467,9 @@ static QEMUMachine pc_i440fx_machine_v2_2 = { > .is_default = 1, > }; > > -#define PC_I440FX_2_1_MACHINE_OPTIONS PC_I440FX_2_2_MACHINE_OPTIONS > +#define PC_I440FX_2_1_MACHINE_OPTIONS \ > +PC_I440FX_MACHINE_OPTIONS, \ > +.default_machine_opts = "firmware=bios-256k.bin" I think the normal way to do this is the opposite of this; i.e. make the display defaults be the new behaviour, then add the .default_display = "cirrus" to the v2_1 machine type, and everything below it will automatically inherit it, and everything newer will just use the new default. Dave > > static QEMUMachine pc_i440fx_machine_v2_1 = { > PC_I440FX_2_1_MACHINE_OPTIONS, > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > index e225c6d..9fdb3fa 100644 > --- a/hw/i386/pc_q35.c > +++ b/hw/i386/pc_q35.c > @@ -353,7 +353,8 @@ static void pc_q35_init_1_4(MachineState *machine) > > #define PC_Q35_2_2_MACHINE_OPTIONS \ > PC_Q35_MACHINE_OPTIONS, \ > -.default_machine_opts = "firmware=bios-256k.bin" > +.default_machine_opts = "firmware=bios-256k.bin", \ > +.default_display = "std" > > static QEMUMachine pc_q35_machine_v2_2 = { > PC_Q35_2_2_MACHINE_OPTIONS, > @@ -362,7 +363,9 @@ static QEMUMachine pc_q35_machine_v2_2 = { > .init = pc_q35_init, > }; > > -#define PC_Q35_2_1_MACHINE_OPTIONS PC_Q35_2_2_MACHINE_OPTIONS > +#define PC_Q35_2_1_MACHINE_OPTIONS \ > +PC_Q35_MACHINE_OPTIONS, \ > +.default_machine_opts = "firmware=bios-256k.bin" > > static QEMUMachine pc_q35_machine_v2_1 = { > PC_Q35_2_1_MACHINE_OPTIONS, > -- > 1.8.3.1 > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [RFC PATCH v2 00/23] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
* Wen Congyang (we...@cn.fujitsu.com) wrote: > Hi all: > > I will start to implement disk replication. Before doing this, I think we > should decide > how to implement it. > > I have two ideas about it: > 1. implement it in qemu >Advantage: very easy, and don't take too much time >Disadvantage: the virtio disk with vhost is not supported, because the > disk I/O >operations are not handled in qemu. > > 2. update drbd and make it support colo >Advantage: we can use it for both KVM and XEN. >Disadvantage: The implementation may be complex, and need too much time to > implement it.(I don't read the drbd's codes, and can't estimate the > cost) > > I think we can use 1 to implement it first. > If you have some other idea, please let me know. For the COLO disk replication; are you talking here about 'local storage' and treating it as 'internal state' or 'external state' (as described in the first half of 4.4 in the original COLO paper)? I know some groups would like to take advantage of facilities in the storage layer to help; e.g. take snapshots/rollback etc - so I think it's best to do (1) but make the interface clean so that other mechanisms could be added. Similarly I guess things like scsi-xcopy might help for some people - I'm not saying implement these, just if possible make an interface where it could fit later. It's probably best to check with the QEMU storage guys that you can reuse anything they have; there was a discussion a few weeks back where I cc'd Fam, Stefan and Kevin in). Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [RFC PATCH v2 00/23] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
* Hongyang Yang (yan...@cn.fujitsu.com) wrote: > Hi Dave, > >For the COLO disk replication; are you talking here about 'local storage' > >and treating it as 'internal state' or 'external state' (as described in the > >first half of 4.4 in the original COLO paper)? > > 'local storage' and 'internal state'. This is what later half of 4.4 said: > 'COLO considers the state of local storage device as an internal s- > tate of the guest, and snapshots the local storage state as part of the > VM checkpoint, and plans to explore the other solution in future.' Thanks for the clarification. Dave > > > > >I know some groups would like to take advantage of facilities in the storage > >layer to help; e.g. take snapshots/rollback etc - so I think it's best > >to do (1) but make the interface clean so that other mechanisms could be > >added. > >Similarly I guess things like scsi-xcopy might help for some people - I'm > >not saying implement these, just if possible make an interface where it could > >fit later. > > > >It's probably best to check with the QEMU storage guys that you can reuse > >anything they have; there was a discussion a few weeks back where I cc'd > >Fam, Stefan and Kevin in). > > > >Dave > > > >-- > >Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > >. > > > > -- > Thanks, > Yang. -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [RFC PATCH v2 00/23] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
* Wen Congyang (we...@cn.fujitsu.com) wrote: > On 10/29/2014 05:34 PM, Dr. David Alan Gilbert wrote: > > * Wen Congyang (we...@cn.fujitsu.com) wrote: > > > > > > > >> Hi all: > >> > >> I will start to implement disk replication. Before doing this, I think we > >> should decide > >> how to implement it. > >> > >> I have two ideas about it: > >> 1. implement it in qemu > >>Advantage: very easy, and don't take too much time > >>Disadvantage: the virtio disk with vhost is not supported, because the > >> disk I/O > >>operations are not handled in qemu. > >> > >> 2. update drbd and make it support colo > >>Advantage: we can use it for both KVM and XEN. > >>Disadvantage: The implementation may be complex, and need too much time > >> to > >> implement it.(I don't read the drbd's codes, and can't estimate > >> the cost) > >> > >> I think we can use 1 to implement it first. > >> If you have some other idea, please let me know. > > > > For the COLO disk replication; are you talking here about 'local storage' > > and treating it as 'internal state' or 'external state' (as described in the > > first half of 4.4 in the original COLO paper)? > > I don't know what is 'internal state' or 'external state'. It's OK, Yang clarified that; it's 'local storage/internal state'. > What I want to implement is: > 1. forward primary vm's write operation(start sector, size, content) to > secondary vm > 2. Cache the primary vm's write operation in secondary vm's qemu > 3. cache the secondary vm's write operation in secondary vm's qemu > 4. flush primary vm's write operation, and drop secondary vm's write > operation after a >checkpoint > 5. drop primary vm's write operation and flush secondary vm's write operation > when doing >failover. OK; and each QEMU has it's own disk image file that needs to be copied at the start? > > I know some groups would like to take advantage of facilities in the storage > > layer to help; e.g. take snapshots/rollback etc - so I think it's best > > I don't use snapshots recently years. Is it still too slow? How much time to > take snapshot/rollback? I don't know; I'd hope it could be fast if it's a COW underneath. Dave > Thanks > Wen Congyang > > > to do (1) but make the interface clean so that other mechanisms could be > > added. > > Similarly I guess things like scsi-xcopy might help for some people - I'm > > not saying implement these, just if possible make an interface where it > > could > > fit later. > > > > It's probably best to check with the QEMU storage guys that you can reuse > > anything they have; there was a discussion a few weeks back where I cc'd > > Fam, Stefan and Kevin in). > > > > Dave > > > > -- > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > > . > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Bug in recent postcopy patch
* Gary Hook (gary.h...@nimboxx.com) wrote: > *Knock* *knock* *knock* Is this thing on? Yes - but only by luck did I notice this; it's normally better to reply to the thread that posted a patch and cc the authors! > I applied the 47 pieces of the recent postcopy patch to 2.1.2 and am > poking around. An attempt to migrate results in a NULL pointer dereference > in savevm.c. Here is info from gdb: I've not tried migrating with block migration; so can you show the command line you used on qemu and the sequence of commands you used to trigger the migration? > Most of qemu_savevm_state_pending() succeeds, until it gets to the end. > Here¹s the relevant thread while calling is_active(): > > (gdb) backtrace > #0 block_is_active (opaque=0x7fb0ae721200 ) at > block-migration.c:860 > #1 0x7fb0adf4a13a in qemu_savevm_state_pending (f=0x7fb0b01e3a40, > max_size=max_size@entry=0, > res_non_postcopiable=res_non_postcopiable@entry=0x7fb09d604c90, > res_postcopiable=res_postcopiable@entry=0x7fb09d604c88) > at /home/hook/src/qemu/postcopy2/savevm.c:983 > #2 0x7fb0ae01bd82 in migration_thread (opaque=0x7fb0ae684420 > ) at migration.c:1185 > #3 0x7fb0a824d182 in start_thread (arg=0x7fb09d605700) at > pthread_create.c:312 > #4 0x7fb0a7f79fbd in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Q: why is max_size == 0? Does this seem correct? Yes, I think that's normal for the 1st time through the loop; (see migration_thread near the start max_size is initialised to 0). > We look at se->ops: > > (gdb) print *se->ops > $9 = {set_params = 0x7fb0ae028820 , save_state = 0x0, > cancel = 0x7fb0ae028f50 , > save_live_complete = 0x7fb0ae0299a0 , is_active = > 0x7fb0ae028870 , > save_live_iterate = 0x7fb0ae029480 , save_live_setup > = 0x7fb0ae029330 , > save_live_pending = 0x7fb0ae028b30 , can_postcopy = > 0x0, load_state = 0x7fb0ae0288b0 } > > Why is can_postcopy() NULL? > > (gdb) n > qemu_savevm_state_pending (f=0x7fb0b01e3a40, max_size=max_size@entry=0, > res_non_postcopiable=res_non_postcopiable@entry=0x7fb09d604c90, > res_postcopiable=res_postcopiable@entry=0x7fb09d604c88) at > /home/hook/src/qemu/postcopy2/savevm.c:989 > 989 if (se->ops->can_postcopy(se->opaque)) { > (gdb) print *se > $14 = {entry = {tqe_next = 0x7fb0aff9ab30, tqe_prev = 0x7fb0aff88f20}, > idstr = "block", '\000' , instance_id = 0, > alias_id = 0, version_id = 1, section_id = 1, ops = 0x7fb0ae6848e0 > , vmsd = 0x0, > opaque = 0x7fb0ae721200 , compat = 0x0, is_ram = 1} > (gdb) step > > Program received signal SIGSEGV, Segmentation fault. > 0x in ?? () > (gdb) > > > The patches appear to have been fully applied, but it would seem that the > savevm_block_handlers structure needs to be updated to populate this > field? Which implies that a new function will have to be written? > > Or, if I have missed the obvious, I would appreciate enlightenment. Simple bug on my part; the line: if (se->ops->can_postcopy(se->opaque)) { needs to become: if (se->ops->can_postcopy && se->ops->can_postcopy(se->opaque)) { Thanks for the report. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Bug in recent postcopy patch
* Gary Hook (gary.h...@nimboxx.com) wrote: > > > On 10/30/14, 3:08 PM, "Dr. David Alan Gilbert" wrote: > > >>I posted another thread asking about migration failure due to a copy > >> taking too long, but got no traction. In the case where the problem > >>raises > >> its head we have turned tunneling on. A tiny VM (<2GB in size) migrates > >> fine using the same procedure. Again, no shared storage. > > > >Is the guest that doesn't migrate idle or is it busily changing lots of > >memory? > > Quite idle. Boot the VM, no need to start a workload, try to migrate. > Failure. > > Also, very large VMs will fail to migrate (non-tunneled). This _seems_ to > also be related to the amount of time required to copy everything from A > to B. > > Again, tunneling seems to more quickly expose this issue as it increases > the amount of time required to copy the qcow2 file over the network. > > I will add here that I¹ve watched the qcow2 file grow, made a copy of it > (on the receiving end) before it gets deleted, and been able to start a VM > using the file. It would seem to be copasetic. > > I need to add tracing code to the emulator, in a way that doesn¹t rely > upon command line options or environment variables. I don¹t see any such > facility at this point. Specifically, I want to begin by watching what is > going through the monitor (I.e. Return values from qemu-system-x86_64 and > why there are complaints.) Unless you have any clear explanation as to why > the emulator is throwing an error, could you suggest any areas I may want > to focus my efforts? No I don't, but there again I've not done any block stuff, and it sounds like your problem is mostly related to moving the image file (which I thought libvirt preferred to do using NBD underneath now, but again, I'm not a block guy). > >> >Thanks for the report. > >> > >> Thank you for your time and ownership. > > > >No problem; note the postcopy code is still quite young, so don't > >be too surprised if you hit other issues. > > Of course; it¹s fresh out of the oven. But the migration of VMs using > non-shared storage is not (tunneled or otherwise), and that¹s really what > I am focused on. > > Again, much appreciation. Dave > Gary > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 00/17] RFC: userfault v2
n a thread in order to talk the > > userfaultfd protocol) > > > >- if userfaultfd protocol is engaged, return read|write fault + fault > > address to read(ufd) syscalls > > > >- leave the "userfault" resolution mechanism independent of the > > userfaultfd protocol so we keep the two problems separated and we > > don't mix them in the same API which makes it even harder to > > finalize it. > > > > add mcopy_atomic (with a flag to map the page readonly too) > > > > The alternative would be to hide mcopy_atomic (and even > > remap_anon_pages in order to "remove" the memory atomically for > > the externalization into the cloud) as userfaultfd commands to > > write into the fd. But then there would be no much point to keep > > MADV_USERFAULT around if I do so and I could just remove it > > too or it doesn't look clean having to open the userfaultfd just > > to issue an hidden mcopy_atomic. > > > > So it becomes a decision if the basic SIGBUS mode for single > > threaded apps should be supported or not. As long as we support > > SIGBUS too and we don't force to use userfaultfd as the only > > mechanism to be notified about userfaults, having a separate > > mcopy_atomic syscall sounds cleaner. > > > > Perhaps mcopy_atomic could be used in other cases that may arise > > later that may not be connected with the userfault. > > > >Questions to double check the above plan is ok: > > > >1) should I drop the SIGBUS behavior and MADV_USERFAULT? > > > >2) should I hide mcopy_atomic as a write into the userfaultfd? > > > >NOTE: even if I hide mcopy_atomic as a userfaultfd command to write > >into the fd, the buffer pointer passed to write() syscall would > >still _not_ be pointing to the data like a regular write, but it > >would be a pointer to a command structure that points to the source > >and destination data of the "hidden" mcopy_atomic, the only > >advantage is that perhaps I could wakeup the blocked page faults > >without requiring an additional syscall. > > > >The standalone mcopy_atomic would still require a write into the > >userfaultfd as it happens now after remap_anon_pages returns, in > >order to wakeup the stopped page faults. > > > >3) should I add a registration command to trap only write faults? > > > > Sure, that is what i really need;) > > > Best Regards??? > zhanghailiang > > >The protocol can always be extended later anyway in a backwards > >compatible way but it's better if we get it fully featured from the > >start. > > > >For completeness, some answers for other questions I've seen floating > >around but that weren't posted on the list yet (you can skip reading > >the below part if not interested): > > > >- open("/dev/userfault") instead of sys_userfaultfd(), I don't see the > > benefit: userfaultfd is just like eventfd in terms of kernel API and > > registering a /dev/ device actually sounds trickier. userfault is a > > core VM feature and generally we prefer syscalls for core VM > > features instead of running ioctl on some chardev that may or may > > not exist. (like we did with /dev/ksm -> MADV_MERGEABLE) > > > >- there was a suggestion during KVMForum about allowing an external > > program to attach to any MM. Like ptrace. So you could have a single > > process managing all userfaults for different processes. However > > because I cannot allow multiple userfaultfd to register into the > > same range, this doesn't look very reliable (ptrace is kind of an > > optional/debug feature while if userfault goes wrong and returns > > -EBUSY things go bad) and there may be other complications. If I'd > > allow multiple userfaultfd to register into the same range, I > > wouldn't even know who to deliver the userfault to. It is an erratic > > behavior. Currently it'd return -EBUSY if the app has a bug and does > > that, but maybe later this can be relaxed to allow higher > > scalability with a flag (userfaultfd gets flags as parameters), but > > it still would need to be the same logic that manages userfaults and > > the only point of allowing multiple ufd to map the same range would > > be SMP scalability. So I tend to see the userfaultfd as a MM local > > thing. The thread managing the userfaults can still talk with > > another process in the local machine using pipes or sockets if it > > needs to. > > > >- the userfaultfd protocol version handshake was done this way because > > it looked more reliable. > > > > Of course we could pass the version of the protocol as parameter to > > userfaultfd too, but running the syscall multiple times until > > -EPROTO didn't return anymore doesn't seem any better than writing > > into the fd the wanted protocol until you read it back instead of > > -1ULL. It just looked more reliable not having to run the syscall > > again and again while depending on -EPROTO or some other > > -Esomething. > > > >Thanks, > >Andrea > > > >. > > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Bug in recent postcopy patch
* Gary Hook (gary.h...@nimboxx.com) wrote: > On 10/30/14, 5:03 AM, "Dr. David Alan Gilbert" wrote: > > >* Gary Hook (gary.h...@nimboxx.com) wrote: > >> *Knock* *knock* *knock* Is this thing on? > > > >Yes - but only by luck did I notice this; it's normally better > >to reply to the thread that posted a patch and cc the authors! > > Well, that depends upon the developers, I think. I was gently admonished > on another list for addressing a developer (inadvertently) directly. But I > appreciate your openness, and would not want to abuse your attention. > > >> I applied the 47 pieces of the recent postcopy patch to 2.1.2 and am > >> poking around. An attempt to migrate results in a NULL pointer > >>dereference > >> in savevm.c. Here is info from gdb: > > > >I've not tried migrating with block migration; so can you > >show the command line you used on qemu and the sequence of commands > >you used to trigger the migration? > > Yessir. We invoke the emulator from libvirt. While the problem we are > dealing with applies to any VM, the one I am working with is invoked > thusly (edited for readability): > > qemu-system-x86_64 -enable-kvm -name 88dbaf46-4692-4935-bd9d-8d8fac7725a9 \ > -S -machine pc-0.14,accel=kvm,usb=off -m 1024 -realtime mlock=off \ > -smp 1,sockets=1,cores=1,threads=1 \ > -uuid 88dbaf46-4692-4935-bd9d-8d8fac7725a9 -no-user-config -nodefaults \ > -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/88dbaf46-4692-4935-bd9d-8d > 8fac7725a9.monitor,server,nowait \ > -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime \ > -no-shutdown -boot strict=on -device > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ > -drive > file=/mnt/store01/virt/88dbaf46-4692-4935-bd9d-8d8fac7725a9.qcow2,if=none,i > d=drive-virtio-disk0,format=qcow2,cache=writeback \ > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virt > io-disk0,bootindex=1 \ > -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw \ > -device > ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=2 \ > -netdev tap,fd=29,id=hostnet0 -device > rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:07:19:5e,bus=pci.0,addr=0x3 \ > -chardev pty,id=charserial0 -device > isa-serial,chardev=charserial0,id=serial0 \ > -vnc 127.0.0.1:0,password -device VGA,id=video0,bus=pci.0,addr=0x2 \ > -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \ > -msg timestamp=on > > I posted another thread asking about migration failure due to a copy > taking too long, but got no traction. In the case where the problem raises > its head we have turned tunneling on. A tiny VM (<2GB in size) migrates > fine using the same procedure. Again, no shared storage. Is the guest that doesn't migrate idle or is it busily changing lots of memory? > >>Q: why is max_size == 0? Does this seem correct? > > > >Yes, I think that's normal for the 1st time through the loop; (see > >migration_thread > >near the start max_size is initialised to 0). > > Thank you; will do. > > >> > >> > >> The patches appear to have been fully applied, but it would seem that > >>the > >> savevm_block_handlers structure needs to be updated to populate this > >> field? Which implies that a new function will have to be written? > >> > >> Or, if I have missed the obvious, I would appreciate enlightenment. > > > >Simple bug on my part; the line: > > > >if (se->ops->can_postcopy(se->opaque)) { > > > >needs to become: > >if (se->ops->can_postcopy && > >se->ops->can_postcopy(se->opaque)) { > > I wondered if that were not the case. I will make that change and see what > happens. > > >Thanks for the report. > > Thank you for your time and ownership. No problem; note the postcopy code is still quite young, so don't be too surprised if you hit other issues. Dave > > Gary > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 1/3] Start moving migration code into a migration directory
* Amit Shah (amit.s...@redhat.com) wrote: > On (Thu) 16 Oct 2014 [08:53:52], Dr. David Alan Gilbert (git) wrote: > > > rename migration-exec.c => migration/migration-exec.c (100%) > > rename migration-fd.c => migration/migration-fd.c (100%) > > rename migration-rdma.c => migration/migration-rdma.c (100%) > > rename migration-tcp.c => migration/migration-tcp.c (100%) > > rename migration-unix.c => migration/migration-unix.c (100%) > > rename migration.c => migration/migration.c (100%) > > I'm wondering if we should also use the opportunity to cleanup the > file names: > > migration.c => main.c > migration-X.c => X.c > > ? I'd be OK with changing filenames, but they would have to be fairly clear; I don't like having a 'main.c' because that's where I'd expect to find a main() - but we do have a few core.c's as the core of each of a few hw subdirs. Dave > > Amit -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 1/3] Start moving migration code into a migration directory
* Peter Maydell (peter.mayd...@linaro.org) wrote: > On 16 October 2014 08:53, Dr. David Alan Gilbert (git) > wrote: > > From: "Dr. David Alan Gilbert" > > > > The migration code now occupies a fair chunk of the top level .c > > files, it seems time to give it it's own directory. > > Missed opportunity to make the patch summary line > "Start migrating migration code into a migration directory" :-) I would certainly allow a committer to fix such a missed opportunity. Dave > > -- PMM -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 1/3] Start moving migration code into a migration directory
* Gary Hook (gary.h...@nimboxx.com) wrote: > > > On 10/30/14, 7:26 AM, "Amit Shah" wrote: > > >On (Thu) 16 Oct 2014 [08:53:52], Dr. David Alan Gilbert (git) wrote: > >> From: "Dr. David Alan Gilbert" > >> > >> The migration code now occupies a fair chunk of the top level .c > >> files, it seems time to give it it's own directory. > > > >s/it's/its > > 6 out of 87 .c files, and approximately 370 blocks out of 2840 (based on > du output). 13% is a "fair chunk"? I'm not sure how you got 6; migration.c migration-exec.c migration-fd.c migration-rdma.c migration-tcp.c migration-unix.c qemu-file-buf.c qemu-file.c qemu-file-stdio.c qemu-file-unix.c vmstate.c xbzrle.c so that's 12, and there are another 3 in the commit message saying they could do with being moved. That would be 15 files, or 17% - and so yes, I do call that a fair chunk. > But tidy organization is a good thing while needless renaming is not. The > only goal that the suggested renames would appear to accomplish is > additional obfuscation. How about just moving them into a subdirectory and > leave their names alone? Which is what I did; however I have sympathy with those that think that in a directory called 'migration' starting a bunch of the files with 'migration-' is excessive. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [Bug 1363641] Re: Build of v2.1.0 fails on armv7l due to undeclared __NR_select
or: cast from pointer to integer of > different size [-Werror=pointer-to-int-cast] > migration-rdma.c: In function 'qemu_rdma_write_one': > migration-rdma.c:1864:16: error: cast from pointer to integer of > different size [-Werror=pointer-to-int-cast] > migration-rdma.c:1868:53: error: cast to pointer from integer of > different size [-Werror=int-to-pointer-cast] > migration-rdma.c:1922:52: error: cast to pointer from integer of > different size [-Werror=int-to-pointer-cast] > migration-rdma.c:1923:50: error: cast to pointer from integer of > different size [-Werror=int-to-pointer-cast] > migration-rdma.c:1977:49: error: cast to pointer from integer of > different size [-Werror=int-to-pointer-cast] > migration-rdma.c:1998:49: error: cast to pointer from integer of > different size [-Werror=int-to-pointer-cast] > migration-rdma.c:2010:58: error: cast to pointer from integer of > different size [-Werror=int-to-pointer-cast] > migration-rdma.c: In function 'qemu_rdma_registration_handle': > migration-rdma.c:3027:21: error: cast from pointer to integer of > different size [-Werror=pointer-to-int-cast] > migration-rdma.c:3092:41: error: cast from pointer to integer of > different size [-Werror=pointer-to-int-cast] > cc1: all warnings being treated as errors > make: *** [migration-rdma.o] Error 1 > > i.e. earlier errors than before. > > To manage notifications about this bug go to: > https://bugs.launchpad.net/qemu/+bug/1363641/+subscriptions > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
(I've cc'd in Fam, Stefan, and Kevin for Block stuff, and Yang and Eddie for Colo) * Walid Nouri (walid.no...@gmail.com) wrote: > Hello Michael, Hello Paolo > i have ???studied??? the available documentation/Information and tried to > get an idea of the QEMU live block operation possibilities. > > I think the MC protocol doesn???t need synchronous block device replication > because primary and secondary VM are not synchronous. The state of the > primary is allays ahead of the state of the secondary. When the primary is > in epoch(n) the secondary is in epoch(n-1). > > What MC needs is a block device agnostic, controlled and asynchronous > approach for replicating the contents of block devices and its state changes > to the secondary VM while the primary VM is running. Asynchronous block > transfer is important to allow maximum performance for the primary VM, while > keeping the secondary VM updated with state changes. > > The block device replication should be possible in two stages or modes. > > The first stage is the live copy of all block devices of the primary to the > secondary. This is necessary if the secondary doesn???t have an existing > image which is in sync with the primary at the time MC has started. This is > not very convenient but as far as I know actually there is no mechanism for > persistent dirty bitmap in QEMU. > > The second stage (mode) is the replication of block device state changes > (modified blocks) to keep the image on the secondary in sync with the > primary. The mirrored blocks must be buffered in ram (block buffer) until > the complete Checkpoint (RAM, vCPU, device state) can be committed. > > For keeping the complete system state consistent on the secondary system > there must be a possibility for MC to commit/discard block device state > changes. In normal operation the mirrored block device state changes (block > buffer) are committed to disk when the complete checkpoint is committed. In > case of a crash of the primary system while transferring a checkpoint the > data in the block buffer corresponding to the failed Checkpoint must be > discarded. I think for COLO there's a requirement that the secondary can do reads/writes in parallel with the primary, and the secondary can discard those reads/writes - and that doesn't happen in MC (Yang or Eddie should be able to confirm that). > The storage architecture should be ???shared nothing??? so that no shared > storage is required and primary/secondary can have separate block device > images. MC/COLO with shared storage still needs some stuff like this; but it's subtely different. They still need to be able to buffer/release modifications to the shared storage; if any of this code can also be used in the shared-storage configurations it would be good. > I think this can be achieved by drive-mirror and a filter block driver. > Another approach could be to exploit the block migration functionality of > live migration with a filter block driver. > > The drive-mirror (and live migration) does not rely on shared storage and > allow live block device copy and incremental syncing. > > A block buffer can be implemented with a QEMU filter block driver. It should > sit at the same position as the Quorum driver in the block driver hierarchy. > When using block filter approach MC will be transparent and block device > agnostic. > > The block buffer filter must have an Interface which allows MC control the > commits or discards of block device state changes. I have no idea where to > put such an interface to stay conform with QEMU coding style. > > > I???m sure there are alternative and better approaches and I???m open for > any ideas > > > Walid > > Am 17.08.2014 11:52, schrieb Paolo Bonzini: > >Il 11/08/2014 22:15, Michael R. Hines ha scritto: > >>Excellent question: QEMU does have a feature called "drive-mirror" > >>in block/mirror.c that was introduced a couple of years ago. I'm not > >>sure what the > >>adoption rate of the feature is, but I would start with that one. > > > >block/mirror.c is asynchronous, and there's no support for communicating > >checkpoints back to the master. However, the quorum disk driver could > >be what you need. > > > >There's also a series on the mailing list that lets quorum read only > >from the primary, so that quorum can still do replication and fault > >tolerance, but skip fault detection. > > > >Paolo > > > >>There is also a second fault tolerance implementation that works a > >>little differently called > >>"COLO" - you may have seen those emails on the list too, but their > >>method does not require a disk replication solution, if I recall correctly. > > > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [RFC PATCH 11/17] COLO ctl: implement colo checkpoint protocol
* Hongyang Yang (yan...@cn.fujitsu.com) wrote: > > > ??? 08/01/2014 11:03 PM, Dr. David Alan Gilbert ??: > >* Yang Hongyang (yan...@cn.fujitsu.com) wrote: > >>+static int do_colo_transaction(MigrationState *s, QEMUFile *control, > >>+ QEMUFile *trans) > >>+{ > >>+int ret; > >>+ > >>+ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW); > >>+if (ret) { > >>+goto out; > >>+} > >>+ > >>+ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED); > > > >What happens at this point if the slave just doesn't respond? > >(i.e. the socket doesn't drop - you just don't get the byte). > > If the socket return bytes that were not expected, exit. If > socket return error, do some cleanup and quit COLO process. > refer to: colo_ctl_get() and colo_ctl_get_value() But what happens if the slave just doesn't respond at all; e.g. if the slave host loses power, it'll take a while (many seconds) before the socket will timeout. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v3 03/47] Start documenting how postcopy works.
* Hongyang Yang (yan...@cn.fujitsu.com) wrote: > > > ??? 08/28/2014 11:03 PM, Dr. David Alan Gilbert (git) ??: > >From: "Dr. David Alan Gilbert" > > > >Signed-off-by: Dr. David Alan Gilbert > >--- > >+Postcopy can be combined with precopy (i.e. normal migration) so that if > >precopy > >+doesn't finish in a given time the switch is automatically made to precopy. > > I think you mean "automatically made to postcopy" here? Thanks! > >+Source behaviour > >+ > >+Until postcopy is entered the migration stream is identical to normal > >postcopy, > >+except for the addition of a 'postcopy advise' command at the beginning to > >+let the destination know that postcopy might happen. When postcopy starts > > A comma here? Yes, thanks. Dave > > >+the source sends the page discard data and then forms the 'package' > >containing: > >+ > >+ Command: 'postcopy ram listen' > >+ The device state > >+ A series of sections, identical to the precopy streams device state > >stream > >+ containing everything except postcopiable devices (i.e. RAM) > >+ Command: 'postcopy ram run' > >+ > >+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the > >+contents are formatted in the same way as the main migration stream. > >+ > >+Destination behaviour > >+ > >+Initially the destination looks the same as precopy, with a single thread > >+reading the migration stream; the 'postcopy advise' and 'discard' commands > >+are processed to change the way RAM is managed, but don't affect the stream > >+processing. > >+ > >+-- > >+1 2 3 4 5 6 7 > >+main -DISCARD-CMD_PACKAGED ( LISTEN DEVICE DEVICE DEVICE RUN ) > >+thread | | > >+ | (page request) > >+ |\___ > >+ v\ > >+listen thread: --- page -- page -- page -- page -- page > >-- > >+ > >+ a bc > >+-- > >+ > >+On receipt of CMD_PACKAGED (1) > >+ All the data associated with the package - the ( ... ) section in the > >+diagram - is read into memory (into a QEMUSizedBuffer), and the main thread > >+recurses into qemu_loadvm_state_main to process the contents of the package > >(2) > >+which contains commands (3,6) and devices (4...) > >+ > >+On receipt of 'postcopy ram listen' - 3 -(i.e. the 1st command in the > >package) > >+a new thread (a) is started that takes over servicing the migration stream, > >+while the main thread carries on loading the package. It loads normal > >+background page data (b) but if during a device load a fault happens (5) the > >+returned page (c) is loaded by the listen thread allowing the main threads > >+device load to carry on. > >+ > >+The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the > >destination > >+CPUs start running. > >+At the end of the CMD_PACKAGED (7) the main thread returns to normal > >running behaviour > >+and is no longer used by migration, while the listen thread carries > >+on servicing page data until the end of migration. > >+ > >+=== Postcopy states === > >+ > >+Postcopy moves through a series of states (see postcopy_ram_state) > >+from ADVISE->LISTEN->RUNNING->END > >+ > >+ Advise: Set at the start of migration if postcopy is enabled, even > >+ if it hasn't had the start command; here the destination > >+ checks that its OS has the support needed for postcopy, and > >performs > >+ setup to ensure the RAM mappings are suitable for later postcopy. > >+ (Triggered by reception of POSTCOPY_RAM_ADVISE command) > >+ > >+ Listen: The first command in the package, POSTCOPY_RAM_LISTEN, switches > >+ the destination state to Listen, and starts a new thread > >+ (the 'listen thread') which takes over the job of receiving > >+ pages off the migration stream, while the main thread carries > >+ on processing the blob. With this thread able to process page > >+
Re: [Qemu-devel] [RFC PATCH 11/17] COLO ctl: implement colo checkpoint protocol
* Hongyang Yang (yan...@cn.fujitsu.com) wrote: > > > ??? 09/12/2014 07:17 PM, Dr. David Alan Gilbert ??: > >* Hongyang Yang (yan...@cn.fujitsu.com) wrote: > >> > >> > >>??? 08/01/2014 11:03 PM, Dr. David Alan Gilbert ??: > >>>* Yang Hongyang (yan...@cn.fujitsu.com) wrote: > > > > > > > >>>>+static int do_colo_transaction(MigrationState *s, QEMUFile *control, > >>>>+ QEMUFile *trans) > >>>>+{ > >>>>+int ret; > >>>>+ > >>>>+ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW); > >>>>+if (ret) { > >>>>+goto out; > >>>>+} > >>>>+ > >>>>+ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED); > >>> > >>>What happens at this point if the slave just doesn't respond? > >>>(i.e. the socket doesn't drop - you just don't get the byte). > >> > >>If the socket return bytes that were not expected, exit. If > >>socket return error, do some cleanup and quit COLO process. > >>refer to: colo_ctl_get() and colo_ctl_get_value() > > > >But what happens if the slave just doesn't respond at all; e.g. > >if the slave host loses power, it'll take a while (many seconds) > >before the socket will timeout. > > It will wait until the call returns timeout error, and then do some > cleanup and quit COLO process. If it was to wait here for ~30seconds for the timeout what would happen to the primary? Would it be stopped from sending any network traffic for those 30 seconds - I think that's too long to fail over. > There may be better way to handle this? In postcopy I always take reads coming back from the destination in a separate thread, because that thread can't block the main thread going out (I originally did that using async reads but the thread is nicer). You could also use something like a poll() with a shorter timeout to however long you are happy for COLO to go before it fails. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH for 2.1] virtio-scsi: fix with -M pc-i440fx-2.0
* Paolo Bonzini (pbonz...@redhat.com) wrote: > Right now starting a machine with virtio-scsi and a <= 2.0 machine type > fails with: > > qemu-system-x86_64: -device virtio-scsi-pci: Property .any_layout not > found > > This is because the any_layout bit was actually never set after > virtio-scsi was changed to support arbitrary layout for virtio buffers. > > (This was just a cleanup and a preparation for virtio 1.0; no guest > actually checks the bit, but the new request parsing algorithms are > tested even with old guest). > > Reported-by: David Gilbert > Signed-off-by: Paolo Bonzini Seems to fix the bug I hit. Dave > --- > include/hw/virtio/virtio-scsi.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h > index 0419ee4..188a2d9 100644 > --- a/include/hw/virtio/virtio-scsi.h > +++ b/include/hw/virtio/virtio-scsi.h > @@ -178,6 +178,8 @@ typedef struct { > DEFINE_PROP_UINT32("cmd_per_lun", _state, _conf_field.cmd_per_lun, 128) > > #define DEFINE_VIRTIO_SCSI_FEATURES(_state, _feature_field) > \ > +DEFINE_PROP_BIT("any_layout", _state, _feature_field, > \ > +VIRTIO_F_ANY_LAYOUT, true), > \ > DEFINE_PROP_BIT("hotplug", _state, _feature_field, > VIRTIO_SCSI_F_HOTPLUG, \ > true), > \ > DEFINE_PROP_BIT("param_change", _state, _feature_field, > \ > -- > 1.9.3 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK